
Gated Recurrent Unit (GRU)

Units
RNN Encoder & Decoder
This is the RNN Encoder–Decoder: the encoder reads a variable-length source sequence into a fixed-length vector, and the decoder generates the target sequence conditioned on that vector.
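The flow above can be sketched in a few lines. This is a hedged, hypothetical skeleton (the `step`, `step_dec`, and `readout` functions are stand-ins for the learned networks, not from the paper):

```python
import numpy as np

# Hypothetical sketch of the encoder-decoder flow: the encoder compresses
# the source sequence into a fixed-length summary c; the decoder's hidden
# state is then updated from its previous state, the previous output, and c.
def encode(xs, step, h0):
    h = h0
    for x in xs:          # read source symbols one at a time
        h = step(x, h)
    return h              # summary vector c = final encoder hidden state

def decode(c, step_dec, readout, h0, length):
    h = h0
    y = np.zeros_like(h0)  # "previous output" starts empty
    ys = []
    for _ in range(length):
        h = step_dec(y, h, c)  # state depends on y_{t-1}, h_{t-1}, and c
        y = readout(h, c)      # emit the next output representation
        ys.append(y)
    return ys
```

The key design point is that `c` is fed into every decoder step, so the whole source phrase conditions every target symbol.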
the hidden unit
z: the update gate, which decides whether the hidden state is updated with the newly computed candidate state
r: the reset gate, which decides whether to ignore the previous hidden state when computing the candidate
Through the reset gate, the hidden unit can drop information that turns out to be irrelevant, keeping the representation compact.
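The two gates can be written out as a minimal GRU cell. This is a sketch following the standard formulation, not code from the post; the parameter names (`Wz`, `Uz`, etc.) are my own labels:

```python
import numpy as np

# Minimal GRU cell sketch (hypothetical names, standard formulation):
#   z = sigmoid(Wz x + Uz h)          update gate
#   r = sigmoid(Wr x + Ur h)          reset gate
#   h~ = tanh(W x + U (r * h))        candidate activation
#   h' = z * h + (1 - z) * h~         interpolate old state and candidate
def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x, h, params):
    Wz, Uz, Wr, Ur, W, U = params
    z = sigmoid(Wz @ x + Uz @ h)            # how much of the old state to keep
    r = sigmoid(Wr @ x + Ur @ h)            # how much of the old state to read
    h_tilde = np.tanh(W @ x + U @ (r * h))  # candidate; r ~ 0 drops old info
    return z * h + (1.0 - z) * h_tilde
```

When `r` is near zero the candidate ignores the previous state entirely, which is exactly the "drop irrelevant information" behavior described above.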
Usage
SMT: statistical machine translation
- machine translation
- Scoring Phrase Pairs with RNN Encoder–Decoder
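Scoring a phrase pair amounts to computing log p(target | source) with the trained model and adding it as one more feature in the log-linear SMT model. A hedged sketch, where `token_prob` is a hypothetical stand-in for the decoder's softmax output:

```python
import math

# Sketch of phrase-pair scoring with an RNN Encoder-Decoder. The score is
# log p(target | source), factored into per-token conditional probabilities.
# `token_prob(source, prefix, token)` is a hypothetical stand-in for the
# decoder softmax: p(token | source phrase, previously generated tokens).
def phrase_pair_score(source, target, token_prob):
    logp = 0.0
    for t, token in enumerate(target):
        logp += math.log(token_prob(source, target[:t], token))
    return logp  # used as an additional feature in the log-linear model
```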
casual notes
- The difference between the RNN Encoder–Decoder and Zou et al.'s phrase embeddings is word order: averaging word embeddings ignores the order of the words in a phrase, while the RNN processes the phrase sequentially and so captures it.
experiment
- evaluated on the WMT'14 English–French translation task
- dataset includes: Europarl, news commentary, UN, two crawled corpora of 90M and 780M words.
- All the word counts refer to French words after tokenization
- one should focus on the most relevant subset of the data for a given task.
- selected a subset of 418M words out of more than 2G words for language modeling and a subset of 348M out of 850M words for training the RNN Encoder–Decoder
- newstest2012 and newstest2013 were used for data selection and weight tuning with MERT, and newstest2014 as the test set.
- The baseline phrase-based SMT system was built using Moses with default settings.
- All the weight parameters in the RNN Encoder– Decoder were initialized by sampling from an isotropic zero-mean (white) Gaussian distribution with its standard deviation fixed to 0.01, except for the recurrent weight parameters.
- Adadelta and stochastic gradient descent were used to train the RNN Encoder–Decoder, with hyperparameters ε = 10⁻⁶ and ρ = 0.95.
- All the weight parameters were initialized uniformly between −0.01 and 0.01, and the model was trained until the validation perplexity did not improve for 10 epochs.
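Adadelta keeps running averages of squared gradients and squared updates, so the step size adapts per parameter without a hand-tuned learning rate. A minimal sketch, assuming the commonly used defaults ρ = 0.95 and ε = 10⁻⁶ (an assumption, labeled as such):

```python
import numpy as np

# Minimal Adadelta update sketch. rho and eps are the commonly used
# defaults (an assumption here, not taken from this post).
def adadelta_step(w, grad, state, rho=0.95, eps=1e-6):
    Eg2, Edx2 = state
    Eg2 = rho * Eg2 + (1 - rho) * grad**2                  # avg of grad^2
    dx = -np.sqrt(Edx2 + eps) / np.sqrt(Eg2 + eps) * grad  # adaptive step
    Edx2 = rho * Edx2 + (1 - rho) * dx**2                  # avg of dx^2
    return w + dx, (Eg2, Edx2)
```

Each parameter gets its own effective step size from the ratio of the two running averages, which is why no global learning rate needs tuning.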
result
- Title: Gated Recurrent Unit (GRU)
- Author: MelodicTechno
- Created: 2024-09-03 22:53:25
- Updated: 2025-03-09 10:23:26
- Link: https://melodictechno.github.io./2024/09/03/gru/
- License: This post is licensed under CC BY-NC-SA 4.0.