# Seq2Seq Machine Translation in PyTorch

## Introduction to Seq2Seq

A Seq2Seq model consists of an Encoder and a Decoder, each typically built from RNNs. The Encoder compresses the input sequence into a vector; the Decoder then takes this vector, together with its own prediction from the previous time step, as input to generate the target sequence.
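A minimal GRU-based Encoder can be sketched as follows (the class name and dimensions are illustrative, not taken from the linked notebook):

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """A minimal GRU encoder: embeds source tokens and returns the
    per-step outputs plus the final hidden state used to seed the Decoder."""
    def __init__(self, input_dim, emb_dim, hid_dim):
        super().__init__()
        self.embedding = nn.Embedding(input_dim, emb_dim)
        self.gru = nn.GRU(emb_dim, hid_dim)

    def forward(self, src):
        # src = [src_len, batch]
        embedded = self.embedding(src)        # [src_len, batch, emb_dim]
        outputs, hidden = self.gru(embedded)  # [src_len, batch, hid_dim], [1, batch, hid_dim]
        return outputs, hidden
```

The per-step `outputs` are what the attention mechanism below attends over; `hidden` initializes the Decoder.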

Seq2Seq behaves slightly differently during training and inference. If the Decoder's very first prediction is wrong, the error propagates like a butterfly effect and corrupts everything that follows. To mitigate this, during training the Decoder's input at each time step is not always the previous step's output: with some probability, the ground-truth token is fed in instead (teacher forcing).
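The teacher-forcing decision reduces to a coin flip per time step; a minimal sketch (the function name is made up for illustration):

```python
import random

def next_decoder_input(predicted_token, target_token, teacher_forcing_ratio=0.5):
    """With probability teacher_forcing_ratio, feed the ground-truth token
    to the Decoder at the next step; otherwise feed its own prediction."""
    if random.random() < teacher_forcing_ratio:
        return target_token    # teacher forcing: use the true token
    return predicted_token     # free running: use the model's own output
```

Setting the ratio to 1.0 always uses the ground truth; 0.0 matches inference-time behavior.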

## The Attention Mechanism

Consider translating the sentence "She doesn't like soccer." Rather than relying on a single fixed context vector, attention lets the Decoder focus on the most relevant source words at each decoding step. In Luong attention, the score between the current decoder state $h_t$ and each encoder state $\bar h_s$ can be computed in one of three ways:

$$score(h_t,\bar h_s)= \begin{cases} h_t^T\bar h_s & \text{dot} \\ h_t^TW_a\bar h_s & \text{general} \\ v_a^T\tanh (W_a[h_t;\bar h_s]) & \text{concat} \end{cases}$$

$$a_t(s)=\text{align}(h_t, \bar h_s)=\frac {\exp(\text{score}(h_t, \bar h_s))}{\sum_{s'} \exp(\text{score}(h_t, \bar h_{s'}))}$$

$$c_t=\sum_s a_t(s) \cdot \bar h_s$$

$$\tilde h_t = \tanh (W_c[c_t;h_t])$$

$$y = \text{softmax}(W_s\tilde h_t)$$
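The "dot" variant of the formulas above can be sketched directly in PyTorch (the function name is illustrative; $\tilde h_t$ and the output softmax are left to the Decoder code below):

```python
import torch
import torch.nn.functional as F

def luong_dot_attention(h_t, encoder_outputs):
    """Luong attention with the "dot" score.
    h_t             = [1, batch, hid_dim]        current decoder hidden state
    encoder_outputs = [src_len, batch, hid_dim]  all encoder states \bar h_s"""
    # score(h_t, h_s) = h_t . h_s for every source position s
    scores = (h_t * encoder_outputs).sum(dim=2)            # [src_len, batch]
    # a_t(s): normalize the scores over source positions
    a_t = F.softmax(scores, dim=0)                         # [src_len, batch]
    # c_t = sum_s a_t(s) * h_s
    c_t = (a_t.unsqueeze(2) * encoder_outputs).sum(dim=0)  # [batch, hid_dim]
    return a_t, c_t
```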

## Selected Code

The full notebook is available at: https://gitee.com/dogecheng/python/blob/master/pytorch/Seq2SeqForTranslation.ipynb

```python
class Decoder(nn.Module):
    def forward(self, token_inputs, last_hidden, encoder_outputs):
        ...
        # encoder_outputs = [input_lengths, batch, hid_dim * n_directions]
        attn_weights = self.attn(gru_output, encoder_outputs)
        # attn_weights = [batch, 1, seq_len]
        context = attn_weights.bmm(encoder_outputs.transpose(0, 1))
        # context = [batch, 1, hid_dim * n_directions]

        gru_output = gru_output.squeeze(0)  # [batch, n_directions * hid_dim]
        context = context.squeeze(1)        # [batch, n_directions * hid_dim]
        concat_input = torch.cat((gru_output, context), 1)     # [batch, n_directions * hid_dim * 2]
        concat_output = torch.tanh(self.concat(concat_input))  # [batch, n_directions * hid_dim]
        output = self.out(concat_output)  # [batch, output_dim]
        output = self.softmax(output)
        ...
```

```python
if self.predict:
    """
    Inference code
    """
    ...

else:
    max_target_length = max(target_lengths)
    all_decoder_outputs = torch.zeros(
        (max_target_length, batch_size, self.decoder.output_dim), device=self.device
    )

    for t in range(max_target_length):
        use_teacher_forcing = random.random() < teacher_forcing_ratio
        if use_teacher_forcing:
            # decoder_output = [batch, output_dim]
            # decoder_hidden = [n_layers * n_directions, batch, hid_dim]
            decoder_output, decoder_hidden, decoder_attn = self.decoder(
                decoder_input, decoder_hidden, encoder_outputs
            )
            all_decoder_outputs[t] = decoder_output
            decoder_input = target_batches[t]  # next input comes from the training data
        else:
            decoder_output, decoder_hidden, decoder_attn = self.decoder(
                decoder_input, decoder_hidden, encoder_outputs
            )
            # topi = [batch, 1]
            topv, topi = decoder_output.topk(1)
            all_decoder_outputs[t] = decoder_output
            decoder_input = topi.squeeze(1).detach()  # next input comes from the model's prediction

    loss_fn = nn.NLLLoss(ignore_index=PAD_token)
    loss = loss_fn(
        all_decoder_outputs.reshape(-1, self.decoder.output_dim),  # [seq_len * batch, output_dim]
        target_batches.reshape(-1)                                 # [seq_len * batch]
    )
```
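One subtlety worth noting: `nn.NLLLoss` expects log-probabilities, so the Decoder's `self.softmax` above is presumably `nn.LogSoftmax(dim=1)` rather than a plain softmax. A small standalone sketch of the pairing (the tensors here are made up for illustration):

```python
import torch
import torch.nn as nn

log_softmax = nn.LogSoftmax(dim=1)
logits = torch.randn(4, 10)           # [batch, output_dim]
log_probs = log_softmax(logits)       # NLLLoss wants log-probabilities
targets = torch.tensor([1, 0, 3, 2])  # token indices; 0 plays the role of PAD_token
# positions whose target equals ignore_index contribute nothing to the loss
loss = nn.NLLLoss(ignore_index=0)(log_probs, targets)
```

With the default mean reduction, the loss averages `-log_probs[i, targets[i]]` over the non-padded positions only.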

During inference, the Seq2Seq model takes one sample at a time and outputs its translation; the relevant part of forward() is shown below. Decoding stops when the Decoder emits the end-of-sequence token or the output reaches the configured maximum length.

```python
class Seq2Seq(nn.Module):
    ...
    def forward(self, input_batches, input_lengths, target_batches=None,
                target_lengths=None, teacher_forcing_ratio=0.5):
        ...
        if self.predict:
            # only one sentence is fed in at a time
            assert batch_size == 1, "batch_size of predict phase must be 1!"
            output_tokens = []

            while True:
                decoder_output, decoder_hidden, decoder_attn = self.decoder(
                    decoder_input, decoder_hidden, encoder_outputs
                )
                # topi = [1, 1]
                topv, topi = decoder_output.topk(1)
                decoder_input = topi.squeeze(1).detach()
                output_token = topi.squeeze().detach().item()
                if output_token == EOS_token or len(output_tokens) == self.max_len:
                    break
                output_tokens.append(output_token)
            return output_tokens

        else:
            """
            Training code
            """
            ...
```

## References

- NLP From Scratch: Translation with a Sequence to Sequence Network and Attention
- Deploying a Seq2Seq Model with TorchScript
- Practical PyTorch: Translation with a Sequence to Sequence Network and Attention
- 1 - Sequence to Sequence Learning with Neural Networks

posted @ 2020-05-10 20:50 那少年和狗