# Various Sequence To Sequence Architectures

## Basic Models

### Sequence to sequence model

An encoder network reads the input sequence into a vector; a decoder network then generates the output sequence from it.

### Image captioning

Use a CNN (e.g. AlexNet) as the encoder to get a 4096-dimensional feature vector, then feed it to an RNN that generates the caption.

### Picking the Most Likely Sentence

Translate a French sentence $x$ into the most likely English sentence $y$, i.e. find

$\argmax_{y^{<1>}, \dots, y^{<T_y>}} P(y^{<1>}, \dots, y^{<T_y>} | x)$

• Why not a greedy search (picking the most likely word one at a time)? Because the most likely first few words need not begin the most likely whole sentence, and greedy choices also tend to drift toward verbose, long outputs.

### Beam Search

• Set the beam width $B = 3$ and find the $3$ most likely first English words.
• For each of them, consider every candidate second word and keep the $B$ most likely (first word, second word) pairs.
• Repeat until $<EOS>$ is generated.

If $B = 1$, beam search reduces to greedy search.
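The steps above can be sketched in Python. Here `next_word_log_probs` is a hypothetical stand-in for the decoder: given the input and a partial sentence, it returns the log probability of each candidate next word.

```python
import math

def beam_search(x, next_word_log_probs, B=3, max_len=20, eos="<EOS>"):
    """Sketch of beam search: keep the B highest-scoring partial
    sentences, extend each by one word per step, stop at <EOS>."""
    beams = [([], 0.0)]            # (prefix, cumulative log P)
    completed = []
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            for word, logp in next_word_log_probs(x, prefix).items():
                candidates.append((prefix + [word], score + logp))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for prefix, score in candidates[:B]:   # keep only the top B
            if prefix[-1] == eos:
                completed.append((prefix, score))
            else:
                beams.append((prefix, score))
        if not beams:
            break
    completed.extend(beams)        # fall back to unfinished beams
    return max(completed, key=lambda c: c[1])
```

With `B = 1` the top-`B` cut keeps a single hypothesis per step, which is exactly greedy search.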

### Length normalization

$\argmax_{y} \prod_{t = 1}^{T_y} P(y^{<t>}|x, y^{<1>}, \dots, y^{<t - 1>})$

Each $P$ is much less than $1$ (close to $0$), so the product underflows numerically; take the $\log$ instead:

$\argmax_{y} \sum_{t = 1}^{T_y} \log P(y^{<t>}|x, y^{<1>}, \dots, y^{<t - 1>})$

This objective tends to prefer short sentences: every extra word adds another negative log term.

So normalize by the length ($\alpha$ is a hyperparameter: $\alpha = 1$ is full normalization, $\alpha = 0$ none, and something in between such as $\alpha = 0.7$ is typical):

$\argmax_{y} \frac 1 {T_y^{\alpha}} \sum_{t = 1}^{T_y} \log P(y^{<t>}|x, y^{<1>}, \dots, y^{<t - 1>})$
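As a sketch, the normalized objective can be computed from the per-token probabilities $P(y^{<t>} | x, y^{<1>}, \dots, y^{<t-1>})$ of one candidate (the function name and the $\alpha = 0.7$ default are illustrative):

```python
import math

def normalized_score(token_probs, alpha=0.7):
    """Length-normalized log-likelihood: sum the log probabilities
    of the T_y tokens and divide by T_y ** alpha."""
    T_y = len(token_probs)
    log_sum = sum(math.log(p) for p in token_probs)
    return log_sum / (T_y ** alpha)
```

Dividing by $T_y^{\alpha}$ shrinks the penalty that each additional word adds, so longer candidates are no longer ruled out automatically.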

### Beam search discussion

• large $B$ : better result, slower
• small $B$ : worse result, faster

Let $y^*$ be a high-quality human translation and $\hat y$ the algorithm's output. On a mistranslated example, compare their probabilities under the model:

• $P(y^* | x) > P(\hat y | x)$ : beam search is at fault (the model prefers $y^*$, but the search failed to find it)
• $P(y^* | x) \le P(\hat y | x)$ : the RNN model is at fault (it scores a worse translation at least as high)
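The attribution rule above, written as a small helper (a hypothetical sketch, working in log space as in the previous section):

```python
def blame(log_p_star, log_p_hat):
    """Attribute one dev-set error: compare log P(y*|x) with
    log P(y_hat|x), both computed by the model."""
    if log_p_star > log_p_hat:
        return "beam search"   # model prefers y*, search missed it
    return "RNN model"         # model scores its own output >= y*
```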

## BLEU (Bilingual Evaluation Understudy) Score

Given one or more good reference translations, score the machine output $\hat y$ by its modified n-gram precision:

$p_n = \frac{\sum_{\text{n-gram} \in \hat y} \text{Count}_{\text{clip}}(\text{n-gram})} {\sum_{\text{n-gram} \in \hat y} \text{Count}(\text{n-gram})}$

where $\text{Count}_{\text{clip}}$ caps an n-gram's count at the maximum number of times it appears in any single reference.

### BLEU details

Combine the four precisions with the brevity penalty (BP):

$\text{BLEU} = \text{BP} \cdot \exp(\frac{1}{4} \sum_{n = 1}^4 \log p_n)$

$BP = \begin{cases} 1 & \text{if MT\_output\_length} > \text{reference\_output\_length}\\ \exp(1 - \text{reference\_output\_length}/\text{MT\_output\_length}) & \text{otherwise} \end{cases}$

The penalty exists because we don't want translations that are too short: short outputs get high precision too easily.
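Putting the pieces together, a minimal BLEU sketch (simplified: the reference length $r$ here is the shortest reference, whereas real BLEU uses the closest-length reference):

```python
import math
from collections import Counter

def ngrams(words, n):
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

def clipped_precision(candidate, references, n):
    """p_n: candidate n-gram counts, each clipped by the max count
    of that n-gram in any single reference."""
    cand = Counter(ngrams(candidate, n))
    max_ref = Counter()
    for ref in references:
        for g, c in Counter(ngrams(ref, n)).items():
            max_ref[g] = max(max_ref[g], c)
    clipped = sum(min(c, max_ref[g]) for g, c in cand.items())
    total = sum(cand.values())
    return clipped / total if total else 0.0

def bleu(candidate, references, N=4):
    """BP * exp(mean of log p_n for n = 1..N)."""
    c = len(candidate)
    r = min(len(ref) for ref in references)   # simplified choice of r
    bp = 1.0 if c > r else math.exp(1 - r / c)
    ps = [clipped_precision(candidate, references, n) for n in range(1, N + 1)]
    if min(ps) == 0:          # any zero precision sends the geometric mean to 0
        return 0.0
    return bp * math.exp(sum(math.log(p) for p in ps) / N)
```

Clipping is what stops degenerate outputs like "the the the the" from scoring well: each "the" only counts as often as it occurs in a reference.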

## Attention Model Intuition

It is hard for the network to memorize a whole long sentence in a single fixed vector, so instead compute attention weights and predict each output word from a context built around the relevant part of the input.

## Attention Model

Use a BiRNN or BiLSTM.

$$\begin{aligned} a^{<t'>} &= (\overrightarrow a^{<t'>}, \overleftarrow a^{<t'>})\\ \sum_{t'} \alpha^{<t, t'>} &= 1\\ c^{<t>} &= \sum_{t'} \alpha^{<t, t'>} a^{<t'>} \end{aligned}$$

### Computing attention

$$\begin{aligned} \alpha^{<t, t'>} &= \text{amount of "attention" } y^{<t>} \text{ should pay to } a^{<t'>}\\ &= \frac{\exp(e^{<t, t'>})}{\sum_{t'' = 1}^{T_x} \exp(e^{<t, t''>})} \end{aligned}$$

Train a very small network to learn the scoring function: $e^{<t, t'>}$ is computed from $s^{<t - 1>}$ (the decoder's previous hidden state) and $a^{<t'>}$.
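The softmax over the scores, and the context vector it yields, in a minimal sketch (plain lists; `e` holds the scores $e^{<t, t'>}$ for one output step $t$):

```python
import math

def attention_weights(e):
    """alpha^{<t,t'>} = exp(e^{<t,t'>}) / sum_{t'} exp(e^{<t,t'>}).
    Subtracting the max keeps exp() numerically stable."""
    m = max(e)
    exps = [math.exp(v - m) for v in e]
    s = sum(exps)
    return [v / s for v in exps]

def context(alpha, a):
    """c^{<t>} = sum_{t'} alpha^{<t,t'>} a^{<t'>}, where a is the
    list of encoder activation vectors."""
    dim = len(a[0])
    return [sum(w * vec[i] for w, vec in zip(alpha, a)) for i in range(dim)]
```

By construction the weights sum to $1$, matching the constraint above.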

The complexity is $\mathcal O(T_x T_y)$, which is expensive (quadratic in the sequence lengths).

# Speech Recognition - Audio Data

## Speech recognition

$x(\text{audio clip}) \to y(\text{transcript})$

### Attention model for speech recognition

Generate the transcript character by character.

### CTC cost for speech recognition

CTC (Connectionist Temporal Classification): the output sequence has the same length as the (long) audio input, using repeated characters and blank tokens, e.g.

"ttt_h_eee___ ____qqq$\dots$" $\rightarrow$ "the quick brown fox"

Basic rule: collapse repeated characters that are not separated by a blank ("_"), then remove the blanks.
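The collapse rule as code: first merge runs of the same symbol, then drop blanks, so a blank between repeats preserves genuine double letters.

```python
def ctc_collapse(seq, blank="_"):
    """Collapse a CTC output string: 'ttt_h_eee' -> 'the'."""
    out = []
    prev = None
    for ch in seq:
        if ch != prev:          # skip repeats of the previous symbol
            if ch != blank:     # then drop the blank tokens
                out.append(ch)
        prev = ch
    return "".join(out)
```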

### Trigger Word Detection

Label the training targets so that the output is $1$ for the time steps just after the trigger word is spoken, and $0$ elsewhere.
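One common labeling scheme, as a sketch (the function and the `ones_after` window size are illustrative assumptions, beyond the notes' "let the output be $1$s"): after each step where the trigger word ends, set the next several labels to $1$ so the positive class is not a single frame drowned in zeros.

```python
def trigger_labels(num_steps, trigger_end_steps, ones_after=50):
    """Build the 0/1 target sequence for trigger word detection.
    trigger_end_steps: time steps at which the trigger word ends."""
    y = [0] * num_steps
    for t in trigger_end_steps:
        for k in range(t, min(t + ones_after, num_steps)):
            y[k] = 1
    return y
```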

posted @ 2021-08-23 23:21 zjp_shadow