N-Gram

Objective

In order to build a probabilistic language model, we need to assign a probability to each sentence.

For instance, for a sequence of words \(w=w_1w_2w_3...w_k\), we want to compute

\[p(w)=p(w_1w_2w_3...w_k) \]

Chain rule

According to the chain rule, we have:

\[p(w) = p(w_1)p(w_2|w_1)p(w_3|w_1w_2)...p(w_k|w_1...w_{k-1}) \]
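
For example, for the three-word sentence "the cat sat", the chain rule gives:

\[p(\text{the cat sat}) = p(\text{the})\,p(\text{cat}|\text{the})\,p(\text{sat}|\text{the cat}) \]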

However, the last term is difficult to estimate, since the history \(w_1...w_{k-1}\) is too long.

Markov assumption

To make this tractable, we make the Markov assumption:

\[p(w_i|w_1...w_{i-1})= p(w_i|w_{i-n+1}...w_{i-1}) \]

It means the probability of word \(w_i\) depends only on the \(n-1\) words before it.
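
For example, with n = 3 (a trigram model), each word is conditioned only on the two words preceding it:

\[p(w_i|w_1...w_{i-1}) = p(w_i|w_{i-2}w_{i-1}) \]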

Start & End token: bigram language model

For n = 2 we have:

\[p(w) = p(w_1)p(w_2|w_1)p(w_3|w_2)...p(w_k|w_{k-1}) \]

However, this is not enough: some words are much more likely than others to begin a sentence. So we change the first term to:

\[p(w) = p(w_1|\text{start})p(w_2|w_1)p(w_3|w_2)...p(w_k|w_{k-1}) \]

Similarly, for words that end the sentence, we need to add an end token:

\[p(w) = p(w_1|\text{start})p(w_2|w_1)p(w_3|w_2)...p(w_k|w_{k-1})p(\text{end}|w_k) \]

PS: The start and end cases differ: we always know where a sentence begins, but the model must use the end token to decide where it ends.
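
For example, for the sentence "the cat sat", the full bigram decomposition becomes:

\[p(\text{the cat sat}) = p(\text{the}|\text{start})\,p(\text{cat}|\text{the})\,p(\text{sat}|\text{cat})\,p(\text{end}|\text{sat}) \]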

Summary: n-gram language model

n = 1, unigram model

\[p(w)=\prod^{k}_{i=1}p(w_i) \]

n = 2, bigram model

\[p(w)=\prod^{k}_{i=1}p(w_i|w_{i-1}) \]

general n, n-gram model

\[p(w)=\prod^{k}_{i=1}p(w_i|w_{i-n+1}...w_{i-1}) \]

Compute the probability: counting (maximum likelihood estimation)

The conditional probabilities are estimated by relative frequency, where \(c(\cdot)\) denotes a count in the training corpus. For the bigram model:

\[p(w_i|w_{i-1}) = \frac{c(w_{i-1}w_i)}{\sum_{w_k}c(w_{i-1}w_k)} = \frac{c(w_{i-1}w_i)}{c(w_{i-1})} \]
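
For instance, if the bigram "the cat" occurs 10 times in the corpus and the word "the" occurs 100 times, then \(p(\text{cat}|\text{the}) = \frac{10}{100} = 0.1\). (These counts are made up purely for illustration.)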

For the n-gram model:

\[p(w_i|w_{i-n+1}...w_{i-1}) = \frac{c(w_{i-n+1}...w_{i-1}w_i)}{c(w_{i-n+1}...w_{i-1})} \]
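
A minimal sketch of this counting estimate in Python, assuming a toy three-sentence corpus and the (assumed) token names `<s>`/`</s>` for the start and end tokens:

```python
from collections import Counter

# Toy corpus (an illustrative assumption, not real data);
# <s> and </s> mark sentence start and end.
corpus = [
    ["<s>", "i", "love", "nlp", "</s>"],
    ["<s>", "i", "love", "cats", "</s>"],
    ["<s>", "cats", "love", "nlp", "</s>"],
]

# Count histories c(w_{i-1}) and bigrams c(w_{i-1} w_i).
history_counts = Counter()
bigram_counts = Counter()
for sentence in corpus:
    for prev, cur in zip(sentence, sentence[1:]):
        history_counts[prev] += 1
        bigram_counts[(prev, cur)] += 1

def bigram_prob(prev, cur):
    """p(cur | prev) = c(prev cur) / c(prev)."""
    if history_counts[prev] == 0:
        return 0.0
    return bigram_counts[(prev, cur)] / history_counts[prev]

def sentence_prob(words):
    """Probability of a sentence under the bigram model, with start/end tokens."""
    tokens = ["<s>"] + words + ["</s>"]
    p = 1.0
    for prev, cur in zip(tokens, tokens[1:]):
        p *= bigram_prob(prev, cur)
    return p

print(sentence_prob(["i", "love", "nlp"]))  # (2/3) * 1 * (2/3) * 1 ≈ 0.444
```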

Usage of n-grams:

  • Google search suggestions
  • Input method suggestions (a toy next-word suggestion sketch follows below)
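
Continuing with the bigram counts from the sketch above (still on the toy corpus), next-word suggestion is just ranking candidate words by their bigram probability; `suggest_next` is a hypothetical helper name, not a real library API:

```python
def suggest_next(prev, k=2):
    """Return the k most probable next words after `prev` under the bigram model."""
    candidates = [(cur, bigram_prob(prev, cur))
                  for (p, cur) in bigram_counts if p == prev]
    return sorted(candidates, key=lambda pair: -pair[1])[:k]

print(suggest_next("love"))  # [('nlp', 0.666...), ('cats', 0.333...)]
```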
