N-Gram
Objective
In order to build a probabilistic language model, we need to assign a probability to each sentence.
For instance, for a sequence of words \(w = w_1 w_2 w_3 \ldots w_k\), we want to compute
\[ P(w) = P(w_1 w_2 w_3 \ldots w_k). \]
Chain rule
According to the chain rule, we have:
\[ P(w_1 w_2 \ldots w_k) = P(w_1)\,P(w_2 \mid w_1)\,P(w_3 \mid w_1 w_2)\cdots P(w_k \mid w_1 \ldots w_{k-1}). \]
However, the last term is difficult to estimate, since the conditioning history \(w_1 \ldots w_{k-1}\) is too long and will rarely, if ever, be observed in a training corpus.
Markov assumption
We therefore make the Markov assumption:
\[ P(w_i \mid w_1 \ldots w_{i-1}) \approx P(w_i \mid w_{i-n+1} \ldots w_{i-1}). \]
It means the probability of word \(w_i\) only depends on the \(n-1\) words before it.
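For instance (the sentence below is an illustrative example of ours, not from the original notes), with \(n = 3\) the long history is truncated to just the two preceding words:
\[ P(\text{model} \mid \text{we are building a language}) \approx P(\text{model} \mid \text{a language}). \]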
Start & End token: bigram language model
For \(n = 2\) we have:
\[ P(w_1 w_2 \ldots w_k) \approx P(w_1)\,P(w_2 \mid w_1)\,P(w_3 \mid w_2)\cdots P(w_k \mid w_{k-1}). \]
However, this is not enough: some words are very likely to begin a sentence while others are not. We therefore add a start token \(\langle s\rangle\) and change the first term to:
\[ P(w_1 \mid \langle s\rangle). \]
Similarly, for the words that end the sentence, we need to add an end token and append the term:
\[ P(\langle /s\rangle \mid w_k). \]
Note: the start and the end are handled differently because we know where a sentence begins, but we do not know in advance where it ends, so the model must predict the end token.
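As a concrete illustration (the example sentence is an assumption of ours, not from the original notes), the bigram model with both tokens scores a sentence as:
\[ P(\text{i like nlp}) \approx P(\text{i} \mid \langle s\rangle)\, P(\text{like} \mid \text{i})\, P(\text{nlp} \mid \text{like})\, P(\langle /s\rangle \mid \text{nlp}). \]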
Summary: n-gram language model
\(n = 1\), unigram model: \(P(w) \approx \prod_i P(w_i)\)
\(n = 2\), bigram model: \(P(w) \approx \prod_i P(w_i \mid w_{i-1})\)
general \(n\), n-gram model: \(P(w) \approx \prod_i P(w_i \mid w_{i-n+1} \ldots w_{i-1})\)
Computing the probability: by the definition of conditional probability, each term is estimated from corpus counts (maximum likelihood).
For the bigram model:
\[ P(w_i \mid w_{i-1}) = \frac{C(w_{i-1}\, w_i)}{C(w_{i-1})} \]
For the n-gram model:
\[ P(w_i \mid w_{i-n+1} \ldots w_{i-1}) = \frac{C(w_{i-n+1} \ldots w_i)}{C(w_{i-n+1} \ldots w_{i-1})} \]
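A minimal sketch of these count-based estimates in Python (the toy corpus, the `<s>`/`</s>` token names, and the helper function names are illustrative assumptions, not code from the original notes):

```python
from collections import Counter

def train_bigram(sentences):
    """Estimate bigram probabilities P(w_i | w_{i-1}) from raw counts."""
    unigram_counts = Counter()
    bigram_counts = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigram_counts.update(tokens[:-1])             # counts of each context w_{i-1}
        bigram_counts.update(zip(tokens, tokens[1:]))  # counts of each pair (w_{i-1}, w_i)
    return unigram_counts, bigram_counts

def bigram_prob(prev, word, unigram_counts, bigram_counts):
    """P(word | prev) = C(prev word) / C(prev); 0.0 for unseen contexts."""
    if unigram_counts[prev] == 0:
        return 0.0
    return bigram_counts[(prev, word)] / unigram_counts[prev]

def sentence_prob(sentence, unigram_counts, bigram_counts):
    """Score a whole sentence with the bigram model, including <s> and </s>."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    p = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        p *= bigram_prob(prev, word, unigram_counts, bigram_counts)
    return p

if __name__ == "__main__":
    corpus = ["i like nlp", "i like deep learning", "i love nlp"]
    uni, bi = train_bigram(corpus)
    print(bigram_prob("i", "like", uni, bi))      # C(i like) / C(i) = 2/3
    print(sentence_prob("i like nlp", uni, bi))   # 1 * 2/3 * 1/2 * 1 = 1/3
```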
Usage of N-gram:
- Google search suggestions
- input-method suggestions (see the sketch below)
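As a minimal sketch of how such suggestions could work (the corpus, the `suggest_next` helper, and its interface are hypothetical, not from the original notes), a bigram model can rank candidate next words for the word the user just typed:

```python
from collections import Counter

def suggest_next(prefix_word, corpus, k=3):
    """Rank candidate next words by the bigram estimate C(prev word) / C(prev)."""
    contexts = Counter()
    bigrams = Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split()
        contexts.update(tokens[:-1])                 # words seen as a context w_{i-1}
        bigrams.update(zip(tokens, tokens[1:]))      # adjacent pairs (w_{i-1}, w_i)
    candidates = {word: count / contexts[prefix_word]
                  for (prev, word), count in bigrams.items() if prev == prefix_word}
    return sorted(candidates.items(), key=lambda item: item[1], reverse=True)[:k]

corpus = ["how to cook rice", "how to learn nlp", "how are you"]
print(suggest_next("to", corpus))   # [('cook', 0.5), ('learn', 0.5)]
```

Real suggestion systems rank over far larger corpora and must handle unseen pairs, but the ranking idea is the same.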