Machine Translation Evaluation Metric: BLEU

BLEU (BiLingual Evaluation Understudy) is an evaluation metric for machine translation. Depending on the maximum n-gram order used, the score is reported as \(\text{BLEU}_{1}\), \(\text{BLEU}_{2}\), \(\text{BLEU}_{3}\), or \(\text{BLEU}_{4}\).

1.1 The BLEU formula

  • Step 1: compute the precision

\[p_{n} = \frac{\sum_{C \in \{\text{Candidates}\}} \sum_{\text{n-gram} \in C} \text{Count}_{\text{clip}}(\text{n-gram})}{\sum_{C^\prime \in \{\text{Candidates}\}} \sum_{\text{n-gram}^\prime \in C^\prime} \text{Count}(\text{n-gram}^\prime)} \]

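Here \(\text{Count}_{\text{clip}}(\text{n-gram})\) is the number of times the \(\text{n-gram}\) occurs in the candidate translation, clipped to the maximum number of times it occurs in any single reference translation.

Numerator: the clipped number of n-grams in the model's translation that also appear in the reference translation
Denominator: the total number of n-grams in the model's translation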
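The clipping matters most when a candidate repeats a common word. The sketch below is a minimal illustration, not a reference implementation; the function name `clipped_precision` and the example sentences are assumptions chosen for demonstration. It takes a list of references and clips each candidate n-gram count to the largest count found in any single reference.

```python
from collections import Counter


def clipped_precision(candidate, references, n):
    """Modified n-gram precision of one candidate against one or more references."""
    def ngrams(tokens):
        return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

    cand_counts = Counter(ngrams(candidate))
    if not cand_counts:
        # The candidate is shorter than n, so it has no n-grams to score
        return 0.0

    # For each n-gram, the largest count seen in any single reference
    max_ref_counts = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref)).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)

    # Credit each candidate n-gram at most max_ref_counts[gram] times
    clipped = sum(min(count, max_ref_counts[gram])
                  for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


candidate = "the the the the the the the".split()
references = ["the cat is on the mat".split()]

p1 = clipped_precision(candidate, references, 1)
p2 = clipped_precision(candidate, references, 2)
print(p1, p2)  # "the" is credited only twice out of 7 unigrams; no bigram matches
```

Without clipping, every "the" in the candidate would count as a match and the unigram precision would be 7/7; with clipping it drops to 2/7.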

  • Step 2: compute the BP (Brevity Penalty)

\[\text{BP} = \begin{cases} 1 & \text{if } c > r \\ e^{(1 - r/c)} & \text{if } c \leq r \end{cases} \]

Here \(c\) is the length of the candidate translation and \(r\) is the effective reference length.
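As a quick illustration of how the penalty behaves, here is a minimal sketch of the piecewise definition above. The helper name `brevity_penalty` and the choice to return 0 for an empty candidate are assumptions made for this example.

```python
import math


def brevity_penalty(c, r):
    """BP = 1 if c > r, else exp(1 - r / c); c and r are token counts."""
    if c == 0:
        # Assumption: an empty candidate receives no credit (avoids dividing by zero)
        return 0.0
    return 1.0 if c > r else math.exp(1 - r / c)


bp_long = brevity_penalty(5, 4)   # candidate longer than reference: no penalty
bp_equal = brevity_penalty(4, 4)  # equal lengths: exp(0), still no penalty
bp_short = brevity_penalty(3, 4)  # candidate shorter than reference: exp(-1/3)
print(bp_long, bp_equal, round(bp_short, 4))
```

The penalty only ever lowers the score of candidates that are shorter than the reference; overly long candidates are already penalized through the precision terms.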

  • Step 3: compute BLEU

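\[\text{BLEU}_N=\text{BP}\cdot\exp\left(\sum_{n=1}^Nw_n\log p_n\right) \]

Here \(w_n\) are positive weights that sum to 1; the usual choice is uniform weights, \(w_n = 1/N\), which makes the exponential term the geometric mean of \(p_1, \dots, p_N\).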
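To make the formula concrete, the following sketch works through \(\text{BLEU}_4\) for the candidate "this is a test too" against the single reference "this is a test". The precisions are counted by hand, and uniform weights \(w_n = 1/N\) are assumed.

```python
import math

# Hand-counted modified precisions for candidate "this is a test too"
# against reference "this is a test":
#   unigrams 4/5, bigrams 3/4, trigrams 2/3, 4-grams 1/2
precisions = [4 / 5, 3 / 4, 2 / 3, 1 / 2]

N = len(precisions)
weights = [1 / N] * N  # assumption: uniform weights, as in the usual BLEU-4 setup

bp = 1.0  # c = 5 > r = 4, so there is no brevity penalty

bleu = bp * math.exp(sum(w * math.log(p) for w, p in zip(weights, precisions)))
print(round(bleu, 4))  # 0.6687
```

The product of the four precisions is 0.2, so the score is the fourth root of 0.2.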

1.2 BLEU implementation

import math
from collections import Counter


def cal_precision(reference, candidate, n):
    candidate_ngrams = [tuple(candidate[i:i + n])
                        for i in range(len(candidate) - n + 1)]
    reference_ngrams = [tuple(reference[i:i + n])
                        for i in range(len(reference) - n + 1)]

    candidate_ngram_counts = Counter(candidate_ngrams)
    reference_ngram_counts = Counter(reference_ngrams)

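    # Clipped overlap: each candidate n-gram is credited at most as many
    # times as it occurs in the reference
    overlap_ngrams = sum(
        min(candidate_ngram_counts[ngram], reference_ngram_counts[ngram])
        for ngram in candidate_ngram_counts
    )

    # A candidate shorter than n has no n-grams; avoid dividing by zero
    if not candidate_ngrams:
        return 0.0

    return overlap_ngrams / len(candidate_ngrams)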


def cal_bleu(reference, candidate, max_n=4):
    if len(candidate) == 0:
        return 0.0

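    # Brevity penalty: 1 if c > r, otherwise exp(1 - r / c)
    c, r = len(candidate), len(reference)
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)

    precisions = [cal_precision(reference, candidate, n) for n in range(1, max_n + 1)]
    # log(0) is undefined: without smoothing, any zero precision makes BLEU 0
    if min(precisions) == 0:
        return 0.0

    # Uniform weights w_n = 1 / max_n, i.e. the geometric mean of the precisions
    term = math.exp(sum(math.log(p) / max_n for p in precisions))

    return brevity_penalty * term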


if __name__ == "__main__":
    # Define the reference translation and the predicted translation
    reference = ['this', 'is', 'a', 'test']
    candidate = ['this', 'is', 'a', 'test', 'too']

    # Compute the BLEU score
    bleu_score = cal_bleu(reference, candidate, max_n=4)

    print(f'BLEU Score: {bleu_score}')
posted @ 2024-05-29 23:56  RenjieW