$\color{black}\rule{365px}{3px}$
The BLEU score (Bilingual Evaluation Understudy) is a widely used metric to evaluate the quality of machine translation or other text generation tasks.
The BLEU score measures how similar a machine-generated translation (or text) is to a reference (human-generated) translation. It essentially asks how closely the machine output matches what a human wrote.
A higher BLEU score means the machine translation is closer to the human reference.
At its core, BLEU computes the precision of n-grams (sequences of $n$ consecutive words) between the generated text and the reference text.
<aside> <img src="/icons/bookmark-outline_lightgray.svg" alt="/icons/bookmark-outline_lightgray.svg" width="40px" />
What is Precision of N-grams (or N-gram Precision):
$$ \text{Precision}_n=\frac{\text{Number of matching n-grams}}{\text{Total n-grams in the candidate}} $$
So, if we tokenize by spaces, each word is a 1-gram, each pair of consecutive words is a 2-gram, and so on.
Example
Step 1: Extract 1-grams (individual words)
["I", "love", "you", "so", "much"]
["I", "hate", "you", "not", "much"]
Step 2: Count matching 1-grams
Matching words: ["I", "you", "much"] (3 matches)
Step 3: Calculate 1-gram precision
$$ \text{1-gram Precision}=\frac{\text{Number of matching 1-grams}}{\text{Total 1-grams in candidate}}=\frac{3}{5}=0.6 $$
Step 1: Extract 2-grams (consecutive word pairs)
["I love", "love you", "you so", "so much"]
["I hate", "hate you", "you not", "not much"]
Step 2: Count matching 2-grams
Matching 2-grams: None (0 matches).
Step 3: Calculate 2-gram precision
$$ \text{2-gram Precision}=\frac{0}{4}=0 $$
Step 1: Extract 3-grams (consecutive triplets)
["I love you", "love you so", "you so much"]
["I hate you", "hate you not", "you not much"]
Step 2: Count matching 3-grams
Matching 3-grams: None (0 matches).
Step 3: Calculate 3-gram precision
$$ \text{3-gram Precision}=\frac{0}{3}=0 $$
| N-gram | Matches | Total Candidate N-grams | Precision |
|---|---|---|---|
| 1-gram | 3 | 5 | 0.6 |
| 2-gram | 0 | 4 | 0.0 |
| 3-gram | 0 | 3 | 0.0 |

</aside>
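
The walkthrough above can be reproduced with a few lines of Python. This is a minimal sketch, not a full BLEU implementation: the names `ngrams`, `ngram_precision`, `sentence_a`, and `sentence_b` are illustrative, tokenization is a plain split on whitespace, and matches are counted with clipping (an n-gram is credited at most as many times as it occurs in the other sentence), which is an assumption on top of the simple matching used in the example. Because both sentences have five tokens, it does not matter here which one is treated as the candidate.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_precision(candidate, reference, n):
    """Matching candidate n-grams divided by total candidate n-grams."""
    cand = Counter(ngrams(candidate.split(), n))
    ref = Counter(ngrams(reference.split(), n))
    total = sum(cand.values())
    if total == 0:
        # Candidate is shorter than n tokens, so it has no n-grams at all.
        return 0.0
    # Counter intersection keeps the minimum count per n-gram (clipping).
    matches = sum((cand & ref).values())
    return matches / total


# The two sentences from the worked example (roles are interchangeable here).
sentence_a = "I love you so much"
sentence_b = "I hate you not much"

for n in (1, 2, 3):
    print(f"{n}-gram precision: {ngram_precision(sentence_a, sentence_b, n)}")
```

Run as written, this prints 0.6, 0.0, and 0.0 for the 1-gram, 2-gram, and 3-gram precisions, matching the table.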
Why n-grams?
Because a good translation isn’t just about getting the right words; it’s also about getting the right phrases in the right order.
Example:
Reference Sentence: "The cat is sitting on the mat."
Candidate 1: "The cat is on the mat."
Candidate 2: "The mat is sitting on the cat."
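
To check how these two candidates fare, one can compute their n-gram precision against the reference. The sketch below is illustrative only: `tokenize` and `precision` are hypothetical helper names, and lowercasing, stripping punctuation, and clipped match counting are assumptions that the text does not specify. It also leaves out the rest of BLEU, so the numbers are raw n-gram precisions rather than BLEU scores.

```python
from collections import Counter
import string


def tokenize(sentence):
    """Lowercase, drop ASCII punctuation, and split on whitespace (assumed scheme)."""
    table = str.maketrans("", "", string.punctuation)
    return sentence.lower().translate(table).split()


def precision(candidate, reference, n):
    """Clipped n-gram precision of candidate against a single reference."""
    def counts(tokens):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand, ref = counts(tokenize(candidate)), counts(tokenize(reference))
    total = sum(cand.values())
    # Intersection of Counters takes the minimum count per n-gram.
    return sum((cand & ref).values()) / total if total else 0.0


reference = "The cat is sitting on the mat."
candidate_1 = "The cat is on the mat."
candidate_2 = "The mat is sitting on the cat."

for name, cand in (("Candidate 1", candidate_1), ("Candidate 2", candidate_2)):
    scores = [round(precision(cand, reference, n), 3) for n in (1, 2, 3)]
    print(name, scores)
```

Under these assumptions both candidates get a 1-gram precision of 1.0, because every word they use appears in the reference. The scores only drop below 1.0 for 2-grams and 3-grams, which is where differences in word order and missing words start to register.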