$\color{black}\rule{365px}{3px}$

The BLEU score (Bilingual Evaluation Understudy) is a widely used metric to evaluate the quality of machine translation or other text generation tasks.

The BLEU score measures how similar a machine-generated translation (or text) is to a reference (human-generated) translation. It’s essentially asking: how much of the machine’s output also appears in what a human translator wrote?

A higher BLEU score means the machine translation is closer to the human reference.

N-Gram Precision


At its core, BLEU computes the precision of n-grams (sequences of $n$ consecutive words) between the generated text and the reference text.
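To make "sequences of $n$ consecutive words" concrete, here is a minimal sketch of n-gram extraction. The helper name `ngrams` and the sample sentence are illustrative assumptions, and tokenization is a plain whitespace split.

```python
def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple, in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Illustrative sentence; naive whitespace tokenization (an assumption).
tokens = "the cat is on the mat".split()

print(ngrams(tokens, 1))  # six 1-grams, one per word
print(ngrams(tokens, 2))  # five 2-grams (consecutive word pairs)
print(ngrams(tokens, 3))  # four 3-grams (consecutive triplets)
```

A sentence with $k$ tokens has $k - n + 1$ n-grams, which is why the n-gram count in a worked example shrinks by one for each step up in $n$.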

<aside> <img src="/icons/bookmark-outline_lightgray.svg" alt="/icons/bookmark-outline_lightgray.svg" width="40px" />

What is n-gram precision? It is the fraction of the candidate's n-grams that also appear in the reference:


$$ \text{Precision}_n=\frac{\text{Number of matching n-grams}}{\text{Total n-grams in the candidate}} $$

So, if we tokenize by spaces, each word is one token, and an n-gram is a run of $n$ consecutive words.

Example


1. 1-gram Precision (Individual Words)

Step 1: Extract 1-grams (individual words)

Step 2: Count matching 1-grams

Matching words: ["I", "you", "much"] (3 matches)

Step 3: Calculate 1-gram precision

$$ \text{1-gram Precision}=\frac{\text{Number of matching 1-grams}}{\text{Total 1-grams in candidate}}=\frac{3}{5}=0.6 $$



2. 2-gram Precision (Word Pairs)

Step 1: Extract 2-grams (consecutive word pairs)

Step 2: Count matching 2-grams

Matching 2-grams: None (0 matches).

Step 3: Calculate 2-gram precision

$$ \text{2-gram Precision}=\frac{\text{Number of matching 2-grams}}{\text{Total 2-grams in candidate}}=\frac{0}{4}=0 $$


3. 3-gram Precision (Three Consecutive Words)

Step 1: Extract 3-grams (consecutive triplets)

Step 2: Count matching 3-grams

Matching 3-grams: None (0 matches).

Step 3: Calculate 3-gram precision

$$ \text{3-gram Precision}=\frac{\text{Number of matching 3-grams}}{\text{Total 3-grams in candidate}}=\frac{0}{3}=0 $$


Summary Table of Precision:

| N-gram | Matches | Total candidate n-grams | Precision |
| --- | --- | --- | --- |
| 1-gram | 3 | 5 | 0.6 |
| 2-gram | 0 | 4 | 0.0 |
| 3-gram | 0 | 3 | 0.0 |
</aside>
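The three-step calculation above can be sketched in a few lines of Python. This is a minimal illustration, not a reference BLEU implementation. The two sentences are hypothetical, chosen only so that they yield the same counts as the summary table. `ngram_counts` and `ngram_precision` are illustrative helper names. Matches are clipped via `collections.Counter` intersection, so a repeated candidate n-gram is never credited more often than it occurs in the reference. That clipping is an assumption layered on top of the plain formula.

```python
from collections import Counter

def ngram_counts(tokens, n):
    """Count each n-gram (as a tuple of tokens) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def ngram_precision(candidate, reference, n):
    """Return (matches, total candidate n-grams, precision) for one n."""
    cand = ngram_counts(candidate.split(), n)   # whitespace tokenization
    ref = ngram_counts(reference.split(), n)
    matches = sum((cand & ref).values())        # clipped overlap
    total = sum(cand.values())
    return matches, total, (matches / total if total else 0.0)

# Hypothetical sentences, picked to reproduce the counts in the table.
reference = "I love you very much"
candidate = "I adore you so much"

for n in (1, 2, 3):
    matches, total, precision = ngram_precision(candidate, reference, n)
    print(f"{n}-gram: {matches}/{total} = {precision:.1f}")
```

With these sentences the loop reports 3/5, 0/4 and 0/3, matching the table: three individual words overlap, but no pair or triplet of consecutive words does.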

Why n-grams?


Because a good translation isn’t just about getting the right words; it’s also about getting the right phrases in the right order.

Example:

Reference Sentence: "The cat is sitting on the mat."

Candidate 1: "The cat is on the mat."

Candidate 2: "The mat is sitting on the cat."
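To see what higher-order n-grams add for these three sentences, the sketch below computes the precision for $n = 1, 2, 3$. It is an illustration under stated assumptions: `tokenize` and `precision` are hypothetical helper names, text is lowercased and punctuation is dropped before matching, and matches are clipped so a repeated n-gram is not over-counted.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase and keep only word characters (an assumed normalization)."""
    return re.findall(r"\w+", text.lower())

def precision(candidate, reference, n):
    """Clipped n-gram precision of candidate against a single reference."""
    def counts(text):
        toks = tokenize(text)
        # zip over n shifted copies of the token list yields the n-grams
        return Counter(zip(*(toks[i:] for i in range(n))))
    cand, ref = counts(candidate), counts(reference)
    total = sum(cand.values())
    return sum((cand & ref).values()) / total if total else 0.0

reference = "The cat is sitting on the mat."
candidates = {
    "Candidate 1": "The cat is on the mat.",
    "Candidate 2": "The mat is sitting on the cat.",
}

for name, text in candidates.items():
    scores = [round(precision(text, reference, n), 2) for n in (1, 2, 3)]
    print(name, scores)
```

Under these assumptions both candidates reach a 1-gram precision of 1.0, because every word they contain also occurs in the reference. Only the 2-gram and 3-gram precisions register the dropped word in Candidate 1 and the swapped nouns in Candidate 2, which is the reason BLEU looks beyond single words.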