Perplexity is a metric used to evaluate the quality of a language model. It measures how well the model predicts a sequence of words.

Intuition


Perplexity tells us, on average, how "perplexed" or uncertain the model is when predicting each token in the sequence. It can be interpreted as the model's average branching factor: the number of equally likely tokens the model is effectively choosing between at each step.
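To make the branching-factor intuition concrete, here is a minimal sketch (the helper `perplexity` is illustrative, not from a library): a model that is perfectly uncertain among $k$ tokens at every step has perplexity exactly $k$.

```python
import math

def perplexity(token_probs):
    # Perplexity = geometric mean of the inverse per-token probabilities,
    # computed in log space for numerical stability.
    n = len(token_probs)
    log_sum = sum(math.log(p) for p in token_probs)
    return math.exp(-log_sum / n)

# A model equally uncertain among 4 tokens at every step
# has a "branching factor" (perplexity) of 4.
uniform = [0.25] * 10
print(perplexity(uniform))  # ≈ 4.0
```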

Definition


$$ PP(W) = {P(w_1,w_2,...,w_N)}^{-\frac{1}{N}} = \left(\prod_{t=1}^{N}{P(w_{t}\mid w_1, ..., w_{t-1})}\right)^{-\frac{1}{N}} $$

$$ \begin{align*}PP(W) = {P(w_1,w_2,...w_N)}^{-\frac{1}{N}} &= \text{P(Getting The Correct Sequence)}^{-\frac{1}{N}} \\ &= \left(\frac{1}{\text{P(Getting The Correct Sequence)}}\right)^{\frac{1}{N}} \end{align*} $$
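The two forms above are numerically identical, since $P(W)^{-1/N} = (1/P(W))^{1/N}$. A short sketch with hypothetical per-token conditional probabilities (the numbers are illustrative only):

```python
import math

# Hypothetical conditional probabilities P(w_t | w_1..w_{t-1})
# for a 4-token sequence.
cond_probs = [0.5, 0.2, 0.1, 0.4]

# Joint probability of the whole sequence via the chain rule.
p_sequence = math.prod(cond_probs)

n = len(cond_probs)
pp_direct = p_sequence ** (-1 / n)        # P(W)^(-1/N)
pp_recip = (1 / p_sequence) ** (1 / n)    # (1 / P(W))^(1/N)

# Both definitions agree.
assert abs(pp_direct - pp_recip) < 1e-12
print(pp_direct)
```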

<aside> 💡

Why take reciprocal and geometric mean?


Connection to Equally Likely Options: if the model assigns equal probability $\frac{1}{k}$ to $k$ options at every step, the reciprocal recovers $k$ — the branching-factor intuition.

Scaling to Sequences: raising to the power $\frac{1}{N}$ (a geometric mean over the $N$ tokens) normalizes for sequence length, so perplexities of sequences of different lengths are comparable.

<aside> <img src="/icons/bookmark-outline_lightgray.svg" alt="/icons/bookmark-outline_lightgray.svg" width="40px" />

Geometric Mean (Power of $\frac{1}{N}$)


$N= 2 :$


$$ \text{Geometric Mean} = (2\times18)^{\frac{1}{2}} = 6 $$

$N=3:$


$$ \text{Geometric Mean} = (10\times51.2 \times 8)^{\frac{1}{3}} = 16 $$
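The two worked examples above can be reproduced with a small sketch (the `geometric_mean` helper is illustrative):

```python
import math

def geometric_mean(values):
    # N-th root of the product of N values.
    return math.prod(values) ** (1 / len(values))

print(geometric_mean([2, 18]))        # ≈ 6
print(geometric_mean([10, 51.2, 8]))  # ≈ 16
```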

<aside> <img src="/icons/light-bulb_lightgray.svg" alt="/icons/light-bulb_lightgray.svg" width="40px" />

So, the geometric mean in perplexity computes the average (length-normalized) inverse probability the model assigns at each time step of the sequence.

</aside>

</aside>

Relation to Cross-Entropy Loss