Perplexity is a metric used to evaluate the quality/performance of a language model. It measures how well a model predicts a sequence of words.
Perplexity tells us, on average, how "perplexed" or uncertain the model is when predicting each token in the sequence. It can be interpreted as the "average branching factor": the average number of equally likely options the model is effectively choosing between at each step. Formally:
$$ PP(W) = {P(w_1,w_2,\ldots,w_N)}^{-\frac{1}{N}} = \left( \prod_{t=1}^{N}{P(w_t \mid w_1, \ldots, w_{t-1})} \right)^{-\frac{1}{N}} $$
$$ \begin{align*}PP(W) = {P(w_1,w_2,...w_N)}^{-\frac{1}{N}} &= \text{P(Getting The Correct Sequence)}^{-\frac{1}{N}} \\ &= \left(\frac{1}{\text{P(Getting The Correct Sequence)}}\right)^{\frac{1}{N}} \end{align*} $$
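To make this concrete, here is a minimal Python sketch of the computation. It assumes a hypothetical list `token_probs` holding the probabilities the model assigned to each correct token, and it sums log-probabilities to avoid numerical underflow on long sequences:

```python
import math

def perplexity(token_probs):
    """Perplexity of one sequence, given the model's conditional
    probabilities P(w_t | w_1, ..., w_{t-1}) for each correct token."""
    n = len(token_probs)
    # log P(w_1, ..., w_N) via the chain rule: sum of per-token log-probs
    log_prob = sum(math.log(p) for p in token_probs)
    # PP(W) = P(W)^(-1/N) = exp(-log P(W) / N)
    return math.exp(-log_prob / n)

# Hypothetical per-token probabilities for a 4-token sequence
print(perplexity([0.2, 0.5, 0.1, 0.4]))  # ≈ 3.98
```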
<aside> 💡
Why take the reciprocal and the geometric mean?
Connection to Equally Likely Options:
Suppose the model assigns a probability $P=0.2$ to the correct token.
The reciprocal, $\frac{1}{P}=5$, can be interpreted as:
"The model's uncertainty is equivalent to having 5 equally likely options for the next token."
This makes sense because if each option were equally likely, the probability of any one token would be $1/5=0.2$.
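Running the `perplexity` sketch from above on this single-token case reproduces the interpretation directly:

```python
# One token with probability 0.2: perplexity = 1/0.2 = 5
print(perplexity([0.2]))  # ≈ 5.0 — as uncertain as 5 equally likely options
```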
Scaling to Sequences:
<aside> 🔖
Geometric Mean (Power of $\frac{1}{N}$)
$N= 2 :$

$$ \text{Geometric Mean} = (2\times18)^{\frac{1}{2}} = 6 $$
$N=3:$

$$ \text{Geometric Mean} = (10\times51.2 \times 8)^{\frac{1}{3}} = 16 $$
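The same arithmetic in code (a small sketch using only Python's standard library; the inputs are the numbers from the two examples above):

```python
import math

def geometric_mean(values):
    # n-th root of the product of n values
    return math.prod(values) ** (1 / len(values))

print(geometric_mean([2, 18]))        # 6.0
print(geometric_mean([10, 51.2, 8]))  # ≈ 16.0
```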
<aside> 💡
So, the geometric mean in perplexity is basically a way to find the average (length-normalized) probability at each time-step of the sequence the model is predicting.
</aside>
</aside>