Good background reading on the statistics behind this:
https://ds100.org/course-notes-su23/probability_2/probability_2.html


<aside> <img src="/icons/bookmark_lightgray.svg" alt="/icons/bookmark_lightgray.svg" width="40px" />
Observation Variance ($\epsilon$)
Observation variance refers to the inherent variability in the data itself. Even if you had a perfect model, the observations (data points) you collect may vary due to noise, measurement errors, or randomness in the process you are modeling. High observation variance means that even repeated measurements under the same conditions can yield different results.
Observation variance is the variance $\sigma^2$ in the data itself. It is present in the observed data distribution:
$$ \text{Data} = g(x) + \epsilon $$
where:
- $g(x)$ is the true underlying function,
- $\epsilon$ is zero-mean random noise with variance $\sigma^2$.

The variance $\sigma^2$ is the observation variance: the inherent noise in the data that no model can eliminate ("irreducible error").
</aside>
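A quick simulation makes this concrete. Below is a minimal sketch (the true function `g`, the noise level `sigma`, and the query point are all illustrative assumptions): repeated measurements at the *same* input still scatter, and their variance matches $\sigma^2$, not anything a model could fix.

```python
import numpy as np

rng = np.random.default_rng(0)

def g(x):
    # hypothetical "true" function, chosen only for illustration
    return np.sin(x)

sigma = 0.5   # assumed standard deviation of the observation noise
x = 1.0       # one fixed input, measured many times

# Data = g(x) + eps: same x, fresh noise each measurement
observations = g(x) + rng.normal(0, sigma, size=10_000)

# Sample variance approaches sigma**2 = 0.25 -- the irreducible variance
print(np.var(observations))
```

Even a perfect model that outputs exactly $g(x)$ would still incur this $\sigma^2$ of squared error on average.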
<aside> <img src="/icons/bookmark_lightgray.svg" alt="/icons/bookmark_lightgray.svg" width="40px" />
Model Variance $(\text{Var}(\hat{Y}(x)))$
Variance measures how much the model’s predictions change if we use different training datasets. A model with high variance tends to overfit the data, meaning it is too sensitive to fluctuations in the training data and performs poorly on unseen data. High variance models are too complex and capture noise along with the signal.
Variance is the variability of the model’s prediction due to different training sets:
$$ \text{Variance} = \text{Var}(\hat{Y}(x))=\mathbb{E}\left[ \left( \hat{Y}(x) - \mathbb{E}[\hat{Y}(x)] \right)^2 \right] $$
where:
- $\hat{Y}(x)$ is the prediction at $x$ of a model fit on a randomly drawn training set,
- the expectation $\mathbb{E}[\cdot]$ is taken over those training sets.

High variance means the model's predictions fluctuate significantly from one training set to another, a hallmark of overfitting.
</aside>
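Model variance can be estimated directly by repeating the whole pipeline: draw many training sets from the same process, refit the model on each, and look at the spread of $\hat{Y}(x)$ at one fixed query point. The sketch below assumes a toy setup (sine truth, polynomial fits, specific sizes and degrees) and shows that a more flexible model yields a larger $\text{Var}(\hat{Y}(x))$.

```python
import numpy as np

rng = np.random.default_rng(0)
g = np.sin                      # hypothetical "true" function
sigma, n_train, n_sets = 0.5, 30, 2000
x0 = 1.0                        # fixed query point

def predictions_at_x0(degree):
    """Fit one polynomial per freshly drawn training set; collect Y_hat(x0)."""
    preds = np.empty(n_sets)
    for i in range(n_sets):
        x = rng.uniform(0, 3, n_train)
        y = g(x) + rng.normal(0, sigma, n_train)   # Data = g(x) + eps
        preds[i] = np.polyval(np.polyfit(x, y, degree), x0)
    return preds

var_simple = predictions_at_x0(1).var()    # rigid model: predictions barely move
var_complex = predictions_at_x0(5).var()   # flexible model: predictions swing more

print(var_simple, var_complex)
```

The empirical variance of `preds` is exactly the Monte Carlo estimate of $\mathbb{E}[(\hat{Y}(x_0) - \mathbb{E}[\hat{Y}(x_0)])^2]$, and raising the polynomial degree inflates it: the complex model chases the noise in each particular training set.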