$\color{black}\rule{365px}{3px}$
<aside> 💡 High-Dimensional Data:
Modern datasets often have a large number of features (dimensions). Analyzing high-dimensional data can be computationally expensive and challenging due to the “curse of dimensionality.”
</aside>
<aside> 💡 Simplification:
Reducing the number of dimensions can simplify models, making them easier to interpret and faster to train.
</aside>
$\color{black}\rule{365px}{3px}$
<aside> 💡 Motivation
For one-dimensional data, it is easy to understand the idea of variance both conceptually and visually.
<aside> <img src="/icons/question-mark_red.svg" alt="/icons/question-mark_red.svg" width="40px" /> However, how do we understand the variance in higher dimensions?
</aside>
<aside> <img src="/icons/exclamation-mark-double_red.svg" alt="/icons/exclamation-mark-double_red.svg" width="40px" /> It involves understanding how the spread or variability of data is distributed across multiple features (dimensions).
</aside>
</aside>
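As a concrete illustration: in one dimension the spread is a single number (the variance), while in higher dimensions it becomes a covariance matrix whose diagonal holds the per-feature variances and whose off-diagonal entries capture how pairs of features vary together. Below is a minimal sketch (assuming NumPy; the generated data and mixing matrix are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# 1-D data: the spread is captured by a single number, the variance.
x = rng.normal(loc=0.0, scale=2.0, size=500)
print("1-D variance:", np.var(x, ddof=1))

# 3-D data: the spread is captured by a 3x3 covariance matrix.
# Diagonal entries are the variances of each feature;
# off-diagonal entries describe how pairs of features vary together.
X = rng.normal(size=(500, 3)) @ np.array([[2.0, 0.0, 0.0],
                                          [1.0, 1.0, 0.0],
                                          [0.0, 0.5, 0.3]])
print("3-D covariance matrix:\n", np.cov(X, rowvar=False))
```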
Before delving into the technical parts of covariance, recall the following concepts:
*Note that variable = feature → the number of variables = the dimension of the feature space.
In PCA, we treat the covariance matrix as a linear transformation and take its eigenvectors, ordered by decreasing eigenvalue, as the principal components. But why does the eigenvector with the largest eigenvalue represent the axis along which the projected data have the greatest variance?
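To make this claim concrete before deriving it, here is a minimal sketch (assuming NumPy; the synthetic data are illustrative) that eigendecomposes the sample covariance matrix, projects the centered data onto the top eigenvector, and checks that the variance of the projection equals the largest eigenvalue:

```python
import numpy as np

rng = np.random.default_rng(1)

# Correlated 2-D data (hypothetical example for illustration only).
X = rng.normal(size=(1000, 2)) @ np.array([[3.0, 1.0],
                                           [0.0, 0.5]])
Xc = X - X.mean(axis=0)                     # center the data

# Sample covariance matrix and its eigendecomposition.
C = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(C)        # eigh: C is symmetric
order = np.argsort(eigvals)[::-1]           # sort by decreasing eigenvalue
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Project onto the first principal component (top eigenvector).
proj = Xc @ eigvecs[:, 0]

# The variance of the projection matches the largest eigenvalue.
print("largest eigenvalue   :", eigvals[0])
print("variance of projection:", proj.var(ddof=1))
```

The numbers agree because, for a unit vector $v$, the variance of the projection $Xv$ is $v^\top \Sigma v$, where $\Sigma$ is the covariance matrix; why this quantity is maximized by the top eigenvector is exactly the question developed next.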