
The Gini Index and Entropy are both impurity measures used in decision tree algorithms to determine how well a dataset is split, but they have slightly different interpretations and properties.
<aside> 💡
Impurity quantifies how mixed or homogeneous the classes within a dataset are.
But how are they different? Both metrics aim to measure the impurity of a dataset, but they capture it in different ways.
</aside>
$\color{black}\rule{365px}{3px}$
<aside> <img src="/icons/bookmark_lightgray.svg" alt="/icons/bookmark_lightgray.svg" width="40px" />
Technical Definition
$$ \text{Gini}(D)=1- \sum_{i=1}^C{p_i}^2
$$
where:
- $C$ is the number of classes
- $p_i$ is the proportion of samples in $D$ belonging to class $i$
Example
Imagine a dataset of colors $D=$ $\{$Red, Blue$\}$ where $p_{\text{Red}}=0.7$ and $p_{\text{Blue}}=0.3$.
Then,
$$ \text{Gini}(D)=1-(0.7^2+0.3^2)=1-(0.49+0.09)=1-0.58=0.42 $$
</aside>
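The worked example above can be sketched in a few lines of Python. This is a minimal illustrative implementation (the function name `gini` and the toy dataset are ours, not from any particular library):

```python
from collections import Counter

def gini(labels):
    """Gini index of a list of class labels: 1 - sum of p_i^2."""
    n = len(labels)
    counts = Counter(labels)
    return 1 - sum((c / n) ** 2 for c in counts.values())

# Dataset with 70% Red and 30% Blue, as in the example above
data = ["Red"] * 7 + ["Blue"] * 3
print(round(gini(data), 2))  # 0.42
```

A pure node (a single class) gives `gini(...) == 0`, the minimum, while an even split over many classes pushes the value toward 1.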
<aside> <img src="/icons/bookmark_lightgray.svg" alt="/icons/bookmark_lightgray.svg" width="40px" />
Intuitive Explanation
The Gini Index measures the probability that two samples drawn at random from the dataset belong to different classes. The formula computes this by subtracting the probability that two random samples belong to the same class from 1 (the total probability).

Letβs revisit the formula and interpret it.
$$ \text{Gini}(D)=1- \sum_{i=1}^C{p_i}^2
$$
$$ \begin{align*} \text{Gini}(D) &= P(\text{two random samples being different class}) \\&= 1 - P(\text{two random samples being same class})\\ &= 1 - \sum_{i=1}^Cp_i^2 \end{align*} $$
So, in the end, the Gini Index measures how likely it is to misclassify a randomly chosen sample, by examining how uneven the distribution of classes in a node is. Imagine a node containing only one class: there is absolutely NO chance of misclassifying a sample in that node, and the Gini Index is 0. However, if there were many classes with an equally likely distribution, a random guess would very likely be wrong, and the Gini Index approaches its maximum!
</aside>
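The pair-sampling interpretation above can be checked empirically: draw two samples at random (with replacement) many times and count how often they disagree. This is a hypothetical simulation sketch using the same 70/30 Red/Blue dataset; the variable names are ours:

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

data = ["Red"] * 7 + ["Blue"] * 3

# Estimate P(two random samples belong to different classes)
trials = 100_000
diff = sum(random.choice(data) != random.choice(data) for _ in range(trials))
print(diff / trials)  # close to 0.42, matching Gini(D)
```

The empirical frequency converges to $1-(0.7^2+0.3^2)=0.42$, confirming that the Gini Index really is the probability of drawing a mismatched pair.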