$\color{black}\rule{365px}{3px}$
When we have two classes, with the label $y$ being either $0$ or $1$ and $p$ the predicted probability of class $1$, the binary cross-entropy loss for a single sample can be written as:
$$ \text{Loss} = -\bigl(y \cdot \log(p) + (1-y) \cdot \log(1-p)\bigr) $$
Summed over all $N$ samples:
$$ \text{Loss} = -\sum_{i=1}^{N} \bigl(y_i \cdot \log(p_i) + (1-y_i) \cdot \log(1-p_i)\bigr) $$
where $y_i$ is the true label of sample $i$, $p_i$ is the predicted probability that sample $i$ belongs to class $1$, and $N$ is the number of samples.
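The summed loss above is easy to check numerically; here is a minimal NumPy sketch (the function name and the `eps` clipping constant are assumptions of this sketch, not part of the derivation):

```python
import numpy as np

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Summed binary cross-entropy, matching the formula above."""
    y = np.asarray(y_true, dtype=float)
    # Clip predictions away from exactly 0 or 1 so log() stays finite.
    p = np.clip(np.asarray(p_pred, dtype=float), eps, 1 - eps)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

# Two samples: a positive predicted at 0.6 and a negative predicted at 0.4.
print(binary_cross_entropy([1, 0], [0.6, 0.4]))  # -(log 0.6 + log 0.6) ≈ 1.022
```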
<aside> <img src="/icons/bookmark_blue.svg" alt="/icons/bookmark_blue.svg" width="40px" />
Why a Separate Formula from the Multi-Class Case?
Because in binary classification the model outputs only one probability, the one assigned to the positive class (i.e. not both $P(pos) = 0.6$ and $P(neg) = 0.4$, just $P(pos) = 0.6$).
Thus, the likelihood of the negative class has to be derived as $1-p$, and the loss adds up both terms so that every sample contributes its log-likelihood whether its true class is positive or negative.
Why $1-p$?
Because we need the likelihood from the standpoint of the negative class: $p$ is the probability of the “positive” class, so the probability of being “negative” is $1-p$.
(e.g. a probability of $0.6$ of being positive is the same as a probability of $0.4$ of being negative)
</aside>
$\color{black}\rule{365px}{3px}$
For multi-class classification with more than two classes, the formula generalizes to: