$\color{black}\rule{365px}{3px}$
A metric-learning loss function used in machine learning, particularly in tasks like face verification, image retrieval, and Siamese neural networks. Its primary purpose is to learn a feature space where similar samples are pulled closer together and dissimilar samples are pushed further apart. The loss is designed to enforce a contrast between "positive pairs" (similar samples) and "negative pairs" (dissimilar samples).
Formulation
For a pair of input samples $(x_1, x_2)$, the label $y$ indicates whether the pair is similar $(y=1)$ or dissimilar $(y=0)$. Let $D$ denote the distance between their embeddings in the feature space (usually the Euclidean distance; if cosine similarity is used, it must first be converted into a distance, e.g. $1-\cos$), and let $m > 0$ be a margin hyperparameter.
$$ L = (1-y) \cdot \max(0,\, m - D)^2 + y \cdot D^2 $$
Where:

<aside> <img src="/icons/subtitles_lightgray.svg" alt="/icons/subtitles_lightgray.svg" width="40px" />
IF Positive Pairs $(y=1)$
The term $y \cdot D^2$ minimizes the distance $D$ between embeddings of similar pairs, encouraging them to be closer in the feature space.
</aside>
<aside> <img src="/icons/subtitles_lightgray.svg" alt="/icons/subtitles_lightgray.svg" width="40px" />
IF Negative Pairs $(y=0)$
The term $(1-y) \cdot \max(0,\, m - D)^2$ penalizes embeddings of dissimilar pairs only when their distance $D$ is less than the margin $m$. This encourages dissimilar pairs to be at least $m$ units apart. For example, with $m=1$, a dissimilar pair at $D=0.4$ contributes $(1-0.4)^2 = 0.36$ to the loss, while one at $D \geq 1$ contributes nothing.
</aside>
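The formula translates directly into a few lines of code. Below is a minimal PyTorch sketch following the $y=1$ = similar labeling convention used above; the function name `contrastive_loss`, the default margin of 1.0, and averaging over the batch are illustrative choices, not fixed conventions:

```python
import torch
import torch.nn.functional as F

def contrastive_loss(z1: torch.Tensor, z2: torch.Tensor,
                     y: torch.Tensor, margin: float = 1.0) -> torch.Tensor:
    """Contrastive loss over a batch of embedding pairs.

    z1, z2 : (batch, dim) embeddings of the two samples in each pair
    y      : (batch,) float labels, 1 = similar pair, 0 = dissimilar pair
    margin : the margin m enforced between dissimilar pairs
    """
    # D: Euclidean distance between the paired embeddings
    D = F.pairwise_distance(z1, z2, p=2)
    # y * D^2 pulls similar pairs together
    positive = y * D.pow(2)
    # (1 - y) * max(0, m - D)^2 pushes dissimilar pairs at least m apart
    negative = (1 - y) * F.relu(margin - D).pow(2)
    return (positive + negative).mean()

# Example: 8 random pairs of 128-dim embeddings with random labels
z1, z2 = torch.randn(8, 128), torch.randn(8, 128)
y = torch.randint(0, 2, (8,)).float()
print(contrastive_loss(z1, z2, y))
```

Note that `F.relu(margin - D)` implements $\max(0,\, m - D)$, so dissimilar pairs already beyond the margin contribute zero gradient and are effectively ignored.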