$\color{black}\rule{365px}{3px}$
“Attention Is All You Need” - 2017
Link to Paper: https://arxiv.org/abs/1706.03762
$\color{black}\rule{365px}{3px}$
First model to rely solely on attention mechanisms for sequence modelling, dispensing with recurrence and convolutions entirely.
Attention Mechanism:
Computes a weighted sum of the values $V$, where each weight is the relevance of a key $K$ to a query $Q$ (scaled by $\sqrt{d_k}$ to keep the dot products from growing too large):
$$ \text{Attention}(Q,K,V)=\text{softmax}(\frac{QK^T}{\sqrt{d_k}})V $$
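The formula above can be sketched directly in NumPy. This is a minimal illustration, not the paper's implementation; the shapes and random toy inputs are assumptions for the example:

```python
import numpy as np

def softmax(x, axis=-1):
    # subtract the row max for numerical stability
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # Q: (n_q, d_k), K: (n_k, d_k), V: (n_k, d_v)
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # (n_q, n_k) scaled relevance scores
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ V                   # (n_q, d_v) weighted sum of values

# toy example: 3 query tokens, 4 key/value tokens
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 8))
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 16))
out = attention(Q, K, V)
print(out.shape)  # (3, 16)
```

Each output row is a convex combination of the rows of $V$, since the softmax weights are non-negative and sum to 1.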
Self-Attention:
The case where $Q$, $K$, and $V$ are all derived from the same input sequence, so every token attends to every other token in that sequence.
Multi-Head Attention:
Runs $h$ attention operations ("heads") in parallel, each with its own learned projections of $Q$, $K$, and $V$; the head outputs are concatenated and projected back to the model dimension, letting the model attend to information from different representation subspaces:
$$ \text{MultiHead}(Q,K,V)=\text{Concat}(\text{head}_1,\dots,\text{head}_h)W^O,\quad \text{head}_i=\text{Attention}(QW_i^Q,KW_i^K,VW_i^V) $$
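A minimal NumPy sketch of multi-head self-attention, using single full-size projection matrices that are split into heads (the shapes, weight names, and toy inputs are assumptions for the example, not the paper's code):

```python
import numpy as np

def softmax(x, axis=-1):
    # subtract the row max for numerical stability
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(X, Wq, Wk, Wv, Wo, h):
    # X: (n, d_model); Wq/Wk/Wv/Wo: (d_model, d_model); h heads of size d_model // h
    n, d_model = X.shape
    d_head = d_model // h
    # project once, then split the feature dimension into h heads
    Q = (X @ Wq).reshape(n, h, d_head).transpose(1, 0, 2)  # (h, n, d_head)
    K = (X @ Wk).reshape(n, h, d_head).transpose(1, 0, 2)
    V = (X @ Wv).reshape(n, h, d_head).transpose(1, 0, 2)
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_head)    # (h, n, n)
    weights = softmax(scores, axis=-1)                     # per-head attention
    heads = weights @ V                                    # (h, n, d_head)
    concat = heads.transpose(1, 0, 2).reshape(n, d_model)  # concatenate heads
    return concat @ Wo                                     # final output projection

rng = np.random.default_rng(1)
n, d_model, h = 5, 16, 4
X = rng.normal(size=(n, d_model))
Wq, Wk, Wv, Wo = (rng.normal(size=(d_model, d_model)) for _ in range(4))
out = multi_head_self_attention(X, Wq, Wk, Wv, Wo, h)
print(out.shape)  # (5, 16)
```

Each head attends over the full sequence but in its own $d_{\text{model}}/h$-dimensional subspace, which is why the total compute is similar to a single full-dimension head.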
$\color{black}\rule{365px}{3px}$
