$\color{black}\rule{365px}{3px}$
“BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding” - 2019
Link to Paper: [arxiv.org/abs/1810.04805](https://arxiv.org/abs/1810.04805)
Table of Contents
1. Introduction
$\color{black}\rule{365px}{3px}$
Contributions
- BERT uses bidirectional self-attention for deeper contextual understanding. (Unlike left-to-right language models, it applies no causal mask that hides future tokens, so every token attends to both its left and right context.)
- Masked Language Modeling (MLM) and Next Sentence Prediction (NSP) form the core pre-training tasks (see the masking sketch after this list).
- The pre-trained BERT can be fine-tuned for various downstream tasks, such as text classification (e.g., sentiment analysis), question answering, and named entity recognition.
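
To make the MLM objective concrete, here is a minimal sketch of the masking scheme described in the paper: 15% of input tokens are selected for prediction, and of those, 80% are replaced with `[MASK]`, 10% with a random token, and 10% are left unchanged. The function name `mask_tokens`, the toy vocabulary, and the use of Python's `random` module are illustrative choices, not the paper's implementation.

```python
import random

def mask_tokens(tokens, vocab, select_prob=0.15):
    """Corrupt a token sequence for MLM pre-training.

    Returns (corrupted_tokens, targets), where targets[i] is the original
    token the model must predict, or None if position i was not selected.
    """
    corrupted, targets = [], []
    for tok in tokens:
        if random.random() < select_prob:          # 15% of tokens are selected
            targets.append(tok)                     # model predicts the original token here
            r = random.random()
            if r < 0.8:
                corrupted.append("[MASK]")          # 80%: replace with [MASK]
            elif r < 0.9:
                corrupted.append(random.choice(vocab))  # 10%: replace with a random token
            else:
                corrupted.append(tok)               # 10%: keep the original token
        else:
            corrupted.append(tok)
            targets.append(None)                    # not a prediction target
    return corrupted, targets

# Toy example (illustrative vocabulary only).
vocab = ["my", "dog", "is", "hairy", "cute", "the"]
print(mask_tokens(["my", "dog", "is", "hairy"], vocab))
```

Because the model only ever sees `[MASK]` during pre-training but never during fine-tuning, the 10%-random and 10%-unchanged cases help reduce that mismatch, which is the rationale the paper gives for the 80/10/10 split.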
2. Understanding BERT
$\color{black}\rule{365px}{3px}$