$\color{black}\rule{365px}{3px}$
“BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding” - 2019
Link to Paper: [arxiv.org/abs/1810.04805](https://arxiv.org/abs/1810.04805)
Table of Contents
1. Introduction
$\color{black}\rule{365px}{3px}$
Contributions
- BERT uses bidirectional self-attention for deeper contextual understanding. (Unlike left-to-right language models, it applies no causal mask that hides future tokens, so every token attends to both its left and right context.)
- Masked Language Modeling (MLM) and Next Sentence Prediction (NSP) form the core pre-training tasks (see the masking sketch after this list).
- The pre-trained BERT can be fine-tuned for various downstream tasks, such as text classification (e.g., sentiment analysis), question answering, and named entity recognition.
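
To make the MLM objective concrete, here is a minimal sketch of the masking scheme described in the paper: 15% of input tokens are selected for prediction, and of those, 80% are replaced with `[MASK]`, 10% with a random token, and 10% are left unchanged. The function name `mask_tokens`, the toy vocabulary, and the use of Python's `random` module are illustrative choices, not the paper's implementation.

```python
import random

def mask_tokens(tokens, vocab, select_prob=0.15):
    """Corrupt a token sequence for MLM pre-training.

    Returns (corrupted_tokens, targets), where targets[i] is the original
    token the model must predict, or None if position i was not selected.
    """
    corrupted, targets = [], []
    for tok in tokens:
        if random.random() < select_prob:          # 15% of tokens are selected
            targets.append(tok)                     # model predicts the original token here
            r = random.random()
            if r < 0.8:
                corrupted.append("[MASK]")          # 80%: replace with [MASK]
            elif r < 0.9:
                corrupted.append(random.choice(vocab))  # 10%: replace with a random token
            else:
                corrupted.append(tok)               # 10%: keep the original token
        else:
            corrupted.append(tok)
            targets.append(None)                    # not a prediction target
    return corrupted, targets

# Toy example (illustrative vocabulary only).
vocab = ["my", "dog", "is", "hairy", "cute", "the"]
print(mask_tokens(["my", "dog", "is", "hairy"], vocab))
```

Because the model only ever sees `[MASK]` during pre-training but never during fine-tuning, the 10%-random and 10%-unchanged cases help reduce that mismatch, which is the rationale the paper gives for the 80/10/10 split.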
2. Understanding BERT
$\color{black}\rule{365px}{3px}$