$\color{black}\rule{365px}{3px}$
“Language Models are Unsupervised Multitask Learners” - 2019
Link to Paper: cdn.openai.com
Table of Contents
1. Introduction
$\color{black}\rule{365px}{3px}$
Motivation
- From “Narrow Experts” to “Competent Generalists”
- A single model that can generalize to multiple tasks without task-specific tuning.
(GPT-1, by contrast, still required some task-specific fine-tuning.)
- Zero-Shot Capabilities
- Unlike most earlier approaches, which required fine-tuning or specialized architectures for each task, GPT-2 emphasizes zero-shot performance, i.e., applying the same model directly to new tasks without fine-tuning.
Contributions
Language models can perform downstream tasks in a zero-shot setting, without any parameter or architecture modification.
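
As a concrete illustration of this zero-shot setup, the sketch below prompts a pretrained GPT-2 with the paper's "TL;DR:" summarization cue. It is a minimal sketch using the Hugging Face `transformers` library (not the paper's original code); the model checkpoint, sample article, and generation settings are illustrative assumptions, though the k=2 sampling follows the paper's summarization experiments.

```python
# Minimal sketch of zero-shot prompting with GPT-2, assuming the
# Hugging Face `transformers` package (not the paper's original code).
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# The paper induces summarization by appending "TL;DR:" to the article;
# no parameters are updated and no task-specific head is added.
article = "The tower is 324 metres tall, about the same height as an 81-storey building. ..."  # placeholder text
prompt = article + "\nTL;DR:"

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    do_sample=True,
    top_k=2,  # the paper samples with k=2 for TL;DR summarization
    pad_token_id=tokenizer.eos_token_id,
)
# Decode only the newly generated tokens after the prompt.
summary = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:])
print(summary)
```

The point mirrors the contribution above: the summarization behavior is induced entirely by the prompt, with the model's weights and architecture left untouched.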