$\color{black}\rule{365px}{3px}$
“Sequence to Sequence Learning with Neural Networks” - 2014
Link to Paper:
https://arxiv.org/pdf/1409.3215
Table of Contents
1. Introduction
$\color{black}\rule{365px}{3px}$
Motivations
- Flexible Input-Output Mapping: Standard deep networks require inputs and targets of fixed dimensionality, and plain RNNs can map sequences to sequences only when the two are aligned; many real-world problems involve input and output sequences of different, a-priori unknown lengths (e.g., translating a sentence from English to French).
- Contextual Understanding: Tasks like machine translation and text summarization require models to capture and generate contextually coherent sequences.
Contributions
- Encoder-Decoder Architecture (a minimal sketch follows this list):
- Encoder: Processes the input sequence and compresses it into a fixed-dimensional context vector (also known as a thought vector), which captures the input’s semantic and syntactic information.
- Decoder: Generates the output sequence step by step, conditioned on the context vector and its own previous outputs.
- Attention Mechanism (introduced in later work, e.g., Bahdanau et al., 2014, rather than in this paper; see the second sketch below):
- Addresses the bottleneck of compressing all input information into a single context vector by allowing the decoder to focus on relevant parts of the input sequence at each time step.
- Significantly improves performance, especially on long sequences.
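Below is a minimal sketch of the encoder-decoder idea, assuming PyTorch and made-up vocabulary/hidden sizes; the paper itself trains deep (4-layer) LSTMs on WMT'14 English-to-French, which this toy code does not reproduce.

```python
# Minimal LSTM encoder-decoder sketch (assumed hyperparameters, PyTorch assumed).
import torch
import torch.nn as nn


class Seq2Seq(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, emb_dim=256, hidden_dim=512):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb_dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb_dim)
        # Encoder: reads the source sequence; its final hidden/cell states
        # act as the fixed-dimensional "context vector".
        self.encoder = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        # Decoder: generates the target sequence, initialized with the
        # encoder's final states.
        self.decoder = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, tgt_vocab)

    def forward(self, src_ids, tgt_ids):
        # src_ids: (batch, src_len), tgt_ids: (batch, tgt_len) token indices
        _, (h, c) = self.encoder(self.src_emb(src_ids))
        # Teacher forcing: feed the ground-truth previous target tokens.
        dec_out, _ = self.decoder(self.tgt_emb(tgt_ids), (h, c))
        return self.out(dec_out)  # (batch, tgt_len, tgt_vocab) logits


# Toy usage with random token ids (shapes only; vocabulary sizes are made up).
model = Seq2Seq(src_vocab=1000, tgt_vocab=1200)
src = torch.randint(0, 1000, (2, 7))   # two source sentences, length 7
tgt = torch.randint(0, 1200, (2, 5))   # two target sentences, length 5
logits = model(src, tgt)
print(logits.shape)                    # torch.Size([2, 5, 1200])
```

At inference time the decoder would instead be run one token at a time, feeding back its own previous prediction (greedy or beam search) until an end-of-sequence token is produced.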
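And a minimal sketch of the attention idea, again assuming PyTorch: a simple dot-product attention in the spirit of later work (Bahdanau et al., 2014; Luong et al., 2015), not a mechanism from the 2014 seq2seq paper.

```python
# Dot-product attention over encoder states for a single decoder step
# (illustrative only; sizes below are made up).
import torch
import torch.nn.functional as F


def attention(decoder_state, encoder_states):
    # decoder_state:  (batch, hidden)          current decoder hidden state
    # encoder_states: (batch, src_len, hidden) one state per source token
    scores = torch.bmm(encoder_states, decoder_state.unsqueeze(2))  # (batch, src_len, 1)
    weights = F.softmax(scores.squeeze(2), dim=1)                   # (batch, src_len)
    # Context = weighted average of encoder states: the decoder attends to
    # different source positions at each output step instead of relying on
    # a single fixed-length summary.
    context = torch.bmm(weights.unsqueeze(1), encoder_states).squeeze(1)  # (batch, hidden)
    return context, weights


# Toy usage with random tensors (hidden size 512, 7 source tokens).
enc = torch.randn(2, 7, 512)
dec = torch.randn(2, 512)
context, weights = attention(dec, enc)
print(context.shape, weights.shape)  # torch.Size([2, 512]) torch.Size([2, 7])
```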