$\color{black}\rule{365px}{3px}$
“Sequence to Sequence Learning with Neural Networks” - 2014
Link to Paper:
https://arxiv.org/pdf/1409.3215
Table of Contents
1. Introduction
$\color{black}\rule{365px}{3px}$
Motivations
- Flexible Input-Output Mapping: Standard deep networks require inputs and targets of fixed dimensionality, and plain RNNs can map sequences to sequences only when the two are aligned; many real-world problems involve input and output sequences of different, a-priori unknown lengths (e.g., translating a sentence from English to French).
- Contextual Understanding: Tasks like machine translation and text summarization require models to capture and generate contextually coherent sequences.
Contributions
- Encoder-Decoder Architecture (a minimal sketch follows this list):
- Encoder: Processes the input sequence and compresses it into a fixed-dimensional context vector (also known as a thought vector), which captures the input’s semantic and syntactic information.
- Decoder: Generates the output sequence step by step, conditioned on the context vector and its own previous outputs.
- Attention Mechanism (introduced in later work, e.g., Bahdanau et al., 2014, rather than in this paper; see the second sketch below):
- Addresses the bottleneck of compressing all input information into a single context vector by allowing the decoder to focus on relevant parts of the input sequence at each time step.
- Significantly improves performance, especially on long sequences.
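Below is a minimal sketch of the encoder-decoder idea, assuming PyTorch and made-up vocabulary/hidden sizes; the paper itself trains deep (4-layer) LSTMs on WMT'14 English-to-French, which this toy code does not reproduce.

```python
# Minimal LSTM encoder-decoder sketch (assumed hyperparameters, PyTorch assumed).
import torch
import torch.nn as nn


class Seq2Seq(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, emb_dim=256, hidden_dim=512):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb_dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb_dim)
        # Encoder: reads the source sequence; its final hidden/cell states
        # act as the fixed-dimensional "context vector".
        self.encoder = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        # Decoder: generates the target sequence, initialized with the
        # encoder's final states.
        self.decoder = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, tgt_vocab)

    def forward(self, src_ids, tgt_ids):
        # src_ids: (batch, src_len), tgt_ids: (batch, tgt_len) token indices
        _, (h, c) = self.encoder(self.src_emb(src_ids))
        # Teacher forcing: feed the ground-truth previous target tokens.
        dec_out, _ = self.decoder(self.tgt_emb(tgt_ids), (h, c))
        return self.out(dec_out)  # (batch, tgt_len, tgt_vocab) logits


# Toy usage with random token ids (shapes only; vocabulary sizes are made up).
model = Seq2Seq(src_vocab=1000, tgt_vocab=1200)
src = torch.randint(0, 1000, (2, 7))   # two source sentences, length 7
tgt = torch.randint(0, 1200, (2, 5))   # two target sentences, length 5
logits = model(src, tgt)
print(logits.shape)                    # torch.Size([2, 5, 1200])
```

At inference time the decoder would instead be run one token at a time, feeding back its own previous prediction (greedy or beam search) until an end-of-sequence token is produced.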
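And a minimal sketch of the attention idea, again assuming PyTorch: a simple dot-product attention in the spirit of later work (Bahdanau et al., 2014; Luong et al., 2015), not a mechanism from the 2014 seq2seq paper.

```python
# Dot-product attention over encoder states for a single decoder step
# (illustrative only; sizes below are made up).
import torch
import torch.nn.functional as F


def attention(decoder_state, encoder_states):
    # decoder_state:  (batch, hidden)          current decoder hidden state
    # encoder_states: (batch, src_len, hidden) one state per source token
    scores = torch.bmm(encoder_states, decoder_state.unsqueeze(2))  # (batch, src_len, 1)
    weights = F.softmax(scores.squeeze(2), dim=1)                   # (batch, src_len)
    # Context = weighted average of encoder states: the decoder attends to
    # different source positions at each output step instead of relying on
    # a single fixed-length summary.
    context = torch.bmm(weights.unsqueeze(1), encoder_states).squeeze(1)  # (batch, hidden)
    return context, weights


# Toy usage with random tensors (hidden size 512, 7 source tokens).
enc = torch.randn(2, 7, 512)
dec = torch.randn(2, 512)
context, weights = attention(dec, enc)
print(context.shape, weights.shape)  # torch.Size([2, 512]) torch.Size([2, 7])
```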