$\color{black}\rule{365px}{3px}$
Table of Contents
$\color{black}\rule{365px}{3px}$
Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) are advanced RNN variants designed to address these limitations and improve sequence modeling.
Motivations
Vanishing Gradient Problem: Standard RNNs struggle to learn long-term dependencies because gradients diminish as they are backpropagated through time (see the sketch after this list).
Long-Term Dependency: As the context length grows, useful information from early time steps is lost by the time the model reaches the end of the sequence.

Efficient Memory Control: Sequence models need mechanisms that can selectively retain or forget information as they process each input.
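To make the first point concrete, below is a minimal NumPy sketch of why gradients vanish: backpropagating through a tanh RNN multiplies the gradient by the recurrent weight matrix and the tanh derivative at every time step, so its norm shrinks geometrically. Everything in the snippet (sizes, weight scale, random pre-activations) is an illustrative assumption, not code from any particular library.

```python
import numpy as np

# Minimal sketch of the vanishing gradient problem in a vanilla RNN.
# All names, sizes, and scales here are illustrative assumptions.
rng = np.random.default_rng(0)
hidden_size, steps = 32, 50

# Small recurrent weights: repeated multiplication by W^T contracts the gradient.
W = rng.normal(0.0, 0.05, (hidden_size, hidden_size))

grad = np.ones(hidden_size)  # gradient of the loss w.r.t. the last hidden state
for t in range(steps):
    pre = rng.normal(0.0, 1.0, hidden_size)        # stand-in pre-activation at step t
    grad = W.T @ (grad * (1.0 - np.tanh(pre)**2))  # backprop through tanh, then W
    if t % 10 == 0:
        print(f"after {t + 1:2d} steps back: ||grad|| = {np.linalg.norm(grad):.3e}")
```

Running this, the gradient norm drops by several orders of magnitude within a few dozen steps, which is precisely the long-range learning signal that the gating mechanisms of LSTM and GRU are designed to preserve.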
Contributions
LSTM: