1. Introduction

$\color{black}\rule{365px}{3px}$

Significance of the Paper

Fast R-CNN trains the very deep VGG16 network 9× faster than R-CNN, is 213× faster at test-time, and achieves a higher mAP on PASCAL VOC 2012.
Compared to SPPnet, Fast R-CNN trains VGG16 3× faster, tests 10× faster, and is more accurate.

Motivation From R-CNN’s Drawbacks

Training is a multi-stage pipeline.
- R-CNN first fine-tunes a ConvNet on object proposals using log loss (cross-entropy loss).
- Then, it fits SVMs to ConvNet features. These SVMs act as object detectors, replacing the softmax classifier learnt by fine-tuning.
- In the third training stage, bounding-box regressors are learned.
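
To make the log loss from the first stage concrete, here is a minimal sketch in plain Python. It is not R-CNN's implementation. The function names `softmax` and `log_loss`, the three-class setup, and the score values are all assumptions chosen for illustration. The loss for one proposal is the negative log of the probability the softmax assigns to its true class.

```python
import math

def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def log_loss(scores, true_class):
    """Negative log-probability that the softmax assigns to the true class."""
    probs = softmax(scores)
    return -math.log(probs[true_class])

# Hypothetical scores for one object proposal over three classes
# (index 0 = background, 1 = cat, 2 = dog); values are made up.
scores = [0.5, 2.0, -1.0]

loss_if_cat = log_loss(scores, true_class=1)  # highest-scoring class is the label
loss_if_dog = log_loss(scores, true_class=2)  # lowest-scoring class is the label
```

The loss is small when the network already scores the true class highest and grows as the true class receives less probability, which is the signal fine-tuning minimizes. In R-CNN, the SVMs and bounding-box regressors of the later stages are then trained separately on features from the fine-tuned network.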