$\color{black}\rule{365px}{3px}$
Link to Paper:
“Very Deep Convolutional Networks for Large-Scale Image Recognition” - 2015
$\color{black}\rule{365px}{3px}$
VGGNet is a ConvNet architecture proposed by Karen Simonyan and Andrew Zisserman from the Visual Geometry Group at the University of Oxford in 2014.
Contributions
2nd place in the classification task of the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) in 2014.

However, it won 1st place in the localization task.

For a prediction to count as correct under the top-5 localization criterion, both of the following must hold (a minimal check is sketched under the example scenario below):
Class Accuracy:
The predicted class for the bounding box must match one of the true classes present in the image.
Bounding Box Accuracy:
The bounding box must accurately localize an object of the predicted class, i.e., it must overlap the ground-truth bounding box by more than a set threshold, as measured by the Intersection over Union (IoU) metric.
Example Scenario
Suppose an image contains a single dog. A submission is counted as correct if any of its five (class, bounding box) guesses predicts “dog” with a box that overlaps the ground-truth dog box by more than the IoU threshold; if none of the five guesses does, it counts as a localization error.
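A minimal sketch of this check, assuming boxes given as (x1, y1, x2, y2) corners and the commonly used 0.5 IoU threshold; this is only an illustration, not the official ILSVRC evaluation code:

```python
# Illustrative sketch of the top-5 localization check (not the official
# ILSVRC evaluation code). Boxes are assumed to be (x1, y1, x2, y2) tuples;
# the 0.5 IoU threshold is the commonly used value.

def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def correct_top5_localization(predictions, ground_truths, iou_threshold=0.5):
    """predictions: up to 5 (class_label, box) guesses.
    ground_truths: list of (class_label, box) objects present in the image.
    Correct if any guess matches a ground-truth class AND its box
    overlaps that object's box with IoU at or above the threshold."""
    for pred_label, pred_box in predictions[:5]:
        for gt_label, gt_box in ground_truths:
            if pred_label == gt_label and iou(pred_box, gt_box) >= iou_threshold:
                return True
    return False

# Example: one dog in the image; the 2nd guess is "dog" with a well-placed box.
gts = [("dog", (50, 50, 200, 200))]
preds = [("cat", (40, 40, 190, 190)), ("dog", (55, 60, 205, 210))]
print(correct_top5_localization(preds, gts))  # True
```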
IoU
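For reference, the IoU between a predicted box $B_p$ and a ground-truth box $B_{gt}$ (symbols used here only for notation) is the area of their intersection divided by the area of their union:

$$
\mathrm{IoU}(B_p, B_{gt}) = \frac{\operatorname{area}(B_p \cap B_{gt})}{\operatorname{area}(B_p \cup B_{gt})}
$$

It equals 1 for a perfect match and 0 for disjoint boxes; localization benchmarks such as ILSVRC typically require an IoU above 0.5 for a box to count as correct.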
Developed a methodology for exploring the impact of network depth on performance: systematically adding more convolutional layers while keeping all other design parameters fixed, which is made feasible by the use of very small (3×3) convolution filters (a rough sketch follows the quote below).
<aside> <img src="/icons/conversation_red.svg" alt="/icons/conversation_red.svg" width="40px" /> “In this paper, we address another important aspect of ConvNet architecture design: its depth. To this end, we fix other parameters of the architecture and steadily increase the depth of the network by adding more convolutional layers. This approach is feasible due to the use of very small (3 × 3) convolution filters in all layers.”
</aside>
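Below is a rough PyTorch sketch of that recipe (my own illustration, not the authors' released code): every stage uses only 3×3 convolutions followed by 2×2 max-pooling, and the network is deepened simply by stacking more 3×3 conv layers per stage, in the spirit of the paper's A→E configurations. The channel widths (64, 128, 256, 512, 512) come from the paper; the function name and other details are assumptions made for this sketch.

```python
# Illustrative sketch (not the authors' code): depth grows by stacking more
# 3x3 convolutions per stage while all other design choices stay fixed.
import torch
import torch.nn as nn

def vgg_style_features(convs_per_stage, widths=(64, 128, 256, 512, 512)):
    """Build a VGG-style feature extractor.
    convs_per_stage: how many 3x3 conv layers to stack in each stage;
    increasing these numbers is how the paper deepens the network."""
    layers, in_ch = [], 3  # RGB input
    for n_convs, out_ch in zip(convs_per_stage, widths):
        for _ in range(n_convs):
            layers += [nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
                       nn.ReLU(inplace=True)]
            in_ch = out_ch
        layers.append(nn.MaxPool2d(kernel_size=2, stride=2))  # halve resolution
    return nn.Sequential(*layers)

# Shallower vs. deeper variants of the same design:
shallow = vgg_style_features((1, 1, 2, 2, 2))  # 8 conv layers (configuration "A")
deeper  = vgg_style_features((2, 2, 3, 3, 3))  # 13 conv layers (configuration "D", VGG-16's conv part)

x = torch.randn(1, 3, 224, 224)  # input size used in the paper
print(deeper(x).shape)           # torch.Size([1, 512, 7, 7])
```

This depth scaling stays cheap because two stacked 3×3 convolutions cover an effective 5×5 receptive field (three cover 7×7) while using fewer parameters and adding extra non-linearities.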
$\color{black}\rule{365px}{3px}$