Selective Search Algorithm

$\color{black}\rule{365px}{3px}$

Selective Search is an algorithm used in object detection to generate region proposals, which are likely to contain objects.

<aside> <img src="/icons/bookmark_green.svg" alt="/icons/bookmark_green.svg" width="40px" /> Idea:

Use bottom-up grouping of image regions to generate a hierarchy of small to large regions.

</aside>

Algorithm

Step 1:

Generate initial sub-segmentation

Step 2:

Recursively combine similar regions into larger ones.

From set of regions, choose two that are most similar.
Combine them into a single, larger region.
Repeat.

<aside> <img src="/icons/bookmark_green.svg" alt="/icons/bookmark_green.svg" width="40px" /> Similarity Metrics

Color Similarity

Measures how similar the colors of two regions are.

Each region is described by a color histogram, which counts the number of pixels for each color.
The color similarity between two regions is calculated by comparing their color histograms. Specifically, it looks at the intersection of the histograms, which tells you how much the color distribution of one region overlaps with the other.

$$ s_{color}=\text{Color similarity} = \sum_{i=1}^{n} \min(h_i^A, h_i^B) $$

where $h_i^A$ and $h_i^B$ are the counts in the i$^{th}$ bin of the color histograms for regions $A$ and $B$, respectively. $n$ is the number of bins in the histogram.

Texture Similarity

Measures how similar the textures of two regions are.

Like color, texture can also be represented using histograms, but instead of color, the histogram counts different texture patterns, such as edges in various directions.
The texture similarity is calculated similarly to color similarity, by comparing the texture histograms of the two regions.

$$ s_{texture}=\text{Texture similarity} = \sum_{i=1}^{m} \min(t_i^A, t_i^B) $$

where $t_i^A$ and $t_i^B$ are the counts in the $i^{th}$ bin of the texture histograms for regions $A$ and $B$, respectively. $m$ is the number of bins in the texture histogram.

Size Similarity

Measures how similar the sizes of two regions are, with a preference for merging smaller regions.

This metric favours the merging of small regions because they are more likely to belong to the same object.
The size similarity is defined by the combined size of two regions relative to the entire image size. Smaller regions give a higher similarity value.

$$ s_{size}=\text{Size similarity} = 1 - \frac{\text{size}(A) + \text{size}(B)}{\text{size(image)}} $$

where $\text{size}(A)$ and $\text{size}(B)$ are the sizes of regions $A$ and $B$, respectively. $\text{size(image)}$ is the total number of pixels in the image.

Shape Compatibility (Fills Gap)

Measures how well the two regions fit together, considering how they would look if merged.

Shape compatibility looks at the bounding boxes of the two regions. If two regions fit together well within a bounding box, they are considered more similar.
This metric penalizes regions that, when merged, would leave a lot of empty space (indicating they don’t fit together well).

$$ s_{fill}=\text{Shape compatibility} = 1 - \frac{\text{size(bounding box of } A \cup B \text{)}}{\text{size(image)}} $$

where $A \cup B$ represents the bounding box that would enclose both regions $A$ and $B$.

Total Similarity

Finally, measure the similarity between two patches(segmentations/regions) as a linear combination of the four given metrics:

$$ \begin{align*} s(A,B) &= a_1\cdot s_{color}(A,B)+a_2\cdot s_{texture}(A,B)\\&+a_3\cdot s_{size}(A,B)+a_4\cdot s_{fill}(A,B)\end{align*} $$

where $A \text{ and } B$ are regions to to combine and $a_1, a_2, a_3, a_4$ are weights to be determined to create a diverse collection of region-merging strategies by considering different weighted combinations in different color spaces.

</aside>

By considering the sizes also, the algorithm will indeed combine the segmentations gradually and hierarchically from smaller to larger in a bottom-up fashion.