$\color{black}\rule{365px}{3px}$

Selective Search is an algorithm used in object detection to generate region proposals, which are likely to contain objects.

<aside> <img src="/icons/bookmark_green.svg" alt="/icons/bookmark_green.svg" width="40px" /> Idea:


Use bottom-up grouping of image regions to generate a hierarchy of small to large regions.

</aside>

Algorithm


image.png

Step 1:

Generate initial sub-segmentation

Step 2:

Recursively combine similar regions into larger ones.

<aside> <img src="/icons/bookmark_green.svg" alt="/icons/bookmark_green.svg" width="40px" /> Similarity Metrics


Color Similarity


Measures how similar the colors of two regions are.

$$ s_{color}=\text{Color similarity} = \sum_{i=1}^{n} \min(h_i^A, h_i^B) $$

where $h_i^A$ and $h_i^B$ are the counts in the i$^{th}$ bin of the color histograms for regions $A$ and $B$, respectively. $n$ is the number of bins in the histogram.

Texture Similarity


Measures how similar the textures of two regions are.

$$ s_{texture}=\text{Texture similarity} = \sum_{i=1}^{m} \min(t_i^A, t_i^B) $$

where $t_i^A$ and $t_i^B$ are the counts in the $i^{th}$ bin of the texture histograms for regions $A$ and $B$, respectively. $m$ is the number of bins in the texture histogram.

Size Similarity


Measures how similar the sizes of two regions are, with a preference for merging smaller regions.

$$ s_{size}=\text{Size similarity} = 1 - \frac{\text{size}(A) + \text{size}(B)}{\text{size(image)}} $$

where $\text{size}(A)$ and $\text{size}(B)$ are the sizes of regions $A$ and $B$, respectively. $\text{size(image)}$ is the total number of pixels in the image.

Shape Compatibility (Fills Gap)


Measures how well the two regions fit together, considering how they would look if merged.

$$ s_{fill}=\text{Shape compatibility} = 1 - \frac{\text{size(bounding box of } A \cup B \text{)}}{\text{size(image)}} $$

where $A \cup B$ represents the bounding box that would enclose both regions $A$ and $B$.

Total Similarity


Finally, measure the similarity between two patches(segmentations/regions) as a linear combination of the four given metrics:

$$ \begin{align*} s(A,B) &= a_1\cdot s_{color}(A,B)+a_2\cdot s_{texture}(A,B)\\&+a_3\cdot s_{size}(A,B)+a_4\cdot s_{fill}(A,B)\end{align*} $$

where $A \text{ and } B$ are regions to to combine and $a_1, a_2, a_3, a_4$ are weights to be determined to create a diverse collection of region-merging strategies by considering different weighted combinations in different color spaces.

</aside>

By considering the sizes also, the algorithm will indeed combine the segmentations gradually and hierarchically from smaller to larger in a bottom-up fashion.