$\color{black}\rule{365px}{3px}$

Term Frequency - Inverse Document Frequency


TF-IDF stands for Term Frequency-Inverse Document Frequency. It is a statistical measure used to evaluate the importance of a word in a document relative to a collection of documents (corpus).

<aside> <img src="/icons/light-bulb_red.svg" alt="/icons/light-bulb_red.svg" width="40px" /> The main idea behind TF-IDF is to score a word by combining two signals: how often it appears in a given document (term frequency) and how rare it is across all documents in the corpus (inverse document frequency). A word scores highly when it is frequent in one document but uncommon elsewhere.

</aside>

Components of TF-IDF


<aside> <img src="/icons/triangle-one-third_red.svg" alt="/icons/triangle-one-third_red.svg" width="40px" /> Term Frequency (TF):

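$$ TF(t, d) = \frac{\text{Number of times term } t \text{ appears in document } d}{\text{Total number of terms in document } d} $$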

</aside>

<aside> <img src="/icons/triangle-two-thirds_red.svg" alt="/icons/triangle-two-thirds_red.svg" width="40px" /> Inverse Document Frequency (IDF):

$$ IDF(t, D) = \log \left( \frac{\text{Total number of documents}}{\text{Number of documents containing term } t} \right) $$

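Here, $D$ is the corpus, that is, the full collection of documents. The numerator is the number of documents in $D$, and the denominator is the number of those documents in which term $t$ appears at least once.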

</aside>
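
As a quick sanity check on the IDF formula, it can help to look at its two extremes. The symbol $N$ for the number of documents in the corpus is introduced here only for illustration; the result holds for any logarithm base.

$$ IDF(t, D) = \log \left( \frac{N}{N} \right) = 0 \quad \text{(term } t \text{ appears in every document)} $$

$$ IDF(t, D) = \log \left( \frac{N}{1} \right) = \log N \quad \text{(term } t \text{ appears in exactly one document)} $$

A term that occurs everywhere therefore gets zero weight, and the weight grows as the term becomes rarer.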

<aside> <img src="/icons/triangle-alternate_red.svg" alt="/icons/triangle-alternate_red.svg" width="40px" /> TF-IDF Calculation:

$$ \text{TF-IDF}(t, d, D) = TF(t, d) \times IDF(t, D) $$

</aside>
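
The three definitions above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation. It assumes that TF is the term count divided by the document length, that documents are already split into lists of tokens, and that the logarithm is the natural log. The function names and `demo_corpus` are hypothetical.

```python
import math


def term_frequency(term, document_tokens):
    """Fraction of the tokens in one document that equal `term`."""
    if not document_tokens:
        return 0.0
    return document_tokens.count(term) / len(document_tokens)


def inverse_document_frequency(term, corpus_tokens):
    """log(total documents / documents containing `term`)."""
    containing = sum(1 for tokens in corpus_tokens if term in tokens)
    if containing == 0:
        # Assumption: a term absent from the whole corpus scores 0
        # instead of causing a division by zero.
        return 0.0
    return math.log(len(corpus_tokens) / containing)


def tf_idf(term, document_tokens, corpus_tokens):
    """Product of the term frequency and the inverse document frequency."""
    return (term_frequency(term, document_tokens)
            * inverse_document_frequency(term, corpus_tokens))


# Tiny hypothetical corpus: "a" is in both documents, "b" in only one.
demo_corpus = [["a", "b"], ["a", "c"]]
print(tf_idf("a", demo_corpus[0], demo_corpus))  # 0.0, since log(2/2) = 0
print(tf_idf("b", demo_corpus[0], demo_corpus))  # about 0.3466, i.e. 0.5 * log(2)
```

Library implementations often use smoothed variants of IDF, so their scores may differ from this plain formula.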

Example Calculation


Let’s walk through an example using a small corpus.

Example Corpus:

•	Document 1: “the cat sat on the mat”
•	Document 2: “the dog sat on the log”
•	Document 3: “the cat chased the dog”
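
Before doing any arithmetic, it can be useful to tabulate the raw counts the formulas need. The sketch below assumes simple lowercase whitespace tokenization; the variable names are illustrative only.

```python
from collections import Counter

corpus = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the cat chased the dog",
]

# Assumption: lowercase the text and split on whitespace.
tokenized = [doc.lower().split() for doc in corpus]

# Per-document term counts and document lengths (inputs to TF).
term_counts = [Counter(tokens) for tokens in tokenized]
doc_lengths = [len(tokens) for tokens in tokenized]

# Number of documents containing each term (input to IDF).
# set() ensures a term is counted once per document.
document_frequency = Counter()
for tokens in tokenized:
    document_frequency.update(set(tokens))

print(doc_lengths)                # [6, 6, 5]
print(term_counts[0]["the"])      # 2
print(document_frequency["the"])  # 3
print(document_frequency["cat"])  # 2
print(document_frequency["mat"])  # 1
```

Note that "the" appears in all three documents, so its IDF is $\log(3/3) = 0$ regardless of how often it occurs in any one document.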

Step-by-Step Calculation: