Inverse Document Frequency (IDF)

Inverse Document Frequency (IDF) is a calculation often used in conjunction with Term Frequency. The problem with term frequency is that frequent terms aren't always the most important. For example, on the MarketMuse blog, the term "content" is likely to be found on virtually every page. IDF reduces the weight of terms that appear in many of the documents in a corpus (collection of documents), so that such common terms count for less than rarer, more distinctive ones.

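IDF is calculated by dividing the total number of documents in the collection by the number of documents that contain the term, and then taking the logarithm of that ratio. A term found in every document therefore gets an IDF of log(1) = 0.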
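The calculation can be sketched in a few lines of Python. This is a minimal illustration, not MarketMuse's implementation. It assumes a tiny hypothetical corpus, lowercase whitespace tokenization, and the natural logarithm (the usual unsmoothed form, log of total documents over documents containing the term; the choice of base only scales the scores). The helper name `idf` is also hypothetical.

```python
import math


def idf(term, documents):
    """Return log(N / df) for `term` over a list of document strings.

    Hypothetical helper. Assumes lowercase whitespace tokenization and the
    natural log. Many real implementations also add smoothing (e.g. +1)
    so that unseen terms do not cause a division by zero.
    """
    n_docs = len(documents)
    # Document frequency: how many documents contain the term at least once.
    df = sum(1 for doc in documents if term in doc.lower().split())
    if df == 0:
        raise ValueError(f"term {term!r} does not appear in any document")
    return math.log(n_docs / df)


# Hypothetical toy corpus standing in for a blog's pages.
corpus = [
    "content strategy for content marketing",
    "how to plan content with topic models",
    "content briefs and keyword research",
    "measuring content marketing quality",
]

for term in ("content", "marketing", "keyword"):
    print(term, round(idf(term, corpus), 3))
# content 0.0
# marketing 0.693
# keyword 1.386
```

Because "content" appears in all four documents, its IDF is log(4/4) = 0, so it contributes nothing once multiplied by its term frequency. "keyword", found in only one document, receives the highest weight.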