
Bag of Words (BoW)

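Bag of Words (BoW) is a method often used for document classification. It turns text into fixed-length vectors by counting how many times each word of a vocabulary appears in a document. Each vector has one entry per vocabulary word, so every document maps to a vector of the same length. This process is called vectorization.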
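A minimal sketch of count-based vectorization in plain Python. It assumes, for simplicity, that tokenization is just lowercasing and splitting on whitespace. The function names `build_vocabulary` and `bag_of_words` are illustrative, not from any particular library.

```python
from collections import Counter

def build_vocabulary(documents):
    """Collect every distinct token in the corpus, in sorted order.

    Simplifying assumption: tokens are lowercased, whitespace-separated words.
    """
    return sorted({token for doc in documents for token in doc.lower().split()})

def bag_of_words(document, vocabulary):
    """Return a fixed-length vector of raw word counts, one entry per vocabulary word."""
    counts = Counter(document.lower().split())
    return [counts.get(word, 0) for word in vocabulary]

corpus = [
    "the cat sat on the mat",
    "the dog sat on the log",
]
vocabulary = build_vocabulary(corpus)
vectors = [bag_of_words(doc, vocabulary) for doc in corpus]
print(vocabulary)  # ['cat', 'dog', 'log', 'mat', 'on', 'sat', 'the']
print(vectors)     # [[1, 0, 0, 1, 1, 1, 2], [0, 1, 1, 0, 1, 1, 2]]
```

Both vectors have length 7, the size of the vocabulary, regardless of how long each document is. In practice, scikit-learn provides this step as `CountVectorizer`, which uses a more careful default tokenizer.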

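Alternatively, the raw counts can be replaced with weighted values, most commonly through TF-IDF (term frequency times inverse document frequency) vectorization. TF-IDF scales how often a word appears in a document by how rare that word is across the whole collection of documents.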
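A minimal sketch of one common textbook TF-IDF variant: term frequency is the word's count divided by the document length, and inverse document frequency is `ln(N / df)`, where `N` is the number of documents and `df` is the number of documents containing the word. This is an illustrative choice. Libraries often add smoothing and normalization, so their numbers will differ. The function name `tf_idf` is hypothetical.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {word: weight} dict per document.

    tf(t, d) = count of t in d / number of tokens in d
    idf(t)   = ln(N / df(t)), df(t) = number of documents containing t
    Assumes every document is non-empty and tokenized by whitespace.
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: count each word at most once per document.
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return weights

corpus = [
    "the cat sat on the mat",
    "the dog sat on the log",
]
for weights in tf_idf(corpus):
    print({word: round(w, 3) for word, w in sorted(weights.items())})
```

Words that occur in every document ("the", "sat", "on") get a weight of zero because `ln(2 / 2) = 0`. Words unique to one document ("cat", "mat") get `(1/6) * ln 2`, about 0.116. scikit-learn's `TfidfVectorizer` implements a smoothed, L2-normalized version of the same idea.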

Although these vectorization methods are easy to compute, they lack any contextual information. It literally is a bag of words: word order is discarded, and only the word counts matter.
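The loss of word order is easy to demonstrate. In this small sketch, again assuming simple whitespace tokenization, two sentences with opposite meanings produce identical bags.

```python
from collections import Counter

def bag_of_words(document):
    """Map a document to its word counts; the order of tokens is discarded."""
    return Counter(document.lower().split())

a = bag_of_words("the dog bit the man")
b = bag_of_words("the man bit the dog")
print(a == b)  # True: different meanings, identical bags
```

Any classifier that sees only these counts cannot tell the two sentences apart.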
