
Bag of Words (BoW)

Bag of Words is a method often used for document classification. It turns text into fixed-length vectors by counting the number of times each word appears in a document, a process referred to as vectorization. Each vector has one entry per word in the vocabulary.
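As a rough illustration of count-based vectorization, here is a minimal sketch using only the Python standard library. The helper names `build_vocabulary` and `bag_of_words`, the two sample sentences, and the lowercase whitespace tokenization are all assumptions made for this example; real pipelines usually tokenize more carefully.

```python
from collections import Counter

def build_vocabulary(documents):
    """Return a sorted list of every distinct lowercase token in the corpus."""
    return sorted({token for doc in documents for token in doc.lower().split()})

def bag_of_words(document, vocabulary):
    """Count how often each vocabulary word occurs in one document."""
    counts = Counter(document.lower().split())
    # A Counter returns 0 for missing words, so every vector has the same length.
    return [counts[word] for word in vocabulary]

# Hypothetical toy corpus, used only for illustration.
docs = ["the cat sat on the mat", "the dog sat"]
vocab = build_vocabulary(docs)
vectors = [bag_of_words(doc, vocab) for doc in docs]

print(vocab)    # ['cat', 'dog', 'mat', 'on', 'sat', 'the']
print(vectors)  # [[1, 0, 1, 1, 1, 2], [0, 1, 0, 0, 1, 1]]
```

Both documents map to vectors of length six, the size of the vocabulary, regardless of how long each document is.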

Alternatively, one can weight each word's frequency in a document by how rare that word is across the whole collection of documents, typically using TF-IDF (term frequency-inverse document frequency) vectorization.
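One way to sketch TF-IDF weighting is shown below. It assumes a simple textbook variant: term frequency is the word's count divided by the document length, and inverse document frequency is log(N / df), where N is the number of documents and df is the number of documents containing the word. Libraries often use smoothed or normalized variants, so their numbers will differ. The function name `tf_idf_vectors` and the sample sentences are hypothetical.

```python
import math
from collections import Counter

def tf_idf_vectors(documents):
    """Return (vocabulary, vectors) using tf = count / doc length, idf = log(N / df)."""
    # Assumes every document contains at least one token.
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({token for tokens in tokenized for token in tokens})
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each word occur at least once?
    doc_freq = Counter(token for tokens in tokenized for token in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append([
            (counts[word] / len(tokens)) * math.log(n_docs / doc_freq[word])
            for word in vocabulary
        ])
    return vocabulary, vectors

# Hypothetical toy corpus, used only for illustration.
docs = ["the cat sat on the mat", "the dog sat"]
vocab, tfidf = tf_idf_vectors(docs)

for word, weight in zip(vocab, tfidf[0]):
    print(f"{word}: {weight:.3f}")
```

In this variant, a word that occurs in every document (here "the" and "sat") gets a weight of zero, while words unique to one document keep a positive weight.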

Although these vectorization methods are easy to compute, they lack any contextual information. The result is literally a bag of words: there is no order, and only the word counts matter.
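The loss of word order can be seen in a small sketch. The two sentences and the helper name `bag` are made up for this example; tokenization is again a plain lowercase whitespace split.

```python
from collections import Counter

def bag(text):
    """Reduce a sentence to its word counts, discarding word order."""
    return Counter(text.lower().split())

# Two hypothetical sentences with opposite meanings but the same words.
a = "the dog bit the man"
b = "the man bit the dog"

same_bag = bag(a) == bag(b)
print(same_bag)  # True
```

The two sentences differ in meaning, yet a count-based representation cannot tell them apart.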
