Latent Semantic Analysis (LSA)

Latent Semantic Analysis is a natural language processing method that analyzes relationships between a set of documents and the terms they contain. It uses singular value decomposition (SVD), a matrix factorization technique, to uncover hidden relationships between terms and concepts in unstructured text.
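The core idea can be sketched in a few lines of numpy: build a term-document matrix, take its SVD, and keep only the top singular values so that documents are compared in a low-rank "concept" space rather than by raw word overlap. The corpus and counts below are invented for illustration.

```python
import numpy as np

# Toy term-document matrix: rows are terms, columns are documents.
# The terms and counts are hypothetical, chosen so that the first two
# documents share pet-related vocabulary and the third is about finance.
terms = ["bank", "loan", "money", "river", "water"]
A = np.array([
    [1, 1, 1, 1],  # appears in every document
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 2, 1],
    [0, 0, 1, 2],
], dtype=float)

# SVD factors A into U @ diag(s) @ Vt.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Keep the top-k singular values: this projection into a low-rank
# "concept" space is the "latent semantic" step of LSA.
k = 2
doc_vecs = (np.diag(s[:k]) @ Vt[:k]).T  # one row per document

def cos(u, v):
    """Cosine similarity between two concept-space vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Documents 0 and 1 share vocabulary, so they end up close in concept
# space; documents 0 and 2 share almost none, so they end up far apart.
print(cos(doc_vecs[0], doc_vecs[1]))
print(cos(doc_vecs[0], doc_vecs[2]))
```

In a real application the matrix would be built from token counts (often tf-idf weighted) over a large corpus, and queries would be projected into the same reduced space for concept search.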

Latent Semantic Analysis is an information retrieval technique patented in 1988, although its origin dates back to the 1960s.

LSA is primarily used for concept searching and automated document categorization. However, it has also found use in software engineering (to understand source code), publishing (text summarization), search engine optimization, and other applications.

There are a number of drawbacks to Latent Semantic Analysis, the major one being its inability to capture polysemy (multiple meanings of a word). The vector representation, in this case, ends up as an average of all the word's meanings in the corpus. That makes it challenging to compare documents.
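The polysemy problem can be made concrete with a small sketch. In the hypothetical counts below, "bank" appears in both finance documents (alongside "loan", "money") and geography documents (alongside "river", "water"), yet LSA assigns it a single vector that sits between the two senses.

```python
import numpy as np

# Hypothetical counts: "bank" is used in every document, but with two
# unrelated senses (financial in d0/d1, geographic in d2/d3).
terms = ["bank", "loan", "money", "river", "water"]
A = np.array([
    [1, 1, 1, 1],  # bank: polysemous, present in all documents
    [2, 1, 0, 0],  # loan
    [1, 2, 0, 0],  # money
    [0, 0, 2, 1],  # river
    [0, 0, 1, 2],  # water
], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
term_vecs = U[:, :2] * s[:2]  # rank-2 term vectors

def cos(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# "bank" gets one vector averaging both usages, so it is equally
# similar to the financial sense and the geographic sense.
print(cos(term_vecs[0], term_vecs[1]))  # bank vs loan
print(cos(term_vecs[0], term_vecs[3]))  # bank vs river
```

A single averaged vector cannot signal which sense a given document uses, which is why later models (e.g. contextual embeddings) handle polysemy better.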

In the world of search engine optimization, Latent Semantic Indexing (LSI) is a term often used in place of Latent Semantic Analysis. Some marketers believe using LSI can improve on-page SEO. However, given that there are more recent and elegant approaches to natural language processing, the effectiveness of LSI in optimizing content for search is in doubt.
