Do you remember that scene in The Matrix where Morpheus (Laurence Fishburne) holds out a capsule on each of his palms, describing the choice facing Neo (Keanu Reeves)?
He says, “This is your last chance. After this, there is no turning back. You take the blue pill — the story ends, you wake up in your bed and believe whatever you want to believe. You take the red pill — you stay in Wonderland, and I show you how deep the rabbit hole goes. Remember: all I’m offering is the truth. Nothing more.”
This blog post – it’s kind of like that. In this case, we’re going to talk about latent semantic indexing, specifically:
- What it is
- Why it’s so popular among the SEO crowd
- If Google uses it
- Whether it helps your SEO efforts
At the end, just like Neo, you can choose to believe whatever you wish. So, let’s get started. (cue suspenseful theme music)
What is Latent Semantic Indexing (LSI)?
In the world of SEO, latent semantic indexing (LSI) and latent semantic analysis (LSA) are interchangeable terms. We’re not going to split hairs over their difference, so we’ll follow the same practice in this blog post.
LSI is a technique that analyzes the relationships between a set of documents and the terms contained within. There is an underlying assumption that words close in meaning will appear in similar pieces of text (known as the distributional hypothesis). It relies on a mathematical technique called singular value decomposition to identify those relationships.
For those interested, here’s a great latent semantic indexing example (pdf) to which you can refer.
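If you would rather see the idea in code, here is a minimal sketch of LSI using NumPy. The four toy documents, the choice of two latent dimensions, and the helper names (`cosine`, `term_similarity`) are assumptions made up for illustration. They are not taken from the original patent or from any search engine. The sketch builds a term-document matrix, runs singular value decomposition, and compares words in the reduced space.

```python
import numpy as np

# Toy corpus (hypothetical). "car" and "automobile" never share a document,
# but both appear alongside "engine" and "repair".
docs = [
    "car engine repair",
    "automobile engine repair",
    "bird nest tree",
    "bird tree song",
]

# Term-document matrix: one row per term, one column per document.
vocab = sorted({word for doc in docs for word in doc.split()})
A = np.array(
    [[doc.split().count(term) for doc in docs] for term in vocab],
    dtype=float,
)

# Singular value decomposition: A = U @ diag(s) @ Vt.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Keep only the k strongest latent dimensions (k = 2 is an arbitrary choice).
k = 2
term_vecs = U[:, :k] * s[:k]  # each row is a term in the latent space


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def term_similarity(t1, t2, vectors):
    """Compare two terms using the given matrix of row vectors."""
    return cosine(vectors[vocab.index(t1)], vectors[vocab.index(t2)])


# In the raw counts the two words look unrelated (they never co-occur).
raw = term_similarity("car", "automobile", A)
# In the latent space they land close together, while "bird" stays far away.
latent = term_similarity("car", "automobile", term_vecs)
unrelated = term_similarity("car", "bird", term_vecs)

print(f"raw: {raw:.2f}, latent: {latent:.2f}, car vs bird: {unrelated:.2f}")
```

On this tiny corpus, "car" and "automobile" have no overlap in the raw counts yet end up nearly identical after the decomposition, because they keep the same company. That is the distributional hypothesis at work.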
Don’t worry if math is not your thing. You don’t need to appreciate the technical nuances of the process.
What’s important to understand is that LSI was created to index the content of document collections with infrequent updates.
Why does that matter? We’ll get to that later.
First, let’s try to understand how LSI became such a big deal in the world of SEO.
LSI Keywords and Their Popularity Among SEO Experts
It seems the world of search engine optimization goes through phases where a particular strategy becomes popular. At one point it was keyword stuffing, a practice where certain phrases would be repeated ad nauseam within a blog post. Admittedly, it did nothing to improve the content. But it worked until search engines caught on to the tactic.
The next “advancement” of on-page SEO concerned itself with keyword density. Instead of being indiscriminate with keyword stuffing, every SEO expert proselytized their opinion as to how many keyword phrases could be stuffed into an article without getting caught.
In hindsight, you can’t help but laugh at the futility of it all. But it was serious business back then. Surprisingly, this absolutely useless SEO strategy continues to attract interest!
Here’s another one of those SEO techniques I want to warn you about.
There are SEO experts who claim that Google uses some form of LSI technology. While I’m not sure who started this SEO equivalent of an urban legend, one thing is for sure. A lot of people talk about it as though it were a fact.
I guess that’s not surprising. When a few well-known SEO industry influencers commandeer the term and start claiming how LSI optimization helps drive organic traffic, the herd is sure to follow.
On the surface, the idea that LSI leads to SEO success does seem plausible. We know Google is interested in semantics – understanding natural language and grasping the hidden meaning behind words on a page. Here, they say as much.
Plus, if you’re an SEO company trying to sell run-of-the-mill SEO services at a premium price, using terms and phrases like latent semantic indexing makes you sound real smart (and expensive).
Besides, how many topic modeling algorithms is the average digital marketing team going to understand? Probably somewhere between zero and one.
Also, latent semantic indexing sounds way sexier than Term Frequency-Inverse Document Frequency (TF-IDF), or Latent Dirichlet Allocation (LDA).
Still, the question remains.
Do Search Engines Use Latent Semantic Indexing?
Though it can’t be proven, it’s very unlikely that Google uses LSI. I know, there will always be those who want to believe otherwise – just like there are people who believe the earth is flat, America never went to the moon, and Elvis is still alive.
While we know that Google conducts semantic analysis of text, one cannot conclude that therefore they use latent semantic analysis. That assertion is a major jump in logic.
Here’s the other problem.
A major challenge of LSI technology is the issue of scalability and performance. This early attempt at natural language processing was designed to work on a comparatively small set of static documents. It was never created to deal with large amounts of constantly changing content like we have on the Web. In fact, the patent for latent semantic analysis was filed in 1988, nearly three years before the web went live.
Not to mention that there are some drawbacks to using latent semantic analysis:
- The model has difficulty dealing with polysemy (multiple meanings of a word). For example, a crane could be a piece of construction equipment or a long-necked bird.
- It ignores word order, thus missing out on syntactic relations, logic and morphology.
- It assumes a particular distribution (Gaussian) of terms in documents that may not be true in all instances.
- It is computationally intensive and difficult to continuously update with new data.
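Two of these drawbacks are easy to see in a small sketch. LSI starts from word counts per document (a "bag of words"), so the example below uses Python's `Counter` as a stand-in for one column of the term-document matrix. The sentences and the `bag_of_words` helper are hypothetical, chosen only to illustrate the point.

```python
from collections import Counter


def bag_of_words(text):
    """Count how often each word occurs, discarding word order."""
    return Counter(text.lower().split())


# Word order: opposite meanings, identical representation.
dog_bites = bag_of_words("the dog bites the man")
man_bites = bag_of_words("the man bites the dog")
same_representation = dog_bites == man_bites

# Polysemy: both senses of "crane" are counted under the same single entry.
construction = bag_of_words("the crane lifted the steel beam")
wildlife = bag_of_words("the crane waded through the marsh")
same_crane_entry = construction["crane"] == wildlife["crane"]

print(same_representation, same_crane_entry)
```

Because the model only ever sees these counts, it cannot tell who bit whom, and it has a single slot for "crane" no matter which sense the writer meant.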
Here’s something else to consider. Google patented a word vector approach (pdf), granted in 2017, capable of dealing with a corpus of billions of words with millions of words in the vocabulary. That’s exactly the kind of firepower you need when analyzing content on the web!
It’s safe to say that the technologies Google uses to index web content and understand that content have advanced considerably since its early days.
LSI is training wheels for search engines.
Why use old technology when you can use something not only better but faster?
Can Your Search Engine Optimization Efforts Benefit from Latent Semantic Analysis?
Not really. First, we’ll look at why that is and then we’ll look at a better approach.
You may have come across websites offering free “LSI keywords.” Unfortunately, they don’t provide any information about how they use LSI to generate their results. From what I’ve seen, the quality of the output isn’t that great. What they provide aren’t necessarily related topics as much as they are variants.
There’s another issue to consider as well.
When it comes to latent semantic indexing and SEO, the advice I’ve seen basically comes down to “sprinkling” your content with some of these LSI keywords, which are really just synonyms. The rationale behind this advice is that using these synonyms strengthens the thematic relevance of your content. That’s gotta be good for SEO, right?
Not so fast.
This thing about swapping out words smells a whole lot like the keyword stuffing/density rabbit hole many SEOs ventured down not so long ago.
As previously mentioned, there’s no evidence from which we can conclude that Google uses LSI. In fact, the search engine is almost certainly light years ahead in the technology it uses to understand web pages and establish semantic relevance.
So, stop relying on ’80s technology. Don’t be that person who won’t give up their Betamax VCR.
Do this instead.
- Stop looking for ways to make Google think your content is better than it actually is.
- Focus on your audience and create great content.
- Employ a better topic modeling platform (try MarketMuse).
- Use its Research Application to derive a list of semantically related topics. Just type in a topic (typically a search query) and you’ll get a list of 50 related topics ordered by relevance.
- Carefully examine that list to determine the story behind the topic. Structure the blog post using subheadings that capture the essence of the topic.
- Fill in the details of each section, addressing all the relevant concepts and adding context.
- Use the Questions Application to help better understand user intent behind the search term you’re targeting.
- Get a new perspective on your on-page optimization efforts. MarketMuse’s Optimize Application indicates how well you’ve covered the topic. Get immediate feedback on your writing to determine whether you’ve achieved your target content score and word count. You’ll know the potential of your content before you even hit the publish button.
Your SEO strategy shouldn’t run on 1980s technology, unless you’re the type that enjoys using an IBM-XT running MS-DOS. Who am I to say? A few years from now, we may all be laughing at LSI just like we do at keyword density.