Search engines like Google have a vested interest in concealing exactly how they rank content, but there’s only so much you can hide in the information age. It’s known that search algorithms use topic models to sort and prioritize the 130 trillion pages on the web, and while we may never eliminate the unknowns of SEO, we can use what we do know to our advantage.
The introduction of Hummingbird made it clear that search algos are getting increasingly intelligent and that writing high-ranking content is no longer a matter of using as many keywords as possible. Instead, the algorithm employs topic models that can measure the topical comprehensiveness of a page and match it to a search query.
As a result, comprehensiveness has become a proxy by which search engines measure content quality. Moreover, Hummingbird made it easier to determine the methods by which algorithms rank content because it provided a baseline for experimentation. Comparing rankings before and after the update proved to be insightful.
How Do We Know Search Engines Use Topic Modeling?
Using a relatively nascent version of MarketMuse, Neil Patel and his team of analysts and data scientists assessed the rankings of nearly 10 million words of content to see how Hummingbird was prioritizing pages. They found that even more than page authority and backlinks, topic comprehensiveness was the No. 1 factor that predicted high rankings.
Because MarketMuse uses topic modeling to accurately predict what a high-ranking page contains, we can confidently say that the method is an integral part of the majority of search algorithms.
We’re not the only ones who think so. If you’ve got some time on your hands, you can read this extensive research paper by the University of Maryland that details the many applications of topic models, including query expansion, information retrieval, and search personalization.
Given the vast number of pages online, the complex and infinite ways in which queries are entered, and the various on-page SEO factors taken into account for each search, it’s difficult to envision a way to efficiently produce SERPs without topic modeling. It’s safe to assume that the technology is a requirement for providing fast, relevant results.
Content marketers should care because understanding how search engines rank content can help you develop a content strategy that produces results, and you don’t need to be a data scientist to crack the code. (Later on, though, we’ll discuss the history of topic modeling and dive into the details of different types of algorithms for data-curious content marketers.)
What SEOs Need to Know About Topic Models
If Google’s algorithm utilizes topic modeling to prioritize pages that have deep coverage of a given subject, then the best way to rank is to A) make your content easily readable by the algorithm and B) create in-depth, broad coverage of your focus topics.
Enter topic clusters: groups of content that begin with pillar pages that broadly cover your focus topics, supported and linked to by pages that deeply cover topics related to your pillars. Topic clusters give you breadth and depth in a way that’s easily navigated by both humans and search algos.
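To make the structure concrete, here is a minimal sketch of one topic cluster as data. The `TopicCluster` class, the page paths, and the choice to link the pillar and each supporting page in both directions are all illustrative assumptions, not a description of how any particular site or tool models clusters.

```python
from dataclasses import dataclass, field


@dataclass
class TopicCluster:
    """Hypothetical model: one broad pillar page plus deeper supporting pages."""
    pillar: str
    supporting: list = field(default_factory=list)

    def internal_links(self):
        """Return (source, target) pairs linking the pillar and each
        supporting page in both directions (an assumed linking scheme)."""
        links = []
        for page in self.supporting:
            links.append((self.pillar, page))  # pillar points to the deep dive
            links.append((page, self.pillar))  # deep dive points back to the pillar
        return links


# Example paths are made up for illustration.
cluster = TopicCluster(
    pillar="/topic-modeling",
    supporting=[
        "/topic-modeling/tf-idf",
        "/topic-modeling/lsa",
        "/topic-modeling/lda",
    ],
)
links = cluster.internal_links()
```

Listing the links this way makes it easy to spot a supporting page that never points back to its pillar.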
HubSpot ran an experiment last year showing that interlinked topic clusters resulted in better SERP rankings and higher impressions. It’s likely that this occurred because the clusters made HubSpot’s content more easily crawlable, allowing the algorithm to quickly find the pages relevant to a query.
The interlinked clusters signal breadth and depth of a topic and can lead users through a seamless journey that answers their questions. After all, that’s the whole point of search. Getting those questions answered is called searcher task accomplishment. It contributes to higher ranking by increasing the authority of your pages each time a user visits and doesn’t bounce.
Topic Clusters and User Intent
Searcher task accomplishment is a relatively new industry term, but the concept itself is not new. It’s what happens when you focus on user intent and aim to provide as many answers as possible with your content in an easily navigable way. In other words, creating topic clusters.
Optimizing your content around user intent involves thinking critically about the potential questions a person might have about your industry. However, as many content marketers have found out the hard way, throwing stuff at the wall to see what sticks isn’t a great way to strategize.
Creating topic clusters—and the pages within them—is best done with a solution that thinks the same way search algorithms do. MarketMuse takes one keyword (i.e., your focus topic for a page or a section of a page) and analyzes tens of thousands of related pages to identify subtopics, questions to answer, and user personas to address with your content, using artificial intelligence to generate detailed content suggestions.
The software helps to produce an outline of what your content should look like, removing much of the guesswork for your writers. We’re not the only company that provides this value, but we do it better than the competition thanks to an ensemble of natural language processing algorithms, information theory, neural networks, semantic analysis, and more.
Like Google, we’re not about to give away our trade secrets, but we can break down for you how more rudimentary topic modeling algorithms work so you can better understand the differences between specific tools and more sophisticated software platforms.
Term Frequency-Inverse Document Frequency
Introduced in 1972, TF-IDF analyzes keyword frequency in a document compared to a set of documents. It measures the number of times a word or combination of words appears in a body of text and determines the degree of relevance the text has to that term by comparing it to a collection of other documents. Its greatest shortcoming, and the reason it’s not very useful for the complex world of SEO today, is that it can’t account for relationships, semantics, or syntax.
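As a rough illustration, the calculation can be sketched in a few lines of Python. This uses one common textbook variant (term frequency as a share of the document's words, inverse document frequency as the natural log of total documents over documents containing the term). The function name `tf_idf` and the sample sentences are made up for the example, and real tools may weight and tokenize differently.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: score} dict per document.

    Assumed variant: tf = count / words in document,
    idf = ln(number of documents / documents containing the term).
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each term at least once.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


docs = [
    "topic models rank pages",
    "search engines rank pages",
    "topic clusters link pages",
]
scores = tf_idf(docs)
```

In this toy corpus, "pages" appears in every document, so its score is zero everywhere, while "models" appears in only one document and scores highest there. Notice that the score says nothing about how the words relate to each other, which is the limitation described above.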
Latent Semantic Analysis
Developed in 1988, latent semantic analysis (LSA) looks at the relationship between a set of documents and the terms they contain by producing a set of concepts related to the document and terms. LSA gets us closer to discovering synonyms and semantically related words, but still can’t identify relationships between topics.
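Here is a minimal, standard-library-only sketch of the idea. LSA is usually described as a truncated singular value decomposition of the term-document matrix. To avoid external dependencies, this sketch gets the same document coordinates from the top eigenvectors of the document-by-document overlap matrix using power iteration. The helper names, the four sample documents, and the choice of two concepts are assumptions for illustration, and a production system would typically rely on a numerical library instead.

```python
import math
import random


def top_eigenpairs(matrix, k, iterations=200, seed=0):
    """Top-k eigenpairs of a symmetric matrix via power iteration with deflation."""
    rng = random.Random(seed)
    n = len(matrix)
    work = [row[:] for row in matrix]
    pairs = []
    for _ in range(k):
        vec = [rng.random() + 0.1 for _ in range(n)]
        for _ in range(iterations):
            nxt = [sum(work[i][j] * vec[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in nxt))
            vec = [x / norm for x in nxt]
        # Rayleigh quotient gives the eigenvalue for the converged vector.
        value = sum(
            vec[i] * sum(work[i][j] * vec[j] for j in range(n)) for i in range(n)
        )
        pairs.append((value, vec))
        # Deflate: subtract the component just found before the next pass.
        for i in range(n):
            for j in range(n):
                work[i][j] -= value * vec[i] * vec[j]
    return pairs


def lsa_doc_vectors(documents, k=2):
    """Map each document to k concept coordinates."""
    tokenized = [doc.lower().split() for doc in documents]
    vocab = sorted({w for tokens in tokenized for w in tokens})
    # One term-count column per document (the term-document matrix A).
    columns = [[tokens.count(w) for w in vocab] for tokens in tokenized]
    n = len(documents)
    # Gram matrix A^T A: its eigenvectors are A's right singular vectors,
    # and its eigenvalues are the squared singular values.
    gram = [
        [sum(a * b for a, b in zip(columns[i], columns[j])) for j in range(n)]
        for i in range(n)
    ]
    pairs = top_eigenpairs(gram, k)
    return [
        [math.sqrt(max(value, 0.0)) * vec[d] for value, vec in pairs]
        for d in range(n)
    ]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


docs = [
    "seo content ranking",
    "seo ranking keywords",
    "coffee beans roast",
    "coffee roast espresso",
]
vectors = lsa_doc_vectors(docs)
same_topic = cosine(vectors[0], vectors[1])   # two SEO documents
cross_topic = cosine(vectors[0], vectors[2])  # SEO vs. coffee
```

In this toy corpus the two SEO documents land on essentially the same point in concept space even though "content" and "keywords" never appear together, while the SEO and coffee documents stay unrelated. That is the sense in which LSA groups semantically related terms.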
Latent Dirichlet Allocation
This topic model was created in 2003 and is commonly used to identify topical probability and relationships between topics and subtopics. Latent Dirichlet Allocation (LDA) analyzes the connections between words in a corpus of documents and clusters words with similar meaning, providing a more in-depth semantic analysis than earlier topic models. LDA also utilizes a Bayesian inference model to identify the terms related to a topic within a document and improve those assumptions each time a new document is analyzed. Using LDA, you can get a reasonably precise assessment of the topics discussed in a document.
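For the data-curious, the Bayesian inference step can be sketched with collapsed Gibbs sampling, one of several standard ways to fit LDA. Everything specific here is an assumption for illustration: the function name `lda_gibbs`, the hyperparameters `alpha` and `beta`, the iteration count, and the tiny corpus. It is a teaching sketch, not how any commercial tool works.

```python
import random


def lda_gibbs(docs, n_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Estimate per-document topic proportions for tokenized documents."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)
    doc_topic = [[0] * n_topics for _ in docs]                 # tokens per topic, per doc
    topic_word = [[0] * n_words for _ in range(n_topics)]      # word counts per topic
    topic_total = [0] * n_topics                               # tokens per topic overall
    assignments = []
    # Start by assigning every token to a random topic.
    for d, doc in enumerate(docs):
        current = []
        for w in doc:
            z = rng.randrange(n_topics)
            current.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assignments.append(current)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid = word_id[w]
                z = assignments[d][i]
                # Remove this token's current assignment from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][wid] -= 1
                topic_total[z] -= 1
                # Weight each topic by how common it is in this document
                # and how strongly it is associated with this word.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][wid] + beta)
                    / (topic_total[t] + beta * n_words)
                    for t in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][wid] += 1
                topic_total[z] += 1
    # Smoothed topic proportions for each document.
    return [
        [(count + alpha) / (len(doc) + alpha * n_topics) for count in doc_topic[d]]
        for d, doc in enumerate(docs)
    ]


corpus = [
    "seo content ranking keywords seo".split(),
    "ranking keywords content search".split(),
    "coffee beans roast espresso coffee".split(),
    "espresso roast beans grinder".split(),
]
theta = lda_gibbs(corpus, n_topics=2)
```

Each row of `theta` is one document's estimated mix of topics and sums to 1. Because the sampler is random, the exact numbers depend on the seed and the number of iterations, so treat the output as an estimate rather than a fixed answer.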
The MarketMuse Difference
Inexpensive or free tools frequently use these topic models. However, they can only provide a coarse-grained analysis that gives you vast amounts of data that you’ll need to sift through manually. There isn’t a magic bullet algorithm that gives you relevance, relationships, semantics, syntax, and keyword variants.
We know because we’ve tried to create it, and we’ve ended up with a robust solution that combines applications of data science to provide the most advanced content solution for SEO on the market today.
With each experiment and update we conduct, we better understand how Google operates and how our software can help content marketers not only improve search performance but improve user experience and fulfill searcher intent.
We know that our clients’ websites are much more than marketing tools consisting of strategically developed keywords and phrases. They’re platforms where companies can display transparency, establish expertise, and help people find solutions to their problems.
Using MarketMuse, you can plan your content strategy, analyze existing content, optimize your site structure and linking, and confidently answer your viewers’ most pressing questions. Contact us when you’re ready to see the future of content analysis and optimization.