MarketMuse NLG Technology vs. GPT-3
With all the interest in GPT-3 lately, we decided to look at how it compares to MarketMuse NLG Technology, following the same process we used when evaluating GPT-2.
Before diving into the examples, here’s a quick overview of the differentiating factors of MarketMuse NLG Technology.
- Unlike GPT-3, given a topic, we build an article piece-by-piece, using a content brief as a ‘backbone’.
- Given a topic, we generate a content brief structured into subheadings and related topics, and we use it as a guide.
- For each brief section, we use the related topics and the subheading as a prompt, and we keep generating until we produce output that passes our quality filters.
- Our filters include, of course, the Content Score and the presence of relevant topics that we expect to see in the output, but we also check for grammatical errors, lexical diversity, plagiarism, and readability measures.
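The generate-and-filter loop described above can be sketched roughly as follows. This is a minimal illustration, not MarketMuse's implementation: `generate_section`, `lexical_diversity`, `covers_topics`, and the `max_attempts` cap are all hypothetical names and choices introduced here, and the text generator is passed in as a plain function so any model could stand behind it.

```python
from typing import Callable, List, Optional


def lexical_diversity(text: str) -> float:
    """Ratio of unique words to total words (a simple type-token ratio)."""
    words = text.lower().split()
    return len(set(words)) / len(words) if words else 0.0


def covers_topics(text: str, topics: List[str]) -> bool:
    """True if every expected topic appears in the text (naive substring match)."""
    lowered = text.lower()
    return all(topic.lower() in lowered for topic in topics)


def generate_section(
    subheading: str,
    topics: List[str],
    generate: Callable[[str], str],
    filters: List[Callable[[str], bool]],
    max_attempts: int = 10,  # assumption: a cap so the loop always terminates
) -> Optional[str]:
    """Keep generating for one brief section until a candidate passes every filter."""
    # The subheading and its related topics together form the prompt.
    prompt = f"{subheading}\nRelated topics: {', '.join(topics)}\n"
    for _ in range(max_attempts):
        candidate = generate(prompt)
        if all(check(candidate) for check in filters):
            return candidate
    return None  # no candidate passed within the attempt budget
```

In this sketch each filter is just a function from text to pass/fail, so checks for grammar, plagiarism, or readability could be added to the list without changing the loop.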
Three Content Examples
GPT-3 is bigger and supposedly better than its predecessor, but it’s unlikely to take over the Internet. OpenAI published a detailed paper (PDF) on their language model. For our purposes, we’re not interested in studying the science behind natural language generation. Instead, we’re taking an empirical approach.
Below, you’ll find three excerpts about the importance of being on Twitter, written by MarketMuse NLG Technology, GPT-3, and a human with the help of MarketMuse NLG Technology. Can you tell which is which?
Version 1
Version 2
Version 3
Which Piece of Content Was Created By a Human?
First, let’s see if it passes the “sniff” test. Do these articles seem as though they were written by a human?
Only one was. Can you guess which?
The first one was written by GPT-3, the second by a human, and the third by MarketMuse NLG Technology.
How Good Are They at Conveying Information?
Let’s examine the output of all three approaches against Content Score, Writer Score, and Grade Level.
MarketMuse Content Score evaluates how well the piece covers the topic compared to the topic model. Higher is better, and for this article the Suggested Content Score is 42, although there’s no such thing as a perfect score.
Writer Score is a score assigned by Writer.com and is based on spelling and grammar, terms, style, clarity, inclusivity, and delivery – higher is better. Grade level denotes the expected educational level required to comprehend the content. The grade level of your writing should generally match that of your audience.
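To make the idea of a grade level concrete, here is a minimal sketch using the widely published Flesch-Kincaid grade formula. The source does not say which readability formula Writer.com uses, so treat this purely as an illustration; the function names and the vowel-group syllable counter are assumptions made for this example, and the syllable count is only a rough heuristic.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic: count groups of consecutive vowels, with a minimum of one."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid grade: 0.39 * words/sentences + 11.8 * syllables/words - 15.59."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (
        0.39 * len(words) / len(sentences)
        + 11.8 * syllables / len(words)
        - 15.59
    )
```

Short sentences made of short words produce a low grade, while long sentences with polysyllabic words push the grade up.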
MarketMuse NLG Technology
As expected, MarketMuse NLG Technology did the best in terms of addressing the topics in the model. It’s designed to meet two important KPIs: word count and Content Score.
MarketMuse NLG Technology did surprisingly well when it came to Writer Score. There were a few issues with spelling and grammar, term usage, style, and clarity. The Grade Level is within the range of the intended audience of this article.
GPT-3
GPT-3 is like a person who talks a lot but says very little.
There’s a really simple explanation for its Content Score of 4. The article fails to address the important issues that an expert would when discussing the importance of being on Twitter. Granted, the post may be cute and entertaining, but it’s devoid of any substance.
Not once in nearly 2,400 words about Twitter did the article speak about or explain anything to do with:
- social media
- tweets
- Twitter followers
- Twitter marketing
- trending hashtags
Not to mention the 45 other topics that you’ll find in the MarketMuse topic model. The problem is that the article lacks structure and any inherent meaning.
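A check like the one above, listing which expected topics a draft never mentions, can be approximated in a few lines. This is a simplified sketch and not how MarketMuse Content Score is calculated: `missing_topics` is a hypothetical helper, and it uses naive case-insensitive substring matching rather than a topic model.

```python
from typing import List


def missing_topics(text: str, topics: List[str]) -> List[str]:
    """Return the expected topics that never appear in the text."""
    lowered = text.lower()
    return [topic for topic in topics if topic.lower() not in lowered]


expected = ["social media", "tweets", "Twitter followers",
            "Twitter marketing", "trending hashtags"]
draft = "Twitter is fun. People like to chat there all day long."
print(missing_topics(draft, expected))  # lists every topic the draft skipped
```

A draft that mentions Twitter by name but none of the related topics would come back with the whole list, which is the pattern described for the GPT-3 article.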
If a human submitted that article, what would you do?
The post says nothing insightful about the importance of being on Twitter. As a result, it’s infinitely more difficult to edit and polish that draft into a valuable piece of publishable content. It’s the same issue that we discovered when evaluating GPT-2.
There’s a word for this type of article. It’s called “fluff.”
It also suffered from the lowest Writer Score. That’s the result of a large number of spelling and grammar issues along with others involving clarity, inclusivity, and style.
Writing at Grade Level 4 is a concern here. It’s always best to write at the level of your audience. You risk losing them if your writing is either too complicated or too simple. In this case, GPT-3 is writing at a level far too basic for a business audience.
Human
The human, yours truly, did a pretty decent job, if I may say so myself. The article sits comfortably above the target, with a Content Score of 45. The Writer Score, at 99, is almost perfect, which it should be. I use the Writer for Chrome plugin so I catch any errors upfront. A Grade Level of 8 is still within range of a business audience.
The MarketMuse NLG Technology Advantage
GPT-3 is a solution in search of an application. The only way to access the API is to join a waitlist and describe your use case. Even with access, you’re still limited to what the API provides.
MarketMuse NLG Technology was created to solve a specific use case: generating long-form, SEO-quality articles for content marketers. Here are the advantages it has to offer.
- Coherence and Structure – MarketMuse NLG Technology output is dictated by MarketMuse Content Briefs, so drafts are coherent and structured out of the box. GPT-3 starts with prompt text but lacks guardrails, leading to unstructured output unsuitable for SEO-quality content.
- Control – Users can build their own MarketMuse Content Briefs before ordering a draft. Specify the topics the article should mention, questions it should answer, and the sections of the article. GPT-3 offers little control over which topics generations mention and which questions the content answers.
- Publication-Ready – MarketMuse NLG Technology output can be edited into publication-ready content in 1-2 hours. GPT-3 output takes several hours to be edited into publication-ready content.
- Degradation, Plagiarism, Repetition – MarketMuse NLG Technology produces text that is free of degradation at length, plagiarism, and repetition. GPT-3 output isn’t checked for degradation, plagiarism, or repetition.
- Training – MarketMuse NLG Technology is trained on articles from a curated dataset (that excludes sexist, racist, and adult content) to improve the outcome of generations. GPT-3 is trained on the entire web, including low-quality, explicit, and hateful content, leading to low-quality generations.
- Configuration – MarketMuse NLG Technology can be configured to write in your style or one you wish to emulate, as well as learn new vocabulary over time. GPT-3 can only generate text based on the parameters of the model, with little to no configurability.
- Article Length – MarketMuse NLG Technology can generate articles up to 5,000 words based on the length of the MarketMuse Content Brief. GPT-3 can only generate up to ~1,200 words.
The Takeaway
Scale your content creation without scaling your costs and headaches. MarketMuse NLG Technology accelerates content creation by using AI to create complete drafts of articles based on MarketMuse Content Briefs. Keep your content costs predictable and your quality consistent by letting AI do the work of getting you a strong initial draft.
Stephen leads the content strategy blog for MarketMuse, an AI-powered Content Intelligence and Strategy Platform. You can connect with him on social or his personal blog.