With all the interest in GPT-3 lately, we decided to take a look at how it compares to MarketMuse First Draft, much as we did when evaluating GPT-2.
Before diving into the examples, here’s a quick overview of the differentiating factors of First Draft.
- Unlike GPT-3, given a topic, we build an article piece-by-piece, using a content brief as a ‘backbone’.
- Given a topic, we generate a content brief, structured into subheadings and related topics, and use it as a guide.
- For each brief section, we use the related topics and the subheading as a prompt, and we keep generating until we produce output that passes our quality filters.
- Our filters include, of course, the Content Score and the presence of relevant topics that we expect to see in the output, but we also check for grammatical errors, lexical diversity, plagiarism, and other readability measures.
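To make the loop described above concrete, here is a minimal Python sketch of the generate-until-it-passes idea. It is not MarketMuse's actual code. The function names, the thresholds, and the `generate` callable are hypothetical stand-ins, and only two of the filters are shown: topic presence and lexical diversity, approximated here as a simple type-token ratio.

```python
from typing import Callable, List, Optional


def lexical_diversity(text: str) -> float:
    """Type-token ratio: unique words divided by total words (0.0 if empty)."""
    words = text.lower().split()
    return len(set(words)) / len(words) if words else 0.0


def covers_topics(text: str, topics: List[str], min_hits: int) -> bool:
    """True if at least `min_hits` of the expected topics appear in the text."""
    lowered = text.lower()
    hits = sum(1 for topic in topics if topic.lower() in lowered)
    return hits >= min_hits


def passes_filters(text: str, topics: List[str],
                   min_hits: int = 2, min_diversity: float = 0.5) -> bool:
    """Hypothetical quality gate; the thresholds are illustrative assumptions."""
    return (covers_topics(text, topics, min_hits)
            and lexical_diversity(text) >= min_diversity)


def draft_section(subheading: str, topics: List[str],
                  generate: Callable[[str], str],
                  max_attempts: int = 5) -> Optional[str]:
    """Prompt with the subheading and related topics, retrying until a
    candidate passes the filters or the attempt budget runs out."""
    prompt = f"{subheading}\nTopics: {', '.join(topics)}\n"
    for _ in range(max_attempts):
        candidate = generate(prompt)
        if passes_filters(candidate, topics):
            return candidate
    return None  # no candidate passed within the attempt budget


if __name__ == "__main__":
    # Stub generator standing in for a real language model (assumption).
    canned = iter([
        "Twitter is great great great great.",
        "Twitter marketing helps brands reach new Twitter followers "
        "through trending hashtags.",
    ])
    topics = ["Twitter marketing", "Twitter followers", "trending hashtags"]
    print(draft_section("Why Twitter matters", topics, lambda _p: next(canned)))
```

In this toy run the first canned output is rejected because it mentions none of the expected topics, and the second is returned. A real pipeline would add the other checks mentioned above (grammar, plagiarism, readability) as further conditions in the gate.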
Three Content Examples
GPT-3 is bigger and supposedly better than its predecessor, but it’s unlikely to take over the Internet. OpenAI published a detailed paper (PDF) on their language model. For our purposes, we’re not interested in studying the science behind natural language generation. Instead, we’re taking an empirical approach.
Below, you’ll find three excerpts about the importance of being on Twitter, written by MarketMuse First Draft, GPT-3, and a human with the help of MarketMuse First Draft. Can you tell which is which?
Which Piece of Content Was Created By a Human?
First, let’s see if they pass the “sniff” test. Do these articles seem as though they were written by a human?
Only one was. Can you guess which?
The first one was written by GPT-3, the second by a human, and the third by MarketMuse First Draft.
How Good Are They at Conveying Information?
Let’s examine the output of all three approaches against Content Score, Writer Score, and Grade Level.
MarketMuse Content Score evaluates how well the piece has covered the topic as compared to the topic model. Higher is better, and for this article the Suggested Content Score is 42, although there’s no such thing as a perfect score.
Writer Score is a score assigned by Writer.com and is based on spelling and grammar, terms, style, clarity, inclusivity, and delivery – higher is better. Grade level denotes the expected educational level required to comprehend the content. The grade level of your writing should generally match that of your audience.
MarketMuse First Draft
As expected, MarketMuse First Draft did the best in terms of addressing the topics in the model. It’s designed to ensure it meets two important KPIs: word count and Content Score.
MarketMuse First Draft did surprisingly well when it came to Writer Score. There were a few issues with spelling and grammar, term usage, style, and clarity. The Grade Level is within the range of the intended audience of this article.
GPT-3 is like a person that talks a lot but says very little.
There’s a really simple explanation for its Content Score of 4. The article fails to address the important issues that an expert would when discussing the importance of being on Twitter. Granted, the post may be cute and entertaining, but it’s devoid of substance.
Not once in nearly 2,400 words about Twitter did the article speak about or explain anything to do with:
- social media
- Twitter followers
- Twitter marketing
- trending hashtags
Not to mention the 45 other topics that you’ll find in the MarketMuse topic model. The problem is that the article lacks structure and any inherent meaning.
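A gap like this is easy to spot mechanically. The sketch below is an illustration only, not how MarketMuse computes Content Score: it counts whole-phrase mentions of each expected topic and lists the ones that never appear. The function names and the sample text are hypothetical.

```python
import re
from typing import Dict, List


def topic_mentions(text: str, topics: List[str]) -> Dict[str, int]:
    """Count case-insensitive, whole-phrase mentions of each topic."""
    counts = {}
    for topic in topics:
        pattern = r"\b" + re.escape(topic) + r"\b"
        counts[topic] = len(re.findall(pattern, text, flags=re.IGNORECASE))
    return counts


def missing_topics(text: str, topics: List[str]) -> List[str]:
    """Topics from the list that the text never mentions."""
    return [t for t, n in topic_mentions(text, topics).items() if n == 0]


if __name__ == "__main__":
    sample = "Twitter is fun. Trending hashtags are fun to follow."
    expected = ["social media", "Twitter followers",
                "Twitter marketing", "trending hashtags"]
    print(missing_topics(sample, expected))
```

Exact phrase matching is a crude proxy. It misses synonyms and inflected forms, so a real topic model would need something more forgiving.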
If a human submitted that article, what would you do?
The post says nothing insightful about the importance of being on Twitter. As a result, it’s infinitely more difficult to edit and polish that draft into a valuable piece of publishable content. It’s the same issue that we discovered when evaluating GPT-2.
There’s a word for this type of article. It’s called “fluff.”
It also suffered from the lowest Writer Score. That’s the result of a large number of spelling and grammar issues along with others involving clarity, inclusivity, and style.
Writing at Grade Level 4 is a concern here. It’s always best to write at the level of your audience. You risk losing them if your writing is either too complicated or too simple. In this case, GPT-3 is writing at a level far too basic for a business audience.
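For a rough sense of how a grade level can be estimated, here is a sketch of the widely used Flesch-Kincaid grade formula. We are assuming this formula purely for illustration; the scores in this article come from Writer.com, whose exact method isn't described here. The syllable counter is a naive vowel-group heuristic, so treat the numbers as approximate.

```python
import re


def count_syllables(word: str) -> int:
    """Naive estimate: one syllable per run of vowels, minimum of one."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid grade level:
    0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)


if __name__ == "__main__":
    simple = "The cat sat. The dog ran."
    dense = ("Organizational communication strategies necessitate "
             "comprehensive evaluation.")
    print(round(flesch_kincaid_grade(simple), 2))
    print(round(flesch_kincaid_grade(dense), 2))
```

Short sentences built from short words score low, and long sentences full of polysyllabic words score high, which is why very simple prose can land well below the level a business audience expects.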
The human, yours truly, did a pretty decent job, if I may say so myself. The article sits comfortably above the target, with a Content Score of 45. The Writer Score, at 99, is almost perfect, which it should be. I use the Writer for Chrome plugin so I catch any errors upfront. A Grade Level of 8 is still within range of a business audience.
The MarketMuse First Draft Advantage
GPT-3 is a solution in search of an application. The only way to access the API is to join a waitlist, where you must describe your use case. Even with access, you’ll still be limited to what’s provided through the Application Programming Interface.
MarketMuse First Draft was created to solve a specific use case: generating long-form, SEO-quality articles for content marketers. Here are the advantages it has to offer.
- Coherence and Structure – MarketMuse First Draft output is dictated by MarketMuse Content Briefs so drafts are coherent and structured out of the box. GPT-3 starts with prompt text but lacks guardrails, leading to unstructured output unsuitable for SEO-quality content.
- Control – Users can build their own MarketMuse Content Briefs before ordering a draft. Specify the topics the article should mention, questions it should answer, and the sections of the article. GPT-3 offers little control over which topics generations mention and which questions the content answers.
- Publication-Ready – MarketMuse First Draft output can be edited into publication-ready content in 1-2 hours. GPT-3 output takes several hours to be edited into publication-ready content.
- Degradation, Plagiarism, Repetition – First Draft produces text that is free of degradation at length, plagiarism, and repetition. GPT-3 doesn’t check its output for degradation, plagiarism, or repetition.
- Training – First Draft is trained on articles from a curated dataset (that excludes sexist, racist, and adult content) to improve the outcome of generations. GPT-3 is trained on a broad crawl of the web, which can include low-quality, explicit, and hateful content, leading to lower-quality generations.
- Configuration – MarketMuse First Draft can be configured to write in your style or one you wish to emulate, as well as learn new vocabulary over time. GPT-3 can only generate text based on the parameters of the model, with little to no configurability.
- Article Length – MarketMuse First Draft can generate articles up to 5,000 words based on the length of the MarketMuse Content Brief. GPT-3 can only generate up to ~1,200 words.
Scale your content creation without scaling your costs and headaches. MarketMuse First Draft accelerates content creation by using AI to create complete drafts of articles based on MarketMuse Content Briefs. Keep your content costs predictable and your quality consistent by letting AI do the work of getting you a strong first draft.