With all the interest in GPT-3 lately, we decided to take a look at how it compares to MarketMuse First Draft, similar to the process taken when evaluating GPT-2.
Before diving into the examples, here’s a quick overview of the differentiating factors of First Draft.
- Unlike GPT-3, we build an article piece by piece, using a content brief as a 'backbone'.
- Given a topic, we generate a content brief, structured into subheadings and related topics, and use it as a guide.
- For each brief section, we use the related topics and the subheading as a prompt, and we keep generating until we produce output that passes our quality filters.
- Our filters include, of course, the content score and the presence of relevant topics we expect to see in the output, but we also check for grammatical errors, lexical diversity, plagiarism, and other readability measures.
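The generate-until-it-passes loop described above can be sketched in a few lines. This is purely illustrative: `generate_text` and the filter checks below are stand-in stubs, not MarketMuse's actual API or filter logic.

```python
import random

def generate_text(prompt):
    # Stand-in for a language-model call; returns placeholder text.
    return f"Draft text about {prompt} ({random.randint(0, 999)})"

def passes_quality_filters(text, related_topics):
    # Stand-in checks: require at least one related topic to appear
    # and a minimum length. Real filters would also score the content
    # and check grammar, lexical diversity, plagiarism, and readability.
    mentions_topic = any(t.lower() in text.lower() for t in related_topics)
    return mentions_topic and len(text.split()) >= 4

def generate_section(subheading, related_topics, max_attempts=10):
    """Generate candidates for one brief section until one passes the filters."""
    prompt = f"{subheading} - {', '.join(related_topics)}"
    for _ in range(max_attempts):
        candidate = generate_text(prompt)
        if passes_quality_filters(candidate, related_topics):
            return candidate
    return None  # nothing passed; flag the section for review
```

The key design point is that generation is retried per section, so a weak candidate for one subheading never forces regenerating the whole article.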
Three Content Examples
GPT-3 is bigger and supposedly better than its predecessor, but it’s unlikely to take over the Internet. OpenAI published a detailed paper (PDF) on their language model. For our purposes, we’re not interested in studying the science behind natural language generation. Instead, we’re taking an empirical approach.
Below, you’ll find three excerpts of content covering three distinctly different subjects. Review them before proceeding because there will be a test!
The Importance of Being on Twitter
Glucagon as a Non-invasive Diabetic Treatment
What’s the Difference Between NLP, NLU, and NLG?
Which Piece of Content Was Created By a Human?
First, let’s see if it passes the “sniff” test. Do these articles seem as though they were written by a human? Only one was. Can you guess which?
Which Article Provides the Least Amount of Information?
Next, let’s see how well each article covers its respective topic. To determine this, we ran each article through MarketMuse Optimize for an objective evaluation. The score is based on how many topics the article mentions compared to its topic model.
For any given subject, MarketMuse analyzes vast amounts of data, thousands of pages, to determine the 50 most relevant topics. These are the issues that experts address when discussing the subject.
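As a rough illustration of the idea, a score of this kind can be approximated by counting how many of a model's relevant topics an article actually mentions. This substring-count sketch is an assumption for illustration only, not MarketMuse's actual scoring formula.

```python
def content_score(article_text, topic_model):
    """Count how many topics from the model appear in the article."""
    text = article_text.lower()
    return sum(1 for topic in topic_model if topic.lower() in text)

# Hypothetical topic model for "the importance of being on Twitter"
topics = ["social media", "twitter followers", "twitter marketing", "trending hashtags"]

thin_article = "Tweeting is fun. I tweet about my cat every day."
content_score(thin_article, topics)  # mentions none of the expected topics
```

An article can be long and fluent yet score near zero under such a measure, which is exactly the failure mode discussed below.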
MarketMuse Content Score is a relative rating, meaning there is no perfect score. However, the GPT-3 article, with a score of 2 points, was spectacularly dismal any way you look at it.
GPT-3 is like a person that talks a lot but says very little.
There’s a really simple explanation for its poor performance. The article fails to address the important issues that an expert does when discussing the importance of being on Twitter. Granted, the post may be cute and entertaining, but it’s void of any substance.
Not once in nearly 2,400 words about Twitter did the article speak about or explain anything to do with:
- social media
- Twitter followers
- Twitter marketing
- trending hashtags
Not to mention the 45 other topics that you’ll find in the MarketMuse topic model. The problem is that the article lacks structure and any inherent meaning.
If a human submitted that article, what would you do?
The post says nothing insightful about the importance of being on Twitter. As a result, it’s infinitely more difficult to edit and polish that draft into a valuable piece of publishable content. It’s the same issue that we discovered when evaluating GPT-2.
There’s a word for this type of article. It’s called “fluff.”
The MarketMuse First Draft Advantage
GPT-3 is a solution in search of an application. The only way to access the API is to join a waitlist in which you must describe your use case. Even with access, you’ll still be limited to what’s provided through the Application Programming Interface.
MarketMuse First Draft was created to solve a specific use case: generating long-form, SEO-quality articles for content marketers. Here are the advantages it has to offer.
Coherence and Structure
MarketMuse First Draft output is dictated by MarketMuse Content Briefs, so drafts are coherent and structured out of the box.
GPT-3 starts with prompt text but lacks guardrails, leading to unstructured output unsuitable for SEO-quality content.
Users can build their own MarketMuse Content Briefs before ordering a draft. Specify the topics the article should mention, questions it should answer, and the sections of the article.
GPT-3 offers little control over which topics generations mention and which questions the content answers.
MarketMuse First Draft output can be edited into publication-ready content in 1-2 hours.
GPT-3 output takes several hours to be edited into publication-ready content.
Degradation, Plagiarism, Repetition
First Draft produces text that is free of plagiarism, repetition, and the degradation that typically sets in over long generations.
GPT-3 doesn’t check its output for degradation, plagiarism, or repetition.
First Draft is available for all MarketMuse customers.
Access to the GPT-3 API is restricted.
MarketMuse First Draft is cost-effective for practical uses.
GPT-3 is not cost-effective for practical content creation at scale.
First Draft is trained on articles from a curated dataset (that excludes sexist, racist, and adult content) to improve the outcome of generations.
GPT-3 is trained on the entire web, including low-quality, explicit, and hateful content, leading to low-quality generations.
MarketMuse First Draft can be configured to write in your style or one you wish to emulate, as well as learn new vocabulary over time.
GPT-3 can only generate text based on the parameters of the model, with little to no configurability.
MarketMuse First Draft can generate articles up to 5,000 words based on the length of the MarketMuse Content Brief.
GPT-3 can only generate up to ~1,200 words.
Scale your content creation without scaling your costs and headaches. MarketMuse First Draft accelerates content creation by using AI to create complete drafts of articles based on MarketMuse Content Briefs. Keep your content costs predictable and your quality consistent by letting AI do the work of getting you a strong first draft.
Written by Stephen Jeske