GPT-3 is the Sparring Partner You Didn’t Know You Needed


You probably haven’t read the whole internet. Slacker.

GPT-3 has. It’s read all of Wikipedia, even the article about Berserk llama syndrome. It’s gone through all of Google Books. It read every month of the Common Crawl dataset from 2016 to October 2019, a huge snapshot of the writing on the web during that time. It has read 45 terabytes of text, which equals a book about 3,375,000,000 pages long. It is well-read.

That’s why it’s writing everything at the moment. By day it’s a coder, and by night it’s a poet. Need a Seinfeld script written? No problem. Want to chat with Shakespeare? He’s here for you. Bored with writing emails to your partner? Just write a bullet list, and it’ll take care of the rest. GPT-3 can now write seemingly anything.

How? And how well? We’ve been experimenting with the API over the past few weeks to understand how it works, where it works best, and to discover how it’s going to fit into the content marketing world.

GPT-3’s Mission in Artificial Life Is to Write Just One More Word

What has GPT-3 done with all that text? Its name alone contains the answer. GPT stands for, in reverse order:

  • Transformer is the algorithm the model uses. It specializes in being able to process long pieces of text, pay attention to the salient information therein, and understand how words are used in natural language.
  • Pre-trained means that the weights of the algorithm are already set for you based on the massive corpus of data it has imbibed. When you run it, you don’t need to do any more training.
  • Generative means that the model’s goal is to generate text. It wants to predict what word comes next in any given sentence.

So Generative Pre-trained Transformer Number 3 has analyzed billions of lines of text, figured out what matters and what doesn’t in language, and now uses that information to produce more text.

All it cares about is producing that next word and making that word ‘fit in’ with everything that has come before. It produces different probabilities for different words, and then picks one, favoring the most likely. With the GPT-3 API, you give it a prompt (a piece of writing), and then it tries to predict what would come after that, based on all the text it has read and the patterns it has learned. You can ask for five words or 1,000 words, and it’ll try to ‘complete’ your writing.
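That prediction step can be sketched in a few lines. This is a toy illustration with made-up probabilities (a real model computes them from the whole prompt with a neural network), but it shows the core ‘pick the likeliest next word’ idea:

```python
def pick_next_word(probabilities):
    """Return the candidate word with the highest probability."""
    return max(probabilities, key=probabilities.get)

# Hypothetical probabilities a model might assign after
# "The quick brown fox jumps over the lazy ..."
probs = {"dog": 0.62, "cat": 0.21, "fence": 0.09, "banana": 0.001}
print(pick_next_word(probs))  # dog
```

Repeat that step, appending each chosen word to the prompt, and you have a text generator.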

This is obviously different from how humans write.

We build abstract mental models. As I write this, I’m not just predicting which word comes next. I’m trying to synthesize my thoughts about GPT-3 with my knowledge of content marketing and my knowledge of neuroscience. I wrote this paragraph before I wrote the paragraph above. I have a construct of this entire article formed in my head that I am trying to make manifest with every word.

This is because people are more than just memory. We have higher executive functions that allow us to exhibit reasoning, planning, cognitive flexibility, and problem-solving. We also have lower-order emotions that allow us to have opinions and care about one idea over another, one word over another, not just as the output of a probability function.

GPT-3 doesn’t have any of this.

But even without these advanced neural functions honed by hundreds of thousands of years of evolution, GPT-3 seems to do A-OK on a lot of writing tasks. In the paper detailing GPT-3, the researchers gave test subjects 200-word news articles written by either humans or the model and asked them to state which one had written them. On average, they chose right only 52% of the time, which is basically chance.

Only 12% of human subjects were able to recognize this was written by a computer:

[Screenshot: the GPT-3-generated news article used in the study]

The implications of this are enormous. If a computer can write text indistinguishable from humans without all that reasoning and emotion, where does that leave writers? If instead of writing 200 words, a journalist has to just write a title and subtitle and let a computer fill out the rest, then they can 10X their output. And, of course, a newspaper can employ just one journalist instead of 10.

And where does it leave humans? If reasoning and emotion aren't needed, are we over-engineered? No. As we've understood better how GPT-3 works, we've also understood better its limitations, where humans thrive, and how we can work together with AIs for greater success.

Getting GPT-3 to Write Great Long-Form Content Is Hard

Can GPT-3 write good content? Good enough that you’d be happy to publish it on your site? We decided the best way to test it out was to get it to write this article—an article about itself. It failed. This isn’t that article.

If you want to read what GPT-3 wrote, you can check it out here.

We generated that article following these steps:

  1. Gave GPT-3 a small prompt: a title and a brief synopsis of what we were looking for in the article (like you can see in the news article above).
  2. Used that to generate a couple of paragraphs of text output.
  3. Used the first (human) prompt plus the new (GPT-3) text as the next prompt.
  4. Repeated Step 3 until we bootstrapped a nice content-marketing piece at 1,200 words.
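The steps above amount to a simple loop. Here is a rough sketch; `generate()` is a hypothetical stand-in for the GPT-3 API call, returning dummy text so the sketch runs end to end:

```python
def generate(prompt):
    """Stand-in for a call to the GPT-3 completion API (hypothetical).
    In the real workflow this returns a couple of model-written
    paragraphs continuing the prompt; here it returns filler text."""
    return " lorem ipsum" * 40

def bootstrap_article(seed_prompt, target_words=1200):
    """Steps 2-4: feed everything written so far back in as the
    next prompt until the draft reaches the target length."""
    article = seed_prompt
    while len(article.split()) < target_words:
        article += generate(article)
    return article

draft = bootstrap_article("Title: What GPT-3 Means for Content Marketing")
```

Each pass sees the full draft so far, which is why early wobbles compound as the article grows.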

We did a little curating here and there, but with the exception of the title and the first paragraph, everything in that article is generated by GPT-3.

Now, the article’s not bad. Or rather, it is bad, but it makes superficial sense.

There’s a vast gulf between something that sounds good and something that’s actually good. GPT-3 is fantastic at creating a simulacrum of good content, something that imitates the flow and style of a human writer, but everything that makes an article useful is absent.

Here are a few examples.

It Lacks a Narrative Thread

One of the first skills a writer develops is the ability to thread together the component parts of an argument into a cohesive, logical narrative. It’s here that GPT-3 has its biggest failing: the writing meanders and skirts around the topic. It can write about a question but struggles to directly answer the question.

When we asked Gail Marie, our lead editor, to edit the article, she tore it to shreds:

"The reasoning throughout is dizzying and often illogical. A paragraph will begin by stating that X is true, and end stating that X is false. Here’s one example (these sentences come one after the other):

‘The real problem is that you can't automate value, entertainment, or authority. At the low end, you can automate value.'

My brain hurts."

Here is a sample comment, inline:

[Screenshot: a sample inline editing comment on the GPT-3-written article]

This was a common trait when we tried to generate long-form content with GPT-3. It loses the point.

The end of its articles doesn’t relate strongly to the beginning. Whereas a great piece of writing presents you with an idea at the outset and then builds up evidence to a logical conclusion, GPT-3 has a tendency to do what a bad piece of writing does—just say stuff.

A single narrative thread can be seen in the article The Hive is the New Network. It takes a single idea—the emergent properties of the hive and how they relate to human networks—and pulls it through the entire article, adding on evidence and opinion to build up a compelling argument. The point is never lost because the human writer was capable of understanding the point throughout the writing.

The ability to build a compelling argument requires showing proof, something else GPT-3 is sorely lacking.

There Is No (Real) Evidence for Claims

GPT-3 is good at making claims and bad at validating them with data or reasoning, in a few ways.

The first is that it just won’t back anything up. It makes assertions about the world but doesn’t link to or cite any evidence. Within our editing process, this would lead to both editors and copyeditors highlighting the text with ‘citation/source?’ Unfortunately, it’s not uncommon to see this in articles written by humans, either. There are plenty of blogs that don’t cite their sources.

The second way GPT-3 is bad at validating claims is slightly more unnerving: it makes mistakes with factual data. You can see an example of this in the news article shared above. It wrote in reference to the Methodist schism:

The first occurred in 1968, when roughly 10% of the denomination left to form the Evangelical United Brethren Church.

This isn’t true; it’s roughly the other way round. In 1968, the Evangelical United Brethren Church merged with the Methodist Church to form the United Methodist Church.

But the third kind of mistake is the most insidious: it will make stuff up. Sometimes this is funny. In the article we prompted it to write, GPT-3 had trouble with simple addition and subtraction.

[Screenshot: GPT-3’s arithmetic mistake in the generated article]

It says 5,000 - 2,000 = 500. This is not true.

Sometimes it is much less funny. When I was experimenting with generating content from small ‘thought leadership’ style sentences, GPT-3 wrote:

“If you design a good process,” says O’Reilly Media CEO Tim O’Reilly, “you can’t really have a bad business.”

I can’t find that quote online. Maybe it’s on some obscure corner of the web that GPT-3 knows about but Google doesn’t. But this is a massive no-no in any writing profession. Not only is there no citation, but it looks like this is just plain made-up. This is unethical. It can get a human fired.

Backing stuff up is a core component of good content marketing. In this example, 3 Mistakes You're Making with Month-Over-Month Growth Rates, the writer has a) got the math correct, and b) shown examples and evidence of the ideas throughout. This makes the article more authoritative and leaves the reader with the intuition and knowledge that convinces them it’s true.

The mistakes themselves are the result of human-generated insight, something GPT-3 can’t mimic.

The Article Is Missing Insight

Good content adds new information and insight to the existing canon. In our example, GPT-3 struggled with that. There is nothing in the article that is truly new. It’s designed to use the ‘rules’ of written text to create something consistent with those rules. It can’t (yet) inject new information from outside of that system.

Here is an example from the article:

GPT-3 could help you write better headlines, or maybe even stories, that might get other people to write about your content, increasing your social media presence and your backlink profile.

In an article that hadn’t yet introduced the possibility of GPT-3 writing headlines, the benefits of others writing about your content, the resulting backlinks, or the effects such exposure may have on your social media profile...well, that’s just a bunch of barely related gobbledygook.

As you read more and more GPT-3-generated content over the coming weeks, look for sentences like the one above, even paragraphs full of them. You’ll get to the end and nothing will have been added to your life. As one Hacker News commenter put it after reading a disguised piece of GPT-3 content:

This is either something written by GPT-3, or the human equivalent. Zero substantive content, pure regurgitation.

Though GPT-3 has read a lot, it can’t synthesize that information in the same way as a human can to generate new insight.

A human does this with research. They pull from multiple sources and then use their higher-level cognitive function to build the mental models needed to synthesize disparate pieces of information and weave them into something new.

The article Killing Strategy: The Disruption Of Management Consulting is a great example. It is almost 8,000 words and includes hundreds of individual pieces of information that come together to give the reader something unseen elsewhere. You can’t create such a piece of content without generating that insight in your own brain first.

There are other problems that OpenAI is very open about. For example, the internet isn’t necessarily a nice place. A lot of it can be toxic. GPT-3’s language model is based on texts that contain that toxicity, which then leaks into whatever it is generating. Even vanilla topics can end up with NSFW language.

Additionally, the text written on the web was (mostly) written by humans. And humans are big balls of biases. Any AI trained on human-generated data, or trained by humans, will incorporate these biases. This is a huge topic in AI safety, and a lot of the documentation for the GPT-3 API is solely about how to minimize toxicity and bias in your application.

You Get Back What You Put in With Humans and AI Alike

If all GPT-3 can do is write trash, why the hype?

To some extent, we let GPT-3 down in our example article. Part of the experiment was to get it to write fast. We wanted to see if you could write something good with minimal human effort. You can’t.

But if you are willing to put in the extra effort to give GPT-3 what it needs and to understand its strengths and weaknesses, you can get drastically better results and start to see the areas where it is going to be a powerful contributor to your job.

Give It a Great Prompt

The more you put into GPT-3, the more you get out—both figuratively and literally.

GPT-3 exhibits ‘few-shot learning.’ It writes better if you give it examples of what you want. It takes the context and the patterns exhibited in the prompt and uses them to narrow down its model of language to just stuff that matches, then generates text that fits that context and pattern.

Your prompt is critical. The more information you give the model in the prompt, the better the output will be. If you give it the context and three-to-five examples of what you want, you’ll get a better output than if you give it little information.
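In practice, a few-shot prompt is mostly careful string assembly: context first, then worked examples, then the new case left open for the model to complete. A rough sketch (the `Title:`/`Tweet:` labels and the example pair are our own illustrative convention, not anything required by the API):

```python
def build_few_shot_prompt(context, examples, new_title):
    """Assemble a few-shot prompt: a context line, a handful of
    worked examples, then the new case for the model to finish."""
    lines = [context, ""]
    for title, tweet in examples:
        lines += [f"Title: {title}", f"Tweet: {tweet}", ""]
    lines += [f"Title: {new_title}", "Tweet:"]
    return "\n".join(lines)

prompt = build_few_shot_prompt(
    "The following are tweets from Slack's Twitter feed based on "
    "the title of the corresponding blog posts.",
    [
        # Made-up example pair for illustration.
        ("A guide to async collaboration",
         "Work together, apart. Our new guide to async work is here."),
    ],
    "New emoji pack to help with your remote work balance",
)
print(prompt)
```

The trailing `Tweet:` is the hook: the model completes the pattern it has just seen three-to-five times.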

Here is an example of few-shot learning that will be easy to work into any marketing workflow.

We fed a few tweets from the Slack Twitter feed into GPT-3, along with a context for the tweets (“The following are tweets from Slack's Twitter feed based on the title of the corresponding blog posts.”), and it generated a tweet for a new blog post based on the title:

[Screenshot: the GPT-3 playground showing the Slack tweet prompt and generated tweet]

Title: New emoji pack to help with your remote work balance

Tweet: In honor of International Working from Home Day (which we totally made up), we’re introducing a new emoji pack. Show the world you work from home with a ⏳emoji! bit.ly/redacted-url

Let’s break that down.

It’s used the tweet examples to understand how Slack writes tweets and tried to match that pattern. It understood some defining features of Slack’s Twitter. GPT-3 added an emoji, added a bit.ly link (which we've redacted), and even added a joke, mimicking the lighthearted nature of Slack’s communications. If that tweet popped up in Slack’s feed tomorrow, it would 100% fit.

This isn’t a one-off. It isn’t great every time, but with the right content and examples, it understands what it needs to do most of the time.

Here are some other outputs from the same prompt:

🎉 We've just released a brand new emoji pack to help you with your remote work balance. Download it now to join the fun! bit.ly/redacted-url

Slack just launched a new set of emoji. The good news is that they're really cute. The bad news is that they might make it even harder to stay on top of your remote work balance. Check out all the new emoji on our blog. bit.ly/redacted-url

Did you know that the Slack team is going remote? Here's how we're making it work, and how you can too. 💡 bit.ly/redacted-url

Have a coworker who you need to regularly check in with? Need to coordinate the next team lunch? With today's emoji pack, you can keep remote work in check and get a little bit closer to a work-life balance.

@SlackHQ is launching a new emoji pack today! Featuring new emoji like 📝, 💻, and 🚢 to help you out with your remote work balance. bit.ly/redacted-url

😍 🙏🏼💡 📖 🤔🤩🧐 ( ͡° ͜ʖ ͡°)🎉🎊🎉⬇️ bit.ly/redacted-url

💕Remote work can be a balancing act, but we're here to help. Introducing our new emoji pack, designed to help you better express your work-life balance. Learn more 👇 bit.ly/redacted-url

🤔📚📔📓🔗🔘🔚🔛🔜🔝🔞🔟🔠🔡🔢🔤🔣🔤🔥🔦🔧🔨🔩🔫🔨🔬🔮🔭🔯🔰🔱🔲🔴

🎉Check out the latest emoji pack, featuring new additions for when you're feeling 🌞☔️🌦🌧🌌🌛🌝🌚🌞🌜☀️🌤🌠🌠🌡🌡🌧🌊🌊🌙🌊🌙🌙🌚🌚🌚🌛

Notice they are not all the same, not all have a link, not all have an emoji. Some are all emojis. But even those tweets fit the pattern in an extreme kind of way.

In one, the @slackhq handle was added, even though it doesn’t appear in the examples we gave GPT-3. A GPT-3-powered tool could allow a social media manager to use just a few previous tweets that are the best representation of the brand to generate a dozen tweets (or LinkedIn/FB/Instagram messages) ready to go.

Another example is article ideas. A good article hook is the most important part of writing. Say you want to generate a few more ideas for your blog. Chuck your previous article titles into GPT-3 and see what it spits out.

Here is a proof of concept with CB Insights:

[Screenshot: GPT-3 generating new article titles from existing CB Insights titles]

The bolded titles are current titles, and the regular titles are new opportunities, such as:

  • 300+ Companies Using Machine Learning To Better Understand And Serve Customers
  • 50 Startups And Projects That Are Redefining Retail
  • Banks And FinTech Startups Are Teaming Up To Solve Financial Inclusion
  • Brand Experiences In The Age Of Customer Centricity
  • Broadband In The United States: It’s Not That Bad
  • Capital Markets On The Blockchain
  • Chinese Internet And Tech Companies Are Investing In African Fintech

These all fit within the scope of the content for CB Insights.

The titles match the patterns they use. If you are working in content at CB Insights, these give you a jumping-off point for ideation. If you don’t like these, just generate 10 more. Don’t like those, generate 10 more. And 10 more.

It’s not necessary that the titles are on point, more that you go, ‘yeah, that’s an interesting idea.’ Generate 100, and within them you're likely to find 12 functional ideas for the next quarter.

In this way, GPT-3 is extremely similar to human writers. We work from briefs or pitches or one-line ideas. But the quality of each of those is critical to the final output. If we don’t get good context for our writing, it will fall short. With GPT-3 and humans, a couple of sentences that really explain an idea are the perfect starting point for great output.

Understand Where It Excels

One way of thinking about where GPT-3 succeeds is with a 2x2 framework.

Along the vertical axis, we have short vs. long content. Along the horizontal axis, factual vs. creative content.

GPT-3 thrives in the short, creative quadrant. The tweets and titles above are excellent examples of this. In the screenshots above, you can see that the Temperature is set to 0.7 for both. The temperature of the model is its creativity. Setting it at 0.7 gives you moderately creative content. It won’t go off on a tangent but will give you something new each time.

If you set it to 0, you’ll get a deterministic answer. Every time you run the model, it will give you the same output for the same prompt. That output will be as close to ‘fact’ as the model can make it.

Set it at 1 and it will go off on flights of fancy. For example, as it has read all of Wikipedia, it should know what a berserk llama is. Let’s look at its output for the question “What is a berserk llama?” at different temperatures:

  • 0: A berserk Llama is a Llama that is angry.
  • 0.1: A berserk Llama is a Llama that is angry.
  • 0.2: A berserk Llama is a Llama that is in a state of rage.
  • 0.3: A berserk Llama is a Llama that is angry.
  • 0.4: A berserk Llama is a Llama that is very angry.
  • 0.5: A berserk Llama is a Llama that has gone berserk.
  • 0.6: A berserk Llama is a Llama that has gone berserk.
  • 0.7: A berserk Llama is a large, furry, flightless bird.
  • 0.8: A berserk Llama is a Llama that has gone insane and will attack you.
  • 0.9: A berserk Llama is a Llama that is angry and charging around destroying things.
  • 1: A berserk Llama is a Llama that won’t stay with the herd, is very aggressive in attack mode, and will hurt, or kill, anything that is in its way. It has an attitude - extreme attitude. Lydia is a berserk Llama, but she is still extremely cute, even though her horns are sharp and her hooves are very hard. I know a Llama is a herbivore, not a carnivore. It would actually eat the
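Under the hood, the temperature knob corresponds to a standard sampling technique: divide the model’s scores by the temperature before converting them into probabilities. This sketch uses made-up scores and is not GPT-3’s actual code, but it shows why 0 is deterministic and higher values get wilder:

```python
import math
import random

def sample_with_temperature(scores, temperature, rng=random):
    """Temperature-scaled sampling (the standard mechanism the
    'temperature' setting usually refers to; scores here are
    hypothetical). Low temperature sharpens the distribution toward
    the likeliest word; high temperature flattens it, making
    unlikely words plausible picks."""
    if temperature == 0:
        # Deterministic: always return the highest-scoring word.
        return max(scores, key=scores.get)
    # Softmax over temperature-scaled scores.
    scaled = {w: s / temperature for w, s in scores.items()}
    peak = max(scaled.values())  # subtract max for numerical stability
    weights = {w: math.exp(s - peak) for w, s in scaled.items()}
    total = sum(weights.values())
    r, cumulative = rng.random() * total, 0.0
    for word, weight in weights.items():
        cumulative += weight
        if cumulative >= r:
            return word
    return word  # fallback for floating-point edge cases

scores = {"angry": 4.0, "berserk": 2.5, "flightless bird": 0.5}
print(sample_with_temperature(scores, 0))  # always "angry"
```

At temperature 1 the scores are used as-is; below 1 the gaps between them widen, and above 1 they shrink, which is exactly the behavior the llama answers show.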

Just because GPT-3 thrives at short creativity doesn’t mean it can’t do well in the other quadrants. Given the right prompt, it can do long, creative stuff. And a berserk llama is a llama that is angry, so it knows facts.

It Plays Best as Part of a Team

Our lead editor Gail has a saying:

Content is a team sport

Getting an article to spec at Animalz involves at least five people: a strategist, a researcher, a writer, an editor, and a copyeditor.

If you think you’re going to get great content out of GPT-3 with no team and no effort, you’re wrong. Producing great content is a process, and it needs people with different skills and perspectives.

Or ‘things’ with those different skills and perspectives. GPT-3 does have a skill: it knows pretty much everything, can learn what you care about quickly given the right context, and can write really fast.

It can also generate different perspectives.

GPT-3’s most compelling use is as a creative sparring partner. Gathering half a dozen people together to brainstorm on a given topic often creates new framings and ideas that you wouldn’t create on your own—GPT-3 can do this process in an instant using 45TB of information to mesh concepts together and create new ideas.

These ideas don’t need to be perfect—they serve only as a starting point for a new writer to explore. Set the creativity high enough and give it an idea you are struggling with and see what it outputs. Even if 9 out of 10 are garbage, you can still get excited about the tenth.

An example came as we were producing our experiment article. One of our prompts spat back this unrelated-but-downright-fascinating take on user-generated content, potential fodder for a new article:

The consumer experience has changed from one of passive observation, to active engagement. Consumers have become the producers of content, whether it's content that they share, content that they create themselves, or content that they modify.

Using GPT-3 in your workflow is going to require time and effort. But it could be worth it. Used well, it will drive you on to create even better work, giving you different perspectives from its immense knowledge and then allowing you and your mental models to turn those into something worth reading for GPT-n.

GPT-3 Is Enhancement, Not Replacement

GPT-3 isn’t all that different from you. If you want to write well, you have to read well. If you want to write smart, you need the right context for your thoughts. And if you want to publish great content, you need help.

Help comes from your teammates, including an API. We’re going to be using GPT-3 at Animalz. Not to write full articles, so we can sit back and share berserk llama memes. But to give us new ideas for articles, new perspectives on old opinions, and to drive us forward.

Content marketing is always changing. First, it was just about SEO. Then it was about well-ranking but high-quality work. Now it’s about well-ranking, high-quality work that builds a community.

GPT-3 is just another move forward. GPT-4, 5, and 6 are going to give us even more opportunities, as will all the new stuff humans come up with in the meantime. You are not replaced by any of this; you are bettered by it.