[Image: a robot in a business suit saying “I am not a robot”]

How Content Teams Can Protect Themselves from ChatGPT

This is a cross-posting of my article from The Drum

“Any sufficiently advanced technology is indistinguishable from magic.”

– Arthur C. Clarke

ChatGPT feels like magic, and I am under its spell. But magic can be used for good or ill, depending on the application.

With the launch of ChatGPT, writers, content managers and content teams are in turmoil.

What will this technology do to my product, my workflows, my team, my job?

There are no easy answers here, because the technology is so young, and the space is so frothy.

What’s certain is that we’ll be seeing much more AI-generated text in the near future.

If you’re in the business of content production, you need a map of the landscape, and some good tools to help you along the way. We’ll aim to give you both here.

First, we’ll talk a bit about what ChatGPT is, and what that means for the words it generates.

Next we’ll show you some practical tools you can use to detect AI-generated text.

Background on How ChatGPT Works

Machine learning has inflection points where it leaps so far forward that it’s stunning. ChatGPT, recently launched by OpenAI, is the shining example.

On its face, ChatGPT is a chatbot. Give it any text prompt – a question, statement, challenge – and it will generate a response ranging from a paragraph, to a sonnet, to computer code.

Under the hood, ChatGPT is a Large Language Model (LLM). It’s trained on a massive text dataset, a very big bunch of words.

ChatGPT uses this massive text dataset to predict what the next sequence of words will be in a given context.

It’s looking at statistical patterns and relationships between words to generate text that is coherent and realistic-sounding. ChatGPT can be surprising, funny, even poignant.

But it’s just a statistical prediction engine, deciding what word best fits in the next spot. As one machine learning expert put it, LLMs are “just adding one word at a time.”
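To make that concrete, here’s a toy next-word predictor in Python built from simple bigram counts. It’s a deliberately crude sketch, nothing like ChatGPT’s actual architecture (a deep neural network with a long context window), but the core move is the same: look at the statistics, pick a plausible next word, repeat.

```python
from collections import Counter, defaultdict

# A tiny toy corpus; a real LLM trains on hundreds of billions of words.
corpus = (
    "the cat sat on the mat . the dog sat on the rug . "
    "the cat chased the dog ."
).split()

# Count which word follows which: a crude stand-in for the statistical
# patterns an LLM learns at vastly greater scale and depth.
next_word_counts = defaultdict(Counter)
for current, following in zip(corpus, corpus[1:]):
    next_word_counts[current][following] += 1

def predict_next(word):
    """Return the word most often seen after `word` in the corpus."""
    return next_word_counts[word].most_common(1)[0][0]

# Generate text one word at a time, always taking the likeliest next word.
word = "the"
output = [word]
for _ in range(6):
    word = predict_next(word)
    output.append(word)

print(" ".join(output))  # -> "the cat sat on the cat sat"
# Greedy prediction quickly falls into a loop; real models instead sample
# from a probability distribution over many candidate next words.
```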

The words that it creates look and sound good together, but there’s no guarantee that the output will be factually accurate. From ChatGPT:

“Language models like ChatGPT are generally very good at generating coherent and realistic-sounding text, but they may not always produce output that is completely accurate or factually correct. It is important to carefully evaluate the output of any language model and to seek out additional sources of information to verify its accuracy.”

So ChatGPT is… a bit of a bullshitter. The words sound great, but the factual quality is highly variable.

Machine learning experts know this already:

[Embedded tweet. Source: Andrew Ng on Twitter]

And industry-leading organizations like DeepMind have already published papers on the risk landscape associated with LLMs:

“The third risk area comprises risks associated with LLMs providing false or misleading information. This includes the risk of creating less well-informed users and of eroding trust in shared information.

Misinformation can cause harm in sensitive domains, such as bad legal or medical advice. Poor or false information may also lead users to perform unethical or illegal actions that they would otherwise not have performed.

Misinformation risks stem in part from the processes by which LMs learn to represent language: the underlying statistical methods are not well-positioned to distinguish between factually correct and incorrect information.”

The problem is that these LLMs are now so good that their output is almost indistinguishable from human-generated text.

A 2021 research study bears this out, and the findings were not great:

“We find that, without training, evaluators distinguished between LLM- and human-authored text at random chance level.”

As we’ve seen over the last few years, eroding trust in shared information is a big deal!

And if you work in regulated industries, this can be a real problem with real consequences.

So what are we to do as writers and content professionals?

The Tools

The good news is that there are several free online tools you can use to detect AI-generated language.

The next time you come across content that seems fishy, pop it into one of these tools for a quick check.

GPTZero: https://gptzero.me/

Writer AI detector: https://writer.com/ai-content-detector/

Hugging Face GPT-2 output detector: https://openai-openai-detector.hf.space/

Crossplag: https://crossplag.com/ai-content-detector/

These tools are not perfect. But they are an effective first step in a content verification pipeline.
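If you’d rather script this check than paste text into a web form, here’s a minimal sketch using the transformers library and roberta-base-openai-detector, the RoBERTa-based GPT-2 output detector behind the Hugging Face tool above. The model id and its “Real”/“Fake” labels are as given on its Hugging Face model card; note that a GPT-2-era detector may miss newer generators like ChatGPT, so treat the score as a signal, not a verdict.

```python
# pip install transformers torch
from transformers import pipeline

# RoBERTa fine-tuned to classify GPT-2 output; labels are "Real" / "Fake"
# per the model card. A GPT-2-era detector can miss newer generators.
detector = pipeline("text-classification", model="roberta-base-openai-detector")

sample_text = "Paste the suspicious content here."
result = detector(sample_text)[0]

print(f"label={result['label']}, score={result['score']:.2f}")
```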

Content teams that are producing content at very high volume have to deal with plagiarism all the time.

Old-school plagiarism checkers like Copyscape have been part of distributed content production pipelines for nearly two decades.

As a content professional, you’ll need to plan for the flood of AI-generated text that will be coming your way.

Free tools can help you ensure that you’re catching the robot text before your client does.
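As a hypothetical illustration of what that looks like inside a pipeline, here’s a sketch that triages incoming drafts using the detector from the previous snippet, routing anything scored as likely machine-generated to a human editor. The threshold and workflow are illustrative assumptions, not a recommendation.

```python
from transformers import pipeline

detector = pipeline("text-classification", model="roberta-base-openai-detector")

# Illustrative threshold: drafts the detector scores as "Fake"
# (machine-generated) above this level get routed to a human editor.
REVIEW_THRESHOLD = 0.80

def triage(drafts):
    """Split drafts into (cleared, needs_review) piles."""
    cleared, needs_review = [], []
    for draft in drafts:
        result = detector(draft)[0]
        if result["label"] == "Fake" and result["score"] >= REVIEW_THRESHOLD:
            needs_review.append(draft)
        else:
            cleared.append(draft)
    return cleared, needs_review

cleared, flagged = triage(["First draft text...", "Second draft text..."])
print(f"{len(flagged)} draft(s) flagged for human review")
```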
