How We Taught ChatGPT to Fool AI Detection Tools

OpenAI’s ChatGPT shattered usage records in December 2022 when it gained more than 1 million new users in less than one week. This incredible growth is a testament to the natural language generation tool’s impressive output, particularly how it creates convincing and seemingly high-quality content. It has also led to a surge in GPT-based detection tools designed to curb more nefarious uses, such as cheating on exams.

As a digital marketing agency, we were keen to investigate whether someone could use ChatGPT to abuse a key tenet of modern SEO: robust and exceptional content. How well Google can detect AI-generated content is largely unknown as of this writing, so we instead looked to dedicated detection tools, with their own proprietary algorithms, to see how they handle this type of low-effort, high-output content creation.

In our experiment below, we start with a control version of ChatGPT-generated content and then ask ChatGPT to modify it. We use this approach to mimic the kind of hands-off, low-effort mass content creation we would expect to see abused in this context. Can these detection tools and their algorithms differentiate between human and machine, or will we “fool” them the more we change the content?

Before we dive into our experiment, let’s start with an explanation of how these AI tools work.

A Brief History of AI-Generated Content

ChatGPT may be at the forefront of the AI conversation right now, but this isn’t the first piece of tech we’ve seen like this. A significant example we’ve all used is predictive text.

T9 was among the earliest iterations of predictive text technology. You may be familiar with it if you’re of a certain age and remember sending texts on a numeric keypad. If not, here’s how it worked: instead of tapping each key several times to cycle to the letter you wanted, you pressed one key per letter and the software predicted the word you were spelling. The tech may have been clunky, but early algorithms like T9 improved our texting efficiency, and efficiency is the name of the game here.

As phones progressed from keypads to mini-keyboards to today’s touch screens, predictive text algorithms improved dramatically to help us write and communicate. These same types of technology also exist in Microsoft and Google products, such as Word’s editor text predictions and Gmail’s Smart Compose.

How does this relate to ChatGPT? Its model is undoubtedly more complicated, but the text it produces still follows predictable statistical patterns in its writing style, regardless of the subject matter. So, at the surface level, creating an algorithm to predict the likelihood that content is AI-generated should be simple; it would only require a newer detection model for each newer version of ChatGPT. The question is, would it be reliable?
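To make the idea of “predictable text” concrete, here is a minimal sketch of next-word prediction, assuming the open-source Hugging Face transformers library and the freely available GPT-2 model (an older, smaller relative of ChatGPT, not ChatGPT itself); the prompt is purely illustrative.

import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

# An illustrative prompt; any partial sentence works.
prompt = "The best type of car insurance for most drivers is"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

with torch.no_grad():
    logits = model(input_ids).logits  # shape: (1, sequence_length, vocabulary_size)

# Turn the scores for the final position into probabilities and list the top guesses.
probabilities = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(probabilities, k=5)
for token_id, p in zip(top.indices, top.values):
    print(f"{tokenizer.decode(int(token_id))!r}: {p.item():.3f}")

The more probability the model piles onto a handful of obvious continuations, the more predictable the writing is, and that predictability is exactly the signal detection tools look for.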

The Control

After spending hundreds of hours with ChatGPT, I’ve learned that the detection tools work best when provided with text on a common subject matter or theme.

To create a control for my test, I used the following command: Write a blog post about the following topic “What type of car insurance is best for me?” You can see a screenshot of ChatGPT’s content below.

[Screenshot: ChatGPT’s output for the prompt “Write a blog post about the following topic ‘What type of car insurance is best for me?’”]

All three tools in our experiment express their prediction as a percentage of confidence, rather than a simple “yes” or “no.”

GPT Radar: Likely AI-Generated 

GPT Radar starts our test with 86% confidence that the content is AI-generated, along with an analysis of the copy that the other tools lack. For example, it measures the copy’s perplexity (roughly, how predictable the text is to a language model) and points out which “chunks” were likely written by a human or an AI. It also reveals the content’s number of “tokens”; each token equates to roughly four English characters, and the more tokens GPT Radar works with, the better its prediction.
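For a rough sense of what that perplexity measurement involves, here is a minimal sketch of a perplexity-based check, again assuming the Hugging Face transformers library and the public GPT-2 model; the sample text and the cutoff value are illustrative assumptions, not GPT Radar’s actual model or threshold.

import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    # Tokenize the passage; each token covers roughly four English characters.
    input_ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # Passing labels makes the model return the average per-token cross-entropy loss.
        loss = model(input_ids, labels=input_ids).loss
    # Perplexity is the exponential of that average loss.
    return torch.exp(loss).item()

sample = "Choosing the right car insurance starts with understanding your coverage needs."
score = perplexity(sample)
# Low perplexity means the text is highly predictable to the model, which a detector
# can treat as a sign of machine-generated writing. The cutoff of 60 is an arbitrary
# placeholder for illustration, not a threshold any of the tools in this experiment use.
print(f"Perplexity: {score:.1f} -> {'likely AI' if score < 60 else 'likely human'}")

The core idea is that a language model is rarely “surprised” by text that a similar language model wrote, so unusually low perplexity is a hint of machine authorship.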

Originality.AI: 98% AI

Originality.AI provides a percentage while also checking for plagiarism. In this case, it scored the copy as 98% likely to be AI-generated and didn’t find any instances of plagiarism.

Draft & Goal: Text Has Been Most Likely Written by an AI Model

Draft & Goal is the most streamlined tool, offering just a percentage score, and it’s also the most confident here, predicting with 99% certainty that the text is AI-generated.

What Happens When We Change the Cadence and Tone?

Now that we know how these predictive tools work, let’s see whether we can push them into analytical errors. We’re essentially revisiting the idea that predictive writing technology relies on the user’s input to become more reliable and efficient. So, if we, the user, ask ChatGPT to revise the content with a different cadence and tone, will that impair the detection tools’ ability to determine whether the text is AI-generated? The answer isn’t clear-cut, but we do see the scores deviate.

To start, we used the same thread as the above examples but gave ChatGPT a new command: Rewrite this blog post in a more upbeat, relaxed tone.

You can see the newly generated content below. Next, we’ll circle back to the three AI predictors. 

GPT Radar & Originality.AI: Sticking to Their Guns

GPT Radar’s percentage drops slightly (down to 83% from 86%), and Originality.AI’s score hasn’t budged. 

Draft & Goal: Losing a Step

Draft & Goal’s drop to 67% confidence suggests that this algorithm may struggle with revised or adjusted content like ours. The tool also doesn’t show its work, so it’s difficult to determine how or why the prediction changed.

Finale: Breaking AI-Generated Content Detection Tools

Given what we know about the tools in this experiment, particularly GPT Radar, we can surmise that their measurement of perplexity is a driving factor in their final analyses. Let’s use this knowledge to try to generate content via ChatGPT that goes undetected. The next command is simple: Rephrase this again using an even higher perplexity.

Rephrasing the content with a higher perplexity means asking for less predictable word choices, which seems like the opposite direction you would want to go when talking about car insurance options. Typically, this type of subject is easier for the layperson when explained in the simplest terms. By rephrasing it, we’re inching toward an uncanny valley: we’re reading about the same topic with the same talking points, yet it feels far from human. Our AI detection tools mostly disagree.

Draft & Goal & Originality.AI: Completely Off the Mark

While we only slightly confused Draft & Goal with the previous command, we’ve now pushed it to the other end of the spectrum. It now believes a human wrote the more perplexing copy.

In our control, Originality.AI appeared highly confident, with near-perfect precision, but it’s now the furthest from correct. Leaving itself only 1% of wiggle room for error, the tool believes a human wrote our ChatGPT word salad.

GPT Radar: Still Correct, But Barely

As noted, GPT Radar provided the inspiration for breaking the AI detection tools used in this experiment, and it remains correct in identifying this as AI-generated content. However, its analysis is conflicted: at just 52% confidence, we’ve tipped the scales nearly to the point where the tool would believe the copy was human-written.

It’s interesting that at least one tool, GPT Radar, predicted that an AI wrote this convoluted content. However, that same tool actually hinders itself by providing too much information in its analysis. It highlights exactly which words it attributes to a human and which to an AI, which allows for simple, targeted adjustments that completely thwart its score.

Conclusion

We didn’t completely fool all of the detection tools in this experiment, but we came close (a mere 2% margin). As we feed these detection tools more examples and help them adapt, that margin will surely shrink. However, the more society uses ChatGPT, the more these detection tools have to adapt. It’s impossible to say whether they will keep pace, but we do know that other tools have already been rendered obsolete by improvements to OpenAI’s ChatGPT.

In case we haven’t been clear about our intentions behind this anecdotal experiment: we don’t recommend using these methods to create content for any page or site you hope will gain prominence in Google’s SERPs. Attempts to gain search visibility by publishing large amounts of “scalable content” have proven dangerous in the past, and it’s fair to assume Google will find a way to crack down on low-effort, high-output AI-generated content (most likely at a site-wide level) to combat this “bad actor” behavior in the near future. Yet we do find ourselves at an interesting crossroads where AI generation appears to be outpacing AI detection, at least for today, and what happens tomorrow is truly anyone’s guess.