Architecture of Ghostbuster, a new state-of-the-art method for detecting AI-generated text.
Large language models like ChatGPT write impressively well, so well that they have become a real problem. Students have begun using these models to ghostwrite assignments, leading some schools to ban ChatGPT. These models are also prone to producing text with factual errors, so wary readers may want to know whether generative AI tools have been used to ghostwrite news articles or other sources before trusting them.
What can teachers and consumers do? Existing tools for detecting AI-generated text often perform poorly on data that differs from what they were trained on. In addition, if these models falsely classify real human writing as AI-generated, they can jeopardize students whose genuine work is called into question.
Our recent paper introduces Ghostbuster, a state-of-the-art method for detecting AI-generated text. Ghostbuster works by computing the probability of generating each token in a document under several weaker language models, then combining functions of these probabilities as input to a final classifier. Ghostbuster doesn’t need to know which model was used to generate a document, nor the probabilities of the document under that specific model. This property makes Ghostbuster particularly useful for detecting text potentially generated by unknown or black-box models, such as the popular commercial models ChatGPT and Claude, for which probabilities are not available. We are particularly interested in ensuring that Ghostbuster generalizes well, so we evaluated it across a range of ways that text can be generated, including different domains (using newly collected datasets of essays, news, and stories), language models, and prompts.
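To make "the probability of generating each token under a weaker language model" concrete, here is a minimal sketch that extracts per-token log probabilities from a small causal language model. Ghostbuster itself queries GPT-3 models through an API; GPT-2 via Hugging Face transformers is used here only as a locally runnable stand-in, so the model choice and the helper name token_logprobs are illustrative assumptions, not the paper's implementation.

```python
# Sketch: per-token log probabilities under a weak causal LM (GPT-2 as a stand-in).
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def token_logprobs(text: str) -> torch.Tensor:
    """Return log P(token_i | tokens_<i) for every token after the first."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits                     # (1, seq_len, vocab)
    logps = torch.log_softmax(logits[:, :-1], dim=-1)  # predictions for positions 1..n-1
    targets = ids[:, 1:].unsqueeze(-1)                 # tokens actually observed
    return logps.gather(-1, targets).squeeze(-1)[0]    # shape: (seq_len - 1,)

print(token_logprobs("Ghostbuster detects AI-generated text."))
```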
Examples of human-written and AI-generated text from our dataset.
Why is this approach necessary?
Many current AI-generated text detection systems are brittle when classifying different types of text (e.g., different writing styles, text generation models, or prompts). Simpler models that use perplexity alone are typically unable to capture more complex features and perform especially poorly in new writing domains. In fact, we found that a perplexity-only baseline was worse than random in some domains, including data from non-native English speakers. Meanwhile, classifiers based on large language models such as RoBERTa easily capture complex features, but they overfit to the training data and generalize poorly: we found that a RoBERTa baseline had the worst generalization performance in the worst case, sometimes even worse than the perplexity-only baseline. Zero-shot methods, which classify text without training on labeled data by calculating the probability that the text was generated by a specific model, also tend to do poorly when a different model was actually used to generate the text.
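For context, the perplexity-only baseline criticized above reduces an entire document to a single scalar and thresholds it, roughly as in the sketch below. The threshold value is an arbitrary illustrative assumption, and the token_logprobs helper is reused from the previous sketch; this is not a published baseline implementation.

```python
# Sketch of a perplexity-only detector: one scalar per document, one threshold.
import math

def perplexity(text: str) -> float:
    logps = token_logprobs(text)          # helper from the previous sketch
    return math.exp(-logps.mean().item())

def is_ai_generated(text: str, threshold: float = 30.0) -> bool:
    # Low perplexity = "too predictable" = flagged as model-written.
    # A single global threshold is exactly why this kind of baseline breaks
    # on unfamiliar domains and on non-native English writing.
    return perplexity(text) < threshold
```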
How Ghostbuster Works
Ghostbuster uses a three-step training process: probability calculation, feature selection, and classifier training.
Probability computation: We convert each document into a series of vectors by computing the probability of each word in the document under a series of weaker language models (a unigram model, a trigram model, and two non-instruction-tuned GPT-3 models, ada and davinci).
Feature selection: We use a structured search procedure to select features, which works by (1) defining a set of vector and scalar operations that combine the probabilities, and (2) searching for useful combinations of these operations using forward feature selection, repeatedly adding the best remaining feature.
Classifier training: We trained a linear classifier on the best probability-based features and some additional hand-selected features (a simplified end-to-end sketch of these three steps follows below).
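The following is a simplified, self-contained sketch of the three training steps. It assumes unigram and bigram models as the only "weak" models (the real system also uses GPT-3 token probabilities, obtained as in the earlier snippet), a tiny operation set, and scikit-learn's LogisticRegression as the linear classifier; the function names and operation vocabulary are illustrative, not Ghostbuster's actual code.

```python
# Simplified sketch of the three training steps (toy weak models, toy operation set).
from collections import Counter
from itertools import product

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# ---- Step 1: per-token probability vectors under weak language models --------
def train_unigram(corpus):
    counts = Counter(w for doc in corpus for w in doc.split())
    total, vocab = sum(counts.values()), len(counts) + 1
    return lambda doc: np.array([(counts[w] + 1) / (total + vocab)
                                 for w in doc.split()])

def train_bigram(corpus):
    uni, bi = Counter(), Counter()
    for doc in corpus:
        words = ["<s>"] + doc.split()
        uni.update(words[:-1])
        bi.update(zip(words[:-1], words[1:]))
    vocab = len(uni) + 1
    def probs(doc):
        words = ["<s>"] + doc.split()
        return np.array([(bi[(a, b)] + 1) / (uni[a] + vocab)
                         for a, b in zip(words[:-1], words[1:])])
    return probs

# ---- Step 2: combine probability vectors, then forward feature selection -----
VECTOR_OPS = {"add": np.add, "sub": np.subtract, "max": np.maximum}
SCALAR_OPS = {"mean": np.mean, "min": np.min, "var": np.var}

def candidate_features(p, q):
    """Every (vector op, scalar op) combination of the two probability vectors."""
    return {f"{sn}({vn}(unigram, bigram))": float(sop(vop(p, q)))
            for (vn, vop), (sn, sop) in product(VECTOR_OPS.items(), SCALAR_OPS.items())}

def forward_select(X, y, names, k=4):
    """Greedily add whichever feature most improves cross-validated accuracy."""
    chosen = []
    while len(chosen) < k:
        best_name, best_score = None, -np.inf
        for name in names:
            if name in chosen:
                continue
            cols = [names.index(n) for n in chosen + [name]]
            score = cross_val_score(LogisticRegression(max_iter=1000),
                                    X[:, cols], y, cv=3).mean()
            if score > best_score:
                best_name, best_score = name, score
        chosen.append(best_name)
    return chosen

# ---- Step 3: train a linear classifier on the selected features --------------
def train_detector(human_docs, ai_docs):
    unigram = train_unigram(human_docs + ai_docs)
    bigram = train_bigram(human_docs + ai_docs)
    rows, labels = [], []
    for label, docs in ((0, human_docs), (1, ai_docs)):
        for doc in docs:
            rows.append(candidate_features(unigram(doc), bigram(doc)))
            labels.append(label)
    names = sorted(rows[0])
    X = np.array([[row[n] for n in names] for row in rows])
    y = np.array(labels)
    selected = forward_select(X, y, names)
    cols = [names.index(n) for n in selected]
    clf = LogisticRegression(max_iter=1000).fit(X[:, cols], y)
    return clf, selected
```

Calling train_detector(human_docs, ai_docs) on two lists of strings (at least three documents per class, so the 3-fold cross-validation inside the selection loop is valid) returns the fitted classifier and the names of the selected features.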
Results
When trained and tested on the same domain, Ghostbuster achieved 99.0 F1 across all three datasets, outperforming GPTZero by 5.9 F1 and DetectGPT by 41.6 F1. Out of domain, Ghostbuster achieved 97.0 F1 averaged across all conditions, outperforming DetectGPT by 39.6 F1 and GPTZero by 7.5 F1. The RoBERTa baseline achieved 98.1 F1 when evaluated in-domain on all datasets, but its generalization performance was inconsistent. Ghostbuster outperformed the RoBERTa baseline in all domains except out-of-domain creative writing, and on average had much better out-of-domain performance than RoBERTa (a 13.8 F1 margin).
Results for Ghostbuster’s in- and out-of-domain performance.
To ensure that Ghostbuster is robust to the different ways users prompt a model, such as requesting different writing styles or reading levels, we evaluated its robustness to several prompt variants. Ghostbuster outperformed all other tested approaches on these prompt variants with 99.5 F1. To test generalization across models, we evaluated performance on text generated by Claude, where Ghostbuster again outperformed all other tested approaches with 92.2 F1.
AI-generated text detectors have been fooled by lightly editing the generated text, so we examined Ghostbuster’s robustness to edits such as swapping sentences or paragraphs, reordering characters, and replacing words with synonyms. Most changes at the sentence or paragraph level did not significantly affect performance, although performance degraded smoothly when text was edited through repeated paraphrasing, commercial detection evaders such as Undetectable AI, or heavy word- or character-level changes. Performance was also best on longer documents.
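A robustness check of this kind can be scripted by perturbing documents and re-scoring them with the detector. The sketch below uses naive synonym substitution and sentence shuffling as stand-in perturbations; the substitution table and the detector_score argument are placeholder assumptions for whatever classifier is under test.

```python
# Sketch: measuring detector robustness to light edits (placeholder perturbations).
import random

SYNONYMS = {"good": "fine", "big": "large", "said": "stated"}  # toy substitution table

def swap_synonyms(text: str, rate: float = 0.2) -> str:
    return " ".join(SYNONYMS[w] if w in SYNONYMS and random.random() < rate else w
                    for w in text.split())

def shuffle_sentences(text: str) -> str:
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    random.shuffle(sentences)
    return ". ".join(sentences) + "."

def robustness_report(docs, detector_score):
    """Compare detector scores before and after each perturbation."""
    for perturb in (swap_synonyms, shuffle_sentences):
        drops = [detector_score(d) - detector_score(perturb(d)) for d in docs]
        print(f"{perturb.__name__}: mean score drop = {sum(drops) / len(drops):.3f}")
```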
Because AI-generated text detectors may misclassify writing by non-native English speakers as AI-generated, we evaluated Ghostbuster’s performance on non-native English speakers’ writing. All tested models achieved over 95% accuracy on two of the three datasets tested, but did worse on the third set of shorter essays. However, document length may be the main factor here, since Ghostbuster performs nearly as well on these documents (74.7 F1) as it does on other out-of-domain documents of similar length (75.6 to 93.1 F1).
Users who want to apply Ghostbuster to real-world cases of potentially off-limits use of text generation (e.g., student essays written with ChatGPT) should be aware that errors are more likely for short texts, domains far from those Ghostbuster was trained on (e.g., different varieties of English), text written by non-native English speakers, model generations edited by humans, or text generated by prompting an AI model to modify human-written input. To avoid perpetuating algorithmic harms, we strongly discourage automatically penalizing suspected use of text generation without human supervision. Instead, we recommend cautious, human-in-the-loop use of Ghostbuster whenever labeling someone’s writing as AI-generated could harm them. Ghostbuster can also help with a range of lower-risk applications, such as filtering AI-generated text out of language model training data and checking whether online information sources are AI-generated.
Conclusion
Ghostbuster is a state-of-the-art model for AI-generated text detection, achieving 99.0 F1 across the tested domains and representing substantial progress over existing models. It generalizes well across a variety of domains, prompts, and models, and is well suited to identifying text from black-box or unknown models because it does not require access to token probabilities from the specific model used to generate the document.
Future directions for Ghostbuster include providing explanations for model decisions and improving robustness, especially to attacks that deliberately try to fool detectors. AI-generated text detection approaches can also be used alongside alternatives such as watermarking. We also hope that Ghostbuster can help with a variety of applications, such as filtering language model training data or flagging AI-generated content on the web.
Try Ghostbuster here: ghostbuster.app
Learn more about Ghostbuster here: [paper] [code]
Try guessing for yourself whether text was generated by AI here: ghostbuster.app/experiment