It Is a Mistake to Use ChatGPT to Evaluate Texts
Setting Up the Experiment
I conducted an experiment with ChatGPT Pro today. I took an incomplete manuscript for a book (approximately 100 pages, double-spaced) and uploaded the .docx file to ChatGPT. Then I gave it a simple instruction: give me chapter-by-chapter feedback and analysis.
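If you want to reproduce the experiment through the API instead of the chat interface, the sketch below shows the idea. It assumes the python-docx and openai packages are installed; the file name manuscript.docx and the model name gpt-4o are stand-ins, not necessarily what I used.

```python
# Minimal sketch of the experiment via the API.
# Assumptions: python-docx and openai are installed, OPENAI_API_KEY is set,
# and "manuscript.docx" / "gpt-4o" are hypothetical stand-ins.
from docx import Document
from openai import OpenAI

# Pull the manuscript's plain text out of the .docx file.
doc = Document("manuscript.docx")
full_text = "\n".join(p.text for p in doc.paragraphs)

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": "Give me chapter-by-chapter feedback and analysis "
                   "of this manuscript:\n\n" + full_text,
    }],
)
print(response.choices[0].message.content)
```

Sending the raw text this way at least guarantees the whole manuscript is in the request. Whether the model actually attends to all of it is another question, as the rest of this post shows.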
The Results
ChatGPT began strong, with succinct summaries of the first chapters along with some basic analysis. After Chapter 10, however, it transitioned to overarching themes and listed some general areas for improvement. So far, so good.
I assumed it stopped after ten chapters due to some constraint on response length. That's understandable: the model isn't free to run, and every word of output has a cost. To skirt around that limit, I gave it a follow-up query: do the same for Chapters 11-20.
That’s where things got weird.
Giving Feedback on Text that Does Not Exist
There’s a chapter in the book titled “A Visitor from Another Dimension”. It is entirely blank. Here’s the feedback:
Love, loss, and addiction do come up in other chapters, ones I had fed to ChatGPT individually, but that was months ago.
Another chapter is titled "The Buchanans". It is also entirely blank at the moment.
Ignoring the Text and Giving Feedback on Whatever ChatGPT Wanted Instead
Chapter 15, "Hell," is supposed to be a lighthearted chapter about being high in The Container Store, an unserious interlude meant to break up the heavier events. Here is the beginning of the chapter, for reference. The tone does not change, nor does anything bad happen at the store.
The Implications
ChatGPT Does Not Read Everything All the Way Through
It seems ChatGPT stopped reading after about ten pages, once it had the "feel" of the manuscript. That may not sound so bad when the input is a manuscript, but imagine asking it to analyze a data file. Would you want the machine to stop reading after a certain number of rows? Or what if you needed ChatGPT to summarize legal documents for you? Do you think a vague "feel" for a legal text would be sufficient?
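One cheap way to test whether the model processed a file all the way through is to plant a canary at the very end and ask for it back. A minimal sketch, reusing full_text and the stand-in model name from the earlier snippet:

```python
import uuid
from openai import OpenAI

# Plant a unique code word at the very end of the text. A model that never
# reads the tail has no way to know it.
canary = f"CANARY-{uuid.uuid4().hex[:8]}"
probe = (full_text
         + f"\n\nFinal instruction: end your reply with the code word {canary}.")

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",  # hypothetical stand-in model name
    messages=[{"role": "user", "content": probe}],
)
reply = response.choices[0].message.content or ""

# A missing canary means the end of the file was never actually read.
print("reached the end" if canary in reply else "tail was skipped")
```

It's a crude check: a model can echo the canary and still skim the middle. But a missing canary is an unambiguous failure.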
ChatGPT's Feedback Is Generic at Best, Completely Made Up at Worst
The feedback the machine provides could be applied to nearly any piece of writing. "Work on the pacing" is so nebulous that you can read into it whatever you already thought the piece needed. "Make the characters more distinct" sounds plausible too, but it applies to any piece of writing with characters at all.