An LLM Flagged My Paper About LLMs Flagging Things.

LessWrong · Jun 9, 2026, 6:00 PM

To Whom It May Concern,

So, I used to be a criminology teacher in a small, wonderful town. After ten years it was time for a change, so I went military. Yes, awkward, but not unrewarding. In any case, I luckily kept all of my evaluations and some student submissions, and every now and again I re-read them for nostalgia’s sake.

Then the venerable GPT got invented, and it was insane. I ignored it, other than some comical YouTube videos. Around the time 4o came out, I had to do what is known as a full system migration (I will not re-modify my games, at 200 mods each). I figured I would ask the chatbot. It told me (hallucination or not, I have no idea) that I was looking at 8 hours of my own time, or over a thousand dollars at a repair shop. Being cheap, I opted for the 8 hours.

Good old sycophantic 4o led me through the process, with significant frustration, I might add. To my surprise, it was not useless. Then, much later, at a family dinner, a young’un told me she suspected her university was using some LLM to correct assignments. Between that and a few YouTube videos I’d seen of professors complaining about exactly this, I was horrified. GPT-4o, the thing that couldn’t get the commander of a World War II battle right, is correcting university students (the irony that the students had 4o write their papers in turn is not lost on me). Being a nerd in the military, I said, “Let’s do a study!”

I took an evaluation I used to use, asked myself, “How did exhausted, lazy me substitute measures for actual metrics?”, made a table, and called it an audit report. I then took said report and the evaluation criteria, along with an assignment I completed myself (writing an assignment for an evaluation I made myself: grade-ception?), and had a few frontier models, plus a few offline ones through Open WebUI, correct the thing. I checked their outputs against the audit report and, sure enough, the LLMs marked precisely as exhausted, lazy me did. With one exception: Grok went full confabulation

Article preview: originally published by LessWrong. Full story at the source.

Aggregated and edited by the Scoop newsroom.
