Detection is not what is going to solve the problem. We need to go back and re-evaluate why we are asking students to write in the first place, and how we can still achieve the goal of teaching when these modern tools are one click away.
I think we'll still need ways to detect copy-pasted zero-shot content generated by LLMs, for the same reasons that teachers needed ways to detect plagiarism. Kids, students, and interns [1] "cheat" for various reasons [2], and we want to be able to detect lazy infractions early enough that we can correct their behavior.
This leads to three outcomes:
1. Those that never really meant to cheat will learn how to do things properly.
2. Those that cheated out of laziness will begrudgingly need to weigh their options, at which point doing things properly may be less effort.
3. Those that meant to cheat will have to invest (much) more effort, and run the risk of being kicked out if they're caught.
[1] But also employees, employers, government officials, etc.
[2] There could be some relatively benign reasons. For example, they could: not know how to quote/reference others properly; think it's OK because "everyone does it" or they don't care about subjects that involve writing; do it "just this once" out of procrastination; and similar.
We don't need better detection. We need better ways to measure one's grasp of a concept. When calculators were integrated into education, the focus shifted from working the problem out by hand to choosing the correct formulas and using the calculator effectively. Sure, elementary classes will force you to 'show your work', but that's to build the foundation you'll build on later, I believe.
We don't need to detect plagiarism if we're asking students for verbal answers, for example.
"Grasping concepts" is not the only learning goal in schools or universities. Many classes - including within STEM programmes - want to teach students about writing, argumentation, researching, critical analysis, dealing with feedback, etc.
Oral exams can be more stressful, depending on the student. They also don't check for the student's writing or researching ability. They can be gamed with rhetorical skills. Grading of oral exams tends to be more opaque. And so on.
Then there are the issues I explained above, where you don't want to inadvertently reward cheating. Even if you don't care about the cheaters, you should try your best to detect and reward real effort. Otherwise it would be stupid not to cheat and use the class for free credits, at which point, from an educational POV, it's a useless class.
So, all in all, there are still very good reasons for doing take-home written responses and essays, and good reasons for wanting to detect cheating or plagiarism.
I suspect many students write to pass the class, and AI can do that. Perhaps the problem is the incentives to write that way.
It is better to pivot and not care about the actual content of the essay, and instead seek alternate strategies to encourage learning, such as an oral presentation or a quiz on the knowledge. In the laziest case, accept only hand-written output: even if it was generated, at least they retained some knowledge by copying it out.
I don't think you're wrong necessarily, but there are good reasons that teachers like papers other than "we've always used them".
People face different challenges when writing papers versus taking oral or written quizzes, but is one necessarily easier than the other? For papers, think about language barriers, anxiety about writing ability, the stress of the writing itself, the need for self-motivation and time management, and so on.
But that's what we are solving for. So you can't assume it.
This is what I mean when I say educators need to be more agile instead of insisting on assessment methods they simply assume should work.
We need to grade people because that's the best way we have to determine (for one or more subjects) who's:
1. capable enough, so that we can promote them to the next stage;
2. improving or has potential for improvement, so that we can give them the tools or motivation to continue;
3. underperforming, so that we can find out why and help them turn it around (or reduce the pressure);
4. actually learning the content, and if not, why not.
Thankfully, everyone knows this system is flawed, so most don't put too much weight on school grades. But overall, the grades are there to provide both an incentive for teachers and students to do better, and a way to compare performance.
I read their work and sense the same anxiety in myself. When I write with care, when I choose words that carry rhythm and reason, I feel suspicion rather than understanding. Readers ask whether a machine has written the text. I lower my tone, I break the structure, I remove what once gave meaning to style, only to make the words appear more human. In doing so, I betray something essential, not in the language but in myself.
The authors speak of false positives, of systems that mistake human writing for artificial output. But that error already spreads beyond algorithms. It enters conversation, education, and the smallest corners of daily life. A clear sentence now sounds inhuman; a careless one, sincere. Truth begins to look artificial, and confusion passes for honesty.
I recall the warning of Charlotte Thomson Iserbyt in The Deliberate Dumbing Down of America. She foresaw a culture that would teach obedience in place of thought. That warning now feels less like prophecy and more like description.
When people begin to distrust eloquence, when they scorn precision as vanity and mistake simplicity for virtue, they turn against their own mind. And when a society grows ashamed of clear language, it prepares its own silence. Not the silence of peace, but the silence of forgetfulness, the kind that falls when no one believes in the power of words any longer.
"yet behind those terms lives an older struggle, the human desire to prove its own reality in a world of imitation."
...each paragraph ends with this corny and tiresome '50s mechanized 'erudite' baloney.
--The Rod Serling Algo, aka, TTZ
For example, “delve” and the em-dash are both a result of the fine-tuning dataset, not the base LLM.
The principle of training them is quite simple. Take an LLM and reward it for revising text so that it doesn't get detected. Reinforcement learning takes care of the rest for you.
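To make that concrete, here is a minimal, hypothetical sketch of the reward signal such a setup would use. All names here (detector_ai_probability, sample_revision, training_step) are placeholders I'm inventing for illustration, not real APIs or anything described in the comment above; the actual policy-gradient update (e.g. PPO/REINFORCE) is omitted.

```python
# Hypothetical sketch: reward an LLM for revisions that a detector scores as "human".
# detector_ai_probability and sample_revision are stand-in stubs, not real libraries.

import random


def detector_ai_probability(text: str) -> float:
    """Stand-in for an AI-text detector; returns P(text was machine-written)."""
    return random.random()  # placeholder score


def sample_revision(policy, text: str) -> str:
    """Stand-in for sampling a rewrite from the LLM policy being trained."""
    return text  # a real policy would return a paraphrased draft


def reward(revised: str) -> float:
    # Higher reward the less "AI-like" the detector believes the revision is.
    return 1.0 - detector_ai_probability(revised)


def training_step(policy, drafts):
    # In practice these rewards would feed a policy-gradient update;
    # here we only show the reward computation the comment describes.
    rewards = [reward(sample_revision(policy, draft)) for draft in drafts]
    return sum(rewards) / len(rewards)  # average reward the optimizer maximizes


if __name__ == "__main__":
    print(training_step(policy=None, drafts=["An AI-generated paragraph."]))
```

The point of the sketch is only that the detector itself becomes the reward model, which is why detectors tend to lose this arms race.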
Pangram maintains near-perfect accuracy across long and medium-length texts. It achieves very low error rates even on shorter passages and ‘stubs.’

I hate AI slop and I fight against it in my work, but as that style of writing becomes increasingly prevalent, students are unconsciously adopting it for their base writing style. Automated detection of LLM writing never worked well, and now LLM and human writing have converged so much in style that machine detectors are worthless.
Our response should be to refuse to accept slop, whether produced by human or machine. I strive to point out the stylistic details of slop and how to avoid or edit them away.
If you think about the 2x2 of “Good” vs. “By AI”, you only really care about the case where something is good work that an AI did, and then only when catching cheaters, as opposed to deriving some utility from the work.
If it’s bad, who cares whether it’s AI or not? Most AI output is pretty obvious, thoughtless slop, and most people who use it aren’t paying enough attention to mask that. So I guess what I’m saying is that for most cases one could just set a quality bar and see if the work passes.
I think maybe a difference AI brings is that in many cases people don’t really know how to judge the quality of what they are reading, or are too lazy to, so they have substituted as proxies for quality the same structural cues that AI now uses. If you’re used to saying “it’s well formatted, lots of bulleted lists, no spelling mistakes, good use of adjectives, must be good”, now you have to actually read it and think about it to know.
So I asked the chatbot, and it listed the possible causes of a flagged transaction: a stolen card, plus a few other examples that amount to a mix of service issues determined by the customer. But the bot says it’s definitely not a chargeback. What?
So now I contact support. They say it’s a flag from the credit card issuing bank. Wait, what? Is this a fraudulent stolen card or not? Still no. It’s just a warning based on usage patterns. Why are you passing this slop to my client? If there is a pattern problem, the flag should go to the customer who authorizes the charge. Otherwise it’s a chargeback or a known stolen card.
They say, well, you can contact the customer. What? If the pattern really is a stolen card, which is listed as a possible cause of the flag even though they won’t say whether it is or isn’t, then whoever I contact can just lie!
Which is a long way of saying that this pattern matching for fraud or negative patterns suffers from idiocy, even in the simplest of contexts.