LLM Writing Tropes.md
100 points | 5 hours ago | 22 comments | tropes.fyi | HN
Jordan-117
3 hours ago
[-]
Wikipedia also has an exhaustive guide, though it's not fun finding tropes you use yourself (I'm very guilty of the false range "from X to Y" thing):

https://en.wikipedia.org/wiki/Wikipedia:Signs_of_AI_writing

Another one that seems impossible for LLMs to avoid: breaking an article's title into a title and a subtitle, separated by a colon. Even if you explicitly tell it not to, it'll do it.

reply
malfist
1 hour ago
[-]
That's the thing about AI writing though. Those tropes are things humans do too. But like once or twice in an article. Not every single freaking paragraph.
reply
capnrefsmmat
3 hours ago
[-]
I work on research studying LLM writing styles, so I am going to have to steal this. I've seen plenty of lists of LLM style features, but this is the first one I noticed that mentions "tapestry", which we found is GPT-4o's second-most-overused word (after "camaraderie", for some reason).[1] We used a set of grammatical features in our initial style comparisons (like present participles, which GPT-4o loved so much that they were a pretty accurate classifier on their own), but it shouldn't be too hard to pattern-match some of these other features and quantify them.

If anyone who works on LLMs is reading, a question: When we've tried base models (no instruction tuning/RLHF, just text completion), they show far fewer stylistic anomalies like this. So it's not that the training data is weird. It's something in instruction-tuning that's doing it. Do you ask the human raters to evaluate style? Is there a rubric? Why is the instruction tuning pushing such a noticeable style shift?

[1] https://www.pnas.org/doi/10.1073/pnas.2422455122, preprint at https://arxiv.org/abs/2410.16107. We're working on extending this to more recent models and other grammatical features now.
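
A minimal sketch of what pattern-matching and quantifying a few of these features might look like, assuming a small hand-picked set of regexes. The pattern names and the `trope_counts` and `rate_per_1000` helpers are hypothetical, and this is not the grammatical feature set used in the paper cited above.

```python
import re

# Hypothetical, hand-picked patterns for illustration only.
TROPE_PATTERNS = {
    # "It's not X -- it's Y", joined by a hyphen, en/em dash, comma or semicolon.
    "negative_parallelism": re.compile(
        r"\bit['\u2019]?s not (?:just )?[^.!?\n]{1,60}?\s*[-\u2013\u2014,;]+\s*it['\u2019]?s\b",
        re.IGNORECASE,
    ),
    "tapestry": re.compile(r"\btapestr(?:y|ies)\b", re.IGNORECASE),
    "delve": re.compile(r"\bdelv(?:e|es|ed|ing)\b", re.IGNORECASE),
}


def trope_counts(text):
    """Return the number of matches of each pattern in the text."""
    return {name: len(p.findall(text)) for name, p in TROPE_PATTERNS.items()}


def rate_per_1000(text):
    """Return matches per 1000 words, so texts of different length compare."""
    n_words = len(re.findall(r"[\w'\u2019]+", text))
    counts = trope_counts(text)
    if n_words == 0:
        return {name: 0.0 for name in counts}
    return {name: 1000.0 * c / n_words for name, c in counts.items()}
```

Comparing `rate_per_1000` between a human-written corpus and model output would give a rough overuse ratio per trope. Regexes like these will produce false positives, so the numbers are only a first approximation.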

reply
albert_e
14 minutes ago
[-]
There is an organization named Tapestry (parent of Coach Inc).

Wonder how they can avoid the trope while not censoring themselves out.

reply
djoldman
2 hours ago
[-]
The RLHF is what creates these anomalies. See "delve" from Kenya and Nigeria.

Interestingly, because perplexity is the optimization objective, the pretrained models should reflect the least surprising outputs of all.

reply
networked
2 hours ago
[-]
You may be interested in my links on AI's writing style: https://dbohdan.com/ai-writing-style. I've just added your preprint and tropes.fyi. It has "hydrogen jukeboxes: on the crammed poetics of 'creative writing' LLMs" by nostalgebraist (https://www.tumblr.com/nostalgebraist/778041178124926976/hyd...), which features an example with "tapestry".

> Why is the instruction tuning pushing such a noticeable style shift?

Gwern Branwen has been covering this: https://gwern.net/doc/reinforcement-learning/preference-lear....

reply
joshvm
3 hours ago
[-]
No mention of Claude/ChatGPT's favourite new word "genuine" and friends? They also like using "real" and "honest" when giving advice. As far as I can tell this is a new-ish change.

> Honestly? We should address X first. It's a genuine issue and we've found a real bug here.

Honorable mention: "no <thing you told me not to do>". I guess this helps reassure adherence to the prompt? I see that one all the time in vibe coded PRs.

reply
pinum
2 hours ago
[-]
Similarly, "X that actually works"
reply
layer8
1 hour ago
[-]
...and half of the time still doesn't do what you want.
reply
awakeasleep
1 hour ago
[-]
If this bugs you, open ChatGPT's personality settings, choose the “efficient” base style, and turn off the enthusiasm and warmth sliders.

It makes a tremendous difference. Almost everything on this list is the emotional fluff ChatGPT injects to simulate a personality.

reply
esperent
42 minutes ago
[-]
Nobody here is still using ChatGPT, hopefully. They are the company backing the use of automated drone warfare that results in schools getting bombed.

I'm sure that other companies are guilty of this too but OpenAI is the one that's loudly and dramatically stepping forward and saying "we support this" and if you continue using them after learning that, you are also saying that you support this.

reply
mvkel
4 hours ago
[-]
Weirdly, LLMs seem to break with these instructions. They simply ignore them, almost as if the pretraining/RL weights are so heavy that no amount of system prompting can override them.
reply
RandomWorker
4 hours ago
[-]
It's a beauty. We can easily spot YouTubers who generate their scripts with these tools. When I notice these tropes, after 30 seconds I remove, block, and hit "do not recommend". I hope to train the algorithm to detect AI scripts and stop recommending me those videos. It's honestly turned me off from YouTube so much, or I find myself going to my "subscribed" tab and to content creators that still believe in the craft.
reply
antinomicus
2 hours ago
[-]
I’ve taken it one step further. YouTube as a front end is awful, and I’ve had enough. Tons of little dark patterns made to keep you on the site, annoying algorithms taking you places you never want to go, shitty AI slop, the whole nine yards. But I still like certain channels. As a result I’m doing everything self hosted now - not just YouTube but literally every single piece of digital media I consume. For YouTube I had to create a rotating pool of 5 residential ISP proxies - replaced as soon as YouTube's download bot restrictions kick in - and rotated weekly either way.

With this I am able to get all my favorite subs onto my actual hard drive, with some extra awesome features as a result: I vibe coded a little helper app that lets me query the transcript of a video and ask questions about what they say, using cheap Haiku queries. I can also get my subs onto my Jellyfin server and view them there on any device. Even comments get downloaded.

All these streamers have gone too far trying to maximize engagement and have broken the social contract, so I see this as totally fair game.

reply
esperent
40 minutes ago
[-]
I assume it'll work more as a review pass rather than expecting good results outright. For all kinds of things like this where I feel like I'm fighting the LLM, doing the initial work then auditing it seems to be the best approach (the other one is writing all kinds of tests; LLMs, including Opus 4.6, love to fudge tests just as much as they love telling you how insightful you are).
reply
duskwuff
1 hour ago
[-]
IIRC, it's well documented that negative instructions tend to be ineffective - possibly through some sort of LLM analogue to the "pink elephant paradox", or simply because the language models are unable to recognize clichés until they've already been generated.
reply
esperent
35 minutes ago
[-]
That was definitely true with early LLMs but I don't know if that's still the case. Certainly not as strong as it used to be. I think now most negative instructions are followed quite well but there are still a few things that must be deeply embedded from pretraining that are harder to avoid - these specific annoying phrasings, for example.
reply
carleverett
4 hours ago
[-]
"The "It's not X -- it's Y" pattern, often with an em dash. The single most commonly identified AI writing tell. Man I f*cking hate it. AI uses this to create false profundity by framing everything as a surprising reframe. One in a piece can be effective; ten in a blog post is a genuine insult to the reader. Before LLMs, people simply did not write like this at scale."

This one hit home... the first time I ever saw Claude do it I really liked it. It's amazing how quickly it became the #1 most aggravating thing it does just through sheer overuse. And of course now it's rampant in writing everywhere.

reply
bitwize
3 hours ago
[-]
If you sound like a car ad from Road & Track, I'm going to flag you as a bot.

"No rough handling. No struggles to accelerate. Just pure performance. The new Toyota GT. It's not just a car—it's a revolution."

Most of the tropes listed on this page give text a more "car ad" (or sometimes "movie trailer") quality. I wonder if magazine scans and press releases unduly weighted the training set.

reply
Retr0id
2 hours ago
[-]
I think it's more likely that car ads and chatbots are both optimizing for the same thing, i.e. grabbing the audience's attention.
reply
nh23423fefe
3 hours ago
[-]
Weird to care about a harmless construction along with punctuation.
reply
mapmeld
3 hours ago
[-]
If you participate in certain online communities where posts used to generally share real ideas and ask real beginner questions, you get tired of it. I am especially tired of seeing "it's not X - it's Y" on /r/MachineLearning posts, claiming that they've found some "geometry" or basic PyTorch code which they think will solve AI hallucinations. And it's becoming clear these people are not just doing this sort of thing on a whim, but spending days in delusional conversations with the AI.
reply
ashivkum
3 hours ago
[-]
Weirder still to immerse your brain in sewage and take pride in your lack of discernment.
reply
FartyMcFarter
3 hours ago
[-]
The article has been slashdotted so I don't know if this one is in there but:

One I've seen Gemini using a lot is the "I'll shoot straight with you" preamble (or similar phrasing), when it's about to tell me it can't answer the question.

reply
1970-01-01
3 hours ago
[-]
What we really need is a browser plugin underlining these patterns, especially for comments.
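
A rough sketch of the matching-and-marking step such a plugin would need, written in Python rather than as a real browser extension. The three patterns and the `find_spans` and `mark` helpers are hypothetical, picked from phrases mentioned in this thread, and a usable tool would need a much longer, tuned list.

```python
import re

# Hypothetical starter patterns taken from phrases mentioned in this thread.
PATTERNS = [
    re.compile(r"\bhonestly\?", re.IGNORECASE),
    re.compile(r"\bthat actually works\b", re.IGNORECASE),
    re.compile(r"\bI'll shoot straight with you\b", re.IGNORECASE),
]


def find_spans(text):
    """Return sorted (start, end) spans of all matches, merging overlaps."""
    spans = sorted(m.span() for p in PATTERNS for m in p.finditer(text))
    merged = []
    for start, end in spans:
        if merged and start <= merged[-1][1]:
            # Overlapping or touching span: extend the previous one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def mark(text, open_tag="<u>", close_tag="</u>"):
    """Wrap every matched span in the given tags and return the new text."""
    out, pos = [], 0
    for start, end in find_spans(text):
        out.append(text[pos:start])
        out.append(open_tag + text[start:end] + close_tag)
        pos = end
    out.append(text[pos:])
    return "".join(out)
```

For example, `mark("Honestly? Here is a parser that actually works.")` wraps "Honestly?" and "that actually works" in `<u>` tags. Merging spans first keeps the output well-formed when two patterns hit the same words.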
reply
layer8
1 hour ago
[-]
As the article points out at the end, these aren't bad per se. The issue is that LLMs overuse them, and we're all getting the same(-ish) LLM. It's not so different from how people sometimes have their idiosyncratic phrasings they use all the time.
reply
bryanrasmussen
2 hours ago
[-]
I sort of think the whole middlebrow angst thread about Bourdieu going on right now applies to LLM writing:

https://news.ycombinator.com/item?id=47260028

reply
bryanrasmussen
2 hours ago
[-]
This makes me think of the attraction that overly bad writing holds for writers, as a challenge. The most obvious example is the Bulwer-Lytton award. Another is the instinctive ignoring of instructions from fiction magazines that might say "we don't want any stories about murderous grandparents, French bashing, bestiality, bank robbers from the future, or kind-hearted Nazis - and especially do not try to be super brilliant and funny and send us your story about kind-hearted Nazi bank-robbing French-bashing grandparents that like killing people and having sexy fun times with barnyard animals! Because every original thinker like you thinks they are the first to have come up with that idea!" Then, as a writer, you feel challenged to do exactly what they say they don't want, because what a glorious triumph if you manage to outdo everyone and get your dreck published because it's dreck that is so bad it's good!

It does not seem like there are lots of people who are perversely inclined to write a story with all these tropes and words in it, but surely there must be some, because if you make something that beats the LLM (by being creatively good) using all the crap the LLM uses, it would seem some sort of John Henry triumph (discounting the final end of John Henry of course, which is a real downer).

reply
xgulfie
2 hours ago
[-]
If only we could fix how it writes like garbage
reply
netsec_burn
2 hours ago
[-]
Another trope: longer README.md's than anyone would make, or want.
reply
NewsaHackO
2 hours ago
[-]
Yes, to me this is a huge tell. Especially when it goes into detail about pros and cons (using a table) on the most superficial points.
reply
roywiggins
2 hours ago
[-]
Don't forget "The Ludlum Delusion": every header in an article or readme reads like a Robert Ludlum novel title, i.e. "The [Noun:0.9|Adjective:0.1] [Noun]".
reply
charlieflowers
2 hours ago
[-]
This list reads like, "AIs are not your typical braindead person on the street. They actually use a decent but not crazily advanced vocabulary."

I mean, "tapestry" is a great word for something that is interconnected. Why not use it?

reply
tiahura
2 hours ago
[-]
Many of these are standard fare in legal writing.

Negative parallelism is a staple of briefs. "This case is not about free speech. It is about fraud." It does real work when you're contesting the other side's framing.

Tricolons and anaphora are used as persuasion techniques for closing arguments and appellate briefs.

Short punchy fragments help in persuasive briefs where judges are skimming. "The statute is unambiguous."

As with the em dash, let's not throw the baby out with the bathwater.

reply
bitwize
3 hours ago
[-]
You know how no one ever wrote their own software and then generative AI came along and suddenly we could have app meals home-cooked by barefoot developers? (The use of such cottagecore terminology for a process that requires being an ongoing client of a hundred-gigabuck, planet-burning megacorporation rubs me in many wrong ways.)

If AI finally gets rid of the thing that drove me nuts for years, "leverage" as a verb meaning roughly "to use", when no human intervention seems to work, then I shall be over-the-moon happy. I once worked at a place where this particular word was lever—er, used all the damn time and I'd never encountered something so NPC-ish. I felt like I was on The Twilight Zone. I could've told you way back then that you sounded like a bot doing that; now people might actually believe me, and thank god.

I will stick by the em dashes however. And I might just start using arrows too. Compose - > → right arrow. Not even difficult.

reply
cyanydeez
3 hours ago
[-]
This kills the headline baiting tech blogger.
reply
agnishom
2 hours ago
[-]
> (let's play cat and mouse!).

No thanks, I hate this large scale social experiment

reply