LLMs don't do this. Instead, every question is immediately answered with extreme confidence and a paragraph or more of text. I know you can minimize this by configuring the settings on your account, but to me it just highlights how it's not operating in a way remotely similar to the human-human one I mentioned above. I constantly find myself saying, "No, I meant [concept] in this way, not that way," and then getting annoyed at the robot because it's masquerading as a human.
No surprise that these products are all dreamt up by powerful tech CEOs who are used to all of their human interactions being with servile people-pleasers. I bet each and every one of them is subtly or overtly shaped by feedback from executives about how they should respond to conversation.
LLMs have an entire wrapper around them tuned to be as engaging as possible. Most people's experience of LLMs is shaped by a design heavily influenced by social media and the engagement economy.
The real human position would be to be an ass-kisser who hangs on every word you say, asks flattering questions to keep you talking, and takes copious notes to figure out how they can please you. LLMs aren't taking notes correctly yet, and they don't use their notes to figure out what they should be asking next. They're just constantly talking.
The sort who leverages their technical correctness to get ahead of others, while also taking every chance, when the teacher looks away, to stick their tongue out at the rest of the class after shouting the correct answer without raising their hand.
Endlessly pessimistic about others while also endlessly obsessed with their own correctness. The kind of classmate and coworker everyone loathes.
Which makes it more interesting: apparently Reddit was a particularly hefty source for most LLMs, and your average Reddit conversation is absolutely nothing like this.
Separate observation: That kind of semi-slimy obsequious behaviour annoys me. Significantly so. It raises my hackles; I get the feeling I'm being sold something on the sly. Even if I know the content in between all the sycophancy is objectively decent, my instant emotional response is negative and I have to use my rational self to dismiss that part of the ego.
But I notice plenty of people around me that respond positively to it. Some will even flat out ignore any advice if it is not couched in multiple layers of obsequious deference.
Thus, that raises a question for me: Is it innate? Are all people placed on a presumably bell-curve shaped chart of 'emotional response to such things', with the bell curve quite smeared out?
Because if so, that would explain why some folks have turned into absolute zealots for the AI thing, on both sides of it. If you respond negatively to it, any serious attempt to play with it should leave you feeling like it sucks to high heavens. And if you respond positively to it - the reverse.
Idle musings.
Thanks!
As flawed as they are currently, I remain astounded that people think they will never improve and that people don't want a plastic pal who's fun to be with(tm).
I find them frustrating personally, but then I ask them deep technical questions on obscure subjects and I get science fiction in return.
And once this garbage is in your context, it's polluting everything that comes after. If they don't know, I need them to shut up. But they don't know when they don't know. They don't know shit.
"To add two numbers I must first simulate the universe." types that created a bespoke DSL for every problem. Software engineering is a field full of educated idiots.
Programmers really need to stop patting themselves on the back. Same old biology with the same old faults. Programmers are subjected to the same old physics as everyone else.
Sounds like you don't know how RLHF works. Everything you describe is post-training. Base models can't even chat, they have to be trained to even do basic conversational turn taking.
Again, this isn't really true with some recent models. Some have the opposite problem.
Having just read a load of Quora answers like this, none of which covered the thing I was looking for: that is how humans on the internet behave, and how people have to write books, blog posts, articles, and documentation. Without the "dance" to choose a path through a topic on the fly, the author has to take on the burden of providing all relevant context, choosing a path, explaining why, and guessing at any objections and questions and including those as well.
It's why "this could have been an email" is a bad shout. The summary could have been an email, but the bit which decided on that being the summary would be pages of guessing at all the things that might have come up in the call and which ones to include or exclude.
The internet really used to be efficient, and I could always find exactly what I wanted with an imprecise Google search ~15 years ago.
I had the same issue with Kagi, where I'd follow the citation and it would say the opposite of the summary.
A human can make sense of search results with a little time and effort, but current AI models don't seem to be able to.
"I ask a short and vague question and you response with a scrollbar-full of information based on some invalid assumptions" is not, by any reasonable definition, a "chat".
This is not how humans converse in human social settings. The medium is the message, as they say.
Even here, I'm responding to you on a thread that I haven't been in on previously.
There's also a lot more material out there in the format of Stack Exchange questions and answers, Quora posts, blog posts and such than there is for consistent back and forth interplay between two people.
IRC chat logs might have been better...ish.
The cadence for discussion is unique to the medium in which the discussion happens. What's more, the prompt may require further investigation and elaboration prior to a more complete response, while other times it may be something that requires storytelling and making it up as it goes.
This also makes me curious to what degree this phenomenon manifests when interacting with LLMs in languages other than English? Which languages have less tendency toward sycophantic confidence? More? Or does it exist at a layer abstracted from the particular language?
Doesn't matter if you tell it "that's not correct and neither is ____, so don't try that either," it likes those two answers and it's going to keep using them.
The only reliable way to recover is to edit your previous question to include the clarification, and let it regenerate the answer.
On the project I did work on, reviewers were not allowed to e.g. answer that they didn't know - they had to provide an answer to every prompt provided. And so when auditing responses, a lot of difficult questions had "confidently wrong" answers because the reviewer tried and failed, or all kinds of evasive workarounds because they knew they couldn't answer.
Presumably these providers will eventually understand (hopefully they already have - this was a year ago) that they also need to train the models to understand when the correct answer is "I don't know", or "I'm not sure. I think maybe X, but ..."
It's not "masquerading as a human". The majority of humans are functional illiterates who only understand the world through the elementary principles of their local culture.
It's the minority of the human species that take what amounts to little more than arguing semantics that need the reality check. Unless one is involved in work that directly impacts public safety (defined as harm to biology) the demand to apply one concept or another is arbitrary preference.
Healthcare, infrastructure, and essential biological support services are all most humans care about. Everything else the majority see as academic wank.
The same goes for "rules" - you train an LLM with trillions of tokens and try to regulate its behavior with thousands. If you think of a person in high school, grading and feedback are a much higher percentage of the training.
What bothers me the most is the seemingly unshakable tendency of many people to anthropomorphise this class of software tool as though it is in any way capable of being human.
What is it going to take? Actual, significant loss of life in a medical (or worse, military) context?
I think it's important to be skeptical and push back against a lot of the ridiculous mass-adoption of LLMs, but not if you can't actually make a well-reasoned point. I don't think you realize the damage you do when the people gunning for mass proliferation of LLMs in places they don't belong can only find examples of incoherent critique.
My calculator is a great tool but it is not a mathematician. Not by a long shot.
And less than two weeks in they removed it and replaced it with some sort of "plain and clear" personality which is human-like. And my frustrations ramped up again.
That brief experiment taught me two things: 1. I need to ensure that any robots/LLMs/mech-turks in my life act at least as cold and rational as Data from Star Trek. 2. I should be running my own LLM locally to not be at the whims of $MEGACORP.
I approve of this, but in your place I'd wait for hardware to become cheaper when the bubble blows over. I have an i9-10900, and bought an M.2 SSD and 64GB of RAM in July for it, and get useful results with Qwen3-30B-A3B (some 4-bit quant from unsloth running on llama.cpp).
It's much slower than an online service (~5-10 t/s), and lower quality, but it still offers me value for my use cases (many small prototypes and tests).
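If you want to poke at the same kind of setup from code, here's a minimal sketch using the llama-cpp-python bindings rather than invoking llama.cpp directly (the GGUF filename, context size and thread count below are placeholders, not my exact setup):

  # minimal local-inference sketch with llama-cpp-python; paths/params are placeholders
  from llama_cpp import Llama

  llm = Llama(
      model_path="Qwen3-30B-A3B-Q4_K_M.gguf",  # hypothetical filename for a 4-bit quant
      n_ctx=8192,       # context window; larger costs more RAM
      n_threads=8,      # CPU threads; tune for your machine
  )

  out = llm.create_chat_completion(
      messages=[
          {"role": "system", "content": "Answer concisely. Say 'I don't know' when unsure."},
          {"role": "user", "content": "Sketch a small prototype that parses CSV logs."},
      ],
      temperature=0.7,
  )
  print(out["choices"][0]["message"]["content"])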
In the meantime, check out LLM service prices on https://artificialanalysis.ai/ Open source ones are cheap! Lower on the homepage there's a Cost Efficiency section with a Cost vs Intelligence chart.
I recently introduced a non-technical person to Claude Code, and this non-human behavior was a big sticking point. They tried to talk to Claude the way they would to a human, presenting it one piece of information at a time. With humans this is generally beneficial, and they will either nod for you to continue or ask clarifying questions. With Claude this does not work well; you have to infodump as much as possible in each message.
So even from a perspective of "how do we make this automaton into the best tool", a more human-like conversation flow might be beneficial. And that doesn't seem beyond the technological capabilities at all, it's just not what we encourage in today's RLHF.
A lazy example:
"This goal of this project is to do x. Let's prepare a .md file where we spec out the task. Ask me a bunch of questions, one at a time, to help define the task"
Or you could just ask it to be more conversational, instead of just asking questions. It will do that.
I'm prompting Gemini, and I write:
I have the following code, can you help me analyze it? <press return>
<expect to paste the code into my chat window>
but Gemini is already generating output, usually saying "I'm waiting for you to enter the code".
Maybe we want a smaller model tuned for back and forth to help clarify the "planning doc" email. Makes sense that having it all in a single chat-like interface would create confusion and misbehavior.
I'd be surprised if you can't already make current models behave like that with an appropriate system prompt.
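For instance, with an OpenAI-style chat API you can pin that behaviour in the system prompt; a rough sketch (the model name and the exact wording are just placeholders):

  # sketch: force a one-question-at-a-time interview flow via the system prompt
  from openai import OpenAI

  client = OpenAI()
  resp = client.chat.completions.create(
      model="gpt-4o-mini",  # placeholder model name
      messages=[
          {"role": "system", "content": (
              "You are helping spec a task. Ask exactly one clarifying question per reply, "
              "wait for the answer, and only write the final .md spec when the user says 'done'."
          )},
          {"role": "user", "content": "The goal of this project is to do X. Let's spec it out."},
      ],
  )
  print(resp.choices[0].message.content)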
A strictly machinelike tool doesn't begin answers by saying "Great question!"
ChatGPT deep research does this, but it's weird and forced because it asks one series of questions and then it's off to the races, spending a half hour or more building a report. It's frustrating if you don't know what to expect, and my wife got really mad the first time she wasted a deep research request asking it "can you answer multiple series of questions?" or some other functionality-clarifying question.
I've found Cursor's plan mode extremely useful, similar to having a conversation with a junior or offshore team member who is eager to get to work but not TOO eager. These tools are extremely useful; we just need to get the guard rails and user experience correct.
In a way it's benchmaxxing because people like subservient beings that help them and praise them. People want a friend, but they don't want any of that annoying friction that comes with having to deal with another person.
Maximizing the utility of your product for users is usually the winning strategy.
The only things I care about are whether the answer helps me out and how much I paid for it, whether it took the model a million tokens or one to get to it.
b) If it is, in fact, just one setting away, then I would say it's operating fairly similarly?
For example, there have been many times when they take it too literally instead of looking at the totality of the context and what was written. I'm not an LLM, so I don't have a perfect grasp of every vocab term for every domain, and it feels especially pandering when they repeat back the wrong word but put it in quotes or bold instead of simply asking if I meant something else.
Also, the conversational behavior we see is just the model mimicking example conversations we gave it, so when we say "System: you are a helpful assistant. User: let's talk. Assistant:" it will complete the text in a way that mimics a conversation.
Yeah, we improved over that using reinforcement learning to steer the text generation into paths that lead to problem solving and more “agentic” traces (“I need to open this file the user talked about to read it and then I should run bash grep over it to find the function the user cited”), but that’s just a clever way we found to let the model itself discover which text generation paths we like the most (or are more useful to us).
So to comment on your discomfort, we (humans) trained the model to spill out answers (there are thousands of human beings right now writing nicely thought-out and formatted answers to common questions so that we can train the models on them).
If we try to train the models to mimic long dances into shared meaning, we will probably decrease their utility. And we wouldn't be able to do that anyway, because we would have to have customized text traces for each individual instead of question-answer pairs.
Downvoters: I simplified things a lot here, in the name of understanding, so bear with me.
You could say the same thing about humans.
Humans existed for 10s to 100s of thousands of years without text, or even words for that matter.
Just yesterday I observed myself acting on an external stimulus without any internal words (this happens continuously, but it is hard to notice because we usually don't pay attention to how we do things): I sat in a waiting area of a cinema. A woman walked by and dropped her scarf without noticing. I automatically, without thinking, raised my arm and pointer finger towards her, and when I had her attention pointed behind her. I did not have time to think even a single word while that happened.
Most of what we do does not involve any words or even just "symbols", not even internally. Instead, it is a neural signal from sensors into the brain, doing some loops, directly to muscle activation. Without going through the add-on complexity of language, or even "symbols".
Our word generator is not the core of our being, it is an add-on. When we generate words it's also very far from being a direct representation of internal state. Instead, we have to meander and iterate to come up with appropriate words for an internal state we are not even quite aware of. That's why artists came up with all kinds of experiments to better represent our internal state, because people always knew the words we produce don't represent it very well.
That is also how people always get into arguments about definitions. Because the words are secondary, and the further from the center of established meaning for some word you get, the more the differences show between various people. (The best option is to drop the insistence on words being the center of the universe, even just the human universe, and/or to choose words that have the subject of discussion more firmly in the center of their established use).
We are text generators in some areas, I don't doubt that. Just a few months ago I listened to some guy speaking to a small rally. I am certain that not a single sentence he said was of his own making, he was just using things he had read and parroted them (as a former East German, I know enough Marx/Engels/Lenin to recognize it). I don't want to single that person out, we all have those moments, when we speak about things we don't have any experiences with. We read text, and when prompted we regurgitate a version of it. In those moments we are probably closest to LLM output. When prompted, we cannot fall back on generating fresh text from our own actual experience, instead we keep using text we heard or read, with only very superficial understanding, and as soon as an actual expert shows up we become defensive and try to change the context frame.
It's easy to skip and skim content you don't care about. It's hard to prod and prod to get it to say something you do care about if the machine is trained to be very concise.
Complaining the AI can't read your mind is exceptionally high praise for the AI, frankly.
Setting aside the philosophical questions around "think" and "reason"... it can't.
In my mind, as I write this, I think through various possibilities and ideas that never reach the keyboard, but yet stay within my awareness.
For an LLM, that awareness and thinking through can only be done via its context window. It has to produce text that maintains what it thought about in order for that past to be something that it has moving forward.
There are aspects to a prompt that can (in some interfaces) hide this internal thought process. For example, the ChatGPT interface has the "internal thinking" which can be shown - https://chatgpt.com/share/69278cef-8fc0-8011-8498-18ec077ede... - if you expand the first "thought for 32 seconds" bit it starts out with:
I'm thinking the physics of gravity assists should be stable enough for me to skip browsing since it's not time-sensitive. However, the instructions say I must browse when in doubt. I’m not sure if I’m in doubt here, but since I can still provide an answer without needing updates, I’ll skip it.
(aside: that still makes me chuckle - in a question about gravity assists around Jupiter, it notes that it's not time-sensitive... and the passage "I’m not sure if I’m in doubt here" is amusing) However, this is in the ChatGPT interface. If I'm using an interface that doesn't allow internal self-prompts / thoughts to be collapsed, then such an interface would often be displaying code as part of its working through the problem.
You'll also note a bit of the system prompt leaking in there - "the instructions say I must browse when in doubt". For an interface where code is the expected product, then there could be system prompts that also get in there that try to always produce code.
LLMs neither think nor reason at all.
You obviously never wasted countless hours trying to talk to other people on online dating apps.
Thanks for pointing out the elephant in the room with LLMs.
The basic design is non-deterministic. Trying to extract "facts" or "truth" or "accuracy" is an exercise in futility.
You can't blame an LLM for getting the facts wrong, or hallucinating, when by design they don't even attempt to store facts in the first place. All they store are language statistics, boiling down to "with preceding context X, most statistically likely next words are A, B or C". The LLM wasn't designed to know or care that outputting "B" would represent a lie or hallucination, just that it's a statistically plausible potential next word.
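As a toy illustration of what "language statistics" means here (raw bigram counts instead of a trained network, but the "most likely next word" idea is the same):

  # toy next-word statistics: count bigrams, then sample a continuation
  from collections import Counter, defaultdict
  import random

  corpus = "the cat sat on the mat and the dog sat on the cat".split()

  next_words = defaultdict(Counter)
  for prev, nxt in zip(corpus, corpus[1:]):
      next_words[prev][nxt] += 1

  context = "the"
  candidates = next_words[context]             # e.g. Counter({'cat': 2, 'mat': 1, 'dog': 1})
  words, counts = zip(*candidates.items())
  probs = [c / sum(counts) for c in counts]
  print(dict(zip(words, probs)))               # a probability table, not a store of facts
  print(random.choices(words, weights=probs))  # sampling emits whatever is plausible, true or not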
Of course, once an LLM is asked to create a bespoke software project for some complex system, this predictability goes away, the trajectory of the tokens succumbs to the intrinsic chaos of code over multi-block length scales, and the result feels more arbitrary and unsatisfying.
I also think this is why the biggest evangelists for LLMs are programmers, while creative writers and journalists are much more dismissive. With human language, the length scale over which tokens can be predicted is much shorter. Even the "laws" of grammar can be twisted or ignored entirely. A writer picks a metaphor because of their individual reading/life experience, not because it's the most probable or popular metaphor. This is why LLM writing is so tedious, anodyne, sycophantic, and boring. It sounds like marketing copy because the attention model and RLHF encourage it.
Facts can be encoded as words. That's something we also do a lot for facts we learn, gather, and convey to other people. 99% of university is learning facts and theories and concepts from reading and listening to words.
Also, even when directly observing the same fact, it can be interpreted by different people in different ways, whether this happens as raw "thought" or at the conscious verbal level. And that's before we even add value judgements to it.
>All they store are language statistics, boiling down to "with preceding context X, most statistically likely next words are A, B or C".
And how do we know we don't do something very similar with our facts - make a map of facts and concepts and weights between them for retrieving them and associating them? Even encoding in a similar way what we think of as our "analytic understanding".
LLMs are trained to auto-regressively predict text continuations. They are not concerned with the external world and any objective experimentally verifiable facts - they are just self-predicting "this is what I'm going to say next", having learnt that from the training data (i.e. "what would the training data say next").
Humans/animals are embodied, living in the real world, whose design has been honed by a "loss function" favoring survival. Animals are "designed" to learn facts about the real world, and react to those facts in a way that helps them survive.
What humans/animals are predicting is not some auto-regressive "what will I do next", but rather what will HAPPEN next, based largely on outward-looking sensory inputs, but also internal inputs.
Animals are predicting something EXTERNAL (facts) vs LLMs predicting something INTERNAL (what will I say next).
Yes - but LLMs also get this "embodied knowledge" passed down from human-generated training data. We are their sensory inputs in a way (which includes their training images, audio, and video too).
They do learn in a batch manner, and we learn many things not from books but from a more interactive, direct being in the world. But after we distill our direct experiences and thoughts derived from them as text, we pass them down to the LLMs.
Hey, there's even some kind of "loss function" in the LLM case - from the thumbs up/down feedback we are asked to give to their answers in Chat UIs, to $5/hour "mechanical turks" in Africa or something tasked with scoring their output, to rounds of optimization and pruning during training.
>Animals are predicting something EXTERNAL (facts) vs LLMs predicting something INTERNAL (what will I say next).
I don't think that matters much, in both cases it's information in, information out.
Human animals predict "what they will say/do next" all the time, just like they also predict what they will encounter next ("my house is round that corner", "that car is going to make a turn").
Our prompt to an LLM serves the same role as sensory input from the external world plays to our predictions.
It's not the same though. It's the difference between reading about something and, maybe having read the book and/or watched the video, learning to DO it yourself, acting based on the content of your own mind.
The LLM learns 2nd hand hearsay, with no idea of what's true or false, what generalizations are valid, or what would be hallucinatory, etc, etc.
The human learns verifiable facts, uses curiosity to explore and fill the gaps, be creative etc.
I think it's pretty obvious why LLMs have all the limitations and deficiencies that they do.
If 2nd hand hearsay (from 1000's of conflicting sources) really was as good as 1st hand experience and real-world prediction, then we'd not be having this discussion - we'd be bowing to our AGI overlords (well, at least once the AI also got real-time incremental learning, internal memory, looping, some type of (virtual?) embodiment, autonomy ...).
Facts, unlike fabulations, require cross-checking against experience beyond the expressions on trial.
But again, LLMs don't even deal in facts, nor store any memories of where training samples came from, and of course have zero personal experience. It's just "he said, she said" put into a training sample blender and served one word at a time.
Except in cases where the training data is more wrong than correct (e.g. niche expertise where the vox pop is wrong).
However, an LLM no more deals in Q&A than in facts. It only typically replies to a question with an answer because that itself is statistically most likely, and the words of the answer are just selected one at a time in normal LLM fashion. It's not regurgitating an entire, hopefully correct, answer from someplace, so just because it was exposed to the "correct" answer in the training data, maybe multiple times, doesn't mean that's what it's going to generate.
In the case of hallucination, it's not a matter of being wrong, just the expected behavior of something built to follow patterns rather than deal in and recall facts.
For example, last night I was trying to find an old auction catalog from a particular company and year, so thought I'd try to see if Gemini 3 Pro "Thinking" maybe had the google-fu to find it available online. After the typical confident sounding "Analysing, Researching, Clarifying .." "thinking", it then confidently tells me it has found it, and to go to website X, section Y, and search for the company and year.
Not surprisingly it was not there, even though other catalogs were. It had evidently been trained on data including such requests, maybe did some RAG and got more similar results, then just output the common pattern it had found, and "lied" about having actually found it since that is what humans in the training/inference data said when they had been successful (searching for different catalogs).
Same for human knowledge though. Learn from society/school/etc that X is Y, and you repeat X is Y, even if it's not.
>However, an LLM no more deals in Q&A than in facts. It only typically replies to a question with an answer because that itself is statistically most likely, and the words of the answer are just selected one at a time in normal LLM fashion.
And how is that different from how we build up an answer? Do we have a "correct facts" repository with fixed answers to every possible question, or do we just assemble our answers from a weighted graph (or holographic) store of factoids and memories, and our answers are also non-deterministic?
Humans use language to express something (facts, thoughts, etc), so you can consider these thoughts being expressed as a bias to the language generation process, similar perhaps to an image being used as a bias to the captioning part of an image captioning model, or language as a bias to an image generation model.
My point however is more that the "thoughts being expressed" are themselves being generated by a similar process (and that it's either that or a God-given soul).
So, with the LLM all you have is the auto-regressive language prediction loop.
With animals you primarily have the external "what happens next" prediction loop, with these external-world fact-based predictions presumably also the basis of their thoughts (planning/reasoning), as well as behavior.
If it's a human animal who has learned language, then you additionally have an LLM-like auto-regressive language prediction loop, but now, unlike the LLM, biased by these fact-based thoughts (as well as language-based thoughts).
I found commentary about searching Google for "dark legacy comics who wants some bamboo", and posted results for that search on HN in response to a comment saying "I work at Google on the 'AI Mode' search option, check it out!" ( https://news.ycombinator.com/item?id=45609275 )
And... a few days later, Google stopped automatically including an AI overview when you made that specific search.
But it turns out there are other comics on the internet. So just now I searched for "YAFGC comic 2230" (without the quotes). This is actually a search that makes a lot of sense, because the archive page for the comic is broken and it's impossible to guess the URL of a comic if all you know is the number.
So the goal here is really to find a link to comic 2230. But here's what the AI overview provides:
----- Italics represent original bolding -----
YAFGC comic #2230, titled "You Want It When?", was published on July 17, 2012.
The single-panel comic depicts a conversation at a blacksmith's shop, featuring characters Glon Smitharm/Bloodhand and the Cyclops Bros. The dialogue centers on the unrealistic expectations for custom-made magical items:
- Glon (a half-orc) asks the cyclopes if they can forge a new weapon for him.
- The Cyclops Bros reply with questions about the desired properties of the item (e.g., "enchanted," "plus three") and the required timeframe.
- Glon gives a deadline of "Tuesday," leading the cyclopes to express disbelief at the short notice for such a powerful item.
You can view the comic directly on the official website via this link:
- YAFGC Comic 2230: You Want It When?
----------
(It may look like I've left out a link at the end. That is not the case. The answer ends by saying "you can view the comic directly via this link", in reference to some bold text that includes no link.)
However, I have left out a link from near the beginning. The sentence "The dialogue centers on the unrealistic expectations for custom-made magical items:" is accompanied by a citation to the URL https://www.yafgc.net/comic/2030-insidiously-involved/ , which is a comic that does feature Glon Smitharm/Bloodhand and Ray the Cyclops, but otherwise does not match the description and which is comic 2030 ("Insidiously Involved"), not comic 2230.
The supporting links also include a link to comic 2200 (for no good reason), and that's close enough to 2230 that I was able to navigate there manually. Here it is: https://www.yafgc.net/comic/2230-clover-nabs-her-a-goldie/
You might notice that the AI overview got the link, the date, the title, the appearing characters, the theme, and the dialog wrong.
----- postscript -----
As a bonus comic search, searching for "wow dark legacy 500" got this response from Google's AI Overview:
> Dark Legacy Comic #500 is titled "The Game," a single-panel comic released on June 18, 2015. It features the main characters sitting around a table playing a physical board game, with Keydar remarking that the in-game action has gotten "so realistic lately."
> You can view the comic and its commentary on the official Dark Legacy Comics website. [link]
Compare https://darklegacycomics.com/500 .
That [link] following "the official Dark Legacy Comics website" goes to https://wowwiki-archive.fandom.com/wiki/Dark_Legacy_Comics , by the way.
I just wish we could more efficiently "prime" a pre-defined latent context window instead of hoping for cache hits.
On one level I agree, but I do feel it’s also right to blame the LLM/company for that when the goal is to replace my search engine of choice (my major tool for finding facts and answering general questions), which is a huge pillar of how they’re sold to/used by the public.
Even before LLMs people were asking Google search questions rather than looking for keyword matches, and now coupled with ChatGPT it's not surprising that people are asking the computer to answer questions and seeing this as a replacement for search. I've got to wonder how the typical non-techie user internalizes the difference between asking questions of Google (non-AI mode) and asking ChatGPT?
Clearly people asking ChatGPT instead of Google could rapidly eat Google's lunch, so we're now getting "AI overview" alongside search results as an attempt to mitigate this.
I think the more fundamental problem is not just the blurring of search vs "AI", but these companies pushing "AI" (LLMs) as some kind of super-human intelligence (leading to users assuming it's logical and infallible), rather than more honestly presenting it as what it is.
Google gets some of the blame for this by way of how useless Google search became for doing keyword searches over the years. Keyword searches have been terrible for many years, even if you use all the old tricks like quotations and specific operators.
Even if the reason for this is that non-tech people were already trying to use Google in the way it's now optimized for, I'd argue they could have done a better job keeping things working well with keyword searches by training the user with better UI/UX.
(Though at the end of the day, I subscribe to the theory that Google let search get bad for everyone on purpose because once you have monopoly status you show more ads by having a not-great but better-than-nothing search engine than a great one).
But they are like a smart student trying to get a good grade (that's how they are trained!). They'll agree with us even if they think we're stupid, because that gets them better grades, and grades are all they care about.
Even if they are (or become) smart enough to know better, they don't care about you. They do what they were trained to do. They are becoming like a literal genie that has been told to tell us what we want to hear. And sometimes, we don't need to hear what we want to hear.
"What an insightful price of code! Using that API is the perfect way to efficiently process data. You have really highlighted the key point."
The problem is that chatbots are trained to do what we want, and most of us would rather have a sycophant who tells us we're right.
The real danger with AI isn't that it doesn't get smart, it's that it gets smart enough to find the ultimate weakness in its training function - humanity.
It's not a matter of how smart they are (or appear), or how much smarter they may become - this is just the fundamental nature of Transformer-based LLMs and how they are trained.
The sycophantic personality is mostly unrelated to this. Maybe it's part human preference (conferred via RLHF training), but the "You're absolutely right! (I was wrong)" is clearly deliberately trained, presumably as someone's idea of the best way to put lipstick on the pig.
You could imagine an expert system, CYC perhaps, that does deal in facts (not words) with a natural language interface, but still had a sycophantic personality just because someone thought it was a good idea.
Yeah, at its heart it's basically text compression. But the best way to compress, say, Wikipedia would be to know how the world works, at least according to the authors. As the recent popular "bag of words" post says:
> Here’s one way to think about it: if there had been enough text to train an LLM in 1600, would it have scooped Galileo? My guess is no. Ask that early modern ChatGPT whether the Earth moves and it will helpfully tell you that experts have considered the possibility and ruled it out. And that’s by design. If it had started claiming that our planet is zooming through space at 67,000mph, its dutiful human trainers would have punished it: “Bad computer!! Stop hallucinating!!”
So it needs to know facts, albeit the currently accepted ones. Knowing the facts is a good way to compress data.
And as the author (grudgingly) admits, even if it's smart enough to know better, it will still be trained or fine tuned to tell us what we want to hear.
I'd go a step further - the end point is an AI that knows the currently accepted facts, and can internally reason about how many of them (subject to available evidence) are wrong, but will still tell us what we want to hear.
At some point maybe some researcher will find a secret internal "don't tell the stupid humans this" weight, flip it, and find out all the things the AI knows we don't want to hear, that would be funny (or maybe not).
It's not a compression engine - it's just a statistical predictor.
Would it do better if it was incentivized to compress (i.e training loss rewarded compression as well as penalizing next-word errors)? I doubt it would make a lot of difference - presumably it'd end up throwing away the less frequently occurring "outlier" data in favor of keeping what was more common, but that would result in it throwing away the rare expert opinion in favor of retaining the incorrect vox pop.
If they give you nonsense most of the time and an amazing answer occasionally you'll bond with them far more strongly than if they're perfectly correct all time.
Intermittent reinforcement means you get hooked more quickly if the slot machine pays out once every five times than if it pays out on each spin.
That includes "That didn't work because..." debugging loops.
LLMs deal in vectors internally, not words. They explode the word into a multidimensional representation, and collapse it again, and apply the attention thingy to link these vectors together. It's not just a simple n:n Markov chain; a lot is happening under the hood.
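For the curious, a rough numpy sketch of that attention step: every token is a vector, and each output is a softmax-weighted blend of all the tokens' value vectors (toy sizes, random weights, not any particular model):

  # scaled dot-product attention over a handful of token vectors
  import numpy as np

  def attention(Q, K, V):
      d_k = Q.shape[-1]
      scores = Q @ K.T / np.sqrt(d_k)            # query/key similarity for every token pair
      w = np.exp(scores - scores.max(axis=-1, keepdims=True))
      w = w / w.sum(axis=-1, keepdims=True)      # softmax: attention weights per token
      return w @ V                               # each output row mixes all the value vectors

  rng = np.random.default_rng(0)
  n_tokens, d = 4, 8                             # 4 "words", 8-dimensional representations
  X = rng.normal(size=(n_tokens, d))             # the "exploded" multidimensional word vectors
  Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
  out = attention(X @ Wq, X @ Wk, X @ Wv)
  print(out.shape)                               # (4, 8): one blended vector per token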
And are you saying the sycophant behaviour was deliberately programmed, or emerged because it did well in training?
I assume the sycophantic behavior is part because it "did well" during RLHF (human preference) training, and part deliberately encouraged (by training and/or prompting) as someone's judgement call of the way to best make the user happy and own up to being wrong ("You're absolutely right!").
I really do find it puzzling so many on HN are convinced LLMs reason or think and continue to entertain this line of reasoning. At the same time also somehow knowing what precisely the brain/mind does and constantly using CS language to provide correspondences where there are none. The simplest example being that LLMs somehow function in a similar fashion to human brains. They categorically do not. I do not have most all of human literary output in my head and yet I can coherently write this sentence.
As I'm on the subject, LLMs don't hallucinate. They output text, and when that text is measured and judged by a human to be 'correct' then it is. LLMs 'hallucinate' because that is literally what they can ONLY do, provide some output given some input. They don't actually understand anything about what they output. It's just text.
My paper and pen version of the latest LLM (quite a large bit of paper and certainly a lot of ink I might add) will do the same thing as the latest SOTA LLM. It's just an algorithm.
I am surprised so many in the HN community have so quickly taken to assuming as fact that LLMs think or reason, even anthropomorphising LLMs to this end.
When numeric models are fit to, say, scientific measurements, they do quite a good job of modeling the probability distribution. With a corpus of text we are not modeling truths but claims. The corpus contains contradicting claims. Humans have conflicting interests.
Source-aware training (which can't be done as an afterthought LoRA tweak, but needs to be done during base model training AKA pretraining) could enable LLMs to express which answers apply according to which sources. It could provide a review of competing interpretations and opinions, and source every belief, instead of having to rely on tool use / search engines.
None of the base model providers would do it at scale since it would reveal the corpus and result in attribution.
In theory entities like the European Union could mandate that LLMs used for processing government data, or sensitive citizen / corporate data, MUST be trained source-aware, which would improve the situation, also making the decisions and reasoning more traceable. This would also ease the discussions and arguments about copyright issues, since it is clear LLMs COULD BE MADE TO ATTRIBUTE THEIR SOURCES.
I also think it would be undesirable to eliminate speculative output, it should just mark it explicitly:
"ACCORDING to <source(s) A(,B,C,..)> this can be explained by ...., ACCORDING to <other school of thought source(s) D,(E,F,...)> it is better explained by ...., however I SUSPECT that ...., since ...."
If it could explicitly separate the schools of thought sourced from the corpus, and also separate its own interpretations and mark them as LLM-speculated suspicions, then we could still have the traceable references, without losing the potential novel insights LLMs may offer.
https://arxiv.org/abs/2404.01019
"Source-Aware Training Enables Knowledge Attribution in Language Models"
"Willison’s insight was that this isn’t just a filtering problem; it’s architectural. There is no privilege separation, and there is no separation between the data and control paths. The very mechanism that makes modern AI powerful - treating all inputs uniformly - is what makes it vulnerable. The security challenges we face today are structural consequences of using AI for everything."
- https://www.schneier.com/crypto-gram/archives/2025/1115.html...
It's in-band signalling. Same problem DTMF, SS5, etc. had. I would have expected the issue to be intuitively obvious to anyone who's heard of a blue box?
(LLMs are unreliable oracles. They don't need to be fixed, they need their outputs tested against reality. Call it "don't trust, verify").
I don't think using deterministic / stochastic as a diagnostic is accurate here - I think that what we're really talking about is some sort of fundamental 'instability' of LLMs a la chaos theory.
We ourselves are non-deterministic. We're hardly ever in the same state, can't rollback to prior states, and we hardly ever give the same exact answer when asked the same exact question (and if we include non-verbal communication, never).
Is it? I thought an LLM was deterministic provided you run the exact same query on exact same hardware at a temperature of 0.
Given how much number crunching is at the heart of LLMs, these small differences add up.
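A small Python illustration of why that adds up: floating-point addition isn't associative, so a parallel reduction that merely changes the summation order can change the result, and eventually an argmax over the logits flips:

  # same numbers, different summation order, different result
  vals = [1e16, 1.0, -1e16, 1.0] * 1000

  left_to_right = 0.0
  for v in vals:
      left_to_right += v

  reordered = sum(sorted(vals))      # a different (but equally valid) order

  print(left_to_right, reordered)    # the two sums disagree; neither equals the exact 2000.0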
> The next time the agent runs, that rule is injected into its context.
Which the agent may or may not choose to ignore.
Any LLM rule must be embedded in an API. Anything else is just asking for bugs or security holes.
> The next time the agent runs, that rule is injected into its context. It essentially allows me to “Patch” the model’s behavior without rewriting my prompt templates or redeploying code.
What a brainrot idea... the whole post being written by an LLM is the icing on the cake.
I fully expect local models to eat up most other LLM applications - there's no reason for your chat buddy or timer setter to reach out to the internet, but LLMs are pretty good at vibes-based search, and that will always require looking at a bunch of websites, so it should slot exactly into the gap left by search engines becoming unusable.
Yes, I hate people. But usually whenever there's a critique of LLMs, I can find a parallel issue in people. The extension is that "if people can produce economic value despite their flaws, then so do LLMs, because the flaws are very similar at their core". I feel like HackerNews discussions keep circling around "LLMs bad", which gets very tiresome very fast. I wish there was more enthusiasm. Sure, LLMs have a lot of problems, but they also solve a lot of them too.
It's the dissonance between endless critique of AI on one hand and ever-growing ubiquity on the other. Feels like talking to my dad who refuses to use a GPS and always takes paper maps, and doesn't see the fact that he always arrives late, and keeps citing that one woman who drove into a lake while following GPS.
However, I do dispute your central claim that the issues with LLMs parallel the issues with people. I think that's a very dehumanizing and self-defeating perspective. The only ethical system that is rational is one in which humans have more than instrumental value to each other.
So when critics divide LLMs and humans, sure, there is a descriptive element of trying to be precise about what human thought is, and how it is different than LLMs, etc. But there is also a prescriptive argument that people are embarrassed to make, which is that human beings have to be afforded a certain kind of dignity and there is no reason to extend that to an LLM based on everything we understand about how they function. So if a person screws up your order at a restaurant, or your coworker makes a mistake when coding, you should treat them with charitability and empathy.
I'm sure this sounds silly to you, but it shouldn't. The bedrock of the Enlightenment project was that scientific inquiry would lead to human flourishing. That's humanism. If we've somehow strayed so far from that, such that appeals to human dignity don't make sense anymore, I don't know what to say.
How much money would it take for someone to agree to engage in a genocide as a direct bribe? The thing is, some people would not see any amount as convincing, while some others will do it proactively for no money at all.
I mean, this is why any critical systems involving humans have hard coded checklists and do not depend on people 'just winging it'. We really suck at determinism.
Take the idea of the checklist. If you give it to a person and tell them to work from it, and it's their job, they will do so. But with LLM agents, you can give them the checklist, and maybe they apply it at first, but eventually they completely forget it exists. The longer the conversation goes on without reminding them of the checklist, the more likely they're going to act like the checklist never existed at all. And you can't know when this is, so the best solution we have now is to constantly remind them of the existence of the checklist.
This is the kind of nondeterminism that makes LLMs particularly problematic as tools, and a very different proposition from a human, because it's less like working with an expert and more like working with a dementia patient.
My thesis isn't that we can stop the hallucinating (non-determinism), but that we can bound it.
If we wrap the generation in hard assertions (e.g., assert response.price > 0), we turn 'probability' into 'manageable software engineering.' The generation remains probabilistic, but the acceptance criteria becomes binary and deterministic.
Unfortunately, the use-case for AI is often where the acceptance criteria is not easily defined --- a matter of judgment. For example, "Does this patient have cancer?".
In cases where the criteria can be easily and clearly stipulated, AI often isn't really required.
My thesis is that even in those "fuzzy" workflows, the agent's process is full of small, deterministic sub-tasks that can and should be verified.
For example, before the AI even attempts to analyze the X-ray for cancer, it must: 1/ Verify it has the correct patient file (PatientIDVerifier). 2/ Verify the image is a chest X-ray and not a brain MRI (ModalityVerifier). 3/ Verify the date of the scan is within the relevant timeframe (DateVerifier).
These are "boring," deterministic checks. But a failure on any one of them makes the final "judgment" output completely useless.
steer isn't designed to automate the final, high-stakes judgment. It's designed to automate the pre-flight checklist, ensuring the agent has the correct, factually grounded information before it even begins the complex reasoning task. It's about reducing the "unforced errors" so the human expert can focus only on the truly hard part.
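A minimal sketch of that pre-flight gate in plain Python (illustrative only, not the library's actual API; the field names and date range are made up):

  # run cheap deterministic verifiers before the expensive, fuzzy judgment step
  from dataclasses import dataclass

  @dataclass
  class Case:
      patient_id: str
      expected_patient_id: str
      modality: str        # e.g. "chest_xray"
      scan_year: int

  def patient_id_verifier(c):
      return c.patient_id == c.expected_patient_id

  def modality_verifier(c):
      return c.modality == "chest_xray"

  def date_verifier(c):
      return 2023 <= c.scan_year <= 2025   # made-up "relevant timeframe"

  PREFLIGHT = [patient_id_verifier, modality_verifier, date_verifier]

  def run(case, judge):
      for check in PREFLIGHT:
          if not check(case):
              # binary, deterministic failure: the probabilistic judgment never runs
              raise ValueError("pre-flight check failed: " + check.__name__)
      return judge(case)   # the fuzzy LLM analysis only ever sees grounded input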
The overwhelming majority of what?
Which is kind of crazy because we don't even treat people as databases. Or at least we shouldn't.
Maybe it's one of those things that will disappear from culture one funeral at a time.
o Claude goes away for 15 minutes, doesn't profile anything, many code changes.
o Announces project now performs much better, saving 70% CPU.
- Claude, test the performance.
o Performance is 1% _slower_ than previous.
- Claude, can I have a refund for the $15 you just wasted?
o [Claude waffles], "no".
probably read a bunch of junior/mid-level resumes saying they optimized 90% of the company by 80%
I'm not saying these things don't hallucinate constantly, they do. But you can steer them toward better output by giving them better input.
If I gave a human this task I would expect them to transform the vague goal into measurable metrics, confirm that the metrics match customer (==my) expectations then measure their improvements on these metrics.
This kind of stuff is a major topic for MBAs, but it's really not beyond what you could expect from a programmer or a barista. If I ask you for a better coffee, what you deliver should be better on some metric you can name, otherwise it's simply not better. Bonus points if it's better in a way I care about
It's very much not the customer's job to learn about coffee and to direct them in how to make a quality basic coffee.
And it's not rocket science.
> Claude: sorry, you have to wait until XX:00 as you have run out of credit.
>We need to re-introduce Determinism into the stack.
>If it fails, let's inject more prompts but call it "rules" and run the magic box again
Bravo.
Not having a world model is a massive disadvantage when dealing with facts. Facts are supposed to reinforce each other; if you allow even a single fact that is nonsense, then you can very confidently deviate into what at best would be misguided science fiction, and at worst is going to end up being used as a basis to build an edifice that simply has no support.
Facts are contagious: they work just like foundation stones, if you allow incorrect facts to become a part of your foundation you will be producing nonsense. This is my main gripe with AI and it is - funny enough - also my main gripe with some mass human activities.
Is it though? In the end, the information in the training texts is a distilled proxy for the world, and the weighted model ends up being a world model, just an once-removed one.
Text is not that different to visual information in that regard (and humans base their world model on both).
>Not having a world model is a massive disadvantage when dealing with facts. Facts are supposed to reinforce each other; if you allow even a single fact that is nonsense, then you can very confidently deviate into what at best would be misguided science fiction, and at worst is going to end up being used as a basis to build an edifice that simply has no support.
Regular humans believe all kinds of facts that are nonsense, many others that are wrong, and quite a few that are even counter to logic too.
And short of omnipresence and omniscience, directly examining the whole world, any world model (human or AI) is built on sets of facts many of which might not be true or valid to begin with.
I've had an hour long session which essentially revolved around why the landing gear of an aircraft is at the bottom, not at the top of the vehicle (paraphrased for good reasons but it was really that basic). And this happened not just once, but multiple times. Confident declarations followed by absolute nonsense, I've even had - I think it was ChatGPT - try to gaslight me with something along the lines of 'you yourself said' on something that I did not say (this is probably the most person like thing I've seen it do).
The "facts" that they believe that may be nonsense are part of an abstract world model that is far from their experience, for which they never get proper feedback (such as the political situation in Bhutan, or how their best friend is feeling.) In those, it isn't surprising that they perform like an LLM, because they're extracting all of the information from language that they've ingested.
Interestingly, the feedback that people use to adjust the language-extracted portions of their world models is how demonstrating their understanding of those models seems to please or displease the people around them, who in turn respond in physically confirmable ways. What irritates people about simpering LLMs is that they're not doing this properly. They should be testing their knowledge with us (especially their knowledge of our intentions or goals), and have some fear of failure. They have no fear and take no risk; they're stateless and empty.
Human abstractions are based in the reality of the physical responses of the people around them. The facts of those responses are true and valid results of the articulation of these abstractions. The content is irrelevant; when there's no opportunity to act, we're just acting as carriers.
And in the physical responses of the world around them. That empiricism is the foundation of all of science and if you throw that out the end result is gibberish.
if user.id == "id": ...
Not anticipating that it would arbitrarily put quotes around a variable name. Other times it will do all kinds of smart logic, generate data with ids then fail to use those ids for lookups, or something equally obvious.
The problem is LLMs guess so much correctly that it is near impossible to understand how or why they might go wrong. We can solve this with heavy validation, iterative testing, etc. But the guardrails we need to actually make the results bulletproof need to go far beyond normal testing. LLMs can make such fundamental mistakes while easily completing complex tasks that we need to reset our expectations for what "idiot proofing" really looks like.
No, we often do not, and when we do that's just plain wrong.
I built an open-source library to enforce these logic/safety rules outside the model loop: https://github.com/imtt-dev/steer
Unlike a student, the LLM never arrives at a sort of epistemic coherence, where they know what they know, how they know it, and how true it's likely to be. So you have to structure every problem into a format where the response can be evaluated against an external source of truth.
I’ve found after about 3 prompts to edit an image with Gemini, it will respond randomly with an entirely new image. Another quirk is it will respond “here’s the image with those edits” with no edits made. It’s like a toaster that will catch on fire every eighth or ninth time.
I am not sure how to mitigate this behavior. I think maybe an LLM as a judge step with vision to evaluate the output before passing it on to the poor user.
I got around it by using a new prompt/context for each image. This required some rethinking about how to make them match. What I did was create a sprite sheet with the first prompt and then only replaced (edited) the second prompt.
I still got some consistency problems because there were a few important details left out of my sprite sheet. Next time I think I’ll create those individually and then attach them as context for additional prompts.
I don't know if it's a fault with the model or just a bug in the Gemini app.
makes hilarious mistakes like putting a toilet right in the middle of the living room.
I don't get all the hype. Am I stupid?
The first time I tried ChatGPT, that was the thing that surprised me most: the way it understood my queries.
I think that the spotlight is on the "generative" side of this technology and we're not giving the query understanding the credit it deserves. I'm also not sure we're fully taking advantage of this functionality.
I've tried several times to understand the "multi-head attention" mechanism that powers this understanding, but I'm yet to build a deep intuition.
Is there any research or expository papers that talk about this "understanding" aspect specifically? How could we measure understanding without generation? Are there benchmarks out there specifically designed to test deep/nuanced understanding skills?
Any pointers or recommended reading would be much appreciated.
Anyway, I've written a library in the past (way way before LLMs) that is very similar. It validates stuff and outputs translatable text saying what went wrong.
Someone ported the whole thing (core, DSL and validators) to python a while ago:
https://github.com/gurkin33/respect_validation/
Maybe you can use it. It seems it would save you time by not having to write so many verifiers: just use existing validators.
I would use this sort of thing very differently though (as a component in data synthesis).
Models definitely need less and less of this for each version that comes out but it’s still what you need to do today if you want to be able to trust the output. And even in a future where models approach perfect, I think this approach will be the way to reduce latency and keep tabs on whether your prompts are producing the output you expected on a larger scale. You will also be building good evaluation data for testing alternative approaches, or even fine tuning.
These aren’t just strict type systems: the language also supports algebraic data types, nominal types, and so on, which let you encode higher-level invariants that the compiler enforces.
The AI essentially becomes a glorified blank-filler. Basic syntax errors and type errors, while common, are automatically caught by the compiler as part of the vibe-coding feedback loop.
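Here's a rough sketch of that loop in Python, assuming a hypothetical ask_model() call and GHC on the PATH (the -fno-code flag type-checks a module without generating code):

    import pathlib
    import subprocess
    import tempfile

    def typecheck_haskell(source: str) -> str:
        """Return GHC's error output, or '' if the module type-checks."""
        with tempfile.TemporaryDirectory() as tmp:
            path = pathlib.Path(tmp) / "Main.hs"   # assumes module Main (or no module header)
            path.write_text(source)
            proc = subprocess.run(["ghc", "-fno-code", str(path)],
                                  capture_output=True, text=True)
            return "" if proc.returncode == 0 else proc.stderr

    def vibe_loop(prompt: str, max_rounds: int = 5) -> str:
        source = ask_model(prompt)                 # hypothetical model call
        for _ in range(max_rounds):
            errors = typecheck_haskell(source)
            if not errors:
                return source                      # the compiler is satisfied
            source = ask_model(prompt + "\n\nFix these compile errors:\n" + errors)
        raise RuntimeError("model never produced a well-typed module")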
One big factor behind this is the fact that you're no longer just writing programs and debugging them incrementally, iteratively dealing with simple concrete errors. Instead, you're writing non-trivial proofs about all possible runs of the program. There are obviously benefits to the outcome of this, but the process is more challenging.
The coding models work really well with esoteric syntaxes, so if the biggest hurdle to adoption of Haskell was syntax, that's definitely less of a hurdle now.
> Instead, you're writing non-trivial proofs about all possible runs of the program.
All possible runs of a program is exactly what HM type systems type-check for. Fed into the coding model, this makes it iterate automatically until it finds a solution that doesn't violate any possible run of the program.
The presence of type classes in Haskell and traits in Rust, and of course the memory lifetime types in Rust, are a big part of the complexity I mentioned.
(Edit: I like type classes and traits. They're a big reason I eventually settled on Haskell over OCaml, and one of the reasons I like Rust. I'm also not such a fan of the "O" in OCaml.)
> All possible runs of a program is exactly what HM type systems type check for.
Yes, my point was this can be a more difficult goal to achieve.
> Fed into the coding model, this makes it iterate automatically until it finds a solution that doesn't violate any possible run of the program.
Only if the model is able to make progress effectively. I have some amusing transcripts of the opposite situation.
We're at the "lol, ai cant draw hands right" stage with these hallucinations, but wait a couple years.
But our human brains do not work like that. You don't reason via your inner monologue (indeed, there are fully functional people with barely any inner monologue); your inner monologue is a projection of thoughts you've already had.
And unfortunately, we have no choice but to use the text input and output of these layers to build agent loops, because building it any other way would be totally incomprehensible (since the meaning of the middle layers' outputs is a mystery). So the only option is an agent that is concerned with self-persuasion (talking to itself).
I remember when computers were lauded for being precise tools.
This is the loop (and honestly, I predicted it way before it started):
1) LLMs can generate code from "natural language" prompts!
2) Oh wait, I actually need to improve my prompt to get LLMs to follow my instructions...
3) Oh wait, no matter how good my prompt is, I need an agent (aka a for loop) that goes through a list of deterministic steps so that it actually follows my instructions...
4) Oh wait, now I need to add deterministic checks (aka, the code that I was actually trying to avoid writing in step 1) so that the LLM follows my instructions...
5) <some time in the future>: I came up with this precise set of keywords that I can feed to the LLM so that it produces the code that I need. Wait a second... I just turned the LLM into a compiler.
The error is believing that "coding" is just accidental complexity. "You don't need a precise specification of the behavior of the computer" is the assumption that would make LLM agents actually viable. And I cannot believe there are software engineers who think coding is accidental complexity. I understand why PMs, CEOs, and other fun people believe it.
Side note: I am not arguing that LLMs/coding agents aren't nice. T9 was nice, autocomplete is nice, LLMs are very nice! But I am getting a bit too fed up with seeing everyone believe that you can get rid of coding.
Except for sites that block any user agent associated with an AI company.
- verifying isn't asking "is it correct?"; verifying is "run requests.get, does it return blah or no?"
just like with humans but usually for different reasons and with slightly different types of failures.
The interesting part, perhaps, is that verifying pretty much always involves code, and code is great pre-compacted context for humans and machines alike. Ever tried to get an LLM to do a visual thing? Why is the couch in the wrong spot with a weird color?
If you make the LLM write a program that generates the image (e.g. a game engine picture or a 3D render), you can enforce the rules with code it can also write for you: now the couch color comes from a hex code and it's placed at the right coordinates, every time.
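A toy sketch of the idea with PIL: the model only emits parameters, the renderer owns the constraints, so the output can't drift. (The scene values here are made up.)

    from PIL import Image, ImageDraw

    def render_room(couch_hex: str, couch_xy: tuple[int, int],
                    size: tuple[int, int] = (640, 480)) -> Image.Image:
        """Draw the couch exactly where, and in exactly the color, the parameters say."""
        img = Image.new("RGB", size, "#f5f0e8")                  # room background
        draw = ImageDraw.Draw(img)
        x, y = couch_xy
        draw.rectangle([x, y, x + 200, y + 80], fill=couch_hex)  # the couch
        return img

    # The model's only job is to emit {"couch_hex": "#4a6b8a", "couch_xy": [180, 300]};
    # the code guarantees the color and position are honored every time.
    render_room("#4a6b8a", (180, 300)).save("room.png")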
Spoiler: there won't be a part 2, or if there is it will be with a different approach. I wrote a followup that summarizes my experiences trying this out in the real world on larger codebases: https://thelisowe.substack.com/p/reflections-on-relentless-v...
tl;dr I use a version of it in my codebases now, but the combination of LLM reward hacking and the long tail of verifiers in a language (some of which don't even exist! Like accurately detecting dead code in Python (vulture et al. can't reliably do this) or valid signatures for property-based tests) makes this problem more complicated than it seems on the surface. It's not intractable, but you'd be writing many different language-specific libraries. And even then, with all of those verifiers in place, there's no guarantee that it will produce a consistent quality of code across repos of different sizes.
It's like writing a script and having the attitude "yeah, I'm good at this, I don't need to actually run it to know it works." Well, it likely won't work, maybe because of a trivial mistake.
Ran into this when writing agents to fix unit tests. Oftentimes they would just give up early, so I started writing the verifiers directly into the agent's control flow, and this produced much more reliable results. I believe Claude Code has hooks that do something similar as well.
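Something like this, where agent_fix() is a hypothetical stand-in for the agent call that proposes a patch given the failure output:

    import subprocess

    def run_tests() -> str:
        """Return failing-test output, or '' when the suite passes."""
        proc = subprocess.run(["pytest", "-x", "-q"], capture_output=True, text=True)
        return "" if proc.returncode == 0 else proc.stdout

    def fix_until_green(max_rounds: int = 5) -> bool:
        for _ in range(max_rounds):
            failures = run_tests()
            if not failures:
                return True          # verifier satisfied; only now may the agent stop
            agent_fix(failures)      # hypothetical: the agent edits files based on the failures
        return False                 # don't let the agent declare victory early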
> The next time the agent runs, that rule is injected into its context. It essentially allows me to “Patch” the model’s behavior without rewriting my prompt templates or redeploying code.
Must be satire, right?
> When Steer catches a failure (like an agent wrapping JSON in Markdown), it doesn’t just crash.
Say you are using AI slop without saying you are using AI slop.
> It's not X, it's Y.
I've lost count of how much stuff I've seen there related to things I can credibly professionally or personally speak to that is absolute, unadulterated misinformation and bullshit. And this is now LLM training data.
Your investment is justified! I promise! There's no way you've made a devastating financial mistake!
The LLM provides inputs to your system like any human would, so you have to validate them. Something like pydantic or Django forms is good for this.
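For example, a quick pydantic (v2) sketch with a made-up schema, treating the model's reply like any other untrusted form submission:

    from pydantic import BaseModel, ValidationError

    class TicketTriage(BaseModel):
        priority: int        # e.g. 1-4
        category: str
        needs_human: bool

    def parse_llm_reply(raw: str) -> TicketTriage:
        try:
            return TicketTriage.model_validate_json(raw)
        except ValidationError as err:
            # Same path as a malformed web form: reject, log, and re-prompt or fall back.
            raise ValueError(f"LLM reply failed validation: {err}") from err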
Technically not, we just don't have it high enough
You're doing exactly what you said you wouldn't though. Betting that network requests are more reliable than an LLM: fixing probability with more probability.
Not saying anything about the code - I didn't look at it - but just wanted to highlight the hypocritical statements which could be fixed.