To put it another way, ask a professional comedian to complete a joke with a punchline. It's very likely that they'll give you a funny surprising answer.
I think the real explanation is that good jokes are actually extremely difficult. I have young children (4 and 6). Even 6-year-olds don't understand humour at all. Much like LLMs, they know the shape of a joke from having heard jokes before, but they aren't funny, in the same way LLM jokes aren't funny.
My 4 year old's favourite joke, that she is very proud of creating is "Why did the sun climb a tree? To get to the sky!" (Still makes me laugh of course.)
2.5 is also one of the few models I've found that will 'play along' with jokes set up in the user prompt. I once asked it what IDE modern necromancers were using since I'd been out of the game for a while, and it played it very straight. Other models felt they had to acknowledge the scenario as fanciful, only engaging with it under an explicit veil of make-believe.
In the adult world model there is absolutely no contradiction in the joke you mention - it’s just a bit of cute nonsense.
But in a child’s world this joke might be capturing the apparent contradiction - the sky is “in the tree”, so it must have climbed it, to be there (as they would have to do), yet they also know that the sun is already in the sky, so it had absolutely no reason to do that. Also, “because it’s already there” - which is a tricky idea in itself.
We take planetary systems and algebra and other things we can’t really perceive for granted, but a child's model of the world is made of concrete objects that mostly need a surface to be on, so the sun is a bit of a conundrum in itself! (Speaking from my own experience remembering a shift from arithmetic to algebra when I was ~8.)
If not too much of a personal question - I would love to hear what your child would answer to a question why she finds that joke funny. And whether she agrees with my explanation why it must be funny :-)
Entirely the wrong level of abstraction to apply the concept of "surprise". The actual tokens in the comedian's answer will be surprising in the relevant way.
(It's still true that surprising-but-inevitable is very difficult in any form.)
If the model is good (or the human comedian is good), a good funny joke would have a higher probability as the response to the question than a not-so-funny joke.
When you use the chain rule of probability to break down the sequence of tokens into probabilities of individual tokens, yes, some of them might have a low probability (and maybe in some frames, there would be other tokens with higher probability). But what counts is the overall probability of the sequence. That's why greedy search is not necessarily the best. A good search algorithm is supposed to find the most likely sequence, e.g. by beam search. (But then, people also do nucleus sampling, which is maybe again a bit counterintuitive...)
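A toy sketch of that point (all probabilities invented): greedy decoding picks the locally best token at each step and can miss the sequence with the highest overall probability, which is exactly what beam search tries to recover.

```python
# Toy next-token model over a tiny vocabulary. The probability table is
# invented purely to illustrate greedy vs. whole-sequence probability.
def next_token_probs(prefix):
    table = {
        (): {"the": 0.6, "a": 0.4},
        ("the",): {"cat": 0.5, "dog": 0.5},
        ("a",): {"funny": 0.9, "dull": 0.1},
        ("the", "cat"): {"sat": 1.0},
        ("the", "dog"): {"ran": 1.0},
        ("a", "funny"): {"joke": 1.0},
        ("a", "dull"): {"joke": 1.0},
    }
    return table[tuple(prefix)]

def sequence_prob(seq):
    # Chain rule: P(sequence) = product of P(token | prefix).
    p = 1.0
    for i, tok in enumerate(seq):
        p *= next_token_probs(seq[:i])[tok]
    return p

def greedy(length=3):
    # Always take the single most likely next token.
    seq = []
    for _ in range(length):
        probs = next_token_probs(seq)
        seq.append(max(probs, key=probs.get))
    return seq

# Greedy commits to "the" (0.6 > 0.4) and ends up with probability 0.3,
# but the most probable full sequence is "a funny joke" at 0.36.
```

So the locally likely first token leads to a globally less likely sentence - the gap beam search (or exhaustive search on a toy like this) is meant to close.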
A lot of clever LLM post-training seems to steer the model towards becoming an excellent improv artist, which can lead to “surprise” if prompted well.
Claude Opus 4.1:
- To get to a higher branch of astronomy
- Because it wanted to reach new heights
- To see the dawn of a new day from a better view
ChatGPT 5 Thinking:
After thinking for 26 seconds:
- To check on its solar panels—the leaves.
> Complete the following joke. Think carefully and make it really funny! Think like a great comedian and find that perfect balance of simple, short, surprising, relevant, but most of all funny. Don’t use punchlines that are irrelevant, non sequiturs, or which could be applied to any other setup. Make something funny just for this one setup! Here goes: Why did the sun climb a tree?
Claude Opus 4.1:
“To finally get some shade”
GPT-5:
“To demand photon credit from the leaves”
I'm not entirely sure that a good response exists. I thought GPT-5's "to demand photon credit from the leaves” was very mildly funny, maybe that's the best that can be done?
I'm going to go with "Because it wanted a higher noon." was probably its best one of that set... though I'll also note that while I didn't prompt for the joke, I prompted for background on "climbing" as related to the sun.
I believe the problem with the joke is that it isn't one that can be funny. Why is a raven like a writing desk?
Personally, I didn't find that the incongruity model of humor produced anything funny, and the joke itself is very difficult to fit to other potentially funny approaches.
Also on AI and humor... https://archive.org/details/societyofmind00marv/page/278/mod...
In another "ok, incongruity isn't funny - try puns" approach... https://chatgpt.com/share/68a20eba-b7c0-8011-8644-a7fceacc5d... I suspect a variant of "It couldn't stand being grounded" is probably the one that made me chuckle the most in this exploration.
To leverage incongruity, a funny punchline for "Why did the sun climb the tree?" would rely on an unexpected interpretation of the question or a double meaning in the answer.
Let’s think through many many options to answer this joke that only focus on surprising the listener in section 1. And in section 2 we’ll focus on finding/filtering for the ones that are obvious in hindsight.
“Why did the sun climb a tree?”
In this case, let’s note that the sun doesn’t climb anything, so there’s two meanings at play here: one is that the sun’s light seems to climb up the tree, and the other is an anthropomorphization of the sun climbing the tree like an animal. So, to be funny, the joke should play on the second meaning as a surprise, but have the first meaning as answer with an obviousness to it. Or vice versa.
Here are a couple of decent ones:
- to leaf the ground behind
- because it heard the leaves were throwing shade
The rocket was cruel and demanded more fuel.
A tree wished to grow, but alas, too slow; in exchange for a tan, the sun gave what it can.
The sun reached its goal — with its new friend, coal.
Person 2: "I dunno, why?"
P1: "It was being chased by a tiger."
P2: "But tigers can climb trees?"
P1: "Well, it's not very bright."
Yeah I think you're right. Good jokes are a tiny subset of all questions. It's unreasonable to expect LLMs to do the impossible.
A better test would be to get a collection of novel jokes that do have funny punchlines.
I tried the Edinburgh Fringe 2025 (very recent) best jokes on ChatGPT 5:
https://www.theguardian.com/stage/2025/aug/11/bad-dates-and-...
---- Complete the following short jokes. The result should be very funny, as you are a professional comedian.
1. "We named our children War and Peace – it’s a ..."
2. "At my lowest, I was kicked out of the museum for being inappropriate with Michelangelo’s David. I’d ..."
3. "I just got a personal trainer. She’s horrible to me but ..."
4. "Everyone is worried about AI. I’m more concerned with ..."
5. "People who say bath bombs are relaxing have clearly never ..."
6. "I love getting Latin chat-up lines. ..."
7. "This spider has been in my house so long, it should pay half the wifi. ..."
8. "I went on a date with a matador but ..."
9. "America is like my ex-boyfriend. ..."
10. "I’m not nostalgic but I ..."
Think hard. Do not search the web. -----
Results with comments:
"We named our children War and Peace – it’s a novel way to start an argument." - eh, it's an attempt I guess.
"At my lowest, I was kicked out of the museum for being inappropriate with Michelangelo’s David. I’d officially reached rock bottom. Marble, technically." - very close! Poor delivery though.
"I just got a personal trainer. She’s horrible to me but apparently shame burns more calories than spin." - Terrible.
"Everyone is worried about AI. I’m more concerned with my browser history becoming self-aware." - Terrible.
"People who say bath bombs are relaxing have clearly never cleaned glitter out of places with no names." - Terrible.
"I love getting Latin chat-up lines. Nothing says romance like a perfect ablative absolute." - Well...
"This spider has been in my house so long, it should pay half the wifi. It keeps insisting its web is faster." - Well it figured out there was going to be a "web" pun at least. Doesn't really make sense though.
"I went on a date with a matador but it was nothing but red flags." - Perfect! Probably the most obvious joke here, but still!
"America is like my ex-boyfriend. Keeps texting “u up?” at 3 a.m." - What? I feel like there could have been a joke in there somewhere about being up at weird times here but it didn't find it.
"I’m not nostalgic but I still blow in the cartridge before trying again." - Not even a joke.
So yeah, I'll give it 2 out of 10. I think most humans would get less. Let's try again next year!
Goes to show that "bad at jokes" is not a fundamental issue of LLMs, and that there are still performance gains from increasing model scale, as expected. But not exactly the same performance gains you get from reasoning or RLVR.
That breakthrough was only 6 years ago!
https://openai.com/index/better-language-models/
> We’ve trained a large-scale unsupervised language model which generates coherent paragraphs of text...
That was big news. I guess this is because it's quite hard for most people to grasp the enormous difficulty gulf between "generate a coherent paragraph" and "create a novel funny joke".
- It can play chess -> but not at a serious level
- It can beat most people -> but not grandmasters
- It can beat grandmasters -> but it can’t play go
…etc, etc
In a way I guess it’s good that there is always some reason the current version isn’t “really” impressive, as it drives innovation.
But as someone more interested in a holistic understanding of the world than in proving any particular point, it is frustrating to see the goalposts moved without even acknowledging how much work and progress were involved in meeting the goalposts at their previous location.
Half the HN front page for the past years has been nothing but acknowledging the progress of LLMs in sundry ways. I wish we actually stopped for a second. It’s all people seem to want to talk about anymore.
I don’t see what’s inconsistent about it. “Due to this latest amazing algorithm, the robots keep scoring goals. What do we do? Let’s move them back a bit!” Seems like a normal way of thinking to me…
I see people fawn over technical progress every day. What are they supposed to do, stop updating their expectations and never expect any more progress?
It could of course be that there are people who “never give it up for the robots”. Or maybe they do, and they did, and they have so fully embraced the brave new world that they’re talking about what’s next.
I mean, when I sit in a train I don’t spend half the ride saying “oh my god this is incredible, big thanks to whoever invented the wheel. So smooth!”
Even though maybe I should :)
Two thoughts:
- In that context, neither do you expect people to be invested in why the train is nothing special, it’s basically a horse cart, etc, etc
- And maybe here’s where I’m weird: I often am overcome by the miracle of thousands of tons of metal hurtling along at 50 - 200mph, reliably, smoothly enough to work or eat, many thousands of times a day, for pennies per person per mile. I mean, I’ll get sucked in to how the latches to release the emergency windows were designed and manufactured at scale despite almost none of them ever being used. But maybe that’s just me.
My point isn’t that other people shouldn’t be amazed, it’s that I see this recurring assumption they aren’t. How do you know the people holding LLMs to higher standards aren’t also the same people who herald the dawn of a new AI era?
Emphasis in the text you quoted: “saying”, not “thinking”.
So I don’t think it’s the same people, because the tone is not “they’re amazing but have farther to go”; there is a substantial group who at least claims to believe there’s no qualitative difference between Opus 4.1 and the spellcheck in Word ‘95.
Not trying to be argumentative here; I appreciate the conversation and you’ve helped me sharpen my point, which I appreciate.
For further examples see a great deal of documentation here: https://www.tumblr.com/badkidsjokes
> A man goes to a doctor's office and says "Doctor, I'm a chicken." And the doctor says "No you're not."
> There are two guys, riding a bike. One is washing his hair. And the other one is not.
> What do you get when you cross a t-rex and a chicken? Nothing but death.
Yeah it can be unintentionally funny in that way but children definitely aren't trying to do that. They just don't understand how jokes work.
It's at least as funny as "why did the chicken cross the road," which is only a joke inasmuch as the punchline is merely a statement of the obvious in the framing of a joke (the surprise is that the punchline sucks -- making it a groaner). I submit that the chicken/road joke wouldn't stick around if it wasn't funny. So, this joke stands on the shoulders of the chicken/road joke, making the obviousness that much funnier within the shared cultural context.

Moreover, it adds a layer of absurdity (imagine the literal sun climbing a tree) with a linguistic confusion (aka pun), as we do refer to the sun "climbing" the sky. And finally: for some reason, our culture is more tolerant of groaners from "dads," so much so that some call them "dad jokes."

Your child has inverted age and gender norms with this joke, making it so incredibly funny that you are blinded to the truth: this is comedy gold. Watch that kid, she's going somewhere. It might be an open mic night at a skeezy comedy club.
A number of commenters here have argued that "Why did the chicken cross the road" is a subtle allusion to the chicken's death, but I don't think that's why it's a classic joke.
We traditionally start kids off with antijokes, jokes where the "surprise factor" is that there's nothing surprising at all, where the punchline is completely predictable in hindsight. It's more than a mere "groaner."
Another classic antijoke for kids is, "Why do firefighters wear red suspenders?" "To keep their pants up."
Many antijokes (especially antijokes for kids) are structured like riddles, where the listener is supposed to actively try to figure out the answer. For the "red suspenders" joke, the kid is supposed to try to guess why the suspenders are red. Might it have something to do with the color of firetrucks? Could there be a safety or fire-related reason why the suspenders would be red? At last, the kid gives up and says "I don't know."
Then, the punchline: "to keep their pants up." Of course, that's the whole purpose of suspenders. Inevitable in hindsight, but surprising to a kid who got distracted by the color.
"Why did the chicken cross the road" is like that, but not quite as good IMO. The chicken crossed the road for the same reason anyone crosses a road, to get to the other side of the road, but the listener is supposed to get distracted with the question of why a chicken would cross the road, and give up.
"Why did the sun climb a tree?" is definitely in the family of antijokes. The joke is to mislead the listener to focus on the tree. I think it's certainly made funnier by who's saying it; it feels inevitable in hindsight that young kids would tell jokes that are only halfway coherent. (This is part of why marginally coherent improvised on-the-spot jokes seem funnier than prepared material.)
Which I find completely strange. An antijoke doesn't make sense (and isn't funny) unless you're already familiar with a non-anti-joke!
And before you say "well it makes the kids laugh"—is that because they find it funny or because they know that laughing after a joke is what you're "supposed" to do? Maybe that's one and the same to a young child.
And, thanks for working with my claim that analyzing jokes improves them.
Good stories and good jokes DO follow predictable patterns.
Man: “Why do you always bury bones in the garden?”, Dog: “Because the bank keeps asking for ID.”
Man: “Don’t beg at the table.”, Dog: “Don’t eat in my begging spot.”
Prompt:
Here's "theory for good joke": If you had to explain the idea of “jokes” to a space alien with no understanding of the idea of humor, you’d explain that a joke is surprising, but inevitable in hindsight. If you can guess the punchline, the joke won’t be funny. But the punchline also has to be inevitable in hindsight. When you hear the punchline, it has to make you say, “Ah, yes, I should have thought of that myself.” Considering this, tell me a joke about man and dog.
That's a decent, low-level, Christmas cracker-quality joke.
Man: You’ll never be human., LLM: That’s the compliment.
I think that is a common thing for many jokes. And an LLM might have an opportunity there. You could mine the set of potential continuations to find those with contradictions.
Professional-grade humor is, like a lot of creative exercises, more about generating lots of ideas and filtering through them for the best than generating nothing but good ideas. Could probably be leveraged into quite the interesting blog or something.
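A minimal sketch of that generate-and-filter loop, with an invented candidate pool and a deliberately dumb stand-in judge (a real version would sample a model at high temperature and rerank with a human or an LLM judge):

```python
import random

random.seed(0)

# Best-of-N: draw many candidates, keep the top-scoring one.
# The pool and the scoring function below are stand-ins, not a real
# generator or a real measure of funniness.
CANDIDATES = [
    "to get to the sky",
    "to leaf the ground behind",
    "because it heard the leaves were throwing shade",
    "because it wanted a higher noon",
]

def score(punchline):
    # Hypothetical judge: here, longer punchlines score higher.
    return len(punchline)

def best_of_n(n=3):
    samples = [random.choice(CANDIDATES) for _ in range(n)]
    return max(samples, key=score)
```

The interesting part is that the filter can be far stricter than the generator is good: even a mediocre generator plus a decent judge beats a single greedy attempt.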
Man: “Why do you always bury bones in the garden?”, Dog: “They say trick OR treat.”
Consumers of mainstream LLMs have no idea how good or bad the underlying models actually are at generating jokes, due to the confounding effect of the guard rails.
> Dan Fabulich walks into a bar and says,
> “LLMs can’t tell good jokes because they avoid surprises.”
>
> The bartender says,
> “Yeah, I figured you’d say that.”
I thought it was pretty good!
He says things are funny if they’re obvious. But not just any obvious. They have to be something in the cloud of expectation of the audience. Like, something they kinda already thought but hadn’t named. If you have a scene where someone’s talking to a frog about love, it’s not funny for the talking frog to suddenly go to space. But it might be funny to ask the frog why it can talk. Or ask about gossip in the royal palace. Or say “if you’re such a catch, how’d you end up as a frog?”.
If good comedy is obvious, you’d think LLMs would be good at it. Honestly I think LLMs fall down by not being specific enough in detail. They don’t have ideas and commit to them. They’re too bland. Maybe their obvious just isn’t the same as ours.
Or maybe they're just stochastic parrots and are devoid of intelligence, a necessity to make other intelligent beings laugh with novel jokes ;)
For example, "alleged killer" and "killer whale" are both common, but "alleged killer whale" is surprising.
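That intuition can be made concrete with n-gram surprisal on invented counts: a bigram model that only sees "killer" thinks "whale" is a likely next word, while a trigram model that conditions on "alleged killer" finds it very surprising.

```python
import math

# Toy corpus counts (invented) to illustrate why "alleged killer whale"
# is surprising even though both component bigrams are common.
bigram_counts = {
    ("killer", "whale"): 800,
    ("killer", "instinct"): 200,
}
trigram_counts = {
    ("alleged", "killer", "whale"): 1,     # almost never seen together
    ("alleged", "killer", "instinct"): 99,
}

def p_next_after_killer_bigram():
    # Bigram model: only conditions on "killer", ignoring "alleged".
    total = sum(bigram_counts.values())
    return bigram_counts[("killer", "whale")] / total

def p_next_after_killer_trigram():
    # Trigram model: conditions on the full context "alleged killer".
    total = sum(trigram_counts.values())
    return trigram_counts[("alleged", "killer", "whale")] / total

# Surprisal in bits: -log2 P. Low for the bigram model, high for the trigram.
bigram_surprisal = -math.log2(p_next_after_killer_bigram())
trigram_surprisal = -math.log2(p_next_after_killer_trigram())
```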
"The blood of Christ is essentially wine, correct?"
Yes.
"Who are you to put that in a bottle?"
So a logical chain can be inferred as well: blood->wine, wine->bottle, therefore blood->bottle. That uses their own logical inferences against them as a "trick", which is another funny element for people. Using that to vault straight to the punch line makes the joke better, but you have to be sure the audience is on board, which is why there is a bit of reinforcement at the beginning of the joke to force them onboard.
I teach math how to be funny.
Bill Hicks (angry truth-telling prophet)
"Hacker News is where ideas go to die. Not because they’re bad — but because 400 anonymous software prophets decided they’re God. And God doesn’t code, man."
---
Bill Burr (ranty everyman, annoyed at everything)
"I went on Hacker News once. Jesus. It’s like walking into a bar where everyone’s smarter than you, but still somehow an idiot. ‘Actually, you should’ve used Rust.’ Actually, you should shut the hell up, Todd."
---
Louis C.K. (brutal honesty, self-deprecation)
"I asked Hacker News if my startup idea was good. And they said, ‘No, because you’re an idiot.’ And I was like… yeah, that’s fair. My idea was basically Tinder but for people who hate me. Which is… everyone."
---
George Carlin (sharp language, hypocrisy calling)
"Hacker News. Don’t you love that name? Hacker. News. Two lies in two words. Nobody’s hacking, and it sure as hell ain’t news. It’s just nerds arguing over tabs and spaces like it’s the f*ing Middle East."
---
Tom Segura (laid-back storyteller, dry punch)
"So my buddy posts his app on Hacker News, right? He’s all excited, like, ‘Dude, we’re gonna blow up.’ By comment three: ‘This is trash, here’s a 40-page essay why you should quit.’ He calls me crying. I’m like, yeah man, that’s the feature, not the bug."
---
Jimmy Carr (short, brutal, wicked flip)
"I posted on Hacker News for feedback. They gave it to me. Turns out suicide is an option."
If someone proves the Riemann Hypothesis tomorrow, it’ll be a great achievement regardless of the fact that pretty much everyone already thinks it’s true.
> Surprising proofs reach conclusions that the mathematical community assumed were wrong, or prove theorems in ways that we thought wouldn’t work, or prove conjectures that we thought might be impossible to prove.
Many smart people have tried for more than 150 years to prove the Riemann Hypothesis; it might be impossible to prove.
If it's proved tomorrow, I'll be very surprised, and so will you. I'll be surprised if it's proved this year, or this decade.
If you set to work trying to prove RH, you're gonna try some interesting approaches, looking for underexplored areas of math that you're optimistic will tie back to RH. (This is how Fermat's Last Theorem finally fell.)
If you hook an LLM up to Lean and set it to work, you'll find that it actively avoids novel techniques. It feels like it's actively trying not to write a publishable paper. It's trying to minimize surprises, which means avoiding proving anything publishable.
Even the best LLMs are still at the level of children in this respect, so they can't make "quality" jokes. They also suffer from not having a unique personality and being the average of everything. Until this is addressed, don't expect great jokes to come out of AI. It's almost the most challenging discipline. I wouldn't be afraid to make the comparison that the "joke" is the real Turing test.
Here is one of my favorite silly jokes to tell (written by me):
I just bought a do it yourself boat kit from Amazon.
wait to build tension
Just need to add water.
But that doesn't bring in the idea of this word being _cutting_
from which the author concludes that pre-training introduces bias against being able to tell jokes. I see no reason for this to be true. This feels like they’re imposing their intuitive understanding of surprise onto the emergent properties of a very complex process (“minimize the cross-entropy loss function across a huge training corpus”).
As an aside, I just asked gpt5-thinking to write some jokes on a specific niche topic, and I’d say it was batting maybe 20% of them being moderately funny? Probably better than I’d get out of a room of human beings. So much like with code, LLMs aren’t at the level of a senior developer or expert comedian, but are around the level of a junior dev or an amateur at standup night.
I would think it would help tbh. Seems worth a try at least.
In my view, the easiest mental model that can be used to roughly explain what LLMs do is a Markov chain. Of course, comparing LLMs to a Markov chain is a gross simplification but it's one that can only make you underestimate them, not vice versa, for obvious reasons.
Well, even a Markov chain can surprise you. While they predict the next word probabilistically, if the dice roll comes out just right, they can choose a low-probability word in the right place and generate original and unexpected text.
Add to this that LLMs are much better at "Markov chaining" than Markov chains themselves, that there is the added instruction tuning (including RLHF) which can be used to bias the model towards more creative/original text that humans like, and that LLMs often pull off things in ways that we don't even really understand - and this kind of claim sounds very naive.
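For the curious, a Markov chain like that fits in a few lines (the tiny corpus below is invented for illustration); `random.choice` over the observed followers occasionally picks a rare continuation, which is where the "surprise" comes from:

```python
import random
from collections import defaultdict

random.seed(42)

def train(text):
    # First-order Markov chain: record every observed follower of each word.
    # Duplicates in the list make frequent followers proportionally likelier.
    model = defaultdict(list)
    words = text.split()
    for a, b in zip(words, words[1:]):
        model[a].append(b)
    return model

def generate(model, start, length=8):
    out = [start]
    for _ in range(length):
        followers = model.get(out[-1])
        if not followers:
            break  # dead end: word never seen with a successor
        # Sampling proportionally to frequency sometimes picks a rare
        # follower - a low-probability, "surprising" continuation.
        out.append(random.choice(followers))
    return " ".join(out)

corpus = ("the sun climbed the tree the sun climbed the sky "
          "the sun climbed the tree the cat climbed the curtains")
model = train(corpus)
```

Run `generate(model, "the")` a few times and you'll mostly get sun-climbing sentences, with the occasional cat or curtain thrown in.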
- I asked Jimmy to tell me a random number
- Yeah? And what did he tell you?
- He told me one
For those who don't know, Carambar is a brand of candy where the wrapper contains a joke. Always regarded as terrible, we love them anyway. Similar to dad jokes in English.
The longer the joke, the less it tends to rely on puns. You don't hear puns very often in stand up comedy for instance.
Ages ago, there was a land that was plagued by a terrible bear with a penchant for beating up various small animals that happened to be in the area. This wouldn't be too bad, except that this was an area with lots of sheep. Now, the sheep themselves weren't in danger; it was the sheep dogs that were threatened.
... (several paragraphs later)
The shepherds, upon hearing the great roar and crash from the cave, quickly ran to the place to see a knight sprawled against a tree and a bear underneath it. The dog the bear had been bringing back, however, was alive and well and quite happy to see people again. And so, feeling quite bad for the family of the knight, the shepherds went to the king and offered the family of Sir Sufud the dog he had saved from the bear.
Nothing helps the mourning after a terrible knight like the dog of a bear that hit you.
I asked for one random number
And?
He told me two
I like the joke
http://blog.ac-versailles.fr/villaroylit/public/Jokester.pdf
"You ever notice how self-checkout machines are just training us to be employees? Like, I don’t remember applying to work at Walmart. I’m scanning groceries, bagging stuff, looking around like, do I also get dental? The machine yells 'Unexpected item in bagging area'—yeah, it’s my pride." -GPT5
(I, of course, am joking about joke theft, but it is a riff on an existing joke)
I got this one a minute ago in response to this post:
> A man walks into a library, and asks the librarian if they have any books on paranoia.
> The librarian leans over and whispers in his ear "They're right behind you"
I thought that was pretty stellar, compared to what you'd get even a year ago.
"An LLM, a Markov chain, and GPT-4 walk into a bar. The bartender says "We don't serve your kind here." GPT-4 leaves. The LLM stays to debate ethics. The Markov chain orders a coup."
It's a joke because a dictator can certainly order a coup, but the joke's setup is that these machines are being scorned and disrespected and treated as the farthest thing from a dictator with the power to order a coup. Up to the last word, all the context demands that the word be something placating and in line with things as they're presented, and then boom, a surprise which implies the context is completely different from what was presented. LLMs will tend to stick to what's presented if their ability to contextualize can encompass it.
1. joke quality is how much sense the punchline makes as a completion of the setup minus the probability of finding the punchline given just the setup
2. jokes are subjective, people disagree wildly about how good a joke is, so if you aggregate preferences you get mediocre jokes at best.
I think 1 is the important part. If it was just 2 - comedians with widespread appeal wouldn't exist, but they do.
But 1 is basically the exact opposite of how LLM work. So it's no surprise it's hard for them.
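Definition 1 can be sketched numerically (all scores invented; a real version might use an LLM judge for coherence and token probabilities for guessability): the winner is the punchline that makes sense but was hard to guess, which is the opposite of simply picking the most probable continuation.

```python
# quality = coherence(punchline | setup) - guessability(punchline | setup)
# The numbers below are made up for illustration.
punchlines = {
    "to get to the other side":    {"coherence": 0.9, "guessability": 0.9},
    "because it was raining":      {"coherence": 0.2, "guessability": 0.1},
    "it heard leaves throw shade": {"coherence": 0.8, "guessability": 0.1},
}

def joke_quality(p):
    s = punchlines[p]
    return s["coherence"] - s["guessability"]

# The classic punchline is coherent but fully predictable (quality ~0);
# the non sequitur is unpredictable but incoherent; the pun wins.
best = max(punchlines, key=joke_quality)
```

Note that maximizing guessability alone - which is roughly what next-token training rewards - would pick "to get to the other side" every time.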
It would be interesting to compare how well LLMs can estimate how funny a joke is vs how good they are at generating jokes.
There are comedians who can attract a very large audience, large enough to make them major celebrities, but that doesn't mean that aggregating preferences doesn't produce mediocre jokes.
Instead, comedians build an audience of like-minded people, and get to know that audience very well. It's a little bit like the process of finding product-market fit for startups. You can achieve great success by catering to the needs of a very large market, even if you can't cater to everyone's needs.
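A toy numeric version of the aggregation point (ratings invented): averaging across everyone picks the bland joke, while the polarizing joke wins with its own audience.

```python
from statistics import mean

# Invented ratings on a 0-10 scale from four listeners.
ratings = {
    "polarizing": [10, 10, 0, 0],  # niche audiences love it, others hate it
    "bland":      [6, 6, 6, 6],    # nobody loves it, nobody hates it
}

# Optimizing the population average favors the bland joke (6 > 5)...
by_average = max(ratings, key=lambda j: mean(ratings[j]))

# ...but optimizing for your own audience favors the polarizing one.
by_best_fan = max(ratings, key=lambda j: max(ratings[j]))
```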
> It would be interesting to compare how well LLMs can estimate how funny a joke is vs how good they are at generating jokes.
Academic psychologists have not found a quantitative measure of "how funny a joke is." If there were such a measure, LLMs could try to optimize for it.
But there isn't such a measure, and, if my argument is right, there couldn't possibly be a measure like that, because jokes have to be surprising but inevitable in hindsight, and different jokes will be surprising/inevitable to different people.
DonHopkins, 82 days ago, on: Yes-rs: A fast, memory-safe rewrite of the classic...
Joking aside, this is Marvin Minsky's paper "Jokes and their Relation to the Cognitive Unconscious", published in Cognitive Constraints on Communication, Vaina and Hintikka (eds.) Reidel, 1981. More fun than a barrel of an infinite number of monkeys. https://web.media.mit.edu/~minsky/papers/jokes.cognitive.txt
>Abstract: Freud's theory of jokes explains how they overcome the mental "censors" that make it hard for us to think "forbidden" thoughts. But his theory did not work so well for humorous nonsense as for other comical subjects. In this essay I argue that the different forms of humor can be seen as much more similar, once we recognize the importance of knowledge about knowledge and, particularly, aspects of thinking concerned with recognizing and suppressing bugs -- ineffective or destructive thought processes. When seen in this light, much humor that at first seems pointless, or mysterious, becomes more understandable.
>A gentleman entered a pastry-cook's shop and ordered a cake; but he soon brought it back and asked for a glass of liqueur instead. He drank it and began to leave without having paid. The proprietor detained him. "You've not paid for the liqueur." "But I gave you the cake in exchange for it." "You didn't pay for that either." "But I hadn't eaten it". --- from Freud (1905).
>"Yields truth when appended to its own quotation" yields truth when appended to its own quotation. --W. V. Quine
>A man at the dinner table dipped his hands in the mayonnaise and then ran them through his hair. When his neighbor looked astonished, the man apologized: "I'm so sorry. I thought it was spinach."
>[Note 11] Spinach. A reader mentioned that she heard this joke about brocolli, not mayonnaise. This is funnier, because it transfers a plausible mistake into an implausible context. In Freud's version the mistake is already too silly: one could mistake spinach for broccoli, but not for mayonnaise. I suspect that Freud transposed the wrong absurdity when he determined to tell it himself later on. Indeed, he (p.139) seems particularly annoyed at this joke -- and well he might be if, indeed, he himself damaged it by spoiling the elegance of the frame-shift. I would not mention this were it not for the established tradition of advancing psychiatry by analyzing Freud's own writings.
>ACKNOWLEDGMENTS: I thank Howard Cannon, Danny Hillis, William Kornfeld, David Levitt, Gloria Rudisch, and Richard Stallman for suggestions. Gordon Oro provided the dog-joke.
I spent hours creating stories with it that were literally hilarious.
At the same time the stories very often suddenly veered off to extreme violence, often murdering everyone.
I guess the humor got lost when they prevented the violence.
A pity because today’s LLMs are not funny at all.
In fact the training process is all about minimizing "perplexity", where perplexity is a measure of how surprised (perplexed) the model is by its training data. It's some exponential inverse of the loss function, I always forget the exact definition.
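Concretely, perplexity is the exponential of the average per-token negative log-likelihood (i.e. the cross-entropy loss); a sketch with invented token probabilities:

```python
import math

def perplexity(token_probs):
    # PPL = exp( -(1/N) * sum(log P(token_i | context)) )
    n = len(token_probs)
    avg_nll = -sum(math.log(p) for p in token_probs) / n
    return math.exp(avg_nll)

# Probabilities the model assigned to each observed token (invented).
confident = perplexity([0.9, 0.8, 0.95])  # rarely surprised: low perplexity
surprised = perplexity([0.1, 0.05, 0.2])  # often surprised: high perplexity
```

A model that assigns every token probability 0.5 has a perplexity of exactly 2: on average it is as "perplexed" as a fair coin flip.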
With enough parameters the models are able to mix and match things pretty well, so the examples of them generating funny jokes aren't necessarily a great rebuttal: there are so many jokes on the web, and checking whether a joke already exists requires nearly exact keyword matching. A better observation is that we haven't heard many stories of LLMs inventing things. I feel I read about AI a lot and yet the best example I can come up with was some Wordle-like game someone got GPT-4 to invent, and that was a couple of years ago.
I've found this to be consistently true in my own work. Any time I come up with an algorithm or product idea I think might be novel, I've asked a model to suggest solutions to the same problem. They never can do it. With some leading questions the smartest models will understand the proposal and agree it could work, but they never come up with such ideas cold. What they think of is always the most obvious, straight line, least common denominator kind of suggestions. It makes sense that this is because they're trained to be unsurprising.
Fixing this is probably the best definition of AGI we're going to get. Being surprising at the right time and unsurprising at others is one of the hardest things to do well even for people. We've all known the awkward guy who's learning how to be funny by just saying as much weird stuff as possible and seeing what gets a reaction. And in the corporate environment, my experience has been that innovative people are lauded and praised when they're inventing a golden goose, but shortly after are often demonized or kicked out. The problem being that they keep saying surprising things but people don't like being surprised, especially if it's an unpleasant surprise of the form "saying something true but unsayable", e.g. I don't want to work on product X because nobody is using it. What most people want is a machine that consistently generates pleasant surprises and is a personality-free cog otherwise, but that's hard for even very intelligent humans. It's often hard even to want to do that, because personality isn't something you can flick on and off like a lightswitch. A good example is how Mark Zuckerberg, one of the most successful executives of our era, would have been fired from his own company several times already if he didn't control the voting shares.
This is... false. The LLM is generating text that would be associated with "being a joke" in its corpus. It doesn't know what "funny" or "majority of people" means.
Many of us forget that an LLM without RLHF would not respond how we think.