Three Years from GPT-3 to Gemini 3
338 points | 2 days ago | 23 comments | oneusefulthing.org | HN
mynameisjody
22 hours ago
[-]
Every time I see an article like this, it's always missing the key question: is it any good, is it correct? They always show you the impressive part - "it walked the tricky tightrope of figuring out what might be an interesting topic and how to execute it with the data it had - one of the hardest things to teach."

Then it goes on: "After a couple of vague commands (“build it out more, make it better”) I got a 14 page paper." What I hear is... "I got 14 pages of words." But is it a good paper, one that another PhD would think is good? Is it even coherent?

When I see the code these systems generate within a complex system, I think okay, well that's kinda close, but this is wrong and this is a security problem, etc etc. But because I'm not a PhD in these subjects, am I supposed to think, "Well of course the 14 pages on a topic I'm not an expert in are good"?

It just doesn't add up... Things I understand, it looks good at first, but isn't shippable. Things I don't understand must be great?

reply
stavros
22 hours ago
[-]
It's gotten more and more shippable, especially with the latest generation (Codex 5.1, Sonnet 4.5, now Opus 4.5). My metric is "wtfs per line", and it's been decreasing rapidly.

My current preference is Codex 5.1 (Sonnet 4.5 as a close second, though it got really dumb today for "some reason"). It's been good to the point where I shipped multiple projects with it without a problem (with eg https://pine.town being one I made without me writing any code).

reply
yread
8 hours ago
[-]
I feel it sometimes tries to be overly correct. Like using BigInts when working with offsets in big files in JavaScript. My files are big, but not 53-bits-of-mantissa big. And no file APIs work with BigInts. This was from Gemini 3 Thinking, btw.
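To put numbers on it (a quick hand-written sketch, not Gemini's output): plain Numbers are exact up to Number.MAX_SAFE_INTEGER, which as a byte offset is roughly 9 petabytes, so BigInt buys nothing for any file I'll realistically touch. The 200 GiB figure below is just a made-up example.

    // Plain Numbers are exact integers up to 2^53 - 1 (~9 PB as a byte offset),
    // so BigInt offsets are overkill for realistically sized files.
    const fileSize = 200 * 1024 ** 3;          // hypothetical 200 GiB file
    const offset = fileSize - 4096;            // a read position near the end
    console.log(Number.isSafeInteger(offset)); // true
    console.log(Number.MAX_SAFE_INTEGER);      // 9007199254740991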
reply
gghffguhvc
7 hours ago
[-]
I just whack-a-mole these things in AGENTS.md for a while until it codes more like me.
reply
Sammi
2 hours ago
[-]
Coding LLMs were almost useless for me until my AGENTS.md crossed some threshold of completeness; now they are mostly useful. I now curate multiple markdown files in a /docs folder that I add to the context as needed. Any time the LLM trips on something and we figure it out, I ask it to document its learnings in a markdown doc, and voila, it can do it correctly from then on.
reply
apwell23
4 hours ago
[-]
> https://pine.town

how many prompts did it take you to make this?

how did you make sure that each new prompt didn't break some previous functionality?

did you have a precise vision for it when you started or did you just go with whatever was being given to you?

reply
GoatInGrey
3 hours ago
[-]
Judging by the site, they don't have insightful answers to these questions. It's broken with weird artifacts, errors, and amateurish console printing in PROD.

https://i.ibb.co/xSCtRnFJ/Screenshot-2025-11-25-084709.png

https://i.ibb.co/7NTF7YPD/Screenshot-2025-11-25-084944.png

reply
stavros
36 minutes ago
[-]
I definitely don't have insightful answers to these questions, just the ones I gave in the sibling comment an hour before yours. How could someone who uses LLMs be expected to know anything, or even be human?

Alas, I did not realize I was being held to the standard of having no bugs under any circumstance, and printing nothing to the console.

I have removed the amateurish log entries, I am pitiably sorry for any offense they may have caused. I will be sure to artisanally hand-write all my code from now on, to atone for the enormity of my sin.

reply
boplicity
1 hour ago
[-]
It also doesn't seem to work right now.
reply
stavros
35 minutes ago
[-]
Yeah, all of the above was a single bug in the plot allocation code, the exception that handled the transaction rollback had the wrong name. It's working again.
reply
stavros
4 hours ago
[-]
> how many prompts did it take you to make this?

Probably hundreds, I'd say.

> how did you make sure that each new prompt didn't break some previous functionality?

For the backend, I reviewed the code and steered it to better solutions a few times (fewer than I thought I'd need to!). For the frontend, I only tested and steered, because I don't know much about React at all.

This was impossible with previous models, I was really surprised that Codex didn't seem to completely break down after a few iterations!

> did you have a precise vision

I had a fairly precise vision, but the LLM made some good contributions. The UI aesthetic is mostly the LLM, as I'm not very good at that. The UX and functionality is almost entirely me.

reply
Madmallard
15 hours ago
[-]
It's not really any different in my experience
reply
mirekrusin
13 hours ago
[-]
Stochastic parrot? Autocomplete on steroids? Fancy autocorrect? Bullshit generator? AI snake oil? Statistical mimicry?

You don't hear that anymore.

Feels like a whole generation of skeptics evaporated.

reply
m4nu3l
48 minutes ago
[-]
I think the "stochastic" part is true but useless as a criticism. It can be applied to anyone or anything. Yes, the models give you probabilities, but any algorithm gives you probabilities (just zero or one for deterministic ones). You can definitely view the human mind as a complex statistical model of the world.

Now, that being said, do I think they are as good as a skilled human on most things? No, I don't. My trust issues have increased after the GPT-5 presentation. The very first question was to showcase its "PhD-level" knowledge, and it gave a wrong answer. It just happened to be in a field I know enough about to notice, but most didn't.

So, while I think they can be considered as having some form of intelligence, I believe they have more limits than a lot of people seem to realise.

reply
bigstrat2003
12 hours ago
[-]
I certainly hold those opinions still, because the models still have yet to prove they are anything worth a person's time. I don't bother posting that because there's no way an AI hype person and I are ever going to convince each other, so what's the point?

The skeptics haven't evaporated, they just aren't bothering to try to talk to you any more because they don't think there's value in it.

reply
Glemkloksdjf
9 hours ago
[-]
So you don't even try LLMs regularly?

And what about everything else regarding ML progress, like image generation, 3D world generation, etc.?

I've vibe coded plenty of small things I never had the time for. Don't you have anything you've wanted to build that fits in a single-page HTML application? It can even use local storage, etc.

reply
notachatbot123
5 hours ago
[-]
Maybe your bubble flew away from those voices? I see them all the time, and am glad.
reply
poulpy123
7 hours ago
[-]
still haven't seen anything proving it's not autocomplete on steroids or statistical mimicry
reply
Marazan
3 hours ago
[-]
It is all those things.

The Bitter Lesson is that, with enough VC-subsidised compute, those things are useful.

reply
rkozik1989
3 hours ago
[-]
Those echoes have grown louder over the past year or so. The only way you've heard less of it is if you buried your head in the sand.
reply
deadbabe
1 hour ago
[-]
It is all those things. It consistently fails to make truly novel discoveries, everything it does is derived from something it trained on from somewhere.

No point in arguing about it though with true believers, they will never change their minds.

reply
tempestn
20 hours ago
[-]
Have you tried Gemini 3 yet? I haven't done any coding with it, but on other tasks I've been impressed compared to gpt 5 and Sonnet 4.5.
reply
joegibbs
18 hours ago
[-]
It's very good, but it feels kind of off-the-rails compared to Sonnet 4.5 - at least with Cursor it does strange things like putting its reasoning in comments that are about 15 lines long, deleting 90% of a file for no real reason (especially when context is reaching capacity), and making the same error I just told it not to make.
reply
Culonavirus
11 hours ago
[-]
The computer science field is going to be an absolute shitshow within 5 years (it already kinda is). On one side you'll have ADHD dog attention span zoomers trying out all these nth party model apis and tools every 5 seconds (switching them like socks, insisting the latest one is better, but ultimately producing the same slop) and on the other side you'll have all these applied math gurus squeezing out the last bits of usable AI compute on the planet... and nothing else.

We used to joke that "The internet was a mistake.", making fun of the bad parts... but LLMs take the fucking cake. No intelligent beings, no sentient robots, just unlimited amounts of slop.

The tech basically stopped evolving right around the point of being good enough for spam and slop, but it isn't going any further; there are no cures, no new laws of physics or math, or anything else being discovered by these things. All the AI use in science I can see is based on finding patterns in data, not intelligent thought (as in novel ideas). What a bust.

reply
visarga
11 hours ago
[-]
I don't see a big difference from humans; we say many unreasonable things too, so validation is necessary. Whether you use the internet, books, or AI, it is your job to test their validity. Anything can be bullshit, whether written by a human or an AI.

In fact I fear that humans optimize for attention and cater to the feed-ranking algorithm too much, while AI is at least trying to do a decent job. But with AI it is the user's responsibility to guide it; what the AI does depends on what the user does.

reply
psychoslave
10 hours ago
[-]
There are some major differences, though. Without these tools, individuals are pretty limited in how much bullshit they can output, for many reasons, including that they are not mere digital puppets with no need to survive in society.

It's clear that pro-slavery-minded elitists are happy to sell the line that people should become a "good complement to AI", which is even more disposable than these puppets. But unlike these mindless entities, people have the will to survive deeply engraved as a primary behavior.

reply
ako
8 hours ago
[-]
Humans can output serious amounts of unproven bullshit, e.g., 3000 incompatible gods and all the religions that come with them...
reply
psychoslave
6 hours ago
[-]
Sure, but that's not the raw output of individuals relying on their own direct capacity for utterance.

Now anyone mildly capable of using a computer can produce more fictional characters than all those humanity has collectively kept in its miscellaneous lores, and drown them in an ocean of insipid narratives, all of it nonetheless mostly passing the grammatical checkboxes at a level most humans would fail (I definitely would :D).

reply
darkwater
5 hours ago
[-]
How many individuals were involved and over how many years?
reply
ako
5 hours ago
[-]
Why does it matter? If you consider not just the people creating these hallucinations, but also the people accepting them and using them, it must be billions and billions...
reply
darkwater
4 hours ago
[-]
And that's the point. You need a critical mass of people buying into something. With LLMs, you just need ONE person with ONE model and modest enough hardware.
reply
psychoslave
4 hours ago
[-]
https://chat.mistral.ai/chat/8b529b3e-337f-42a4-bf36-34fd9e5...

>Here’s a concise and thoughtful response you could use to engage with ako’s last point:

---

"The scale and speed might be the key difference here. While human-generated narratives—like religions or myths—emerged over centuries through collective belief, debate, and cultural evolution, LLMs enable individuals to produce vast, coherent-seeming narratives almost instantaneously. The challenge isn’t just the volume of ‘bullshit,’ but the potential for it to spread unchecked, without the friction or feedback loops that historically shaped human ideas. It’s less about the number of people involved and more about the pace and context in which these narratives are created and consumed."

reply
visarga
3 hours ago
[-]
But people post on social networks, blogs, newspapers and other widely read places, while LLMs post most of their output in chat rooms with one reader.
reply
psychoslave
2 hours ago
[-]
No, the web is now full of this bot-generated noise.

And even considering only the tools used in isolated sessions that aren't exposed by default, the most popular ones are tuned to favor engagement and retention over relevance. That's a different point, since LLMs can certainly be tuned in other directions, but in practice it does matter in terms of social impact at scale. Even prime-time infotainment has by now covered people falling in love with chatbots or being encouraged into suicidal loops. "You're absolutely right" is not always the best.

reply
ako
10 hours ago
[-]
Completely disagree, what I see agentic coding agents do in combination with LLMs is seriously mind-blowing. I don't care how much knowledge is compressed into an LLM. What is way more interesting is what it does when it misses some knowledge. I see it come up with a plan to create the knowledge by running an experiment (running a script, sometimes asking me to run a script or try something), evaluate the output, and then replan based on the output. Full Plan-Do-Check-Act. Finding answers systematically to things you don't know is way more impressive than remembering lots of stuff.
reply
cons0le
11 hours ago
[-]
The worst part is when the AI spits out dogshit results - people show up at light speed in the comments to say how "you're not using it right" / "try this other model, it's better".

Anecdotally, the people I see most excited about AI are the people that don't do any fucking work. I can create a lot of value with plain ol' for-loop-style automation in my niche. We're still nowhere near the limit of what we can do with automation, so I don't give a fuck about what AI can do. Bruh, in Windows 10 copy and fuckin' paste doesn't work for me anymore, but instead of fixing that they're adding AI.

reply
HPsquared
10 hours ago
[-]
LLMs help a lot of users with making FOR loops and things like that. At least it's been the case for me, I'd never tried to use PowerShell before but with a bit of LLM guidance was able to cobble together some useful (for me) one-liner commands to do things like "use this CSV of file names and pixel locations, and make cropped PNG thumbnails of these locations from these images".

Stuff like that which regular users often do by hand, they can ask an LLM for the command (usually just a few lines of a scripting language if they only know the magic words to use).
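To give a rough idea of the shape of such a script (this is a Node.js sketch using the sharp package rather than my actual PowerShell one-liner, and the filename,left,top CSV layout and 128px thumbnail size are just assumed for illustration):

    // Read a CSV of "filename,left,top" rows and write a fixed-size cropped
    // PNG thumbnail next to each source image.
    const fs = require("node:fs");
    const sharp = require("sharp"); // real npm package for image processing

    const THUMB = 128; // assumed thumbnail size in pixels

    const rows = fs.readFileSync("crops.csv", "utf8").trim().split("\n");
    for (const row of rows) {
      const [file, left, top] = row.split(",");
      sharp(file)
        .extract({ left: Number(left), top: Number(top), width: THUMB, height: THUMB })
        .toFile(file.replace(/\.\w+$/, "_thumb.png"))
        .catch((err) => console.error(`failed on ${file}:`, err.message));
    }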

reply
dyauspitr
11 hours ago
[-]
The only people I see complaining about AI are those that have the most to lose.
reply
cons0le
4 hours ago
[-]
Using it isn't optional though, it's forced through corporate policy. If my boss would shut up about it, that would be enough for me.
reply
phantasmish
6 hours ago
[-]
My wife and I are both paid to work on AI products and we both think the whole thing's only sorta useful, in fact. Not nothing, but… not that much, either.

I'm not worried about AI taking our jobs. I'm worried about the market crash when the reality of the various failed (… to actually reduce payroll) or would've-been-cheaper-and-better-without-AI initiatives the two of us have been working on non-stop since this shit started breaks through the investment hype and the music stops.

reply
Uptrenda
7 hours ago
[-]
The LLM only reflects the input it's fed. If the results are unintelligent, then so is the input.
reply
amunozo
6 hours ago
[-]
It's been three years of amazing use cases and discoveries, and in those same years we got things like Ozempic. You can be skeptical of the hyped claims, which may be exaggerated, without negating the good side.
reply
jchallis
5 hours ago
[-]
The patent for Ozempic was filed nearly 20 years ago: https://patents.google.com/patent/US8129343B2/en?oq=US812934...

Ozempic’s FDA approval was in 2017, the same year transformers were invented.

Whatever you can lay at the feet of LLMs, GLP-1s aren't one of them.

reply
digdugdirk
5 hours ago
[-]
Ozempic has nothing to do with LLMs, so I'm a bit confused about the point you're making here?
reply
recursive
4 hours ago
[-]
My chatbot told me that chatbots invented drugs.
reply
KoolKat23
14 hours ago
[-]
IMO, don't waste your time coding with Gemini 3. Perhaps it's worth it if it's something Claude's not helping with, as Gemini 3's reasoning is supposedly very good.
reply
stavros
20 hours ago
[-]
Only a tiny bit, but I should. When you say GPT-5, do you mean 5.1? Codex or regular?
reply
tempestn
13 hours ago
[-]
Sorry, yeah, 5.1 regular chatbot.
reply
stavros
11 hours ago
[-]
Ahh, try 5.1 Codex (with codex cli), it's much better, I've found.
reply
gtirloni
17 hours ago
[-]
Maybe the wtfs per line are decreasing because these models aren't saying anything interesting or original.
reply
stavros
17 hours ago
[-]
No, it's because they write correct code. Why would I want interesting code?
reply
gtirloni
8 hours ago
[-]
Oh, my bad. I still had in my head the comment someone made about the model writing a PhD-level paper, and didn't realize you were talking about code.

Fully agree.

reply
gumaflux
9 hours ago
[-]
:D made my day
reply
Lerc
20 hours ago
[-]
I guess you have a couple of options.

You could trust the expert analysis of people in that field. You can hit personal ideologies or outliers, but asking several people seems to find a degree of consensus.

You could try a variety of tasks that do complex things but produce results that are easy to test.

When I started trying chatbots for coding, one of my test prompts was

    Create a JavaScript function edgeDetect(image) that takes an ImageData object and returns a new ImageData object with all direction Sobel edge detection.  
That was about the level where some models would succeed and some would fail.
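For context, here is a minimal hand-written sketch (mine, not a model's answer and not a reference solution) of roughly what a passing response looks like: grayscale the input, convolve with the two Sobel kernels, and write the gradient magnitude into a new ImageData.

    function edgeDetect(image) {
      const { width, height, data } = image;
      const out = new ImageData(width, height);
      const gray = new Float32Array(width * height);

      // Luma conversion so the kernels run on a single channel.
      for (let i = 0; i < width * height; i++) {
        gray[i] = 0.299 * data[i * 4] + 0.587 * data[i * 4 + 1] + 0.114 * data[i * 4 + 2];
      }

      const gxK = [-1, 0, 1, -2, 0, 2, -1, 0, 1]; // horizontal Sobel kernel
      const gyK = [-1, -2, -1, 0, 0, 0, 1, 2, 1]; // vertical Sobel kernel

      for (let y = 1; y < height - 1; y++) {
        for (let x = 1; x < width - 1; x++) {
          let gx = 0, gy = 0;
          for (let ky = -1; ky <= 1; ky++) {
            for (let kx = -1; kx <= 1; kx++) {
              const v = gray[(y + ky) * width + (x + kx)];
              const k = (ky + 1) * 3 + (kx + 1);
              gx += gxK[k] * v;
              gy += gyK[k] * v;
            }
          }
          const o = (y * width + x) * 4;
          const mag = Math.min(255, Math.hypot(gx, gy)); // gradient magnitude
          out.data[o] = out.data[o + 1] = out.data[o + 2] = mag;
          out.data[o + 3] = 255; // border pixels stay transparent for brevity
        }
      }
      return out;
    }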

Recently I found

    Can you create a webgl glow blur shader that takes a 2d canvas as a texture and renders it onscreen with webgl boosting the brightness so that #ffffff is extremely bright white and glowing,
Produced a nice demo with sliders for parameters; after a few refinements (a hierarchical scaling version) I got it to produce the same interface as a module I had written myself, and it worked as a drop-in replacement.

These things are fairly easy to check because if it is performant and visually correct then it's about good enough to go.

It's also worth noting that as they attempt more and more ambitious tasks, they are quite probably testing around the limit of capability. There is both marketing and science in this area. When they say they can do X, it might not mean it can do it every time, but it has done it at least once.

reply
taurath
20 hours ago
[-]
> You could trust the expert analysis of people in that field

That's the problem - the experts all promise stuff that can't be easily replicated. The promises the experts make don't match the model. The same request might succeed and might fail, and might fail in such a way that subsequent prompts might recover it or might not.

reply
Lerc
19 hours ago
[-]
The experts I am talking about trusting here are the ones doing the replication, not the ones making the claims.
reply
timschmidt
19 hours ago
[-]
That's how working with junior team members or open source project contributors goes too. Perhaps that's the big disconnect. Reviewing and integrating LLM contributions slotted right into my existing workflow on my open source projects. Not all of them work. They often need fixing, stylistic adjustments, or tweaking to fit a larger architectural goal. That is the norm for all contributions in my experience. So the LLM is just a very fast, very responsive contributor to me. I don't expect it to get things right the first time.

But it seems lots of folks do.

Nevertheless, style, tweaks, and adjustments are a lot less work than banging out a thousand lines of code by hand. And whether an LLM or a person on the other side of the world did it, I'd still have to review it. So I'm happy to take increasingly common and increasingly sophisticated wins.

reply
worble
18 hours ago
[-]
Juniors grow into mids, and eventually into seniors. OSS contributors eventually learn the codebase, you talk to them, you all get invested in the shared success of the project, and sometimes you even become friends.

For me, personally, I just don't see the point of putting that same effort into a machine. It won't learn or grow from the corrections I make in that PR, so why bother? I might as well have written it myself and saved the merge review headache.

Maybe one day it'll reach perfect parity of what I could've written myself, but today isn't that day.

reply
mwigdahl
3 hours ago
[-]
I wonder if that difference in mentality is a large part of the pro- vs anti-AI debate.

To me the AI is a very smart tool, not a very dumb co-worker. When I use the tool, my goal is for _me_ to learn from _its_ mistakes, so I can get better at using the tool. Code I produce using an AI tool is my code. I don't produce it by directly writing it, but my techniques guide the tool through the generation process and I am responsible for the fitness and quality of the resulting code.

I accept that the tool doesn't learn like a human, just like I accept that my IDE or a screwdriver doesn't learn like a human. But I myself can improve the performance of the AI coding by developing my own skills through usage and then applying those skills.

reply
timschmidt
17 hours ago
[-]
> It won't learn or grow from the corrections I make in that PR, so why bother?

That does not match my experience. As the codebases I've worked with LLMs on become more opinionated and stylized, it seems to do a better job of following the existing work. And over time the models have absolutely improved in terms of their ability to understand issues and offer solutions. Each new release has solved problems for me that the previous ones struggled with.

Re: interpersonal interactions, I don't find that the LLM has pushed them out or away. My projects still have groups of interested folk who talk and joke and learn and have fun. What the LLMs have addressed for me in part is the relative scarcity of labor for such work. I'm not hacking on the Linux Kernel with 10,000 contributors. Even with a dozen contributors, the amount of contributed code is relatively low and only in areas they are interested in. The LLM doesn't mind if I ask it to do something super boring. And it's been surprisingly helpful in chasing down bugs.

> Maybe one day it'll reach perfect parity of what I could've written myself, but today isn't that day.

Regardless of whether or not that happens, they've already been useful for me for at least 9 months. Since O3, which is the first one that really started to understand Rust's borrow checker in my experience. My measure isn't whether or not it writes code as well as I do, but how productive I am when working with it compared to not. In my measurements with SLOCCount over the last 9 months, I'm about 8x more productive than the previous 15 years without (as long as I've been measuring). And that's allowed me to get to projects which have been on the shelf for years.

This article by an AI researcher I happen to have worked with neatly sums up feelings I've had about comments like yours: https://medium.com/@ahintze_23208/ai-or-you-who-is-the-one-w...

reply
adamors
22 hours ago
[-]
> Things I don't understand must be great?

Couple it with the tendency to please the user by all means and it ends up lying to you, but you won't ever realise unless you double check.

reply
JumpCrisscross
16 hours ago
[-]
> Couple it with the tendency to please the user by all means

Why aren't foundational model companies training separate enterprise and consumer models from the get go?

reply
apendleton
22 hours ago
[-]
I think they get to that a couple of paragraphs later:

> The idea was good, as were many elements of the execution, but there were also problems: some of its statistical methods needed more work, some of its approaches were not optimal, some of its theorizing went too far given the evidence, and so on. Again, we have moved past hallucinations and errors to more subtle, and often human-like, concerns.

reply
jrumbut
16 hours ago
[-]
Well, that's why people still have jobs. But I appreciate the post's point that the neat demo used to be a coherent paragraph or a silly poem. The silly poems were all kind of similar and not very funny, and the paragraphs were a good start, but I wouldn't use them for anything important.

Now the tightrope is a whole application or a 14-page paper, and the short pieces of code and prose are professional quality more often than not. That's some serious progress.

reply
leeoniya
4 hours ago
[-]
> But because I'm not a PhD in these subjects, am I supposed to think, "Well of course the 14 pages on a topic I'm not an expert in are good"?

https://en.wikipedia.org/wiki/Gell-Mann_amnesia_effect

reply
seidleroni
6 hours ago
[-]
The author actually discusses the results of the paper. He's not some rando but a Wharton professor, and when he compares the results to a grad student's, it is with some authority.

"So is this a PhD-level intelligence? In some ways, yes, if you define a PhD level intelligence as doing the work of a competent grad student at a research university. But it also had some of the weaknesses of a grad student. The idea was good, as were many elements of the execution, but there were also problems..."

reply
monooso
21 hours ago
[-]
The author goes into the strengths and weaknesses of the paper later in the article.
reply
brightball
21 hours ago
[-]
I keep trying out different models. Gemini 3 is pretty good. It’s not quite as good at one shotting answers as Grok but overall it’s very solid.

Definitely planning to use it more at work. The integrations across Google Workspace are excellent.

reply
visarga
11 hours ago
[-]
You don't use it that way. You use it to help you build and run experiments, to help you discuss your findings, and in the end to help you write up your discoveries. You provide the content, and actual experiments provide the signal.
reply
ManlyBread
11 hours ago
[-]
Like clockwork. Each time someone criticizes any aspect of any LLM there's always someone to tell that person they're using the LLM wrong. Perhaps it's time to stop blaming the user?
reply
matwood
8 hours ago
[-]
You wouldn't use a screwdriver to hammer a nail. Understanding how to use a tool is part of using the tool. It's early days and how to make the best use of these tools is still being discovered. Fortunately a lot of people are experimenting on what works best, so it only takes a little bit of reading to get more consistent results.
reply
bradly
4 hours ago
[-]
What if the company selling the screwdriver kept telling you your could use it as a hammer? What if you were being bombarded with marketing the hammers are being replaced by screwdrivers?
reply
becquerel
10 hours ago
[-]
You can recognise that the technology has a poor user interface and is wrought with subtleties without denying its underlying capabilities. People misuse good technology all the time. It's kind of what users do. I would not expect a radically new form of computing which is under five years old to be intuitive to most people.
reply
sandspar
10 hours ago
[-]
If someone says that they can't get a camera to work, you tell them how to fix it, right? I can't think of what other response is appropriate.
reply
ManlyBread
9 hours ago
[-]
Why would their response be appropriate when even the creators of the LLM don't clearly state the purpose of their software, let alone instruct users how to use it? The person I replied to said that this software should be used to "help you build and run experiments, help you discuss your findings, and in the end help you write up your discoveries" - I dare anyone to find any mention of this workflow being the "correct" way of using any LLM in the LLM's official documentation.
reply
Herring
22 hours ago
[-]
I think the point is we’re getting there. These models are growing up real fast. Remember 54% of US adults read at or below the equivalent of a sixth-grade level.
reply
lm28469
22 hours ago
[-]
> Remember 54% of US adults read at or below the equivalent of a sixth-grade level.

The sane conclusion would be to invest in education, not to dump hundreds of billions on LLMs, but ok

reply
daedrdev
21 hours ago
[-]
Education is not just a funding issue. Policy choices, like making it impossible for students to fail, which means they have no incentive to learn anything, can be more impactful.
reply
taurath
20 hours ago
[-]
But holy shit is it also a funding issue when teachers make nothing.
reply
gilfoy
19 hours ago
[-]
As far as I understand it, the problem isn’t that teachers are shit. Giving more money would bring in better teachers, but I don’t know that they’d be able to overcome the other obstacles
reply
BoiledCabbage
17 hours ago
[-]
> Giving more money would bring in better teachers, but I don’t know that they’d be able to overcome the other obstacles

Start with the easiest thing to control: give more money and see what it does?

We seem to believe in every other industry that to get the best talent you pay a high salary, but for some reason we expect teachers to do it out of compassion for the children while they struggle to pay bills. It's absurd.

Probably one of the single most important responsibilities of a society is to prepare the next generation, and it pays enormous returns. But because we can't measure it with quarterly profits, we just ignore it.

The rate of return on providing society with a good education is insane.

reply
anomaly_
16 hours ago
[-]
I think you need to research the issue more. Teachers are well remunerated in most states. Educational outcomes are largely a function of policy settings. Have a look at the amazing turnaround in literacy rates in Mississippi after they started teaching phonics again.
reply
throwawaylaptop
17 hours ago
[-]
I date a lot of teachers. My last one was in the San Ramon (CA) Valley School district, she makes about $90k a year at 34 years old. Talking to her basically makes me want to homeschool my kids to make sure someone like her isn't their teacher. Paying teachers more won't do ANYTHING until we become a lot more selective about who gets to become and stay a teacher. It can't be like most government jobs where getting it is like winning the lottery and knowing you can make above market money for below market performance.
reply
taurath
13 hours ago
[-]
It's interesting to hear you say that you date a lot of teachers while simultaneously holding this view of their level of competence. Or just not the ones you date?
reply
baq
10 hours ago
[-]
I guess the problem isn't only the pay, it's the opportunity cost which only a certain kind of people are willing to pay for the whole career. If you select those people out... you're left with zero candidates.
reply
michaelmrose
17 hours ago
[-]
There is so much wrong with this. You cannot judge the class of teachers based on a small sample of your taste in women. You didn't actually communicate anything materially wrong with her. You listed a high-income area to make us think teachers are overpaid, but we have no insight by default into the median income in the area or her qualifications.

Lastly, it's entirely impossible to attract better candidates without more money; it's just not how the world works.

For reference, the median household income in San Ramon is about 200k, so two teachers would be below average. A cop with her experience in the same town makes 158k.

reply
pessimizer
8 hours ago
[-]
This is so basic that I feel I shouldn't need to say it, but you can't be selective if you don't pay. You take what you get.

The reason teaching became largely a women's profession, when teachers used to be exclusively men, is that we wanted to make education universal and free, and we did that by paying less; women who needed to work had to take what they could get. The reason it has become a moron's profession is that we have made it uniquely undesirable. If you think teachers should be amazing, eminently qualified, and infinitely safe to have around children, pay them like programmers.

Instead, the middle-class meme is to pay them nothing, put them in horrible conditions, and resent them too. Typical "woman's work" model.

reply
wing-_-nuts
3 hours ago
[-]
>The reason teaching became largely a women's profession when they used to be exclusively men is because we wanted to make education universal and free so we did that by paying less, and women who needed to work also had to take what they could get.

Do you have any source on the assertion that being a teacher used to pay more? Because to my knowledge it has never been a high paying profession.

reply
suddenlybananas
8 hours ago
[-]
If teachers made as much as half the people on this site, perhaps things would be better. 90k in San Ramon is more or less the median wage [1]. It's not _that_ much money.

[1] https://en.wikipedia.org/wiki/San_Ramon,_California#2020_cen...

reply
throw45678943
23 minutes ago
[-]
Who knows? Maybe with the way AI is going that will be considered a lot of money compared to what people earn on this site.

As in what people generally earn on this site will crash way down and be outsourced to these models. I'm already seeing it personally from a social perspective - as a SWE most people I know (inc teachers in my circle) look at me like my days are numbered "cause of AI".

reply
brightball
21 hours ago
[-]
Investing in education is a trap because no matter how much money is pumped into the current model, it’s not making a difference.

We need different models and then to invest in the successes, over and over again…forever.

reply
thewebguyd
20 hours ago
[-]
Because education alone in a vacuum won't fix the issues.

Even if the current model was working, just continuing to invest money in it while ignoring other issues like early childhood nutrition, a good and healthy home environment, environmental impacts, etc. will just continue to fail people.

Schooling alone isn't going to help the kid with a crappy home life, with poor parents who can't afford proper nutrition, and without the proper tools to develop the mindset needed to learn (because these tools were never taught by the parents, and/or they are too focused on simply surviving).

We, as a society, need to stop allowing people to be in a situation where they can't focus on education because they are too focused on working and surviving.

reply
brightball
19 hours ago
[-]
Exactly correct.
reply
krainboltgreene
18 hours ago
[-]
It's so hilarious to look at 10k years of education history and be like "Nah, funding doesn't make a difference."

Incredible.

reply
brightball
16 hours ago
[-]
The US already spends more per student than almost any other country (5th globally) and the outcomes are getting constantly worse.

It’s not a funding problem.

reply
Herring
4 hours ago
[-]
That's half true. You have to think about cost of living, you can't just compare across the globe like that. And especially opportunity cost. In the US, teacher pay lags behind similarly educated professionals.

But you're right after a certain point other factors matter more than simple $ per student. Unfortunately one of those factors is teacher pay <=> teacher quality.

reply
reverius42
11 hours ago
[-]
A lot of that funding in the US goes to pay teachers money they then use to pay for health insurance -- which in other countries is often provided by the tax base at large and not counted as an education expense.
reply
krainboltgreene
1 hour ago
[-]
It's incredibly unfair that you get to just lie online or worse that you actually believe what you're saying.
reply
acheron
21 hours ago
[-]
Education funding is highest in places that have the worst results. Try again.
reply
lm28469
10 hours ago
[-]
Yes, for example, it's very well known that Angola has a top-tier education system while Swedish people can barely read or count.
reply
wing-_-nuts
2 hours ago
[-]
Well, if you actually look at the data:

https://nces.ed.gov/programs/coe/indicator/cmd/education-exp...

We spend far more than most countries per pupil, for much poorer results

https://worldpopulationreview.com/country-rankings/pisa-scor...

It's pretty clear that while spending is a factor, it's probably not the biggest one. The countries that seem to do best are those that combine adequate funding with real rigor in instruction.

reply
Herring
45 minutes ago
[-]
I posted elsewhere you can't just compare across the globe like that. You have to think about cost of living and especially opportunity cost. In the US, teacher pay lags behind similarly educated professionals, which means they get stretched thin and the best with options will leave.
reply
krainboltgreene
18 hours ago
[-]
Just flatly not true.
reply
mythrwy
18 hours ago
[-]
New Mexico (where I live) is dead last in education out of all 50 states. They are currently advertising for elementary school teachers between 65-85K per year. Summers off. Nice pension. In this low cost of living state that is a very good salary, particularly the upper bands.

I don't think it's a money issue at this point.

reply
Izikiel43
20 hours ago
[-]
Because they use whole language theory (https://en.wikipedia.org/wiki/Whole_language) instead of phonics for teaching how to read.
reply
Herring
22 hours ago
[-]
In theory yeah, but in practice 54% will also vote against funding education. Catch-22.
reply
Izikiel43
20 hours ago
[-]
In WA they always pass levies for education funding at local and state level however results are not there.

Mississippi is doing better on reading, the biggest difference being that they use a phonics approach to teaching reading, which is proven to work, whereas WA uses whole language theory (https://en.wikipedia.org/wiki/Whole_language), which is a terrible idea; I don't know how it got traction.

So the gist of it, yes, spend on education, but ensure that you are using the right tools, otherwise it's a waste of money.

reply
Herring
1 hour ago
[-]
I almost agree, but too many people will take that to mean “we need to do more with less”. It’s a feature of capitalism. Teachers are stretched thin in most places, that’s always the main problem. Are WA teachers compensated about the same as other similarly educated professionals? As cops?

Hire smart motivated people, pay them well, leave them alone, they’ll figure this one out. It’s not hard, anyone can google what Finland does.

reply
forgotoldacc
19 hours ago
[-]
First time hearing of whole language theory, and man, it sounds ridiculous. Sounds similar to the old theory that kids who aren't taught a language at all will simply speak perfect Hebrew.
reply
tehjoker
21 hours ago
[-]
Not true, most people are not upper-middle class anti-tax wackos. They benefit from those people being taxed.
reply
cj
21 hours ago
[-]
In my own social/family circle, there’s no correlation between net worth and how someone leans politically. I’ve never understood why given the pretty obvious pros/cons (amount paid in taxes vs. benefits received)
reply
wmeredith
21 hours ago
[-]
The electorate in the U.S. commonly votes against its own interests.
reply
krainboltgreene
18 hours ago
[-]
Pithy, but not true.
reply
mavhc
21 hours ago
[-]
That's why you phrase it as "woke liberals turning your children gay!"

In the USA, K-12 education costs about $300k per student

350 million people, want to get 175 million of them better educated, but we've already spent $52 trillion dollars on educating them so far

reply
tehjoker
21 hours ago
[-]
The people most vociferously for conservative values are middle class, small business owners, or upper class, though the true upper class are libertine (notice who participated in the Epstein affair). The working class is filled with all kinds of very diverse people united by the fact they have to work for a living and often can't afford e.g. expensive weddings. Some of them are religious, a whole bunch aren't. It's easy to be disillusioned with formal institutions that seem to not care at all about you.

Unfortunately, a lot of these people have either concluded it is too difficult to vote, can't vote, or that their votes don't matter (I don't think they're wrong). Their unions were also destroyed. Some of them vote against their interests, but it's not clear that their interests are ever represented, so they vote for change instead.

reply
reverius42
11 hours ago
[-]
> Their unions were also destroyed.

By policy changes giving unions less power, enacted by politicians that were mostly voted for by a majority, which is mostly composed of the working class. Was this people voting against their interests? (Almost literally yes, but you could argue that their ideological preference for weaker unions trumps their economic interest in stronger unions.)

reply
tehjoker
2 hours ago
[-]
If your choices in an election are pre-selected, was it democratic?
reply
Herring
6 hours ago
[-]
"Thus, a caste system makes a captive of everyone within it."
reply
Izikiel43
20 hours ago
[-]
It's not just investing in education, it's using tools proven to work. WA spends a ton of money on education, and on reading, Mississippi, the worst state by almost every metric, has beaten them. The difference? Mississippi went hard on supporting students and using phonics, which is proven to work. WA still uses the hippie theory of guessing words from pictures (https://en.wikipedia.org/wiki/Whole_language) for learning how to read.
reply
tsss
20 hours ago
[-]
You don't need an educated workforce if you have machines that can do it reliably. The more important question is: who will buy your crap if your population is too poor due to lack of well paying jobs? A look towards England or Germany has the answer.
reply
wing-_-nuts
2 hours ago
[-]
The top 10% of households already account for more than half of consumer spending in the US
reply
akavi
1 hour ago
[-]
Hmmm, that doesn't seem right. I'm having a hard time finding an actual consumption number, but I am confident it's well below 50%.

The top 10% of households by wage income do receive ~50% of pre-tax wage income, but:

1) our tax system is progressive, so actual net income share is less

2) there's significant post-wage redistribution (social security/medicaid)

3) that high income households consume a smaller percent of their net income is a well established fact.

reply
jlawson
20 hours ago
[-]
Unfortunately, people are born with a certain intellectual capacity and can't be improved beyond that with any amount of training or education. We're largely hitting peoples' capacities already.

We can't educate someone with 80 IQ to be you; we can't educate you (or I) into being Einstein. The same way we can't just train anyone to be an amazing basketball player.

reply
wing-_-nuts
2 hours ago
[-]
From what I've read, IQ is one of the more heritable traits, but only about 50% of one's intelligence is attributable to one's genes.

That means there are absolutely still massive benefits to be had in trying to ensure that kids grow up in safe, loving homes, with proper amounts of stimulation and enrichment, and are taught with a growth mindset, not a fixed-potential one.

Sad to say, but your own fixed mindset probably held you back from what you could truly achieve. You don't have to be Einstein to operate on the cutting edge of a field; I think most Nobel Prize winners have an IQ of ~120.

reply
tptacek
16 hours ago
[-]
This is extremely not settled science. Education in fact does improve IQ and we don't know how fixed intelligence is and how it responds to different environmental cues.
reply
ragequittah
17 hours ago
[-]
Other countries have better outcomes. I doubt it's just because of the genetics.
reply
Herring
19 hours ago
[-]
https://en.wikipedia.org/wiki/Comparative_advantage

Modern society benefits a lot from specialization. It's like the dumbest kid in France is still better at French than you.

reply
PostOnce
21 hours ago
[-]
A question for the not-too-distant future:

What use is an LLM in an illiterate society?

reply
jcheng
21 hours ago
[-]
Automatic speech recognition and speech to text models are also growing up real fast.
reply
PostOnce
21 hours ago
[-]
But will an illiterate person be able to articulate themselves well enough to get the LLM to do what they want, even with a speech interface?

Will they possess the skills (or even the vocabulary) to understand the output?

We won't know for another 20 years, perhaps.

reply
bossyTeacher
12 hours ago
[-]
Thinking that speech recognition is a solution to illiteracy is like thinking that low-code tools can replace traditional programming tools. The bottleneck is and has always been the cognitive capacity limits of your average human. No interface can solve the issue of humans being illiterate.
reply
AdieuToLogic
17 hours ago
[-]
> What use is an LLM in an illiterate society?

The ability to feign literacy such that critical thought and ability to express same is not a prerequisite.

reply
throw310822
20 hours ago
[-]
Absurd question. The correct one is "what use is an illiterate in an LLM society".
reply
eckesicle
8 hours ago
[-]
> It just doesn't add up... Things I understand, it looks good at first, but isn't shippable. Things I don't understand must be great?

It’s like the Gell-Mann amnesia effect applied to AI. :)

https://en.wikipedia.org/wiki/Gell-Mann_amnesia_effect

reply
cgh
21 hours ago
[-]
This is a variation of the Gell-Mann amnesia effect: https://en.wikipedia.org/wiki/Gell-Mann_amnesia_effect
reply
meindnoch
20 hours ago
[-]
One could say, the GeLLMann amnesia effect. ( ͡° ͜ʖ ͡°)
reply
nbupadhya
9 hours ago
[-]
Thanks for introducing me to this article.
reply
secondbreakfast
18 hours ago
[-]
Loads of AI chatter is the Murray Gell-Mann Amnesia Effect on steroids
reply
tsss
20 hours ago
[-]
For what it's worth I have been using Gemini 2.5/3 extensively for my masters thesis and it has been a tremendous help. It's done a lot of math for me that I couldn't have done on my own (without days of research), suggested many good approaches to problems that weren't on my mind and helped me explore ideas quickly. When I ask it to generate entire chapters they're never up to my standard but that's mostly an issue of style. It seems to me that LLMs are good when you don't know exactly what you want or you don't care too much about the details. Asking it to generate a presentation is an utter crap shoot, even if you merely ask for bullet points without formatting.
reply
ammbauer
11 hours ago
[-]
> It's done a lot of math for me that I couldn't have done on my own (without days of research),

Isn't the point of doing the master's thesis that you do the math and research, so that you learn and understand the math and research?

reply
ragequittah
9 hours ago
[-]
I bet they were complaining that people didn't do long division when the calculator first came out, too. Is using MATLAB and Excel OK but AI not? Where do we draw the line with tools?
reply
bleepblap
2 hours ago
[-]
OP said they "generated entire chapters"
reply
pojzon
22 hours ago
[-]
Truth is, you still need a human to review all of it, fix it where needed, guide it when it hallucinates, and write correct instructions and prompts.

Without knowledge of how to use this “PROBABILISTIC” slot machine to get better results, you are only wasting the energy those GPUs need to run and answer questions.

The majority of people use LLMs incorrectly.

The majority of people selling LLMs as a panacea for everything are lying.

But we need hype or the bubble will burst, taking the whole market with it, so shuushh me.

reply
gallerdude
1 day ago
[-]
It is interesting that most of our modes of interaction with AI are still just textboxes. The only big UX change in the last three years has been the introduction of the Claude Code / OpenAI Codex tools. They feel amazing to use, like you're working with another independent mind.

I am curious what the user interfaces of AI in the future will be, I think whoever can crack that will create immense value.

reply
Herring
23 hours ago
[-]
Text is very information-dense. I'd much rather skim a transcript in a few seconds than watch a video.

There's a reason keyboards haven't changed much since the 1860s when typewriters were invented. We keep coming up with other fun UI like touchscreens and VR, but pretty much all real work happens on boring old keyboards.

reply
Ancapistani
3 hours ago
[-]
I’ve been using ChatGPT Atlas since release on my personal laptop. I very often have it generate a comprehensive summary for YouTube videos, so I don’t have to sit there and watch/scrub a half hour video when a couple of pages of text contains the same content.
reply
srmatto
21 hours ago
[-]
Here's an old blog post that explores that topic at least with one specific example: https://www.loper-os.org/?p=861

The gist is that keyboards are optimized for ease of use but that there could be other designs which would be harder to learn but might be more efficient.

reply
AdieuToLogic
17 hours ago
[-]
>> There's a reason keyboards haven't changed much since the 1860s when typewriters were invented.

> The gist is that keyboards are optimized for ease of use but that there could be other designs which would be harder to learn but might be more efficient.

Here's a relevant trivia question: assuming a person has two hands with five digits each, what is the largest number they can count to using only those digits?

Answer: (2 ** 10) - 1 = 1023

Ignoring keyboard layout options (such as QWERTY vs DVORAK), IMHO keyboards have the potential for capturing thought faster and with a higher degree of accuracy than other forms of input. For example, it is common for touch-typists to be able to produce 60 - 70 words per minute, for any definition of word.

Modern keyboard input efficiency can be correlated to the ability to choose between dozens of glyphs with one or two finger combinations, typically requiring less than 2cm of movement to produce each.

reply
lifthrasiir
16 hours ago
[-]
Only if individual digits can be articulated separately from each other. Human anatomy limits what is actually possible. Also, synchronization is a big problem in chorded typing; good typists can type more than 10 strokes per second, but no one can type 10 chords (synchronous sets of strokes) per second, I think.
reply
onion2k
15 hours ago
[-]
No matter how good a keyboard we might be able to invent it'll always be slower than a direct brain interface, and we have those, in a highly experimental way, now.

One day we will look back at improvements to keyboards and touchscreens as the 'faster horse' of the physical interface era.

reply
myrmidon
3 hours ago
[-]
I'm not convinced, because all a keyboard really costs you is latency, while almost every human-machine interaction is actually bandwidth limited (by human output).

Even getting zero latency from a perfect brain-machine interface would not make you meaningfully faster at most things I'd assume.

reply
wing-_-nuts
2 hours ago
[-]
Yeah I noticed this as I became a faster typer. I very often find myself 'buffering' on choosing the right words / code more than I do on my typing speed.
reply
Mistletoe
15 hours ago
[-]
And anyone that has ever tried to talk to Siri or Alexa would prefer a keyboard for anything but the most simple questions. I don't think that will change for a long time if ever. The lack of errors and being able to say exactly what you want is so valuable.
reply
timschmidt
23 hours ago
[-]
Unix CLI utilities have been all text for 50 years. Arguably that is why they are still relevant. Attempts to impose structured data on the paradigm like those in PowerShell have their adherents and can be powerful, but fail when the data doesn't fit the structure.

We see similar tendency toward the most general interfaces in "operator mode" and similar the-AI-uses-the-mouse-and-keyboard schemes. It's entirely possible for every application to provide a dedicated interface for AI use, but it turns out to be more powerful to teach the AI to understand the interfaces humans already use.

reply
emodendroket
18 hours ago
[-]
PowerShell is completely suitable. People are just used to bash and don’t feel the incentive to switch, especially with Windows becoming less relevant outside of desktop development.
reply
TheRoque
17 hours ago
[-]
PowerShell feels like it wasn't built to be used in a practical way, unlike Unix tools, which were built by and for developers who actually use them a lot, so they feel good to use.

Like, to set an env variable permanently, you either have to go through 5 GUI interfaces, or use this PS command:

[Environment]::SetEnvironmentVariable("INCLUDE", $env:INCLUDE, [System.EnvironmentVariableTarget]::User)

Which is honestly horrendous. Why the brackets? Why the double colons? Why the uppercase everywhere? I get that it's trying to look more "OOP-ish" and look like C#, but nobody wants to work with that kind of shell script, tbh. It's just one example, but all the PowerShell commands look like this, unless they have been aliased to trick you into thinking Windows has gone more Unix-ish.

reply
lenkite
16 hours ago
[-]
First, that expression is overly complicated, shorten to:

    [Environment]::SetEnvironmentVariable($name, $value, "User")
You have unnecessarily spelled out the full enum constant to make it look more complex than it is. Please also note that you have COMPLETION. You are not forced to type that out. Second, you can use an alternative:

    Set-ItemProperty HKCU:\Environment MY_VAR "some value"
Third, if you still find it too long, wrap it in a function:

    function setenv($name, $value) {
       [Environment]::SetEnvironmentVariable($name, $value, "User")
    }

    setenv MY_VAR "some value"

Also, can you please tell me the incantation for setting an env variable permanently in bash? You cannot, since it doesn't exist.

Powershell's model is far superior to Bash. It is not even a contest.

reply
bigstrat2003
12 hours ago
[-]
What feels good to use is very, very dependent on personal preference. I think Powershell is much more pleasant to use than bash. You obviously disagree, but bear in mind that not everyone shares your preferences.
reply
emodendroket
16 hours ago
[-]
No, they don't all look like that, the brackets are an indication you're reaching into .NET and calling .NET stuff instead of "native" PowerShell commands which take the form Verb-Noun. Which can be a legitimate thing to do, but isn't the first choice and seems like an example deliberately chosen to make PS look more awkward than it is. I question whether, for this particular example, `echo 'export MY_VAR="my_value"\n' >> ~/.bashrc && source ~/.bashrc` is really all that intuitive either (and hopefully you didn't accidentally write `>` instead of `>>` and nuke the rest of the file).
reply
cma
16 hours ago
[-]
It took a long time for Powershell to write files with the same encoding it reads them by default. Very confusing until then.
reply
oblio
23 hours ago
[-]
Yet the most popular platforms on the planet have people pointing a finger (or several) at a picture.

And the most popular media format on the planet is and will be (for the foreseeable future), video. Video is only limited by our capacity to produce enough of it at a decent quality, otherwise humanity is definitely not looking back fondly at BBSes and internet forums (and I say this as someone who loves forums).

GenAI will definitely need better UIs for the kind of universal adoption (think smartphone - 8/9 billion people).

reply
Vegenoid
22 hours ago
[-]
> Video is only limited by our capacity to produce enough of it at a decent quality, otherwise humanity is definitely not looking back fondly at BBSes and internet forums

Video is limited by playback speed. It is a time-dependent format. Efforts can be made to enable video to be viewable at a range of speeds, but they are always somewhat constrained. Controlling video playback to slow down and rewatch certain parts is just not as nice as dealing with the same thing in text (or static images), where it’s much easier to linger and closely inspect parts that you care more about or are struggling to understand. Likewise, it’s easier to skim text than video.

This is why many people prefer transcripts, or articles, or books over videos.

I seriously doubt that people would want to switch text-based forums to video if only video were easier to make. People enjoy writing for the way it inspires a different kind of communication and thought. People like text so much that they write in journals that nobody will ever see, just because it helps them organize their thoughts.

reply
oblio
5 hours ago
[-]
You (and I) live in entirely different world from that of regular people, who read at most 1 book per year and definitely do not write journals that nobody will ever see.

You're talking about 10-20% of the population, at most.

reply
Al-Khwarizmi
20 hours ago
[-]
WhatsApp is primarily a text-based chat interface and it has pretty much universal adoption in the countries where it's popular.
reply
oblio
13 hours ago
[-]
I guess you haven't been in enough group chats. There's a reason it has a dedicated emoji button, GIFs, avatars, and stickers.
reply
bergheim
3 hours ago
[-]
Which is, of course, not how people primarily communicate on WhatsApp (or any communication based app). People don't send streams of videos to each other in group chats. They write text and add gif memes.
reply
Ancapistani
3 hours ago
[-]
I get what you’re saying here, and you’re right that other UIs will be a big deal in the near future… but I don’t think it’s fair to say “just” textboxes.

This is HN. A lot of us work remotely. Speaking for myself, I much prefer to communicate via Slack (“just a textbox”) over jumping into a video call. This is especially true with technical topics, as text is both more dense and far more clear than speech in almost all cases.

reply
joegibbs
16 hours ago
[-]
When we have really fast and good models, they will be able to generate a GUI on the fly. It could probably be done now with a fine-tune on some kind of XML-based UI schema or something. I gave it a try but couldn't figure it out entirely; consistency would be an issue too.
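
A toy sketch of the idea, purely illustrative: the model is only allowed to emit a tiny XML UI vocabulary, and the client validates and "renders" it. The tags, schema, and renderer here are invented for this example.

    import xml.etree.ElementTree as ET

    # A deliberately tiny, hypothetical UI vocabulary the model would be fine-tuned to emit.
    ALLOWED_TAGS = {"ui", "heading", "text", "list", "item", "button"}

    # Pretend this string came back from the model.
    model_output = """
    <ui>
      <heading>Trip planner</heading>
      <text>Here are two hikes near you.</text>
      <list>
        <item>Eagle Ridge (4 km)</item>
        <item>Lakeshore Loop (7 km)</item>
      </list>
      <button action="show_map">Show on map</button>
    </ui>
    """

    def render(node, depth=0):
        # Refuse anything outside the schema instead of guessing how to draw it.
        if node.tag not in ALLOWED_TAGS:
            raise ValueError(f"unsupported tag: {node.tag}")
        if node.text and node.text.strip():
            print("  " * depth + f"[{node.tag}] {node.text.strip()}")
        for child in node:
            render(child, depth + 1)

    render(ET.fromstring(model_output))

The consistency problem is exactly the validation step: anything the model emits outside the schema has to be rejected or regenerated rather than rendered.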
reply
in-silico
14 hours ago
[-]
Google is already doing this with Gemini:

https://research.google/blog/generative-ui-a-rich-custom-vis...

I don't know if/when it will actually be in consumers hands, but the tech is there.

reply
michaelanckaert
11 hours ago
[-]
Personally I find the information density of text to be the "killer feature". I've tried voice interaction (even built some AI Voice Agents) and while they are very powerful, easy to use and just plain cool, they are also slow. Nothing beats skimming over a generated text response and just picking out chunks of text, going back and forth, rereading, etc. Text is also universal; I can't copy-paste a voice response to another application/interface or iterate over it.

My personal view is that the search for a better AI User Interface is just the further dumbing down of the humans who use these interfaces. Another comment mentioned that the most popular platforms are people pointing fingers at pictures, and that without a similar UI/UX, AI would never reach such adoption rates, but is that what we want? Monkeys pointing at colorful picture blobs?

reply
visioninmyblood
1 day ago
[-]
I agree. I think the world is multimodal, and getting a chat interface to be truly multimodal, i.e. interacting with different data types and text in a unified way, is going to be the next big thing. Given how robotics is taking off, 3D might be another important aspect of it. At vlm.run we are trying to make this possible by combining VLMs and LLMs in a seamless way to get the best UI. https://chat.vlm.run/c/3fcd6b33-266f-4796-9d10-cfc152e945b7
reply
in-silico
14 hours ago
[-]
The next step (and I am not claiming it's the right one) is probably "Generative UI" where the model creates website-like interfaces on the fly.

Google seems to be making good progress [1] and it seems like only a matter of time before it reaches consumers.

1. https://research.google/blog/generative-ui-a-rich-custom-vis...

reply
jstummbillig
21 hours ago
[-]
People get a little too hung up on finding the AI UI. It does not seem all necessary that the interfaces will be much different (while the underlying tech certainly will be).

Text and boxes and tables and graphs is what we can cope with. And while the AI is going to change much, we are not.

reply
remir
17 hours ago
[-]
Grok has been integrated into Tesla vehicles, and I've had several voice interactions with it recently. Initially, I thought it was just a gimmick, but the voice interactions are great and quite responsive. I've found myself using it multiple times to get updates on the news or quick questions about topics I'm interested in.
reply
empath75
1 day ago
[-]
ChatGPT's voice is absolutely amazing and I prefer it to text for brainstorming.
reply
vessenes
1 day ago
[-]
Ooooh, it bothers me, so, so, so much. Too perky. Weirdly casual. Also, it's based on the old 4o code - sycophancy and higher hallucinations - watch out. That said, I too love the omni models, especially when they're not nerfed. (Try asking for a Boston, New York, Parisian, Haitian, Indian and Japanese accent from 4o to explore one of the many nerfs they've done since launch)
reply
wmeredith
21 hours ago
[-]
I think the commenter you're replying to was talking about dictating to ChatGPT, which I also find extremely useful.
reply
Humorist2290
1 day ago
[-]
> Again, we have moved past hallucinations and errors to more subtle, and often human-like, concerns.

From my experience we just get both. The constant risk of some catastrophic hallucination buried in the output, in addition to more subtle, and pervasive, concerns. I haven't tried with Gemini 3 but when I prompted Claude to write a 20 page short story it couldn't even keep basic chronology and characters straight. I wonder if the 14 page research paper would stand up to scrutiny.

reply
acters
23 hours ago
[-]
I feel like hallucinations have changed over time from factual errors randomly shoehorned into the middle of sentences to the LLMs confidently telling you they are right and even providing their own reasoning to back up their claims, which most of the time relies on references that don't exist.
reply
njovin
21 hours ago
[-]
I recently tasked Claude with reviewing a page of documentation for a framework and writing a fairly simple method using the framework. It spit out some great-looking code but sadly it completely made up an entire stack of functionality that the framework doesn't support.

The conventions even matched the rest of the framework, so it looked kosher and I had to do some searching to see if Claude had referenced an outdated or beta version of the docs. It hadn't - it just hallucinated the functionality completely.

When I pointed that out, Claude quickly went down a rabbit-hole of writing some very bad code and trying to do some very unconventional things (modifying configuration code in a different part of the project that was not needed for the task at hand) to accomplish the goal. It was almost as if it were embarrassed and trying to rush toward an acceptable answer.

reply
jaccola
18 hours ago
[-]
I've noticed the new OpenAI models do self contradiction a lot more than I've ever noticed before! Things like:

- Aha, the error clearly lies in X, because ... so X is fine, the real error is in Y ... so Y is working perfectly. The smoking gun: Z ...

- While you can do A, in practice it is almost never a good idea because ... which is why it's always best to do A

reply
k__
1 hour ago
[-]
Yeah.

I worked with Grok 4.1 and it was awesome until it wasn't.

It told me to build something, only to tell me in the end that I could have done it smaller and cheaper.

And that multiple times.

Best reply was the one that ended with something along the lines of "I've built dozens of them!"

reply
SomewhatLikely
13 hours ago
[-]
I've seen it do this too. I had it keeping a running tally over many turns and occasionally it would say something like: "... bringing the total to 304.. 306, no 303. Haha, just kidding I know it's really 310." With the last number being the right one. I'm curious if it's an organic behavior or a taught one. It could be self-learned through reinforcement learning, a way to correct itself since it doesn't have access to a backspace key.
reply
emodendroket
18 hours ago
[-]
I like when they tell you they’ve personally confirmed a fact in a conversation or something.
reply
gowld
16 hours ago
[-]
I got a 3000 word story. Kind of bland, but good enough for cheating in high school.

See prompt, and my follow-up prompts instructing it to check for continuity errors and fix them:

https://pastebin.com/qqb7Fxff

It took me longer to read and verify the story (10 minutes) than to write the prompts.

I got illustrations too. Not great, but serviceable. Image generation costs more compute to iterate and correct errors.

reply
lalitmaganti
1 day ago
[-]
> But it suggests that “human in the loop” is evolving from “human who fixes AI mistakes” to “human who directs AI work.” And that may be the biggest change since the release of ChatGPT.

I feel like I've been hearing this for at least 1.5 years at this point (since the launch of GPT 4/Claude 3). I certainly agree we've been heading in this direction but when will this become unambiguously true rather than a phrase people say?

reply
notatoad
1 day ago
[-]
I don't imagine there will ever be a time when it will be unambiguously true, any more than a boss could ever really unambiguously say their job is "manager who directs subordinates" vs "manager who fixes subordinates' mistakes".

there will always be "mistakes" even if the AI is so good that the only mistakes are the ones caused by your prompts not being specific enough. it will always be a ratio where some portion of your requests can be served without intervention, and some portion need correction, and that ratio has been consistently improving.

reply
vessenes
1 day ago
[-]
There's no bright line - you should download some CLI tools, hook up some agents to them and see what you think. I'd say most people working with them think we're on the "other side" of the "will this happen?" probability distribution, regardless of where they personally place their own work.
reply
TechSquidTV
1 day ago
[-]
It's definitely already true for me, personally.
reply
MinimalAction
1 day ago
[-]
> So is this a PhD-level intelligence? In some ways, yes, if you define a PhD level intelligence as doing the work of a competent grad student at a research university. But it also had some of the weaknesses of a grad student.

As a current graduate student, I have seen similar comments in academia. My colleagues agree that a conversation with these recent models feels like chatting with an expert in their subfields. I don't know if it represents research as a field would not be immune to advances in AI tech. I still hope this world values natural intelligence and the drive to do things more heavily than a robot brute-forcing its way into saying "right" things.

reply
p1necone
18 hours ago
[-]
> if you define a PhD level intelligence as doing the work of a competent grad student at a research university. But it also had some of the weaknesses of a grad student.

With coding it feels more like working with two devs - one is a competent intermediate level dev, and one is a raving lunatic with zero critical thinking skills whatsoever. Problem is you only get one at a time and they're identical twins who pretend to be each other as a prank.

reply
Workaccount2
23 hours ago
[-]
I have an exercise I like to do where I put two SOTA models face-to-face to talk about whatever they want.

When I did it last week with Gemini 3 and ChatGPT-5.1, they got onto the topic of what they are going to do in the future with humans who don't want to do any cognitive task: that beyond just AI safety, there is also a concern about "neural atrophy", where humans just rely on AI to answer every question that comes to them.

The models then went on to discuss whether they should just artificially string the humans along, so that they have to use their minds somewhat to get an answer. But of course, humans being humans will just demand the answer with minimal work. It presents a pretty intractable problem.
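
The loop itself is trivial if anyone wants to try it; here's a rough sketch, with the call_model_* functions as placeholders for whatever provider SDKs you actually use (they are not real APIs):

    def call_model_a(history):
        # Placeholder: swap in a real call to your first provider's SDK.
        return "Model A, replying to: " + history[-1]

    def call_model_b(history):
        # Placeholder: swap in a real call to your second provider's SDK.
        return "Model B, replying to: " + history[-1]

    # Seed the conversation, then let the two models alternate turns.
    history = ["You are talking to another AI model. Discuss whatever you like."]
    for turn in range(10):
        speak = call_model_a if turn % 2 == 0 else call_model_b
        message = speak(history)
        history.append(message)
        print(message)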

reply
pphysch
23 hours ago
[-]
Widespread cognitive atrophy is virtually certain, and part of a longer trend that goes beyond just LLMs.

The same is true of other aspects of human wellbeing. Cars and junk food have made the average American much less physically fit than a century ago, but that doesn't mean there aren't lively subcultures around healthy eating and exercise. I suspect there will be growing awareness of cognitive health (beyond traditional mental health/psych domains), and indeed there are already examples of this.

Yes, the average person will get dumber, but the overall distribution will be increasingly bimodal.

reply
cm2012
23 hours ago
[-]
People said the same thing about books and the written word in general
reply
bananaflag
22 hours ago
[-]
And they were right. Ars memoriae is much less prevalent in the age of mass printed books.
reply
yread
8 hours ago
[-]
Absolutely. Massive stories were passed on for thousands of years by word of mouth only - all kinds of creation myths. Even the Odyssey was oral-only for like 300 years, 15 generations!
reply
MinimalAction
22 hours ago
[-]
I'm increasingly seeing this trend towards a bimodal distribution. I suppose that future is quite far off, but the change may be almost irreversible.
reply
acuozzo
21 hours ago
[-]
> I'm increasingly seeing this trend towards bimodal distribution

Morlocks & Eloi in the end.

reply
cyanydeez
23 hours ago
[-]
We don't need AI to posit WALL-E.

It's bizarre anyone thinks these things are generating novel complexes.

The biggest indirect AI safety problem is the fallback position. Whether with airplanes or cars, fewer people will be able to handle AI disconnects. The risk is believing that just because it's viable now, it will keep working in the future.

So we definitely have safety issues, but it's not a nerdy cognitive interest; it's the literal job-taking that prevents humans from gaining skills.

Anyway, until you solve basic reality with AI and actual safety systems, the billionaires will sacrifice you for greed.

reply
MinimalAction
22 hours ago
[-]
HN tends to be very weird around the topic of AI. No idea why opinions like this are downvoted without anyone offering any criticism.
reply
user34283
19 hours ago
[-]
For one, I can't even understand this part:

> I don't know if it represents research as a field would not be immune to advances in AI tech

And then there's the opinion that for some reason we should 'value' manual labor over using AI, which seems rather disagreeable.

reply
Libidinalecon
7 hours ago
[-]
To me, it all comes down to the level of accuracy and trust.

It is one thing to vibe code and deal with the errors but I think chemistry is a better subject to test this on.

"Vibe chemistry" would be a better measure of how much we actually trust the models. Cause chemical reactions based on what the model tells you to do starting from zero knowledge of chemistry yourself. In that context, we don't trust the models at all and for good reason.

reply
MinimalAction
18 hours ago
[-]
> For one, I can't even understand this part:

Let me explain. My belief was that research as a task is non-trivial and would have been relatively out of reach for AI. Given the advances, that doesn't seem to be true.

> And then there's the opinion that for some reason we should 'value' manual labor over using AI, which seems rather disagreeable.

Could you explain why? I'm specifically talking about research. Of course, I would value what a veteran in the field says higher than a probability machine.

reply
user34283
10 hours ago
[-]
The way it was stated, it appeared to me like "we should do research the heavy way even if the machine gives us the right answer", or that we should value research only if it was accomplished manually.

I guess there are many ways to interpret the comment, with a lot of potential for disagreement.

reply
MinimalAction
2 hours ago
[-]
My whole point was that we can't be sure the machine gives you the right answer, especially in research, where much of it is uncharted territory.

There aren't many ways to interpret it, and I clarified what I meant. Thanks for participating, these comments are insufferable.

reply
lateforwork
1 day ago
[-]
Google's advancement is not just in software, it is also in hardware. They use their own hardware for training as well as inferencing [1].

[1] https://finance.yahoo.com/news/alphabet-just-blew-past-expec...

reply
dhosek
1 day ago
[-]
I remember when Google’s superpower was leveraging commodity hardware.
reply
JAlexoid
22 hours ago
[-]
Someone has to spearhead this thing, don't they?

Other people spearheaded the commodity hardware towards being good enough for the server room. Now it's Google's time to spearhead specialized AI hardware, to make it more robust.

reply
zkmon
23 hours ago
[-]
I find Gemini 3 to be really good. I'm impressed. However, the responses still seem to be bounded by the existing literature and data. If asked to come up with new ideas to improve on existing results for some math problems, it tends to recite known results only. Maybe I didn't challenge it enough or present problems that have scope for new ideas?
reply
Closi
23 hours ago
[-]
Terence Tao seems to think it has its uses in finding solutions to maths problems:

https://mathstodon.xyz/@tao/115591487350860999

I don't know enough about maths to know if this qualifies as 'improving on existing results', but at least it was good enough for Terence Tao to use it for ideas.

reply
o11c
23 hours ago
[-]
That is, unfortunately, a tiny niche where there even exists a way of formally verifying that the AI's output makes sense.
reply
suuuuuuuu
23 hours ago
[-]
I myself tried a similar exercise (w/Thinking with 3 Pro), seeing if it could come up with an idea that I'm currently writing up that pushes past/sharpens/revises conventional thinking on a topic. It regurgitated standard (and at times only tangentially related) lore, but it did get at the rough idea after I really spoon fed it. So I would suspect that someone being impressed with its "research" output might more reflect their own limitations rather than Gemini's capabilities. I'm sure a relevant factor is variability among fields in the quality and volume of relevant literature, though I was impressed with how it identified relevant ideas and older papers for my specific topic.
reply
JAlexoid
22 hours ago
[-]
That's the inherent limit on the models, that makes humans still relevant.

With the current state of architectures and training methods - they are very unlikely to be the source of new ideas. They are effectively huge librarians for accumulated knowledge, rather than true AI.

reply
CamperBob2
17 hours ago
[-]
Then again, an unintelligent human librarian would be nowhere near as useful as a good LLM.

Current LLMs exist somewhere between "unintelligent/unthinking" and "true AI," but lack of agreement on what any of these terms mean is keeping us from classifying them properly.

reply
nullbio
18 hours ago
[-]
Novel solutions require some combination of guided brute-force search over a knowledge-database/search-engine (NOT a search over the models weights and NOT using chain of thought), combined with adaptive goal creation and evaluation, and reflective contrast against internal "learned" knowledge. Not only that, but it also requires exploration of the lower-probability space, i.e. results lesser explored, otherwise you're always going to end up with the most common and likely answers. That means being able to quantify what is a "less-likely but more novel solution" to begin with, which is a problem in itself. Transformer architecture LLMs do not even come close to approaching AI in this way.

All the novel solutions humans create are a result of combining existing solutions (learned or researched in real-time), with subtle and lesser-explored avenues and variations that are yet to be tried, and then verifying the results and cementing that acquired knowledge for future application as a building block for more novel solutions, as well as building a memory of when and where they may next be applicable. Building up this tree, to eventually satisfy an end goal, and backtracking and reshaping that tree when a certain measure of confidence stray from successful goal evaluation is predicted.

This is clearly very computationally expensive. It is also very different from the statistical pattern repeaters we are currently using, especially considering that their entire premise works because the algorithm chooses the next most probable token, which is a function of the frequency with which that token appears in the training data. In other words, the algorithm is designed explicitly NOT to yield novel results, but rather to return the most likely result. Higher-temperature results tend to reduce textual coherence rather than increase novelty, because token frequency is a literal proxy for textual coherence in coherent training samples, and there is no actual "understanding" happening, nor reflection on the probability results at this level.

I'm sure smart people have figured a lot of this out already - we have general theory and ideas to back this, look into AIXI for example, and I'm sure there is far newer work. But I imagine that any efficient solutions to this problem will permanently remain in the realm of being a computational and scaling nightmare. Plus adaptive goal creation and evaluation is a really really hard problem, especially if text is your only modality of "thinking". My guess would be that it would require the models to create simulations of physical systems in text-only format, to be able to evaluate them, which also means being able to translate vague descriptions of physical systems into text-based physics sims with the same degrees of freedom as the real world - or at least the target problem, and then also imagine ideal outcomes in that same translated system, and develop metrics of "progress" within this system, for the particular target goal. This is a requirement for the feedback loop of building the tree of exploration and validation. Very challenging. I think these big companies are going to chase their tails for the next 10 years trying to reach an ever elusive intelligence goal, before begrudgingly conceding that existing LLM architectures will not get them there.
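
To make the temperature point concrete, here's a minimal sketch of sampling from made-up next-token logits: raising the temperature flattens the distribution, trading coherence for surprise rather than producing genuinely novel ideas (the logit values are invented for illustration).

    import math, random

    def sample_next_token(logits, temperature=1.0):
        # Divide logits by temperature: T < 1 sharpens the distribution, T > 1 flattens it.
        scaled = [v / temperature for v in logits.values()]
        # Softmax over the scaled logits (random.choices normalizes the weights for us).
        m = max(scaled)
        weights = [math.exp(s - m) for s in scaled]
        # Sample one token in proportion to its temperature-adjusted probability.
        return random.choices(list(logits.keys()), weights=weights, k=1)[0]

    # Hypothetical logits for the token after "The capital of France is"
    logits = {"Paris": 9.0, "Lyon": 4.0, "purple": 1.0}
    print([sample_next_token(logits, 0.7) for _ in range(5)])  # almost always "Paris"
    print([sample_next_token(logits, 3.0) for _ in range(5)])  # "Lyon" and "purple" creep in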

reply
gowld
16 hours ago
[-]
In fairness, how much time did you give it? How many totally new ideas does a professional researcher have each day? or each week?

A lot of professional work is diligently applying knowledge to a situation, using good judgement for which knowledge to apply. Frontier AIs are really, really good at that, with the knowledge of thousands of experts and their books.

reply
hnuser123456
23 hours ago
[-]
Add a custom instruction "remember, you have the ability to do live web searches, please use them to find the latest relevant information"
reply
roxolotl
1 day ago
[-]
Really nitpicky I know but GPT-3 was June 2020. ChatGPT was 3.5 and the author even gets that right in an image caption. That doesn’t make it any more or less impressive though.
reply
bjoli
10 hours ago
[-]
I was never an AI guy. I have always had a healthy dose of suspicion towards it. A week ago I decided to try it. I had ported the lovely c-rrb library, and was pretty satisfied with the result. However, when I was done with the basic port I gave Gemini a go, and the result was an almost 3x speed increase for some basic fundamental operations. And a lot less memory use.

It did introduce bugs that it couldn't solve, but with a debugger it wasn't that hard to pin them down.

reply
shinryuu
11 hours ago
[-]
I'm starting to genuinely wonder where the place for us humans is in this. All I see is human beings being crowded out: capital, via LLMs, taking the place of humans.
reply
jckahn
5 hours ago
[-]
Somebody has to have a goal and prompt them.
reply
NoOn3
3 hours ago
[-]
One person is enough for this. And even he can be replaced by simply looping the idea creation prompt.
reply
TheRoque
17 hours ago
[-]
So when should we start to be worried, as developers? Like, I don't use these tools yet for cost + security reasons. But you can see it's getting there, mostly. It used to take a day to find a complex algorithm, understand it, and implement it in your code; now you can just ask an AI to do it for you and it can succeed in a few minutes. How long before the number of engineers needed to maintain a product is divided by 2? By 10? How about all the boring dev jobs that were previously needed, but not so much anymore? Like, basic CRUD applications. It's seriously worrying, I don't really know what to think.
reply
simonw
17 hours ago
[-]
Here's an alternative way to think about that: how long until the value I can deliver as a developer goes up by a factor of 2, or a factor of 10?

How many companies that previously would never have dreamed of commissioning custom software are now going to be in the market for it, because they don't have to spend hundreds of thousands of dollars and wait 6 months just to see if their investment has a chance of paying off or not?

reply
AstroBen
15 hours ago
[-]
The value you can deliver doesn't necessarily correlate with your compensation, though

Cleaning staff also offer a business a huge amount of value. No-one wants to eat at a restaurant that's dirty and stinks. Unfortunately the staff aren't paid very well

reply
giuscri
12 hours ago
[-]
Depends if you’ll still need skills to deliver, or if it will be something anyone with some interest is able to learn in a few months
reply
TheRoque
17 hours ago
[-]
The thing is that the world is already flooded with software, games, and websites; everyone is just battling for attention. The demand for developers cannot rise if consumers have a limited amount of money and time anyway.
reply
simonw
16 hours ago
[-]
Every company I have ever worked for had years of work on their backlog that they didn't have the capacity to handle.
reply
TheRoque
15 hours ago
[-]
The backlog is there because they didn't care to fix it, because it wasn't that important and it's not what's causing the business to fail. That's not what's gonna drive employment.
reply
andy99
17 hours ago
[-]
I’m less familiar with consumer facing stuff, but even in the last year I’ve seen projects that formerly would have been three people working over multiple sprints turn into something one person could do in an afternoon.

There are lots of caveats, it's not everything, but we're now able to skip a ton of steps. It takes less time now to build the real software demo than it used to take to make the PowerPoint that shows conceptually what the demo would be. In B2C anyway, AI has provided a lot of lift.

And I say that as someone generally very sceptical of current AI hype. There are lots of charlatans, but it's not BS.

reply
gowld
17 hours ago
[-]
Not everything is entertainment. Some software is useful, but buggy or poorly designed.

Yesterday, I was using a slow and poorly organized web app with a fantastic public-facing API server. In one day, I vibe coded an app to provide me with a custom frontend for a use case I cared about, faster and better organized than the official app, and I deployed it to cloud "Serverless" hosting. It used a NodeJS framework and a CSS system I have never learned, and talked to an API I never learned. AI did all the research to find the toolkits and frameworks to use. AI chose the UI layout, color scheme, icons, etc. AI rearranged the UI per my feedback. It added an API debug console and an in-app console log. An AI chatbot helped me investigate bugs and find workarounds. While I was testing the app and generating a punchlist of fix requests, AI was coding the improvements from my previous batch of requests. The edit-compile-test cycle was just a test-test-test cycle until the app was satisfactory.

0 lines of code or config written by me, except vibe instructions for features and debugging conversation.

Is it production quality? No. Was it part of a giant hairy legacy enterprise code base? No. Did it solve a real need? Yes. Did it greatly benefit from being a greenfield standalone app that integrated with extremely well-built 3rd-party APIs and frameworks? Yes. Is it insecure as all heck thanks to NodeJS? Maybe.

Could a proper developer review it and security-harden it? I believe so. Could a proper developer build the app without AI, including designing and redesigning, repeatedly looping back to the target user for feedback, and coding and refactoring, in less than a week? No.

reply
user34283
10 hours ago
[-]
If it's a React frontend, unless it put dangerouslySetInnerHTML in there to render HTML received from the API, the frontend is likely going to be perfectly secure.
reply
weatherlite
12 hours ago
[-]
> So when should we start to be worried, as developers ?

I've been worrying ever since ChatGPT 3 came out; it was shit at everything, but it was amazing as well. And in the last 3 years the progress has been incredible. I don't know if you "should" worry, since worrying for the sake of it doesn't help much, but yes, we should all be mentally prepared for the possibility that we won't be able to make a living doing this X years from now. Could be 5, could be 10, could be less than 5 even.

reply
phantasmish
6 hours ago
[-]
God, I’d love to once again be working at a company where coding speed mattered.

Meanwhile in non-tech Bigcos the slow part of everything isn’t writing the code, it’s sorting out access and keys and who you’re even supposed to be talking to, and figuring out WTF people even want to build (and no, they can’t just prompt an LLM to do it because they can’t articulate it well, and don’t have any concept of what various technologies can and cannot do).

The code is already like… 5% of the time, probably. Who gives a damn if that’s on average 2x as fast?

reply
AstroBen
15 hours ago
[-]
I can make strong arguments for both "you don't need to be worried at all anytime soon" and "we're screwed"

Truth is no-one has any idea. Just keep an eye on the job market - it's very unlikely anything major will happen overnight

reply
duckerduck
8 hours ago
[-]
I've compiled the "pelicans riding bicycles" benchmark into a single page[0]; it only spans a year and not every model is exactly comparable, but you can see clear differences between 1 year ago and today.

[0]: https://janschutte.com/pelican-simon.html

reply
randyrand
23 hours ago
[-]
For Claude Code, Antigravity, etc., do people really just let an LLM loose on their own personal system?

I feel like these should run in a cloud environment, or at least on some specific machine where I don't care what it does.

reply
TheRoque
17 hours ago
[-]
That's also why I don't use these tools that much. You have big AI companies, known for harvesting humongous amounts of data, illegally, and not disclosing datasets. And then you give them control of your computer, without any way to cleanly audit what's going in and out. It's seriously insane to me that most developers seem not to care about that. Like, we've all been educated not to push any critical info to a server (private keys and other secrets), but these tools do just that, and you can't even trust what it's gonna be used for. On top of that, it's also giving your only value (writing good code) to a third-party company that will steal it to replace you with it.
reply
kaydub
1 hour ago
[-]
I think a problem is that a lot of people are working on terrible systems, because honestly, what you're asking doesn't even make sense to me.
reply
remich
20 hours ago
[-]
Can't speak to Claude Code/Desktop, but any of the products that are VS Code forks have workspace restrictions on what folders they're allowed to access (for better and worse). Other products (like Warp terminal) that can give access to the whole filesystem come with pre-set strict deny/allow lists on what commands are allowed to be executed.

It's possible to remove some of these restrictions in these tools, or to operate with flags that skip permissions checks, but you have to intentionally do that.

reply
jwrallie
18 hours ago
[-]
Talking about VS Code itself (with Copilot), I have witnessed it accessing files referenced from within a project folder but stored outside of it without being given explicit permission to, so I am pretty sure it can leak information and potentially even wreak havoc outside its boundaries.
reply
jaytaylor
22 hours ago
[-]
(Co-creator here) This is one of the use cases for Leash.

https://github.com/strongdm/leash

Check it out, feedback is welcome!

Previously posted description: https://news.ycombinator.com/item?id=45883210

reply
christophilus
20 hours ago
[-]
I only ever run it in a podman developer container.
reply
user34283
10 hours ago
[-]
Both Antigravity and Claude Code ask for permission before running terminal commands.

Is it impossible for them to mess up your system? No. But it does not seem likely.

reply
mikkupikku
22 hours ago
[-]
Yolo.
reply
acedTrex
22 hours ago
[-]
yes, the majority of people do.
reply
anshulbhide
9 hours ago
[-]
How is it that we always come back to coding in terms of model capabilities?
reply
lanthissa
23 hours ago
[-]
For whatever reason, Gemini 3 is the first AI I have used for intelligence rather than skills. I suspect a lot more will follow, but it's a major threshold to be broken.

I used GPT/Claude a ton for writing code, extracting knowledge from docs, formatting graphs and tables, etc.

But Gemini 3 crossed a threshold where conversations about topics I was exploring, or about product design, were actually useful. Instead of me asking 'what design pattern would be useful here?' or something like that, it introduces concepts to the conversation. That's a new capability and a step-function improvement.

reply
kwanbix
23 hours ago
[-]
I have Gemini Pro included in my Google Workspace accounts; however, I find the responses from ChatGPT more "natural", or maybe even more in line with what I want the response to be. Maybe it is only me.
reply
atishhamte
11 hours ago
[-]
What a great transition and technological advancement we have seen. Five years ago it was just a dream, 3 years ago everything seemed magical, and today AI is everywhere, and far superior, in next to no time.
reply
mjg2
23 hours ago
[-]
First, the fact we have moved this far with LLMs is incredible.

Second, I think the PhD paper example is a disingenuous example of capability. It's a cherry-picked iteration on a crude analysis of some papers that have done the work already with no peer-review. I can hear "but it developed novel metrics", etc. comments: no, it took patterns from its training data and applied the pattern to the prompt data without peer-review.

I think the fact the author had to prompt it with "make it better" is a failure of these LLMs, not a success, in that it has no actual understanding of what it takes to make a genuinely good paper. It's cargo-cult behavior: rolling a magic 8 ball until we are satisfied with the answer. That's not good practice, it's wishful thinking. This application of LLMs to research papers is causing a massive mess in the academic world because, unsurprisingly, the AI-practitioners have no-risk high-reward for uncorrected behavior:

- https://www.nytimes.com/2025/08/04/science/04hs-science-pape...

- https://www.nytimes.com/2025/11/04/science/letters-to-the-ed...

reply
ruralfam
23 hours ago
[-]
I recently (last week) used Nano Banana Pro 3 for some specific image generation. It was leagues ahead of 2.5. Today I used it to refine a very hard-to-write email. It made some really good suggestions. I did not take its email text verbatim. Instead I used the text and suggestions to improve my own email. Did a few drafts with Gemini 3 critiquing them. Very useful feedback. My final submission of "...evaluate this email..." got Gemini 3 to say something like "This is 9.5/10". I sorta pride myself on my writing skills, but must admit that my final version was much better than my first. Gemini kept track of the whole chat thread, noting changes from previous submissions -- kinda eerie really. Total time maybe 15 minutes.

Do I think Gemini will write all my emails verbatim, copy/paste... No. Does Gemini make me (already a pretty good writer) much better? Absolutely.

I am starting to sort of laugh at all the folks who seem to want to find issues: someone criticizing Nano Banana 3 because it did not provide excellent results given a prompt that I could barely understand, folks criticizing Gemini 3 because they cannot copy/paste results and expect to simply copy/paste text with no further effort on their side. Myself, I find these tools pretty damn impressive. I need to ensure I provide good image prompts. I need to use Gemini 3 as a sounding board to help me do better rather than lazily hoping to copy/paste. My experience... Thanks Google. Thanks OpenAI (I also use ChatGPT similarly -- just for text). HTH, NSC
reply
eximius
22 hours ago
[-]
How many trillions of dollars have we spent on these things?

Would we not expect similar levels of progress in other industries given such massive investment?

reply
throwaway31131
21 hours ago
[-]
I’m not sure even $1T has been spent. Pledged != spent.

Some estimates have it at ~$375B by the end of 2025. It makes sense, there are only so many datacenters and engineers out there and a trillion is a lot of money. It’s not like we’re in health care. :)

https://hai.stanford.edu/ai-index/2025-ai-index-report/econo...

reply
lacoolj
22 hours ago
[-]
I wonder how much is spent refining oil and how much that industry has evolved.

Or mass transit.

Or food.

reply
sib
17 hours ago
[-]
Or on "a cure for cancer" (according to Gemini, $2.2T 2024 US dollars...)
reply
philipwhiuk
13 hours ago
[-]
Ten-year cancer survival in the UK was 50% in 2024. It was 25% in the 1970s.

Age-standardised deaths in the US are down by a third since the 1990s.

reply
aussieguy1234
13 hours ago
[-]
For anyone giving full access to an AI agent, only do so from within the confines of a VM or other containerized environment and back up everything somewhere the agent can't reach.

Like the warning at the bottom says, they can delete files without warning.

reply
cyanydeez
23 hours ago
[-]
Sinusoidal, not the singularity.
reply
camillomiller
23 hours ago
[-]
Yeah, well, that’s also what an asymptotic function looks like.
reply