efficax

1 hour ago

I was worried this time last year that by this time this year, companies would have slashed their engineering teams down to a handful and everything would be driven by mostly autonomous agents with human guidance. But it just hasn't happened. Do I write all my code with an agent now? Yes. Can you just give an agent a desired outcome and let it work, unsupervised? Absolutely not. I can produce more code than I used to, but if I want it to be good, to be stable, to do what the product manager and designers want, it's only about 2 to 3 times more code than before. And that productivity gain is eaten into by the fact that I'm reviewing 2 to 3 times more code than before (and you have to review, even more so now than before, because if you just let Opus or GPT-5 do its thing, you'll get some terrible results, and I've found a lot of engineers on my team are just letting it do its thing without a lot of iteration).

supern0va

1 hour ago

>I was worried this time last year that by this time this year, companies would have slashed their engineering teams down to a handful and everything would be driven by mostly autonomous agents with human guidance. But it just hasn't happened.

I find this somewhat puzzling. I thought things were moving quickly, but at this time last year I couldn't even get Claude (using Cursor) to spin me up a service skeleton that would compile, let alone do anything meaningful.

I know it feels like a long time somehow, but it was only between November and February that things started to actually somewhat work without significant hand holding. Even now, it seems like we're still figuring out how to fully leverage the current models and tooling, even in organizations that have largely gotten on board.

iepathos

55 minutes ago

It's not all that surprising that people were worried and believed this. The AI companies, and the infrastructure companies partnering with them, have spent a lot of money and time trying to convince people this is the case, year after year. The critical clue people miss is that everyone making that claim has very clear financial incentives to convince people it's the case, even when they know it isn't. Anyone who was actually building with LLMs and judging for themselves based on their performance knew full well, year after year, that it wasn't the case.

atomicnumber3

11 minutes ago

I've said this before: if Anthropic (et al.) thought they genuinely had a shot at replacing even 30% of white-collar work, they would ABSOLUTELY NOT warn ANYONE. They would do what the oil, leaded-gas, and cigarette companies did: swear under oath that it's completely safe, commit GRIEVOUS societal harm that they explicitly promised wouldn't happen, and then end up in history books instead of jail, for reasons beyond my ability to fathom.

No. The very fact they are trying to "warn" us means it's all marketing.

This has been corroborated for me on the engineering front by the fact that I can't find a single IC I respect who actually thought there was any evidence AI was going to live up to the hype. I saw a lot of people I always thought were idiots/sycophants/brown-nosers go insane with AI. I never saw anyone I'd trust to help me cross a street blindfolded say more than "I may be wrong, but I'm not seeing any evidence yet".

deaux

1 hour ago

> at this time last year I couldn't even get Claude (using Cursor) to spin me up a service skeleton that would compile, let alone do anything meaningful

I've been using it to do this for 2 years now, as have many others. The change you mention is primarily one of Overton windows, of vibes.

simonw

1 hour ago

Which harness software were you using for this 2 years ago? VS Code Copilot? Cursor?

zamalek

1 hour ago

> Can you just give an agent a desired outcome and let it work, unsupervised? Absolutely not.

Ignoring instructions - whether in AGENTS.md or my prompt - is the worst of it, and it routinely happens. It just waives things that I explicitly told it to do as part of the design.

Vibe coders (in the true sense, zero oversight) claim that you just need to prompt it carefully. That's completely untrue when faced with your careful prompt being ignored.

I even have "don't overrule me without asking" in my global AGENTS.md, and it simply doesn't do that.

1 hour ago

Your context isn’t there to give it orders; they just don’t work like that. Your context (AGENTS.md, skills, the per-request context we send in with each request to bots) is there to give it the info it needs, in the language category it was trained on, for the answers you want; you still have to give it a clear instruction in each prompt. You can see this in a long session: say “OK, now moving on to another thing, blah blah blah”, and it implicitly drops all the previous instructions.

It can even backfire. Nagging too much about “don’t skip tests” in the context can make it slip into the linguistic space where there is some emergency and faking the results might be justified (I imagine there is a certain amount of training data out there along the lines of “just making the tests pass for now, will fix later, I promise”). If you rarely mention tests except “this one is failing, please investigate what is going on” (an informational outcome, not a test outcome), it doesn’t really “cheat” (though it can leap to conclusions as always). The tests need to be a deterministic step in the process anyway; tests don’t need fuzzy, word-directed search capabilities.

But the models just don’t have the structure to take in a ten-page set of rules and follow them. You can add a step that says “please check this git commit for compliance with the 23 rules in this standards file”, and it will work better at catching the gaps.

grey-area

1 hour ago

These are word generators, not agents. I’m really not sure why people think they could be capable agents (i.e. independent) when they consistently ignore instructions, generate the wrong things and then double down when questioned, etc., etc.

You’ve been sold something that simply doesn’t work for the purported use case (intelligence) and instead is like a stupid database of all world knowledge with the appearance of intelligence.

Useful tools at times (if you bear in mind their limitations), but not close to intelligent, independent agents.

https://github.com/gitsense/pi-brains

sdesol

1 hour ago

> I even have "don't overrule me without asking" in my global AGENTS.md, and it simply doesn't do that.

You really need to look into hooks for your coding agent. This is very much a solved problem, as I demonstrate with a test repo:

https://github.com/gitsense/gsc-rules-demos

It shows how you can block, warn, and do other things.

You obviously can't have a "Don't make a mistake" rule though.
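A minimal sketch of such a blocking hook, assuming a protocol along the lines of Claude Code's `PreToolUse` hooks: the agent pipes a JSON event containing `tool_name` and `tool_input` to the hook on stdin, and exit code 2 blocks the call and feeds stderr back to the model. The `BLOCKED_PATTERNS` rules are hypothetical examples, and the field names should be checked against your own agent's hook documentation.

```python
import json
import re
import sys

# Hypothetical project rules: shell commands the agent may not run without asking.
BLOCKED_PATTERNS = [
    (re.compile(r"\bgit\s+push\b.*--force"), "Force-push is not allowed; ask the user first."),
    (re.compile(r"--no-verify\b"), "Skipping git hooks is not allowed."),
    (re.compile(r"\brm\s+-rf\s+/"), "Recursive delete from / is not allowed."),
]


def check(event: dict) -> tuple[bool, str]:
    """Return (allowed, reason) for one tool-call event."""
    # Only shell commands are inspected in this sketch; other tools pass through.
    if event.get("tool_name") != "Bash":
        return True, ""
    command = event.get("tool_input", {}).get("command", "")
    for pattern, reason in BLOCKED_PATTERNS:
        if pattern.search(command):
            return False, reason
    return True, ""


def main() -> int:
    """Hook entry point: read one JSON event from stdin, return the exit code."""
    event = json.load(sys.stdin)
    allowed, reason = check(event)
    if not allowed:
        # Assumed convention: exit code 2 blocks the call, and stderr is
        # shown to the model so it can change course instead of proceeding.
        print(reason, file=sys.stderr)
        return 2
    return 0
```

Registered as a hook, the script would end with `sys.exit(main())` under an `if __name__ == "__main__":` guard. Unlike a line in AGENTS.md, the check runs on every call whether or not the model "remembers" the rule.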

rogerrogerr

1 hour ago

I’m convinced the magic bullet is deterministic checks. Linters, static analyzers, etc. Whatever you can do to create deterministic gates that the LLM simply must overcome to reach a “done” state, do it. Has been making a huge difference for my team, but sister teams are so invested in writing the perfect Make No Mistakes prompt that they just can’t see it.

Basically I treat it like a junior dev. We don’t get junior devs to write code correctly by cajoling them just right, we add CI gates. It still works.
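A minimal sketch of that kind of gate, assuming each check is a command that exits non-zero on failure. The gate list is hypothetical (it presumes ruff, mypy and pytest are installed and the code lives in `src`); substitute your own linters, analyzers and test runners. "Done" is simply whether every command exited 0, and the failing output goes back to the agent instead of another round of cajoling.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class GateResult:
    name: str
    passed: bool
    output: str


# Hypothetical gate list; each entry is (name, argv). Swap in your project's tools.
EXAMPLE_GATES = [
    ("lint", [sys.executable, "-m", "ruff", "check", "."]),
    ("types", [sys.executable, "-m", "mypy", "src"]),
    ("tests", [sys.executable, "-m", "pytest", "-q"]),
]


def run_gates(gates: list[tuple[str, list[str]]]) -> list[GateResult]:
    """Run every gate; a gate passes only if its command exits with status 0."""
    results = []
    for name, argv in gates:
        proc = subprocess.run(argv, capture_output=True, text=True)
        results.append(GateResult(name, proc.returncode == 0, proc.stdout + proc.stderr))
    return results


def is_done(results: list[GateResult]) -> bool:
    """The agent's task counts as done only when every gate passed."""
    return all(r.passed for r in results)


def feedback(results: list[GateResult]) -> str:
    """Concatenate the output of failing gates, to hand back to the agent as-is."""
    return "\n\n".join(f"[{r.name} failed]\n{r.output}" for r in results if not r.passed)
```

Used as the agent's stopping condition (or as a CI job), `is_done(run_gates(EXAMPLE_GATES))` is something the model cannot talk its way past, which is the point of treating it like a junior dev behind CI.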

sdesol

13 minutes ago

Why aren't the teams using shared checks? Is the code in different repos?

rogerrogerr

5 minutes ago

They’re very, very different projects.

codemog

1 hour ago

Also noticed this. Their intelligence is very jagged. I’ve had them produce some highly optimized code yet fail to follow basic code guidelines.

ls612

55 minutes ago

In my limited testing, Fable is far better at obeying CLAUDE.md than Opus is.

alt227

1 hour ago

I have experienced and feel very much the same, and it is refreshing to see a realistic post about the success of agentic coding instead of the usual hype or doom.

https://mariozechner.at/posts/2026-03-25-thoughts-on-slowing...

ramoz

1 hour ago

As crazy as it may sound, my workflow today does not look too different from a year ago, when I was already heavily into Claude Code.

I'm not certain things will look too different a year from now either. We still have serious bottlenecks in terms of the focus/attention you have for both delegating agent work and being able to review it. Even if we solve the "trust what AI does" problem, these cognitive bottlenecks still exist: for teams coordinating work, for users adopting new shit, etc.

As an industry we are leaning heavily into accepting "slop" as the status quo; we care more about efficiency of output right now. Slop will get better, and we can become more adaptive to living with the paradox of amazing yet delicate systems generated by AI. But I feel big shifts coming in this regard, and if/when they come we may find ourselves in a dystopia of broader unemployment with worse net outcomes.

I do think the teams that ship quality with AI will do so by learning to slow down.

simonw

1 hour ago

This is a thinner TechCrunch rewrite of this Reuters story: https://finance.yahoo.com/technology/ai/articles/exclusive-z...

The exact quote appears to be:

> In retrospect, he said, the "trajectory of the agentic development over at least the last four months hasn't really accelerated in the way that we expected," and that the company's bets on the new structure "haven't come to fruition yet." Zuckerberg was referring to AI agents, automated systems that can execute tasks on behalf of a user.

Hard to guess exactly what he means by "trajectory of the agentic development", but my best guess is that Meta's internal efforts to improve the agent (aka longer-form, tool-using) capabilities of their own in-house models haven't gotten them to the point where they can drive an agent harness like Codex or Claude Code in a manner comparable to the best OpenAI and Anthropic models.

At a further guess, that was part of their goal in reassigning large numbers of employees to help label data for their AI efforts.

cyanydeez

23 minutes ago

The pessimistic take is that their harness is no better than those available, and he thinks they all suck together.

From a high level, these agents absolutely do not function like a rational human through even medium-scoped problems. Even when you try to add memory, you just multiply hallucinated context, which just makes it error out on tasks in a harder-to-detect manner.

He's likely trying to do mental gymnastics about the absolute cost and any definable ROI.

simonw

19 minutes ago

I expect it's a model problem and not a harness problem, purely because some of the best harnesses (including OpenAI Codex itself) are open source and can be very easily tried against a new model.

cyanydeez

4 minutes ago

And I'm saying all the harnesses in the world aren't going to solve the myopia.

People who are dogfooding AI are absolutely wearing different rose-colored glasses than someone who can't get the same "acceptable" output.

I'm not defending Mark here; I'm just pointing out you can be a pretty successful critic if you have a different idea of a benchmark coding agent and the field fails that benchmark.

One of the problems of the AI crop is that so many people are smelling their own farts and thinking they smell great.

vishalkundar

2 hours ago

The gap between "useful chatbot" and "useful agent" is way bigger than people realize. A chatbot can be wrong 10% of the time and still help you. An agent that's wrong 10% of the time is sending bad emails and making wrong API calls with no one checking.

skybrian

1 hour ago

I see this as the gap between a general-purpose agent and a coding agent. A coding agent can imagine something to be true, test it, discover that it's wrong, and recover.

But if you go beyond what can be tested easily, asking the agent to do real work rather than writing a patch, imagining things to be true is a problem.

steveBK123

1 hour ago

This to me is the big leap from being good at coding to being good at many other tasks.

Coding could be treated as a low-stakes (in time and money consequences for retries), closed-loop system, whereas most other tasks cannot.

If it screws up booking your flight/hotel room, how does the agent verify this? And even if it does verify, there is an actual cost to changes/cancellations.

Similar with agentic e-commerce, lots of ability to screw that up and just seems ripe for fraud / being picked off by bad actors.

skybrian

50 minutes ago

Seems like to make agents safe we need tentative, reversible transactions. How do you set up a travel plan and then review it? How do you modify it later?

Unfortunately, travel keeps getting less flexible, with worse cancelation policies.
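A minimal sketch of "tentative, reversible" in code, using the saga pattern: the agent only proposes steps, a human reviews the list, and each step carries a compensating undo that runs if a later step fails. The `Step` and `TentativePlan` names are hypothetical, and the whole thing assumes every action has a cheap undo, which is exactly what non-refundable fares and strict cancellation policies take away.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Step:
    description: str
    do: Callable[[], None]
    undo: Callable[[], None]  # compensating action, e.g. release a hotel hold


@dataclass
class TentativePlan:
    steps: list[Step] = field(default_factory=list)
    log: list[str] = field(default_factory=list)

    def add(self, step: Step) -> None:
        """The agent only proposes; nothing is executed yet."""
        self.steps.append(step)

    def review(self) -> list[str]:
        """What a human sees before approving."""
        return [s.description for s in self.steps]

    def commit(self, approved: bool) -> bool:
        """Run all steps if approved; on any failure, undo completed ones in reverse."""
        if not approved:
            self.log.append("rejected")
            return False
        done: list[Step] = []
        for step in self.steps:
            try:
                step.do()
            except Exception as exc:
                self.log.append(f"failed: {step.description}: {exc}")
                for prior in reversed(done):
                    prior.undo()
                    self.log.append(f"undone: {prior.description}")
                return False
            done.append(step)
            self.log.append(f"done: {step.description}")
        return True
```

Modifying the trip later would mean building a new plan whose steps include undoing parts of the old one, which is where real-world change fees come back in.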

steveBK123

1 hour ago

To reply to myself here..

I can STILL replicate this behavior in Google AI summaries 10% of the time:

"is <SOMEPLANT> ok for cats"

to which it replies: "Yes, <SOMEPLANT LONG SCIENTIFIC NAME VERBOSE PHRASING> is toxic for cats"

The other one going around this weekend: "how long hot dogs on grill"

Summary: "The hot dogs on your grill are likely around 5-6 inches long .. "

So scale this category of error to unsupervised agents with access to your credit card.

csomar

1 hour ago

The problem is that with text/code, judgement is hard. Here is what it looks like for physical activity: https://www.youtube.com/shorts/lK7TjujKQLw It's hard to see how that is useful at all, and it could be a disaster for any unsupervised use.

blcknight

1 hour ago

The gulf is bridgeable. The problem is that a lot of people are building agents without strong enough judgment layers around them. Work that can be verified with reasonable accuracy is the sweet spot right now.
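One way to read "judgment layer": a separate checker, ideally deterministic, decides whether the agent's output is accepted; rejection reasons are fed back into the next attempt, and after a fixed budget the task escalates to a human instead of shipping. A minimal sketch, where `propose` and `verify` are hypothetical stand-ins for a model call and a domain-specific check:

```python
from typing import Callable, Optional


def attempt_with_judgment(
    propose: Callable[[str], str],
    verify: Callable[[str], Optional[str]],
    task: str,
    max_attempts: int = 3,
) -> tuple[Optional[str], list[str]]:
    """Accept a candidate only when verify() reports no problem (returns None).

    Returns (accepted_candidate, rejection_reasons). An accepted_candidate of
    None means the retry budget ran out and a human should take over.
    """
    reasons: list[str] = []
    prompt = task
    for _ in range(max_attempts):
        candidate = propose(prompt)
        problem = verify(candidate)
        if problem is None:
            return candidate, reasons
        reasons.append(problem)
        # Feed the concrete rejection back rather than re-asking blindly.
        prompt = f"{task}\nPrevious attempt rejected: {problem}"
    return None, reasons
```

The verifier is where "can be verified with reasonable accuracy" bites: if `verify` is itself a fuzzy model judgment, the loop inherits its error rate.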

ben_w

1 hour ago

> The gulf is bridgeable.

Only with an LLM that's actually at agent-quality.

If "useful chatbot" and "useful agent" are two rungs on a ladder, the rung before them is "useful autocomplete". Autocomplete that only gets the next token right 90% of the time won't give you compiling code.
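The arithmetic behind that, under the simplifying (and hypothetical) assumption that each token is independently right with probability $p = 0.9$ and that a single wrong token breaks the build:

```latex
P(\text{all } n \text{ tokens right}) = p^{n}:
\qquad 0.9^{10} \approx 0.35,
\qquad 0.9^{100} = e^{100 \ln 0.9} \approx e^{-10.5} \approx 2.7 \times 10^{-5}
```

The same compounding applies to the steps of an agent run, which is why a 10% per-step error rate that is tolerable in a chatbot is not tolerable unsupervised.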

Avicebron

1 hour ago

How many of these layers are just trying to rediscover/rebuild the idempotence of code?

_fat_santa

2 hours ago

I think what everyone underestimated was the absolute bonkers amount of compute it will take and how that compute must scale in order to keep up with larger and larger models.

darth_avocado

1 hour ago

More than that, I think people overestimate how much AI will progress as you throw more compute at it. It’s the “9 women can’t deliver a baby in a month” equivalent of AI. Additional compute won’t magically give you AGI.

paytonjjones

1 hour ago

Maybe not AGI, but if you look at the differences between, say, GPT-2 and GPT 5.5, it's remarkable how well it works to mostly just throw scale at the problem.

root_axis

31 minutes ago

The difference is a lot more than just throwing scale at it, pretty much everything useful comes from an evolving landscape of post-training techniques.

Of course, param count and context length are also important because they increase the model's overall fidelity, but a base model without SFT, RLHF, etc. is effectively useless.

codemog

1 hour ago

They already tried that with GPT-4 and GPT-4.5

They were allegedly massive but the cost and returns were not worth it.

2 hours ago

I was involved in three efforts to commercialize foundation models before they were ready, in the 2010s, so I have a good picture of how progress works for this sort of thing, and the pace a lot of the industry has been talking about is unrealistic. For example, people were disappointed with the rate of development of Apple Intelligence, but it's actually progressed at about the rate I expected.

joshuastuden

1 hour ago

That seems to be because Apple's AI division sucks. OpenAI came in 2018 and chatGPT 2.0 was already way better than anything Apple ever did.

tyre

2 hours ago

I mean, Apple Intelligence has been a boondoggle. Siri has been consistently 3+ years behind in capabilities compared to even open source equivalents.

Feels less like the pace of foundation model development and more so a specific failure of one organization to do something important.

1 hour ago

Bad capabilities, but maybe less wrong output? All the funny memes of Google explaining some fake aphorism aren't really something an Apple product would go for. Successful navigation of technology over the decades requires some timing finesse. I don’t know.

jalev

2 hours ago

Is that a problem for Meta though? They recently announced they're going to sell their excess compute, so I imagine the actual problem is they're resorting to doing that because AI isn't having nearly the effect/usage it was supposed to and now Zuck is being a sore winner about it

AnotherGoodName

2 hours ago

I agree, I don't think it is the core problem.

Meta doesn't seem to be able to produce anything close to a frontier model. The selling of compute capacity seems to be acceptance of "compute is wasted on this crappy avocado model, we'd be better off allowing something better to run".

The problem is clearly in the model architecture, the training and the data fed into the model which is causing them to give up on using their compute exclusively for their own models. They can't get it right so may as well sell the compute to someone that can.

SoftTalker

2 hours ago

If their training base is dominated by Facebook and Instagram posts then it makes sense that their model is full of shit.

ridgeguy

1 hour ago

A modern instance of that old saw "you are what you eat".

orochimaaru

1 hour ago

Does Meta have the research talent to create a SOTA frontier model? Yann LeCun has left Meta, and I don’t think either Alexandr Wang or Zuck has enough credibility to attract the talent to create one.

fatline

1 hour ago

It's possible Yann LeCun wasn't the right guy either. He seemed to be more focused on finding the next model architecture than on iterating on the current LLM architecture to build a competitive frontier model.

GCA10

1 hour ago

Meta has made some very strange decisions in terms of who it's hired to lead various aspects of AI, including the model-building efforts. Also lots to marvel at re: its ability to coordinate (or not coordinate) various efforts by all these big brains.

Can't help but think that Meta's digital networking expertise is built atop a human-networking clusterf*ck

appplication

1 hour ago

I was never really sold on their acquihire of Alexandr Wang as their head of AI being a coherent strategic decision. I just don’t see how his experience and background actually apply to frontier LLM building.

I think there could easily be a few hundred other engineers and execs at frontier labs who are more in the loop on cutting-edge architecture/secret sauce, with a track record of actually doing it, who could be had for a fraction of the price.

ijk

1 hour ago

From the outside Meta's attempts to pivot from open source releases to fast follow closed models fell flat when they tried to prematurely monetize it. They could have owned the open weight model world but tried to pivot to closed weight chatbots before an actually viable revenue model appeared.

Barrin92

1 hour ago

If Meta is selling their compute and Twitter is selling their compute and the stuff doesn't do anything, you don't need an economics degree to figure out what's going to happen to the price of compute. Especially because 'compute' is a euphemism, given that this is far from general-purpose capacity: those are specialized chips that largely do one thing.

All these companies are going to sit on their gazillion data centers once the mania dies down and will have a big problem about what to do with their mountain of hardware

memoriyato3

2 hours ago

Well, Google refused to increase Meta's quota of tokens; even Google can't supply as many (paid) tokens as Meta is burning.

ralphington

1 hour ago

It will scale inefficiently until efficiency breakthroughs occur, but it's really hard to predict when those breakthroughs will happen. Plan on the worst, but be ready and capable of capitalizing when it happens!

0xcafefood

2 hours ago

That seems like such an easy thing to estimate with a bit of basic napkin math.

laweijfmvo

2 hours ago

for us, maybe, but for someone who never really used the workflow, or looked at the “thinking” output where models spin their tokens on the stupidest shit, i can see how it wasn’t obvious.

isityettime

2 hours ago

I thought that's exactly what everyone anticipates? "Scaling laws" are all about exponential increases in compute and all that.
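For reference, published scaling-law fits are usually power laws in compute, which is why the compute bill balloons: each constant-factor drop in loss costs a multiplicative jump in compute. In generic form, ignoring the irreducible-loss term some fits include, with $C$ training compute, $L$ loss, and $\alpha$ a fitted exponent (the 0.05 below is purely illustrative):

```latex
L(C) \propto C^{-\alpha}
\;\Rightarrow\;
\frac{L(kC)}{L(C)} = k^{-\alpha},
\qquad
\text{halving } L:\; k = 2^{1/\alpha} = 2^{20} \approx 1.05 \times 10^{6} \quad (\alpha = 0.05)
```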

MattDamonSpace

1 hour ago

Altman was trying to get $1T of infra investment years ago

https://uk.pcmag.com/ai/165970/meta-exploring-option-to-sell...

2 hours ago

And yet this doesn't turn out to be Meta's problem at all.

Meta bought too many GPUs, has spare GPU capacity and they are exploring renting that capacity out.

The problem is not that the models need too much to do the job. If that were the case, Meta would not have spare capacity.

The problem is that the models currently can't be made to do the job.

laweijfmvo

2 hours ago

I think Meta’s massive compute investment was never about its 100,000 engineers running coding models, but its 3,500,000,000 users wanting to use AI in every single product (and some new ones: Meta AI, glasses, etc.) So I would think that’s the part that’s not being utilized anywhere near the amount they hoped...

maccard

2 hours ago

Do the 3.5 billion users want to use AI, or do meta want to not get left behind and have shoehorned AI into all their products?

1 hour ago

Literally the only value the Facebook AI provides is amusement when the suggestions are so comically wrong/off-colour/surreal etc.

1 hour ago

Right. But that's the same thing, isn't it? AI can't be made to do the job in those products. The only products it can do are shallow toys.

1 hour ago

Meta's AI is the stupidest in the business.

Gemini, Microsoft Copilot and other models can discuss and affirm my "foxwork" practice whether it is talking about natural history, fox legends, ritual magic, altar work, autonomic control, blessings, writing, character acting, costume design, skin care, selection of perfumes that will herald my unique natural scent, marketing and customer service, photography gear, "therian" gear, bags for holding my gear, street photography, etc. They always write like somebody who's read much more widely than anyone I've ever met and rival the legendary Tamamo-no-Mae for "speaking intelligently about any subject" [1]

Meta AI can crack jokes and that's about it. I guess there's a market for "stupid talk" but it's not that big.

[1] Like help me fix my washing machine that won't drain, come up with master narratives for the "polycrisis", talk about why Casey Handmer is wrong about space manufacturing, find papers about the social network of who sleeps with who at a high school, etc.

TheOtherHobbes

2 hours ago

The idea that users wanted AI was always a fantasy. Especially for Meta's products.

The whole hype cycle has been pure delusion. Just like the Metaverse hype cycle before it.

1 hour ago

I think this is the problem for companies with a single person at the top: when the company needs things that person isn't good at, the company cannot respond effectively. Zuckerberg was good at running a company that sells ads on an addictive platform; whether that will make him good at the next ten years of profitable tech innovation is difficult to see. People hate ads and dislike the addictions, so Anthropic or whoever has to walk a different path. They have multiple smart people working together to find that path; Meta does not seem to have that collective vision of competing experts to draw on.

alex1138

46 minutes ago

Yeah this type of conflation gets used a lot

A common one is "users don't care about privacy. that's why they use facebook. [zuckerberg was right?]"

No, you silly, silly people. People want to use products that allow them to communicate or reconnect with people or ...

They don't 'want' constantly changing privacy settings or changing TOS. If this is the best HN can come up with, ostensibly filled with S Valley people... well, it says a lot

dboreham

1 hour ago

I suspect there are many things AI can do to help people and make their lives better. But that's not how business works: products get made and marketed because they make their owners more money. Totally different goal.

maccard

2 hours ago

Did we? Many of us have been saying that the amount of compute going into the models is unsustainable and that the models aren’t improving enough to justify that for over a year. The emperor has no clothes is true yet again.

teeray

2 hours ago

They also believed they would be able to build that compute without restrictions. Between hardware costs and massive public opposition, scaling as they had anticipated is in jeopardy.

skeledrew

1 hour ago

Bonkers compute only in the beginning. Over time it'll reduce as models are made more efficient.

wrxd

1 hour ago

Or it will stay the same as the efficiency gains will be eaten up by bigger models

skeledrew

6 minutes ago

Nah, they'll hit a ceiling. They can only get so big before things collapse. And besides, they've already churned through the Internet's data. There's not much new content left in the wild, and the patterns in other data forms (audio, images, etc.) should be pretty low by comparison.

simianwords

2 hours ago

No I don't think there was any systemic underestimation of compute. I see the opposite - every company understands compute is important and tries to get hold of it.

https://fred.stlouisfed.org/series/OPHNFB

mattas

1 hour ago

There's a disconnect between measured productivity and "anecdotal" productivity. I love this chart because it also demonstrates one of the most effective ways to increase productivity: simply reducing the workforce.

skybrian

1 hour ago

Output per worker is the formal definition of productivity, but that doesn't mean we should assume fixed output.

Under conditions of scarcity, it's usually beneficial to increase output or to produce different kinds of output. At least, if someone will pay for it.

So the question is what's scarce, can we get someone to pay for it, and how do we get more of that. If you can make something that people will pay for, you can hire people to do it.

Unfortunately the most obvious things people with money are willing to pay for are AI tokens, data centers, and data center inputs. It's unclear how this gets us more of other things we want.
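The two routes to higher "output per worker", spelled out with made-up round numbers ($Q$ = output, $N$ = workers): both raise the measured ratio by a similar amount, but only one of them produces more.

```latex
\frac{Q}{N}:\quad
\frac{100}{10} = 10,\qquad
\frac{100}{8} = 12.5\ (+25\%,\ \text{fewer workers}),\qquad
\frac{120}{10} = 12\ (+20\%,\ \text{more output})
```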

mmooss

1 hour ago

> it also demonstrates one of the most effective ways to increase productivity: simply reducing the workforce.

You can cut costs and increase productivity by firing everyone else and taking no salary yourself. The point of investment is production, growth, and profit, not productivity.

adam12

1 hour ago

Maybe they'd make faster progress if they worked in the Metaverse.

haddr

31 minutes ago

So what happened to Meta after those successful Llama 3 model releases? They really made competitive models back then. It felt like they had the right people, strategy, and good results. Now it feels like they have none of those…

natbennett

1 hour ago

This article is at least the sixth restatement of a single Reuters article that has been posted here.

Legend2440

1 hour ago

Because it tells people something they desperately want to be true: AI will not take their jobs and CEOs will regret trying to do so.

hx8

1 hour ago

Zuckerberg was always excellent at knowing how to capture the attention of the internet....

sebringj

1 hour ago

with coding, you have sort of a framework for doing it right, if you have good specs, good testing practices, strict grounding in expected results deterministically, good linting, etc... this is much easier to automate with AI for the coding part within that assuming you did your homework around it... i don't have experience with all the business layers but it seems a bit more nuanced and fuzzy as you get away from that "harness" of sorts as it doesn't have to work in the same way as code for execution and evaluation... and even if code works, it still needs tastemakers for the final ok. maybe the taste maker ability still needs a lot of work/scale to be feasible, idk, like it's still early on that. maybe Elon already cracked this to an extent given his automation in various companies.

mullingitover

1 hour ago

Having agents is like going from walking to having a bicycle.

Business executives look at this and think "at this rate of progress we'll have self-driving cars in a few years!" and start making serious plans for that world.

In reality I think we're going to be riding bikes for a long time. That situation of increased individual contributor productivity makes engineers more valuable, and increases the utility of engineers rather than making them a burden on your budget.

Thus, cutting headcount right as they had huge potential to become vastly more productive was a stupid move. It's an admission that you don't know how to manage people effectively, which is embarrassing when you're paid mountains of money for your management skills.

orphea

57 minutes ago

> Having agents is like going from walking to having a bicycle.

To having roller skates at best. And even then - they are probably with hexagonal wheels.

Jyaif

1 hour ago

Nobody knows if we are going to "just" be riding bikes for a long time. To give time for society to adapt I hope it's the case, but we really have no idea.

ilaksh

2 hours ago

My instinct (for better or worse) is usually contrarian. Most people seem very skeptical of what Meta is doing with AI. But, what if, in a way at least, it makes sense?

Maybe Wang has correctly identified that the programming and agentic ability that Anthropic and OpenAI models have has largely come from armies of software engineers creating massive datasets by writing out coding and agentic problems and solutions?

So he told Zuckerberg that. The reason it may be turning into so much friction is that at companies like Anthropic or OpenAI, training engineers were either hired specifically for that purpose or probably mostly handled through contracts with third parties (which again, hired them to train AI). And honestly many of them may be overseas or just happy to have a job in a difficult period. But anyway they wouldn't have very high salary expectations etc.

But Zuckerberg already had 25000 engineers. Why not take, say, 1/5 of them and get them working on the dataset? The problem is that those engineers were hired for different, prestigious, highly paid positions at Meta/Facebook. They were not hired to do tedious grading of AI answers or quiz construction.

But Zuckerberg either has to do this, or spend additional billions on doing it all with external contractors. A third option would be to try to create a massive distillation operation. Or just hope that his engineers could invent some magical new training trick that manifested the agentic and programming skills without the large scale human input.

Or he could release a model trained largely by existing open weights models. Which without some huge breakthrough probably has no chance of surpassing them, so is pointless.

I think most of the substantive criticism of Zuckerberg has been about burning funds. If he gives up the "your job is to grade AI homework now" plan because his engineers refuse, he would need to go through third parties. The additional billions and billions this would cost would create more pressure on the bottom line and shareholder pressure.

It would also give up whatever advantage Wang may have optimistically sold the operation on: that using "real" engineers, as opposed to lower-paid data-labelling engineers, might result in a higher-quality dataset.

At some point, model architectures that don't need such massive datasets, or whose datasets can be created automatically in a way that advances the frontier, will probably come about. But right now they don't exist.

Further, the way AI works currently, business advantage from AI comes from encoding existing internal intelligence and knowledge. Meta's massive engineering corps effectively carries that knowledge in its collective head. Having those engineers create these datasets is possibly the only way to leverage this knowledge asset in this paradigm.

I guess the problem is it means forcing thousands of people to do a different job from the one they were hired for.

TheOtherHobbes

1 hour ago

None of that makes sense.

What's the end goal? Meta-specific engineering, with baked-in knowledge of how FB, Threads, and WhatsApp work? General and/or coding products to compete with Anthropic and OpenAI? Some special Magic Thing which only Meta can invent which will bedazzle Meta's users?

You don't need giant datasets unless you know what you're going to do with them. OpenAI and Anthropic are having enough issues making their products profitable. And those are, if not beloved, then at least respected, with a real, if patchy, reputation for usefulness.

What was Meta's pitch in this market? There were hints of interest when LeCun was still doing original R&D, and there was some distant possibility of a next-gen revolutionary product.

But now the goal seems to be to flail around doing something incoherently AI-branded with no obvious strategy.

The troops are being marched around, but no one knows where the battle is supposed to be.

blitzar

1 hour ago

AI remains a solution looking for a problem.

Code autocomplete is a success, password reset via AI is a failure; everything else ... still busy tokenmaxxing in search of a problem it fits into.

theflyinghorse

1 hour ago

They are making more money than ever before. Maybe Meta leadership doesn’t really care about having a coherent strategy at this point. They can afford to flail around to see if something sticks. Reminds me of rich kids who have the ability to travel the world and find themselves before settling into a career.

1 hour ago

One problem is that the AI agent market is fiercely competitive. Why build when you can buy? For the foreseeable future there will be a number of competitive models on the "efficient frontier" and I don't think one vendor will pull ahead.

In that market you can build a model and spend a lot of money on it, and at best get something that's on the same frontier as everybody else, but you're just as likely to end up with uncompetitive models like the ones they have now.

You might save a bit running your own models, doing your own inference, etc. Why not take advantage of "last mover advantage" and buy whatever is best when you need it and figure the odds are good that everybody else is going to buy more GPUs than they need and as a large customer you'll be able to buy in bulk at fire sale prices?

ilaksh

1 hour ago

That makes sense in a way, but remember that Meta had previously seen some brief developer glory in the initial Llama release. Going the off-the-shelf route would essentially be giving up on being on the technology frontier in this area, and not monetizing their knowledge assets.

1 hour ago

The most effective use of that knowledge might be feeding it into RAG instead of feeding into the base of the pyramid.

ungovernableCat

1 hour ago

>I think most of the substantive criticism of Zuckerberg has been about burning funds.

I'm not in the org myself, but I know some Meta SWEs tangentially. My understanding is that the biggest criticism is just the chaos of it all: jumping constantly from one thing to another like headless chickens and accomplishing nothing.

It created an environment where it's kind of impossible to plan and progress your career.

Syzygies

1 hour ago

> I think most of the substantive criticism of Zuckerberg has been about burning funds.

The 2017 Rohingya massacre in Myanmar? They handed him the death toll. He filed it under growth.

winstonp

1 hour ago

While I mostly agree with your post, I do want to point out one thing:

> Or he could release a model trained largely by existing open weights models. Which without some huge breakthrough probably has no chance of surpassing them, so is pointless.

This seems to be categorically untrue. Composer 2.5 is a substantial improvement on its underlying Kimi base model.

ilaksh

1 hour ago

If that is backed up by benchmarks then maybe they should imitate whatever Cursor did. What did they do?

They may eventually have to do that. Or they might be starting with an existing Llama model. Maybe I should have said "huge breakthrough or additional dataset".

throwaway27448

2 hours ago

I wonder when he'll admit his hopes were baseless.

abirch

1 hour ago

Right after he stops trying to steal everyone's privacy. Not only on the internet but IRL too, with those Meta Ray-Bans.

skeledrew

1 hour ago

I think there are seriously misplaced expectations here. The primary role of AI is transference of effort, while "increased productivity" is just a side-effect (since computers are so much faster than humans at highly repetitive tasks). It's about not having to directly do X anymore (or as often), even though it may take a few rounds to get X to a satisfactory point. But even if following up is needed, most of the effort budget can then be used for Y.

Also those with very heavy investment in AI are looking for bonkers results, which is the cause of their disappointment. They need to reduce their expectations. I for one am loving the results so far.

amelius

2 hours ago

"I was hoping AI had progressed enough so I could fire you. But you failed to make it so. Therefore, you're fired!"

fantasizr

2 hours ago

Tokenmaxxing will be a funny footnote, like NFTs on The Tonight Show, two years post-hype.

2 hours ago

Or: you wasted too much money on failing to replace yourselves, so now I have to lay you off. Which is one of the two possible grand outcomes of the AI bubble, both of which result in laying people off, because that is all these companies know how to do as a response to stress.

jrockway

2 hours ago

I am not sure that it has to be so zero sum. The AI truth is probably somewhere in the middle; it probably doesn't replace software engineers and it probably won't be deleted as completely useless. My current feeling is that it's a powerful tool I'm happy to pay to use; it doesn't replace me, but it makes it easier to do higher quality work. It feels a lot like IntelliSense, or faster compilers, or getting a 32" monitor. That probably doesn't sustain the bubble, but it's something that people are going to be poking at and making money off of for a long time.

I agree that people are investing as though the world is going to run itself while the ultra-wealthy run off in yachts to compare sizes. If it wasn't AI, it would just be tulips or something. That's just how people are. But maybe they'll be right, who knows.

1 hour ago

> The AI truth is probably somewhere in the middle; it probably doesn't replace software engineers and it probably won't be deleted as completely useless.

This is not really somewhere in the middle, I think. It is very close to one of the ends. Because the fear-promise to the idiot-investor class was that it would have those impacts across all industries, not just us nerds. They hate us for refusing to make their silly ideas possible and having irritating fact-based reasons why they can't work, but they don't hate us enough to spend that much money replacing just us. They have lots of other people they hate paying too, and we haven't even made a dent.

daveguy

2 hours ago

Bottom-line win-win! All hail the shareholder value!

AnotherGoodName

2 hours ago

I'm guessing this is specifically about Avocado which everyone at Meta would acknowledge is terrible.

roschdal

1 hour ago

AI agents are no good.

ChrisArchitect

1 hour ago

[dupe] https://news.ycombinator.com/item?id=48767058

mmooss

1 hour ago

If a Meta employee screws up a major project, what happens? What will happen to the executives behind these mass firings and realignments: executives of one of the very top SV companies, whose job is navigating the landscape of disruptive technology, and who overreacted to the latest thing? What is the standard for them?

threethirtytwo

1 hour ago

Why haven't big tech employees formed a union?

54 minutes ago

Poor skills at working together, and a love of high salaries over workplace autonomy.

holoduke

2 hours ago

Mark is really a bad leader with a mwah mwah vision. He may be correct about some things, but the execution is really, really poor. Plus he does not have followers and believers. He only has money, which can simulate followers to a certain extent.

andybak

2 hours ago

If it were still possible to get verbatim results from Google, then I believe "mwah mwah vision" would have been an authentic Googlewhack pointing at this comment thread.

alt227

1 hour ago

Nice thought, but I think strict googlewhacking frowned upon quote usage.

alex1138

1 hour ago

I'm sorry if it's a non sequitur, but I feel that even beyond the superintelligence/AI/LLM whatever of the last few years... they've always done this. It's always been somewhat ham-fisted.

Examples abound of "I reported a Nazi hate page; it didn't violate community guidelines. I jokingly called my friend a jerk and got a month-long ban."

For years. Not restricted to when ChatGPT et al. arrived on the scene.

(Because AI, in theory, makes sense. If you want to monitor things at scale you might use AI, however that's defined, to make your workload easier. When is an account being hijacked? When are bad actors infiltrating the system? Or whatever.)

penpendian

3 hours ago

I bet he wants some calculative shit.

yepyoukno

2 hours ago

Or some fuzzy yet inevitably reliable shit.

The modern trend is to think intelligence is generative “like compression” or “predicting next in sequence” rather than iteratively reducing uncertainty, like those fault tolerant humans.

AnotherGoodName

2 hours ago

Compression can be defined as reducing uncertainty. If you can perfectly predict the next symbol in a sequence, you can compress it to essentially 0 bytes using arithmetic coding, and in general the better your predictions, the fewer bits you need. Reliable prediction is what enables compression, and it's the link between compression and AI that everyone is talking about.

No one ever in comp sci says artificial intelligence is "like compression", they correctly state that "artificial intelligence IS compression". It's absolutely known and accepted that artificial intelligence (defined as predicting outcomes with a measure of certainty and taking chosen actions towards goals using those predictions) has equivalence to compression in a very hard science way. The hardest part of artificial intelligence is compression and the remaining part, the choice of actions based on predictions is just a tree search to a goal.
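To make the prediction/compression link concrete, here is a minimal Python sketch. It does not implement an arithmetic coder; it only computes the ideal code length, the sum of -log2 p(symbol | history), which an arithmetic coder driven by the same predictions approaches to within about two bits. The function and model names (`ideal_code_length_bits`, `uniform_model`, `bigram_model`) and the toy 27-symbol alphabet are hypothetical, chosen purely for illustration.

```python
import math
from collections import Counter

# Toy 27-symbol alphabet (lowercase letters plus space); an assumption for illustration.
ALPHABET = "abcdefghijklmnopqrstuvwxyz "

def ideal_code_length_bits(text, predict):
    """Sum of -log2 p(symbol | history) over the text.

    An arithmetic coder fed these same predictions produces output within
    about 2 bits of this total, so better prediction means a smaller file.
    """
    total = 0.0
    for i, ch in enumerate(text):
        p = predict(text[:i], ch)  # probability the model gave the actual next symbol
        total += -math.log2(p)
    return total

def uniform_model(history, ch):
    # No prediction at all: every symbol is equally likely.
    return 1.0 / len(ALPHABET)

def bigram_model(history, ch):
    # Hypothetical adaptive order-1 model with add-one (Laplace) smoothing:
    # estimate the next symbol from what has followed the previous symbol so far.
    if not history:
        return 1.0 / len(ALPHABET)
    prev = history[-1]
    followers = Counter(
        history[j + 1] for j in range(len(history) - 1) if history[j] == prev
    )
    return (followers[ch] + 1) / (sum(followers.values()) + len(ALPHABET))

text = "ab" * 50
print(f"uniform: {ideal_code_length_bits(text, uniform_model):.1f} bits")
print(f"bigram:  {ideal_code_length_bits(text, bigram_model):.1f} bits")
```

On a repetitive string like `"ab" * 50`, the uniform model costs about 475 bits (100 × log2 27), while the bigram model, which learns that `a` is followed by `b` and vice versa, needs substantially fewer. Better prediction, fewer bits.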

detourdog

1 hour ago

Compression applies to images, video, sound, and text. The items to be compressed are all created by humans and, let's say, represented by files. The difference between an instant of reality and the files is vast. Reality also doesn’t stand still, and each instant needs to be captured and interpreted before AI happens.

AI can be just like compression, but currently the available compute is no match for the level of detail in reality.

Finally, these real-world details need consideration in any successful implementation. That means the implementer needs to be aware of the details and successfully relate them to everything else in the model.

I think anyone surprised by these things is not fully engaged with what they are doing.

44 minutes ago

The factor missing from that analysis, to me, is a time-based, dynamic-stability perspective. Humans have a pretty good ability to go off the rails in reasoning one day and wake up reasonable, and a pretty good ability to pursue tasks, despite a multitude of distractions, for ten years or longer. The best models get appreciably worse over half a million tokens. Even using a bunch of limited-context agents over time, they lack mental stability. They keep coming up with ideas contrary to the long-term idea, and every so often generate ideas that make no sense but that they have a hard time letting go of. So the pure functional LLM is compression, but AGI needs some centering process, some high level of dynamic stability to stay sane over time and in the face of 10,000 shiny pretty things to chase.

The harnesses get better, but I haven’t seen much experimentation on long-term stability, at least since the “let the LLM run the candy machine” papers from a while ago.

Because the thing missing, even with the largest agentic swarms, is independent intelligence, where it’s given something to own, like say “end to end data quality as we add more clients” (for a SaaS) and it just figures out what that means at each time, mutating its role and solutions to fix the external world, without getting silly.