Do we know that AI providers are going to keep these per-token prices, or eventually lower them because of competition from China?
Many lower-budget individuals are now moving to Chinese open-weight models like DeepSeek. I wonder if China's really subsidising the providers, or if inference costs are actually much lower, and Anthropic/OpenAI are just making sure no money's left on the table for their eventual IPOs.
I wonder how often the agent actually follows the guidance. I do see it follow the guidance when I look, but it doesn't seem to every time.
The LLM can easily do this type of stuff, just tell it and it'll happily do it. This is exactly what I mean when I tell people they need to work closer with the AI, tell it how to do things. Don't just tell it what to do and get frustrated when it does it differently than you would.
A good way to achieve this without writing huge prompts is to tell it to plan the change first. Just give it some vague low-effort directions. It'll usually get most things right, you tell it what you want different and once you're happy you tell it to go ahead.
Claude even thinks we use Laravel 100% of the time, despite the project being an old Lumen codebase, so most of Laravel's features are not available. It also gets the PHP version we are using wrong 100% of the time.
I'm not sure about OpenRouter but I wouldn't be surprised if they offer a US-based provider of DeepSeek.
For reference, Cursor has their own light fork of Kimi that they use as their baseline coding and review model.
I genuinely do not know how prices can get lower from the current major providers in NA without the whole market collapsing. Everyone is spending copious amounts of money to presumably make more money back.
everyone making comparisons to the dotcom bubble seems misguided. this is clearly computing 2.0 imo
1) Don't ask LLMs for big changes
2) Review everything and point them in the right direction
Large models still suck at big changes, they produce questionable architecture and you still have to review the code, if your project is serious enough.
The codebase quickly becomes a mess if you don't pay enough attention. It does not matter which model.
So why bother with big models, when flash models are 10x cheaper and much faster to iterate under guidance? Large models can be used for security and bug audits. Flash models work almost the same for changes under 300 LOC when you dictate how you want your code to look.
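The rule of thumb above (flash models for guided changes under 300 LOC, large models for security and bug audits) could be sketched roughly like this. The tier names, the `Task` type, and `pick_model` are hypothetical, made up for illustration; only the 300 LOC threshold comes from the comment.

```python
from dataclasses import dataclass

# Hypothetical tier labels, not real model identifiers.
FLASH = "flash-model"
LARGE = "large-model"

# Threshold from the comment above: guided changes under 300 LOC.
SMALL_CHANGE_LOC = 300


@dataclass
class Task:
    kind: str               # "change", "security_audit", or "bug_audit"
    estimated_loc: int = 0  # rough size of the change, if it is a change


def pick_model(task: Task) -> str:
    """Pick a model tier for a task, following the rule of thumb above."""
    if task.kind in ("security_audit", "bug_audit"):
        return LARGE
    if task.estimated_loc < SMALL_CHANGE_LOC:
        return FLASH
    # The advice is not to ask for big changes at all,
    # so a large change is rejected rather than routed.
    raise ValueError("change too large; split it into pieces under 300 LOC")
```

Something like `pick_model(Task("change", estimated_loc=120))` would go to the flash tier, while a 900 LOC change is refused so that it gets broken down first.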
Small models are fine for small coding tasks but I don't see why big ones can't be broken down most of the time.
> Review everything and point them in the right direction
Sorry, upper management doesn't care. That's an engineering problem that you need to solve.
Maybe Microsoft and Nvidia are on to something.
128 GB machines that can run local LLMs are a bargain even if priced $5-8k. Yes, tok/s is not quite there, but that's probably OK since the bottleneck really isn't the code; it's WTF did Uber build with all of that spend? How did it meaningfully impact their revenue in a positive direction?
I find anything below 50 tps or so entirely unusable...
Regardless, it's apples to oranges anyway. Inference is quite cheap for open-weight models; it's just that Claude and OpenAI can charge very high margins compared to, e.g., DeepSeek or various providers on OpenRouter, since open models are a commodity.
You could probably reach the former figure on a prosumer platform but only for very special workloads. If you spend a lot of time on prefill (which is common for agentic workloads) the outlook is even worse since that's a significant constraint for any on-prem AI.
Using local hardware is expensive when it's running a complicated software stack that can break in 10,000 different ways.
These eventual local AI servers will just talk some protocol for AI and sit in the corner and nobody will think about them.
I guess they still might need access to various systems, so idk. Eventually I think someone will offer "AI in a box" though, running the latest open model or whatever.
“AI in a box” sounds a heck of a lot like “the box” from the Silicon Valley TV show. Or the Google search appliance. Or name any other on-premise thing that is equally dinosauric.
The real finding of this article is that AI tokens are direct competitors with offshoring. $1,500/month buys you a whole employee in India.
And this is before AI companies inevitably increase pricing after the conclusion of the growth phase.
For customer-facing production software, it's worth paying a cloud tax to get the reliability guarantee. For tools that are used by engineers for code development, there is no need for such bulletproof guarantees.
Yeah, I bet all labs releasing SOTA models are more than happy to remove the main way they make money and let you run it locally, especially if you're a big spender like Uber who seems very willing to throw money into the sea as an experiment.
Anthropic and OpenAI license to the public clouds. Google reportedly licenses to Apple. licensing to Fortune 100 companies running on their own infra is an obvious next step
it is a race to the bottom and I’m not sure the labs win that race. we’ll see!
If the large, well-funded IT companies in the world believe the current AI cost is too high, then Anthropic, OpenAI and Copilot have no actual customer base. AI is then relegated to a very profitable niche business, but that can't fund the R&D for the models.
Also, I don't believe you need to spend $1500 a month on a coding agent if you optimize usage at all.
Even then it makes more sense to rent the bigger GPU and get your answer faster.
I get that if it's offline the security downside of XP doesn't matter, and I assume XP is free, but being free doesn't really seem that valuable compared to the alternatives (free Linux, or a virtually free OS if buying wholesale).
You can ask the same about the median 330k salary in the US for Uber Engineering... and, being a bit snarky, from attending Uber engineers' talks here and there at a few conferences, it looks like they love to (re)invent internal tooling/platforms. That's pretty expensive on its own.
EDIT: I'm not saying that Uber's engineers didn't add value to the company, they absolutely did and handling the scale up they had to handle is not an easy feat. But I do challenge the notion of "what features did they create with that (LLM) spending?" of GP.
People DO.
It's well known that most tech companies are run incompetently. As you say, it's not the engineers' fault.
But most projects and hiring in these companies exist to juice promotion criteria. And, depending on perspective, these companies are either massively overstaffed or massively underproductive.
The comparison to AI spending being wasteful holds up pretty well, these are companies that readily piss away billions in pointless spending.
The idea of "if you add intelligence you make more money" is contradicted by the fact that companies don't just always hire more people. Why doesn't Google just hire everyone?
I don't know; I'm a Ron Popeil "set it and forget it" kind of guy. Make the dumbest, simplest thing that's going to work with some clear path for scaling. Then go do valuable things instead.
But in Uber's case, they tend to reinvent lower level pieces of platform/infra.
I suspect there’s some mass delusion with respect to actual accomplishments as a result of LLM use. Sure, things are moving faster, but does it matter?
I have found the sweet spot for me is using LLMs while I am still in the driver's seat.
Normal people have to produce something of value from that spend. So starting 100 agents and then waking up to something cool but useless just means you spent a few thousand dollars and created nothing of value.
Hard tasks require a lot of guidance and code reviewing, unless you are creating another throwaway project where correctness, maintainability and code understanding do not matter.
WTF did anyone build with all that spend? Despite all the feel-good anecdotes about how productive folks feel using AI coding tools, there's a deafening silence when it comes to actual, demonstrated efficacy. How can we be this far entrenched in these workflows and still not know whether they actually do anything useful?
I don't think this would have been possible without having solid engineering culture and processes in place before bringing in AI coding tools.
And I don't want to sugarcoat it, this hasn't been easy, requires continued discipline, and took well over a year to get good at. And we still have to continuously learn, experiment and adapt our training, tooling, and processes.
What would previously have been janky internal dashboards or Excel sheets are now actually nice-to-use tools. That said, of course, the maintenance cost of all that has yet to be discovered, and the ROI is questionable.
OK. I guess that's good, too.
Software engineer quality of life.
There can be an increase in productivity without a corresponding increase in total output. The gains could be captured by software engineers doing a day's work in an hour then fucking off in a variety of ways.
Until companies start hiring 5x fewer engineers than they did before and well... we are clearly moving in that direction
As for building actually complex software, the art of that is not in simply chaining together such scripts. It's the art of using architecture and testing to shape uncertainty, and developing requirements (and extrapolating sensibly from incomplete requirements). I don't think LLMs are great at this, but they aren't terrible either. A lot of the more active users in the space are doing stuff where they've realised they need more detailed specs, which, like, yeah, we knew this already - better-defined problems lead to better software.
Coding faster doesn't really solve that.
Uber makes more money if people buy more rides, order more food, have some breakthrough in autonomous driving. They can save money if they can optimize some ops or spend somewhere. Is there any evidence that with the spend on AI that they achieved any of this? If they did, I'm sure we'd hear about it in some engineering blog.
Uber engineers do not define their revenue stream; the product leadership team does.
$1500/mo of AI spend by engineers does not equate to revenue. They need to figure out revenue first before zeroing in on AI spend.
Claude has allowed me to do refactors in a couple of days that would otherwise have taken weeks. It has, objectively, increased the velocity of the engineering component of greenfield features by 40% in my org. You can put a number value on that and decide if it gives you favorable ROI.
I effectively get to operate at the rate of a small team of engineers - I know that because I've managed small teams of engineers in the past.
I think this is the part I struggle with. The code I write makes me money or is a way of teaching me something, both of which are reasons that I would write the code regardless.
I don’t think I have any projects in mind that I’d be willing to spend half of a car on that I also wouldn’t have written myself.
Obviously just a personal take though. I’m glad you get the usage you want out of it.
For example, what if you're a tiny startup and you're considering whether to hire an extra engineer or do all the coding yourself. I would estimate that AI is worth far more than $18,000 a year in that situation where you might reasonably decide to put off hiring an engineer.
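To put rough numbers on the comparison above: $1,500 a month works out to the $18,000 a year mentioned, and the interesting question is what fraction of an engineer that has to replace. A back-of-the-envelope sketch; the engineer cost below is a made-up placeholder, not a figure from the thread.

```python
MONTHLY_AI_SPEND = 1_500                 # dollars per month, figure from the thread
ANNUAL_AI_SPEND = MONTHLY_AI_SPEND * 12  # the $18,000 a year mentioned above


def breakeven_fraction(annual_ai_spend: float, engineer_cost: float) -> float:
    """Fraction of one engineer's output the AI spend must replace to pay for itself."""
    return annual_ai_spend / engineer_cost


# Hypothetical fully loaded annual cost of one engineer (assumption, not from the thread).
HYPOTHETICAL_ENGINEER_COST = 150_000

share = breakeven_fraction(ANNUAL_AI_SPEND, HYPOTHETICAL_ENGINEER_COST)
print(f"AI spend per year: ${ANNUAL_AI_SPEND:,}")
print(f"Break-even share of one engineer: {share:.0%}")
```

Under that placeholder cost, the tooling only needs to stand in for a small slice of one hire to break even; plug in your own numbers.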
Your other plans are fixed-price with rate limits, where you get more tokens than the dollar equivalent you pay monthly. These plans are economical only if the majority of users consume fewer tokens, in dollar terms, than the plan costs. That covers the gap left by power users who consume thousands of dollars' worth of API tokens monthly.
Anthropic: https://support.claude.com/en/articles/12883420-view-usage-a...
OpenAI: https://help.openai.com/en/articles/10875114-workspace-analy...
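A toy model of the flat-rate argument above, as a sketch only. It treats API list price as a stand-in for what usage costs the provider, which is an assumption; the plan price, the usage figures, and the function names are all hypothetical.

```python
def plan_margin(plan_price: float, usage_in_dollars: list[float]) -> float:
    """Total plan revenue minus the API-equivalent value of what users consumed."""
    revenue = plan_price * len(usage_in_dollars)
    return revenue - sum(usage_in_dollars)


def over_plan_users(plan_price: float, usage_in_dollars: list[float]) -> list[float]:
    """Usage figures of users who consume more than they pay for."""
    return [u for u in usage_in_dollars if u > plan_price]


# Hypothetical month: nine light users and one power user on a $200 plan.
usage = [50.0] * 9 + [1_200.0]

print(plan_margin(200, usage))      # positive: light users cover the power user
print(over_plan_users(200, usage))  # the users being subsidized
```

Raise the power user's consumption far enough and the margin goes negative even though nine out of ten users stay under the plan price.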
higher ups pushed for these last 2 years to be AI focused so I don't think this restriction is a measure of "don't use too much AI" as much as it is a measure of "don't use only 'manual' AI tooling" since we had a dozen more specialized tools in-house running locally or otherwise that didn't count towards the budget
They can't say that $0 per employee is the appropriate amount for AI spending. So they capped it, perhaps in order to "send a signal" that is eagerly picked up by the AI boosters.
There is no signal. Uber does not work any better since AI. They still want to promote AI, so they chose the highest number that doesn't bankrupt them so the press and AI promoters pick it up as the new price anchor.
Probably they'll quietly reduce the number more soon.
as far as we know there's no evidence that they can produce any profits at all
Probably even less, because you would also spend those 1500 extra per employee if you just save 10%, so 150 per employee; that’s 1.5% on salary.
This is, imho, one of the best ranges we can assume for now. How much would that be across the whole SWE market?
That being said, I do have to wonder why someone as big as, say, Uber doesn't simply roll out an OSS model in the cloud for their team. I'd imagine that would be the cheapest and most flexible option, while also keeping all the data shared with the LLM private.
Uber’s COO says it’s getting harder to justify money spent on tokenmaxxing
https://news.ycombinator.com/item?id=48268871
Uber torches 2026 AI budget on Claude Code in four months
https://news.ycombinator.com/item?id=47976415
Corporate America Is Starting to Ration AI as Cost Skyrockets
The reason I use F# & Clojure is that they target the JVM and CLR, two popular enterprise stacks.
In my not-so-humble opinion, Lisp (Clojure) still remains the language of AI.