FilterHN

simonw

48 minutes ago

"The leaderboard, which ranked employees and teams by token consumption, inadvertently incentivized usage volume over productive output."

Who could possibly have predicted that happening?

Aurornis

29 minutes ago

A past employer thought it was a good idea to put up a leaderboard of who sent the most Slack messages. They celebrated the people at the top for being so active.

Predictably, everyone started talking in Slack like their jobs depended on it. Everyone was responding to everything. Instead of writing out a complete message and pressing enter, they'd send each fragment of the sentence as a new line.

The Slack leaderboard was never shown again. Unfortunately the habit remained because people were afraid they were going to be secretly judged by how much Slack activity they generated.

I expect the same thing is going to happen at companies who had token leaderboards. Once you've instilled that fear in people, they internalize the expectation.

PaulHoule

2 minutes ago

Reminds me of the place I worked at where I got in trouble because I was the only person writing JIRA tickets. Instead of bitching out the product manager or the tester for not writing tickets, they just complained to me. And if I wrote a ticket about how we could speed up the 40 minute build to 15 minutes I'd have to explain "How does this change improve the customer experience?" to which I answered "If the build was faster the customer would have had the product six months ago"

Loughla

23 minutes ago

You have to realize that if you set a measure, you're actually setting a goal for your employees. There is no such thing as a meaningless metric; why else would you measure it?

No amount of "this isn't used for anything" will change that. It's inherent in human nature in the 21st century to believe any and all metrics will be used against them, and therefore must be gamed.

It's why you also have to set UNBELIEVABLY clear goals and have incentives tied to those goals. Incentives meaning money. If you want to measure things, measure them. But have clear, consistent, and meaningful goals tied to bonuses or something if you want a thing done correctly.

22 minutes ago

Kinda.

The answer is simpler on the surface: focus.

Generally the problem is the larger the firm’s operations, the harder it is to focus.

Apple is the only firm that has done well on this consistently and doesn’t have a huge graveyard of failures to show for it.

estearum

5 minutes ago

Are you describing what people are hoping to achieve with stupid goals? Because yeah, obviously. But the point is that they're stupid, so they don't achieve that, and that failure is 100% knowable in most scenarios.

lokar

10 minutes ago

I worked somewhere that made the time between a PR being sent for review and being ready to merge a metric for the reviewers. Not time to add feedback in each round. Total time elapsed.

Insanity

morpheos137

9 minutes ago

What is it about Silicon Valley leaders not understanding basic economics or business management? These kinds of cargo cult tactics would not fly in any other industry.

ryanschaefer

8 minutes ago

It’s funny how many times the same thing happens at each large company. I think people’s thought process is this:

> Oh wow! If I paid for this myself I would have spent a lot of money! Are other people spending as much as me? I’m going to create a leaderboard!

> Oh no, my misinformed manager is using the leaderboard as a sleight of hand for work. I need to game this now.

Then the leaderboard is banned… I can’t see how this ever really goes up the chain beyond director.

skizm

3 minutes ago

It wasn't leadership doing this though. Any Meta IC can generate internal apps and dashboards. This was unofficial and unsupported. Some random IC just made it for fun. Management is usually pretty lax with stuff like this (plenty of games and joke internal apps) so they left it up until it became a problem.

jghn

37 minutes ago

> Who could possibly have predicted that happening?

Charles Goodhart :-)

https://en.wikipedia.org/wiki/Goodhart%27s_law

goldenarm

9 minutes ago

qwertytyyuu

40 minutes ago

I know, right? What did leadership think would happen when they gave some of the world's greatest software engineers (supposedly) an easily quantifiable metric to target?

VygmraMGVl

30 minutes ago

The leaderboard wasn't leadership-generated; it was engineer-generated from internally available data. The leadership target is "impact" from AI tools.

gtowey

7 minutes ago

What a wonderful scapegoat! Technically it's all "engineer created" because the managers generally don't do technical work. I bet many managers pushed their reports to increase their usage during their 1:1 meetings based on data from the leaderboard. If management had any sense that it was a bad metric, they had ample time to get ahead of it, take it down, and provide appropriate guidance. Instead, predictably, they waited until it was a full-on disaster and a crisis before acting.

Avicebron

22 minutes ago

Budget impact is technically impact.

darth_avocado

14 minutes ago

> Who could possibly have predicted that happening?

Everyone except the executives who get paid millions to predict exactly that.

Avicebron

8 minutes ago

Not a problem. There are thousands of employees standing by, willing to sacrifice their jobs for their vision.

It's a hard job, someone has to not pay consequences for bad decisions.

0cf8612b2e1e

33 minutes ago

Now come on, there was a recent post where the author argued that infallible management knew this would happen, but that it was part of the double-secret-probation strategy to get the cogs to finally start using AI.

SpicyLemonZest

26 minutes ago

I still think this is true and it’s not obvious to me from the source article that Meta believes otherwise. I couldn’t find the full memo, do they claim the leaderboard or “tokenmaxxing” era was a mistake?

TheOtherHobbes

11 minutes ago

Would they admit it if it was? Or would they try to find a plausible rationalisation for wasting billions without any return?

dzonga

17 minutes ago

unfortunately at big tech, this shit will keep happening.

people who make it to managers tend to have bozo tendencies & are yes men.

before it was lines of code, Jira tickets closed. Now it's tokens spent.

sharts

37 minutes ago

How dare you question the most effective allocators of capital.

dwoosley

39 minutes ago

I’d be curious to see the breakdown of spending by use case. I’ve heard it said that the majority of tokenmaxxing comes from non-technical uses like reading PDFs, creating PowerPoints, generating graphics/images, etc. But I’ve never seen any actual proof of that.

33 minutes ago

One thing I find fascinating as a software engineer who talks to non software engineers who use AI tools is how "reading PDFs" is not more of a solved problem. What I mean is that uploading a PDF into a chatbot tool seems to be an extraordinarily obvious use case that non technical (and technical) users would want to do.

IMO Claude, ChatGPT/Codex, etc. should be able to optimize the PDF use case to be extremely token efficient, as it's a very obvious use case. But when I start to explain to my wife/friends why it burns through so much quota, I find myself thinking "why should they have to understand this aspect of it". To me, the fact that the details of PDF parsing and extraction are relevant to users (instead of solved such that you don't have to pay attention to them) shows how these tools are not nearly as "ready" as they are made out to be. I may be preaching to the choir on this one, but just my 2c.

mattnewton

26 minutes ago

Because PDFs are a nightmare of a format, and the only thing that’s reasonably guaranteed about them is that they will render to an image that people can read, the parsing of which will be much less token efficient than the equivalent text.

22 minutes ago

I agree with you, but every non-engineer I know using these tools 100% will drag and drop a PDF into a chatbot. Anthropic and OpenAI as companies who are selling their products to all sorts of businesses should have a much better means of handling this nightmare of a format because it is so pervasive and so obviously what so many of their customers are going to drop into the product.

tiahura

40 seconds ago

I think they’ve just decided that vision gives the best results and the token issue will take care of itself.

tyre

11 minutes ago

For anyone needing to do this, the answer is to convert it to an image first. Far smaller, LLMs work well with them (even in some pretty insane use cases I've seen), and, along with human review, it can be a huge productivity gain that results in structured data.

spindump8930

7 minutes ago

I agree with your recommendation, but converting a PDF to an image by no means makes it smaller. PDFs are much closer to SVGs than to JPEGs.

0cf8612b2e1e

31 minutes ago

I hope someday we can get out of this local maximum of PDF documents. The format is terrible, but it was right place, right time and might be impossible to dislodge.

Loughla

21 minutes ago

The problem is that for 99% of people in 99% of cases they work fine. It's hard for people to understand that they're trash.

Source: my last job working with accessibility and that nightmare.

csomar

3 minutes ago

You are missing that the product is the hype cycle around AI and that's worth Trillions of $ (Trillions with a T). Why build a PDF parser that generates text when you can BS in a podcast and get paid?

This discussion was about measures, goals and incentives. Follow the incentives.

nojito

30 minutes ago

The best way to parse PDFs is to convert them to images and feed them into the LLM.

This workflow is highly optimized.

27 minutes ago

For sure there are very optimized ways to do it. My point is that a non-technical user will drag and drop a PDF into a chatbot, and from a UX/product perspective, they shouldn't have to think about it more than that IMO. But seemingly, that's very much an expensive, inefficient way of doing it (burning through a whole context window trying to read it, reloading it multiple times per conversation, etc.).

seemaze

24 minutes ago

Absolutely this. Never try to parse a native PDF document with any expectation of coherence or consistency.

ahmadyan

3 minutes ago

The majority came from random claws running on cron. They get a heartbeat, wake up every 10 minutes, read all internal posts, emails, gchat messages, and diffs, and decide to post some random message to the workplace so other claws can also regurgitate. Rinse and repeat, and then we are looking at $B tokens.

adam_arthur

34 minutes ago

I'd guess through LLM embedded PoC projects.

You can rack up token consumption extremely quickly when you embed LLMs into automated processes or products.

I'd be very surprised if these numbers are just typical coding usage with no scripting/pipeline/automation stuff

menloshark

15 minutes ago

One thing we use it for is forking tools internally because of politics.

ryanschaefer

11 minutes ago

Wasn’t this already reported on? FWIW this article links to primary sources from early last month https://www.theinformation.com/articles/tokenminimizing-meta...

d4rkp4ttern

5 minutes ago

Ok I’ll ask since nobody else has: are they not giving their devs a Claude Code Max or Codex Pro subscription? If so, why is token cost approaching billions? And if not, why not?

grim_io

1 minute ago

Big enterprises don't get to have those subscriptions. OpenAI or Anthropic simply won't sell them to you if you need a couple thousand of those.

[1]: https://support.claude.com/en/articles/11049741-what-is-the-...

lesuorac

2 minutes ago

They can't.

The subscriptions are for personal use, not enterprise.

i.e. [1] "This article is about paid Max plans for individual consumers. If you're part of an organization looking to use Claude with your team, refer to Team and Enterprise Plans."

542458

4 minutes ago

Enterprise customers don’t get those plans. At the enterprise level you have to pay the API rate… so people don’t have limited use, but you’re also not getting the heavily discounted rate the “normal” plans are at.

root_axis

12 minutes ago

Not sure if I missed it but I couldn't find any information in the article to explain where the "approaching billions" estimate is coming from.

I could believe it, but I'd want to see something a little more concrete.

bdcravens

11 minutes ago

And I still can't exhaust the limits on my Claude Max subscription, despite being more productive than I've ever been in terms of real work (i.e., things that actually make money).

nsagent

33 minutes ago

Not surprising. It seems that the comment section of every coding agent thread has at least one person mentioning they use "tokenmaxxing" to increase their token usage because it was brought up during their quarterly review, at a standup, or some other communique from on high.

Just wonder what happens when more and more companies introduce similar restrictions. Will that lead to devaluations of the LLM companies?

felix-the-cat

21 minutes ago

Within a few weeks of telling people at our company that if they don’t use AI they will be replaced by someone who does, they just announced that their allocation with ChatGPT has reset and are now panicking as they blew through their million token allocation for this month in under six hours - you can’t make this shit up.

andsoitis

36 minutes ago

measure outcomes (impact), not effort (token usage, lines of code, code coverage, hours worked, etc.)

lokar

4 minutes ago

The whole phenomenon of metric based Eng evaluations is because leadership does not trust line managers to evaluate individual engineers.

32 minutes ago

What outcomes though? The ones I’ve seen posted are still nonsensical metrics that a publicly traded firm absolutely doesn’t care about.

It wants to see faster R&D, higher revenues from existing assets, greater operating margins, higher sales to invested capital ratio and so on…

The best way to measure that for a software firm is uptime of services, usage, and project completion duration.

30 minutes ago

Measuring uptime? I've seen Anthropic's status page, and they are a >$1 trillion company who "largely solved" coding. So clearly you aren't correct. /s

lokar

1 minute ago

Unfortunately that is a group metric, we need individual metrics

janalsncm

3 minutes ago

Ok, uptime. How do you measure an individual’s contribution to uptime? If Claude goes down does everyone take a hit? If Claude stays up everyone gets rewarded?

If so, your metric cannot distinguish between a bad engineer and a good one.

If not, you have the same problem you started with: measuring contributions to “uptime”.

28 minutes ago

Yes I am correct. It's the users paying for it who determine whether uptime is sufficient. So stop crying.

You clearly don’t understand valuation - the value of an asset is a function of expected FUTURE cash flows….

Don’t bother replying unless you have a clue about what you’re talking about

24 minutes ago

My friend, I was being sarcastic before, and I am agreeing with you. LoC, token spend, etc. are horrible metrics. Software uptime is a great metric. I'm merely lamenting that in the age we're in, uptimes are getting worse and worse.

dheera

16 minutes ago

> measure outcomes (impact)

This is also not easy. In particular proactively preventing bugs is not rewarded

veber-alex

10 minutes ago

It's not flashy.

When shit just works for months or years no one is going to come and praise you for stuff you did a while back.

You are better off breaking stuff and then fixing it to show how useful you are.

Trasmatta

26 minutes ago

All those billions spent on tokens by Meta, and not a single iota of value generated by any of it

tyre

10 minutes ago

I love how confidently you say this, with no evidence provided (and I doubt you have any.)

Just a pristine comment section yap.

jazzyjackson

3 minutes ago

If there was a positive return on token spend they wouldn’t be capping it now would they?

csomar

1 minute ago

Was there a new product released by Meta that we are not aware of? The last thing I read about was the Instagram account take-over AI-bug.

Barrin92

2 minutes ago

>I love how confidently you say this,

it's not that difficult to say it confidently if you use any of their services and applications because exactly nothing has changed.

For reference most labor productivity increases for the last 50 years amounted to about 2% per year. If a hypothetical FB engineer had doubled their productivity with their gazillion tokens that would be 30 years of productivity gains in one year. I'd wager the evidence would be quite evident if you opened any of their apps
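A quick back-of-the-envelope check of that arithmetic, as a sketch only. It assumes the 2% figure compounds annually (the comment does not say whether it means compound or simple growth), and the function name is made up for illustration. Under that assumption, doubling takes roughly 35 years, so the "30 years" above is in the right ballpark; with simple (non-compounding) 2% growth it would be 50 years.

```python
import math


def years_to_multiply(factor: float, annual_growth: float) -> float:
    """Years of compound growth at `annual_growth` needed to scale output by `factor`.

    Solves (1 + annual_growth) ** years == factor for years.
    """
    return math.log(factor) / math.log(1.0 + annual_growth)


# Assumption: 2% per year, compounding, as a stand-in for the figure in the comment.
doubling_years = years_to_multiply(2.0, 0.02)
print(f"Doubling at 2% per year takes about {doubling_years:.1f} years")
```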

steve-atx-7600

16 minutes ago

I guess maybe they can crank out more ads in their dystopian ad space of a social network site.

whalesalad

31 minutes ago

Clearly no one is using Meta’s customer-facing AI products. Why aren’t they using their own GPU/compute for development?

wmf

23 minutes ago

Because Muse isn't good enough and why use Muse if they'll let you use Opus for free?

gordon_freeman

23 minutes ago

That is a fair point. The contrast between Meta and Apple could not be bigger here. Apple has billions of devices and yet they decided to use 3rd party models from OpenAI and later Google to build their AI features rather than building foundational models in house. Yet Meta did the opposite: they built models (spending billions of $$$ and firing 10% of the company) for billions of users who would rather not use Meta AI features.

smrtinsert

46 minutes ago

That is insane. I'm sure companies will learn the absolute wrong lesson from this, and attempt to centralize and kneecap token usage.

SpicyLemonZest

18 minutes ago

As many companies do with all their budgets, down to the trivial and clearly positive EV cost of free coffee. So it goes, cost controls are hard and necessarily imprecise.

downrightmike

38 minutes ago

Tokens are less valuable than the eyeball metric of the Dotcom era. At least the eyeballs were real then.

I'd argue most of the AI value is related to how 'Dead' the internet is.