State of AI: An Empirical 100T Token Study with OpenRouter

55 points by anjneymidha 1 hour ago | 6 comments | openrouter.ai

sosodev

1 minute ago

The open weight model data is very interesting. I missed the release of Minimax M2. The benchmarks seem insanely impressive for its size. I would suspect benchmaxing but why would people be using it if it wasn’t useful?

lukev

25 minutes ago

Super interesting data.

I do question this finding:

> the small model category as a whole is seeing its share of usage decline.

It's important to remember that this data is from OpenRouter... an API service. Small models are exactly those that can be self-hosted.

It could be the case that total small model usage has actually grown, but people are self-hosting rather than using an API. OpenRouter would not be in a position to determine this.

maikakz

8 minutes ago

Thank you & totally agree! The findings are purely observational through OpenRouter’s lens, so they naturally reflect usage on the platform, not the entire ecosystem.

syspec

29 minutes ago

According to the report, 52% of all open-source model usage is *roleplaying*. They attribute it to fewer content filters and higher creativity.

I'm pretty surprised by that, but I guess that also selects for people who would use OpenRouter.

djfergus

3 minutes ago

OpenRouter has an apps tab. If you look at the free, non-coding models, some of the apps that feature are janitor.ai, sillytavern, and chub.ai. I'd never heard of them, but people seem to be burning millions of tokens enjoying them.

raincole

4 minutes ago

If you rely on AI to write most of your code (instead of using it like Stack Overflow), Claude Code/OpenAI Codex subscriptions are cheaper than buying tokens. So those users are not on OpenRouter.

asadm

9 minutes ago

Who is using Grok Code and why?

themanmaran

1 hour ago

> The metric reflects the proportion of all tokens served by reasoning models, not the share of "reasoning tokens" within model outputs.

I'd be interested in a clarification on the reasoning vs non-reasoning metric.

Does this mean the reasoning total is (input + reasoning + output) tokens? Or is it just (input + output)?

Obviously the reasoning tokens would add a ton to the overall count. So it would be interesting to see an apples-to-apples comparison with non-reasoning models.

ribosometronome

1 minute ago

As would models that are overly verbose. My experience is that Claude tends to do more than is asked for (e.g. immediately moving on to creating tests and documentation), while other models like Gemini tend to be more concise in what they do.

reeeli

1 hour ago

I'm out of time, but "reasoning input tokens" from fortune 5000 engineers sounds like a lobotomized LSD dream. Would you care to elaborate on how you distinguish between reasoning and non-reasoning? vs "question on duty"?

themanmaran

45 minutes ago

"Reasoning" models like GPT-5 et al. do a pre-generation step where they:

- Take in the user query (input tokens)

- Break that into a game plan. Ex: "Based on user query: {query} generate a plan of action." (reasoning tokens)

- Answer (output tokens)

Because the reasoning step runs in a loop until it has worked through its action plan, it frequently uses way more tokens than the input/output step.
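A minimal sketch of that per-request accounting, assuming made-up token counts. The `Usage` class and its field names are hypothetical, not any provider's actual response schema:

```python
from dataclasses import dataclass


@dataclass
class Usage:
    """Token counts for one request (all values here are illustrative)."""
    input_tokens: int
    reasoning_tokens: int  # hidden planning tokens; 0 for a non-reasoning model
    output_tokens: int

    @property
    def total(self) -> int:
        # Everything the request consumed: prompt, hidden reasoning, visible answer.
        return self.input_tokens + self.reasoning_tokens + self.output_tokens

    @property
    def reasoning_fraction(self) -> float:
        # Share of the request's tokens spent on the hidden reasoning step.
        return self.reasoning_tokens / self.total if self.total else 0.0


# Hypothetical request: a short prompt, a long plan, a medium answer.
req = Usage(input_tokens=200, reasoning_tokens=1500, output_tokens=300)
print(req.total)               # 2000
print(req.reasoning_fraction)  # 0.75
```

With these invented numbers the reasoning step accounts for three quarters of the request, which is the kind of imbalance described above.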

typs

54 minutes ago

I believe they’re just classifying all models into “reasoning models” (e.g. o3) vs “non-reasoning models” (e.g. 4o) and comparing total tokens (input tokens + hidden reasoning output tokens + shown output tokens).
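As a rough sketch of that calculation, assuming invented per-model totals: the numbers, the `REASONING_MODELS` set, and the function names below are illustrative assumptions, not the report's data or code. The `include_reasoning` flag is only there to show how the share shifts if hidden reasoning tokens are left out.

```python
# Assumed classification of models, following the o3 / 4o example above.
REASONING_MODELS = {"o3"}

# Hypothetical aggregate token counts per model (not real data).
usage = {
    "o3": {"input": 1000, "reasoning": 4000, "output": 1000},
    "4o": {"input": 1000, "reasoning": 0, "output": 1000},
}


def total_tokens(counts, include_reasoning=True):
    """Sum a model's tokens, optionally leaving out hidden reasoning tokens."""
    total = counts["input"] + counts["output"]
    if include_reasoning:
        total += counts["reasoning"]
    return total


def reasoning_model_share(usage, include_reasoning=True):
    """Fraction of all tokens that were served by reasoning models."""
    totals = {m: total_tokens(c, include_reasoning) for m, c in usage.items()}
    all_tokens = sum(totals.values())
    if all_tokens == 0:
        return 0.0
    served = sum(t for m, t in totals.items() if m in REASONING_MODELS)
    return served / all_tokens


print(reasoning_model_share(usage))                           # 0.75
print(reasoning_model_share(usage, include_reasoning=False))  # 0.5
```

In this toy example, counting hidden reasoning tokens moves the reasoning-model share from 50% to 75%, even though both models received and emitted the same visible tokens.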

maikakz

44 minutes ago

That's exactly right!

typs

1 hour ago

This is really amazing data. Super interesting read