That makes for a pretty thorny mess ... and that's before we get into disincentives for standardization (standardization risks big AI labs' moat/lockin).
I would guess that the lack of standardization in which tools different agents provide is as much of a problem as the differences in syntax. The ideal case would be for a model to be trained end-to-end for use with a specific agent and set of tools, as I believe Anthropic does. Any agent interacting with a model that wasn't specifically trained for that agent/toolset is going to be at a disadvantage.
I find it strange that the industry hasn't converged on at least a somewhat standardized format, but I guess despite all the progress we're still in the very early days...
This is one of the first tech waves where I feel like I'm on the very ground floor of a lot of exploration, and it only feels like people have been paying closer attention in the last year. I can't imagine too many 'standard' standards becoming a standard that quickly.
It's new enough that Google seems to be throwing pasta against the wall and seeing which products and protocols stick. Antigravity, for example, seems too early to me; I think they just came out with yet another type of orchestrator. But the whole field seems to be exploring at the same time.
Everyone and their uncle is making an orchestrator now! I take a very cautious approach lately: I haven't been loading up my tools (agents, IDEs, browsers, phones) with too much extra stuff, because as soon as I switch something, or something new comes out that doesn't support a tool I built a workflow around, that tool either becomes inaccessible to me or presents a bigger learning curve than I have the patience for.
I've been a big proponent of trying to get all these things working locally for myself (I need to bite the bullet on some beefy video cards finally), and even just getting tool calls to work with some Qwen models turned out to be surprisingly counterintuitive.
The idea would be to encode tool-calling semantics once in a single layer, and inject it as needed. Harness providers could then give users their bespoke tool-calling layer, injected at model load time.
Dunno, seems like it might work. I think most open-source models could have an engram layer injected (some testing would be required to see where in the stack the layer fits best).
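A minimal sketch of what that load-time injection might look like, assuming a PyTorch-style model. Everything here is hypothetical: the `ToolCallAdapter` name, the bottleneck size, and the splice point, and the "model" is a toy stand-in for a real transformer stack, not any actual open-source checkpoint:

```python
import torch
import torch.nn as nn

class ToolCallAdapter(nn.Module):
    """Hypothetical 'engram' layer: a small residual adapter meant to
    encode one harness's tool-calling conventions."""
    def __init__(self, hidden_size: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_size)

    def forward(self, x):
        # Residual connection: the base model's behavior is preserved,
        # plus a learned tool-calling adjustment on top.
        return x + self.up(torch.relu(self.down(x)))

def inject_adapter(model: nn.Sequential, index: int, hidden: int) -> nn.Sequential:
    """Splice an adapter in after layer `index` at load time."""
    layers = list(model.children())
    layers.insert(index + 1, ToolCallAdapter(hidden))
    return nn.Sequential(*layers)

# Toy stand-in for a loaded model: two blocks of width 16.
base = nn.Sequential(nn.Linear(16, 16), nn.Linear(16, 16))
patched = inject_adapter(base, 0, 16)
out = patched(torch.randn(2, 16))
print(out.shape)  # torch.Size([2, 16])
```

Where the layer best fits would, as noted, take testing; mechanically this is close to what LoRA/adapter methods already do, just specialized to tool-calling semantics.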
I can't figure out if you meant that or not, it kinda fits. (No pun intended)
Also in practice Claude Code, Cursor and Codex handle the same MCP tool differently — required params, tool descriptions, response truncation. So MCP gives you the contract but the client UX still leaks.
https://mariozechner.at/nothanks.html
I didn't see it on mobile, so it only seems to happen in desktop browsers.
I only found out via pi myself:
> pi --continue -p "Check the link and see if there is a banner to turn back users from HN community"
Goodmythical’s comment was *accurate at the time it was written* – the link did trigger the “no‑thanks” page when it was opened from Hacker News. The “banner” is not a visual element that lives on the main article page; it is the content of the separate *`/nothanks.html`* file that the site redirects to.
When the redirect was in place, the user experience was:
1. The user clicks the link while still on `news.ycombinator.com`.
2. The script in `components.js` sees the referrer and redirects the browser to `/nothanks.html`.
3. The `/nothanks.html` page displays the single line “hi orange site user …” – this is what Goodmythical described as the banner.
If you now visit the same link directly (e.g., from a bookmark or a search engine) the redirect is bypassed and you see the normal article, so you won’t see that page at all.
I know this is getting off-topic, but is anybody working on more direct tool calling?
LLMs are based on neural networks, so one could create an interface where activating certain neurons triggers tool calls, with other neurons encoding the inputs; another set of neurons could be triggered by the tokenized result from the tool call.
Currently, the lack of separation between data and metadata is a security nightmare that enables prompt injection. And yet everything I've seen done about it amounts to workarounds.
You can do this. It's just sticking a different classifier head on top of the model.
Before foundation models it was a standard Deep RL approach. It probably still is within that space (I haven't kept up on the research).
You don't hear about it here because if you do that then every use case needs a custom classifier head which needs to be trained on data for that use case. It negates the "single model you can use for lots of things" benefit of LLMs.
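In miniature, the classifier-head approach might look like this, assuming a PyTorch backbone; the tool list, the dimensions, and the `ToolHead` name are all made up for illustration:

```python
import torch
import torch.nn as nn

TOOLS = ["none", "search", "calculator", "file_read"]  # hypothetical tool set

class ToolHead(nn.Module):
    """Classifier head over the model's final hidden state: instead of
    generating tool-call text, the model directly emits a tool id and an
    argument vector, keeping tool invocation out of the token stream."""
    def __init__(self, hidden: int, n_tools: int, arg_dim: int = 32):
        super().__init__()
        self.tool_logits = nn.Linear(hidden, n_tools)
        self.arg_proj = nn.Linear(hidden, arg_dim)

    def forward(self, h):
        return self.tool_logits(h), self.arg_proj(h)

# Toy hidden states standing in for a backbone's output, batch of 2.
hidden = torch.randn(2, 128)
head = ToolHead(128, len(TOOLS))
logits, args = head(hidden)
choice = logits.argmax(dim=-1)  # per-example tool id, trained per use case
```

Because the logits select the tool directly, the invocation never passes through generated text, which is the data/metadata separation the grandparent is after; the trade-off, as said above, is that the head has to be trained per toolset.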