It doesn't have class support yet!
But it doesn't matter, because LLMs that try to use a class will get an error message and rewrite their code to not use classes instead.
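A minimal sketch of that error-and-retry loop, under loud assumptions: `run_restricted` is a toy checker built on the standard `ast` module, not Monty's API, and `fake_model` is a hypothetical stand-in for an LLM call. It only shows how an "unsupported feature" error can be fed back so the next attempt avoids classes. The toy uses plain `exec`, so it is not a sandbox.

```python
import ast


class UnsupportedSyntax(Exception):
    """Raised by this toy checker; not Monty's real error type."""


def run_restricted(source: str) -> dict:
    # Toy stand-in for a minimal interpreter: reject class definitions, then run.
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef):
            raise UnsupportedSyntax(
                f"line {node.lineno}: class definitions are not supported"
            )
    namespace: dict = {}
    exec(compile(tree, "<snippet>", "exec"), namespace)  # NOT a sandbox
    return namespace


CLASS_VERSION = (
    "class Adder:\n"
    "    def add(self, a, b):\n"
    "        return a + b\n"
    "result = Adder().add(2, 3)\n"
)
FUNC_VERSION = (
    "def add(a, b):\n"
    "    return a + b\n"
    "result = add(2, 3)\n"
)


def fake_model(error):
    # Hypothetical model: reaches for a class first, rewrites once it sees an error.
    return FUNC_VERSION if error else CLASS_VERSION


def run_with_feedback(generate, max_attempts: int = 3):
    error = None
    for attempt in range(1, max_attempts + 1):
        source = generate(error)
        try:
            return attempt, run_restricted(source)
        except UnsupportedSyntax as exc:
            error = str(exc)  # this text would go back into the model's context
    raise RuntimeError(f"gave up after {max_attempts} attempts: {error}")
```

With the stub above, the first attempt is rejected and the second (class-free) attempt runs.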
Notes on how I got the WASM build working here: https://simonwillison.net/2026/Feb/6/pydantic-monty/
I’m especially curious about where the Pydantic team wants to take Monty. The minimal-interpreter approach feels like a good starting point for AI workloads, but the long tail of Python semantics is brutal. There is a trade-off between keeping the surface area small (for security and predictability) and providing enough language capability to handle the non-trivial snippets that LLMs generate for complex tasks.
I think in the near term we'll add support for classes, dataclasses, datetime, json. I think that should be enough for many use cases.
disclaimer: i work at E2B, opinions my own
But to be clear, we're not even targeting the same "computer use" use case I think e2b, daytona, cloudflare, modal, fly.io, deno, google, aws are going after - we're aiming to support programmatic tool calling with minimal latency and complexity - it's a fundamentally different offering.
Chill, e2b has its use case, at least for now.
And Python VM had/has its sandboxing features too, previously rexec and still https://github.com/zopefoundation/RestrictedPython - in the same category I'd argue.
Then there's of course hypervisor based virtualization and the vulnerabilities and VM escapes there.
Browsers use belt-and-suspenders approaches of employing both language runtime VMs and hardware memory protection as layers to some effect, but still are the star act at pwn2own etc.
It's all layers of porous defenses. There'd definitely be room in the world for performant dynamic language implementations with provably secure foundations.
although you’d still need another boundary to run your app in to prevent breaking out to other tenants.
Everyone was using git for reasons that seemed bandwagon-y to me, when Mercurial just had such a better UX and mental model.
Now, everyone is writing agent `exec`s in Python, when I think TypeScript/JS is far better suited for the job (it was always fast + secure, not to mention more reliable and information dense b/c of typing).
But I think I'm gonna lose this one too.
I do like Typescript (not JS) better, because of its highly advanced type system, compared to Python's.
TS/JS is not inherently fast, it just has a good JIT compiler; Python still ships without one. Regarding security, each interpreter is about as permissive as the other, and both can be sealed off from the environment pretty securely.
LLMs are really good at writing Python for data processing. I suspect it's due to Python having a really good ecosystem around this niche.
And the type safety/security issues can hopefully be mitigated by ty and pyodide (already used by cf’s python workers).
Monty’s overhead is so low that, assuming we get the security / capabilities tradeoff right (Samuel can comment on this more), you could always have it enabled on your agents with basically no downsides, which can’t be said for many other code execution sandboxes which are often over-kill for the code mode use case anyway.
For those not familiar with the concept, the idea is that in “traditional” LLM tool calling, the entire (MCP) tool result is sent back to the LLM, even if it just needs a few fields, or is going to pass the return value into another tool without needing to see (all of) the intermediate value. Every step that depends on results from an earlier step requires a new LLM turn, limiting parallelism and adding a lot of overhead, expensive token usage, and context window bloat.
With code mode, the LLM can chain tool calls, pull out specific fields, and run entire algorithms using tools with only the necessary parts of the result (or errors) going back to the LLM.
These posts by Cloudflare: https://blog.cloudflare.com/code-mode/ and Anthropic: https://platform.claude.com/docs/en/agents-and-tools/tool-us... explain the concept and its advantages in more detail.
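To make the difference concrete, here is a small sketch. The two tools (`list_orders`, `get_status`) and their data are entirely hypothetical, and the "code mode" function just stands in for a snippet a model might write; it is not taken from the Cloudflare or Anthropic posts. It compares how much text would flow back to the model in each style.

```python
import json


# Hypothetical tools with deliberately bulky results.
def list_orders(customer_id: str) -> list[dict]:
    return [
        {"id": f"ord-{i}", "customer": customer_id, "total": 10.0 * i, "notes": "x" * 200}
        for i in range(1, 6)
    ]


def get_status(order_id: str) -> dict:
    shipped = order_id.endswith(("1", "3"))
    return {
        "order_id": order_id,
        "status": "shipped" if shipped else "pending",
        "history": ["event"] * 50,
    }


def traditional_payload(customer_id: str) -> str:
    # Traditional tool calling: every full result passes back through the model.
    orders = list_orders(customer_id)
    statuses = [get_status(order["id"]) for order in orders]
    return json.dumps({"orders": orders, "statuses": statuses})


def code_mode_payload(customer_id: str) -> str:
    # Code mode: the snippet chains the calls and returns only what is needed.
    pending = [
        order["id"]
        for order in list_orders(customer_id)
        if get_status(order["id"])["status"] == "pending"
    ]
    return json.dumps({"pending": pending})
```

Here `len(code_mode_payload("c1"))` is far smaller than `len(traditional_payload("c1"))`, and the chained calls need no intermediate model turns.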
Yes, I was also thinking.. why MCP then?
But even my simple class project reveals this. You actually do want a simple tool wrapper layer (abstraction) over every API. It doesn't even need to be an API. It can be a calculator that doesn't reach out anywhere.
as the article puts it: "MCP makes tools uniform"
In hindsight, it's pretty funny and obvious
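As an illustration of the "simple tool wrapper layer" idea, a sketch of a uniform registry where every tool, including a purely local calculator, is called the same way and returns the same result shape. The `Tool`, `register`, and `call_tool` names are made up for this example; this is not the MCP protocol or any library's API.

```python
import operator
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Tool:
    name: str
    description: str
    func: Callable[..., Any]


REGISTRY: dict[str, Tool] = {}


def register(tool: Tool) -> None:
    REGISTRY[tool.name] = tool


def call_tool(name: str, **kwargs: Any) -> dict:
    # Uniform envelope: callers never need to know what sits behind a tool.
    tool = REGISTRY.get(name)
    if tool is None:
        return {"ok": False, "error": f"unknown tool: {name}"}
    try:
        return {"ok": True, "result": tool.func(**kwargs)}
    except Exception as exc:
        return {"ok": False, "error": f"{type(exc).__name__}: {exc}"}


_OPS = {
    "add": operator.add,
    "sub": operator.sub,
    "mul": operator.mul,
    "div": operator.truediv,
}


def calculator(op: str, a: float, b: float) -> float:
    # A tool that does not reach out anywhere.
    return _OPS[op](a, b)


register(Tool("calculator", "Basic arithmetic; makes no network calls.", calculator))
```

A wrapper around a remote API would be registered the same way, which is the uniformity the quoted line is pointing at.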
Yep, still using good old hg for personal repos. Interop for outside projects defaults to git, since almost all the hg hosts have withered.
Why anyone would drag this godforsaken abomination onto the server side is beyond me.
Even effing C# nowadays can be run in a script-like manner from a single file.
—
Even the latest Codex UI app is Electron. The one that is supposed to write itself with AI wonders but couldn’t manage native SwiftUI, WinUI, and Qt or whatever is on Linux these days.
Typescript’s types are far more adaptable and malleable, even with the latest C# 15 which is belatedly adding Sum Types. If I set TypeScript to its most strict settings, I can even make it mimic a poor man’s Haskell and write existential types or monoids.
And JS/TS have by far the best libraries and utilities for JSON and xml parsing and string manipulation this side of Perl (the difference being that the TypeScript version is actually readable), and maybe Nushell but I’ve never used Nushell in production.
Recently I wrote a Linux CLI tool for managing podman/Quadlet containers and I wrote it in TypeScript and it was a joy to use. The Effect library gave me proper Error types and immutable data types and the Bun Shell makes writing shell commands in TS nearly as easy as Bash. And I got it to compile to a single self-contained binary which I can run on any server and which has a lower memory footprint and faster startup time than any equivalent .NET code I’ve ever written.
And yes, had I written it in Rust it would have been faster and probably even safer, but for a quick and dirty tool, development speed matters, and I can tell you that I really appreciated not having to think about ownership and fight the borrow checker the whole time.
TypeScript might not be perfect, but it is a surprisingly good language for many domains and is still undervalued IMO given what it provides.
Perhaps if the interpreter is in turn embedded in the executable and runs in-process, but even a do-nothing `uv` invocation takes ~10ms on my system.
I like the idea of a minimal implementation like this, though. I hadn't even considered it from an AI sandboxing perspective; I just liked the idea of a stdlib-less alternative upon which better-thought-out "core" libraries could be stacked, with less disk footprint.
Have to say I didn't expect it to come out of Pydantic.
Pretty much all modern software tooling, once you remove the parts that aim to appeal to humans, becomes a much more reliable tool. But it's not clear whether the performance will be better or not.
Just beware of panics!
While I think all LLMs are shit, they probably eventually will not be shit, and it will be because people like you contributed to their progress. Nothing good will come of it for you or your peers. The billionaires who own everything will kick you out to the curb as soon as you train your replacement that doesn't sleep, eat or complain. Have some class solidarity.
Open source has been responsible for enormous productivity boosts in our industry, because we don't all have to build duplicates of exactly the same thing time and time again.
But think of all of the jobs that were lost by people who would otherwise have been employed building the 500th version of a CSS design system, or a template engine, or code to handle website logins!
What makes AI tools different? (And I actually do agree that they feel different, but I'm interested in hearing arguments stronger than "it feels different".)
To put it gently, yes it feels different: for people who haven't already saved a lifetime of SWE wages, this is the first credible threat to the sector in which they're employed since the dot com bubble. People need to work to eat.
You cannot compare any open source software, even as a whole, to the impact that LLMs have had on labor and are projected to have. However, I might now argue it would have been better to not have so much open source, as it's clearly being processed through these plagiarism-laundering training regimes.
I don't really think LLMs, robotics and ML in general are going to increase GDP globally; they will instead just replace the inputs that maintain the status quo (the workers). If they can't successfully replace human labor, they will at minimum greatly reduce its value, which is extremely dangerous.
Jobs grew greatly during the last 30 years of open source development, but over the last 16 months we've had 350-400k SWE layoffs in the USA. Many of these layoffs have been directly correlated to AI enhanced productivity. 25% of recent college graduates are unemployed. Jobs data is super unreliable at the moment, but we will also see large swaths of the lower skilled sectors, customer service for example, hit by huge layoffs in the coming 24 months.
Despite what C-Suites say about AI giving them more free time for their hobbies or whatever, they've yet to answer how people are going to afford those hobbies. Working as a barista lol? These same mouthpieces will say that LLMs are going to allow the same number of engineers to get 10x more done, but they're not reflecting that in their business decisions. They are laying people off in swaths when equities are at all time highs; it's abnormal.
I think it's more likely the ruling classes will give us something to do by making us so poor that young men will beg to go fight wars. Put us to use on behalf of their conquest for more resources; that certainly did the trick in the 20s, 30s and 40s :/
The invention of the digital calculator turned human calculators into accountants, and that's great! We're contributing to the same process now
Corporations and billionaires will get TI-Nspires; we get TI-83s.
I do not agree that inference will get more affordable in time to prevent harm. It will cause way more problems with the devaluation of labor before it starts to solve those problems, and in that period they will solidify their control over society.
We already see it in how ML is being used on a vast scale to build advanced surveillance infrastructure. Let's not build the advanced calculators for them for free in open source please, they'd like nothing better. I wrote a lot more in the comments above also.
If anyone has time, this is required reading imho: https://archive.nytimes.com/www.nytimes.com/books/97/05/18/r...
I don't know how to stop people from doing this without shaming them. I think more shaming might be required, as uncomfortable as that may be. It looks like a society-wide prisoner's dilemma (well, if I don't build it, someone else will), except this isn't a prisoner's dilemma and we can coordinate, sort of.
It would be one thing if GPUs and tokens were cheap and everyone could take these implementations and out-compete the corporations, but those aren't the game theoretical terms we're on here. They have the resources, and I promise they are not going to let the average joe be able to afford to out-compete them. They are the ones that are going to be able to get the most advantage from these tools. Why give them the extra leverage? It will be used to displace you. The ruling class, or those with the resources, have zero intention of letting the tide lift all boats. And if there are any in the ruling class that do have good intentions, they will be rooted out.
We see evidence of this all across literature, history, and in their own actions. This year in Telluride, Colorado, the Ski Patrol Union went on strike over wages. The billionaire owner who lives in California, Chuck Horning, did not want to concede to the Ski Patrollers over a $66k difference spread out over 3 years, roughly $22k a year over the contract length. He shut down the ski resort during the Christmas holidays, and brought the town to its knees. This is just one example, but there are many. It is ideological to these people; it's about maintaining their control over the working class. We are at the beginning of a class struggle that Earth has never witnessed before, with way more lives at stake.
I do not think LLMs are going to lead to super intelligence btw, but I do believe they will get decent enough to uproot many lives when used as a weapon against the value of labor and to accelerate concentration of resources into the few(er). We are up against people like Chuck Horning, who'd rather destroy an entire town of workers over $22k a year than concede any power. They have zero interest in building an equitable society, or we wouldn't see this type of behavior. This will 100% get used to replace you, and then what will they do with us? They aren't going to just let everyone chill, I promise you that.
I believe the devaluation (and surveillance) of labor because of LLMs and robotics (machine learning in general) is the most pressing issue of our time.
I get the draw to building cool tools with these things, but please don't do it in the open. Let someone else do it, and then we can call them out too. The slower these developments can happen the better.
everything that you don’t want your agent to access should live outside of the sandbox.
And now for something completely different.
Will explore this for https://toolkami.com/, which allows plug and play advanced “code mode” for AI agents.
Of course it's slow for complex numerical calculations, but that's the primary usecase.
I think the consensus is that LLMs are very good at writing python and ts/js, generally not quite as good at writing other languages, at least in one shot. So there's an advantage to using python/js/ts.
My reasoning is 1) AIs can comprehend specs easily, especially if simple, 2) it is only valuable to "meet developers where they are" if really needing the developers' history/experience which I'd argue LLMs don't need as much (or only need because lang is so flexible/loose), and 3) human languages were developed to provide extreme human subjectivity which is way too much wiggle-room/flexibility (and is why people have to keep writing projects like these to reduce it).
We should be writing languages that are super-strict by default (e.g. down to the literal ordering/alphabetizing of constructs, exact spacing expectations) and only having opt-in loose modes for humans and tooling to format. I admit I am toying w/ such a lang myself, but in general we can ask more of AI code generations than we can of ourselves.
But I'd be interested to see what you come up with.
I think skills and other things have shown that a good bit of learning can be done on-demand, assuming good programming fundamentals and no surprise behavior. But agreed, having a large corpus at training time is important.
I have seen that, given a solid lang spec for a never-before-seen lang, modern models can do a great job of writing code in it. I've done no research on their ability to leverage a large stdlib/ecosystem this way though.
> But I'd be interested to see what you come up with.
Under active dev at https://github.com/cretz/duralade, super POC level atm (work continues in a branch)
Tokenization joke?
Or is all Rust code secure unquestionably?
For example, incorrect levels of indentation. Let me use dots instead of space because of HN formatting:
for key,val in mydict.items():
..if key == "operation":
....logging.info("Executing operation %s",val)
..if val == "drop_table":
....self.drop_table()
This uses good syntax, and the logging part is not in the stdlib, so I assume it would ignore it or replace it with dummy code? That shouldn't prevent it from analyzing that loop and determining that the second if-block was intended to be under the first, and that the way it is written now, the key check isn't applied to it.
In other words, if you don't want to validate proper stdlib/module usage, but proper __Python__ usage, this makes sense. Although I'm speculating on exactly what they're trying to do.
EDIT: I think my speculation was wrong; it looks like they might have developed this to write code for pydantic-ai: https://github.com/pydantic/pydantic-ai . I'll leave the comment above as-is though, since I think it would still be cool to have that capability in pydantic.
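For what it's worth, the structural part of that check can be sketched with CPython's standard `ast` module, independent of Monty (this is not something the project is known to do). The snippet below is the loop from the comment with the dots turned back into spaces. Parsing confirms it is valid syntax and that the two `if` blocks are siblings inside the loop; deciding that the second one was *meant* to be nested is a judgment call that a parser alone cannot make.

```python
import ast

SNIPPET = '''
for key, val in mydict.items():
  if key == "operation":
    logging.info("Executing operation %s", val)
  if val == "drop_table":
    self.drop_table()
'''

# Parsing never runs the code, so undefined names (mydict, logging, self) are fine.
tree = ast.parse(SNIPPET)
loop = tree.body[0]

# Direct children of the loop body that are `if` statements.
ifs = [node for node in loop.body if isinstance(node, ast.If)]
sibling_tests = [ast.unparse(node.test) for node in ifs]

# True only if some `if` is nested directly inside another `if`.
nested = any(isinstance(child, ast.If) for node in ifs for child in node.body)

print(sibling_tests, nested)
```

A linter-style rule could flag "destructive call guarded only by a value check" from this structure, but that heuristic is an assumption, not a described feature.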
(Genuine question, I've been trying to find reliable, well documented, robust patterns for doing this for years! I need it across macOS and Linux and ideally Windows too. Preferably without having to run anything as root.)
https://danwalsh.livejournal.com/28545.html
One might have different profiles with different permissions. A network service usually wouldn't need your home directory while a personal utility might not need networking.
Also, that concept could be mixed with subprocess-style sandboxing. The two processes, main and sandboxed, might have different policies. The sandboxed one can only talk to main process over a specific channel. Nothing else. People usually also meter their CPU, RAM, etc.
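A rough sketch of the main/sandboxed split described above, using only the Python standard library. It shows the shape only: one JSON request in over stdin, one JSON reply out over stdout, a wall-clock timeout, and (on POSIX) a soft CPU limit. It is not a security boundary by itself, since the child can still touch the filesystem and network unless an OS-level policy is layered on top. The child program and the 2-second limit are made-up values for illustration.

```python
import json
import subprocess
import sys

try:
    import resource  # POSIX only; absent on Windows
except ImportError:
    resource = None

# Child program: reads one JSON request from stdin, writes one JSON reply to stdout.
CHILD = r"""
import json, sys
request = json.load(sys.stdin)
json.dump({"sum": sum(request["numbers"])}, sys.stdout)
"""


def _limit_cpu() -> None:
    # Runs in the child before exec: lower the soft CPU-time limit to ~2 seconds.
    soft, hard = resource.getrlimit(resource.RLIMIT_CPU)
    limit = 2 if hard == resource.RLIM_INFINITY else min(2, hard)
    resource.setrlimit(resource.RLIMIT_CPU, (limit, hard))


def run_child(request: dict, timeout: float = 5.0) -> dict:
    proc = subprocess.run(
        [sys.executable, "-I", "-c", CHILD],  # -I: isolated mode, ignores PYTHON* env vars
        input=json.dumps(request),
        capture_output=True,
        text=True,
        timeout=timeout,  # wall-clock cap enforced by the main process
        preexec_fn=_limit_cpu if resource else None,
    )
    if proc.returncode != 0:
        raise RuntimeError(proc.stderr.strip())
    return json.loads(proc.stdout)
```

Usage: `run_child({"numbers": [1, 2, 3]})` returns the child's reply as a dict. Memory limits and the per-profile filesystem/network policies mentioned above would have to come from the OS mechanism in use.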
INTEGRITY RTOS had language-specific runtimes, esp Ada and Java, that ran directly on the microkernel. A POSIX app or Linux VM could run side by side with it. Then, some middleware for inter-process communication let them talk to each other.
https://github.com/microsoft/litebox might somehow allow it too if a tool can be built on top of it, but there is no documentation.
I trust Firecracker more because it was built by AWS specifically to sandbox Lambdas, but it doesn't work on macOS and is pretty fiddly to run on Linux.
https://en.wikipedia.org/wiki/List_of_Python_software#Python...