Learnings from 100K lines of Rust with AI (2025)
74 points
2 hours ago
| 12 comments
| zfhuang99.github.io
| HN
chadd
26 minutes ago
[-]
We're working on a large Rust codebase, with development heavily assisted by Claude and Codex, and one critical workflow is this: after you have written a spec, have the other LLM critique it thoroughly.

This back and forth will take quite a while, but the resulting implementation plan will be 10x better than the original.

You can automate this by giving Codex a goal, and a skill to call Claude to review the implementation spec until they both agree it's done.

Then, for critical code, have them both implement the spec in a worktree, then BOTH critique each other's implementation.

More often than not, Claude will say to take 2 or 3 pieces from its design over to Codex, but ship the Codex implementation.
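
For illustration, the "review until they both agree" loop could be sketched in Rust roughly like this. The `Reviewer` trait, `Verdict` enum, `refine_spec` function and the `RequireSection` stub are all hypothetical names invented for this sketch; a real setup would call out to Claude or Codex inside `review`, which is not shown here.

```rust
/// Outcome of one review pass (hypothetical type for this sketch).
pub enum Verdict {
    /// The reviewer has no further objections.
    Approve,
    /// The reviewer proposes a revised spec.
    Revise(String),
}

/// Anything that can critique a spec. In practice this would wrap a model call.
pub trait Reviewer {
    fn review(&mut self, spec: &str) -> Verdict;
}

/// Stand-in reviewer: approves only if the spec mentions a required section,
/// otherwise returns the spec with that section appended.
pub struct RequireSection(pub &'static str);

impl Reviewer for RequireSection {
    fn review(&mut self, spec: &str) -> Verdict {
        if spec.contains(self.0) {
            Verdict::Approve
        } else {
            Verdict::Revise(format!("{}\n{}", spec, self.0))
        }
    }
}

/// Runs review rounds until every reviewer approves in the same round,
/// or until `max_rounds` is exhausted. Returns the final spec and whether
/// agreement was reached.
pub fn refine_spec(
    mut spec: String,
    reviewers: &mut [Box<dyn Reviewer>],
    max_rounds: usize,
) -> (String, bool) {
    for _ in 0..max_rounds {
        let mut all_approved = true;
        for reviewer in reviewers.iter_mut() {
            if let Verdict::Revise(revised) = reviewer.review(&spec) {
                spec = revised;
                all_approved = false;
            }
        }
        if all_approved {
            return (spec, true);
        }
    }
    (spec, false)
}
```

The round cap is a design choice in this sketch, not something from the workflow above: without it, two reviewers that keep undoing each other's changes would loop forever.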

reply
motoboi
6 minutes ago
[-]
I strongly believe you don’t need to call another model for that. The same model can do it just fine. Just not as part of the same context.

I mean that if you ask codex on gpt 5.5 to submit to a plan reviewer subagent that uses gpt5.5, this is enough to get a very good review and reassessment of the plan.

My hypothesis is that it’s even better than opus.

The reason submitting the product of one LLM to another for review works is that you need a fresh trajectory. The previous context might have “guided” the planner into some bias. Removing the context is enough to break free from that trajectory and start fresh.

reply
ai_fry_ur_brain
4 minutes ago
[-]
I hate how seriously people take the output of LLMs and how reliable they think it is.

Have Claude produce that spec 10 times, using the same prompt and the same context. The requests are identical, but you'll get 10 unique answers that will contradict each other, with each response sounding extremely confident.

It's scary how confident you people are in these outputs.

reply
jdw64
1 hour ago
[-]
I'm also shifting to a vibe coding workflow, but I have a genuine question: whenever I use AI for Rust, it makes an insane amount of lifetime errors. I have no idea how people are churning out so many lines of code so quickly.

Honestly, despite all the hype around Rust in the community, the fact that AI can't handle lifetimes reliably makes me reluctant to use it. The AI constantly defaults to spamming .clone() or wrapping things in Rc, completely butchering idiomatic Rust and making the output a pain to work with.

On the other hand, it writes higher-level languages better than I do. For those succeeding with it, how exactly are you configuring or prompting the AI to actually write good, idiomatic Rust?

reply
embedding-shape
1 hour ago
[-]
> I'm also shifting to a vibe coding workflow, but I have a genuine question: whenever I use AI for Rust, it makes an insane amount of lifetime errors. I have no idea how people are churning out so many lines of code so quickly.

What harness and model have you been using? For the last few months, essentially since I did the whole "One Human + One Agent = One Browser From Scratch" experiment, I've almost exclusively been doing cross-platform native desktop development with Rust, currently with my own homegrown toolkit basically written from scratch, all with LLMs, mostly with codex.

But I can't remember a single time the agent got stuck on lifetime errors; that's probably the least common issue I come across with agents + Rust. A much bigger issue is the ever-expanding design, and LLMs being unable to build proper abstractions that are actually used in practice and reduce the amount of code instead of just adding to the hairball.

The issue I'm trying to overcome now is that each change takes longer and longer to make, unless you're really hardcore about pulling back the design/architecture when the LLM goes overboard. I've only succeeded in having ~10 minute edits in 100K+ LOC codebases in two of the projects I've done so far, probably because I spent most of the time actually defining and thinking about the design myself instead of outsourcing it to the LLM. But this is the biggest issue I'm hitting over and over with agents right now.

reply
tomtom1337
24 minutes ago
[-]
Have you split your 100k LOC codebases into smaller crates? If you take a look at e.g. gitoxide's repo, they've split it into many smaller crates. I think that might help with keeping the scope for the AI small and maybe help with keeping contracts tight and well-defined.
reply
embedding-shape
4 minutes ago
[-]
Yes, that absolutely helps (and yes, doing that :) ), I'm going even further and basically hard-enforcing a LOC limit per file too, which helps a lot as well.

The complexities LLMs end up putting themselves in are more about the bigger architecture/design of the program than about concrete lines: things end up so tangled that every change requires tens of changes across the repository. You know, the typical "avoid the hairball" stuff you come across in larger applications...
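
A hard per-file LOC limit like the one mentioned above can be a very small check. The following is only a sketch under assumptions: the function name `files_over_limit` is made up, and it takes file contents as in-memory strings so it stays self-contained; a real gate would walk the source tree and fail CI or a pre-commit hook when the result is non-empty.

```rust
/// Returns the names of files whose line count exceeds `max_lines`.
/// `files` pairs a file name with its full contents (hypothetical input
/// shape, chosen so the check needs no filesystem access).
pub fn files_over_limit<'a>(files: &[(&'a str, &str)], max_lines: usize) -> Vec<&'a str> {
    files
        .iter()
        // Count lines the same way for every file; a trailing newline
        // does not add an extra line.
        .filter(|(_, contents)| contents.lines().count() > max_lines)
        .map(|(name, _)| *name)
        .collect()
}
```

Feeding the offending file names back to the agent as an error message gives it something concrete to act on, in the same spirit as compiler errors.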

reply
rurban
2 minutes ago
[-]
I see the complete opposite. The lower-level the language, the less you have to babysit the agent. Pure asm is the best; it only has problems with very advanced SIMD flags. C is excellent.

But Python and TypeScript are full of errors all the time. I'd rather fall back to Perl than Python. Perl has been excellent all along.

reply
hydra-f
1 hour ago
[-]
A lefthook:

  format:
    glob: "*.rs"
    run: cargo fmt -- --check

  lint:
    glob: "*.rs"
    run: cargo clippy -- -D warnings

  tests:
    run: cargo test

  audit:
    run: cargo audit

+ hooks that shove the lefthook automatically in the ai's face

---

rustfmt.toml:

  edition = "2021"
  newline_style = "Unix"
  use_small_heuristics = "Max"
  max_width = 100

reply
ramon156
1 hour ago
[-]
use "stage_fixed" to automatically persist the formatting :)
reply
hydra-f
26 minutes ago
[-]
Thank you!
reply
dijit
1 hour ago
[-]
The feedback loop is the interesting part. If you use standard software engineering practices (modularise, test/document your interfaces, etc.), then I find things like Claude Code do an exceptional job, since they can actually run cargo check/test themselves and validate the tests too.
reply
insanitybit
49 minutes ago
[-]
I'm surprised to hear this. I have not had any issues here at all. The AI might clone things but I don't really care/mind; I can ask it to refactor to make things zero-copy afterwards, which is how I've often written Rust myself. I've never seen it overly wrap things in Rc.

I've not done any particular/special prompting.

reply
vermilingua
1 hour ago
[-]
The irony of the machines having no mechanical sympathy is just too good
reply
onlyrealcuzzo
41 minutes ago
[-]
> whenever I use AI for Rust, it makes an insane amount of lifetime errors.

What model are you using, and what frameworks are you using?

This is not a hard problem for LLMs to solve.

Rust is nearly the perfect language for LLMs.

It's exceptionally expressive, and it forbids entirely the most common globally complex bugs that LLMs simply do not (and won't for some time) have the context window size to properly reason about.

Dynamically typed languages are a disaster for LLMs because they allow global complexity with respect to implicit type contracts (which LLMs do not uphold and cannot be relied on to uphold).

If you're going to add types, as someone pointed out earlier, why are you even telling an LLM to write Python anyways?

Rust is barely harder to read than Python with types. It's highly expressive.

You have the `&mut` which seems alien, verbose (safe) concurrency, and lifetimes - which - if you're vibe coding... you don't really need to understand that thoroughly.

You want an LLM to write code in a language where "if it compiles, it works" - because... let me tell you, if you vibe code in a language where errors are caught at runtime instead of compile time... It will definitely NOT work.
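
To make the "implicit type contracts" point concrete, here is a small sketch of how Rust turns such contracts into compile errors. The `ReplicaId`, `Ballot` and `Message` types are invented for illustration and are not taken from the article's codebase.

```rust
/// Newtypes: a replica id and a ballot number are both integers, but
/// passing one where the other is expected is a compile error.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ReplicaId(pub u32);

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Ballot(pub u64);

/// Every message kind is listed explicitly (hypothetical protocol).
pub enum Message {
    Prepare { from: ReplicaId, ballot: Ballot },
    Accept { from: ReplicaId, ballot: Ballot, value: String },
}

/// There is deliberately no wildcard arm: adding a new `Message` variant
/// makes this function fail to compile until the new case is handled.
pub fn describe(msg: &Message) -> String {
    match msg {
        Message::Prepare { from, ballot } => {
            format!("prepare from replica {} at ballot {}", from.0, ballot.0)
        }
        Message::Accept { from, ballot, value } => {
            format!(
                "accept {:?} from replica {} at ballot {}",
                value, from.0, ballot.0
            )
        }
    }
}
```

In a dynamically typed language both mistakes (swapped arguments, an unhandled message kind) would only surface when that code path runs, which is the global complexity the comment is describing.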

reply
mike_hearn
34 minutes ago
[-]
It's not nearly the perfect language for LLMs and Rust is dramatically harder to read and reason about than Python with types. Other options work better for nearly all apps. I found Kotlin works well:

- Garbage collected, so no reasoning tokens or dev cycles are wasted on manual memory management. You say that if you're vibe coding you can ignore lifetimes, but you're replying to a post that says AI can't do a good job and constantly uses escape hatches that lose the benefits of Rust (and can easily make things worse: copying data all over the place is terrible for performance).

- Very fast iteration speed due to JIT, a fast compiler and ability to use precompiled libraries. Rust is slow to compile.

- High level code that reads nearly like English.

- Semantically compatible with Java and Java libs, so lots of code in the training set.

- Unit tests are in separate files from sources. Rust intermixes them, bloating the context window with tests that may not be relevant to the current task.

reply
rirze
18 minutes ago
[-]
Then the domain problem you’re trying to solve doesn’t benefit from Rust.

Sounds like your work doesn’t need Rust and that’s ok.

But don’t generalize.

reply
onlyrealcuzzo
32 minutes ago
[-]
Write a 250k LOC compiler in Python and then get back to me how well LLMs write in Python...

Sure if you want to vibe code a TODO app where it's literally just copying and pasting one it's already seen 10,000 times before, it can do it in Python.

reply
faitswulff
1 hour ago
[-]
What kinds of programs are you writing and with what models? I'm curious if the lifetimes your programs require are trickier than most.
reply
jdw64
1 hour ago
[-]
I'm actually vibe coding a game engine right now using a Hexagonal Architecture, and I ran into this exact same issue when trying to synchronize the feedback loop between the viewport and the editor. To be fair, I probably messed up the domain boundaries myself in the first place, but honestly, the AI-generated code wasn't very effective at solving it either.
reply
mountainriver
1 hour ago
[-]
I’ve been writing almost exclusively Rust with LLMs and rarely ever hit this. I guess maybe the kind of work you are doing?
reply
nvader
1 hour ago
[-]
I wrote and maintain this library of skills and workflows called Rust Bucket[0]

It sets up your repo to ensure agents use a workflow which breaks your user requests down into separate beads, works on them serially, runs a judge agent after every bead is complete to apply code quality rules, and also strict static checks of your code. It's really helpful in extracting long, high-quality turns from the agent. It's what we used to build Offload[1].

0: https://github.com/imbue-ai/rust-bucket : A rusty bucket to carry your slop ;)

1: https://github.com/imbue-ai/offload

reply
arpinum
37 minutes ago
[-]
rust-bucket is 404, did you make it private?
reply
nvader
10 minutes ago
[-]
Thanks for the flag--I guess I must have never made it public.

Fixed.

reply
altmanaltman
1 hour ago
[-]
I think it's due to the lack of quality instructions on what good Rust code is; AI often literally doesn't know what idiomatic Rust is. It can be good to have a reference where you write the basic rules that you want it to follow (it's best to assume it has no idea why spamming clone is bad, and that you're speaking to someone who has just watched one of those YouTube videos with a dude in a black t-shirt speaking very slowly and going over basic programming concepts as if they're breaking you out of the matrix).
reply
mohsen1
50 minutes ago
[-]
Lots and lots of guardrails to not allow slop.

In tsz I have hard gates that disallow doing work in the wrong crate etc.

https://github.com/mohsen1/tsz

reply
embedding-shape
48 minutes ago
[-]
> have hard gates that disallow doing work in the wrong crate

Maybe I'm using agents wrong, but I'm not sure how you'd end up in that situation in the first place? When I start codex, it literally only has access to the directory I launch it in, with no way to navigate, read or edit stuff elsewhere on my disk, as it's wrapped in isolation with the files copied into it and no sync with the host.

Hearing that others seemingly let agents have access to their full computer, I feel like I'm vastly out of date about how development happens nowadays, especially when malware and viruses lurk around all the package registries.

reply
mohsen1
38 minutes ago
[-]
tsz is an experiment in giving coding agents full control. At my day job I am a lot more careful. But I've moved on from manually approving every change and instead review the final diff. I noticed manually approving was counterproductive.
reply
embedding-shape
1 minute ago
[-]
Right, I'm giving my agents full control too, but not sure why that'd exclude putting them in a sandbox?
reply
ramon156
1 hour ago
[-]
Clone is not "butchering idiomatic Rust", we gotta stop this nonsense
reply
jdw64
57 minutes ago
[-]
Sorry, I should clarify. .clone() itself isn't inherently unidiomatic.

My issue is specifically with how the AI uses it. In AI code, .clone() is almost always used as a brute-force escape hatch.

reply
izietto
46 minutes ago
[-]
Just like for me as an amateur Rust enjoyer then
reply
andai
49 minutes ago
[-]
So .clone() significantly reduces the mental overhead of using rust with a small performance impact? I'm intrigued :)

Maybe it's harder to reason about the lifetime semantics while also writing code, and works better as a second phase (the de-cloning).
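
As a rough illustration of that two-phase idea, here is the same helper written clone-first and then "de-cloned". Both function names are made up for this sketch.

```rust
/// First pass: take ownership of the whole list. It is easy to write, but
/// any caller that still needs the list afterwards has to `.clone()` it.
pub fn longest_owned(names: Vec<String>) -> Option<String> {
    names.into_iter().max_by_key(|n| n.len())
}

/// Second pass (the "de-cloning"): borrow the slice and return a reference
/// into it. No allocation or copying, at the cost of a lifetime tying the
/// result to the input.
pub fn longest_borrowed(names: &[String]) -> Option<&str> {
    names.iter().max_by_key(|n| n.len()).map(|n| n.as_str())
}
```

A caller would use `longest_owned(names.clone())` with the first version but just `longest_borrowed(&names)` with the second; the logic is unchanged, only the ownership story differs, which is why it can reasonably be left to a later refactor.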

reply
boitiga
53 minutes ago
[-]
Honestly Rust is an UGLY language. For whatever powers it possesses in memory safety, its cryptic symbology is reminiscent of assembly.

This is a problem when language designers are mathematicians and don’t understand typographical nuance and visual weights.

reply
peter-m80
25 seconds ago
[-]
To me it looks clean and concise
reply
embedding-shape
50 minutes ago
[-]
If I was forced to write it myself, then I'd agree, I'd use Clojure all day before Rust, because it's such a chore to write, edit and read.

The whole "with AI" kind of reduces my hate for Rust though, and increases the appreciation for how strict the language is, especially when the agents themselves do the whole "do change > see error/warning > adjust code > re-check > repeat" loop, which seems to work better the more strict the language is, as far as I can tell.

The "helpful" error messages from Rust can be a bit deceiving though, as the agent's first instinct seems to be to always try what the error message recommends, but sometimes the error is just a symptom of a deeper issue, not the actual root issue.

reply
pelasaco
36 minutes ago
[-]
If I was forced to write it myself, I would love to keep writing Ruby. What a wonderful language. I don't write Ruby anymore, mostly using Golang and Python, but Ruby is still a joy.
reply
boitiga
42 minutes ago
[-]
It’s funny I got downvoted immediately as expected.

I mean God help us should a crustacean try to understand the merits of my claim.

“Oh he’s saying something negative about rust…” Downvote!

I think with AI the language should still be readable. Humans need to be able to understand what’s going on!

reply
torben-friis
1 hour ago
[-]
>Testing is the first layer of defense. My system now includes 1,300+ tests — from unit tests to minimal integration tests (e.g., proposer + acceptor only), all the way to multi-replica full integration tests with injected failures. See the project status.

I know LOC is a silly metric, but ~1300 tests for 130k lines averages out to a test per 100 lines - isn't this awfully low for a highly complex piece of code, even discounting the fact that it's vibecoded? 100 LOC can carry a lot of logic for a single test, even for just happy paths.

reply
embedding-shape
42 minutes ago
[-]
Considering that the domain is distributed systems, and that it aims to implement "a Rust-based multi-Paxos consensus engine that not only implements all the features of Azure’s Replicated State Library (RSL)", I don't think we even have to look that deep into it: it's severely lacking tests.

If you're building a distributed system and you don't have more tests and testing code than actual code, by an order of magnitude most likely, then you're missing test coverage.

reply
kawogi
1 hour ago
[-]
IIUC only 50k LoC are non-test code, which improves the metric. Whether that's enough tests still depends on the code. If most are getters and setters, the coverage might be ok.
reply
risyachka
1 hour ago
[-]
I may have missed it, but were those tests written by a person or generated? Otherwise how do you know they even test anything (like actually test, not just appear to test)?
reply
icemanx
1 hour ago
[-]
How many of those tests have you actually read yourself if all of them are generated by AI (also when you're sleeping)?

This is from 2025 - I would like to see an update on how that system turned out after the vibe hype.

reply
ramon156
27 minutes ago
[-]
I feel like there are very few blogs that actually follow up on their experiment. It's just dopamine city.
reply
chemex
27 minutes ago
[-]
How are you keeping the requirement, design, and tasks docs in sync as the code evolves? I'm curious if anyone's landed on a good workflow for this.
reply
danbruc
51 minutes ago
[-]
Paxos is certainly non-trivial in the sense that tiny changes can break it, but in terms of functionality it is not that big. 50 KLOC just seems like a lot of code to me.
reply
staszewski
1 hour ago
[-]
It's almost guaranteed with agents you could do the same job with less than half of 100k lines. I don't know what's impressive about lines of code generated by an agent.
reply
ndr
1 hour ago
[-]
It's just an anchor. If it were 50k would you say the same down to 25k? And if so how many more times would it apply?

The interesting thing is that it was manageable solo (in many ways it's _more_ manageable solo+AIs than with coworkers+(their)AIs), and in such a short amount of time.

reply
kikimora
14 minutes ago
[-]
The original RSL library is 36k LoC. And this is C++. Rust should be like 50% smaller, that is, 18k LoC. This library is so big that I bet the author has no idea if it works or not. 1,300 tests generated by AI say nothing about actual quality.

In the end it is just a lot of unmaintainable code quickly generated by AI.

reply
rimliu
1 hour ago
[-]
The interesting thing is how fast it becomes unmanageable.
reply
ndr
49 minutes ago
[-]
Also that. I suspect that's correlated with how practical it is to have multiple people (with their agents) iterating on it.
reply
ashirviskas
1 hour ago
[-]
> It's almost guaranteed with agents you could do the same job with less than half of 100k lines.

That's great, non-test code is only ~47k lines of code.

reply
sreekanth850
1 hour ago
[-]
For a startup with limited funding, building a product is no longer a bottleneck. Not everyone has the same access to funding!
reply
nilirl
1 hour ago
[-]
Is the idea of the runtime contracts similar to the idea of runtime validation? Or are they different in some way?
reply
pramodbiligiri
1 hour ago
[-]
It is described in the "Code Contracts" section of the article: "Code contracts specify preconditions, postconditions, and invariants for critical functions. These contracts are converted into runtime asserts during testing but can be disabled in production builds for performance". The .NET framework article that he links to: https://learn.microsoft.com/en-us/dotnet/framework/debug-tra...
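
In Rust, the closest standard-library equivalent is probably `debug_assert!`, which is checked in debug builds (including a default `cargo test`) and compiled out when debug assertions are disabled, as in a default release build. The sketch below is an assumption about how such contracts might look, not the article's actual code; the `Acceptor` type and its rules are simplified and hypothetical.

```rust
/// Minimal, hypothetical acceptor state for illustration only.
#[derive(Debug, Default)]
pub struct Acceptor {
    promised: u64,
    accepted_ballot: Option<u64>,
}

impl Acceptor {
    /// Invariant: an accepted ballot is never higher than the promised one.
    fn invariant_holds(&self) -> bool {
        self.accepted_ballot.map_or(true, |b| b <= self.promised)
    }

    pub fn promised(&self) -> u64 {
        self.promised
    }

    /// Accepts `ballot` if it is at least as high as the current promise.
    pub fn accept(&mut self, ballot: u64) -> bool {
        // Precondition (assumed for this sketch): ballot 0 is reserved.
        debug_assert!(ballot > 0, "precondition: ballot must be non-zero");
        debug_assert!(self.invariant_holds(), "invariant broken on entry");

        let promised_before = self.promised;
        let accepted = ballot >= self.promised;
        if accepted {
            self.promised = ballot;
            self.accepted_ballot = Some(ballot);
        }

        // Postcondition: the promise never moves backwards.
        debug_assert!(self.promised >= promised_before, "postcondition violated");
        debug_assert!(self.invariant_holds(), "invariant broken on exit");
        accepted
    }
}
```

Compared with ordinary runtime validation, the difference is intent: these checks document what the code assumes about itself and are expected never to fire, so they can be dropped in production, whereas validation of untrusted input has to stay on.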
reply
andai
46 minutes ago
[-]
Is this basically what Dijkstra was saying? I've been thinking about how his approach was considered impractical, but may eventually become necessary for security/stability reasons the way things are going. (Seems like there's a new zero-day on the HN front page every day now.)
reply
nilirl
1 hour ago
[-]
Ah, I missed the reference. Thanks a lot!
reply
kikimora
18 minutes ago
[-]
This is a great example of AI slop and a big problem with AI coding.

The original RSL library has 36 KLoC across C++ source and header files. Rust is supposed to be more expressive and concise. Yet, AI generated 130k LoC. I guess nobody understands how this code works and nobody can tell if it actually works.

reply
bharxhav
1 hour ago
[-]
Rust is about abstractions more than code. You can ask AI to "Optimize/Test/Clarify" but at the end of the day you should be willing to either blindly agree to its output or spend more time reviewing someone else's code.
reply
10g1k
45 minutes ago
[-]
Lessons. There's no such thing as learnings.
reply
criddell
34 minutes ago
[-]
Learnings is irritating to me. The way kids use the word aesthetic is irritating too. I wonder if I might be that old man shaking his fist at the clouds, but I have gotten over begs the question, and literally, so maybe not yet...
reply
tskj
28 minutes ago
[-]
A lesson would be a specific learning activity happening at a specific place and time, administered by a person more knowledgeable than you; like a teacher or mentor "giving a lesson".

If you're fine with the generalized form "learned a lesson", then surely "learnings" is fine too. There's no point in trying to police a completely normal and sensible use of language.

reply
faangguyindia
1 hour ago
[-]
Rust code generation consumes a lot of tokens.

Go is a much better target; I've observed Rails/Ruby code is also much easier for AI to spit out.

And Haskell flies with AI.

reply
jgilias
1 hour ago
[-]
Yes, but it comes with much better “built-in” guardrails to rein in the autocomplete. Especially if compared to something runtime-surprise-prone-if-lovable like Ruby.
reply