In December 2024, during the frenzied adoption of LLM coding assistants, we became aware that such tools tended—unsurprisingly—to produce Go code in a style similar to the mass of Go code used during training, even when there were newer, better ways to express the same idea. Less obviously, the same tools often refused to use the newer ways even when directed to do so in general terms such as “always use the latest idioms of Go 1.25.” In some cases, even when explicitly told to use a feature, the model would deny that it existed. [...] To ensure that future models are trained on the latest idioms, we need to ensure that these idioms are reflected in the training data, which is to say the global corpus of open-source Go code.
A useful way to think about RL (both RLVR and RLHF) is the "elicitation hypothesis" [1]. In pretraining, models learn their capabilities by consuming large amounts of web text. Those capabilities include producing both low- and high-quality outputs (as both are present in the pretraining corpora). In post-training, RL doesn't teach them new skills (see, e.g., the "Limits of RLVR" paper [2]). Instead, it "teaches" the models to produce the more desirable, higher-quality outputs while suppressing the undesirable, low-quality ones.
I'm pretty sure you could design an RL task that specifically teaches models to use modern idioms, either as an explicit dataset of chosen/rejected completions (where the chosen is the new way and the rejected is the old), or as a verifiable task where the reward goes down as the number of linter errors goes up.
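For instance (a toy pair of my own construction, not from any real training set), the rejected completion could use the hand-rolled pre-Go 1.21 helper while the chosen one uses the newer builtin:

```go
package main

import "fmt"

// rejected: the pre-Go 1.21 idiom the model has seen millions of times
func minOld(a, b int) int {
	if a < b {
		return a
	}
	return b
}

// chosen: the Go 1.21+ builtin that modernizing linters prefer
func minNew(a, b int) int {
	return min(a, b)
}

func main() {
	fmt.Println(minOld(3, 5), minNew(3, 5)) // both print 3
}
```

For the verifiable-reward variant, the reward could simply be the negated count of findings a modernizer-style linter reports on the completion.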
I wouldn't be surprised if frontier labs have datasets for this for some of the major languages and packages.
[1] https://www.interconnects.ai/p/elicitation-theory-of-post-tr...
Stack Overflow data is trivial to edit, and the org (previously, at least) was open to requests from maintainers to update accepted answers with more correct information. Editing is trivial and cheap for a database; for a model, editing is possible (less easy, but doable), expensive, and a potential risk to the model owner.
And then you point out issues in a review, so the author feeds it back into an LLM, and code that looks like it handles that case gets added... while also introducing a subtle data race and a rare deadlock.
Very nearly every single time. On all models.
Claude 4.6 has been excellent with Go, and truly incompetent with Elixir, to the point where I would have serious concerns about choosing Elixir for a new project.
That's a language problem that humans face as well, and one that Go could stop having (see C++'s Thread Safety annotations).
For a long time my motto around software development has been "optimize for maintainability" and I'm quite concerned that in a few years this habit is going to hit us like a truck in the same way the off-shoring craze did - a bunch of companies will start slowly dying off as their feature velocity slows to a crawl and a lot of products that were useful will be lost. It's not my problem, I know, but it's quite concerning.
Maybe the best way is to do the scaffolding yourself and use LLMs to fill in the blanks. That may lead to better-structured code, but it doesn't resolve the problem described above, where the model generates suboptimal or outdated code. Code is a form of communication, and I think good code requires an understanding of how to communicate ideas clearly. LLMs have no concept of that; they're just gluing tokens together. They litter code with useless comments while leaving the parts that need them most bare.
Even though I don't like Go, I acknowledge that tooling like this built right into the language is a huge deal for language popularity and maturity. Other languages just aren't this opinionated about build tools, testing frameworks, etc.
I suspect that as newer languages emerge over the years, they'll take notes from Go and how well it integrates stuff like this.
https://lwn.net/Articles/315686
Also IDE tooling for C#, Java, and many other languages; JetBrains' IDEs can do massive refactorings and code fixes across millions of lines of code (I use them all the time), including automatically upgrading your code to new language features. The sibling comment is slightly "wrong" — they've been available for decades, not mere years.
Also, JetBrains has "structural search and replace", which takes language syntax into account; it works at a higher level than the plain-text search you'd see in text editors and pseudo-IDEs (like vscode):
https://www.jetbrains.com/help/idea/structural-search-and-re...
https://www.jetbrains.com/help/idea/tutorial-work-with-struc...
For modern .NET you have Roslyn analyzers built in to the C# compiler which often have associated code fixes, but they can only be driven from the IDE AFAIK. Here's a tutorial on writing one:
https://learn.microsoft.com/en-us/dotnet/csharp/roslyn-sdk/t...
Would jscodeshift work for this? Maybe in conjunction with claude?
"cargo clippy --fix" for Rust, essentially integrated with its linter. It doesn't fix all lints, however.
We maintain a large Go monorepo, and every internal API migration turns into a grep+sed adventure. Half the time someone misses edge cases; the other half, the sed pattern breaks on multiline code. Having an AST-aware rewriter that understands Go's type system is a massive upgrade over regex hacks.
The -diff preview flag also makes this practical for CI. Run go fix -diff, fail if output is non-empty. That alone could replace a bunch of custom linters people maintain.
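Assuming `-diff` behaves as described above (printing the would-be changes without applying them), a minimal CI gate could be sketched like this:

```shell
#!/bin/sh
# Fail the build if go fix would change anything.
# Assumes the redesigned `go fix` and its -diff flag as described above.
diff_out="$(go fix -diff ./...)" || exit 1
if [ -n "$diff_out" ]; then
    printf '%s\n' "$diff_out"
    echo "outdated idioms found; run 'go fix ./...' and commit the result" >&2
    exit 1
fi
```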
The real key: there's no ignore comment of the kind other linters have. The most you can do is run a suppress command that records each file's current number of violations of each rule in JSON, and from then on you can only ever decrease those numbers.
Real kudos to the golang team.
The Go team has built such trust with backwards compatibility that improvements like this are exciting, rather than anxiety-inducing.
Compare that with other ecosystems, where APIs are constantly shifting, and everything seems to be @Deprecated or @Experimental.
This tool is way cooler, post-redesign.
We have `go run main.go` as the convention to boot every app's dev environment, with support for multiple worktrees, central config management, a pre-migrated database, and more. It makes it easy and fast to develop and test many versions of an app at once.
See https://github.com/housecat-inc/cheetah for the shared tool for this.
Then of course `go generate`, `go build`, `go test` and `go vet` are always part of the fast dev and test loop. Excited to add `go fix` into the mix.