The unexpected effectiveness of one-shot decompilation with Claude
184 points
8 days ago
| 20 comments
| blog.chrislewis.au
| HN
simonw
9 hours ago
[-]
For anyone else who was initially confused by this, useful context is that Snowboard Kids 2 is an N64 game.

I also wasn't familiar with this terminology:

> You hand it a function; it tries to match it, and you move on.

In decompilation "matching" means you found a function block in the machine code, wrote some C, then confirmed that the C produces the exact same binary machine code once it is compiled.

The author's previous post explains this all in a bunch more detail: https://blog.chrislewis.au/using-coding-agents-to-decompile-...
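
To make "matching" concrete, here is a minimal sketch in Python. It assumes you already have the bytes of the target function and the bytes your compiler produced from the candidate C. The function names are made up for illustration, and real decomp projects lean on dedicated diffing tools rather than a hand-rolled comparison like this one.

```python
from typing import Optional

WORD_SIZE = 4  # MIPS instructions on the N64 are fixed-width 32-bit words


def first_mismatch(target: bytes, candidate: bytes) -> Optional[int]:
    """Return the byte offset of the first differing instruction word, or None."""
    longest = max(len(target), len(candidate))
    for offset in range(0, longest, WORD_SIZE):
        # Slices past the end come back short or empty, so a length
        # difference is reported as a mismatch as well.
        if target[offset:offset + WORD_SIZE] != candidate[offset:offset + WORD_SIZE]:
            return offset
    return None


def is_match(target: bytes, candidate: bytes) -> bool:
    """A function 'matches' only when the compiled bytes are identical."""
    return first_mismatch(target, candidate) is None
```

Anything short of byte-for-byte equality counts as "not matched yet", which is what makes the goal so easy for an agent to verify on its own.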

reply
your_sweetpea
1 hour ago
[-]
I'd like to see this given a bit more structure, honestly. What occurs to me is constraining the grammar for LLM inference to ensure valid C89 (or close-to, as much can be checked without compilation), then perhaps experimentally switching to a permuter once/if a certain threshold is reached for accuracy of the decompiled function.

Eventually some or many of these attempts would, of course, fail, and require programmer intervention, but I suspect we might be surprised how far it could go.
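
A rough sketch of the handoff I have in mind, in Python. Everything here is an assumption for illustration: `llm_attempt` and `permute` are hypothetical callables standing in for "compile a fresh LLM attempt" and "run a permuter seeded with the best attempt", and the `difflib` ratio is only a stand-in for a proper assembly-diff score.

```python
from difflib import SequenceMatcher
from typing import Callable, Optional


def similarity(target: bytes, candidate: bytes) -> float:
    """Rough 0..1 score; a placeholder for a real assembly-diff score."""
    return SequenceMatcher(None, target, candidate, autojunk=False).ratio()


def decompile(
    target: bytes,
    llm_attempt: Callable[[], bytes],              # hypothetical: compiled bytes of a new attempt
    permute: Callable[[bytes], Optional[bytes]],   # hypothetical: permuter seeded with best attempt
    handoff_score: float = 0.9,
    max_llm_attempts: int = 10,
) -> Optional[bytes]:
    best, best_score = b"", 0.0
    for _ in range(max_llm_attempts):
        candidate = llm_attempt()
        if candidate == target:
            return candidate  # exact match, nothing left to do
        score = similarity(target, candidate)
        if score > best_score:
            best, best_score = candidate, score
        if best_score >= handoff_score:
            # Close enough: stop spending tokens and let the permuter search.
            return permute(best)
    return None  # never got close; needs a human
```

The threshold and the attempt budget are the two knobs worth experimenting with.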

reply
elitan
8 hours ago
[-]
helpful
reply
saagarjha
10 hours ago
[-]
It's worth noting here that the author came up with a handful of good heuristics to guide Claude and a very specific goal, and the LLM did a good job given those constraints. Most seasoned reverse engineers I know have found similar wins with those in place.

What LLMs are (still?) not good at is one-shot reverse engineering for understanding by a non-expert. If that's your goal, don't blindly use an LLM. People already know that you getting an LLM to write prose or code is bad, but it's worth remembering that doing this for decompilation is even harder :)

reply
zdware
8 hours ago
[-]
Agree with this. I'm a software engineer that has mostly not had to manage memory for most of my career.

I asked Opus how hard it would be to port the script extender for Baldur's Gate 3 from Windows to the native Linux build. It outlined that it would be very difficult for someone without reverse engineering experience, and correctly pointed out they are using different compilers, so it's not a simple mapping exercise. Its recommendation was not to try unless I was a Ghidra master and had lots of time on my hands.

reply
dimitri-vs
8 hours ago
[-]
FWIW most LLMs are pretty terrible at estimating complexity. If you've used Claude Code for any length of time you might be familiar with its plan "timelines", which always span many days but for medium-size projects get implemented in about an hour.

I've had CC build semi-complex Tauri, PyQT6, Rust and SvelteKit apps for me without me having ever touched that language. Is the code quality good? Probably not. But all those apps were local-only tools or had less than 10 users so it doesn't matter.

reply
zdware
7 hours ago
[-]
That's fair, I've had similar experiences working in other stacks with it. And with some niche stacks, it seems to struggle more. Definitely agree the more narrow the context/problem statement, higher chance of success.

For this project, it described its reasoning well, and knowing my own skillset, and surface level info on how one would start this, it had many good points that made the project not realistic for me.

reply
hobs
5 hours ago
[-]
Disagree - the timelines are completely reasonable for an actual software project, and that's what the training data is based on, not projects written with LLMs.
reply
theturtle32
4 hours ago
[-]
Yes, this is my experience as well.
reply
ph4evers
10 hours ago
[-]
Are they not performing well because they are trained to be more generic, or is the task too complex? It seems like a cheap problem to fine-tune.
reply
motoboi
4 hours ago
[-]
The knowledge is probably in the pre-training data (the internet documents the LLM is trained on to get a good grasp), but it's probably very poorly represented in the reinforcement learning phase.

Which is to say that Anthropic probably doesn’t have good training documents and evals to teach the model how to do that.

Well they didn’t. But now they have some.

If the author wants to improve his efficiency even more, I’d suggest he start creating tools that allow a human to create a text trace of a good run at decompiling this project.

Those traces can be hosted in a place Anthropic can see, and then after the next model pre-training there will be a good chance the model becomes even better at this task.

reply
pixl97
9 hours ago
[-]
Sounds like a more agentic pipeline task. Decompile, assess, explain.
reply
t_mann
8 hours ago
[-]
> The ‘give up after ten attempts’ threshold aims to prevent Claude from wasting tokens when further progress is unlikely. It was only partially successful, as Claude would still sometimes make dozens of attempts.

Not what I would have expected from a 'one-shot'. Maybe self-supervised would be a more suitable term?
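
For what it's worth, the cap could be enforced by the outer loop instead of by an instruction in the prompt. A minimal Python sketch, assuming a hypothetical `attempt` callable that runs one agent attempt and returns matching C (or `None`); this is not the author's harness, just an illustration of a hard stopping condition.

```python
from typing import Callable, Optional, Tuple


def run_with_attempt_cap(
    attempt: Callable[[int], Optional[str]],  # hypothetical: one attempt, returns matching C or None
    max_attempts: int = 10,
) -> Tuple[Optional[str], int]:
    """Stop after max_attempts regardless of what the agent would prefer."""
    for n in range(1, max_attempts + 1):
        result = attempt(n)
        if result is not None:
            return result, n  # matched on attempt n
    return None, max_attempts  # budget exhausted, give up
```

With the counter living outside the model, "dozens of attempts" can't happen unless you raise `max_attempts` yourself.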

reply
voiper1
4 hours ago
[-]
I definitely didn't expect one-shot to mean "let it run itself in an indefinite loop"
reply
wavemode
7 hours ago
[-]
"one-shot" usually just means, one example and its correct answer was provided in the prompt.

See also, "zero-shot" / "few-shot" etc.
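
In that sense the only difference is how many worked examples get packed into the prompt. A small Python sketch, with a made-up `Input:` / `Output:` layout (the exact formatting is an assumption, not any particular vendor's convention):

```python
from typing import Sequence, Tuple


def build_prompt(task: str, examples: Sequence[Tuple[str, str]] = ()) -> str:
    """Zero-shot with no examples, one-shot with one, few-shot with several."""
    parts = []
    for question, answer in examples:
        parts.append(f"Input: {question}\nOutput: {answer}\n")
    # The real task goes last, with the answer left for the model to fill in.
    parts.append(f"Input: {task}\nOutput:")
    return "\n".join(parts)
```

Calling `build_prompt(task)` is zero-shot; passing a single `(input, output)` pair makes it one-shot.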

reply
t_mann
2 hours ago
[-]
The article says that having decompiled some functions helps with decompiling others, so it seems like more than one example could be provided in the context. I think the OP was referring to the fact that only a single prompt created by a human was used. But then it goes off into what appears to be an agentic loop with no hard stopping conditions outside of what the agent decides.

We're essentially trying to map 'traditional' ML terminology to LLMs, it's natural that it'll take some time to get settled. I just thought that one-shot isn't an ideal name for something that might go off into an arbitrarily long loop.

reply
simonw
6 hours ago
[-]
I've seen one-shot used to mean two different things in LLMs:

1. Getting an LLM to do something based on a single example

2. Getting an LLM to achieve a goal from a single prompt with no follow-ups

I think both are equally valid.

reply
baq
4 hours ago
[-]
One-shot as in ‘given one example’ is the ML term. One-shot as in ‘in a single prompt’ is the colloquial meaning. Both are useful, but it can be confusing when discussing LLMs in ML topics.
reply
johnfn
5 hours ago
[-]
One shot just means one prompt. What Claude decides to do during that prompt is up to it.
reply
hombre_fatal
7 hours ago
[-]
Meh, the main idea of one-shot is that you prompted it once and got a good impl when it decided it was done. As opposed to having to workshop yourself with additional prompts to fix things.

It doesn't do it in one-shot on the GPU either. It feeds outputs back into inputs over and over. By the time you see tokens as an end-user, the clanker has already made a bunch of iterations.

reply
rlili
11 hours ago
[-]
Makes me wonder if decompilation could eventually become so trivial that everything would become de-facto open source.
reply
jasonjmcghee
9 hours ago
[-]
It would be "source available", if anything, not "open source".

> An open-source license is a type of license for computer software and other products that allows the source code, blueprint or design to be used, modified or shared (with or without modification) under defined terms and conditions.

https://en.wikipedia.org/wiki/Open_source

Companies have been really abusing what open source means- claiming something is "open source" cause they share the code and then having a license that says you can't use any part of it in any way.

Similarly, if you've ever used that software, or depending on where you downloaded it from, you might have agreed not to decompile or read the source code. Using that code is a gamble.

reply
mkatx
6 hours ago
[-]
So instead of reverse engineering.. an llm/agent/whatever could simply produce custom apps for everyone, simply implementing the features an individual might want. A more viable path?
reply
DrNosferatu
7 hours ago
[-]
But, for example, isn't Cannonball (SEGA Outrun source port) open source?

https://github.com/djyt/cannonball

reply
jasonjmcghee
7 hours ago
[-]
No it is not. There is no license in that repository.

Relevant: https://github.com/orgs/community/discussions/82431

> When you make a creative work (which includes code), the work is under exclusive copyright by default. Unless you include a license that specifies otherwise, nobody else can copy, distribute, or modify your work without being at risk of take-downs, shake-downs, or litigation. Once the work has other contributors (each a copyright holder), “nobody” starts including you.

https://choosealicense.com/no-permission/

reply
sa1
9 hours ago
[-]
But clean room reverse engineered code can have its own license, no?
reply
comex
47 minutes ago
[-]
If we're talking about actual clean-room reverse engineering where only the overall design or spec is copied and not the specific code, then yes. In this process, one person would decompile the original and turn it into a human-readable spec, and another person would write their own implementation. But the decompiled code itself is never distributed.

That's very different from the decompilation projects being discussed here, which do distribute the decompiled code.

These decompilation projects do involve some creative choices, which means that the decompilation would likely be considered a derivative work, containing copyrightable elements from both the authors of the original binary and the authors of the decompilation project. This is similar to a human translation of a literary work. A derivative work does have its own copyright, but distributing a derivative work requires permission from the copyright holders of both the original and the derivative. So a decompilation project technically can set their own license, and thereby add additional restrictions, but they can't overwrite the original license. If there is no original license, the default is that you can't distribute at all.

reply
vunderba
6 hours ago
[-]
In fact, the story of how Atari tried to circumvent the lockout chip on the original NES is a good example of this.

They had gotten surprisingly close to a complete decompilation, but then they tried to request a copy of the source code from the copyright office citing that they needed it as a result of ongoing unrelated litigation with Nintendo.

Later on this killed them in court.

reply
simonw
9 hours ago
[-]
Yeah, I think it can. I'm reminded of the thing in the 80s when Compaq reverse engineered and reimplemented the IBM BIOS by having one team decompile it and write a spec which they handed to a separate team who built a new implementation based on the spec.

I expect that for games the more important piece will be the art assets - like how the Quake game engine was open source but you still needed to buy a copy of the game in order to use the textures.

reply
yieldcrv
7 hours ago
[-]
Open source never meant free to begin with and was never software specific, that’s a colloquialism and I’d love to say “language evolves” in favor of the software community’s use but open source is used in other still similar contexts, specifically legal and public policy ones

FOSS specifically means/meant free and open source software, the free and software words are there for a reason

so we don’t need another distinction like “source available” that people need to understand to convey an already shared concept

yes, companies abuse their community’s interest in something by blending open source legal term as a marketing term

reply
jasonjmcghee
7 hours ago
[-]
Whether or not something is "free" is a separate matter and subject to how the software is licensed. If there is no license it is, by definition "source available", not open source. "source available" is not some new distinction I'm making up.

See my other comment: https://news.ycombinator.com/item?id=46175760

reply
viraptor
5 hours ago
[-]
This is not a space for "language evolves". Open source has very specific definitions and the distinctions there matter for legal purposes https://opensource.org/licenses
reply
yieldcrv
5 hours ago
[-]
the software community is the one trying to evolve the language in favor of this software license specific use case
reply
TheAceOfHearts
25 minutes ago
[-]
If progress continues, someday it'll be possible to generate the source code for any binary and make a native port to any other platform. Some companies might be upset, but it'll be a huge boon for game and software preservation.
reply
VikingCoder
9 hours ago
[-]
I wonder when you're never going to run expensive software on your own CPU.

It'll either all be in the cloud, so you never run the code...

Or it'll be on a chip, in a hermetically sealed usb drive, that you plug in to your computer.

reply
jonhohle
3 hours ago
[-]
That runs into copyright issues. As someone who does a reasonable amount of decompilation, I wouldn’t ever use an LLM. It falls too close to mechanical transformation territory which is not protected, fair use.

Obviously others aren’t concerned or don’t live in jurisdictions where that would be an issue.

reply
johnfn
5 hours ago
[-]
Surely then people start using LLMs to obfuscate compiled source to the point that another LLM can’t deobfuscate it. I imagine it’s always easier to make something messy than clean. Something like a rule of thermodynamics or something :)

Though, that’s only for actively developed software. I can imagine a great future where all retro games are now source available.

reply
tuhgdetzhh
5 hours ago
[-]
But on the other hand, at the current speed of LLM progression, a game that might have been obfuscated with the help of Opus 4.5 might in two years be decompiled within hours by Opus 6.5.
reply
tcdent
9 hours ago
[-]
That's definitely a possible future abstraction and one area of the future of technology I'm excited about.

First we get to tackle all of the small ideas and side projects we haven't had time to prioritize.

Then, we start taking ownership of all of the software systems that we interact with on a daily basis; hacking in modifications and reverse engineering protocols to suit our needs.

Finally our own interaction with software becomes entirely boutique: operating systems, firmware, user interfaces that we have directed ourselves to suit our individual tastes.

reply
anabis
3 hours ago
[-]
Would some sparks fly when easy decompile of MSOffice and Photoshop are available, I wonder.
reply
DrNosferatu
9 hours ago
[-]
This day will arrive.

And it will be great for retro game preservation.

Having more integrated tools and tutorials on this would be awesome.

reply
js8
9 hours ago
[-]
Yes, I believe it will. What I predict will happen is that most commercial software will be hosted and provided through "trusted" platforms with limited access, making reverse engineering impossible.
reply
Aeolun
9 hours ago
[-]
When decompilation like that is trivial, so is recreation without decompilation. It implies the LLM knows exactly how things work.
reply
Xmd5a
10 hours ago
[-]
This deserves a discussion
reply
ronsor
10 hours ago
[-]
I've used LLMs to help with decompilation since the original release of GPT-4. They're excellent at recognizing the purpose of functions and refactoring IDA or Ghidra pseudo-C into readable code.
reply
galangalalgol
10 hours ago
[-]
How does it do on things that were originally written in assembly?
reply
saagarjha
10 hours ago
[-]
This is typically easier because the code was written for humans already.
reply
euroderf
10 hours ago
[-]
Someone please try this on an original (early 1980s) IBM-PC BIOS.
reply
tadfisher
4 hours ago
[-]
I don't believe that was written in a compiled language, so any old 8086 disassembler should suffice. I would love to see what comments an LLM adds to the assembly code, though.
reply
mh-
6 hours ago
[-]
Got a bin?
reply
stevemk14ebr
9 hours ago
[-]
We're very far away from this.
reply
reactordev
4 hours ago
[-]
I’ve been having fun sending Claude down the old school MUD route, giving it access to a SMAUG derivative and once it’s mastered the play, give it admin powers to create new play experiences.

I stayed away from decompilation and reverse engineering, for legal reasons.

Claude is amazing. It can sometimes get stuck in a reason loop but will break away, reassess, and continue on until it finds its way.

Claude was murdered in a dark instance dungeon when it managed to defeat the dragon but ran out of lamp oil and torches to find its way out. Because of the light system it kept getting “You can’t seem to see anything in the darkness” and randomly walked into a skeleton lair.

Super fun to watch from an observer. Super terrifying that this will replace us at the office.

reply
ACCount37
12 hours ago
[-]
If you aren't using LLMs for your reverse engineering tasks, you're missing out, big time. Claude kicks ass.

It's good at cleaning up decompiled code, at figuring out what functions do, at uncovering weird assembly tricks and more.

reply
keepamovin
10 hours ago
[-]
The article is a useful resource for setting up automated flows, and Claude is great at assembly. Codex less so; Gemini is also good at assembly. Gemini will happily hand-roll x86_64 machine code. Codex appears optimized for more "mainstream" dev tasks, and excels at that. If only Gemini had a great agent...
reply
xnx
5 hours ago
[-]
Is Gemini CLI not a good agent?
reply
skerit
10 hours ago
[-]
I've been using Claude for months with Ghidra. It is simply amazing.
reply
djmips
2 hours ago
[-]
What's your workflow? Are you mainly going after x86 targets? Are you using a plugin?
reply
amelius
11 hours ago
[-]
Makes sense because LLMs are quite good at translating between natural languages.

Anyway, we're reaching the point where documentation can be generated by LLMs and this is great news for developers.

reply
saagarjha
10 hours ago
[-]
Documentation is one place where humans should have input. If an LLM can generate documentation, why would I want you to generate it when I can do so myself (probably with a better, newer model)?
reply
simonw
8 hours ago
[-]
I definitely want documentation that a project expert has reviewed. I've found LLMs are fantastic at writing documentation about how something works, but they have a nasty tendency to take guesses at WHY - you'll get occasional sentences like "This improves the efficiency of the system".

I don't want invented rationales for changes, I want to know the actual reason a developer decided that the code should work that way.

reply
ACCount37
8 hours ago
[-]
That's great if those humans are around to have that input.

Not so much when you have a lot of code from 6 years ago, built around an obscure SDK, and you have to figure out how it works, and the documentation is both incredibly sparse and in Chinese.

reply
amelius
8 hours ago
[-]
Because it takes time and effort to write documentation.

If people __can__ actually read undocumented code with the help of LLMs, why do you need human-written documentation really?

reply
gr4vityWall
3 hours ago
[-]
It doesn't need to be written by a human only, but I think generating it once and distributing it with source code is more efficient. Developers can correct errors in the generated documentation, which then can be used by humans and LLMs.
reply
baq
4 hours ago
[-]
Docs are a form of error correcting coding for code. Docs+code allows you to spot discrepancies and ask which one is the intended behavior.
reply
james_marks
10 hours ago
[-]
I stumbled across a fun trick this week. After making some API changes, I had CC “write a note to the FE team with the changes”.

I then pasted this to another CC instance running the FE app, and it made the counter part.

Yes, I could have CC running against both repos and sometimes do, but I often run separate instances when tasks are complex.

reply
monsieurbanana
11 hours ago
[-]
Maybe documentation meant for other llms to ingest. Their documentation is like their code, it might work, but I don't want to have to be the one to read it.

Although of course if you don't vibe document but instead just use them as a tool, with significant human input, then yes go ahead.

reply
dunham
9 hours ago
[-]
Although with code it's implementing functions that don't exist yet and with documentation, it's describing functions that don't exist yet.
reply
knackers
8 days ago
[-]
I've been experimenting with running Claude in headless mode + a continuous loop to decompile N64 functions and the results have been pretty incredible. (This is despite already using Claude in my decompilation workflow).

I hope that others find this similarly useful.

reply
djmips
2 hours ago
[-]
Thanks, this is very cool! I've started to dip my toes into this and it's good to see it has potential.
reply
viraptor
6 hours ago
[-]
One thing I find annoying in really old sources is that sometimes you can't go function by function, because the code will occasionally just use a random register to pass results. Passing the whole file works better at that point.
reply
plastic-enjoyer
11 hours ago
[-]
This sounds interesting! Do you have a good introduction to N64 decompilation? Would you recommend using Claude right from the start, or rather trying to get to know the ins and outs of N64 decomp first?
reply
garrettjoecox
12 hours ago
[-]
What game are you working on?
reply
wk_end
11 hours ago
[-]
Last sentence of the first paragraph says it’s Snowboard Kids 2.
reply
rat9988
11 hours ago
[-]
In his defense, it is missing a "Tell HN"
reply
dpkirchner
10 hours ago
[-]
And it isn't always obvious when the commenter is the submitter (no [S] tag like you see on other sites).
reply
garrettjoecox
10 hours ago
[-]
whoops, I did indeed miss that this was OP
reply
turnsout
10 hours ago
[-]
This is super cool! I would be curious to see how Gemini 3 fares… I've found it to be even more effective than Opus 4.5 at technical analysis (in another domain).
reply
Nevermark
4 hours ago
[-]
There are quite a few comments here on code obfuscation.

The hardest form of code obfuscation is called homomorphic computing, which is code transformed to act on encrypted data isomorphically to regular code on regular data. The homomorphic code is hard obfuscated by this transformation.

Now create a homomorphic virtual machine, that operates on encrypted code over encrypted data. Very hard to understand.

Now add data encryption/decryption algorithms, both homomorphically encrypted to be run by the virtual machine, to prepare and recover inputs, outputs or effects of any data or event information, for the homomorphic application code. Now that all data within the system is encrypted by means which are hard obfuscated, running on code which is hard obfuscated, the entire system becomes hard^2 (not a formal measure) opaque.

This isn't realistic in practice. Homomorphic implementations of even simple functions are extremely inefficient for the time being. But it is possible, and improvements in efficiency have not been exhausted.
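
To give a feel for "computing on encrypted data", here is a toy Python sketch using unpadded textbook RSA, which happens to be homomorphic for multiplication only. It is not the fully homomorphic kind of scheme described above, and the parameters are the classic classroom example (p=61, q=53), far too small and too naive to be secure.

```python
# Toy textbook-RSA parameters: N = 61 * 53, public exponent E, private exponent D.
# Illustration only; unpadded RSA with a tiny modulus offers no real security.
N, E, D = 3233, 17, 2753


def encrypt(m: int) -> int:
    return pow(m, E, N)


def decrypt(c: int) -> int:
    return pow(c, D, N)


def multiply_encrypted(c1: int, c2: int) -> int:
    """Multiply two plaintexts while only ever touching their ciphertexts."""
    return (c1 * c2) % N
```

Decrypting `multiply_encrypted(encrypt(a), encrypt(b))` gives `a * b` (mod `N`), even though whoever did the multiplication never saw `a` or `b`. Fully homomorphic schemes extend this idea to arbitrary computation, which is where the enormous overhead comes from.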

Equivalent but different implementations of homomorphic code can obviously be made. However, given that the only credible explanation for the design decisions of the new code is to exactly match the original code, this precludes any "clean room" defenses.

--

Implementing software with neural network models wouldn't stop replication, but it would decompile as source that was clearly not developed independently of the original implementation.

Even distilling (training a new model on the "decompiled" model) would be dead giveaway that it was derived directly from the source, not a clean room implementation.

--

I have wondered, if quantum computing wouldn't enable an efficient version of homomorphic computing over classical data.

Just some wild thoughts.

reply
YesBox
2 hours ago
[-]
I'm an encryption noob. Less than a noob. But something I've been wondering about is how can homomorphic computing be opaque/unencryptable?

If you are able to monitor what happens to encrypted data being processed by an LLM, could you not match that with the same patterns created by unencrypted data?

Real simple example, let's say I have a program that sums numbers. One sends the data to an LLM or w/e unencrypted, the other encrypted.

Wouldn't the same part of the LLM/compute machine "light up" so to speak?

reply
viraptor
6 hours ago
[-]
Yeah, it works great for porting as well. I tried it on the assembler sources of Prince of Persia for the Apple II and went from nothing to the basics being playable (with a few bugs, but still) on a modern Mac with SDL graphics within a day.
reply
djmips
2 hours ago
[-]
That's impressive. Did you convert POP from 6502 to C?
reply
viraptor
1 hour ago
[-]
Yup. Still fighting some collision bugs, but it mostly works. I'll post it when it's complete. What I actually wanted to do is try to put fluid movement into it - something closer to Dead Cells, just for fun to see how it would change the feel of it.
reply
benmccann
8 hours ago
[-]
I used Gemini to compare the minimized output of the Rollup vs Rolldown JavaScript bundlers to find locations where the latter was not yet at the same degree of optimization. It was astoundingly good and I'm not sure how I would have been able to accomplish the task without an LLM as an available tool.
reply
sehugg
7 hours ago
[-]
I ran Node with --print-opt-code and had Opus look at Turbofan's output. It was able to add comments to the JIT'ed code and give suggestions on how to improve the JavaScript for better optimization.
reply
wiz21c
4 hours ago
[-]
The other day I asked Claude to estimate a loop of a dozen 6502 instructions. It failed, but its estimate was not bad at all. Amazing!
reply
grim_io
4 hours ago
[-]
I need to try using a frontier LLM for deobfuscation. That's a huge pain in the ass for a noob like me.
reply
butz
10 hours ago
[-]
Are there any similar specialized decompilation LLM models available to be used locally?
reply
heavyset_go
6 hours ago
[-]
Am I just wrong in thinking doing decompilation of copyrighted code via the cloud is a bad idea?

Like, if it ever leaks, or you were planning on releasing it, literally every step you took in your crime is uploaded to the cloud ready to send you to prison.

It's what's stopped me from using hosted LLMs for DMCA-legal RE. All it takes is for a prosecutor/attorney to spin a narrative based on uploaded evidence and your ass is in court.

reply
Juliate
6 hours ago
[-]
It wouldn't fit most of the current LLM cloud providers' narrative about privacy and copyright either, so I'm not sure they would be as cooperative with a prosecutor as they are today with lawmakers and rights holders.
reply
xnx
5 hours ago
[-]
Great use case. Curious to see how Gemini fares when tested.
reply
DrNosferatu
8 hours ago
[-]
More than an overview, a step by step tutorial on this would be awesome!
reply
VikingCoder
9 hours ago
[-]
I've been waiting for decompilation to show up in this space.
reply
jamesbelchamber
11 hours ago
[-]
This is a refreshingly practical demonstration of an LLM adding value. More of this please.
reply
knallfrosch
3 hours ago
[-]
We're wasting energy reverse-engineering code, which, by definition, already exists now. Oh god.

Have you tried asking them to simply open source the code?

reply
looperhacks
3 hours ago
[-]
Have you ever tried to get a game developer to open source a game? And a Japanese one at that?

Even if they were willing to (they're not) and if they still have the code (they don't), it will contain proprietary code from Nintendo and you'll never get your hands on that (legally)

reply
djmips
2 hours ago
[-]
from 1999! Plus they probably don't even have the source anymore! A lot of game companies just never kept it!
reply