Rules to avoid common extended inline assembly mistakes (nullprogram.com)
79 points | 9 days ago | 6 comments
wyldfire
9 days ago
[-]
> Because it’s so treacherous, the first rule is to avoid it if at all possible. Modern compilers are loaded with intrinsics and built-ins that replace nearly all the old inline assembly use cases.

If you take away anything from this article, it should be at least this. Intrinsics/builtins should be your first approach. Only use inline assembly if you can't express what you need using intrinsics.
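For instance (a rough sketch with GCC/Clang builtins; the function names here are made up), bit counting and byte swapping used to be classic inline-asm jobs and are now one builtin each:

    #include <stdint.h>

    /* Compiles to a single popcnt when the target has it, with a
       portable software fallback otherwise -- no asm needed. */
    unsigned count_ones(uint32_t x)
    {
        return __builtin_popcount(x);
    }

    /* Likewise compiles to bswap on x86 or rev on ARM. */
    uint32_t byte_swap(uint32_t x)
    {
        return __builtin_bswap32(x);
    }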

reply
bjackman
9 days ago
[-]
I have a fun exception to this!

When writing BPF programs, sometimes it's tricky to get the compiler to generate code that passes the verifier. This can lead you down a path of writing bizarre C in order to try and produce the right order of checks and register allocations.

So, use inline asm!

It's not portable any more... But it's a BPF program! There's nothing to port it to.

It's less readable... Wait no, it's _more_ readable because BPF asm has good syntax and you can avoid the bizarre C backflips to satisfy the verifier.

It's unsafe... Wait no, it's checked for safety at load time!
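A rough sketch of the kind of thing I mean (hypothetical helper, clang's BPF target assumed): clamp a value on the register itself, so the verifier sees the bound no matter how the compiler shuffles things around.

    #include <linux/types.h>

    /* Hypothetical helper: bound a length before using it as an offset.
       The AND happens directly on the BPF register, so the verifier
       tracks the 0..255 range regardless of clang's register allocation. */
    static inline __attribute__((always_inline))
    __u64 bounded_len(__u64 len)
    {
        asm volatile("%[len] &= 0xff" : [len] "+r"(len));
        return len;
    }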

reply
ryandrake
9 days ago
[-]
For the curious, BPF in this case might mean Berkeley Packet Filter[1]. Not sure. Kind of an obscure acronym but Google search seems to have a consensus.

1: https://en.wikipedia.org/wiki/Berkeley_Packet_Filter

reply
wyldfire
9 days ago
[-]
It does. These days it's probably eBPF, and a popular target is the Linux kernel [1].

You can write C hooks for tracing and profiling, etc. - with inline asm!

[1] https://docs.kernel.org/bpf/

reply
AlotOfReading
8 days ago
[-]
It's surprisingly easy to find things the language and intrinsics don't allow you to express when you look closely enough at what the compiler is generating. I recently wrote code that uses inline assembly not to generate instructions, but to confuse the optimizer just enough that it stops breaking the correct instructions it's already generating.
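The trick, roughly (GCC/Clang extended asm, sketch only; the macro name is mine), is an empty asm that emits no instructions but makes the optimizer believe a value was read and rewritten:

    /* Emits nothing, but "+r" makes the optimizer treat x as read and
       rewritten by opaque code: nothing gets constant-folded through it,
       and the computation feeding it can't be deleted or merged. */
    #define LAUNDER(x) asm volatile("" : "+r"(x))

Drop that between the value you care about and whatever the optimizer keeps rewriting; it costs zero instructions.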
reply
tom_
9 days ago
[-]
It's always been a mystery to me why people put up with this stuff. Adding strings to the assembler output is fine if you want to assemble some unsupported instruction, and a useful get-out clause. But as the only option, it sucks, and it's no fun if you want to insert more than one instruction.

I used CodeWarrior for PowerPC about 15 years ago, and its inline assembler was very easy to use. No markup required, and you didn't even have to really understand the ABI. Write a C function, add "register" to the parameters, put register variables inside the function, add an asm block inside it, then do your worst. It'd track variable liveness to allocate registers, and rearrange instructions to space out dependency chains. Any problems, you'd get an error. Very nice.

reply
mystified5016
8 days ago
[-]
Sometimes you genuinely know better than the compiler. Not often, but sometimes.

I'm an embedded developer and recently for work I had to do inline assembly. The function was three or four lines of C, two nested for loops. The runtime cost of the C is 200-300 cycles.

I rewrote the function in assembly using features particular to my CPU and brought the runtime cost to 90 cycles with no loops.

In this application, the 60% runtime reduction was worth the week it took to engineer.

reply
kccqzy
9 days ago
[-]
I haven't used CodeWarrior for PowerPC, but that approach sounds like it requires the C compiler to understand the assembler instructions you are using. Is that right? But most use cases of inline assembler I've seen these days are for using instructions that the compiler will not emit.
reply
tom_
8 days ago
[-]
Yes, but just because the C compiler doesn't emit them, that doesn't mean it can't understand them. If there are already tables for the assembler, specifying acceptable addressing modes and encodings, those can probably be reused - though that might require the designers to think of the toolchain as a cohesive whole. I don't think that would automatically be a bad thing.
reply
Conscat
9 days ago
[-]
Raw multi-line R"()" strings in C++ reduce some of the tedium. I wrote myself an Emacs tree-sitter pattern to highlight asm syntax inside the string more nicely than a plain string literal normally gets, which helps. There is also the stasm library (which I haven't used), which looks like it has pleasant syntax. https://github.com/stasinek/stasm

Clang (but not GCC) also supports the MSVC assembly syntax, which is derived from Borland inline assembly. Unlike MSVC, Clang supports it in 64-bit mode and also for ARM.

reply
astrange
9 days ago
[-]
Most of the time I've used inline assembly it's because the compiler was optimizing something badly. I don't want it to rearrange anything.

(Scheduling is almost useless on modern desktop CPUs anyway, except for some instruction fusion patterns.)

reply
throwaway_1224
9 days ago
[-]
+1, ^5, ditto and Amen for CodeWarrior and its inline asm. CW was way ahead of its time in terms of UX. Its C++ compiler was well above average too, particularly in terms of codegen (although all C++ compilers were effectively broken in that era.)

The only thing that held it back was the lack of scripting. It was probably a rebound rejection of the MPW days, when everything was script-based (and with a crazy custom language). I remember thinking that the design team probably didn't want to open that Pandora's box, lest scripting lazily become required and spoil the UX.

Unfortunately, this made CW unsuited to the advent of CI. Even then, I still think it was stupid for Apple not to acquire Metrowerks. The first 5-10 years of Xcode versions had a worse UX and way worse codegen.

reply
fuhsnn
9 days ago
[-]
One of my earliest surprises was that an input-only and an output-only operand may be mapped to the same register, and explicitly specifying a register for one of them will not prevent this: https://godbolt.org/z/bo3r749Ge
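Roughly the shape of the problem (my own sketch, x86-64 AT&T syntax, not the exact godbolt code): the compiler assumes all inputs are consumed before any output is written, so it will happily reuse the input's register for the output unless you mark the output early-clobber.

    int add_one_broken(int x)
    {
        int out;
        /* %0 and %1 may be the same register: the mov clobbers x before
           the add reads it, so this can return 2 instead of x + 1. */
        asm("mov $1, %0\n\t"
            "add %1, %0"
            : "=r"(out) : "r"(x));
        return out;
    }

    int add_one_fixed(int x)
    {
        int out;
        /* "&" (early clobber) tells the compiler the output is written
           before the inputs are done being read, so they can't overlap. */
        asm("mov $1, %0\n\t"
            "add %1, %0"
            : "=&r"(out) : "r"(x));
        return out;
    }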
reply
mst
9 days ago
[-]
> Despite this, please use volatile anyway! When I do not see volatile it’s likely a defect. Stopping to consider if it’s this special case slows understanding and impedes code review.

There are quite a few things I reflexively write where I know that in the specific case I don't actually need to do that, but also know that it'll make the code easier to skim read later.

I hate having my skimming interrupted by "is this the case where this is safe?" type situations.

One can, of course, overdo it - and to what extent it's worth doing it depends on who you expect to be reading it - but it can often be quite handy.

A concrete example:

    this.attrs = { style: {}, ...attrs }
will work fine if the 'attrs' variable is null, because ...<null> is not an error in javascript and expands to nothing ... but I'll still often write

    this.attrs = { style: {}, ...(attrs??{}) }
instead because it makes it clear that the code is allowing the null case deliberately and because it means the person reading it doesn't have to remember that the null case would've worked anyway (and also because my brain finds it weird that the null case does work so it often makes me pause even though I well know it's fine once I stop and think for a second).
reply
smitelli
9 days ago
[-]
Is that vanilla JavaScript or TypeScript? I had always thought that one of the main benefits of TS is that it would probably yell about the first case. (I currently only dabble in the JS world.)
reply
zdragnar
9 days ago
[-]
The spread syntax is native to JavaScript. TS wouldn't complain about the first case, because as the parent said, it is a valid operation.

TS only complains about valid operations if there's some potential mistake due to ambiguity, usually when relying on implicit conversions such as adding a string and an array together or some other nonsense.

reply
mst
8 days ago
[-]
TS would require you to have attrs typed as object|null rather than object before you even got to that line if you wanted to be able to pass a null value to the function/whatever.

It wouldn't complain about the ... usage itself because ...null is well defined and therefore not a type error.

(it would I think complain about trying to apply ... to something that isn't allowed, though I don't recall making that particular mistake yet in TS code to find out)

Though also, trivial example is trivial - I was looking for the simplest thing that was valid code and let me illustrate the general concept.

reply
nubinetwork
9 days ago
[-]
If I'm using assembly, the entire project is assembly... granted, I don't do any low level programming on modern hardware (anything newer than 586)...
reply
brigade
9 days ago
[-]
There aren’t many reasons to write an inline asm block that the compiler will elide because of no apparent effects; more likely you screwed up the constraints. If it’s due to ensuring correct memory accesses relative to the compiler, it’s usually better to define appropriate “m” constraints to give the compiler appropriate visibility, or if it’s complex/loopy enough to make that impossible then that is what the “memory” clobber is for, not volatile.
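For example (a sketch, GCC/Clang extended asm with an x86-64 mnemonic assumed; function names are mine):

    /* Preferred: an "m" operand tells the compiler exactly which memory
       the asm writes, so the store is ordered against other accesses to
       *p like any normal store -- no volatile needed. */
    static inline void store_zero(unsigned long *p)
    {
        asm("movq $0, %0" : "=m"(*p));
    }

    /* When the memory the asm touches can't be expressed as operands,
       that's what the "memory" clobber is for. An asm with no outputs is
       implicitly volatile, so this can't be elided either. */
    static inline void compiler_barrier(void)
    {
        asm("" ::: "memory");
    }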

So I strongly disagree with 2 and 3.

reply