Ask HN: Favourite Assembly Instructions?
2 points
2 hours ago
| 2 comments
| HN
One for the low-level gurus among you: what are your favourite assembly-language instructions, across the ages & instruction sets?

To get this started: for the good ol' Z80, I'm torn between Decrement-and-Jump-if-Not-Zero (DJNZ) and Decimal Adjust Accumulator (DAA). The rarely used ComPare, Increment & Repeat (CPIR) could be another contender.

On RISC-V, I much like the Set-Less-Than (SLT..) instructions. Great for replacing conditional branches with branchless, sequential math.
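
To illustrate the branchless idea, here is a rough C sketch. The function names are made up for this example, and whether a compiler actually emits `slt` for these comparisons depends on the target and optimisation level, so treat the instruction names in the comments as the intended mapping rather than a guarantee.

```c
#include <stdint.h>

/* (a < b) evaluates to 0 or 1, the same value SLT leaves in a register. */
static inline int64_t branchless_min(int64_t a, int64_t b)
{
    int64_t lt = (a < b);      /* slt: 1 if a < b, else 0 */
    int64_t mask = -lt;        /* neg: all ones if a < b, else zero */
    return (a & mask) | (b & ~mask);
}

/* Counts the elements below a limit without a branch per element. */
static inline unsigned count_below(const int32_t *v, unsigned n, int32_t limit)
{
    unsigned count = 0;
    for (unsigned i = 0; i < n; i++)
        count += (unsigned)(v[i] < limit);   /* slt + add */
    return count;
}
```

The comparison result is used as data (a mask or an addend) instead of steering a conditional jump.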

Your favourites?

gblargg
1 hour ago
I always liked rlwimi on PowerPC. It rotates the source n bits, then writes any contiguous section of bits over the corresponding bits in the destination register. This allows copying any bitfield from any position in one register into another. Basically either of these:

  out = (out & ~mask) | (in << shift & mask)
  out = (out & ~mask) | (in >> shift & mask)
Z80's EXX to swap with the shadow registers was interesting (meant for fast interrupt response so you didn't have to save registers to memory).
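
For anyone who wants to play with the rlwimi semantics, a small C model might look like the following. This is a sketch, not lifted from any manual: the helper names are invented, and the mask helper assumes the usual PowerPC convention where bit 0 is the most significant bit and a mask with MB > ME wraps around.

```c
#include <stdint.h>

/* Rotate a 32-bit value left by n bits. */
static inline uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31;
    return (x << n) | (x >> ((32 - n) & 31));
}

/* Ones from bit mb through bit me, numbering bits from the MSB (bit 0). */
static inline uint32_t ppc_mask(unsigned mb, unsigned me)
{
    uint32_t from_mb = 0xFFFFFFFFu >> mb;          /* bits mb..31 */
    uint32_t to_me   = 0xFFFFFFFFu << (31 - me);   /* bits 0..me  */
    return (mb <= me) ? (from_mb & to_me) : (from_mb | to_me);
}

/* Model of: rlwimi ra, rs, sh, mb, me */
static inline uint32_t rlwimi(uint32_t ra, uint32_t rs,
                              unsigned sh, unsigned mb, unsigned me)
{
    uint32_t m = ppc_mask(mb, me);
    return (rotl32(rs, sh) & m) | (ra & ~m);
}
```

Because the source is rotated rather than shifted, the same function covers both the left-shift and right-shift forms above.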
brucehoult
20 minutes ago
> rlwimi / rlwinm

Definitely a nice and pretty much pioneering feature on PowerPC in 1994 (and I guess RS/6000 before that, but I never used one).

Today's Arm64 BFM does both those jobs in one, minus the ability to create a split mask via rotating, but plus adding a choice of sign or zero extension to extracted fields (including extracted to the same place they already were, for pure sign/zero extension). As a result it's got about 100 aliases.
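
To give a feel for how the aliases relate to the underlying instruction, here is a sketch in C of the operand translation for two of them in their 64 bit forms. The struct and function names are invented for illustration; the formulas are the alias definitions as I understand them (`ubfx` is an alias of `ubfm`, `bfi` of `bfm`).

```c
#include <stdint.h>

/* The two immediates carried by the BFM family of instructions. */
struct bfm_operands {
    unsigned immr;
    unsigned imms;
};

/* ubfx xd, xn, #lsb, #width  ==  ubfm xd, xn, #lsb, #(lsb + width - 1) */
static inline struct bfm_operands ubfx_as_ubfm(unsigned lsb, unsigned width)
{
    struct bfm_operands o = { lsb, lsb + width - 1 };
    return o;
}

/* bfi xd, xn, #lsb, #width  ==  bfm xd, xn, #((64 - lsb) % 64), #(width - 1) */
static inline struct bfm_operands bfi_as_bfm(unsigned lsb, unsigned width)
{
    struct bfm_operands o = { (64 - lsb) % 64, width - 1 };
    return o;
}
```

An extraction has `imms >= immr`, while an insertion at a non-zero position has `imms < immr`, which is how one encoding serves both jobs.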

It would be nice to have these in RISC-V but they seriously violate the quite strict "Stanford Standard RISC" 2R1W principle that keeps the RISC-V integer pipeline simple (smaller, faster, cheaper).

When working in the "B" extension working group I suggested adopting the M88000 bitfield instructions which follow the 2R1W principle. Someone had an objection to encoding both field width and offset into a single constant (or `Rs2`), though I think it's well worth it. M88k as a 32 bit ISA used 5 bits for each, but 6 bits for each for RV64 fits RISC-V's 12 bit immediates perfectly.

- ext / extu: Extract signed or unsigned bit field from a register. You specify offset (starting bit position) and width. The extracted field is right-justified (shifted to the low bits) in the destination, with sign-extension or zero-extension.

- mak: Make (insert) a bit field. Takes a value, shifts it left by the offset, and inserts it into the destination while clearing the target field first (or combining in specific ways).

- set: Set (force to 1) a contiguous bit field in a register.

- clr: Clear (force to 0) a contiguous bit field in a register.

All take `Rd`, `Rs1` and a field size:offset as either a literal or as `Rs2`.
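
As a rough C model of `ext`, `extu`, `set` and `clr`, following the descriptions in the list above rather than the MC88100 manual: the function names are invented, width and offset are passed as separate arguments instead of a packed constant, and the sketch assumes 1 <= width and offset + width <= 32. The sign extension relies on the usual two's complement conversion to `int32_t`.

```c
#include <stdint.h>

/* `width` ones starting at bit `offset`. */
static inline uint32_t field_mask(unsigned offset, unsigned width)
{
    uint32_t ones = (width >= 32) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    return ones << offset;
}

/* extu: extract the field, right-justify, zero-extend. */
static inline uint32_t op_extu(uint32_t rs1, unsigned offset, unsigned width)
{
    return (rs1 & field_mask(offset, width)) >> offset;
}

/* ext: extract the field, right-justify, sign-extend. */
static inline int32_t op_ext(uint32_t rs1, unsigned offset, unsigned width)
{
    uint32_t field = op_extu(rs1, offset, width);
    uint32_t sign = 1u << (width - 1);
    return (int32_t)((field ^ sign) - sign);
}

/* set: force the field to ones. */
static inline uint32_t op_set(uint32_t rs1, unsigned offset, unsigned width)
{
    return rs1 | field_mask(offset, width);
}

/* clr: force the field to zeros. */
static inline uint32_t op_clr(uint32_t rs1, unsigned offset, unsigned width)
{
    return rs1 & ~field_mask(offset, width);
}
```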

Unfortunately, the R-type `mak` violates 2R1W because the `Rd` is also a source, which complicates OoO implementations by making them 3R1W. RISC-V could use an alternative formulation in which `mak` (or some other name) masks off the source field and shifts it into place, and then the insert is completed using `clr` and `or`.
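
The equivalence is easy to check in C. In this sketch `insert3` stands for the form that also reads the destination, and `mak2` for the alternative formulation with zeros outside the field. All names are invented for the example, and it assumes offset <= 31 and offset + width <= 32.

```c
#include <stdint.h>

/* `width` ones starting at bit `offset`. */
static inline uint32_t field_mask(unsigned offset, unsigned width)
{
    uint32_t ones = (width >= 32) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    return ones << offset;
}

/* Insert that reads rd as well as rs1: three register inputs in R-type form. */
static inline uint32_t insert3(uint32_t rd, uint32_t rs1,
                               unsigned offset, unsigned width)
{
    uint32_t m = field_mask(offset, width);
    return (rd & ~m) | ((rs1 << offset) & m);
}

/* Alternative mak: low `width` bits of rs1 moved to `offset`, zeros elsewhere. */
static inline uint32_t mak2(uint32_t rs1, unsigned offset, unsigned width)
{
    return (rs1 << offset) & field_mask(offset, width);
}

/* clr: zero the field, keep everything else. */
static inline uint32_t clr(uint32_t rs1, unsigned offset, unsigned width)
{
    return rs1 & ~field_mask(offset, width);
}

/* The same insert built from mak2 + clr + or, each reading at most two registers. */
static inline uint32_t insert_via_mak2(uint32_t rd, uint32_t rs1,
                                       unsigned offset, unsigned width)
{
    return clr(rd, offset, width) | mak2(rs1, offset, width);
}
```

The price is three instructions instead of one, though `mak2` and `clr` are independent of each other and can issue in parallel.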

On the other hand the forms with 12 bit literals are expensive in encoding space, but even including just the `Rs2` versions would be great, especially as often several instructions in a row can use the same field specification, which fits `addi Rd,zero,imm12` (aka `li`) perfectly.

On the gripping hand, while the immediate version of `mak` violates RISC-V convention by making the `Rd` also a source, any real pipeline is going to have fields for all of `Rd`, `Rs1`, `Rs2`, and `imm32` so only the decoder is affected.

Also, `ext` / `extu` are not needed as a pair of C-extension shifts do the same job with the same code size, and can be decoded into a single µop on a higher end CPU if desired.
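
The shift-pair trick looks like this in C for a 64 bit register: shift left until the top of the field reaches bit 63, then shift right (logical or arithmetic) by 64 minus the width. Function names are invented here. The sketch assumes 1 <= width and offset + width <= 64, and the signed version relies on `>>` of a negative value being an arithmetic shift, which is the behaviour of the common compilers but implementation-defined in C.

```c
#include <stdint.h>

/* extu as slli + srli */
static inline uint64_t extu_by_shifts(uint64_t x, unsigned offset, unsigned width)
{
    uint64_t t = x << (64 - offset - width);   /* slli: field top goes to bit 63 */
    return t >> (64 - width);                  /* srli: right-justify, zero fill */
}

/* ext as slli + srai */
static inline int64_t ext_by_shifts(uint64_t x, unsigned offset, unsigned width)
{
    int64_t t = (int64_t)(x << (64 - offset - width));   /* slli */
    return t >> (64 - width);                            /* srai: sign fill */
}
```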

As an example: take a 10 bit field at offset 21 and insert into a destination at offset 1 (this is part of decoding RISC-V J/JAL instructions).

PowerPC:

    rlwimi  r4, r3, 12, 21, 30   # rotate left 12 (= right 20); mask is IBM bits 21..30, i.e. bits [10:1]
Arm64:

    ubfx   x2, x0, #21, #10      # extract bits[30:21] → low 10 bits of x2 (unsigned)
    bfi    x1, x2, #1, #10       # insert those 10 bits into x1 starting at bit 1
Alternatively, using `ubfm` / `bfm` directly without aliases (exactly the same instructions, just trickier to get right):

    ubfm   x2, x0, #21, #30
    bfm    x1, x2, #64-1, #9

M88k:

    extu   r3, r1, 21, 10        # extract 10-bit field starting at bit 21 → low bits of r3
    mak    r2, r3, 1, 10         # make/insert the field at bit 1 in destination
RISC-V:

    srli   x12, x10, 21          # shift field down to low bits
    andi   x12, x12, 0x3FF       # mask to 10 bits
    slli   x12, x12, 1           # position at bit 1 (for imm[10:1])
    li     x13, ~0x7FE           # mask to clear bits [10:1] only
    and    x11, x11, x13
    or     x11, x11, x12         # insert the field
RISC-V with some M88k inspiration:

    extui  r3, r1, 21, 10        # extract 10-bit field starting at bit 21 → low bits of r3
    maki   r4, r3, 1, 10         # modified mak: masks + shifts field to bits [10:1] (others 0)
    clri   r2, 1, 10             # clear the target field in destination
    or     r2, r2, r4            # insert the prepared field
Alternatively:

    li     t0, (1<<6) | 10       # specification for insertion bit field
    srli   a3, a1, 21            # shift 10-bit field starting at bit 21 → low bits of a3
    mak    a4, a3, t0            # modified mak: masks + shifts field to bits [10:1] (others 0)
    clr    a2, t0                # clear the target field in destination
    or     a2, a2, a4            # insert the prepared field
Alternatively:

    srli   a3, a1, 21
    maki   a2, a3, (1<<6) | 10   # decoder expands to `maki a2, a2, a3, (1<<6) | 10`
Again, this last formulation of `maki` violates RISC-V instruction format convention in making `a2` both src and dst, BUT if the decoder handles that then the expanded form does NOT cause any issues with the pipeline implementation.
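
For reference, the field move used in the example is one of four pieces of the J-type immediate. A C sketch of the whole decode, assuming the standard JAL layout (imm[20|10:1|11|19:12] in instruction bits 31, 30:21, 20 and 19:12), might look like this. The function name is made up and the final conversion relies on two's complement.

```c
#include <stdint.h>

/* Reassemble and sign-extend the 21-bit jump offset of a JAL instruction. */
static inline int32_t decode_j_imm(uint32_t insn)
{
    uint32_t imm = 0;
    imm |= ((insn >> 21) & 0x3FFu) << 1;    /* insn[30:21] -> imm[10:1], the field from the example */
    imm |= ((insn >> 20) & 0x1u)   << 11;   /* insn[20]    -> imm[11]    */
    imm |= ((insn >> 12) & 0xFFu)  << 12;   /* insn[19:12] -> imm[19:12] */
    imm |= ((insn >> 31) & 0x1u)   << 20;   /* insn[31]    -> imm[20]    */
    return (int32_t)((imm ^ 0x100000u) - 0x100000u);   /* sign-extend from bit 20 */
}
```

Each of the first two lines is a separate extract-and-insert, which is where a single bitfield instruction would pay off most.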
absynth
1 hour ago
HCF - Halt and Catch Fire.