To get this started: for the good ol' Z80, I'm torn between Decrement-and-Jump-if-Not-Zero (DJNZ) and Decimal Adjust Accumulator (DAA). The rarely used ComPare, Increment & Repeat (CPIR) could be another contender.
On RISC-V, I really like the Set-Less-Than (SLT, SLTU, SLTI, SLTIU) instructions. They are great for replacing conditional branches with branchless, sequential math.
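To make the branchless idea concrete, here is a minimal C sketch of what an `slt` result can be used for. The helper names (`slt`, `select_if`, `max_branchless`) are made up for illustration, and whether a given compiler really emits `slt` plus straight-line arithmetic for this is an assumption, not a guarantee.

```c
#include <assert.h>
#include <stdint.h>

/* Models RISC-V `slt rd, rs1, rs2`: 1 if a < b (signed), else 0. */
static inline uint64_t slt(int64_t a, int64_t b) {
    return (uint64_t)(a < b);
}

/* Branchless select: returns x when cond == 1, y when cond == 0. */
static inline int64_t select_if(uint64_t cond, int64_t x, int64_t y) {
    assert(cond == 0 || cond == 1);
    uint64_t mask = 0 - cond;   /* all ones when cond == 1, else all zeros */
    return (int64_t)(((uint64_t)x & mask) | ((uint64_t)y & ~mask));
}

/* max(a, b) written without a conditional branch in the source. */
static inline int64_t max_branchless(int64_t a, int64_t b) {
    return select_if(slt(a, b), b, a);
}
```

The 0/1 result is turned into an all-ones or all-zeros mask by negation, which is the usual way to go from a comparison to sequential math.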
Your favourites?
out = (out & ~mask) | (in << shift & mask)
out = (out & ~mask) | (in >> shift & mask)
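The two formulas above can be written out as a small C sketch. The function names and the choice of 32 bit operands are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* out = (out & ~mask) | (in << shift & mask) */
static inline uint32_t insert_shl(uint32_t out, uint32_t in,
                                  unsigned shift, uint32_t mask) {
    assert(shift < 32);   /* shifting by >= 32 is undefined in C */
    return (out & ~mask) | ((in << shift) & mask);
}

/* out = (out & ~mask) | (in >> shift & mask) */
static inline uint32_t insert_shr(uint32_t out, uint32_t in,
                                  unsigned shift, uint32_t mask) {
    assert(shift < 32);
    return (out & ~mask) | ((in >> shift) & mask);
}
```

Bits of `out` outside `mask` pass through unchanged; bits inside `mask` are replaced by the shifted input.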
Z80's EXX to swap with the shadow registers was interesting (meant for fast interrupt response so you didn't have to save registers to memory).
Definitely a nice and pretty much pioneering feature on PowerPC in 1994 (and I guess RS/6000 before that, but I never used one).
Today's Arm64 BFM does both those jobs in one, minus the ability to create a split mask via rotating, but plus adding a choice of sign or zero extension to extracted fields (including extracted to the same place they already were, for pure sign/zero extension). As a result it's got about 100 aliases.
It would be nice to have these in RISC-V but they seriously violate the quite strict "Stanford Standard RISC" 2R1W principle that keeps the RISC-V integer pipeline simple (smaller, faster, cheaper).
When working in the "B" extension working group I suggested adopting the M88000 bitfield instructions which follow the 2R1W principle. Someone had an objection to encoding both field width and offset into a single constant (or `Rs2`), though I think it's well worth it. M88k as a 32 bit ISA used 5 bits for each, but 6 bits for each for RV64 fits RISC-V's 12 bit immediates perfectly.
- ext / extu: Extract signed or unsigned bit field from a register. You specify offset (starting bit position) and width. The extracted field is right-justified (shifted to the low bits) in the destination, with sign-extension or zero-extension.
- mak: Make (insert) a bit field. Takes a value, shifts it left by the offset, and inserts it into the destination while clearing the target field first (or combining in specific ways).
- set: Set (force to 1) a contiguous bit field in a register.
- clr: Clear (force to 0) a contiguous bit field in a register.
All take `Rd`, `Rs1` and a field size:offset as either a literal or as `Rs2`.
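As a rough reference model, the five operations can be sketched in C as below. This follows the descriptions in the list above rather than the M88k manual, takes offset and width as separate arguments instead of a packed constant, and uses hypothetical `bf_` names throughout.

```c
#include <assert.h>
#include <stdint.h>

/* Mask with `width` ones starting at bit `offset`. */
static uint64_t field_mask(unsigned offset, unsigned width) {
    assert(width >= 1 && offset + width <= 64);
    uint64_t ones = (width == 64) ? ~UINT64_C(0)
                                  : ((UINT64_C(1) << width) - 1);
    return ones << offset;
}

/* extu: extract the field, right-justified, zero-extended. */
static uint64_t bf_extu(uint64_t src, unsigned offset, unsigned width) {
    return (src & field_mask(offset, width)) >> offset;
}

/* ext: extract the field, right-justified, sign-extended. */
static int64_t bf_ext(uint64_t src, unsigned offset, unsigned width) {
    uint64_t v = bf_extu(src, offset, width);
    uint64_t sign = UINT64_C(1) << (width - 1);
    return (int64_t)((v ^ sign) - sign);   /* xor/subtract sign extension */
}

/* mak: as described above, clear the field in dst and insert src there. */
static uint64_t bf_mak(uint64_t dst, uint64_t src,
                       unsigned offset, unsigned width) {
    uint64_t m = field_mask(offset, width);
    return (dst & ~m) | ((src << offset) & m);
}

/* set: force the field to all ones. */
static uint64_t bf_set(uint64_t src, unsigned offset, unsigned width) {
    return src | field_mask(offset, width);
}

/* clr: force the field to all zeros. */
static uint64_t bf_clr(uint64_t src, unsigned offset, unsigned width) {
    return src & ~field_mask(offset, width);
}
```

Only `bf_mak` reads its destination as well as its source, which is the operand-count issue discussed for the R-type form.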
Unfortunately, the R-type `mak` violates 2R1W because the `Rd` is also a source, which complicates OoO implementations, making them 3R1W. RISC-V could use an alternative formulation in which `mak` (or some other name) masks off the source field and shifts it into place, and then the insert is completed using `clr` and `or`.
On the other hand, the forms with 12 bit literals are expensive in encoding space, but even including just the `Rs2` versions would be great, especially as often several instructions in a row can use the same field specification, which fits `addi Rd,zero,imm12` (aka `li`) perfectly.
On the gripping hand, while the immediate version of `mak` violates RISC-V convention by making the `Rd` also a source, any real pipeline is going to have fields for all of `Rd`, `Rs1`, `Rs2`, and `imm32` so only the decoder is affected.
Also, `ext` / `extu` are not needed as a pair of C-extension shifts do the same job with the same code size, and can be decoded into a single µop on a higher end CPU if desired.
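A minimal C sketch of the pair-of-shifts extraction (`slli` then `srli` or `srai`), assuming a 64 bit register. The function names are hypothetical, and the signed variant assumes the compiler implements `>>` on a negative `int64_t` as an arithmetic shift, which C leaves implementation-defined.

```c
#include <assert.h>
#include <stdint.h>

/* slli + srli: push the field to the top, then shift back down with zeros. */
static uint64_t extu_by_shifts(uint64_t x, unsigned offset, unsigned width) {
    assert(width >= 1 && offset + width <= 64);
    return (x << (64 - offset - width)) >> (64 - width);
}

/* slli + srai: same, but the right shift copies the field's top bit. */
static int64_t ext_by_shifts(uint64_t x, unsigned offset, unsigned width) {
    assert(width >= 1 && offset + width <= 64);
    int64_t t = (int64_t)(x << (64 - offset - width));
    return t >> (64 - width);   /* assumed to be an arithmetic shift */
}
```

Both shift amounts depend only on the offset and width, so they are constants whenever the field is known at assembly time.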
As an example: take a 10 bit field at offset 21 and insert into a destination at offset 1 (this is part of decoding RISC-V J/JAL instructions).
PowerPC:
    rlwimi r4, r3, 12, 21, 30    # rotate left 12 (= right 20), insert under the mask for bits [10:1] (IBM bit numbers 21..30)
Arm64:
    ubfx x2, x0, #21, #10    // extract bits[30:21] → low 10 bits of x2 (unsigned)
    bfi  x1, x2, #1, #10     // insert those 10 bits into x1 starting at bit 1
Alternatively, using the underlying `ubfm` / `bfm` encodings directly without aliases (exactly the same instructions, just trickier to get right):
    ubfm x2, x0, #21, #30    // immr = lsb, imms = lsb + width - 1
    bfm  x1, x2, #63, #9     // immr = (64 - lsb) mod 64, imms = width - 1
M88k:
    extu r3, r1, 21, 10    # extract 10-bit field starting at bit 21 → low bits of r3
    mak  r2, r3, 1, 10     # make/insert the field at bit 1 in destination
RISC-V:
    srli x12, x10, 21       # shift field down to low bits
    andi x12, x12, 0x3FF    # mask to 10 bits
    slli x12, x12, 1        # position at bit 1 (for imm[10:1])
    li   x13, ~0x7FE        # mask to clear bits [10:1] only
    and  x11, x11, x13
    or   x11, x11, x12      # insert the field
RISC-V with some M88k inspiration:
    extui r3, r1, 21, 10    # extract 10-bit field starting at bit 21 → low bits of r3
    maki  r4, r3, 1, 10     # modified mak: masks + shifts field to bits [10:1] (others 0)
    clri  r2, r2, 1, 10     # clear the target field in destination
    or    r2, r2, r4        # insert the prepared field
Alternatively, with the field specification in a register:
    li   t0, (1<<6) | 10    # specification for insertion bit field
    srli a3, a1, 21         # shift 10-bit field starting at bit 21 → low bits of a3
    mak  a4, a3, t0         # modified mak: masks + shifts field to bits [10:1] (others 0)
    clr  a2, a2, t0         # clear the target field in destination
    or   a2, a2, a4         # insert the prepared field
Alternatively:
    srli a3, a1, 21
    maki a2, a3, (1<<6) | 10    # decoder expands to `maki a2, a2, a3, (1<<6) | 10`
Again, this last formulation of `maki` violates RISC-V instruction format convention in making `a2` both src and dst, BUT if the decoder handles that then the expanded form does NOT cause any issues with the pipeline implementation.
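As a cross-check for the listings above, here is a small C model of the job they are all meant to do: take the 10 bit field at offset 21 of a source word and insert it at offset 1 of a destination. It mirrors the six-instruction base RISC-V sequence step by step; the function name and the self-check values are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Insert insn[30:21] into dst[10:1], leaving all other bits of dst alone. */
static uint32_t insert_jal_imm_10_1(uint32_t dst, uint32_t insn) {
    uint32_t field = (insn >> 21) & 0x3FF;   /* srli + andi */
    field <<= 1;                             /* slli        */
    dst &= ~UINT32_C(0x7FE);                 /* li + and    */
    return dst | field;                      /* or          */
}

/* A few hand-checked cases. */
static void self_check(void) {
    assert(insert_jal_imm_10_1(0xFFFFFFFFu, 0u) == 0xFFFFF801u);
    assert(insert_jal_imm_10_1(0u, 0x7FE00000u) == 0x7FEu);
    assert(insert_jal_imm_10_1(0x80000001u, 0x00200000u) == 0x80000003u);
}
```

Any of the shorter sequences should produce the same destination value as this model for the same inputs.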