[x86] AI Compute Extensions (ACE) Specification
18 points | 1 hour ago | 3 comments | x86ecosystem.org | HN
dgoldstein0
53 minutes ago
So how does this differ from available sse / avx instructions already in most x64 machines?
anematode
46 minutes ago
One thing that stuck out to me is that it deals with a lot more data formats, in particular low-precision formats like FP4, FP6 and FP8. Manipulating those formats can take a lot of annoying effort; in general, x86 (until AVX-512, at least) has unconvincing support for so-called "lane-crossing" instructions that move data across 16-byte boundaries within a vector. So you can imagine that unpacking, e.g., tightly packed 7-bit data to 8-bit data is a real slog.
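
To make the 7-bit case concrete, here is a minimal scalar sketch of what "unpacking tightly packed 7-bit data to 8-bit data" means. The function names `pack_7bit` and `unpack_7bit` and the LSB-first bit layout are my own assumptions for illustration; this is not how any ACE instruction is specified, only a reference model of the data movement a SIMD version would have to reproduce.

```python
def pack_7bit(values):
    """Pack 7-bit values into bytes, LSB-first (assumed layout)."""
    acc = 0      # bit accumulator
    nbits = 0    # number of valid bits currently held in acc
    out = bytearray()
    for v in values:
        acc |= (v & 0x7F) << nbits
        nbits += 7
        while nbits >= 8:
            out.append(acc & 0xFF)
            acc >>= 8
            nbits -= 8
    if nbits:
        out.append(acc & 0xFF)  # flush the final partial byte
    return bytes(out)


def unpack_7bit(packed, count):
    """Unpack `count` 7-bit values from `packed`, one value per output byte."""
    acc = 0
    nbits = 0
    out = []
    it = iter(packed)
    for _ in range(count):
        while nbits < 7:
            acc |= next(it) << nbits  # pull in another source byte
            nbits += 8
        out.append(acc & 0x7F)
        acc >>= 7
        nbits -= 7
    return out


values = list(range(0, 128, 5))
packed = pack_7bit(values)
assert unpack_7bit(packed, len(values)) == values
```

Every output byte draws on bits that straddle source-byte boundaries, so a vectorized version needs shuffles and variable shifts that cross lanes, which is where the pain comes from.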

I can already think of a use case for vunpackb in some of the stuff I'm working on, where we'd like to efficiently unpack weights from the high half of a vector.

Separately, adding all signed–unsigned variants of the VNNI dot product instructions is a welcome (albeit niche) change. There was an annoying divergence here between major ISAs: x86 added vpdpbusd, which computes a dot product between u8 and i8, while ARM added vdotq, which computes a dot product either between u8 and u8 elements, or between i8 and i8. So for broad compatibility, you generally had to restrict one of your inputs to [0,127]. This difference shows in the design of (for example) WASM relaxed SIMD, where the result of wasm.dot.i8x16.i7x16.add.signed is implementation-defined if you exceed the [0,127] range. ARM later added mixed-sign variants, and now x86 completes the set.
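
A rough scalar model of why the [0,127] restriction works, assuming one 32-bit lane of four byte pairs and ignoring accumulator wrap-around or saturation. The helper names (`as_i8`, `dot_u8_i8`, `dot_i8_i8`) are made up for this sketch and are not real intrinsics:

```python
def as_i8(b):
    """Reinterpret a byte (0..255) as a signed 8-bit integer."""
    return b - 256 if b >= 128 else b


def dot_u8_i8(a_bytes, b_bytes, acc=0):
    """u8 x i8 dot product added to an accumulator (vpdpbusd-style model)."""
    return acc + sum(a * as_i8(b) for a, b in zip(a_bytes, b_bytes))


def dot_i8_i8(a_bytes, b_bytes, acc=0):
    """i8 x i8 dot product added to an accumulator (signed-only model)."""
    return acc + sum(as_i8(a) * as_i8(b) for a, b in zip(a_bytes, b_bytes))


# With the first operand confined to [0,127], both interpretations agree,
# because a byte below 128 has the same value as u8 and as i8.
a_safe = [1, 127, 0, 64]
b = [0xFF, 0x80, 5, 0x7F]
assert dot_u8_i8(a_safe, b) == dot_i8_i8(a_safe, b)

# Once a byte exceeds 127, the two interpretations diverge.
a_big = [200, 0, 0, 0]
b_one = [1, 0, 0, 0]
assert dot_u8_i8(a_big, b_one) == 200
assert dot_i8_i8(a_big, b_one) == -56
```

So code that had to run on both kinds of hardware could only rely on the overlap of the two interpretations, which is exactly the 7-bit range.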

dmitrygr
48 minutes ago
This also adds new registers to operate on (more state): at least 1 KB more (16 registers x 512 bits each).
sorenjan
39 minutes ago
AVX512 isn't available on most new CPUs, so I'm guessing ACE will only be available on server CPUs for at least a couple of years after launch?
deadmutex
4 minutes ago
> AVX512 isn't available on most new CPUs

Please define "new". Also, I think AMD uses very similar cores in server and client. So disabling AVX512 may be an Intel thing (my guess is that it lets them easily move threads between E & P cores).

BobbyTables2
25 minutes ago
Thank $ALL_DEITIES that the TCG wasn't involved!