[x86] AI Compute Extensions (ACE) Specification

18 points

by matt_d

1 hour ago

| 3 comments

| x86ecosystem.org

dgoldstein0

53 minutes ago

So how does this differ from the SSE/AVX instructions already available in most x64 machines?

anematode

46 minutes ago

One thing that stuck out to me is that it deals with a lot more data formats, in particular low-precision formats like FP4, FP6 and FP8. Manipulating those formats can take a lot of annoying effort; in general, x86 (until AVX-512, at least) has unconvincing support for so-called "lane-crossing" instructions that move data across 16-byte boundaries within a vector. So you can imagine that unpacking, e.g., tightly packed 7-bit data to 8-bit data is a real slog.
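To make the 7-bit case concrete, here is a minimal scalar sketch of what such an unpack has to compute. It is only a reference model, not how anyone would write the SIMD version, and the LSB-first bit order and the helper names `pack_7bit` / `unpack_7bit` are assumptions made for illustration.

```python
def pack_7bit(values):
    """Pack integers in [0, 127] back to back, 7 bits each, LSB first."""
    acc = 0      # bit accumulator
    nbits = 0    # number of valid bits currently held in acc
    out = bytearray()
    for v in values:
        if not 0 <= v <= 127:
            raise ValueError("value does not fit in 7 bits")
        acc |= v << nbits
        nbits += 7
        while nbits >= 8:          # flush every completed byte
            out.append(acc & 0xFF)
            acc >>= 8
            nbits -= 8
    if nbits:                      # flush the final partial byte, zero-padded
        out.append(acc & 0xFF)
    return bytes(out)


def unpack_7bit(packed, count):
    """Expand `count` packed 7-bit fields into one value per byte."""
    acc = 0
    nbits = 0
    pos = 0
    out = []
    for _ in range(count):
        while nbits < 7:           # refill until a whole field is available
            if pos >= len(packed):
                raise ValueError("not enough packed data")
            acc |= packed[pos] << nbits
            pos += 1
            nbits += 8
        out.append(acc & 0x7F)     # low 7 bits are the next field
        acc >>= 7
        nbits -= 7
    return out
```

Output byte i starts at input bit 7i, so the 16 output bytes of one 16-byte lane come from a 14-byte input span. The input span for output lane k therefore starts at byte 14k, which does not line up with the 16-byte lane boundaries, and that is where the lane-crossing shuffles come in.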

I can immediately think of a use case for vunpackb in some of the stuff I'm working on, where we'd like to efficiently unpack weights from the high half of a vector.

Separately, adding all signed–unsigned variants of the VNNI dot product instructions is a welcome (albeit niche) change. There was an annoying divergence here between major ISAs: x86 added vpdpbusd, which computes a dot product between u8 and i8 elements, while ARM added vdotq, which computes a dot product either between u8 and u8 elements or between i8 and i8 elements. So for broad compatibility, you generally had to restrict one of your inputs to [0,127], where the signed and unsigned interpretations of a byte agree. This difference shows in the design of (for example) WASM relaxed SIMD, where the result of wasm.dot.i8x16.i7x16.add.signed is implementation-defined if you exceed the [0,127] range. ARM later added mixed-sign variants, and now x86 completes the set.
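A rough scalar model of why the [0,127] restriction works, for one 32-bit accumulator lane fed by four byte pairs. This is a simplified sketch: the function names `dot4_u8_i8` and `dot4_i8_i8` are made up for illustration, and 32-bit wraparound and saturation are deliberately ignored.

```python
def as_i8(b):
    """Reinterpret a raw byte (0..255) as a signed 8-bit value."""
    return b - 256 if b >= 128 else b


def dot4_u8_i8(acc, a, b):
    """Mixed-sign style: bytes of `a` read as unsigned, bytes of `b` as signed."""
    return acc + sum(x * as_i8(y) for x, y in zip(a, b))


def dot4_i8_i8(acc, a, b):
    """Same-sign style: both inputs read as signed."""
    return acc + sum(as_i8(x) * as_i8(y) for x, y in zip(a, b))


# With every byte of `a` in [0, 127], both interpretations agree.
a = bytes([1, 50, 100, 127])
b = bytes([0xFF, 0x80, 3, 0x7F])      # as i8: -1, -128, 3, 127
print(dot4_u8_i8(0, a, b), dot4_i8_i8(0, a, b))      # 10028 10028

# Once a byte of `a` exceeds 127, they diverge.
a_bad = bytes([200, 0, 0, 0])
b_bad = bytes([2, 0, 0, 0])
print(dot4_u8_i8(0, a_bad, b_bad), dot4_i8_i8(0, a_bad, b_bad))  # 400 -112
```

Keeping one operand in [0,127] means portable code gets the same answer whichever of the two flavors the hardware provides, at the cost of one bit of range on that operand.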

dmitrygr

48 minutes ago

This also adds new registers to operate on (more state): at least 1 KB more state (512 bits x 16).
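Spelling out the arithmetic behind that figure, using only the register width and count quoted above (the constant names are just for illustration):

```python
REGISTER_BITS = 512    # width per register, as quoted in the comment
REGISTER_COUNT = 16    # number of registers, as quoted in the comment

# Total architectural state added, converted from bits to bytes.
state_bytes = REGISTER_BITS * REGISTER_COUNT // 8
print(state_bytes)     # 1024
```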

sorenjan

39 minutes ago

AVX512 isn't available on most new CPUs, so I'm guessing ACE will only be available on server CPUs for at least a couple of years after launch?

deadmutex

4 minutes ago

> AVX512 isn't available on most new CPUs

Please define "new". Also, I think AMD uses very similar cores in server and client. So disabling AVX512 may be an Intel thing (my guess is that it's so they can easily move threads between E & P cores).

BobbyTables2

25 minutes ago

Thank $ALL_DEITIES that the TCG wasn’t involved!