(NOT A) OR ((NOT B) XOR (C AND A))
then you simply write ~_MM_TERNLOG_A | (~_MM_TERNLOG_B ^ (_MM_TERNLOG_C & _MM_TERNLOG_A))
Literally the expression you want to calculate. It evaluates to the immediate, using the _MM_TERNLOG_A/B/C constants defined in the intrinsic headers (at least for gcc & clang): typedef enum {
_MM_TERNLOG_A = 0xF0,
_MM_TERNLOG_B = 0xCC,
_MM_TERNLOG_C = 0xAA
} _MM_TERNLOG_ENUM;
For MSVC you define them yourself:
A = 0b11110000
B = 0b11001100
C = 0b10101010
That said, the trick for turning four or more argument bitwise operations into a series of vpternlogd operations has yet to be posted
For five or more arguments, this naturally extends into a tree, though it likely isn't the most efficient encoding.
In the end I did what pretty much everyone else did: found the BLTCON0 values for Bobs and straight copies, and then pretended I never saw the thing.
I did however get an A+ in computational logic at university years later, so maybe some of the trauma turned out to be beneficial.
OTOH, it doesn’t matter what assumptions we make, right? Words and phrases often have multiple meanings. The word ternary means composed of three parts, and that fits here.
Ternary comes from the Latin ternarius (composed of three things), which contains the root ter- or tern-, an adverbial variation of the original root terti-, meaning third. Same with binary: it comes from the adverbial variation bi- or bin-, with bi- related to the Greek dy- or di-. However, ternary is more correct than trinary because in binary we are not using the Latin root for the cardinal two, duo-; we use the adverbial version, bi-. Trinary just sounds better to some people because tri- and bi- rhyme. For a numerical system composed of four components we use quaternary, which likewise uses the adverbial form of four, quater(n)-, instead of the cardinal form, quadr(i)-.
Personal opinion, saying trinary makes you sound like you don't have any formal education.
You’re right; ternary is totally valid and does seem more common now that I look it up. Trinary is a valid synonym, in the dictionary, mentioned in the Wikipedia article for ternary, and was in fact used by many people during my formal education. Your personal opinion sounds rather presumptuous, not correct, and designed to start a fight. You have the right to keep it to yourself… or change your mind.
In C, '+' is a binary operator because it accepts two inputs. '?:' is a ternary operator because it accepts three inputs. It is usually referred to as the ternary operator because it is unique in C, but there's nothing fundamental about that.
vpternlogd implements all bitwise ternary operators - those operators that accept three inputs.
I remember there are names for some of the codes like BLACKNESS for producing black whatever the inputs are, COPY (or something like that) to just copy the source to the destination etc. I always thought BLACKNESS and WHITENESS had a kind of poetic ring to them.
As far as I know (I think this is from Petzold), it's implemented in software, but the opcode is actually converted to custom assembly inside the function when you call it: a rare example of self-modifying code in the Windows operating system.
Neither is typically done in practice except for specialized purposes like FPGAs and the instruction described in this article. High-speed registers and static RAM are sometimes built with logic but it's more common to build them directly with transistors than with gates.
http://www.righto.com/2017/01/die-photos-and-reverse-enginee...
The page in Mapping the Amiga: https://archive.org/details/1993-thomson-randy-rhett-anderso...
To take a related concept further, it would be nice if there were totally unportable, chip-superspecific ways of feeding uops directly, particularly with raw access to the unrenamed register file.
Say you have an inner loop, and a chip is popular. Let your compiler take a swing at it. If it's way faster than the ISA translation, add a special case to the fat binary for a single function.
Alas, it will probably never happen due to security, integrity, and testing costs.
One issue (without further architectural help) is that you'd have to save and restore every register you touch, which is fine for an inner loop that's consequential enough. Another obstacle is any hardware that's shared among cores, such as the memory-coherency machinery.
In general, moving super-ad-hoc memory management into the compiler would suck for the compiler writers. Maybe some of your compiler can be an LLM?
If performance is key, I would totally deploy this if I had faith in my tests - and maybe with a way for the user to turn it off, just in case.
I like the idea of an LLM assisting in the process: not because LLMs can reason better than a compiler, but because this is exactly the kind of translation task they should excel at. Perhaps we'll see some new compilers in 2025... There is some writing on it here https://arxiv.org/abs/2408.03408 and there is this https://medium.com/@zergtant/meta-releases-llm-compiler-a-co... both from 2024. The model is on HF.
In a weird sense it kind of helped me feel that, yes, I would probably understand stuff better if I tried re-learning the Amiga hardware today and also like I got a bit of it for free already! Is there such a thing as being protected from a nerd snipe? "This article was my nerd trench" ... or something. Thanks! :)
movei (%r1),(%r2),(%r3),value
Move the contents of memory pointed to by r1 to the memory pointed to by r2, applying the boolean operator <value> with the memory pointed to by r3. Then increment all three registers by 4 to point to the next word. There was something similar to this in the Intel 82786 graphics chip, which had a sort of minimal CPU part that could run simple "programs". And yeah, I really enjoyed the blitter on the Amiga. It was a really cool bit of hardware.
If it's an immediate then the compiler (human or machine) knows what the operation will be, so could just write that instead?
RISC-V already has what you want in terms of R-type instructions[1], ie "dest = src1 op src2" where "op" is effectively an immediate value, but being based on a load-store architecture it's only register-to-register of course.
Though I suppose ISA-wise there's nothing in the way of making an extension that introduces "M-type instructions" which act like the R-type instructions but are mem-to-mem instead. How much that messes with everything else I have no idea.
edit: ah, forgot about you wanting it to behave like movsb. Still, something that could be handled by the extension instruction.
[1]: https://github.com/riscv/riscv-isa-manual/releases/download/... (section 2.4.2)
As a fun exercise, you can do this for all 2-bit -> 1-bit functions. There are only 16 of them, and most have well-known names like "and" (LUT 1000) or "xor" (LUT 0110). Some of them don't depend on some of the inputs (e.g. LUT 1100 / 1010, which are "return A" and "return B" respectively) or even any of them (e.g. LUT 0000, which always returns 0).
https://www.youtube.com/watch?v=BA12Z7gQ4P0 (ben eater)
I mean, it is obvious in retrospect, sort of along the lines of memoizing a function. But it was mind-blowing when I first saw it.
Not that at the hardware level memory is actually any simpler than whatever boolean logic it is pretending to be, but it feels simpler and is easily available in large quantities.
So many super-clever instructions are next to impossible for compilers to automatically use.
* https://devblogs.microsoft.com/dotnet/performance-improvemen...
That is perfectly ordinary logical calculus that any worthwhile CS degree teaches.
Granted, probably not what a teenager without access to a BBS, or Aminet, would be able to figure out.
(Yes, I know, finite input domain etc.)
Come on, vpternlog* is not obscure. It subsumes _all_ bitwise instructions, even loading the constant (-1) into a register.