A few CPU hardware bugs
55 points | 6 hours ago | 7 comments | taricorp.net | HN
nippoo
3 hours ago
My favourite one of this kind is the Rockchip RK808 RTC, where the engineers thought that November had 31 days. To this day, Linux carries a kernel patch that translates between the Gregorian and Rockchip calendars (which gradually diverge over time).

Also one of my favourite kernel patch messages: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/lin...
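
For anyone wondering what such a translation looks like, here is a minimal Python sketch. It is not the kernel's code. It assumes the two calendars agree on 1 January of an anchor year (`ANCHOR_YEAR` below is an assumption made for illustration) and that the only difference is a 31st of November every year. The function name is illustrative too.

```python
from datetime import date, timedelta

# Assumption for this sketch: both calendars agree on 1 January of this year.
ANCHOR_YEAR = 2016

def rockchip_to_gregorian(year: int, month: int, day: int) -> date:
    """Map a date on the '31-day November' calendar to a Gregorian date."""
    # Number of extra "November 31" days that have fully elapsed since the anchor.
    extra_days = (year - ANCHOR_YEAR) + (1 if month > 11 else 0)
    if month == 11 and day == 31:
        # Not a valid Gregorian date: treat it as the day after 30 November.
        base = date(year, 11, 30) + timedelta(days=1)
    else:
        base = date(year, month, day)
    return base + timedelta(days=extra_days)
```

Under these assumptions the drift is one day per year: the chip's 31 November lands on 1 December of the anchor year, and its 1 January of the following year lands on 2 January.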

nasretdinov
2 hours ago
It's always November, isn't it? I once built a log collection system that had a map of month names to months (I had to create it because the Go date package didn't support that specific abbreviation for month names).

As you might've guessed, it lacked November, but no one noticed for 4+ months, and I've since left the company. It became a local meme, #nolognovember, and even went public (it was in Russia: https://pikabu.ru/story/no_log_november_10441606)
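
A cheap guard against this kind of gap is a completeness check on the table. The sketch below is Python rather than Go, and the map contents and function name are hypothetical, with November left out on purpose to mirror the bug.

```python
# Hypothetical hand-written map of month abbreviations to month numbers,
# with November deliberately missing to mirror the bug described above.
MONTHS = {
    "jan": 1, "feb": 2, "mar": 3, "apr": 4, "may": 5, "jun": 6,
    "jul": 7, "aug": 8, "sep": 9, "oct": 10, "dec": 12,
}

def missing_months(table: dict[str, int]) -> list[int]:
    """Return the month numbers (1-12) that no key in the table maps to."""
    return sorted(set(range(1, 13)) - set(table.values()))
```

Calling `missing_months(MONTHS)` in a unit test and asserting that the result is empty would flag the gap long before November arrives.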

lsofzz
1 hour ago
> Rockchip calendars

>.< haha i remember this

IshKebab
2 minutes ago
> To me, this issue doesn’t seem as embarrassing as Intel’s wrong CPUIDs. Pipelined CPUs are hard to build

I disagree. Misspelling a name in the CPUID is kind of easy to do, somewhat awkward to test (in a non-tautological way), and pretty easy to work around.

Having `mul ...; lw ...;` fail shows that they've done very little testing of the chip. Any basic randomised pipeline testing would hit that trivial case.

Essentially all CPUs are pipelined today. In-order pipelined CPU execution semantics are not particularly hard to test. Even some open source testing systems could detect this bug, e.g. TestRig or RISCV-DV.
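
To illustrate why random testing finds this quickly, here is a toy differential test in Python. It is not TestRig or RISCV-DV, and it does not model the real chip: the four-instruction "ISA", the register and memory contents, and especially the failure mode (a `mul` result being dropped when one of the next two instructions writes a register) are all made up for the sketch.

```python
import random

OPS = ("mul", "add", "lw", "nop")
WRITERS = ("mul", "add", "lw")   # instructions that write a register
MEM = [7, 11, 13, 17]            # tiny read-only data memory for lw
MASK = 0xFFFFFFFF

def run(program, buggy=False):
    """Execute (op, rd, a, b) tuples and return the final register file.

    With buggy=True, a mul's write-back is dropped whenever one of the next
    two instructions writes a register (a made-up failure mode).
    """
    regs = [1, 2, 3, 4]
    for i, (op, rd, a, b) in enumerate(program):
        if op == "mul":
            hazard = any(nxt[0] in WRITERS for nxt in program[i + 1:i + 3])
            if not (buggy and hazard):
                regs[rd] = (regs[a] * regs[b]) & MASK
        elif op == "add":
            regs[rd] = (regs[a] + regs[b]) & MASK
        elif op == "lw":
            regs[rd] = MEM[a]    # 'a' is used directly as the address here
        # "nop" changes nothing
    return regs

def random_program(rng, length):
    """Generate a random program of the given length."""
    return [(rng.choice(OPS), rng.randrange(4), rng.randrange(4), rng.randrange(4))
            for _ in range(length)]

def find_divergence(seed=0, trials=1000, length=4):
    """Return the first random program on which the two models disagree."""
    rng = random.Random(seed)
    for _ in range(trials):
        program = random_program(rng, length)
        if run(program) != run(program, buggy=True):
            return program
    return None
```

Even the two-instruction program `[("mul", 0, 1, 2), ("lw", 3, 0, 0)]` makes the two models disagree, so short random programs stumble on it almost immediately.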

charcircuit
22 minutes ago
>The workaround for this is to cripple the system

That is not the workaround in the documentation that was just linked.

  Workarounds:
  The solution to this problem is to put two instructions that do not require write back data after the mul instruction.

This seems reasonable for your compiler vendor to implement without getting rid of multiplication altogether.
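
As a rough idea of how small such a pass could be, here is a naive Python sketch that pads every `mul` in a list of assembly lines. It assumes `nop` qualifies as an instruction that does not require write-back data, which the quoted text does not confirm, and the function name is hypothetical. It only matches the exact mnemonic `mul` at the start of a line, so labels and variants such as `mulh` are not handled.

```python
def pad_after_mul(lines, padding=2):
    """Insert `padding` nops after every line whose mnemonic is exactly `mul`."""
    out = []
    for line in lines:
        out.append(line)
        parts = line.split()
        # Assumption: nop is safe filler; a real pass would check the errata.
        if parts and parts[0] == "mul":
            out.extend(["nop"] * padding)
    return out
```

A real compiler would more likely schedule existing independent instructions into those two slots and only fall back to padding when it finds none.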
Retr0id
4 hours ago
The GenuineIotel thing fascinates me because I can't fully grasp how it could happen. I can imagine a physical defect causing a permanently wrong bit in a specific piece of silicon, but it seems more widespread than that. Perhaps some kind of bug in the logic synthesis process?
b1temy
2 hours ago
> the characters ’n’ and ‘o’ differ by only one bit; an unpredictable error that sets that bit could change GenuineIntel to GenuineIotel.

On a QWERTY keyboard, the O key is also next to the I key. It's also possible someone accidentally fat-fingered "GenuineIontel", noticed something was off, moved their cursor between the "o" and "n", and accidentally hit Delete instead of Backspace.

Maybe an unlikely set of circumstances, but I imagine a random bit flip at the hardware level is rare, since it would likely cause other problems if something more important got flipped.
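
The one-bit claim in the quote is easy to check with a few lines of Python, assuming the string is stored as ASCII. Both helper names are made up for this sketch.

```python
def differing_bits(a: str, b: str) -> list[int]:
    """Return the bit positions (0 = least significant) where two characters differ."""
    diff = ord(a) ^ ord(b)
    return [i for i in range(diff.bit_length()) if (diff >> i) & 1]

def flip_bit(text: str, index: int, bit: int) -> str:
    """Return `text` with one bit flipped in the character at `index`."""
    flipped = chr(ord(text[index]) ^ (1 << bit))
    return text[:index] + flipped + text[index + 1:]
```

`differing_bits("n", "o")` gives `[0]` (0x6E versus 0x6F), and `flip_bit("GenuineIntel", 8, 0)` gives `"GenuineIotel"`.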

userbinator
1 hour ago
I am reminded of the old AMD CPUs with "unlockable" extra cores, which, when unlocked, would change the model name to something unusual.

"GenuineIotel" is definitely odd, but difficult to research further; I suspect these CPUs might actually end up being collector's items sometime in the future.

> because inserting no-op instructions after them prevents the issue.

The early 386s were extremely buggy and needed the same workaround: https://devblogs.microsoft.com/oldnewthing/20110112-00/?p=11...

pm215
25 minutes ago
Some of the 386 bugs described there sound to me like the classic kind of "multiple different subsystems interact in the wrong way" issue that can slip through the testing process and get into hardware, like this one:

> For example, there was one bug that manifested itself in incorrect instruction decoding if a conditional branch instruction had just the right sequence of taken/not-taken history, and the branch instruction was followed immediately by a selector load, and one of the first two instructions at the destination of the branch was itself a jump, call, or return.

Even if you write up a comprehensive test plan for the branch predictor, and for selector loads, and so on, it might easily not include that particular corner case. And pre-silicon testing is expensive and slow, which also limits how much of it you can do.

6K76981-O
5 hours ago
Writing software in embedded processor pipelines for bugs in the IT81202 CPU.

Microcode errata re-writes to GPR, compiling low level "mul," and "output," CPU RISC-V to system architecture.
