Athlon 64: How AMD turned the tables on Intel
328 points
3 days ago
| 23 comments
| dfarq.homeip.net
ndiddy
2 days ago
[-]
Fun fact: Bob Colwell (chief architect of the Pentium Pro through Pentium 4) recently revealed that the Pentium 4 had its own 64-bit extension to x86 that would have beaten AMD64 to market by several years, but management forced him to disable it because they were worried that it would cannibalize IA64 sales.

> Intel’s Pentium 4 had our own internal version of x86–64. But you could not use it: we were forced to “fuse it off”, meaning that even though the functionality was in there, it could not be exercised by a user. This was a marketing decision by Intel — they believed, probably rightly, that bringing out a new 64-bit feature in the x86 would be perceived as betting against their own native-64-bit Itanium, and might well severely damage Itanium’s chances. I was told, not once, but twice, that if I “didn’t stop yammering about the need to go 64-bits in x86 I’d be fired on the spot” and was directly ordered to take out that 64-bit stuff.

https://www.quora.com/How-was-AMD-able-to-beat-Intel-in-deli...

reply
kimixa
2 days ago
[-]
That's no guarantee it would succeed though - AMD64 also cleaned up a number of warts on the x86 architecture, like more registers.

While I suspect the Intel equivalent would have done similar things - it's an obvious move once you're making a big enough break - there's no guarantee it wouldn't have been worse than AMD64. But I guess it could also have been "better" in hindsight.

And also remember at the time the Pentium 4 was very much struggling to get the advertised performance. One could argue that one of the major reasons that the AMD64 ISA took off is that the devices that first supported it were (generally) superior even in 32-bit mode.

EDIT: And I'm surprised it got as far as silicon. AMD64 was "announced" and the spec released before the Pentium 4 was even released, over 3 years before the first AMD implementations could be purchased. I guess Intel thought they didn't "need" to be public about it? And the AMD64 extensions cost a rather non-trivial amount of silicon and engineering effort to implement - did the plan for Itanium change late enough in the P4 design that it couldn't be removed? Or perhaps this all implies it was a much less far-reaching (and so less costly) design?

reply
ghaff
2 days ago
[-]
As someone who followed IA64/Itanium pretty closely, it's still not clear to me the degree to which Intel (or at least groups within Intel) thought IA64 was a genuinely better approach and the degree to which Intel (or at least groups within Intel) simply wanted to get out from existing cross-licensing deals with AMD and others. There were certainly also existing constraints imposed by partnerships, notably with Microsoft.
reply
ajross
2 days ago
[-]
Both are likely true. It's easy to wave it away in hindsight, but there was genuine energy and excitement about the architecture in its early days. And while the first chips were late and on behind-the-cutting-edge processes, they were actually very performant (FPU numbers were world-beating, even -- parallel VLIW dispatch really helped here).

Lots of people loved Itanium and wanted to see it succeed. But surely the business folks had their own ideas too.

reply
ccgreg
2 days ago
[-]
> they were actually very performant

Insanely expensive for that performance. I was the architect of HPC clusters in that era, and Itanic never made it to the top for price per performance.

Also, having lived through the software stack issues with the first beta chips of Itanic and AMD64 (and MIPS64, but who's counting), AMD64 was way way more stable than the others.

reply
pjmlp
2 days ago
[-]
I am one of those people, and I think that it only failed because AMD was able to turn the tables on Intel, to use the article's title.

Without AMD64, I firmly believe eventually Itanium would have been the new world no matter what.

We see this all the time: technology that could be great but fails due to not being pushed hard enough, and other similar technology that does indeed succeed because the creators are willing to push it at a loss for several years until it finally becomes the new way.

reply
thesz
8 hours ago
[-]

  > Without AMD64, I firmly believe eventually Itanium would have been the new world no matter what.
VLIW is not binary forward- or cross-implementation-compatible. If MODEL1 has 2 instructions per block and its successor MODEL2 has 4, code built for MODEL1 will run on MODEL2, but it will underperform due to underutilization. If execution latencies differ between two implementations of the same VLIW ISA, code tuned for one may not execute optimally on the other. Even different memory controllers and cache hierarchies can change what the optimal VLIW code is.

This precludes any VLIW from having multiple differently constrained implementations. You cannot segment VLIW implementations the way you can with x86, ARM, MIPS, PowerPC, etc., where the same code will be executed as optimally as possible on each concrete implementation of the ISA.

So - no, Itanium (or any other VLIW for that matter) would not be the new world.

reply
ajross
5 hours ago
[-]
> VLIW is not binary forward- or cross-implementation-compatible.

It was on IA-64: the bundle format was deliberately chosen to allow for easy extension.

But broadly it's true: you can't have a "pure" VLIW architecture independent of the issue and pipeline architecture of the CPU. Any device with differing runtime architecture is going to have to do some cooking of the instructions to match it to its own backend. But that decode engine is much easier to write when it's starting from a wide format that presents lots of instructions and makes explicit promises about their interdependencies.

reply
ghaff
2 days ago
[-]
I'm inclined to agree and I've written as much. In a world where 64-bit x86 wasn't really an option, Intel and "the industry" would probably have eventually figured out a way to make Itanium work well enough and cost-effectively enough, and iterated over time. Some of the then-current RISC chips would probably have remained more broadly viable in that timeline but, in the absence of a viable alternative, 64-bit was going to happen and therefore probably Itanium.

Maybe ARM gets a real kick in the pants but high-performance server processors were probably too far in the future to play a meaningful role.

reply
Agingcoder
2 days ago
[-]
There was a fundamental difficulty with 'given a sufficiently smart compiler', if I remember correctly, revolving around automatic parallelization. You might argue that given enough time and money it might have been solved, but it's a really hard problem.

( I might have forgotten)

reply
ajross
1 day ago
[-]
The compilers did arrive, but obviously too late. Modern pipeline optimization and register scheduling in gcc & LLVM is wildly more sophisticated than anything people were imagining in 2001.
reply
kimixa
1 day ago
[-]
But modern CPUs have even more capabilities for re-ordering/OoO execution and other "live" scheduling work. They will always have more information available than an ahead-of-time static schedule from the compiler, as so much is data dependent. If it wasn't worth it they would be slashing those capabilities instead.

Statically scheduled/in-order stuff is still relegated to pretty much microcontrollers, or specific numeric workloads. For general computation, it still seems like a poor fit.

reply
ajross
5 hours ago
[-]
That's true. But if anything that cuts in the opposite direction in this argument: modern CPUs are doing all that optimization in hardware, at runtime. Doing it in software, ahead of time, is a no-brainer in comparison.
reply
kimixa
2 days ago
[-]
Yes - VLIW seems to lend itself to computation-heavy code, used to this day in many DSP (and arguably GPU, or at least "influences" many GPU) architectures.
reply
tw04
2 days ago
[-]
Given that Itanium originated at HP, it seems unlikely it was about AMD, and more about the fact that, at the time, Intel was struggling with 64-bit. People are talking about the P4 but the Itanium architecture dates back to the late 80s…

https://en.m.wikipedia.org/wiki/Itanium

reply
mwpmaybe
1 day ago
[-]
For context, it was intended to be the successor to PA-RISC and compete with DEC Alpha.
reply
kouteiheika
2 days ago
[-]
> That's no guarantee it would succeed though - AMD64 also cleaned up a number of warts on the x86 architecture, like more registers.

As someone who works with AMD64 assembly very often - they didn't really clean it up all that much. Instruction encoding is still horrible, you still have a bunch of useless instructions even in 64-bit mode which waste valuable encoding space, you still have a bunch of instructions which hardcode registers for no good reason (e.g. the shift instructions have a hardcoded rcx). The list goes on. They pretty much did almost the minimal amount of work to make it 64-bit, but didn't actually go very far when it comes to making it a clean 64-bit ISA.
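A tiny sketch of the hardcoded-register point (illustrative only; the exact instruction sequence depends on the compiler, but a variable shift count has to end up in cl, the low byte of rcx):

    /* C sketch: a variable shift forces the count into CL on x86-64. */
    unsigned long shift_left(unsigned long value, unsigned int count) {
        /* Typical x86-64 codegen (roughly):
             mov rax, rdi   ; value arrives in rdi (System V ABI)
             mov ecx, esi   ; count gets shuffled into ecx/cl just to shift
             shl rax, cl    ; shl only takes a variable count from cl
             ret                                                          */
        return value << count;
    }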

I'd love to see what Intel came up with, but I'd be surprised if they did a worse job.

reply
p_l
2 days ago
[-]
Pentium 4 was widely speculated to be able to run 64-bit at the time AMD64 was delivered, but at half the speed.

Essentially, while decoding a 64-bit variant of the x86 ISA might have been fused off, there was a very visible part that was common anyway, and that was the available ALUs on the NetBurst platform - which IIRC were 2x 32-bit ALUs for integer ops. So you either issue a micro-op to both to "chain" them together, or run every 64-bit calculation in multiple steps.

reply
eigenform
2 days ago
[-]
Yeah, they wrote a paper about the ALUs too, see:

https://ctho.org/toread/forclass/18-722/logicfamilies/Delega...

> There are two distinct 32-bit FCLK execution data paths staggered by one clock to implement 64-bit operations.

If it weren't fused off, they probably would've supported 64-bit ops with an additional cycle of latency?

reply
p_l
1 day ago
[-]
At least one cycle, yes, but generally it would have made it possible to deliver. AFAIK it also became a crucial part of how Intel could deliver "EM64T" chips fast enough - only to forget to upgrade the memory subsystem, which is why the first generation couldn't run Windows (they retained 36-bit physical addressing from PAE when AMD64 mandates a minimum of 40, and Windows managed to trigger an issue with that).
reply
tuyiown
2 days ago
[-]
> first supported it were (generally) superior even in 32-bit mode.

They were also affordable dual cores, which wasn't the norm at all at the time.

reply
chasil
2 days ago
[-]
The times that I have used "gcc -S" on my code, I have never seen the additional registers used.

I understand that r8-r15 require a REX prefix, which is hostile to code density.

I've never done it with -O2. Maybe that would surprise me.

reply
astrange
2 days ago
[-]
You should be able to see it. REX prefixes cost a lot less than register spills do.

If you mean literally `gcc -S` with no optimization flag, the default -O0 is worse than "not optimized" and basically keeps everything in memory to make it easier to debug. -Os is the one with readable, sensible asm.

reply
chasil
2 days ago
[-]
Thanks, I'll give it a try.
reply
o11c
2 days ago
[-]
Obviously it depends on how many live variables there are at any point. A lot of nasty loops have relatively few non-memory operands involved, especially without inlining (though even without inlining, the ability to control ABI-mandated spills better will help).

But it's guaranteed to use `r8` and `r9` for the 5th and 6th integer arguments of a function (counting unpacked 128-bit structs as 2 arguments), or the 3rd and 4th arguments (not sure about unpacking) for Microsoft. And `r10` is used if you make a system call on Linux.
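A quick illustration of that (a sketch, assuming the System V x86-64 calling convention):

    /* Integer arguments go in rdi, rsi, rdx, rcx, r8, r9 in order, so the
       5th and 6th arguments land in r8/r9 regardless of how register-hungry
       the body is. */
    long pick_last_two(long a, long b, long c, long d, long e, long f) {
        return e + f;   /* roughly: lea rax, [r8+r9] ; ret */
    }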

reply
wat10000
2 days ago
[-]
I don't have gcc handy, but this bit of code pretty easily gets clang to use several of them:

    #include <stdio.h>

    int f(int **x) {
        int *a = x[0]; int *b = x[1]; int *c = x[2]; int *d = x[3];
        puts("hello");
        return *a + *b + *c + *d;
    }
reply
kstrauser
2 days ago
[-]
"If you don't cannibalize yourself, someone else will."

Intel has a strong history of completely mis-reading the market.

reply
zh3
2 days ago
[-]
Andy Grove, "Only the paranoid survive":-

Quote: Business success contains the seeds of its own destruction. Success breeds complacency. Complacency breeds failure. Only the paranoid survive.

- Andy Grove, former CEO of Intel

From wikipedia: https://en.wikipedia.org/wiki/Andrew_Grove#Only_the_Paranoid...

Takeaway: Be paranoid about MBAs running your business.

reply
zer00eyz
2 days ago
[-]
> Takeaway: Be paranoid about MBAs running your business.

Except Andy is talking about himself and Noyce, the engineers, getting it wrong: (watch a few minutes of this to get the gist of where they were vs Japan) https://www.youtube.com/watch?v=At3256ASxlA&t=465s

Intel has a long history of sucking, and other people stepping in to force them to get better. Their success has been accident and intervention over and over.

And this isn't just an Intel thing, this is kind of an American problem (and maybe a business/capitalism problem). See this take on steel: https://www.construction-physics.com/p/no-inventions-no-inno... that sounds an awful lot like what is happening to Intel now.

reply
II2II
2 days ago
[-]
> Intel has a long history of sucking, and other people stepping in to force them to get better. Their success has been accident and intervention over and over.

If one can take popular histories of Intel at face value, they have had enough accidental successes, avoided enough failures, and outright failed so many times that they really ought to know better.

The Itanium wasn't their first attempt to create an incompatible architecture, and it sounds like it was incredibly successful compared to the iAPX 432. Intel never intended to get into microprocessors, wanting to focus on memory instead. Yet they picked up a couple of contracts (which produced the 4004 and 8008) to survive until they reached their actual goal. Not only did it help the company at the time, but it proved essential to the survival of the company when the Japanese semiconductor industry nearly obliterated American memory manufacturers. On the flip side, the 8080 was source compatible with the 8008. Source compatibility would help sell it to users of the 8008. It sounds like the story behind the 8086 is similar, albeit with a twist: not only did it lead to Intel's success when it was adopted by IBM for the PC, but it was intended as a stopgap measure while the iAPX 432 was produced.

This, of course, is a much abbreviated list. It is also impossible to suggest where Intel would be if they made different decisions, since they produced an abundance of other products. We simply don't hear much about them because they were dwarfed by the 80x86 or simply didn't have the public profile of the 80x86 (for example: they produced some popular microcontrollers).

reply
asveikau
2 days ago
[-]
Windows NT also originally targeted a non-x86 CPU from Intel, the i860.
reply
p_l
2 days ago
[-]
i960 was essentially the iAPX 432 done right, in its full form. But the major client (the BiiN partnership with Siemens) ultimately didn't pan out, various world events quite possibly also impacted things, and finally Intel cannibalized the i960 team to make the Pentium Pro.
reply
wslh
2 days ago
[-]
Andy Grove explained this very clearly in his book. By the way, the parallel works if you replace Japan with China in the video. In the late 1970s and 1980s, Japan initially reverse engineered memory chips, and soon it became impossible to compete with them. The Japanese government also heavily subsidized its semiconductor industry during that period.

My point isn't to take a side, but simply to highlight how history often repeats itself, sometimes almost literally rather than merely rhyming.

reply
tjwebbnorfolk
2 days ago
[-]
> Their success has been accident and intervention over and over.

Of course, the whole foundational thesis of market competition is that everything sucks unless forced by competitors to make your product better. That's why it's VERY important to have effective competition.

It's not a capitalism problem, or really a "problem" at all. It's a recognition of a fact in nature that all animals are as lazy as they can get away with, and humans (and businesses made by humans) are no different.

reply
nextos
2 days ago
[-]
I don't think it's just mis-reading. It's also internal politics. How many at Nokia knew that the Maemo/MeeGo series was the future, rather than Symbian? I think quite a few. But Symbian execs fought to make sure Maemo didn't get a mobile radio. In most places, internal feuds and little kingdoms prevail over optimal decisions for the entire organization. I imagine lots of people at Intel were deeply invested in IA-64. Same thing repeats mostly everywhere. For example, from what I've heard from insiders, ChromeOS vs Android battles at Google were epic.
reply
immibis
2 days ago
[-]
In other words, all complex systems get cancer.

Cancer is when elements of a system work to enrich themselves instead of the system.

reply
cowmix
2 days ago
[-]
When I ran the Python Meetup here in Phoenix -- an engineer for Intel's compilers group would show up all the time. I remember he would constantly be frustrated that Intel management would purposely down-play and cripple advances of the Atom processor line because they thought it would be "too good" and cannibalize their desktop lines. This was over 15 years ago -- I was hearing this in real-time. He flat out said that Intel considered the mobile market a joke.
reply
sys_64738
1 day ago
[-]
They don't misread the market so much as intentionally do that due to INTC being a market driven org. They want to suck up all the profits in each generation for each SKU. They stopped being an engineering org in the 80s. I hope they crash and burn.
reply
userbinator
2 days ago
[-]
"Recently revealed" is more like a confirmation of what I had read many years before; and furthermore, that Intel's 64-bit x86 would've been more backwards-compatible and better-fitting than AMD64, which looks extremely inelegant in contrast, with several stupid missteps like https://www.pagetable.com/?p=1216 (the comment near the bottom is very interesting.)

If you look at the 286's 16-bit protected mode and then the 386's 32-bit extensions, they fit neatly into the "gaps" in the former; there are some similar gaps in the latter, which look like they had a future extension in mind. Perhaps that consideration was already there in the 80s when the 386 was being designed, but as usual, management got in the way.

reply
CheeseFromLidl
2 days ago
[-]

   would've been more backwards-compatible and better-fitting
Eagerly awaiting the first submission of someone decapping, forcing the fuse, capping and running it.
reply
Dylan16807
2 days ago
[-]
> (the comment near the bottom is very interesting.)

Segmentation very useful for virtualization? I don't follow that claim.

reply
userbinator
2 days ago
[-]
reply
Dylan16807
2 days ago
[-]
"The virtual machine monitor’s trap handler must reside in the guest’s address space, because an exception cannot switch address spaces."

I would call this the real problem, and segmentation a bad workaround.

reply
wmf
2 days ago
[-]
It wasn't recent; Yamhill has been known since 2002. A detailed article about this topic just came out: https://computerparkitecture.substack.com/p/the-long-mode-ch...
reply
Lu2025
2 days ago
[-]
> it would cannibalize IA64 sales

The concern wasn't really that it would cannibalize sales; it would cannibalize IA64 managers' jobs and status. "You ship the org chart"

reply
indymike
2 days ago
[-]
> Fun fact: Bob Colwell (chief architect of the Pentium Pro through Pentium 4) recently revealed that the Pentium 4 had its own 64-bit extension to x86 that would have beaten AMD64 to market by several years, but management forced him to disable it because they were worried that it would cannibalize IA64 sales.

File this one under "we made the right decision based on everything we knew at the time." It's really sad because the absolute right choice would have been to extend x86 and let it duke it out with Itanium. Intel would win either way and the competition would have been even more on the back foot. So easy to see that decades later...

reply
ChuckMcM
1 day ago
[-]
Yup. I went to the Microprocessor Forum where they introduced 'Sledgehammer' (the AMD64 architecture) and came back to NetApp, where I was working, and started working out how we'd build our next Filer using it (that was a journey given the AMD distrust inside of NetApp!). I had a pretty frank discussion with the Intel SVP of product, who was pretty bought into the Intel "high end is IA-64, mid/PC is IA32, embedded is the 8051 stuff" view. They were having a hard time getting Itanium wins.
reply
mathgradthrow
2 days ago
[-]
This seems like an object lesson in making sure that the right hand does not know what the left is doing. Yes, if you have two departments working on two mutually exclusive architectures, one of them will necessarily fail. In exchange, however, you can guarantee that it will be the worse one. This is undervalued as a principle since the wasted labor is more easily measured, and therefore decision making is biased towards it.
reply
short_sells_poo
2 days ago
[-]
I agree with you, but perhaps this is very hard (impossible?) to pull off. Invariably, politics will result in various outcomes being favored in management and the moment that groups realize the game is rigged, the whole fair market devolves into the usual political in-fighting.
reply
jcranmer
2 days ago
[-]
The story I heard (which I can't corroborate) was that it was Microsoft that nixed Intel's alternative 64-bit x86 ISA, instead telling it to implement AMD's version instead.
reply
smashed
2 days ago
[-]
Microsoft did port some versions of Windows to Itanium, so they did not reject it at first.

With poor market demand and AMD's success with amd64, Microsoft did not support Itanium in Vista and later desktop versions, which signaled the end of Intel's Itanium.

reply
Analemma_
2 days ago
[-]
Microsoft also ships/shipped a commercial compiler with tons of users, and so they were probably in a position to realize early that the hypothetical "sufficiently smart compiler" which Itanium needed to reach its potential wasn't actually possible.
reply
SunlitCat
2 days ago
[-]
I wonder if AI would have been a huge help to that.
reply
consp
2 days ago
[-]
Some "simple" optimization algorithm would be enough; modern "AI" just adds obfuscation. Though it would be slow as hell and thus unusable.
reply
wmf
2 days ago
[-]
Microsoft supported IA-64 (Itanium) and AMD64 but they refused to also support Yamhill. They didn't want to support three different ISAs.
reply
dooglius
2 days ago
[-]
What is/was Yamhill?
reply
cwizou
2 days ago
[-]
It was the name of Intel's x86 64bit flavor : https://www.edn.com/intel-working-on-yamhill-technology-says...
reply
antod
2 days ago
[-]
Yeah, I remember hearing that at the time too. When MS chose to support AMD64, they made it clear it was the only 64bit x86 ISA they were going to support, even though it was an open secret Intel was sitting on one but not wanting to announce it.
reply
alfiedotwtf
1 day ago
[-]
> cannibalize IA64 sales

Damn!

reply
h4ck_th3_pl4n3t
2 days ago
[-]
I wanted to mention that the Pentium 4 (Prescott) that was marketed as Centrino in laptops had 64-bit capabilities, but it was described as a 32-bit extended mode. I remember buying a laptop in 2005(?) which I first ran with 32-bit XP, then downloading the wrong Ubuntu 64-bit Dapper Drake image, finding the 64-bit kernel running... and being super confused about it.

Also, for a long while, Intel rebranded the Pentium 4 as Intel Atom, which then usually got an iGPU on top along with somewhat higher clock rates. No idea if this is still the case (post Haswell changes) but I was astonished to buy a CPU 10 years later and find the same kind of oldskool cores in it, just with some modifications, and actually with worse L3 cache than the Centrino variants.

core2duo and core2quad were peak coreboot hacking for me, because at the time the Intel ucode blob was still fairly simple and didn't contain all the quirks and errata fixes that more modern CPU generations have.

reply
kccqzy
2 days ago
[-]
In 2005 you could already buy Intel processors with AMD64. It just wasn't called AMD64 or Intel64; it was called EM64T. During that era running 64-bit Windows was rare but running 64-bit Linux was pretty commonplace, at least amongst my circle of friends. Some Linux distributions even had an installer that told the user they were about to install 32-bit Linux on a computer capable of running 64-bit Linux (perhaps YaST?).
reply
fy20
2 days ago
[-]
AMD was a no-brainer in the mid 2000s if you were running Linux. It was typically cheaper than Intel, had lower power consumption (= less heat, less fan noise), had 64-bit so you could use more memory, and dual-core support was more widespread. Linux was easily able to take advantage of all of these, whereas for Windows it was trickier.
reply
mjg59
2 days ago
[-]
Pentium 4 was never marketed as Centrino - that came in with the Pentium M, which was very definitely not 64-bit capable (and didn't even officially have PAE support to begin with). Atom was its own microarchitecture aimed at low power use cases, which Pentium 4 was definitely not.
reply
marmarama
2 days ago
[-]
Centrino was Intel's brand for their wireless networking and laptops that had their wireless chipsets, the CPUs of which were all P6-derived (Pentium M, Core Duo).

Possibly you meant Celeron?

Also the Pentium 4 uarch (Netburst) is nothing like any of the Atoms (big for the time out-of-order core vs. a small in-order core).

reply
SilverElfin
2 days ago
[-]
Speaking of marketing, that era of Intel was very weird for consumers. In the 1990s, they had iconic ads and words like Pentium or MMX became powerful branding for Intel. In the 2000s I think it got very confused. Centrino? Ultrabook? Atom? Then for some time there was Core. But it became hard to know what to care about and what was bizarre corporate speak. That was a failure of marketing. But maybe it was also an indication of a cultural problem at Intel.
reply
sys_64738
1 day ago
[-]
This is what happens when marketing gets involved. The worst of the worst being INTC's marketing dept.
reply
immibis
2 days ago
[-]
Core is confusing. Of course it's a Core 2. It has 2 cores in it. Core 2 Quad? Obviously has 2 cores... oh wait, 4. i3/i5/i7 was reasonable except for lacking the generation number so people thought a 6th gen i3 was slower than a 1st gen i7 because 3 is less than 7. Nvidia seems to have model numbers figured out. Higher number is better, first half is generation and second half is relative position within it. At least if they didn't keep unfairly shifting the second half.
reply
p_l
2 days ago
[-]
Very early Intel "EM64T" chips (aka amd64 compatible) had a physical address size of only 36 bits instead of 40, which is why 64-bit Windows didn't run on them, but some Linux versions did.

Rest is well explained by sibling posts :)

reply
cogman10
2 days ago
[-]
reply
seabrookmx
2 days ago
[-]
PAE is a 32-bit feature that was around long before AMD64. OP means EM64T: https://www.intel.com/content/www/us/en/support/articles/000...
reply
esseph
2 days ago
[-]
No, EM64T
reply
bigstrat2003
2 days ago
[-]
I remember at the time thinking it was really silly for Intel to release a 64-bit processor that broke compatibility, and was very glad AMD kept it. Years later I learned about kernel writing, and I now get why Intel tried to break with the old - the compatibility hacks piled up on x86 are truly awful. But ultimately, customers don't care about that, they just want their stuff to run.
reply
drewg123
2 days ago
[-]
It didn't help that Itanium was late, slow, and Intel/HP marketing used Itanium to kill off the various RISC CPUs, each of which had very loyal fans. This pissed off a lot of techies at the time.

I was a HUGE DEC Alpha fanboy at the time (I even helped port FreeBSD to DEC Alpha), so I hated Itanium with a passion. I'm sure people like me who were 64-bit MIPS and PA-RISC fanboys and fangirls also existed, and also lobbied against adoption of Itanic where they could.

I remember when amd64 appeared, and it just made so much sense.

reply
EasyMark
2 days ago
[-]
This. If Intel's compilers and architecture had been stellar and provided a 5x or 10x improvement, it would have caught on. However, no one in IT was fool enough to switch architectures over a 30-50% performance improvement that required switching hardware, compilers, and software, and then try to sell that to their bosses.
reply
kjs3
2 days ago
[-]
I dunno if you meant it this way, but I've heard waaaay too many people say things like this meaning "if Intel compiler guys didn't suck...". They didn't, and don't (Intel C and Fortran compilers are to this day excellent). The simple fact is no one has proven yet that anyone can write compilers good enough to give VLIW overwhelmingly compelling performance outside of niche uses (DSPs, for example). I remember the Multiflow and Cydrome guys giving the same "it's the compiler, stupid" spiel in the mid-80s, and the story hasn't changed much except the details. We bought a Multiflow Trace... it was really nice, for certain problems, but not order-of-magnitude-faster, change-the-world nice, which was how it was sold.

Now, to be clear, a lot of these folks and their ideas moved the state-of-the-art in compilers massively ahead, and are a big reason compilers are so good now. Really, really smart people worked this problem.

reply
acdha
2 days ago
[-]
I think Itanium would have had a better chance if Intel had made their compilers and optimized libraries free, but your larger point is really important: Intel’s performance numbers for the Itanium seem to have been broadly extrapolated from a few very FPU-intensive benchmarks and I don’t think it was ever realistic to expect anything like that level of performance for branchy business logic or to decisively change the price-performance ratio to become compelling. I worked with some scientists who did have a ton of floating point-heavy code, so they were definitely interested but their code also had a lot of non-linear memory access and the Itanium never managed a performance lead at all, much less one big enough that it wouldn’t have been cheaper and faster to buy 2-4 other servers for each Itanium box. In contrast, when AMD released the Opteron it decisively took the lead in absolute performance as well as price/performance and so we bought them by the rack.
reply
p_l
2 days ago
[-]
My understanding is that Itanium was also stupidly strict about its VLIW semantics, making the compiler's job even harder. Microsoft blogs had examples of some really funky bugs caused by it, too.

In comparison, Multiflow was not so bad.

reply
axiolite
2 days ago
[-]
> if intel's compilers and architecture had been stellar and provided a x5 or x10 improvement it would have caught on.

That sounds like DEC Alpha to me, yet Alpha didn't take over the world. "Proprietary architecture" is a bad word, not something you want to base your future on. Without the Intel/AMD competition, x86 wouldn't have dominated for all these years.

reply
cameldrv
2 days ago
[-]
The DEC Alpha actually did provide very good performance, and you could even run Windows NT on it. As far as I can tell, the biggest problem was just that Alpha systems were very expensive, and so they had a limited customer base. There were some quirks, but the main thing was just that you'd be paying $2000 for a PC and $10,000 for an Alpha based system, and most people didn't need the performance that badly.
reply
axiolite
23 hours ago
[-]
> Alpha systems were very expensive, and so they had a limited customer base.

That's the usual chicken & egg problem... If they sold more units, the prices would have come down. But people weren't buying many, because the prices were high.

Itanium, like Alpha, or any other alternative architecture, would also have trouble and get stuck in that circle. x86-64, being a very inexpensive add-on to x86, managed to avoid that.

reply
kjs3
2 days ago
[-]
PA-RISC fanboys and fangirls

Itanic wasn't exactly HP-PA v.3, but it was a kissing cousin. Most of the HP shops I worked with believed the rhetoric it was going to be a straightforward if not completely painless upgrade from the PA-8x00 gear they were currently using.

Not so much.

The MIPS 10k line on the other hand...sigh...what might have been.

I remember when amd64 appeared, and it just made so much sense.

And you were right.

reply
hawflakes
2 days ago
[-]
Did the PA-RISC shops run their old PA-RISC code with the Aries emulator?

One of the selling points for HP users was running old code via dynamic translation and x86 would just work on the hardware directly.

Another fun fact I remember from working at HP was that later PA-RISC chips were fabbed at Intel because the HP-Intel agreement had Intel fabbing a certain amount of chips and since Merced was running behind... Intel-fabbed PA-RISC chips!

https://community.hpe.com/t5/operating-system-hp-ux/parisc-p...

reply
p_l
2 days ago
[-]
The first generations of Itanium used the same bus and support chips as the last HP-PA parts, so a much simpler migration path was involved - some servers even allowed swapping HP-PA for Itanium without replacing most of the server (similar to the rare VAX 7000 and VAX 10000, which could have their CPU boards replaced with Alpha ones).
reply
antod
2 days ago
[-]
Wasn't much of the Athlon designed by laid-off DEC Alpha engineers that AMD snapped up? Makes sense that AMD64 makes sense to an Alpha fanboy :)
reply
kjs3
2 days ago
[-]
Yeah...look up Jim Keller. And AMD basically recycled the later Alpha system bus as the K7 bus to the extent there was very short lived buzz about having machines that could be either x86-64 or Alpha.
reply
Romario77
2 days ago
[-]
It wasn't just incompatibility, it was some of the design decisions that made it very hard to make performant code that runs well on Itanium.

Intel made a bet on parallel processing and compilers figuring out how to organize instructions instead of doing this in silicon. It proved to be very hard to do, so the supposedly next-gen processors turned out to be more expensive and slower than the last gen or the new AMD ones.

reply
cameldrv
2 days ago
[-]
Yeah the biggest idea was essentially to do the scheduling of instructions upfront in the compiler instead of dynamically at runtime. By doing this, you can save a ton of die area for control and put it into functional units doing math etc.

The problem as far as I can tell as a layman is that the compiler simply doesn't have enough information to do this job at compile time. The timing of the CPU is not deterministic in the real world because caches can miss unpredictably, even depending on what other processes are running at the same time on the computer. Branches also can be different depending on the data being processed. Branch predictors and prefetchers can optimize this at runtime using the actual statistics of what's happening in that particular execution of the program. Better compilers can do profile directed optimization, but it's still going to be optimized for the particular situation the CPU was in during the profile run(s).

If you think of a program like an interpreter running a tight loop in an interpreted program, a good branch predictor and prefetcher are probably going to be able to predict fairly well, but a statically scheduled CPU is in trouble because at the compile time of the interpreter, the compiler has no idea what program the interpreter is going to be running.
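A minimal sketch of that interpreter-loop point (hypothetical bytecode, just to illustrate why the branch pattern is invisible when the interpreter itself is compiled):

    #include <stddef.h>

    /* Which case runs next depends entirely on the bytecode being fed in, so a
       compiler scheduling this loop ahead of time cannot know the branch
       pattern; a runtime branch predictor can learn it for the program actually
       being interpreted. */
    long run(const unsigned char *code, size_t len) {
        long acc = 0;
        for (size_t pc = 0; pc < len; pc++) {
            switch (code[pc]) {        /* data-dependent (indirect) branch */
            case 0: acc += 1; break;
            case 1: acc -= 1; break;
            case 2: acc *= 2; break;
            default: return acc;       /* treat anything else as "halt" */
            }
        }
        return acc;
    }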

reply
wvenable
2 days ago
[-]
Intel might have been successful with the transition if they hadn't decided to go with such a radically different, real-world-untested architecture for Itanium.
reply
pixl97
2 days ago
[-]
Well, that, and Itanium was eyewateringly expensive while a standard PC was much cheaper at similar or faster speeds.
reply
Tsiklon
2 days ago
[-]
I think Itanium was a remarkable success in some other ways. Intel utterly destroyed the workstation market with it. HP-UX, IRIX, AIX, Solaris.

Itanium sounded the death knell for all of them.

The only Unix to survive with any market share is macOS (arguably because of its lateness to the party), and it has only relatively recently gone back to a more bespoke architecture.

reply
inkyoto
2 days ago
[-]
Looking back, I think we can now conclude that it was largely inevitable for the other designs to fade sooner or later – and that is what has happened.

The late 90's to the early aughts' race for highest-frequency, highest-performance CPUs exposed not a need for a CPU-only, highly specialised foundry, but a need for sustained access to the very front of process technology – continuous, multibillion-dollar investment and a steep learning curve. Pure-play foundries such as TSMC could justify that spend by aggregating huge, diverse demand across CPU's, GPU's and SoC's, whilst only a handful of integrated device manufacturers could fund it internally at scale.

The major RISC houses – DEC, MIPS, Sun, HP and IBM – had excellent designs, yet as they pushed performance they repeatedly ran into process-cadence and capital-intensity limits. Some owned fabs but struggled to keep them competitive; others outsourced and were constrained by partners’ roadmaps. One can trace the pattern in the moves of the era: DEC selling its fab, Sun relying on partners such as TI and later TSMC, HP shifting PA-RISC to external processes, and IBM standing out as an exception for a time before ultimately stepping away from leading-edge manufacturing as well.

A compounding factor was corporate portfolio focus. Conglomerates such as Motorola, TI and NEC ran diversified businesses and prioritised the segments where their fab economics worked best – often defence, embedded processors and DSP's – rather than pouring ever greater sums into low-volume, general-purpose RISC CPU's. IBM continued to innovate and POWER endured, but industry consolidation steadily reduced the number of independent RISC CPU houses.

In the end, x86 benefited from an integrated device manufacturer (i.e. Intel) with massive volume and a durable process lead, which set the cadence for the rest of the field. The outcome was less about the superiority of a CPU-only foundry and more about scale – continuous access to the leading node, paid for by either gigantic internal volume or a foundry model that spread the cost across many advanced products.

reply
jabl
2 days ago
[-]
Yes. AFAIU the cost of process R&D and building and running leading-edge fabs massively outweigh the cost of CPU architecture R&D. It's just a world of its own largely out the comfort zone of software people, hence we endlessly debate the merits of this or that ISA, or this or that microarchitecture, a bit like the drunkard searching for his keys under the streetlamp.

It's also interesting to note that back then the consensus was that you needed your own in-house fab with tight integration between the fab and CPU design teams to build the highest performance CPU's. Merchant fabs were seen as second-best options for those who didn't need the highest performance or couldn't afford their own in-house fab. Only later did the meteoric rise of TSMC to the top spot on the semiconductor food chain upend that notion.

reply
icedchai
2 days ago
[-]
I'd argue it was Linux (on x86) and the dot-com crash that destroyed the workstation market, not Itanium. The early 2000s was awash in used workstation gear, especially Sun. I've never seen anyone with an Itanium box.
reply
phire
2 days ago
[-]
While Linux helped, I'd argue the true factor is that x86 failed to die as projected.

The common attitude in the 80s and 90s was that legacy ISAs like 68k and x86 had no future. They had zero chance to keep up with the innovation of modern RISC designs. But not only did x86 keep up, it was actually outperforming many RISC ISAs.

The true factor is out-of-order execution. Some contemporary RISC designs were out-of-order too (especially Alpha, and PowerPC to a lesser extent), but both AMD and Intel were forced to go all-in on the concept in a desperate attempt to keep the legacy x86 ISA going.

Turns out large out-of-order designs were the correct path (mostly because OoO has the side effect of being able to reorder memory accesses and execute them in parallel), and AMD/Intel had a bit of a head start, a pre-existing customer base and plenty of revenue for R&D.

IMO, Itanium failed not because it was a bad design, but because it was on the wrong path. Itanium was an attempt to achieve roughly the same end goal as OoO, but with a completely in-order design, relying on static scheduling. It had massive amounts of complexity that let it re-order memory reads. In an alternative universe where OoO (aka dynamic scheduling) failed, Itanium might actually be a good design.

Anyway, by the early 2000s, there just wasn't much advantage to a RISC workstation (or RISC servers). x86 could keep up, was continuing to get faster and often cheaper. And there were massive advantages to having the same ISA across your servers, workstations and desktops.

reply
chasil
2 days ago
[-]
Bob Colwell mentions originally doing out of order design at Multiflow.

He was a key player in the Pentium Pro out of order implementation.

https://www.sigmicro.org/media/oralhistories/colwell.pdf

"We should also say that the 360/91 from IBM in the 1960s was also out of order, it was the first one and it was not academic, that was a real machine. Incidentally that is one of the reasons that we picked certain terms that we used for the insides of the P6, like the reservation station that came straight out of the 360/91."

Here is his Itanium commentary:

"Anyway this chip architect guy is standing up in front of this group promising the moon and stars. And I finally put my hand up and said I just could not see how you're proposing to get to those kind of performance levels. And he said well we've got a simulation, and I thought Ah, ok. That shut me up for a little bit, but then something occurred to me and I interrupted him again. I said, wait I am sorry to derail this meeting. But how would you use a simulator if you don't have a compiler? He said, well that's true we don't have a compiler yet, so I hand assembled my simulations. I asked "How did you do thousands of line of code that way?" He said “No, I did 30 lines of code”. Flabbergasted, I said, "You're predicting the entire future of this architecture on 30 lines of hand generated code?" [chuckle], I said it just like that, I did not mean to be insulting but I was just thunderstruck. Andy Grove piped up and said "we are not here right now to reconsider the future of this effort, so let’s move on"."

reply
phire
2 days ago
[-]
> Bob Colwell mentions originally doing out of order design at Multiflow.

Actually no, it was Metaflow [0] who was doing out-of-order. To quote Colwell:

"I think he lacked faith that the three of us could pull this off. So he contacted a group called Metaflow. Not to be confused with Multiflow, no connection."

"Metaflow was a San Diego group startup. They were trying to design an out of order microarchitecture for chips. Fred thought what the heck, we can just license theirs and remove lot of risk from our project. But we looked at them, we talked to their guys, we used their simulator for a while, but eventually we became convinced that there were some fundamental design decisions that Metaflow had made that we thought would ultimately limit what we could do with Intel silicon."

Multiflow, [1] where Colwell worked, has nothing to do with OoO; its design is actually way closer to Itanium. So close, in fact, that the Itanium project is arguably a direct descendant of Multiflow (HP licensed the technology, and hired Multiflow's founder, Josh Fisher). Colwell claims that Itanium's compiler is nothing more than the Multiflow compiler with large chunks rewritten for better performance.

[0] https://en.wikipedia.org/wiki/Metaflow_Technologies

[1] https://en.wikipedia.org/wiki/Multiflow

reply
chasil
2 days ago
[-]
I thoroughly acknowledge and enjoy your clarification.
reply
stevefan1999
2 days ago
[-]
> The true factor is out-of-order execution.

I'm pressing X: the doubt button.

I would argue that speculative execution/branch prediction and wider pipelines, both of which OoO largely benefited from, mattered more than OoO itself; OoO was not the sole factor. In fact I believe the improvement in semiconductor manufacturing process nodes contributed more to the IPC gain than OoO itself.

reply
phire
2 days ago
[-]
To be clear, when I (and most people) say OoO, I don't mean just the act of executing instructions out-of-order. I mean the whole modern paradigm of "complex branch predictors, controlling wide front-ends, feeding schedulers with wide back-ends and hundreds or even thousands of instructions in flight".

It's a little annoying that OoO is overloaded in this way. I have seen some people suggesting we should be calling these designs "Massively-Out-of-Order" or "Great-Big-Out-of-Order" in order to be more specific, but that terminology isn't in common use.

And yes, there are some designs out there which are technically out-of-order, but don't count as MOoO/GBOoO. The early PowerPC cores come to mind.

It's not that executing instructions out-of-order benefits from complex branch prediction and wide execution units; OoO is what made it viable to start using wide execution units and complex branch prediction in the first place.

A simple in-order core simply can't extract that much parallelism; the benefits drop off quickly after two-wide superscalar. And accurate branch prediction is of limited usefulness when the pipeline is that short.

There are really only two ways to extract more parallelism. You either do complex out-of-order scheduling (aka dynamic scheduling), or you take the VLIW approach and try to solve it with static scheduling, like the Itanium. They really are just two sides of the same "I want a wide core" coin.

And we all know how badly the Itanium failed.

reply
stevefan1999
2 days ago
[-]
> I mean the whole modern paradigm of "complex branch predictors, controlling wide front-ends, feeding schedulers with wide back-ends and hundreds or even thousands of instructions in flight".

Ah, the philosophy of having CPU execution be out of order, you mean.

> A simple in-order core simply can't extract that much parallelism

While yes, it is also noticeable that it has no data hazards because a pipeline simply doesn't exist at all, and thus there is no need for implicit pipeline bubbles or delay slots.

> And accurate branch prediction is of limited usefulness when the pipeline is that short.

You can also use a software virtual machine to turn an out-of-order CPU into something that basically runs in-order code, and you can see how slow that goes. That's why JIT VMs such as HotSpot and GraalVM for the JVM platform, RyuJIT for CoreCLR, and TurboFan for V8 are so much faster: once you compile to native instructions, the branch predictor can finally kick in.

> like the Itanium > And we all know how badly the Itanium failed.

Itanium is not exactly VLIW. It is an EPIC [^1] fail though.

[1]: https://en.wikipedia.org/wiki/Explicitly_parallel_instructio...

reply
tyingq
2 days ago
[-]
I think the idea there is that it's less direct. Intel's lack of interest in a 64-bit x86 spawned AMD x64. The failure of Itanium then let that Linux/AMD x64 kill off the workstation market, and the larger RISC/Unix market. Linux on 32 bit X86 or 64 bit RISC alone was making some headway there, but the Linux/x64 combo is what enabled the full kill off.
reply
p_l
2 days ago
[-]
Intel's lack of interest in delivering 64-bit for "peons" running x86 was also part of it - I remember when the first discussions of amd64 showed up in popular computer magazines, Intel's proposed timeline was discussed too, and it very much indicated a wish to push "buy our super expensive stuff" and to squeeze money.

Meanwhile the decision to keep Itanium in the expensive but lower-volume market meant that there simply wasn't much market growth, especially once the non-technical part of killing off the other RISCs failed. Ultimately Itanium was left as the recommended way in some markets to run Oracle databases (due to the partnership between Oracle and HP) and not much else, while shops that used other RISC platforms either migrated to AMD64 or moved to other RISC platforms (even forcing HP to resurrect Alpha for one last gen).

reply
kjs3
2 days ago
[-]
Yup. I had a front row seat. So many discussions with startups in the 2000s boiled down to "we can get a Sun/HP/DEC machine, or we can get 4-5 nice Wintel boxes running Linux for the same price". So at the point where everyone figured out Linux was a 'good enough' Unix for dev work and porting to the incumbents was a reasonable prospect, it was "so do we all want to share one machine or go find 500% more funding just to have the marquee brand". Once you made that leap, "we don't need the incumbents" became inevitable.
reply
icedchai
2 days ago
[-]
It was amazing how fast that happened. I remember one startup that mainly supported Sun in the late 90's/early 2000's. This was for a so-called "enterprise" app that would run on-prem. They wanted me to move the app to Linux (Red Hat, I think?) so they could take it to a trade show booth without reliable Internet access. It was a pretty simple port.
reply
cameldrv
2 days ago
[-]
If you're counting all desktop/server computers, Linux has way more market share than all of the Unices ever did. It's probably even true for desktop Linux. If you count mobile phones, Android is a Linux derivative, and iOS is a BSD derivative. The fundamental issue for the workstation vendors was simply that with the P6, Intel was near parity or even ahead of the workstation vendors in performance, and it cost something like 1/4 as much.
reply
seabrookmx
2 days ago
[-]
HP-UX was one of the most popular operating systems to run on Itanium though?
reply
icedchai
2 days ago
[-]
HP was also one of the few companies to actually sell Itanium systems! They were also the last to stop selling them. They ported both OpenVMS and HP-UX to Itanium.
reply
sillywalk
2 days ago
[-]
HP also ported NonStop to Itanium.
reply
tyingq
2 days ago
[-]
Well, largely because they made it difficult for customers to stay on PA-RISC, then later, because their competitors were dying off...and if you were in the market for stodgy RISC/Unix there weren't many other choices.
reply
icedchai
2 days ago
[-]
As for RISC/Unix, in the enterprise, IBM's POWER/AIX is still around. I know some die hard IBM shops still using it.

I guess Oracle / Sun sparc is also still hanging on. I haven't seen a Sun shop since the early 2000's...

reply
kjs3
2 days ago
[-]
There's still a lot of AIX around and the LoB is seeing revenue growth. You just don't hear about it on HN because it's mostly doing mundane, mission critical stuff buried in large orgs.

I still run into a number of Solaris/SPARC shops, but even the most die hard of them are actively looking for the off-ramp. The writing is on that wall.

reply
icedchai
2 days ago
[-]
I believe it! For a few years, I worked on fairly large system deployed to an AIX environment. The hardware and software were both rock solid. While I haven't used it, the performance of the newer POWER stuff looks incredible.
reply
p_l
2 days ago
[-]
Oracle sales would push you towards HP-UX on Itanium as recommended platform.

To the point that once that ended with Oracle's purchase of Sun, there was a lawsuit between Oracle and HP. And a lot of angry customers, as HP-UX was being pushed right up to the moment of the acquisition announcement.

reply
bluedino
2 days ago
[-]
That's what we ran. The core system was written in PICK BASIC in the 80's and it just kept going on and on. I was buying HP Integrity (Itanium line) spare parts on eBay up until about 10 years ago.
reply
cryptonector
2 days ago
[-]
Absolutely not. Sun destroyed itself and Solaris, not Intel. The others were even more also-rans than Solaris.
reply
icedchai
2 days ago
[-]
If Sun had been more liberal with Solaris licensing on x86 in the early years (before, say, 2000), we might all be running Solaris servers today. Sun / Solaris was the Unix for most of the 90's through the dot-com crash.

Almost all early startups I worked with were Sun / Solaris shops. All the early ISPs I worked with had Sun boxes for their customer shell accounts and web hosts. They put the "dot in dot-com", after all...

reply
kronicum2025
2 days ago
[-]
And such a terrible architecture for the time.
reply
zokier
2 days ago
[-]
It is worth noting that at the turn of the century x86 wasn't yet so utterly dominant. Alphas, PowerPC, MIPS, SPARC and whatnot were still very much a thing. So that is part of why running x86 software was not as high a priority, and maybe compatibility with PA-RISC would even have been a higher priority.
reply
Spooky23
2 days ago
[-]
The writing was on the wall once Linux was a thing. I did a lot of solution design in that period. The only times there were good business cases in my world for not-x86 were scenarios where DBAs and some vertical software required Sun, and occasionally AIX or HP-UX for license optimization or some weird mainframe finance scheme.

The cost structure was just bonkers. I replaced a big file server environment that was like $2M of Sun gear with like $600k of HP Proliant.

reply
michaelt
2 days ago
[-]
And by ~2000 there were also increasingly viable x86 offerings in CAD, 3D and video editing.

You had AutoCAD, you had 3D Studio Max, you had After Effects, you had Adobe Premiere. And it was solid stuff - maybe not best-in-class, but good enough, and the price was right.

reply
rollcat
2 days ago
[-]
> The writing was on the wall once Linux was a thing.

Linux didn't "win" nearly as much as x86 did by becoming "good enough" - Linux just happened to be around to capitalize on that victory.

The writing on the wall was the decreasing prices and increasing capability of consumer-grade hardware. Then the real game-changer followed: horizontal scalability.

reply
tliltocatl
2 days ago
[-]
Well, according to some, IA-64 was a planned flop with the whole purpose of undermining HP's supercomputer division.
reply
cogman10
2 days ago
[-]
Nah, HP made bank on their Superdome computers even though they had very few clients. People paid through the nose for those. I worked on IA-64 stuff in 2011, long after I thought it was dead :D.

The real thing that killed the division is Oracle announcing that they would no longer support IA-64. It just so happened that like 90% of the clients using Itanium were using it for oracle DBs.

But by that point HP was already trying to get people to transition to more traditional x86 servers that they were selling.

reply
hawflakes
2 days ago
[-]
The hardware folks at HP were big into the outdoors. The story went that it was named Halfdome but customers outside the US who weren't familiar with Yosemite would ask where the other half was.

https://en.wikipedia.org/wiki/Half_Dome

reply
unethical_ban
2 days ago
[-]
Is that true in 2000, especially as consumer PCs ramped up?
reply
irusensei
2 days ago
[-]
I heard AMD64 uses some DEC Alpha tricks. Guess the DEC engineers did have the last laugh against Intel in the end.
reply
wicket
2 days ago
[-]
A couple of details missing from the article:

- Intel quietly introduced their implementation of amd64 under the name "EM64T". It was only later that they used the name "Intel64".

- Early Itanium processors included hardware features, microcode and software that implemented an IA‑32 Execution Layer (dynamic binary translation plus microcode assists) to run 32‑bit x86 code; while the EL often ran faster than direct software emulation, it typically lagged native x86 performance and could be worse than highly‑optimised emulators for some workloads or early processor steppings.

reply
p_l
2 days ago
[-]
The EL was considered so bad that either Microsoft or HP speed-ran an emulator implementation of their own, which enabled the HP-designed Itanium 2 to omit it.
reply
gbraad
2 days ago
[-]
The article never mentions the release of the x86_64 'emulator' by AMD to prepare and test your 64-bit development, or even the Opteron. Feels like it's more a story of how the author perceived it than an actual timeline.

Edit: Looked it up, it is called AMD SimNow! Originally released in 2000. I clearly remember www.x86-64.org existed for this

reply
gethly
2 days ago
[-]
Athlon was my second computer (CPU) after an i486. I think the core was the K7 architecture and it had a 700MHz clock, IIRC. I remember Athlon/AMD being much cheaper than Intel, and it felt very exotic even thinking about it, as Intel was EVERYWHERE (it was THE computer - "intel inside") and getting AMD was quite literally a question of whether I'd even be able to install Windows and run normal programs (we really didn't know back then). I think I had another AMD after that in a desktop (1.4GHz, dual core... IIRC), then Intel in a laptop, and now AMD again in a laptop. Will probably stick with AMD for the future as well.
reply
m101
2 days ago
[-]
I remember the days of cpu clock speed being displayed on the outside of the computer case using an led display. There was also this turbo button but I'm not sure whether that really did anything.
reply
fredoralive
2 days ago
[-]
Generally, Turbo toggled some form of slow mode. Back in the XT era, enough software relied on the original 4.77MHz CPU clock of the PC and XT that faster Turbo XT clones running at 8MHz would have a switch to slow things back down. It persisted for a while into the early ‘90s as a way to deal with software that expects a slower CPU, although later implementations may not slow things down all the way to an XT’s speed.
reply
zerocrates
2 days ago
[-]
I was one of those weird users who used the 64-bit version of Windows XP, with what I'm pretty sure was an Athlon 64 X2, both the first 64-bit chip and first dual-core one that I had.
reply
speed_spread
2 days ago
[-]
XP64 shared a lot with Windows Server 2003. Perhaps the best Windows ever released.
reply
seabrookmx
2 days ago
[-]
Did 2003 have symlinks?

7 and 2008R2 were pretty good too. All downhill from there..

reply
jborean93
2 days ago
[-]
It had junction points and hard links but symbolic links were added in Vista/Server 2008.
reply
chasil
2 days ago
[-]
This seems odd, as there was a POSIX layer in Windows from the beginning, and I can't see how it could do without symbolic links.

https://en.wikipedia.org/wiki/Microsoft_POSIX_subsystem

reply
jborean93
2 days ago
[-]
No idea if the POSIX subsystem used NTFS or some other filesystem, but if it was NTFS it probably just used the same reparse data buffer. It's just that Windows only added a symlink buffer structure in Vista/2008. You can manually use the same data buffer in older Windows versions; Windows just won't know what to do with it, just like all the other reparse data structures it doesn't recognize.
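
For what it's worth, here's a minimal sketch of what "the same reparse data buffer" means in practice (assumptions: a modern Windows SDK and an existing symlink or junction to point it at; this illustrates the mechanism, not how the old POSIX subsystem actually implemented it). Symlinks and junctions are both just reparse points distinguished by their tag:

```

/* reparse_tag.c - print the reparse tag of a path, showing that symlinks
 * and junctions are both NTFS reparse points with different tag values.
 * Hypothetical example; error handling kept minimal. */
#include <windows.h>
#include <winioctl.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    if (argc < 2) { fprintf(stderr, "usage: %s <path>\n", argv[0]); return 1; }

    /* FILE_FLAG_OPEN_REPARSE_POINT opens the link itself, not its target. */
    HANDLE h = CreateFileA(argv[1], 0,
                           FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
                           NULL, OPEN_EXISTING,
                           FILE_FLAG_OPEN_REPARSE_POINT | FILE_FLAG_BACKUP_SEMANTICS,
                           NULL);
    if (h == INVALID_HANDLE_VALUE) { fprintf(stderr, "open failed\n"); return 1; }

    DWORD buf[16 * 1024 / sizeof(DWORD)];   /* 16 KB max reparse buffer */
    DWORD got = 0;
    if (!DeviceIoControl(h, FSCTL_GET_REPARSE_POINT, NULL, 0,
                         buf, sizeof(buf), &got, NULL)) {
        fprintf(stderr, "not a reparse point\n");
        CloseHandle(h);
        return 1;
    }

    /* The first DWORD of a REPARSE_DATA_BUFFER is the tag. */
    DWORD tag = buf[0];
    if (tag == IO_REPARSE_TAG_SYMLINK)            /* added in Vista/2008 */
        printf("symlink reparse point\n");
    else if (tag == IO_REPARSE_TAG_MOUNT_POINT)   /* junctions, pre-Vista */
        printf("junction / mount point reparse point\n");
    else
        printf("other reparse tag: 0x%08lx\n", (unsigned long)tag);

    CloseHandle(h);
    return 0;
}

```

(`fsutil reparsepoint query <path>` shows the same information from the command line.)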
reply
chasil
2 days ago
[-]
So the "reparse data buffer" would be able to implement symlink() as a POSIX system call?

https://pubs.opengroup.org/onlinepubs/009695299/functions/sy...

reply
jborean93
2 days ago
[-]
The subsystem in question would be the one to handle the logic for the syscall. So the POSIX subsystem would use the reparse data buffer as needed. It's just that the Win32 subsystem added its own symlink one in Vista/2008.

This is all a guess; the POSIX subsystems were a bit before my time and I've never actually used them. I just know how symlinks work on Windows/NTFS and when they were added.

reply
htgb
2 days ago
[-]
Me too! It was funny how little love it got given how well it worked.

The only issues I came across were artificial blocks. Some programs would check the OS version and give an error just because. Even MSN Messenger (also by Microsoft) refused to install by default; I had to patch the MSI somehow to install it anyway. And then it ran without issues, once installed.

reply
bunabhucan
2 days ago
[-]
We tried Windows 2000 Professional for the DEC Alpha for a GIS system in the late 90s. Suddenly made the $5000 PCs that could run it seem cheap.
reply
chasil
2 days ago
[-]
Unfortunately, NT for Alpha only ran in a 32-bit address space.

"The 64-bit versions of Windows NT were originally intended to run on Itanium and DEC Alpha; the latter was used internally at Microsoft during early development of 64-bit Windows. This continued for some time after Microsoft publicly announced that it was cancelling plans to ship 64-bit Windows for Alpha. Because of this, Alpha versions of Windows NT are 32-bit only."

https://en.wikipedia.org/wiki/Windows_NT#64-bit_platforms

reply
antod
2 days ago
[-]
Alpha support was removed in one of the later NT5 betas right? Makes sense that it would've been late 90s then, before it was renamed Windows 2000 for release.
reply
thequux
2 days ago
[-]
Years ago, I had a CD marked "Windows 2000 for Alpha RC1", which suggests that it was cancelled quite late in the release cycle.
reply
p_l
2 days ago
[-]
It was cancelled essentially overnight by Compaq higher-ups; the teams at Microsoft and Compaq learnt of it when they came into the office. Alpha was present in the last release candidate before RTM, because it was essentially the only 64-bit platform available for fixing the 32-bit assumptions that prevented a 64-bit address space in earlier NT releases.
reply
ndesaulniers
2 days ago
[-]
Yeah, I remember Windows XP 64-bit, running on my Pentium D (first Intel dual cores, IIRC).
reply
mrweasel
2 days ago
[-]
It's probably important to note that the AMD64 platform isn't what got Intel into its current situation. After adopting AMD64, Intel once again dominated AMD, and the Bulldozer/Piledriver/Excavator series of AMD processors were not doing well in the competition with Intel.

With Zen, AMD once again turned the tables on Intel, but not enough to break Intel. Intel's downfall seems entirely self-inflicted, due to a series of bad business decisions and sub-par product releases.

reply
nayuki
2 days ago
[-]
Yeah. The article tells a good story and I agree with it. I even bought an Athlon 64 CPU back in ~2004.

What I want to add to the story is that when Intel Core 2 came out (and it was an x86-64 chip), it absolutely crushed AMD's Athlon 64 processors. It won so hard that, more or less, the lowest spec Core 2 CPU was faster than the highest spec Athlon 64 CPU. (To confirm this, you can look up benchmark articles around the year 2006, such as those from Tom's Hardware Guide.) Needless to say, my next computer in 2008 was a Core 2 Quad, and it was indeed much faster than my Athlon 64.

The Core 2 and all its sequels were how Intel dominated over AMD for about a decade until AMD Zen came along.

reply
Sponge5
2 days ago
[-]
My takeaway from the article is that Itanium could have been the equivalent of Apple's switch to M1 if Intel doubled down instead of panicking.
reply
deaddodo
2 days ago
[-]
Nitpick: The author states that removal of 16-bit in Windows 64 was a design decision and not a technical one. That’s not quite true.

When AMD64 is in one of the 64-bit modes, long mode (true 64-bit) or compatibility mode (64-bit with 32-bit compatibility), you can not execute 16-bit code. There are tricks to make it happen, but they all require switching the CPU mode, which is insecure and can cause problems in complex execution environments (such as an OS).

If Microsoft (or Linux, Apple, etc) wanted to support 16-bit code in their 64-bit OSes, they would have had to create an emulator+VM (such as OTVDM/WineVDM) or make costly hacks to the OS.

reply
jcranmer
2 days ago
[-]
I've written code to call 16-bit code from 64-bit code that works on Linux (because that's the only OS where I know the syscall to modify the LDT).

It's actually no harder to call 16-bit code from 64-bit code than it is to call 32-bit code from 64-bit code... you just need to do a far return (the reverse direction is harder because of stack alignment issues). The main difference between 32-bit and 16-bit is that OS's support 32-bit code by having a GDT entry for 32-bit code, whereas you have to go and support an LDT to do 16-bit code, and from what I can tell, Windows decided to drop support for LDTs with the move to 64-bit.

The other difficulty (if I've got my details correct) is that returning from an interrupt into 16-bit code is extremely difficult to do correctly and atomically, in a way that isn't a problem for 32-bit or 64-bit code.
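
A rough sketch of the LDT half of this (a minimal example, assuming x86-64 Linux with glibc and a kernel new enough to have espfix64 so 16-bit LDT segments are permitted; the 16-bit payload and the far-call/far-return trampoline are deliberately left out):

```

/* ldt16.c - install a 16-bit code segment descriptor into the LDT via
 * modify_ldt(2), the Linux syscall referred to above. Hypothetical sketch;
 * actually jumping into the segment would need a far call written in asm. */
#define _GNU_SOURCE
#include <asm/ldt.h>        /* struct user_desc, MODIFY_LDT_CONTENTS_CODE */
#include <sys/syscall.h>
#include <sys/mman.h>
#include <unistd.h>
#include <stdio.h>

int main(void)
{
    /* Map a page low enough that a segment base + 16-bit offset can reach it. */
    void *code = mmap(NULL, 4096, PROT_READ | PROT_WRITE | PROT_EXEC,
                      MAP_PRIVATE | MAP_ANONYMOUS | MAP_32BIT, -1, 0);
    if (code == MAP_FAILED) { perror("mmap"); return 1; }

    struct user_desc ud = {
        .entry_number    = 0,                        /* LDT slot 0 */
        .base_addr       = (unsigned int)(unsigned long)code,
        .limit           = 0xffff,                   /* 64 KiB segment */
        .seg_32bit       = 0,                        /* 16-bit code segment */
        .contents        = MODIFY_LDT_CONTENTS_CODE,
        .read_exec_only  = 0,
        .limit_in_pages  = 0,
        .seg_not_present = 0,
        .useable         = 1,
    };

    if (syscall(SYS_modify_ldt, 1, &ud, sizeof(ud)) != 0) {
        perror("modify_ldt");
        return 1;
    }

    /* Selector = (index << 3) | table indicator (LDT = 4) | RPL 3. */
    unsigned short sel = (0 << 3) | 4 | 3;
    printf("installed 16-bit code selector 0x%x with base %p;\n"
           "a far call through it would execute in 16-bit mode\n", sel, code);
    return 0;
}

```

Wine's Win16 support does essentially this, plus the far-call trampoline and emulation of the Win16 system DLLs.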

reply
deaddodo
2 days ago
[-]
Executing 16-bit code in Compatibility Mode (not Long Mode) is possible, that's not the problem. The problem is lack of V86 allowing legacy code to run. So Real Mode code is out wholesale (a sizable chunk of legacy software) and segmented memory is out in Protected Mode (nearly the totality of remaining 16-bit code).

So yes, you can write/run 16-bit code in 64-bit Compatibility Mode. You can't execute existing 16-bit software in 64-bit Compatibility Mode. The former is a neat trick, the latter is what people actually expect "16-bit compatibility" to mean.

reply
jcranmer
2 days ago
[-]
> segmented memory is out in Protected Mode (nearly the totality of remaining 16-bit code).

No, segmented memory is exactly what you can get working. You set up the segments via the LDT, which is still supported even in 64-bit mode; this is how Wine is able to execute Win16 code on 64-bit Linux. (Reading Wine code is how I figured out how to execute 16-bit code from 64-bit code in the first place!)

What doesn't work, if my memory serves me correctly, is all the call gate and task gate stuff. Which is effectively building blocks for an OS kernel that everyone tossed out in the early 90s and instead went with kernel-mode and user-mode with the syscalls (first software interrupts and then the actual syscall instruction in x86-64). You don't need any of that stuff to run most 16-bit code, you just need to emulate the standard Windows DLLs like kernel, ntdll, and user.

reply
deaddodo
2 days ago
[-]
Neither the AMD nor Intel TRMs agree with you. Both confirm that, even with LDTs, segments will not function with the legacy 16-bit wraparound; nor can you run 16-bit code segments. Fairly critical for most 16-bit software. Not to mention another critical incompatibility issue (for some software) that you yourself pointed out: far pointers.

And again, that only covers protected mode software, it doesn’t even touch the sheer cliff that is Real Mode (gating issues, for instance).

You wrote 16-bit code with knowledge of the limits imposed by Long Mode. Congrats. Too bad none of the thousands of pieces of software written in the 80s and 90s had that hindsight, and so didn't. The conversation is about running legacy code, not your/bespoke code.

reply
jcranmer
2 days ago
[-]
I'm successfully running code that was written 10 years before x86-64 was invented on Linux x86-64. You can, too--just find some Win16 software online somewhere and run it under wine, and confirm that it's running without any hardware emulation mode enabled.

And yes, this code is using far pointers.

Do you need me to post my code that loads and executes a NE executable to believe that it's possible?

FWIW, here's what the Intel manual says about running 16-bit code in IA-32e mode:

> In IA-32e mode, the processor supports two sub-modes: compatibility mode and 64-bit mode. 64-bit mode provides 64-bit linear addressing and support for physical address space larger than 64 GBytes. Compatibility mode allows most legacy protected-mode applications to run unchanged.

> In IA-32e mode of Intel 64 architecture, the effects of segmentation depend on whether the processor is running in compatibility mode or 64-bit mode. In compatibility mode, segmentation functions just as it does using legacy 16-bit or 32-bit protected mode semantics.

Those don't sound to me like statements saying that there's no way to get 16-bit legacy applications running on 64-bit mode. Quite the contrary, they're saying that you should expect them to work largely the same.

What does the Intel manual say is actually broken in compatibility mode? This:

> Compatibility mode permits most legacy 16-bit and 32-bit applications to run without re-compilation under a 64-bit operating system. [...] Compatibility mode also supports all of the privilege levels that are supported in 64-bit and protected modes. Legacy applications that run in Virtual 8086 mode or use hardware task management will not work in this mode.

reply
deaddodo
2 days ago
[-]
Funny you gloss over the most important part:

> Legacy applications that run in Virtual 8086 mode or use hardware task management will not work in this mode.

That being said, this isn't worth arguing over. If you can provably run late-1980s/early-1990s 16-bit code in AMD64 compatibility mode, with full execution protections and support for most if not all commercial software, that goes against the general understanding of those architectures. Document it and add it to the academic sphere to expand the knowledge.

reply
jcranmer
2 days ago
[-]
I can see that DOS applications might require Virtual 8086 mode, but from what I can tell, almost no Win16 applications require it nor require hardware task management (I'm not even sure I'm aware of any application that uses hardware task management!).

Granted, Win16 is not an especially long period of active application development, but I would expect that the vast majority of Win16 applications would work perfectly fine in x86-64 compatibility mode. I know that the ones I have played with do.

reply
Agingcoder
1 day ago
[-]
I love your answer.

It’s honest, and actually typical of this kind of problem space where what’s possible exceeds what was expected.

It reminds me of what coders did with older Atari ST machines to achieve overscan and a whole bunch of other tricks. The manuals explicitly stated that some things were not possible or would cause machine resets … except that there was a way.

reply
Animats
2 days ago
[-]
It's not so much running 16 bit code, but running something that wants to run on bare metal, i.e. DOS programs that access hardware directly. Maintaining the DOS virtualization box well into the 21st century probably wasn't worth it.

> The 64-bit builds of Windows weren’t available immediately.

There was a year or so between the release of AMD-64 and the first shipping Microsoft OS that supported it.[1] It was rumored that Intel didn't want Microsoft to support AMD-64 until Intel had compatible hardware. Anyone know? Meanwhile, Linux for AMD-64 was shipping, which meant Linux was getting more market share in data centers.[1]

reply
EvanAnderson
2 days ago
[-]
Microsoft has just such an emulator. Via Windows source code leaks the NTVDM (Virtual DOS Machine) from 32-bit Windows versions has been built for 64-bit Windows targets[0].

I don't understand why Microsoft chose to kill it. That's not in their character re: backwards compatibility.

[0] https://github.com/leecher1337/ntvdmx64

Edit: Some nice discussion about the NTVDMx64 when it was released: https://www.vogons.org/viewtopic.php?t=48443

reply
deaddodo
2 days ago
[-]
NTVDM requires Virtual 8086 mode in the processor. This doesn't exist in the 64-bit modes, requiring a software emulator. That is why OTVDM/WineVDM exist.

You can see all of this explained in the README for the very project you linked:

```

How does it work?

=================

I never thought that it would be possible at all, as NTVDM on Win32 uses V86 mode of the CPU for fast code execution which isn't available in x64 long mode. However I stumbled upon the leaked Windows NT 4 sourcecode and the guys from OpenNT not only released the source but also patched it and included all required build tools so that it can be compiled without installing anything but their installation package. The code was a pure goldmine and I was curious how the NTVDM works.

It seems that Microsoft bought the SoftPC solution from Insignia, a company that specialised in DOS-Emulators for UNIX-Systems. I found out that it also existed on MIPS, PPC and ALPHA Builds of Windows NT 4 which obviously don't have a V86 mode available like Intel x86 has. It turned out that Insignia shipped SoftPC with a complete emulated C-CPU which also got used by Microsoft for MIPS, PPC and ALPHA-Builds.

```

As to why they didn't continue with that solution: presumably because they didn't want to rely on SoftPC anymore, or take on development themselves, for a minuscule portion of users who would probably just use 32-bit Windows anyway.

reply
EvanAnderson
2 days ago
[-]
Yeah. Like I said, Microsoft had the emulator. NTVDM on x64 is handled just like MIPS or Alpha, by using the SoftPC emulator. It's just a new CPU architecture.

They had a proven and tested emulator yet they chose not to build it for the new x64 CPU architecture. It turns out that it wasn't too hard to build for the new architecture either. That's the crux of my confusion.

It's not like SoftPC was new and unproven code. It doesn't feel like it would have been a major endeavor to keep supporting it.

Obviously, I don't know what Microsoft's telemetry told them re: the number of 16-bit application users. I know it impacted a number of my Customers (some of whom are running DOSBox today to keep old fit-for-purpose software working) and I don't support a ton of offices or people.

It seems out of character for Microsoft to make their Customers throw away software.

reply
deaddodo
2 days ago
[-]
Cool, if you read to the last paragraph you would see that I also addressed that point.

They weren't making them throw away software. Even at the inception of 64-bit Windows, 16-bit software made up a fraction of a percent of use cases. They continued to support 32-bit Windows for almost two decades afterwards for people who needed 16-bit software, at which point it was a fraction of a fraction of a fraction of users.

Of course it was dropped...30 years later.

reply
cesarb
2 days ago
[-]
> I don't understand why Microsoft chose to kill it.

My personal suspicion: it's about handles.

Several kinds of objects in the Windows API are identified by global handles (for instance, HWND for a window), and on 16-bit Windows, these handles are limited to 16 bits (though I vaguely recall reading somewhere that they're actually limited to 15 bits). Not having the possibility of a 16-bit Windows process would allow them to increase the global limit on the number of handles (keeping in mind that controls like buttons are actually nested windows, so it's not just one window handle for each top-level window).

reply
ack_complete
2 days ago
[-]
No need for suspicion, the documentation confirms this was a factor:

https://learn.microsoft.com/en-us/windows/win32/winprog64/ru...

reply
aljgz
2 days ago
[-]
You go to a small shop recommended by a friend, and he convinces you to get AMD despite Intel still being the reigning default. You get it home, do a little research, and realize the CPU has the best performance per price of the recent CPUs. Now you know you trusted the right person.
reply
_blk
2 days ago
[-]
Good article. I remember being very skeptical of the Athlon because the K6 I owned before was subjectively much less stable than any Intel I had used until then, so I felt it was only a question of time until IA64 would establish itself - since, after all, Intel had the power to buy itself into a leader position. That feeling that AMD isn't quite as stable never really left until a few years ago, when, with Spectre, I started thinking that Intel was playing catch-up with mobile-phone-like tactics rather than being design-superior.

Now again, Intel had a great opportunity with Xe, but it feels like they just can't get their horsepower transferred onto the road. Not bad by any means, but something's just lacking.

Meanwhile, Qualcomm is announcing its Snapdragon X2... if only they could bring themselves to ensure proper Linux support.

reply
aurizon
2 days ago
[-]
How many feet does Intel actually have? It seems as if they have shot themselves in 4 or 5 - is it any wonder they can hardly walk?
reply
PaulKeeble
2 days ago
[-]
They have also made a lot of successful products and comebacks. While the Pentium 4 lost out to the Athlons and their market share dropped, they then released the Core series of CPUs, and the Core 2 Duo was a huge hit that marked the beginning of the dark ages for AMD, until they released Ryzen.

As a company they have had long periods of dominance dotted with big losses to AMD on the CPU front, which they always claw back. This time they seem to have been taken out by their inability to get their fabs online more than by anything their competitor is doing.

reply
panick21_
2 days ago
[-]
AMD was beating them on performance before the Athlon, and the Athlon 64 simply made it clear to everybody.

Intel spent literally 8 years and many, many billions and billions of $ to do everything possible to prevent AMD from getting volume.

They had so much production capacity and AMD so little that they basically had the ability to pay every single large OEM not to use AMD. If you as a company used AMD, you would instantly lose billions of $, you would be the last Intel customer served, you wouldn't get the new chips early on, and potentially much more. OEMs were terrified of Intel. Because Intel and Microsoft were so dominant, OEMs made terrible margins, and Intel could basically crush them. Intel used to joke that OEMs were their distributors, nothing more.

This was to the point where AMD offered free chips to people and they refused them.

AMD had a long period of time where they had the better product, but they couldn't sustain investing in better products while fighting so many legal battles. And the regulators around the world took too long and were too soft on Intel.

Intel in the 80s invested big in memory and got crushed by Japan. They invested big into the iAPX 432 and got crushed; it was a horrible product. Luckily they were saved by the PC and were then able to have exclusivity on the back of the i386.

By the late 90s AMD was better than them, and that persisted for almost 10 years. And then they took the lead for about 8 years and then lost it. And I don't think they lost it because of the fabs; when they lost on the fabs they just fell further behind.

It's really the gigantic PC boom of the late 80s and 90s that gave them the crazy manufacturing and market lead that AMD was not able to overcome in the 10 years after that.

reply
panick21_
2 days ago
[-]
Answering myself: for those interested in AMD's fall in the late 2000s and early 2010s, Jim Keller's Computer History Museum oral history is really great on what the issues with AMD were at the time, and how they turned it around.

Interestingly, he said he didn't really bring in many new people, AMD had great people, and it was more a matter of reorganizing and refocusing on the right things.

reply
sh-run
2 days ago
[-]
I might be misremembering, but the initial Core series (Core 2 Duo/Quad) was still a bit behind AMD's Phenom line. Core was definitely better than the old NetBurst architecture, but I don't really remember Intel regaining dominance until the Core i series/AMD FX era.

This was also like high school/college for me, so I could be way off.

reply
wbl
2 days ago
[-]
Chiplets were a great move that kept yields up on aggressive process shrinks and prices low.
reply
wmf
2 days ago
[-]
When you have 90% market share you can afford to make a lot of mistakes.
reply
miladyincontrol
2 days ago
[-]
How AMD turned the tables on Intel? It always felt more like a tale of how Intel turned their back on x86.
reply
speed_spread
2 days ago
[-]
At least with Itanium Intel was trying something fresh. In comparison, the Pentium 4 arch was extra bad because it had a very long pipeline to achieve high core frequencies. Branch mispredictions were thus very costly. And it was soon obvious that the process wouldn't scale much above 3GHz without wasting humongous amounts of power, defeating the long pipeline's purpose.
reply
txdv
2 days ago
[-]
I remember my Athlon 64 machine.

The last one to run Windows XP.

reply
nrb
2 days ago
[-]
Core memories for me were my pc builds for the Athlon Thunderbird and later the Athlon 64 FX-60. What an experience it was to fire those machines up and feel the absolutely gigantic performance improvements.
reply
whalesalad
2 days ago
[-]
I had a Soltek socket 754 build with chrome OCZ memory and a 9800 pro that was flashed to XT. I loved that the motherboard was black/purple.

Makes me want to play need for speed underground and drink some bawls energy

reply
bombcar
2 days ago
[-]
Youngsters today don't remember it; x86 was fucking dead according to the press; it really wasn't until Athlon 64 came out (which gave a huge bump to Linux as it was one of the first OSes to fully support it - one of the reasons I went to Gentoo early on was to get that sweet 64 bit compilation!) that everyone started to admit the Itanium was a turd.

The key to the whole thing was that it was a great 32 bit processor; the 64 bit stuff was gravy for many, later.

Apple did something similar with its CPU changes - now three - they only swap when the old software runs better on the new chip even if emulated than it did on the old.

AMD64 was also well thought out; it wasn't just a simple "have two more bytes" slapped on 32 bit. Doubling the number of general purpose registers was noticeable - you took a performance hit going to 64 bit early on because all the memory addresses were wider, but the extra registers usually more than made up for it.

This is also where the NX bit entered.

reply
golddust-gecko
2 days ago
[-]
100% -- the conventional wisdom was that the x86 architecture was too riddled with legacy and complexity to improve its performance, and was a dead end.

Itanium never met an exotic computer architecture journal article that it didn't try and incorporate. Initially this was viewed as "wow such amazing VLIW magic will obviously dominate" and subsequently as "this complexity makes it hard to write a good compiler for, and the performance benefit just doesn't justify it."

Intel had to respond to AMD with their "x86-64" copy, though it really didn't want to.

Eventually it became obvious that the amd64/x64/x86-64 chips were going to exceed Itanium in performance, and with the massive momentum of legacy on their side, Itanium was toast.

reply
Animats
2 days ago
[-]
Back in that era I went to an EE380 talk at Stanford where the people from HP trying to do a compiler for Itanium spoke. The project wasn't going well at all. Itanium is an explicit-parallelism superscalar machine: the compiler has to figure out what operations to do in parallel, whereas most superscalar machines do that during execution. Instruction ordering and packing turned out to be a hard numerical optimization problem. The compiler developers sounded very discouraged.

It's amazing that retirement units, the part of a superscalar CPU that puts everything back together as the parallel operations finish, not only work but don't slow things down. The Pentium Pro head designer had about 3,000 engineers working at peak, which indicates how hard this is. But it all worked, and that became the architecture of the future.

This was around the time that RISC was a big thing. Simplify the CPU, let the compiler do the heavy lifting, have lots of registers, make all instructions the same size, and do one instruction per clock. That's pure RISC. Sun's SPARC is an expression of that approach. (So is a CRAY-1, which is a large but simple supercomputer with 64 of everything.) RISC, or something like it, seemed the way to go faster. Hence Itanium. Plus, it had lots of new patented technology, so Intel could finally avoid being cloned.

Superscalars can get more than one instruction per clock, at the cost of insane CPU complexity. Superscalar RISC machines are possible, but they lose the simplicity of RISC. Making all instructions the same size increases the memory bandwidth the CPU needs. That's where RISC lost out to x86 extensions. x86 is a terse notation.

So we ended up with most of the world still running on an instruction set based on the one Harry Pyle designed when he was an undergrad at Case in 1969.

reply
jerf
2 days ago
[-]
If I am remembering correctly, this was also a good time to be in Linux. Since the Linux world operated on source code rather than binary blobs, it was easier to convert software to run 64-bit native. Non-trivial in an age of C, but still much easier than the commercial world. I had a much more native 64-bit system running a couple of years before it was practical in the Windows world.
reply
wmf
2 days ago
[-]
Linux for Alpha probably deserves some credit for getting everything 64-bit-ready years before x86-64 came out.
reply
jabl
2 days ago
[-]
Well, in the sense of Alpha being the first 64-bit Linux port, and thus having to fix a lot of places where "bitness" assumptions had crept into the codebase.

DEC (Compaq?) had some plans to make cheaper Alpha workstations, and while they managed to drive down the price somewhat, the volumes were never there to make them price-competitive with PCs. (See also the Talos Raptor POWER machines.)

reply
p_l
2 days ago
[-]
EV6 CPUs could ostensibly use the same chipsets etc. as Athlon (in fact, some Alpha motherboards used Athlon chipsets). That was part of the strategy to increase volume.

Then came Compaq and its love for Intel.

reply
MangoToupe
2 days ago
[-]
It also helps that Linux had much better 32-bit compatibility than Windows did. Not sure why, but it probably has something to do with the legacy support Windows shed moving to 64 bits.
reply
hylaride
2 days ago
[-]
Linux was natively written for 32-bit CPUs, so it had no legacy cruft or software to support. IIRC, the first 64-bit port of Linux (I think to Alpha?) exposed a lot of code that needed to be rewritten because it assumed 32-bit and/or x86 specifics.
reply
jacquesm
2 days ago
[-]
Up until Athlon your best bet for a 64 bit system was a DEC Alpha running RedHat. Amazing levels of performance for a manageable amount of money.
reply
drob518
2 days ago
[-]
Itanium wasn’t a turd. It was just not compatible with x86. And that was enough to sink it.
reply
kstrauser
2 days ago
[-]
It absolutely was. It was possible, hypothetically, to write a chunk of code that ran very fast. There were any number of very small bits of high-profile code which did this. However, it was impossible to make general-purpose, not-manually-tuned code run fast on it. Itanium placed demands on compiler technology that simply didn't exist, and probably still don't.

Basically, you could write some tuned assembly that would run fast on one specific Itanium CPU release by optimizing for its exact number of execution units, etc. It was not possible to run `./configure && make && make install` for anything not designed with that level of care and end up with a binary that didn't run like frozen molasses.

I had to manage one of these pigs in a build farm. On paper, it should've been one of the more powerful servers we owned. In practice, the Athlon servers were several times faster at any general purpose workloads.

reply
hawflakes
2 days ago
[-]
Itanium was compatible with x86. In fact, it booted into x86 mode. Merced, the first implementation, had a part of the chip called the IVE (Intel Value Engine) that implemented x86 very slowly.

You would boot in x86 mode and run some code to switch to ia64 mode.

HP saw the end of the road for their solo efforts on PA-RISC and Intel eyed the higher end market against SPARC, MIPS, POWER, and Alpha (hehe. all those caps) so they banded together to tackle the higher end.

But as AMD proved, you could win by scaling up instead of dropping an all-new architecture.

* worked at HP during the HP-Intel Highly Confidential project.

reply
philipkglass
2 days ago
[-]
I used it for numerical simulations and it was very fast there. But on my workstation many common programs like "grep" were slower than on my cheap Athlon machine. (Both were running Red Hat Linux at the time.) I don't know how much of that was a compiler problem and how much was an architecture problem; the Itanium numerical simulation code was built with Intel's own compiler but all the system utilities were built with GNU compilers.
reply
fooker
2 days ago
[-]
>Itanium wasn’t a turd

It required immense multi-year efforts from compiler teams to get passable performance with Itanium. And passable wasn't good enough.

reply
Joel_Mckay
2 days ago
[-]
The IA-64 architecture had too much granularity of control dropped into software. Thus, reliable compiler designs were much more difficult to build.

It wasn't a bad chip, but like Cell or modern Dojo tiles most people couldn't run it without understanding parallelism and core metastability.

amd64 wasn't initially perfect either, but was accessible for mere mortals. =3

reply
bombcar
2 days ago
[-]
Wasn't the only compiler that produced code worth anything for Itanium the paid one from Intel? I seem to recall complaining about it on the GCC lists.
reply
hajile
2 days ago
[-]
NOTHING produced good code for the original Itanium which is why they switched gears REALLY early on.

Intel first publicly mentioned Poulson all the way back in 2005 just FOUR years after the original chip was launched. Poulson was basically a traditional out-of-order CPU core that even had hyperthreading[0]. They knew really early on that the designs just weren't that good. This shouldn't have been a surprise to Intel as they'd already made a VLIW CPU in the 90s (i860) that failed spectacularly.

[0]https://www.realworldtech.com/poulson/

reply
speed_spread
2 days ago
[-]
Even the i860 found more usage as a specialized CPU than the Itanium. The original NeXTcube had an optional video card that used an i860 dedicated to graphics.
reply
hawflakes
2 days ago
[-]
I lost track of it but HP, as co-architects, had its own compiler team working on it. I think SGI also had efforts to target ia64 as well. But the EPIC (Explicitly Parallel Instruction Computing) didn't really catch on. VLIW would need recompilation on each new chip but EPIC promised it would still run.

https://en.wikipedia.org/wiki/Explicitly_parallel_instructio...

reply
nextos
2 days ago
[-]
Yes, SGI sold quite a lot of high-end IA-64 machines for HPC, e.g. https://en.wikipedia.org/wiki/SGI_Altix
reply
fooker
2 days ago
[-]
In the compiler world, these HP compiler folks are leading compiler teams/orgs at ~all the tech companies now, while almost none of the Intel compiler people seem to be around.
reply
jabl
2 days ago
[-]
Are you sure about that? If my memory serves, a lot of the Intel compiler people were transferred from HP? At least in the Fortran world, the Fortran frontend for the Intel compiler traces its lineage back to DEC Fortran (for VAX and later Alpha) -> Compaq Visual Fortran (for Windows) -> Intel Fortran.
reply
textlapse
2 days ago
[-]
I have worked next to an Itanium machine. It sounded like a helicopter - barely able to meet the performance requirements.

We have come a long way from that to arm64 and amd64 as the default.

reply
Joel_Mckay
2 days ago
[-]
The stripped-down ARMv8/9 for AArch64 is good for a lot of use-cases, but most of the vendor-specific ASIC advanced features were never enabled, for reliability reasons.

ARM is certainly better than before, but could have been much better. =3

reply
eej71
2 days ago
[-]
Itanium was mostly a turd because it pushed so many optimization issues onto the compiler.
reply
CoastalCoder
2 days ago
[-]
IIRC, wasn't part of the issue that compile-time instruction scheduling was a poor match with speculative execution and/or hardware-based branch prediction?

I.e., the compiler had no access to information that's only revealed at runtime?

reply
duskwuff
2 days ago
[-]
Yes, absolutely. Itanium was designed with the expectation that memory speed/latency would keep pace with CPUs - it didn't.
reply
_flux
2 days ago
[-]
Could it have been a good target for e.g. Java JIT? It would be able to instrument the code at times, and then generate more optimal code for it?
reply
philipkglass
2 days ago
[-]
I think you may be right. It's hard for me to find then-contemporary benchmarks from 20 years ago, but this snarky Register article mentions it indirectly:

https://www.theregister.com/2004/01/27/have_a_reality_check_...

SPECjbb2000 (an important enterprise server benchmark): Itanic holds a slim (under 3%) lead over AMD64 at the 4-processor node size and another slim (under 4%) lead over POWER4+ at the 32-processor node size - hardly 'destroying' the competition, once again.

It was slightly faster than contemporary high-performance processors on Java. It was also really good at floating point performance. It was also significantly more expensive than AMD64 for server applications if you could scale your servers horizontally instead of vertically.

reply
Findecanor
2 days ago
[-]
The Itanium had some interesting ideas executed poorly. It was a bloated design by committee.

It should have been iterated on a bit before it was released to the world, but Intel was stressed by there being several 64-bit RISC processors on the market already.

reply
cmrdporcupine
2 days ago
[-]
Itanium was pointless when Alpha already existed and was getting market penetration at the high end. Intel played disgusting corporate politics to kill it and then push the ugly, failed Itanium to market, only to have to panic back to x86_64 later.

I have no idea how/why Intel got a second life after that, but they did. Which is a shame. A sane market would have punished them and we all would have moved on.

reply
dessimus
2 days ago
[-]
> I have no idea how/why Intel got a second life after that, but they did.

For the same reason the line "No one ever got fired for buying IBM." exists. Buying AMD at large companies was seen as a gamble that deciders weren't willing to make. Even now, if you just call up your account managers at Dell, HP, or Lenovo asking for servers or PCs, they are going to quote you Intel builds unless you specifically ask. I don't think I've ever been asked by my sales reps if I wanted an Intel or AMD CPU. Just how many slots/cores, etc.

reply
bombcar
2 days ago
[-]
The Intel chipsets were phenomenally stable; the AMD ones were always plagued by weird issues.
reply
toast0
2 days ago
[-]
Historically, when Intel is on their game, they have great products, and better-than-most support for OEMs and integrators. They're also very effective at marketing and arm twisting.

The arm twisting gets them through rough times like Itanium and Pentium 4 + Rambus, etc. I still think they can recover from the 10nm fab problems, even though they're taking their sweet time.

reply
j_not_j
2 days ago
[-]
Alpha had a lot of implementation problems, e.g. floating point exceptions with untraceable execution paths.

Cray tried to build the T3E (iirc) out of Alphas. DEC bragged how good Alpha was for parallel computing, big memory etc etc.

But Cray publicly denounced Alpha as unusable for parallel processing (the T3E was a bunch of Alphas in some kind of NUMA shared memory.) It was so difficult to make the chips work together.

This was in the Cray Connect or some such glossy publication. Wish I'd kept a copy.

Plus of course the usual DEC marketing incompetence. They feared Alpha undoing their large expensive machine momentum. Small workstation boxes significantly faster than big iron.

reply
jabl
2 days ago
[-]
The Cray T3D and T3E used Alpha processors. But it wasn't really shared memory; each node, with 1 (2?) CPUs, ran its own lightweight OS kernel. There were some libraries built on top of it (SHMEM) that sort of made it look a bit like shared memory, but not really. Mostly it was a machine for running MPI applications.

A decade or so later, they more or less recreated the architecture, this time with 64-bit Opteron CPUs, in the form of the 'Red Storm' supercomputer for Sandia. Which then became commercially available as the XT3, and later the XT4/5/6.

reply
p_l
2 days ago
[-]
Part of the issue was also that it was Cray's first proper MPP system, after being very much against MPP designs in the past.
reply
panick21_
2 days ago
[-]
Gordon Moore tried to link up with Intel when he was at DEC. Alpha would have become Intel's 64-bit architecture. This of course didn't happen, and Intel instead linked up with DEC's biggest competitor, HP, and adopted their much, much worse VLIW architecture.

Imagine a future where Intel and Apple both adopt DEC and Alpha, instead of Intel with HP and Apple with IBM.

reply
loloquwowndueo
2 days ago
[-]
“Sane market” sounds like an oxymoron; technology markets have seen multiple failed attempts at doing the sane thing.
reply
bombcar
2 days ago
[-]
IIRC it didn't even do great against POWER and other bespoke OS/Chip combos, though it did way better there than generic x86.
reply
p_l
2 days ago
[-]
Ex-Digital customers running OpenVMS held onto last-generation Alpha machines because they were substantially faster than the new Itanium ones in all practical uses. This is also why HP was finally forced to resurrect the nearly-complete EV7 chips.
reply
jcranmer
2 days ago
[-]
I acquired a copy of the Itanium manuals, and flicking through them, you can barely get through a page before going "you did WHAT?" over some feature.
reply
tptacek
2 days ago
[-]
Example example example example must see examples!
reply
jcranmer
2 days ago
[-]
Some of the examples:

* Itanium has register windows.

* Itanium has register rotations, so that you can modulo-schedule a loop.

* Itanium has so many registers that a context switch is going to involve spilling several KB of memory.

* The main registers have "Not-a-Thing" values to be able to handle things like speculative loads that would have trapped. Handling this for register spills (or context switches!) appears to be "fun."

* It's a bi-endian architecture.

* The way you pack instructions in the EPIC encoding is... fun.

* The rules of how you can execute instructions mean that you kind of have branch delay slots, but not really.

* There are four floating-point environments because why not.

* Also, Itanium is predicated.

* The hints, oh god the hints. It feels like every time someone came up with an idea for a hint that might be useful to the processor, it was thrown in there. How is a compiler supposed to be able to generate all of these hints?

* It's an architecture that's complicated enough that you need to handwrite assembly to get good performance, but the assembly has enough arcane rules that handwriting assembly is unnecessarily difficult.

reply
tptacek
2 days ago
[-]
I am not disappointed. Having-but-not-really-having delay slots is my favorite thing here. Thank you, by the way!
reply
tverbeure
2 days ago
[-]
About this part:

> In 2004, Intel wrote off the Itanium and cloned AMD64.

AMD introduced x86-64 in 2003. You don't just clone an ISA (even if based on AMD documents), design it, fab it etc. in a year or two. Intel must have been working on this well before AMD introduced the Athlon64.

reply
mjg59
2 days ago
[-]
The ISA was published in 2000, there was plenty of time to start working on an implementation before AMD shipped actual product.
reply
tverbeure
2 days ago
[-]
Thanks! I didn't know AMD published it that early, but that makes much more sense than Intel "cloning" it as a reactive move to the Athlon 64 having it.

(Though you could certainly make the case that it was a reactive move by Intel marketing to enable it.)

reply
jmyeet
2 days ago
[-]
This is a pretty bad recounting of history. Just from memory I can recall more of this, and some of the missing details are important.

First you have to know that Intel licensed the instruction sets to AMD and Cyrix (and possibly others?) in the 1990s. If you were around at that time, you could buy Cyrix 486dx2/66, 486dx4/100, 486dx4/133 and other CPUs that were really first to operate at a multiple of clock speed. Earlier CPUs didn't do this. But these deals were two-way, meaning Intel had the right to use any x86 extensions other manufacturers created;

2. Intel didn't like this. They'd also lost a trademark dispute over 486 where USPTO said you couldn't trademark a number. This was entirely the reason the Pentium was called the Pentium and not the 586. Intel didn't want to share. The instruction set cross-licensing was another issue;

3. Because of this, Intel wanted to go 64 bit from scratch. You have to remember that at this time the whole CISC vs RISC debate was unsettled. There were a variety of RISC UNIX servers and workstations from companies like SGI, Sun, HP, DEC, etc. Intel wanted to compete in this space. So they partnered with HP and came up with EPIC as the architecture name. The first CPU was Merced and it was meant to be released in 1996 (IIRC) but it was years late;

4. Intel thought their market dominance could drive the market. Obviously this would leave AMD (Cyrix was out by this point) in the cold. So AMD came out with the x86_64 extensions for 64-bit support, and the Athlon 64 was born;

5. Oh, additionally in the 90s we had the (initially) Megahertz but later Gigahertz race between Intel and AMD. This is because clock speed became a marketing point. It was stupid because it ignored IPC (instructions per clock) but consumers responded to it;

6. So Intel moved from the Pentium 3 to the NetBurst architecture of the Pentium 4, which was designed to hit high clock speeds. You have to remember that even in the late 90s a lot of people thought clock speeds would keep going up to 10GHz. Anyway, Intel "won" this Gigahertz race with the Pentium 4 but lost the war, as I'll explain;

7. So in the early 2000s, Intel needed a solution for laptops. They came up with the Centrino platform. I think this was the first laptop platform where Wi-Fi was a first-class citizen. Anyway, Centrino was wildly successful against any competitors, so much so that people tried to make desktops out of it, but it was really hard to acquire the parts;

8. So AMD took the easy route and released the Athlon 64, which was wildly successful, and with Intel facing ever-longer delays on EPIC they were in a bind. They were forced to respond. They adopted x86_64 and repurposed the Centrino platform to create the Core Duo and then Core 2 Duo chips for desktop. To this day, the heritage of the Intel Core CPUs can be traced back to the Pentium 3;

9. AMD further complicated Intel's position by releasing server chips. This is what the Opteron was. And this became a huge problem for Intel. EPIC chips were wildly expensive and, even worse, required basically a rewrite of all software from the OS level up, compilers included. For several years, the Opteron really ate Intel's lunch.

10. By 2010 or so Intel had cancelled EPIC and regained their grip on server-grade chips (ie Xeons), and AMD's Athlon and Opteron had begun to fade. So Intel had basically won but, don't worry, the 10nm white whale was just over the horizon.

I guess my point is that the Athlon can't be viewed or judged in isolation without considering EPIC, Intel's cross-licensing deals, the Gigahertz race, x86_64 and the Pentium 3/4.

reply
chasil
2 days ago
[-]
A key element was AMD's Barcelona, which was a quad-core design that had TLB problems and failed in the field.

Intel just wired together multiple dual cores in several generations of their CPUs.

AMD should have had this as a contingency. They are doing this same thing now with chiplets.

https://en.wikipedia.org/wiki/AMD_10h#TLB_bug

reply
panick21_
2 days ago
[-]
Intel and trying to kill their most successful product: name a better duo.

When amd64 came out, Sun should have started to migrate off SPARC.

Ironically it is Itanium that killed off most of the RISC competition, but it's the Athlon that actually delivered the killing blow.

reply
flanked-evergl
2 days ago
[-]
I still worked with SPARC as recently as 10 years ago. Horrible CPUs. Price for performance wise, a SPARC CPU with Solaris was significantly worse than a Xeon processor with Linux, probably by at least one order of magnitude if not more. Amazing how Sun managed to get people to pay for that garbage.
reply
panick21_
2 days ago
[-]
Path dependency has long been the savior of under-performing companies. The amount of coasting all the companies from IBM down did once they had an install base is crazy. Sun just couldn't do it in the long term because the software that ran on Solaris was too easy to move to Linux, and arguably their install base was too small.
reply
GartzenDeHaes
2 days ago
[-]
> they wouldn’t have to worry about competing CPU designs, at least not for a very long time.

US Government sales require two vendors, which I think is why AMD had x86 licenses in the first place.

reply
colejohnson66
2 days ago
[-]
IBM*. They approached Intel for the 8088 for the 5150, but said "We want a second source". So Intel reeled in AMD. Second sourcing at the time was pretty common.
reply
Agingcoder
2 days ago
[-]
For the first time in a long time I don’t feel that old on hn ( and I do HPC so have a strong interest in the topic ) :-)

I’m really enjoying the discussion here - thanks everyone.

reply
ksec
2 days ago
[-]
Since most of the patents have expired, I wonder if AMD could open source their baseline x86 ISA as AE86.
reply
sciencesama
2 days ago
[-]
Will there be a 128-bit revolution coming soon?
reply
seabrookmx
2 days ago
[-]
Yes, it's called IPv6 :)
reply
floxy
2 days ago
[-]
reply
tliltocatl
2 days ago
[-]
Not until we have technology for exabyte-scale memories (read: not any time soon).
reply
nick__m
2 days ago
[-]
Having 128 bits on the address bus is useless, no question there!

But what about pointer provenance, tagging and capabilities? Having more bits would be useful to implement something like CHERI.

reply
Avalaxy
2 days ago
[-]
Why would there be?
reply
tommica
2 days ago
[-]
Great little article! Intel/AMD fight has been an interesting one all these years
reply
9front
2 days ago
[-]
The Itanium was a new 64-bit architecture; AMD64 is just an addition to the 32-bit Intel architecture. Itanium didn't make it, so we're stuck with backward compatibility all the way back to the 8080 in today's x86 processors. That's all in the past! What I'm looking forward to is the future SoC releases with Intel cores and Nvidia graphics.
reply
chasil
2 days ago
[-]
Actually, AArch64 appears to be preferred by many.

Apple has discarded all 32-bit legacy, implementing only 64-bit in their equipment to great success.

Fujitsu did the same with their supercomputer that was the best-performing in the world for a time.

Had Intel bought ARM, then espoused their architecture in the age of the Athlon, perhaps things would have been very different.

reply
hawflakes
2 days ago
[-]
Funny enough, when Intel and DEC settled their lawsuit, Intel got StrongARM[1] from DEC, which was pretty fast for its time. It was a pretty cool chip, literally: it didn't need a heatsink. I had a Shark set-top appliance prototype. The official name was DNARD — Digital Network Appliance Reference Design.

[1] https://en.wikipedia.org/wiki/StrongARM [2] https://collection.maynardhistory.org/items/show/8946

reply
fluoridation
2 days ago
[-]
>What I'm looking forward is to the future SoC releases with Intel cores and Nvidia graphics.

As far as I know those are still going to be x86s, only with Nvidia dies tacked on.

reply