The Intel solution was also not 3D stacked. It's a little like having an HBM stack next to the chip as a cache.
From the article, it was actually a separate die/chiplet:
> Broadwell implemented its L4 cache on a separate 77 mm² die, creating a chiplet configuration. This cache die was codenamed “Crystal Well”, and was fabricated using the older 22nm process.
There are a lot of interesting details in the article about how wildly different this DRAM is, made to go fast fast fast. Fun read.
I'd really wanted a system with Crystal Well; it seemed so cool. A lot of Macs seemed to have the Intel Iris Pro models that had it. But general adoption in the PC market was, I feel, quite poor.
They also made them for desktop in i5, i7, and Xeon form.
On mainframes, z14's drawer controller (that controlled four CPU sockets each) had a huge amount of eDRAM acting as an L4 cache for all cores in that drawer.
Modern DRAM chips are manufactured in an entirely different way on a very different process than logic chips, and the manufacturing processes are incompatible. eDRAM was a very different implementation of DRAM that could be manufactured on a logic process.
The difference between eDRAM and DRAM is not just that eDRAM is closer. It was also typically dramatically faster, but had a shorter retention period, requiring more frequent refresh.
The EDRAM I'm familiar with, by a company called Ramtron and later Enhanced Memory Systems, seems to be largely lost to history. It's discussed in this relatively recent presentation, see slide 16 onward: https://site.ieee.org/pikespeak/files/2020/08/Silcon-Mountai...
I remember reading about IBM's usage of eDRAM for cache when they first used it. Their analysis showed that for their server workloads the number one thing was to keep the cores busy, and that a lot of slower eDRAM worked better than a smaller amount of faster SRAM.
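To put rough numbers on that trade-off, here's a back-of-the-envelope sketch using the usual average memory access time formula (hit latency plus miss rate times miss penalty). Every latency and miss rate below is made up for illustration; none of it comes from IBM's analysis or any real part.

```python
def amat(hit_latency_ns, miss_rate, miss_penalty_ns):
    """Average memory access time: hit latency plus the expected cost of a miss."""
    return hit_latency_ns + miss_rate * miss_penalty_ns


# Hypothetical numbers only, not measurements of any real cache.
MAIN_MEMORY_PENALTY_NS = 80.0

# Small but fast cache: quick hits, misses more often.
small_fast = amat(hit_latency_ns=10.0, miss_rate=0.20,
                  miss_penalty_ns=MAIN_MEMORY_PENALTY_NS)

# Large but slower cache: slower hits, misses far less often.
large_slow = amat(hit_latency_ns=15.0, miss_rate=0.05,
                  miss_penalty_ns=MAIN_MEMORY_PENALTY_NS)

print(f"small/fast: {small_fast:.1f} ns, large/slow: {large_slow:.1f} ns")
```

With these assumed values the bigger, slower cache wins on average because it avoids more trips to main memory. Flip the miss rates around (a workload that already fits in the small cache) and the conclusion flips too.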
(I miss Tech Report.)
I think the main reason chips like that didn’t take off was marketing. Laptop OEMs tell consumers that if they want performant graphics they must buy laptops with discrete GPUs, even though those are more expensive, heavier, and drain the battery faster.
The Xeon Max processors had up to 64GB of HBM that could act as memory or shadow external memory, effectively acting like a huge L4 cache.
No Xeon 6 seems to have that feature, at least not for now. Xeon 6 parts top out at a paltry 504MB of L3.