FilterHN

DDR4 Sdram – Initialization, Training and Calibration

104 points

by todsacerdoti

25 days ago

| past

| 5 comments

| systemverilog.io

| HN

▲

nsteel

22 days ago

[-]

Implementing DDR3 training for our packet queuing chip (custom memory controller) was my first project at work. We had originally hoped to use the same training params for all parts. That wasn't reliable even over a small number of testing systems in the chamber. DDR3 RAM parts were super cheap compared to what we had used in previous generations, and you get what you pay for with a huge amount of device variation. So we implemented a relatively long training process to be run on each device during our board testing, and saved those per-lane skews. But we found the effects of temperature, and particularly system noise, were too great once the system was sending full-rate traffic. (The training had to be done one interface at a time, with pedestrian data-rates). We then ended up with a quick re-training pass to re-center the eyes. It still wasn't perfect - slower ram chips (with smaller eyes) would report ECC correctables when all interfaces were doing worst-case patterns at temperature extremes. We spent a lot of time making those interfaces robust, and ended up relying more on ECC than we had intended. But those chips have been shipping ever since and will have seen traffic from most of us.

▲

bri3d

22 days ago

[-]

You played in hard mode in a weird sense; more modern DDR versions are in a backwards sense "easier" if you're buying the IP, because a lot of the training has moved to boot time and is handled by the vendor IP rather than needing to be run during burn-in using some proprietary toolkit or self-test tool.

It's just as arcane and weird, but if you buy one of the popular modern packages for DDR4/5 like DesignWare, more and more training is accomplished using opaque blob firmware (often ARC) loaded into an embedded calibration processor in the DDR controller itself at boot time rather than constants trained by your tooling or the vendor's.

▲

halifaxbeard

22 days ago

[-]

Wow, I was spoiled building firmware for my ARM boards then (building, not developing).

Marvell has a source available DDR driver that actually takes care of training on a few of their platforms! https://github.com/MarvellEmbeddedProcessors/mv-ddr-marvell

▲

nsteel

22 days ago

[-]

I don't know if this is still the case, but back then the likes of Synopsys charged a lot of money for what was very limited controller functionality; you were stuck with their frustrating support channels and generally dumpster fire firmware. Our controller was fully custom to our needs, supporting more optimum refresh schemes tightly integrated with our application, and multiple memory protocols (not just DDR3), and I don't remember what else.

At least we were able to modify the training algorithms and find the improvements, rather than being stuck with the usual vendor "works for us" response. Especially with something like commodity DDR, where our quantities don't command much clout. But it was a bit of an ordeal and may have contributed to us buying in a controller for our next gen (not DDRx). But I think we're going the other way again after that experience..!

▲

MisterTea

22 days ago

[-]

From my understanding, memory training is/was a closely held secret of memory makers and EDA IP houses who sold memory controller IP to all the chip vendors. This in turn makes fully open motherboard firmware almost impossible as no one can write code for memory training to bring up the chip. That piece of code has to be loaded as a blob - if you can get the blob.

▲

Aurornis

22 days ago

[-]

I think you're mixing different concepts. JEDEC doesn't define DDR4 training procedures so there isn't a secret that's being withheld. Everyone who implements a DDR4 controller has to develop and implement a training procedure to meet the specifications.

On a DDR4 motherboard the training would occur between the memory controller and the DDR4 RAM. The proprietary blob you need would include the communication with the memory controller and instructions to handle the training for that specific memory controller.

There are several open source DDR4 controllers in different states of usability. They have each had to develop their own implementations.

▲

bri3d

22 days ago

[-]

What you're saying is true, but the OP has a point too.

What's basically happening is that as things get faster the lifetime of training data decreases because the system becomes more sensitive to environmental conditions, so training procedures which were previously performed earlier in the manufacturing cycle are now delegated to the runtime, so the system migrates from data to code.

Previously, you or the vendor would provide tools and a calibration system which would infer some values and burn a calibration, and then load it during early boot. More recently, the runtime is usually a combination of a microcontroller and fixed-function blocks on the DDR PHY, and that microcontroller's firmware is usually supplied as a generic blob by the vendor. The role of this part of the system keeps growing. The system has gotten a bit more closed; it's increasingly moved from "use this magic tool to generate these magic values, or read the datasheets and make your own magic tool" to "load this thing and don't ask questions."

▲

Aurornis

22 days ago

[-]

The parent commenter was mixing two concepts together.

DDR4 training is not defined. It’s vendor-implemented.

If you want to work with a vendor’s memory controller chip, you need the documentation for that chip.

So the secret isn’t memory training (the topic of this article) it’s just proprietary chips on motherboards. Memory training is only one of many things that have to be reverse engineered or known for an open firmware.

▲

kvemkon

22 days ago

[-]

> environmental conditions

Shouldn't it be: no more negligible manufacturing / assembly tolerances instead? I mean, when I turn PC on, the temperature of all components is 20 C, the training is done at almost this temperature. But then the PC can work for months with much more higher memory controller and DRAM chips temperatures.

▲

nopurpose

22 days ago

[-]

I remember listening to Oxide & Friends (or it was On the Metal?) podcast few years ago and had an impression they wrote their own training code.

▲

p_l

22 days ago

[-]

It's a more available option on AMD chips, intel AFAIK kept it a secret blob.

Ultimately oxide got to run customised firmware deal and AFAIK even got custom PSP firmware

▲

Joel_Mckay

22 days ago

[-]

It is usually the IP licensing, as spinning a board isn't always complex.

Note, it is actually easier to profile a known dram chip set bonded to the PCB. A lot of products already do this like phones, tablets, and thin laptops.

Where as SSD drives being a wear item, should be removable by end users. =3

▲

varispeed

22 days ago

[-]

> as no one can write code for memory training to bring up the chip

Surely someone can do it, but it's probably too niche to do. The licensing fee is probably cheaper than corporation spinning the board and reverse engineer it and for hobbyists lower tier memory likely was fine.

That said given that such technology has become so much more accessible (you can certainly create FPGA board and wire it up to DDR4 using free tools and then get board made in China), it's probably a matter of time someone will figure this out.

▲

brcmthrowaway

22 days ago

[-]

Why do we need training?

▲

namibj

22 days ago

[-]

Because DDR3/4/5 dies are made to a price with half to three quarters of their IO pins shared between the dies in parallel on a rank of a channel, and for capacity often up to around 6 ranks per channel. E.g. high capacity server DDR4 memory, say on AMD SP3, may have 108 dies on each of 8 channels of a socket.

So if you can move complexity over to the controller you can spend 100:1 ratio in unit cost. So you get to make the memory dies very dumb by e.g. feeding a source synchronous sampling clock that's centered on writes and edge aligned on reads leaving the controller to have a DLL master/slave setup to center the clock at each data group of a channel and only retain a minimal integer PLL in the dies themselves.

▲

Neywiny

22 days ago

[-]

You need to train whether you're on one die or 100. It's about your per bit skew and PVT

▲

namibj

22 days ago

[-]

Yes, I was just pointing out why we choose to make the memory chips so fragile/dependant on the controller doing all the magic training for them.

▲

Neywiny

22 days ago

[-]

Well no, your comment answered that we need training because the dies are made cheaply. But no amount of money would prevent the need to train out static and dynamic delays. If it wasn't in the controller it'd be in the memory and the question of why it's needed would still be relevant.

▲

phendrenad2

22 days ago

[-]

Because when you change a PCB trace from 0 to 1 or 1 to 0, the slope of the signal as it changes from gnd to v+ (the signal voltage) or v+ to ground isn't perfect, and that slope is highly affected by the various pieces of metal and silicon and fiberglass that make up the board and the chips. The shape and topology of the PCB trace matters, as do slight imperfections in the solder, PCB material, the bond wires inside the chips, etc. These effectively create resistors/capacitors/inductors that the designer didn't intend, which effect the slope of the 0->1 1->0 changes. So for these high-speed signals, chip designers started adding parameters to tweak the signal in real-time, to compensate for these ill effects. Some parameters include a slight delay between the clock and data signals, to account for skew. Voltage adjustement to avoid ringing (changing v+). Adjusting the transistor bias to catch level transitions more accurately. Termination resistance adjustment, to dampen reflections. And on top of all that, some bits will still be lost but because these protocols are error-correcting, this is acceptable loss.

This is how people were able to send ethernet packets over barbed wire. Many bits are lost, but some get through, and it keeps trying until the checksums all pass.

▲

adrian_b

22 days ago

[-]

A large section of the article is dedicated to the answer for this question.

▲

juancn

22 days ago

[-]

Imprecision in manufacturing (adjust resistor values), different trace lengths (speed of light differences for parallel signals), etc... it's in the article.

▲

nvme0n1p1

22 days ago

[-]

It's in the link. https://www.systemverilog.io/design/ddr4-initialization-and-...

▲

numpad0

22 days ago

[-]

Because loading XMP profile won't suffice and they have to tune the parameters further to be able to actually run the sticks

▲

robotnikman

22 days ago

[-]

Wow, what goes on in a RAM module is so much more complex than I thought.

▲

v1ne

22 days ago

[-]

Does anybody else's fingers also tingle like this is written by an AI?

The formatting is strangely inconsistent, highlighting only some numbers and some variables in fixed-width font. Also there's odd statements like that the reference resistor keeps its value "at all temperatures", which is just not true. Other phrases like "poly-silicon resistor" are highlighted, and then not explained. All in all, I find this article to be quite a mess and not a clear explanation.

▲

MertsA

21 days ago

[-]

There are some typos like "if" instead of "of" that seem to imply at the very least, some of it is verbatim written by a person. Given the subject matter, I'd be extremely surprised if this was 100% AI but one thing I've totally done for similar technical writing is ask AI for help refining a rough draft. There's some suggestions I'd ignore but the larger grammatical and sentence structure suggestions I'd usually adopt.

▲

userbinator

22 days ago

[-]

I did some RE'ing of BIOS code back in the days of the first SDR SDRAM and the calibration part was reasonably straightforward; basically sweeping some controller registers through their ranges while doing repeated read/write operations with lots of transitions in the data (e.g. 0x55555555 <> 0xAAAAAAAA) to find the boundaries where errors occurred, and then choosing the middle of the range.

While the article does mention periodic calibration, I wonder if there are controllers which will automatically and continuously adapt the signal to keep the eye centered, like a PLL.