Cloudflare's Gen 13 servers: trading cache for cores for 2x performance
64 points
by wmf
3 days ago
| 7 comments
| blog.cloudflare.com
| HN
gdwatson
3 hours ago
[-]
I will confess to skimming by the end. But I don’t think they explained how they solved the cache issue except to say they rewrote the software in Rust, which is pretty vague.

Was all the code they rewrote originally in Lua? So was it just a matter of moving from a dynamic language with pointer-heavy data structures to a static language with value types and more control over memory layout? Or was there something else going on?
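For what it's worth, here's a toy sketch (my own illustration, not Cloudflare's code) of the layout control I mean: in Rust, a struct of plain values has a fixed, compact size, and a Vec of them is one contiguous allocation, unlike a dynamic language's pointer-per-field objects.

```rust
use std::mem::size_of;

// A hypothetical connection record: all plain value fields, no boxing.
#[derive(Clone, Copy)]
struct Conn {
    src_ip: u32,
    dst_ip: u32,
    bytes: u64,
}

fn main() {
    // 16 bytes per entry, no per-field heap pointers or object headers.
    assert_eq!(size_of::<Conn>(), 16);

    let table: Vec<Conn> = vec![Conn { src_ip: 0, dst_ip: 0, bytes: 0 }; 1024];

    // Entries sit adjacent in memory, so iteration walks the cache linearly.
    let base = table.as_ptr() as usize;
    let second = &table[1] as *const Conn as usize;
    assert_eq!(second - base, size_of::<Conn>());

    println!("Conn is {} bytes, stored contiguously", size_of::<Conn>());
}
```

In a typical dynamic-language runtime each of those fields would be a tagged or boxed value reached through a pointer, which is exactly the cache-unfriendly shape the article seems to be moving away from.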

reply
jshier
1 hour ago
[-]
They posted about the Rust rewrite last year. https://blog.cloudflare.com/20-percent-internet-upgrade/
reply
zozbot234
2 hours ago
[-]
The gains in lower memory footprint and lower demands on memory bandwidth from rewriting stuff to Rust are very real, and they're going to matter a lot with DRAM prices being up 5x or more. It doesn't surprise me at all that they would be getting these results.
reply
alberth
2 hours ago
[-]
It seems like the unspoken takeaway is just how shockingly performant LuaJIT is, even relative to Rust.
reply
anitil
1 hour ago
[-]
They said they're replacing 15 years of Nginx+Lua; that's a testament to how good it can be.
reply
synack
1 hour ago
[-]
Is the Linux scheduler aware of shared CPU cache hierarchies? Is there any way to make the scheduler achieve better cache utilization without pinning processes to cores or offloading these decisions to vendor-specific code?
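The topology is at least visible to userspace: Linux publishes which CPUs share each cache level in sysfs (e.g. `/sys/devices/system/cpu/cpu0/cache/index3/shared_cpu_list`, where index3 is usually L3), in the kernel "cpulist" format like "0-7,64-71". A rough sketch of reading it — the path and index are assumptions about a typical x86 Linux box:

```rust
use std::fs;

/// Parse a kernel "cpulist" string such as "0-7,64-71" into CPU ids.
fn parse_cpulist(s: &str) -> Vec<usize> {
    let mut cpus = Vec::new();
    for part in s.trim().split(',').filter(|p| !p.is_empty()) {
        match part.split_once('-') {
            // A "lo-hi" range expands to every CPU id in it, inclusive.
            Some((lo, hi)) => {
                let (lo, hi): (usize, usize) =
                    (lo.parse().unwrap(), hi.parse().unwrap());
                cpus.extend(lo..=hi);
            }
            // A bare number is a single CPU id.
            None => cpus.push(part.parse().unwrap()),
        }
    }
    cpus
}

fn main() {
    assert_eq!(parse_cpulist("0-3"), vec![0, 1, 2, 3]);
    assert_eq!(parse_cpulist("0-1,8-9"), vec![0, 1, 8, 9]);

    // On Linux, show which CPUs share cpu0's L3; silently skip elsewhere.
    let path = "/sys/devices/system/cpu/cpu0/cache/index3/shared_cpu_list";
    if let Ok(list) = fs::read_to_string(path) {
        println!("cpu0 shares its L3 with CPUs {:?}", parse_cpulist(&list));
    }
}
```

So the information the scheduler would need is already exported; the open question is whether the scheduler itself uses it well enough that userspace pinning becomes unnecessary.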
reply
AbuAssar
41 minutes ago
[-]
Epyc’s naming is beautiful and consistent
reply
attentive
2 hours ago
[-]
That was annoying to read because there's no easy way to see the impact of each change. It's FL2 + Gen 13 combined.

I.e., what's the FL2 benchmark on Gen 12 compared to FL1?

reply
trhway
2 hours ago
[-]
Reminds me of that time when the cheap Celeron with a small cache was beating the expensive Pentium with a large cache (if I remember correctly, the Celeron's cache ran at core frequency while the Pentium's was on a separate die at half frequency, and the Celeron was very overclockable).
reply
zozbot234
2 hours ago
[-]
Lower cache per core is actually a pretty natural outcome, with the latest fabrication nodes shrinking logic while leaving SRAM size largely unchanged. We may also see eDRAM (a lot denser than SRAM) used for last-level caches.
reply
BoredPositron
2 hours ago
[-]
The first Pentium 4 release. AMD had the same gimmick with their Phenoms.
reply
danpalmer
3 hours ago
[-]
"The tradeoff:" "The opportunity:" "Proving it out:"

Nah, I'm good, thanks. Slop takes more effort to read and just raises questions of accuracy. It's disrespectful to your reader to put that work on them, and in a marketing blog post it's just a bad idea.

reply
montyanne
2 hours ago
[-]
Cloudflare has excellent (human) technical writers. I don’t see any indication this is “slop”, it’s the standard in-the-weeds but understandable blog post they’ve been doing for years.

AI text is everywhere, but this isn’t it.

reply
refulgentis
1 hour ago
[-]
This is AI, but I can't prove it, lol. :) The bulleted lists are too short, both in total list length and in text per list item. Little drama headers, as the parent noted.

To your point, this would have registered as "human subtly bloviating for word count" if LLMs didn't exist, and at this point that's probably the most useful feedback. I doubt it's 100% one-shot AI, but someone definitely optimized it in parts, and the AI heard "concise" as "bullets and short sentences."

reply
rustystump
2 hours ago
[-]
Agree to disagree. It was likely AI-enhanced somewhere along the path to production. So many phrases reek of AI, but others do not. Is this a sprinkling of LLM help or how a human genuinely writes? Idk.
reply
abuani
2 hours ago
[-]
Out of curiosity, can you point to specific sections that reek of AI? I read the article and didn't see anything that immediately stuck out, but maybe I need to start looking for different signals.
reply
rustystump
2 hours ago
[-]
“deliver more than just a core count increase. The architecture delivers improvements across multiple dimensions”

“But we didn't just assume it would be a problem; we measured it.”

“Instead of compromising, we built FL2.”

Idk if I am now seeing this pattern everywhere because it is all AI slop or if people really do write this way.

Skimming it, this looks like they got a partnership with AMD and tacked it onto an ongoing project as if it were planned. That muddies things: it's harder to tell how much of the gain was the rewrite in general and how much was the hardware. Man, I used to really enjoy Cloudflare's technical blogs.

reply