Cloudflare's Gen 13 servers: trading cache for cores for 2x performance
64 points
by wmf
3 days ago
| 7 comments
| blog.cloudflare.com
| HN
gdwatson
3 hours ago
[-]
I will confess to skimming by the end. But I don’t think they explained how they solved the cache issue except to say they rewrote the software in Rust, which is pretty vague.

Was all the code they rewrote originally in Lua? So was it just a matter of moving from a dynamic language with pointer-heavy data structures to a static language with value types and more control over memory layout? Or was there something else going on?
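For what it's worth, here's a toy sketch (my own illustration, not Cloudflare's code) of the layout control I mean: in Rust, a struct of plain values has a fixed, compact size, and a Vec of them is one contiguous allocation, unlike a dynamic language's pointer-per-field objects.

```rust
use std::mem::size_of;

// A hypothetical connection record: all plain value fields, no boxing.
#[derive(Clone, Copy)]
struct Conn {
    src_ip: u32,
    dst_ip: u32,
    bytes: u64,
}

fn main() {
    // 16 bytes per entry, no per-field heap pointers or object headers.
    assert_eq!(size_of::<Conn>(), 16);

    let table: Vec<Conn> = vec![Conn { src_ip: 0, dst_ip: 0, bytes: 0 }; 1024];

    // Entries sit adjacent in memory, so iteration walks the cache linearly.
    let base = table.as_ptr() as usize;
    let second = &table[1] as *const Conn as usize;
    assert_eq!(second - base, size_of::<Conn>());

    println!("Conn is {} bytes, stored contiguously", size_of::<Conn>());
}
```

In a typical dynamic-language runtime each of those fields would be a tagged or boxed value reached through a pointer, which is exactly the cache-unfriendly shape the article seems to be moving away from.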

reply
jshier
1 hour ago
[-]
They posted about the Rust rewrite last year. https://blog.cloudflare.com/20-percent-internet-upgrade/
reply
zozbot234
2 hours ago
[-]
The gains in lower memory footprint and lower demands on memory bandwidth from rewriting stuff to Rust are very real, and they're going to matter a lot with DRAM prices being up 5x or more. It doesn't surprise me at all that they would be getting these results.
reply
alberth
2 hours ago
[-]
It seems like the unspoken takeaway is just how shockingly performant LuaJIT is, even relative to Rust.
reply
anitil
1 hour ago
[-]
They said they're replacing 15 years of Nginx+Lua; that's a testament to how good it can be.
reply
synack
1 hour ago
[-]
Is the Linux scheduler aware of shared CPU cache hierarchies? Is there any way to make the scheduler achieve better cache utilization without pinning processes to cores or offloading these decisions to vendor-specific code?
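The topology is at least visible to userspace: Linux publishes which CPUs share each cache level in sysfs (e.g. `/sys/devices/system/cpu/cpu0/cache/index3/shared_cpu_list`, where index3 is usually L3), in the kernel "cpulist" format like "0-7,64-71". A rough sketch of reading it — the path and index are assumptions about a typical x86 Linux box:

```rust
use std::fs;

/// Parse a kernel "cpulist" string such as "0-7,64-71" into CPU ids.
fn parse_cpulist(s: &str) -> Vec<usize> {
    let mut cpus = Vec::new();
    for part in s.trim().split(',').filter(|p| !p.is_empty()) {
        match part.split_once('-') {
            // A "lo-hi" range expands to every CPU id in it, inclusive.
            Some((lo, hi)) => {
                let (lo, hi): (usize, usize) =
                    (lo.parse().unwrap(), hi.parse().unwrap());
                cpus.extend(lo..=hi);
            }
            // A bare number is a single CPU id.
            None => cpus.push(part.parse().unwrap()),
        }
    }
    cpus
}

fn main() {
    assert_eq!(parse_cpulist("0-3"), vec![0, 1, 2, 3]);
    assert_eq!(parse_cpulist("0-1,8-9"), vec![0, 1, 8, 9]);

    // On Linux, show which CPUs share cpu0's L3; silently skip elsewhere.
    let path = "/sys/devices/system/cpu/cpu0/cache/index3/shared_cpu_list";
    if let Ok(list) = fs::read_to_string(path) {
        println!("cpu0 shares its L3 with CPUs {:?}", parse_cpulist(&list));
    }
}
```

So the information the scheduler would need is already exported; the open question is whether the scheduler itself uses it well enough that userspace pinning becomes unnecessary.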
reply
AbuAssar
41 minutes ago
[-]
Epyc’s naming is beautiful and consistent
reply
attentive
2 hours ago
[-]
That was annoying to read because there's no easy way to see the impact of each change. It's FL2 + Gen 13 combined.

I.e., what's the FL2 benchmark on Gen 12 compared to FL1?

reply
trhway
2 hours ago
[-]
Reminds me of that time when the cheap Celeron with a small cache was beating the expensive Pentium with a large cache (if I remember correctly, the Celeron's cache ran at core frequency while the Pentium's was on a separate die at half frequency, and the Celeron was very overclockable).
reply
zozbot234
2 hours ago
[-]
Lower cache per core is actually a pretty natural outcome, with the latest fabrication nodes shrinking logic while leaving SRAM size largely unchanged. We may also see eDRAM (a lot denser than SRAM) used for last-level caches.
reply
BoredPositron
2 hours ago
[-]
The first Pentium 4 release. AMD had the same gimmick with their Phenoms.
reply
danpalmer
3 hours ago
[-]
"The tradeoff:" "The opportunity:" "Proving it out:"

Nah, I'm good, thanks. Slop takes more effort to read and just raises questions of accuracy. It's disrespectful to your reader to put that work on them, and in a marketing blog post it's just a bad idea.

reply
montyanne
2 hours ago
[-]
Cloudflare has excellent (human) technical writers. I don’t see any indication this is “slop”, it’s the standard in-the-weeds but understandable blog post they’ve been doing for years.

AI text is everywhere, but this isn’t it.

reply
refulgentis
1 hour ago
[-]
This is AI, but I can't prove it, lol. :) The bulleted lists are too short, both in total list length and in text per list item. Little drama headers, as the parent noted.

To your point, this would have registered as "human subtly bloviating for word count" if LLMs didn't exist, and at this point that's probably the most useful feedback. I doubt it's 100% one-shot AI, but someone definitely optimized it in parts, and the AI heard "concise" as "bullets and short sentences."

reply
rustystump
2 hours ago
[-]
Agree to disagree. It was likely AI-enhanced somewhere along the path to production. So many phrases reek of AI, but others do not. Is this a sprinkling of LLM help or how a human genuinely writes? Idk.
reply
abuani
2 hours ago
[-]
Out of curiosity, can you point to specific sections that reek of AI? I read the article and didn't see anything that immediately stuck out, but maybe I need to start looking for different signals.
reply
rustystump
2 hours ago
[-]
“deliver more than just a core count increase. The architecture delivers improvements across multiple dimensions”

“But we didn't just assume it would be a problem; we measured it.”

“Instead of compromising, we built FL2.”

Idk if I am now seeing this pattern everywhere because it is all AI slop or if people really do write this way.

Skimming it, this looks like they got a partnership with AMD and tacked it onto an ongoing project as if it were planned. That muddies things: it's harder to tell how much of the gain was the rewrite in general and how much was the hardware. Man, I used to really enjoy Cloudflare's technical blogs.

reply