I Put a Datacenter GPU in My Gaming PC for £200

56 points by birdculture 56 minutes ago | 11 comments | blog.tymscar.com

▲

mickeyp

5 minutes ago

[-]

Impressive work. But the problem is not the 30 tok/s, which is fine for agentic coding and chat.

It's prefill; slow prefill kills agentic workloads dead.

If you have 100,000 tokens at ~150tok/s per the OP, you're looking at:

    You have: 100000 / (150/s)

    You want: hms

     11 min + 6.6666667 sec

Which is quite a wait indeed.
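The same arithmetic as a minimal Python sketch, assuming prefill throughput stays constant at the ~150 tok/s quoted above (real prefill rates usually vary with context length). The function name `prefill_time` is illustrative, not from the article.

```python
def prefill_time(prompt_tokens: int, prefill_tok_per_s: float) -> tuple[int, float]:
    """Return (minutes, seconds) to process a prompt at a constant prefill rate."""
    if prefill_tok_per_s <= 0:
        raise ValueError("prefill rate must be positive")
    total_seconds = prompt_tokens / prefill_tok_per_s
    # Split the total into whole minutes and leftover seconds.
    minutes, seconds = divmod(total_seconds, 60)
    return int(minutes), seconds


# Figures from the comment: 100,000 prompt tokens at ~150 tok/s (assumed constant).
minutes, seconds = prefill_time(100_000, 150)
print(f"{minutes} min + {seconds:.1f} sec")
```

Swapping in a different prompt size or prefill rate shows how quickly the wait grows for long agentic contexts.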

▲

Teknomadix

10 minutes ago

[-]

Tesla V100 SXM2 16GB is NOT DGX class as the author writes. It's HGX class. The V100 comes in two classes, SXM2 and SXM4, the latter coming with a max of 80GB of onboard memory. Typically these are installed as 8×A100 80GB SXM4 on an HGX riser, and what that gives you is NVSwitch fabric and 640GB of pooled HBM2e (on-package stacked memory w/ ~2 TB/s of memory bandwidth). 2U standard rack footprint too.

▲

omarqureshi

9 minutes ago

[-]

Could probably avoid the crazy fan with a waterblock - I've seen a whole kit, V100 + PCIe adapter + block, for £235. Yes, you'll have to pay for a pump, radiators and radiator fans, but that should really quieten it down.

▲

mondainx

21 minutes ago

[-]

Great write-up, I've often considered these DC cards for a project and now you've convinced me to pick one up; you describe the price of the unit against what one spends on tokens and that does it for me.

▲

lucamark

27 minutes ago

[-]

Congrats! Most people won’t want to debug drivers, kernels, ACPI, adapters, and fan headers. But for those who do, the capability-per-pound is absurd.

▲

matja

22 minutes ago

[-]

The AMD MI250X GPUs are also interesting - 128GB of HBM2E at 3TB/s, and you sometimes see them second-hand for under $1k. The catch, obviously, is that they need an OAM socket. Never seen an easy way to hook them up to a regular mainboard.

▲

Teknomadix

7 minutes ago

[-]

These are interesting, and offer beefy throughput. No point in adapting them to a PCIe slot though; they'd be stuck behind the slot-bus bottleneck.

▲

jmyeet

22 minutes ago

[-]

Some context:

- In 2017, the V100 was a ~$10,000 GPU. I believe there was a PCIe version, but this is probably so cheap because SXM2 is going to be harder to use;

- A 5090 has 1800GB/s of internal memory bandwidth (compared to 900GB/s in the 9-year-old GPU). Of course a 5090 is substantially more expensive;

- A 5090 has ~21k CUDA cores vs ~5k;

- The current $10k NVidia GPU is the RTX 6000 Pro w/ 96GB of VRAM. It has slightly more CUDA cores but is otherwise pretty much just a 5090. This is unsurprising. NVidia uses VRAM for market segmentation.

Consider this: in 5-10 years, the trillions spent on AI data centers will likewise be sold for scrap most likely. That's how short the runway is for OpenAI and Anthropic to recover that investment.

Anyway, I'm kind of impressed the author managed to get this all to work. I don't think it even would've occurred to me that someone had made an SXM2 adapter, particularly because it's not even used anymore. Like props to whoever did that.

▲

echelon

6 minutes ago

[-]

> Consider this: in 5-10 years, the trillions spent on AI data centers will likewise be sold for scrap most likely. That's how short the runway is for OpenAI and Anthropic to recover that investment.

Even more interesting: it'll devalue all of SaaS and the entire US tech sector.

We might have just shot our most valuable non-AI tech products in the foot.

▲

b112

13 minutes ago

[-]

I bet 3 years, but otherwise agree.

▲

recursivegirth

7 minutes ago

[-]

> The compute is still real. The VRAM is still real. And the memory bandwidth is where it gets genuinely surprising.

Had to stop there. Annoying. I can't stand AI use for writing. It makes any otherwise great article feel so disingenuous.

▲

m0rde

5 minutes ago

[-]

What a difficult world you must live in these days

▲

peddling-brink

1 minute ago

[-]

While I don’t disagree with their sentiment, I’m far more annoyed with it than the AI writing.

▲

casey2

18 minutes ago

[-]

Some resell group is going to have to make this easier. The sheer amount of these cards otherwise heading towards the landfill is staggering. That is, if Big Tech don't destroy them to prevent model weights from leaking.

▲

Alifatisk

6 minutes ago

[-]

> The sheer amount of these cards otherwise heading towards the landfill is staggering.

The thought of throwing away working cards sounds so bizarre to me. I can't believe companies would dispose of them in a landfill like that; they are at least worth giving away for reuse.

▲

eric__cartman

14 minutes ago

[-]

How would destroying the GPUs prevent the model weights from leaking? By the time you get your hands on them the memory is powered off for a long enough time that a cold-boot style attack is impossible.

▲

lelanthran

26 minutes ago

[-]

> The compute is still real. The VRAM is still real. And the memory bandwidth is where it gets genuinely surprising.

Because humans write exactly like this /s

▲

postalrat

20 minutes ago

[-]

Where do you think LLMs learned to write that way?

▲

jlund-molfese

3 minutes ago

[-]

You can also look at past posts by the same author (before LLM usage proliferated) if you’re curious.

The project is still very cool, but it’s a little less enjoyable to read when everything sounds the same. It would be just as annoying for people to manually write in a corporate/marketing style, because humanity is what makes the small web interesting.

https://blog.tymscar.com/posts/privategithubcicd/

▲

lelanthran

2 minutes ago

[-]

> Where do you think LLMs learned to write that way?

Not from individual human content, that's for sure - maybe MLM marketing copy? Sleazy 4AM ads?

I mean, every time this response comes up, I keep asking the person to point at something written prior to 2022 that gets 80%+ on the LLM detectors, and yet no one can find anything.

Maybe you, postalrat, can find something written in this style that was published prior to 2022.

▲

alehlopeh

5 minutes ago

[-]

Marketing content.

▲

driverdan

4 minutes ago

[-]

There's interesting stuff in this writeup but it sure seems like most of it was written by an LLM.

▲

bossyTeacher

11 minutes ago

[-]

X is Y. Z is Y. And Alpha is genuinely Beta.

Classic LLM writing style.

▲

knollimar

22 minutes ago

[-]

A little bit of local copium but neat read.

Isn't a Raspberry Pi with 16GB of RAM $300 now?

▲

thejj100100

8 minutes ago

[-]

I don't understand what point you're trying to make here? Are you talking about the price of RAM?

▲

matja

11 minutes ago

[-]

The latest Raspberry Pi 5 has one 32-bit channel (2x 16-bit subchannels) of LPDDR4X-4267 SDRAM giving 17.1GB/s of bandwidth, 52x less than this GPU. Never mind lacking the CUDA and Tensor cores, so the FP16 performance is 102x less (307 GFLOPS vs 31.4 TFLOPS). So for £200, there's absolutely no comparison for this specific use-case.
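A rough sketch of where those ratios come from, assuming peak bandwidth is simply transfer rate times bus width, and taking the V100's 900GB/s from the figure quoted elsewhere in the thread. The helper name `peak_bandwidth_gb_s` is illustrative.

```python
def peak_bandwidth_gb_s(transfers_mt_s: float, bus_width_bits: int) -> float:
    """Theoretical peak DRAM bandwidth in GB/s: transfers/s times bytes per transfer."""
    bytes_per_transfer = bus_width_bits / 8
    return transfers_mt_s * 1e6 * bytes_per_transfer / 1e9


# Raspberry Pi 5: LPDDR4X-4267 on one 32-bit channel (figures from the comment).
pi5_bw = peak_bandwidth_gb_s(4267, 32)

# V100: 900 GB/s as quoted in the thread (assumed here, not derived).
v100_bw = 900.0

# FP16 throughput figures from the comment, in FLOPS.
pi5_fp16 = 307e9
v100_fp16 = 31.4e12

print(f"Pi 5 bandwidth: {pi5_bw:.1f} GB/s")
print(f"Bandwidth ratio: {v100_bw / pi5_bw:.1f}x")
print(f"FP16 ratio: {v100_fp16 / pi5_fp16:.1f}x")
```

These are theoretical peaks; sustained numbers on either device will be lower.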