Serving AI from the Basement – 192GB of VRAM Setup
318 points
4 months ago
| 28 comments
| ahmadosman.com
XMasterrrr
4 months ago
Hey guys, this is something I have been intending to share here for a while. This setup took me some time to plan and put together, and then some more time to explore the software part of things and the possibilities that came with it.

A big part of the reason I built this was data privacy: I do not want to hand over my private data to any company to further train their closed-weight models. And given the recent drop in output quality on different platforms (ChatGPT, Claude, etc.), I don't regret spending the money on this setup.

I was also able to do a lot of cool things with this server by leveraging tensor parallelism and batch inference, generating synthetic data, and experimenting with fine-tuning models on my private data. I am currently building a model from scratch, mainly as a learning project, but I am also finding some cool things along the way, and if I can iron out the kinks, I might release it and write a tutorial from my notes.

So I finally had the time this weekend to get my blog up and running, and I am planning to follow up this blog post with a series of posts on my learnings and findings. I am also open to topics and ideas to experiment with on this server and write about, so if you have ideas you want to try but don't have the hardware, feel free to shoot your shot; I am more than willing to run them on your behalf and share the findings.

Please let me know if you have any questions, my PMs are open, and you can also reach me on any of the socials I have posted on my website.

mattnewton
4 months ago
The main thing stopping me from going beyond 2x 4090s in my home lab is power. Anything around ~2k watts on a single circuit breaker is likely to flip it, and that’s before you get to the cost of drawing that much power for multiple days of a training run. How did you navigate that in a (presumably) residential setting?

J_Shelby_J
4 months ago
I’m running two 3090s on a 700W PSU. You can definitely get more than that out of a 2000W circuit.

I wrote a blog post on reducing the power limits of NVIDIA GPUs. Definitely try it out. https://shelbyjenkins.github.io/blog/power-limit-nvidia-linu...
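A minimal sketch of what such a power cap looks like, assuming the standard `nvidia-smi` CLI (`-pm 1` turns on persistence mode, `-i` picks a GPU index, `-pl` sets the board power limit in watts; applying it needs root). The GPU indices and the 250 W figure are illustrative, not values taken from the linked post. By default the script only prints the commands.

```python
import subprocess


def power_limit_commands(gpu_ids, watts):
    """Build the nvidia-smi invocations that cap each listed GPU at `watts`."""
    # Persistence mode keeps the driver loaded so settings are not dropped between jobs.
    cmds = [["nvidia-smi", "-pm", "1"]]
    for gpu in gpu_ids:
        cmds.append(["nvidia-smi", "-i", str(gpu), "-pl", str(watts)])
    return cmds


def apply(cmds, dry_run=True):
    """Print the commands, or execute them when dry_run is False (needs root and the NVIDIA driver)."""
    for cmd in cmds:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical example: two 3090s capped at 250 W each.
    apply(power_limit_commands([0, 1], 250))
```

The allowed range differs per card; `nvidia-smi -q -d POWER` reports the minimum and maximum enforceable limits before you pick a value.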

XMasterrrr
4 months ago
Hey man, I think I came across your blog at some point before while trying to figure out my own power plan for this beast (check my comment to OP for more context), so kudos to you for that.

I would say that power limiting is a potential workaround, and it should work perfectly fine for inference, but when it comes to training you will want to squeeze out every ounce of power. So it depends on your goal.

What CPU/motherboard/storage are you running with those two 3090s for a 700W PSU to work? I'd say that if at any point you're pushing more than 500W out of that PSU, you might be violating the 80% safety rule. I would have used at least an 850W unit just to be safe with two 3090s plus the rest of the hardware.
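The 80% rule of thumb is plain arithmetic: keep sustained draw at or below 0.8 × the PSU rating (560 W for a 700 W unit, 680 W for an 850 W one). A small sketch, with made-up component numbers (two 3090s capped at 250 W each plus about 150 W for everything else) standing in for a real wall-meter measurement:

```python
def psu_load(psu_watts, component_watts, max_fraction=0.8):
    """Return (total draw in W, fraction of the PSU rating, True if within max_fraction)."""
    total = sum(component_watts)
    fraction = total / psu_watts
    return total, fraction, fraction <= max_fraction


if __name__ == "__main__":
    # Illustrative only: two 3090s capped at 250 W plus ~150 W for CPU, board, drives and fans.
    parts = [250, 250, 150]
    for psu in (700, 850):
        total, frac, ok = psu_load(psu, parts)
        print(f"{psu} W PSU: {total} W draw is {frac:.0%} of rating, within 80%: {ok}")
```

With these assumed numbers, 650 W is about 93% of a 700 W unit but about 76% of an 850 W one.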

smcnally
4 months ago
Thank you for this post. I’d read it around June, and it helped quite a bit with manual ‘nvidia-smi’ runs. I just recently created the systemd service unit and am still delving into the related power and performance possibilities.
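For reference, a sketch of the kind of oneshot unit people use so power limits are re-applied at boot (limits set with `nvidia-smi -pl` do not survive a reboot). The unit contents, the `/usr/bin/nvidia-smi` path, the unit file name, and the 250 W value are assumptions, not smcnally's actual service.

```python
def render_power_limit_unit(gpu_ids, watts, nvidia_smi="/usr/bin/nvidia-smi"):
    """Render a oneshot systemd unit that re-applies GPU power limits at boot."""
    lines = [
        "[Unit]",
        "Description=Set NVIDIA GPU power limits",
        "",
        "[Service]",
        "Type=oneshot",
        f"ExecStart={nvidia_smi} -pm 1",
    ]
    # Type=oneshot allows several ExecStart= lines; systemd runs them in order.
    lines += [f"ExecStart={nvidia_smi} -i {gpu} -pl {watts}" for gpu in gpu_ids]
    lines += ["", "[Install]", "WantedBy=multi-user.target", ""]
    return "\n".join(lines)


if __name__ == "__main__":
    # Save the output as e.g. /etc/systemd/system/gpu-power-limit.service (hypothetical name),
    # then run: systemctl daemon-reload && systemctl enable --now gpu-power-limit.service
    print(render_power_limit_unit([0, 1], 250))
```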
tcdent
4 months ago
I can't believe a group of engineers are so afraid of residential power.

It is not expensive, nor is it highly technical. It's not like we're factoring in latency and crosstalk...

Read a quick howto, cruise into Home Depot and grab some legos off the shelf. Far easier to figure out than executing "hello world" without domain expertise.

gizmo686
4 months ago
A good engineer knows the difference between safe and dangerous. Setting up an AI computer is safe. Maybe you trip a circuit. Maybe you interfere with something else running on your hobby computer. But nothing bad can really happen.

Residential electrical is dangerous. Maybe you electrocute yourself. Maybe you cause a fire 5 years down the line. Maybe you cause a fire for the next owner because you didn't know to protect the wire with a metal plate, and they drill into it.

Having said that, 2 4090s will run you around $5,000, not counting any of the surrounding system. At that price point, hiring an electrician would not be that big of an expense, relatively speaking.

Also, if you are at the point where you need to add a circuit for power, you might need to seriously consider cooling, which could potentially be another side quest.

XMasterrrr
4 months ago
I agree with you on all of that. I went down the rabbit hole to understand what's up, but I also hired someone and told them exactly what I wanted: breaker amps and volts, outlet type, a surge protector over the entire breaker box rated up to 120k, etc. (I am going to be writing about power and electricity in part 3 of this blog post series.) Electricity was at the top of the list of things I was not going to cheap out on, because the risk vs. reward made no sense to me.

Re: cooling: I have an AC vent pointed at the setup, plus I planned out the intake and exhaust airflow as well as I could to maximize cooling. I have installed like 20 more fans since taking these pictures :D

vunderba
4 months ago
Just a slight clarification: an RTX 4090 currently runs about $1,700 USD, at least in the States, so two of them are more like $3,500 pre-tax.

XMasterrrr
4 months ago
A year ago it was a struggle to get one for anything below $2,100 pre-tax. Glad it came down a bit.

tjoff
4 months ago
Add to that the fact that it is likely illegal to do yourself, which of course has implications for insurance etc.

m-s-y
4 months ago
In the US, it’s fully legal to perform electric/plumbing/whatever work on your own home.

If you screw it up and need to file a claim, insurance can’t deny the claim based solely on the fact that you performed the work yourself, even if you’re not a certified electrician/plumber/whatever.

What you don't want to do is have an unlicensed friend work on your home, and vice versa. There are no legal protections, and the insurance companies absolutely will go after you/your friend for damages.

Edit: sorry, this applies to owned property, not if you’re renting.

earleybird
4 months ago
In my jurisdiction I can certainly do the work, but I am under the same requirements to pull a permit and pass a provincial inspection. It very quickly becomes most effective to have an electrician involved, maybe not for all the work but for some of it. They're more than willing to review the work you do and talk about it. Think of it as pair coding: a great opportunity to learn, and they'll tell you when you've done a good job (at least the ones I've found will).

gizmo686
4 months ago
Around here, the bar is lower for work on your own property, but you still need to be qualified by the county to be allowed to do so. Qualification consists of a 2-hour open-book exam, where the book is a copy of the National Electrical Code.

https://www.montgomerycountymd.gov/DPS/Process/combuild/home...

Granted, if you actually do unlicensed work in your house, no one will know. But it is still illegal.

jefftk
4 months ago
Depends on the state and municipality. Mine doesn't allow homeowners to pull electrical permits.

scubbo
4 months ago
Extraordinary claims require extraordinary evidence.

zamadatix
4 months ago
As with most regulations in the "US", I have a feeling the answer is really something like "Depending on the city and state you live in, the answer lies somewhere between 'go nuts' and 'that could lead to criminal charges and you being liable for everything that happens to the house and your neighbor's kitchen sink'".

defrost
4 months ago
It's like that in Australia: liability and insurance hinge on licensed work by trade-qualified professionals.

What is common here, in the handy crowd at least, is to do your own electrical, plumbing, and gas work and leave it open and accessible for a licensed professional to check and sign off on.

You're still paying for an hour or two of their time and a surcharge for "taking on the responsibility", but it's often not an issue if the work is clean, up to current code, and passes sanity tests (correct wiring, correct angles on plumbing, pressure testing on gas pipes).

tourmalinetaco
4 months ago
It’s hardly an extraordinary claim. Just because you can’t install a ceiling fan doesn’t mean it’s an “extraordinary” feat that is “likely illegal”.

scubbo
4 months ago
> insurance can’t deny the claim based solely on the fact that you performed the work yourself

_This_ is the claim that is extraordinary. I'm not saying that the government would bust down my door for doing work on my own home, but rather that the insurance company would then view that work as uninsured.

The entire business model of insurance agencies is to find new, creative, and unexpected ways to deny claims. That is how they make their money. To claim that they would accept liability for a property that's had uninspected work done by an unlicensed, untrained, unregistered individual is just that - extraordinary.

bigiain
4 months ago
> Also, if you are at the point where you need to add a circuit for power, you might need to seriously consider cooling, which could potentially be another side quest.

There should be an easy/reliable way to channel "waste heat" from something like this to your hot water system.

Actually, 4 or 5 kW continuous is a lot more than most domestic hot water services need. So in my usual manner of overcomplicating simple ideas, now I want to use the waste heat to run a boiler driving a steam engine, perhaps to directly mechanically run your air conditioning or heat pump compressor.
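A back-of-envelope check of the "a lot more than most domestic hot water services need" point, using the specific heat of water (about 4186 J/(kg·K), with 1 L taken as 1 kg). The 15 C inlet and 55 C outlet temperatures and the assumption that every watt of waste heat reaches the water are illustrative; a real heat exchanger captures much less.

```python
SPECIFIC_HEAT_WATER = 4186.0  # J/(kg*K), roughly constant between 10 and 60 C
SECONDS_PER_DAY = 24 * 3600


def litres_heated_per_day(watts, t_in_c=15.0, t_out_c=55.0):
    """Litres per day a continuous heat source could raise from t_in_c to t_out_c.

    Assumes 1 L of water is about 1 kg and that all of the heat ends up in the water.
    """
    joules_per_litre = SPECIFIC_HEAT_WATER * (t_out_c - t_in_c)
    return watts * SECONDS_PER_DAY / joules_per_litre


if __name__ == "__main__":
    for kw in (4, 5):
        print(f"{kw} kW -> about {litres_heated_per_day(kw * 1000):.0f} L/day from 15 C to 55 C")
```

Under these assumptions 4 kW works out to roughly 2,060 L/day and 5 kW to roughly 2,580 L/day, several times the 150 to 300 L (40 to 80 US gallon) capacity of a typical residential tank heater.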

raxxorraxor
4 months ago
Instant water heaters use up to and sometimes even more than 27kW. Of course boilers use less, but still...

These power requirements aren't insurmountable. They would get pricey, though, and I wish my computing rig used something around 0.1kW under load...

Using the heat from PCs would be nice. I guess most just use them as electrical heaters right now.

XMasterrrr
4 months ago
Let me know if you figure it out; I would be really interested, hahaha

BizarroLand
4 months ago
I'm doing this myself now. I have a homelab server setup and a hybrid water heater.

I stuffed the homelab next to the water heater's air intake, so now when I need hot water, my water heater sucks the heat out of the air and puts it into the water.

It's obviously not 100% efficient, but at least it recaptures some of the waste heat and decreases my electrical bill somewhat.

lolinder
4 months ago
> I can't believe a group of engineers are so afraid of residential power. ... Read a quick howto, cruise into Home Depot and grab some legos off the shelf. Far easier to figure out than executing "hello world" without domain expertise.

The instinct not to touch something that you don't yet deeply understand is very much an engineer's instinct. Any engineer worthy of the title has often spent weeks carefully designing a system to take care of the hundreds of edge cases that weren't apparent at a quick glance. Once you've done that once (let alone dozens of times), you have a healthy respect for the complexity that usually lurks below the surface, and you're loath to confidently insert yourself into an unfamiliar domain that has a whole engineering discipline dedicated to it. You understand that those engineers are employed full time for a reason.

The attitude you describe is one that's useful in a lot of cases and may even be correct for this particular application (though I'm personally leery of it), but if confidently injecting yourself into territory you don't know well is what being an "engineer" means to you, that's a sad commentary on the state of software engineering today.

tcdent
4 months ago
Sir, this is "Hacker News".

lolinder
4 months ago
So did you mean "I can't believe a group of hackers are so afraid of residential power"?

3eb7988a1663
4 months ago
People can and do die from misuses of electricity. Not a move-fast-and-break-things kind of domain.

varispeed
4 months ago
You only "break" once...

lbotos
4 months ago
I've been learning Japanese and a favorite of mine is: 一体

It's used to mean "what the heck", but its literal kanji translation is "one body".

https://jisho.org/word/%E4%B8%80%E4%BD%93

varispeed
4 months ago
Fun fact: as a kid I stuck my fingers into some loose mains wires while we were playing at an unfinished building. The wires were live, and I still remember it felt like it was going to break my arm. Fortunately I only got a slight burn. This got me interested in electronics, which I went on to study later in life.

fhdsgbbcaA
4 months ago
You’re forgetting that many people have landlords who aren’t exactly keen on tenants doing DIY electrical work.

tourmalinetaco
4 months ago
I’m hardly surprised: this is primarily a programming discussion website, and the highest voltage read on an average day here is in mV. It’s natural to be leery of things you have no experience in.

fragmede
4 months ago
Your car is 12 volts, and USB is 5 volts (12, or up to 20, these days for laptop charging). My computer's CPU is probably around 1.8 volts, though I can't remember the last time I put a multimeter on it; that's still more than millivolts.

raxxorraxor
4 months ago
Probably meant milliampere, specifically 1 milliampere. But yes, usually lightweight engineers are familiar with TTL and limit themselves to 5V. 12V+ is another arcane realm you don't want to touch.

Some old serial ports had 12V and a high max current. The DIY things you attached there were prone to killing your mainboard.

Voltage/current is either 0 or 1. Anything higher kills software developers instantly.

matt-p
4 months ago
In telecoms, 48V DC is very common and not always even connectorised! It's "safe-ish", but DC makes me more nervous than 240V. Big, thick 400A cables into a rack are quite intimidating to see, but the main issue is that DC is sticky and doesn't have the safety protections of RCDs etc. Indeed, you are lucky to get a working isolator.

littlestymaar
4 months ago
That's technically correct, but irrelevant: you cannot kill yourself with 12 or 20V any more than with 10mV. 120V or 230V is another story.

That being said, it's still very easy not to kill yourself with 120/230V: just shut down the power before touching anything.

wpietri
4 months ago
Ah yes, the "move fast and burn your house down" school of "engineering".

phil21
4 months ago
Adding a bog-standard breaker and a short conduit run is about as simple as it gets for electric work. It’s rather low risk if you simply read the code and follow it.

If you know nothing about basic electric work or principles, sure - spend the $500 to have an electrician add a 30 or 50A 220V outlet near your electric service panel. Totally reasonable to do as it is indeed dangerous to touch things you don’t understand.

It’s far less complex and less dangerous than adding an EV charge point to your garage, which seems to be quite common for this crowd. It's the same complexity as adding a drop for an electric stove, or less, since you typically have a lot more flexibility on where to locate the outlet and likely don’t need to pull through walls.

Where the “home electric hackers” typically tend to get in trouble is doing stuff like adding their own generator connection points and not properly doing interlocks and all that fun stuff.

If you can replace your own light switches and wall receptacles you are just one step away from adding an additional branch circuit. Lots of good learning material out there on the subject these days as well!

wpietri
4 months ago
I'm not saying people shouldn't add breakers. I'm saying that talking like people are scaredy-cats, and comparing it to working with toys or hello world, is exactly the kind of macho nonsense that leads people to do shoddy engineering.

As a hobby, I restore pinball machines. A modern one is extremely careful about how it uses power, limiting wall current to a small, normally-sealed section of the machine. And even so, it automatically disables the lower-voltage internals the moment you open the coin door. A 1960s machine, by contrast, may not have a ground at all. It may have an unpolarized plug, and it will run wall current all over the place, including the coin door, one of the flippers, and a mess of relays.

In the pinball community, you'll find two basic attitudes toward this. One is people treating electrical safety about as seriously as the people who design the modern machines. The other is people who think anybody who worries about a little wall current is a pussy who doesn't have the balls to work on anything and should just man up and not worry about a little 120V jolt.

The truth is that most people here are not engineers of any sort. We're software developers. We're used to working in situations where safety and rigor basically don't matter, where you have to just cowboy ahead and try shit. And that's fine, because control-z is right there. I've met people who bring that attitude to household electrical work, and they're fucking dangerous. I know one guy, quite a smart one, who did a lot of his own electrical work based on manliness and arrogance, and once the inspector caught up with him, he immediately pulled the guy's meter and wouldn't let him connect up to the grid again until a real electrician had straightened it all out.

It's true that this stuff is not that hard to learn if you study it. But an architect friend likes to say that the building code is written in blood, meaning that much of it is there because confident dumbasses managed to kill enough people that they had to add a new rule. If people are prepared to learn the rules and appreciate why they're there, I'm all for it. But if they do it coming from a place of proving that they're not "so afraid of residential power", that's a terrible way to approach it.

matt-p
4 months ago
To be fair, I'd be quite a bit more relaxed working on 120V. Very surprised these machines don't run on DC internally?

wpietri
4 months ago
In the older ones, it's almost all AC. One giant transformer, a couple of different voltages. Possibly with "high tap", a way to compensate for lower-than-expected wall voltage. The past is another country.

pupdogg
4 months ago
You can run a setup of 8x 4090 GPUs using 4x 1200W 240V power supplies (preferably HP HSTNS-PD30 Platinum Series), with a combined draw of around 20 amps at full load, meaning it can easily run on a single 240V 20-amp breaker. This should be easily doable in a home where you typically have a 100 to 200A main power panel. Running 4x 1200W power supplies at full load 24 hours a day will consume 115.2 kWh per day. At an electricity rate of $0.12 per kWh, this will cost approximately $13.82 per day, or around $414.72 per month.
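The figures above can be reproduced with a few lines, assuming all four PSUs sit at their full 1200 W nameplate around the clock (a worst case; real draw is usually lower) and a 30-day month. The $0.50/kWh line uses the California figure from the reply below. Note that a steady 20 A is right at the rating of a 20 A breaker; under the common 80% rule for continuous loads, that circuit is good for about 16 A, so treat the breaker sizing as something to confirm with an electrician.

```python
def draw_amps(watts, volts):
    """Current drawn by a load: I = P / V."""
    return watts / volts


def energy_cost(watts, hours_per_day, price_per_kwh, days=30):
    """Return (kWh per day, cost per day, cost per `days`-day month)."""
    kwh_per_day = watts * hours_per_day / 1000
    per_day = kwh_per_day * price_per_kwh
    return kwh_per_day, per_day, per_day * days


if __name__ == "__main__":
    load_w = 4 * 1200  # four PSUs at full nameplate rating, a worst case
    print(f"Current at 240 V: {draw_amps(load_w, 240):.1f} A")
    for price in (0.12, 0.50):
        kwh, day, month = energy_cost(load_w, 24, price)
        print(f"${price:.2f}/kWh: {kwh:.1f} kWh/day, ${day:.2f}/day, ${month:.2f}/30 days")
```

At $0.12/kWh this gives 20.0 A, 115.2 kWh/day, $13.82/day and $414.72 per 30 days; at $0.50/kWh the same load costs $57.60/day.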

FYI, I can handle electrical system design and sheet metal enclosure design/fabrication for these rigs, but my software knowledge is limited when it comes to ML. If anyone's interested, I'd love to collaborate on a joint venture to produce these rigs commercially.

google234123
4 months ago
In Cali, isn't it like $0.50 per kWh now :P

pupdogg
4 months ago
Wow!

XMasterrrr
4 months ago
Yeah, you get it. I am using 3x 1600W 240V Platinum PSUs from Super Flower (they're the manufacturer of EVGA's PSUs; top-notch stuff, I did my homework). However, I decided against server PSUs like the one you suggested because of the extra setup overhead and the noise.

As I mentioned in my reply to OP, it's very doable as long as you do your research. The only thing I did not do was the installation itself, because I was not comfortable with it, but I pretty much spelled out everything to the contractor, and he did the installation exactly the way I would have gone about it.

Hit me up on Twitter or email; we can chat about ideas for this venture.

swader999
4 months ago
You'll have helicopters over your house.

pupdogg
4 months ago
Believe it or not, a buddy of mine was dealing with crazy high power bills for three months and couldn’t figure out why. He tried everything to cut back, but nothing worked. Finally, he called an electrician, who found a hidden short in a 30-amp circuit that was constantly drawing power without tripping the breaker. After fixing the issue, his bills went back to normal, and no helicopters were involved in the process!

fennecbutt
4 months ago
And all that heat was going...where?

w-ll
4 months ago
Good thing I can see them coming, and I'm happy to offer anyone a cup of coffee/tea so they understand I'm just the biggest nerd in the HOA.

defrost
4 months ago
Already have to deal with Robinson R22s during mustering season; a few more won't hurt.

orbital-decay
4 months ago
>Anything around ~2k watts on a single circuit breaker is likely to flip it

I'm curious: how do you use, e.g., a washing machine or an electric kettle if 2kW is enough to flip your breaker? You should simply know your wiring limits. The breakers and wiring in my home won't even notice this.

trillic
4 months ago
My kettle only pulls 1500W, as do most in the US. Our water just takes longer to boil than in Europe. My washer/dryer has its own 30A breaker, as do my oven and water heater. My garbage disposal has its own 15A breaker.

Boiling 1 liter takes like 2 mins. Most Americans don’t have kettles because they don’t drink tea.
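For a sense of the physics behind the 1500 W vs 3 kW debate below: the minimum energy to boil water is mass × specific heat × temperature rise. This sketch assumes tap water starts at 20 C and, by default, no heat losses, so real kettles will be somewhat slower. Under those assumptions 1 L needs about 3.7 minutes at 1500 W and about 1.9 minutes at 3 kW.

```python
SPECIFIC_HEAT_WATER = 4186.0  # J/(kg*K)


def minutes_to_boil(litres, watts, t_start_c=20.0, efficiency=1.0):
    """Time to bring water from t_start_c to 100 C.

    Treats 1 L of water as 1 kg; efficiency < 1 models heat lost to the kettle body and the air.
    """
    joules = litres * SPECIFIC_HEAT_WATER * (100.0 - t_start_c)
    return joules / (watts * efficiency) / 60


if __name__ == "__main__":
    for watts in (1500, 3000):
        print(f"{watts} W: {minutes_to_boil(1.0, watts):.1f} min for 1 L (ideal, no losses)")
```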

immibis
4 months ago
Americans do not have electric kettles and need special circuits for electric clothes dryers.

lolinder
4 months ago
We have an electric kettle in the US and it runs just fine drawing 1500W.

You're correct that the dryer is on a larger circuit, though.

beAbU
4 months ago
> and it runs just fine drawing 1500W.

You think that this is "just fine" because you've never experienced the glory that is a 3kW kettle!

rootusrootus
4 months ago
I just get 99C water from a tap next to my kitchen sink. Why do people still use kettles?

dymk
4 months ago
Because they don’t have a spare few grand for an instant-hot tap plus installation.

rootusrootus
4 months ago
You’re off by an order of magnitude. They are a couple hundred bucks and an easy DIY job.

dymk
4 months ago
It's a couple hundred bucks if you can DIY it, which most people cannot do or are not willing to do.

fragmede
4 months ago
Japanese water boilers keep it boiling so there's no wait at all!

blibble
4 months ago
I get bored and tend to wander off waiting for it to boil at 3kW

1.5kW must be absolute agony

lolinder
4 months ago
I mean... yes, I don't sit around waiting for the kettle to boil. But if I fill it and start it first the water is already boiling by the time I get everything out, so it's not like any time is wasted as is.

jamesbfb
4 months ago
Huh, what?! Mega TIL moment for me as an Australian with an electric kettle and dryer plugged into whatever power socket I wish! Reminds me of this great Technology Connections video: https://youtu.be/jMmUoZh3Hq4?si=3vSMHmU2ClwNRtow

Yodel0914
4 months ago
I'm pretty sure in our current rental the kitchen and laundry are on the same circuit, which means I'll often have the dryer, washing machine, kettle, toaster and microwave drawing power at the same time. It's never been an issue.

XMasterrrr
4 months ago
What kind of a dryer? Because that cannot be right. Dryers require a dedicated 30A 240V breaker by code in most counties in most states nowadays.

Yodel0914
4 months ago
It's a bosch heat pump dryer, but previously we had a traditional vented dryer.

I've never seen a dedicated circuit for dryers in Australia, and I've lived in probably a dozen different properties. Ovens, aircon, hot water, bathroom heat lamps often have dedicated circuits, though.

HeatrayEnjoyer
4 months ago
Insane

XMasterrrr
4 months ago
If that is true and OP is not just confused, he should sue his landlord, and I am not even kidding!!!

HeatrayEnjoyer
4 months ago
Why would regular sockets need to supply that much juice?

mattnewton
4 months ago
I rent an old Victorian. I have one breaker line for the fridge and microwave and one line for basically everything else.

Even if that weren’t the limit, the fact that the machine is already a space heater with two liquid-cooled 4090s would be.

nathanasmith
4 months ago
I heat water on the stovetop, which is plugged into a 240-volt outlet.

throwthrowuknow
4 months ago
Not speaking from direct experience building a rig like this, but the blog post mentions having 3 power supplies, so the most direct solution would be to put each on its own dedicated circuit. As long as you have space in your electrical panel, this is straightforward to do, though I would recommend having an electrician do the wiring if you aren’t experienced with that type of home electrical work.

gizmo686
4 months ago
Even without space in the existing box, installing a subpanel isn't that much more of a cost.

XMasterrrr
4 months ago
Actually, putting each PSU on its own circuit is crazy dangerous. In the scenario you suggest, if one goes out, you are in for a fire. I highly recommend against that.

gizmo686
4 months ago
This might be one of the reasons I'm not an electrician, but is it dangerous?

Back when I worked at a high-availability data center, all of our servers had dual PSUs, plugged into separate circuits.

The transformer in the PSUs should electrically isolate the mains voltage from the low-voltage side, so you aren't going to cause a short across the two circuits.

The only risk I see is a cascade failure, where the increased load on the second circuit causes its breaker to trip.
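The cascade scenario is easy to check on paper: with redundant PSUs on two feeds, the load is shared while both are up, but each circuit must be able to carry the whole load by itself once the other drops. A sketch with illustrative numbers (a hypothetical 2.4 kW server, 20 A circuits, and the common 80% continuous-load rule):

```python
def survives_single_circuit_loss(total_load_w, circuit_amps, volts, continuous_fraction=0.8):
    """True if one circuit alone can carry the full load after its partner feed fails.

    While both feeds are up each carries roughly half; after a failure the survivor
    picks up everything, which is the cascade risk when it is undersized.
    """
    usable_w = circuit_amps * volts * continuous_fraction
    return total_load_w <= usable_w


if __name__ == "__main__":
    # 120 V / 20 A gives 1920 W usable: fine when shared (1200 W each), not after a failure.
    print(survives_single_circuit_loss(2400, 20, 120))
    # 240 V / 20 A gives 3840 W usable, enough to carry the whole 2400 W alone.
    print(survives_single_circuit_loss(2400, 20, 240))
```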

throwthrowuknow
4 months ago
Do they not have fuses? Use 15A breakers?

bluedino
4 months ago
Take your typical 'GPU node', which would be a Dell/HP/SuperMicro with 4-8 NVIDIA H100s and a single high-end AMD/Intel CPU. You would need 2-4 240V outlets (30A).

In the real world you would plug them into a PDU such as: https://www.apc.com/us/en/product/AP9571A/rack-pdu-basic-1u-...

Each GPU will take around 700W and then you have the rest of the system to power, so depending on CPU/RAM/storage...

And then you need to cool it!
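A rough capacity sketch for such a node, using the ~700 W per GPU from the comment, a hypothetical 1500 W for the rest of the system (CPU, RAM, NICs, fans), and 30 A / 240 V circuits loaded to at most 80% (5760 W usable each). It counts capacity only; real servers typically have redundant PSUs fed from separate PDUs, which is one reason practical outlet counts run higher.

```python
import math


def circuits_needed(num_gpus, gpu_w=700, system_w=1500, amps=30, volts=240, continuous_fraction=0.8):
    """Return (estimated load in W, circuits needed so none runs above continuous_fraction)."""
    load_w = num_gpus * gpu_w + system_w  # system_w is an assumed, illustrative figure
    usable_w = amps * volts * continuous_fraction  # 30 A * 240 V * 0.8 = 5760 W
    return load_w, math.ceil(load_w / usable_w)


if __name__ == "__main__":
    for n in (4, 8):
        load, count = circuits_needed(n)
        print(f"{n} GPUs: ~{load} W -> {count} x 30 A / 240 V circuit(s)")
```

Under these assumptions a 4-GPU node (about 4.3 kW) fits on one circuit and an 8-GPU node (about 7.1 kW) needs two.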

XMasterrrr
4 months ago
Yeah, you get it. I have two PDUs with NEMA L6-30P plugs and two NEMA L6-30R outlets. Each outlet is connected to its own dedicated 30A 240V circuit (60A in total, with each circuit good for roughly 6000W).

Cooling is its own story... Man, figuring out this process was a hell of a journey.

fennecbutt
4 months ago
I suppose this is an American view. In most places with 240V, you can run anything up to 3kW per socket most of the time. But you can also get a sparky to do a cheap high-current socket install on 240V, or even pay a bit more to get three-phase installed if you have a valid enough use case.

Hell, most kettles use 3kW. Though for a big server I'd get a dedicated circuit wired, the same way power showers are done (~7-12 kW).

abound
4 months ago
Not OP, but my current home had a dedicated 50A/240V circuit because the previous owner did glass work and had a massive electric kiln. I can't imagine it was cheap to install, but I've used it for beefy, energy hungry servers in the past.

Which is all to say it's possible in a residential setting, just probably expensive.

woleium
4 months ago
Yes, and something like a residential air-conditioning heat pump will need a 40A circuit too. Car charging is usually on a 30A circuit, and an electric oven is usually 40A. There’s lots of stuff that uses that sort of power residentially.

slavik81
4 months ago
Not the OP, but I hired an electrician to put a 30A 240V circuit with a NEMA L6-30R receptacle next to my electrical panel. It was 600 CAD. You can probably get it done cheaper. He had to disconnect another circuit and make a trip to the hardware store because I told him to bring the wrong breaker.

XMasterrrr
4 months ago
Yup, that's exactly what I had installed.

GaggiX
4 months ago
I use a hair dryer that is a little bit more than 2kw, but I guess because of the 120V it would be a problem in the US.

16 amps x 120V = 1920W; it would probably trip after several minutes.

16 amps x 230V = 3680W; it wouldn't trip.

sva_
4 months ago
When my gf first came to Europe she brought her hairdryer from the US and plugged it in using an adapter that just reroutes the wires. She was unaware of the voltage difference (or thought the adapter would adjust it.) That thing started spewing fire pretty much immediately and I luckily quickly realized what she was doing and was able to pull the plug (I hadn't noticed that she brought her own hair dryer.) Luckily she wasn't pointing it at herself ...

sandos
4 months ago
This is funny to me as a European, since we have many, many circuits where we regularly run 2kW, and some run far more. Really no issue, but I guess the lower voltage makes it a problem.

sixothree
4 months ago
Yup. We typically have 20 amp breakers in living portions of the house and it's common practice for most devices to top out at 1500 watts. But from your description, you would still need three lines and three breakers. So. I'm not understanding your point.

XMasterrrr
4 months ago
Most outlets here in the US are on 120V 15A breakers; only the washer/dryer and kitchen appliances require the higher-rated 240V breakers.

teaearlgraycold
4 months ago
I’ve run 3x L40S on a 1650W PSU on a normal 120V 20A circuit.

XMasterrrr
4 months ago
You've got a much stronger heart than me, man.

teaearlgraycold
4 months ago
Sometimes you just need to YOLO it.

littlestymaar
4 months ago
Then just add a 32A circuit breaker to your electrical installation; it's really not a big deal.

XMasterrrr
4 months ago
Oh yeah, my original setup was an RTX 4090 + an RTX 3090, and I swear one night I had the circuit breaker trip more than 15 times before I gave up. I have a UPS, so I would run to the panel before my system shut down. Most houses are equipped with 15A 120V breakers; these should never exceed 1500W, and their max is 1800W, but then you're really risking it.

So, as mentioned in the article, I actually installed two 30A 240V breakers dedicated entirely to this setup (and the next one, in case I decide to expand to 16x GPUs over 2 nodes lol). Each breaker should comfortably power up to 6000W. I also installed a specific kind of power outlet that can handle that kind of current, and I did some extreme research into PDUs. I plan on posting about all of that in this series (part 3 according to my current tentative plans), so stay tuned and maybe bookmark the website, add the RSS feed to your reader, or follow me on any of the socials if this is something you want to nail down without spending a month on research like me :'D
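The numbers in this comment follow from watts = amps × volts, plus the common US rule of thumb (from the NEC) that continuous loads stay at or below 80% of the breaker rating; other countries size circuits differently. A small sketch of that arithmetic:

```python
def breaker_capacity(amps, volts, continuous_fraction=0.8):
    """Return (peak watts, continuous watts) for a circuit, using the 80% rule for continuous loads."""
    peak = amps * volts
    return peak, peak * continuous_fraction


if __name__ == "__main__":
    for amps, volts in [(15, 120), (20, 120), (30, 240)]:
        peak, cont = breaker_capacity(amps, volts)
        print(f"{amps:>2} A @ {volts} V: {peak:>5} W peak, {cont:>6.0f} W continuous")
```

A 15 A / 120 V circuit comes out at 1800 W peak and 1440 W continuous, and a 30 A / 240 V circuit at 7200 W peak and 5760 W continuous, which is where figures like "never exceed 1500W" and "up to 6000W" roughly come from.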

mattnewton
4 months ago
Thanks! Yeah, the 4090s are very thirsty if you let them be. I haven’t played enough with throttling their voltage and seeing how that affects perf. Looking forward to your articles.

nullindividual
4 months ago
Do you run this 24/7?

What is your cost of electricity per kilowatt hour and what is the cost of this setup per month?

michaelt
4 months ago
I have a much smaller setup than the author (a quarter of the GPUs and RAM), and I was surprised to find it draws 300W at idle.

nullindividual
4 months ago
The reason I asked is I used to run a dual X5650 server with SSDs and it was about $50/month with the cheapest (or very close to) rates in the US.
reply
disiplus
4 months ago
[-]
We have more expensive gas than the USA, but I pay about 5 cents per kWh @220V.

I didn't know how expensive it is in the USA, especially California.

reply
nirav72
4 months ago
[-]
Depends on where you live in the U.S. On the west coast, you’ll definitely pay 2x the national average. I live in the south-eastern part of the U.S., and electricity is quite a bit cheaper at 12 cents/kWh. But even here it varies by region in my state of Georgia. Down the road, where I have friends and family, they get power from a different provider and their costs are much higher. But they also have flexible (time-of-use) pricing: at night they pay a fraction of their daytime rate, 3 cents instead of 17 cents. I have a fixed rate 24/7, which is cheaper than their daytime rate.
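Here is a rough sketch of how a flat rate compares with this kind of two-tier time-of-use plan for a constant load. The 300 W load is the idle figure reported upthread, and the 12, 17, and 3 cent rates come from this comment. The 12 off-peak hours per day and the 730-hour month are assumptions.

```python
HOURS_PER_MONTH = 730  # 8760 hours per year / 12


def monthly_cost_flat(watts: float, rate_per_kwh: float) -> float:
    """Cost of a constant load running all month at a single flat rate."""
    return watts / 1000 * HOURS_PER_MONTH * rate_per_kwh


def monthly_cost_tou(watts: float, peak_rate: float, offpeak_rate: float,
                     offpeak_hours_per_day: float) -> float:
    """Cost of a constant load under a simple two-tier time-of-use plan."""
    offpeak_share = offpeak_hours_per_day / 24
    blended_rate = offpeak_share * offpeak_rate + (1 - offpeak_share) * peak_rate
    return monthly_cost_flat(watts, blended_rate)


IDLE_WATTS = 300  # idle draw reported upthread
print(f"flat $0.12/kWh:          ${monthly_cost_flat(IDLE_WATTS, 0.12):.2f}/month")
print(f"TOU $0.17 / $0.03 (12h): ${monthly_cost_tou(IDLE_WATTS, 0.17, 0.03, 12):.2f}/month")
```

Shifting heavy jobs into the off-peak window would lower the time-of-use figure further, because the blended rate moves toward the night price.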
reply
rootusrootus
4 months ago
[-]
US generally has some of the cheapest electricity, about half of what Europeans pay. California has some areas that are abnormally expensive.
reply
nullindividual
4 months ago
[-]
I pay 10 cents per kWh. Most of the electricity in my state is produced by hydropower.
reply
fuzzybear3965
4 months ago
[-]
Yep. ~$.33/kWh in Southern California (SoCal Edison) and going up all the time!
reply
fragmede
4 months ago
[-]
$0.51/kWh during peak hours in San Francisco!
reply
trollbridge
4 months ago
[-]
This is a setup that might make more sense to run at full power during winter months.
reply
nrp
4 months ago
[-]
How are you finding 2-bit/3-bit quantized Llama 405B? Is it behaving better than 8-bit or 16-bit Llama 70B?
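For reference, here is a back-of-the-envelope sketch of weight memory at different quantization levels: parameters times bits per weight, divided by 8. It counts weights only. KV cache, activations, and per-block quantization scales come on top, and real quant formats rarely land on exact bit widths.

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Weights-only footprint in decimal GB: billions of params x bytes per param."""
    return params_billion * bits_per_weight / 8


VRAM_GB = 192  # 8x 24 GB cards

for name, params in [("Llama 405B", 405), ("Llama 70B", 70)]:
    for bits in (2, 3, 4, 8, 16):
        size = weight_gb(params, bits)
        verdict = "fits" if size < VRAM_GB else "does not fit"
        print(f"{name} @ {bits:>2}-bit: ~{size:6.1f} GB ({verdict} in {VRAM_GB} GB, weights only)")

# The same 4-bit 405B figure expressed in binary units (GiB).
print(f"4-bit 405B: ~{weight_gb(405, 4) * 1e9 / 2**30:.1f} GiB")
```

The 4-bit 405B figure is 202.5 GB, or about 188.6 GiB. That is why it sits right at the edge of eight 24 GB cards even before any KV cache is allocated.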
reply
pupdogg
4 months ago
[-]
Amazing setup. I have the capability to design, fabricate, and powder coat sheet metal. I would love to collaborate on designing and fabricating a cool enclosure for this setup. Let me know if you're interested.
reply
koyote
4 months ago
[-]
This is undoubtedly cool and I am a bit jealous!

Maybe a bit of a stupid question, but what do you actually do with the models you run/build, apart from tinkering? I'd assume most tinkering can also be done on smaller systems? Is it in order to build a model that is actually 'useful'/competitive?

reply
faangguyindia
4 months ago
[-]
I tried self hosting LLM for commandline instant completion and guidance utility: https://github.com/zerocorebeta/Option-K

But the problem is that even 7B models are too slow on my PC.

Hosted models are lightning fast. I considered the possibility of buying hardware but decided against it.

reply
bravura
4 months ago
[-]
How loud is it? Was special electrical needed?
reply
lossolo
4 months ago
[-]
Cool, it looks similar to my crypto mining rigs (8xGPU per node) from around 7 years ago, but I used PCI-E risers and a dual power supply.
reply
wkat4242
4 months ago
[-]
> And who knows, maybe someone will look back on my work and be like “haha, remember when we thought 192GB of VRAM was a lot?”

I wonder if this will happen. It's already really hard to buy big HDDs for my NAS because nobody buys external drives anymore. So the pricing has gone up a lot for the prosumer.

I expect something similar to happen to AI. The big cloud players are all leaders in LLMs, and their goal is to keep us beholden to their cloud services. Cheap home hardware with serious capability is not something they're interested in. They want to keep it out of our reach so we pay them rent and they can mine our data.

reply
Eisenstein
4 months ago
[-]
It isn't that cloud providers want to shut us out, it is that nVidia wants to relegate AI capable cards to the high end enterprise tier. So far in 2024 they have made $10.44b in revenue from the gaming market, and over $47.5b in the datacenter market, and I would bet that there is much less profit in gaming. In order to keep the market segmented they stopped putting nvlink on gaming cards and have capped VRAM at 24GB for the highest end GPUs (3090 and 4090) and it doesn't look much better for the upcoming 5090. I don't blame them, they are a profit-maximizing corporation after all, but if anything is to be done about making large AI models practical for hobbyists, start with nVidia.

That said, I really don't think that the way forward for hobbyists is maxing VRAM. Small models are becoming much more capable, accelerators are a possibility, and there may be no need for a person to run a 70-billion-parameter model in memory at all when there are MoEs like Mixtral and small, capable models like Phi.

reply
Saris
4 months ago
[-]
>It's already really hard to buy big HDDs for my NAS because nobody buys external drives anymore. So the pricing has gone up a lot for the prosumer.

I buy refurb/used enterprise drives for that reason, generally around $12 per TB for the recent larger drives. And around $6 per TB for smaller drives. You just need an SAS interface but that's not difficult or expensive.

E.g., 25TB for $320, or 12TB for $80.

reply
thelastparadise
4 months ago
[-]
> It's already really hard to buy big HDDs for my NAS

IME 20tb drives are easy to find.

I don't think the clouds have access to bigger drives or anything.

Similarly, we can buy 8x A100s, they're just fundamentally expensive whether you're a business or not.

There doesn't seem to be any "wall" up like there used to be with proprietary hardware.

reply
wkat4242
4 months ago
[-]
They are easy to find but extremely expensive. I used to pay below 200€ for a 14TB Seagate 8 years ago. That's now above 300. And the bigger ones are even more expensive.

For me these prices are prohibitive. Just like the A100s are (though those are even more so of course).

The problem is that common consumers rely on the cloud, so these kinds of products become niche and lose volume. Also, the cloud providers don't pay what we do for a GPU or HDD. They buy them by the tens of thousands and get deep discounts. That's why the RRPs, which we do pay, are highly inflated.

reply
walterbell
4 months ago
[-]
Looking at https://shucks.top and https://diskprices.com, prices do seem to be higher.

Homelab vendor in Austin, TX with periodic sales, limited volume: https://shop.digitalspaceport.com

reply
Dylan16807
4 months ago
[-]
Well if I look at Amazon I see a couple models of external 14TB for $190, and a brand new Exos 16TB for $230. Not too bad. Though personally I get much cheaper used drives and put them in RAID for a NAS.

And they do have better sales.

reply
wkat4242
4 months ago
[-]
Most of the cheap drives here are refurbs of questionable quality. And those Exoses are much more expensive here, sadly, especially if you choose only legit vendors on Amazon.
reply
gizmo686
4 months ago
[-]
The cloud companies do not make the hardware; they buy it like the rest of us. They are just going to be almost the entirety of the market, so naturally the products will be built and priced with that market in mind.
reply
wkat4242
4 months ago
[-]
Yes and they get deep discounts which we don't. Can be 40% or more!

Of course the vendor can't make a profit with such discounts so they inflate the RRP. But we do end up paying that.

reply
illiac786
4 months ago
[-]
That’s the main problem with a market owned by enterprise customers. Consumers don’t matter; there is zero interest in competing for them, because they’re too small. The discounts are a killer, for example: we'll have to buy from a reseller each time, who of course will pocket a good proportion of the discount, because there won’t be many resellers that sell to consumers…

I have seen very large ent customers get 80% discount on hardware - it’s mind boggling that the vendor is not going bankrupt.

reply
donavanm
4 months ago
[-]
Not specific to GPUs, but I believe some of those giant and deeply discounted buys are at or below typical cost because of volume. They allow the vendor to increase their OEM/manufacturing commits, or shift bins they're long on, to improve the rest of their sales pipeline. The same goes for very large last orders, or buying all the remaining stock of a SKU, which improves cash flow and turns over inventory. It's a very, very different vendor relationship, with things like defect rates, yield, and “warranty” turned into price factors.
reply
wkat4242
4 months ago
[-]
> I have seen very large ent customers get 80% discount on hardware - it’s mind boggling that the vendor is not going bankrupt.

Yes exactly. When I see what we pay for stuff at work...

Obviously the vendors don't have 80+% margins. So what do they do? Inflate the RRP to compensate. So they can give a huge discount that sounds good on paper.

But this makes it unviable to buy for consumers that do have to pay RRP.

reply
walterbell
4 months ago
[-]
An adjacent project for 8 GPUs could convert used 4K monitors into a borderless mini-wall of pixels, for local video composition with rendered and/or AI-generated backgrounds, https://theasc.com/articles/the-mandalorian

> the heir to rear projection — a dynamic, real-time, photo-real background played back on a massive LED video wall and ceiling, which not only provided the pixel-accurate representation of exotic background content, but was also rendered with correct camera positional data.. “We take objects that the art department have created and we employ photogrammetry on each item to get them into the game engine”

reply
freeqaz
4 months ago
[-]
How much do the NVLinks help in this case?

Do you have a rough estimate of how much this cost? I'm curious since I just built my own 2x 3090 rig and I wondered about going EPYC for the potential to have more cards (stuck with AM5 for cheapness though).

All in all I spent about $3500 for everything. I'm guessing this is closer to $12-15k? CPU is around $800 on eBay.

reply
lvl155
4 months ago
[-]
My reasons for going Epyc were PCIe lanes and cheaper enterprise SSDs via U.2/U.3. With AM5, you tap out the lanes with dual GPUs. Threadripper is preferable, but Epyc is about half the price, or even better if you go last gen.
reply
Eisenstein
4 months ago
[-]
Why do you need such high cross card bandwidth for inference? Are you hosting for a lot of users at once?
reply
oceanplexian
4 months ago
[-]
The Epyc boards make things way easier (I have 4 Epyc boards of various generations) because they have loads of x16 slots and you’re not screwing around with bifurcation and sketchy PCIe splitters. Another oft-forgotten item that consumes lanes is 25 or 40Gb NICs, which you might find you want if you’re pushing big model files around to other machines or storage.
reply
darknoon
4 months ago
[-]
I tried this w/ AM5, but realized that despite there theoretically being enough lanes for dual x16 PCI-e 4.0 GPUs, I couldn't find any motherboards that are actually configured this way, since dual-GPU is dead in consumer for gaming.
reply
Tepix
4 months ago
[-]
I built this in early 2023 out of used parts and ended up with a cost of 2300€ for AM4/128GB/2x3090 @ PCIe 4.0x8 +nvLink
reply
RockRobotRock
4 months ago
[-]
I haven't been able to find a good answer on what difference NVLink makes or which applications support it.
reply
fragmede
4 months ago
[-]
NVLink is what makes multi-GPU work. It lets the GPUs talk to each other across a high-bandwidth (up to 600 GB/s on an A100), low-latency link. TensorFlow and PyTorch both support it, among other things. It's not some weird side note; the interconnect between nodes is what makes a supercomputer super. You don't hear about it much because you don't hear about a lot of supercomputer details in mainstream media.
reply
RockRobotRock
4 months ago
[-]
Thank you, but this doesn't really answer OPs or my question. Is NVLink required if you want to run an LLM model which exceeds the memory of a single GPU? What are the benchmark comparisons with and without it?

I've heard that NVLink helps with training, but not so much with inferencing.
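Here is a bandwidth-only sketch of why the interconnect matters less for single-stream inference than for training. It assumes Megatron-style tensor parallelism with two all-reduces of one hidden-sized activation per layer per token, a ring all-reduce, fp16 activations, and a Llama-70B-like shape (hidden size 8192, 80 layers). The link speeds are approximate theoretical figures. Per-message latency, which usually dominates at batch size 1, is ignored.

```python
def allreduce_bytes_per_gpu(message_bytes: float, n_gpus: int) -> float:
    """Bytes each GPU sends in a ring all-reduce of one message."""
    return 2 * (n_gpus - 1) / n_gpus * message_bytes


def comm_ms_per_token(hidden: int, layers: int, n_gpus: int,
                      link_gb_per_s: float, bytes_per_elem: int = 2) -> float:
    """Bandwidth-only tensor-parallel communication time per generated token.

    Assumes two all-reduces of a hidden-sized activation per layer (after
    attention and after the MLP). Latency is ignored.
    """
    message = hidden * bytes_per_elem
    per_token = 2 * layers * allreduce_bytes_per_gpu(message, n_gpus)
    return per_token / (link_gb_per_s * 1e9) * 1e3


# Assumed Llama-70B-like shape: hidden size 8192, 80 layers, fp16 activations.
for label, n, bw in [("2 GPUs, 3090 NVLink bridge (~56 GB/s/dir)", 2, 56),
                     ("2 GPUs, PCIe 4.0 x8 (~16 GB/s)", 2, 16),
                     ("8 GPUs, PCIe 4.0 x16 (~32 GB/s)", 8, 32)]:
    print(f"{label}: {comm_ms_per_token(8192, 80, n, bw):.3f} ms/token")
```

Even over PCIe, this is a fraction of a millisecond per token, which is small next to the time spent reading weights. Prefill over long prompts, large batches, and training gradients move far more data, and that is where NVLink tends to pay off.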

reply
modeless
4 months ago
[-]
I wonder how the cost compares to a Tinybox. $25k for 6x 4090 or $15k for 6x 7900XTX. Of course that's the full package with power supplies, CPU, storage, cooling, assembly, shipping, etc. And a tested, known good hardware/software configuration which is crucial with this kind of thing.
reply
Tepix
4 months ago
[-]
If you merely want CUDA and lots of VRAM, there's no reason to pick expensive 4090s over used 3090s.
reply
halJordan
4 months ago
[-]
Well, there is, and it's called performance. You don't have to push your version of what an appropriate price/performance ratio is.
reply
Tepix
3 months ago
[-]
Well, if you run out of VRAM, you drop off a performance cliff. That's a whole order of magnitude slower than just using a slower GPU but fitting everything into VRAM.
reply
angoragoats
4 months ago
[-]
You can build a setup like in the OP for somewhere around $10k, depending on several factors, the most important of which are the price you source your GPUs at ($700 per 3090 is a reasonable going rate) and what CPU you choose (high core count, high frequency Epyc CPUs will cost more).
reply
itomato
4 months ago
[-]
With a rental option coming, it’s hard for me to imagine a more profitable way to use a node like that.
reply
choilive
4 months ago
[-]
I have a similar setup in my basement! Although its multiple nodes, with a total of 16x3090s. Also needed to install a 30A 240V circuit as well.
reply
lvl155
4 months ago
[-]
That last part is often overlooked. This is also why sometimes it’s just not worth going local especially if you don’t need all that compute power beyond a few days.
reply
buildbot
4 months ago
[-]
100% agree, anything beyond 4x GPUs gets into very-annoying-to-power territory and makes the cloud very attractive. I can already trip a 15A circuit on 115V power with just 3x 4090s and an SPR-X CPU.

It also costs a lot to power. In the summer, it's 2x more than you expect, because unless it’s outside, you need to cool 1000+ watts of extra heat with your AC. All that together, and RunPod starts to look very tempting!

reply
choilive
4 months ago
[-]
Getting that circuit installed was pretty cheap, likely because it's in an unfinished and unconditioned basement. The basement stays comfortable even during the summer. The heat does seem to work its way into the rest of the house, but the additional cooling load is only about 20% more than usual. It lowers the heating cost by about the same amount during the winter, so it works itself out.
reply
buildbot
4 months ago
[-]
Yeah location of your place, climate, and placement of the server in the house will affect this a lot. I'm on the top story of a building, even in the winter I rarely need to turn on my heat, just getting by on the waste heat of the rest of the building. My assortment of machines will easily keep the living room at 25C+ with a window open unless it's below 10C out! If I could keep the servers in the cool parking garage, I'd save a lot of money...

Getting a circuit put in is also much more difficult in a shared building...

Runpod has 3090s for $0.43 per hour! $0.22 spot. If your power costs $0.30 per kWh, and you need to spend _another_ $0.30 per kWh on cooling (say, if you live in an apartment in the Bay Area and it's summer), that's ~48 days to equal the cost of 30 days on Runpod. So you are still saving some money, though much less than you might think, and possibly spending more than you would on spot instances!
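The break-even arithmetic can be sketched as follows. It only compares electricity plus cooling against the rental price and ignores what the card itself costs. The per-card draw is an assumption; roughly 450 W is what reproduces the ~48-day figure.

```python
def days_to_match_rental(rental_per_hour: float, rental_days: float,
                         gpu_kw: float, power_per_kwh: float,
                         cooling_per_kwh: float = 0.0) -> float:
    """Days of local running whose electricity (plus cooling) equals the rental bill."""
    rental_cost = rental_per_hour * 24 * rental_days
    local_per_day = gpu_kw * 24 * (power_per_kwh + cooling_per_kwh)
    return rental_cost / local_per_day


# $0.43/h rental for 30 days; $0.30/kWh power plus another $0.30/kWh for cooling.
# The per-card draw is an assumption.
for watts in (350, 450):
    days = days_to_match_rental(0.43, 30, watts / 1000, 0.30, 0.30)
    print(f"{watts} W per card: {days:.1f} days")
```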

reply
choilive
4 months ago
[-]
Yeah I worked it out and I am saving ~75% vs running my inference workloads on RunPod. $650/mo in electricity vs $2,500/mo to do the same thing on RunPod. Been in near continuous operation over 9 months, so the system has basically paid for itself with the savings.
reply
flixf
4 months ago
[-]
Very interesting! How are the 8 GPUs connected to the motherboard? Based on the article and the pictures, he doesn't appear to be using PCIe risers.

I have a setup with 3 RTX 3090 GPUs and the PCIe risers are a huge source of pain and system crashes.

reply
lbotos
4 months ago
[-]
I had the same question. I was curious what retimers he was using.

I've had my eye on these for a bit https://c-payne.com/

reply
plantain
4 months ago
[-]
Looks like SlimSAS.
reply
system2
4 months ago
[-]
Typical crypto miner setup. I had two 6-GPU setups with 1200W PSUs and 6 PCIe slots with PCIe extender cables. Its value dropped harder than a Cybertruck's after a few months.

The worst thing is dust. They would accumulate so much every week I had to blow the dust off with an air compressor.

Electricity cost was around $4 a day (24 x $0.20~). If online GPU renting is more expensive, maybe the initial cost could be justifiable.

reply
Havoc
4 months ago
[-]
> Typical crypto miner setup.

Except not doing the sketchy x1 PCIe lanes. That’s the part that makes nice LLM setups hard.

reply
system2
4 months ago
[-]
Can you tell me what's sketchy about it? I have not had an issue with any one of the 12 extenders and bandwidth held well without any issues. Please explain if possible if LLM requires a different type of extender.
reply
Havoc
4 months ago
[-]
Eh perhaps poor choice of words.

It works fine for crypto, but LLM performance is far more sensitive to bandwidth. You lose a ton of performance if you’ve got PCIe in the loop, never mind single-lane PCIe. That’s why NVLink is (was) a thing: to cut that out entirely.

reply
system2
4 months ago
[-]
Got it. I was planning to switch my miners to LLM farm. I will test and see how much of a difference it will make. Thanks.
reply
killingtime74
4 months ago
[-]
Did everyone just miss the fact that the post says the intention is to run Llama 3 405b but it has less than 1/4 of the VRAM required to do so? Did you just change your goals mid build? It's commonly known how much ram is required for a certain parameter size.
reply
nathanasmith
4 months ago
[-]
The system has 512 GB of RAM so while it'll be slower at inference, he really has about 704 GB at his disposal to run the model assuming he distributes the weights across the VRAM and system RAM.
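Here is a rough sketch of how such a split might be planned, offloading whole layers the way llama.cpp-style runtimes do. The 8-bit quantization (~405 GB), the 126 equal-sized layers, and the 24 GB of VRAM held back for KV cache are all assumptions for illustration. The sketch also shows that unquantized fp16 weights (~810 GB) would not fit even in the combined 704 GB.

```python
def split_layers(n_layers: int, total_weight_gb: float, vram_gb: float,
                 vram_reserve_gb: float) -> tuple:
    """Split equal-sized layers between GPU VRAM and system RAM.

    Assumes every layer is the same size (embeddings and output head ignored)
    and keeps `vram_reserve_gb` free for KV cache and activations.
    """
    per_layer_gb = total_weight_gb / n_layers
    usable_gb = vram_gb - vram_reserve_gb
    on_gpu = max(0, min(n_layers, int(usable_gb // per_layer_gb)))
    return on_gpu, n_layers - on_gpu


# Assumed: 405B at 8-bit (~405 GB), 126 layers, 192 GB VRAM with 24 GB reserved.
gpu_layers, cpu_layers = split_layers(126, 405, 192, 24)
per_layer_gb = 405 / 126
print(f"{gpu_layers} layers on GPU, {cpu_layers} in RAM "
      f"(~{cpu_layers * per_layer_gb:.0f} GB of the 512 GB)")
print(f"fp16 405B (~810 GB) fits in 192 + 512 GB: {810 <= 192 + 512}")
```

In practice, the layers left in system RAM run much slower than those in VRAM, so they tend to set the overall token rate.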
reply
schaefer
4 months ago
[-]
Amazing writeup. And what a heavy hitter of an inaugural blog entry...

This might be the right time to ask: So, on the one hand, this is what it takes to pack 192gb of Nvidia flavored vram into a home server.

I'm curious: is there any hope of doing interesting work on a MacBook Pro, which can currently be maxed out at 128 GB of unified memory (for the low, low price of $4.7k)?

I know there's no hope of running cuda on the macbook, and I'm clearly out of my depth here. But the possibly naive day-dream of tossing a massive LLM into a backpack is alluring...

reply
Eisenstein
4 months ago
[-]
Download koboldcpp and give it a try. It is a single executable and uses Metal acceleration on Apple ARM CPUs.
reply
schaefer
4 months ago
[-]
Thanks!
reply
sireat
4 months ago
[-]
I was under the mistaken impression that you could not go beyond 2x3090 for reasonable inference speed.

My assumption was that going beyond 2 cards incurs significant bandwidth penalty when going from NVLink between 2x3090s to PCIe for communicating between the other 3090s.

What kind of T/s speeds are you getting with this type of 8x3090 setup?

Presumably then even crazier 16x4090 would be an option for someone with enough PCIe slots/risers/extenders.

reply
SmellTheGlove
4 months ago
[-]
I thought I was balling with my dual 3090 with nvlink. I haven’t quite yet figured out what to do with 48GB VRAM yet.

I hope this guy posts updates.

reply
lxe
4 months ago
[-]
Run 70B LLM models of course
reply
thelastparadise
4 months ago
[-]
Or train a cute little baby llama.
reply
3eb7988a1663
4 months ago
[-]
What is the power draw under load/idle? Does it noticeably increase the room temperature? Given the surroundings (aka the huge pile of boxes behind the setup), curious if you could get away with just a couple of box fans instead of the array of case fans.

Are you intending to use the capacity all for yourself or rent it out to others?

reply
NavinF
4 months ago
[-]
Box fans are surprisingly power hungry. You'd be better off using large 200mm PC fans. They're also a lot quieter
reply
michaelt
4 months ago
[-]
If you care about noise, I also recommend not getting 8 GPUs with 3 fans each :)
reply
illiac786
4 months ago
[-]
I dream of a future where the “home server with heat recuperation” appliance will be common enough that I can get a worker to install it physically for me, since I have little electrical skill and zero plumbing skill. I also hope that by then power consumption will have gone down.
reply
maaaaattttt
4 months ago
[-]
Looking forward to reading this series.

As a side note I’d love to find a chart/data on the cost performance ratio of open source models. And possibly then a $/ELO value (where $ is the cost to build and operate the machine and ELO kind of a proxy value for the average performance of the model)

reply
renewiltord
4 months ago
[-]
I have a similar one with 4090s. Very cool. Yours is nicer than mine where I've let the 4090s rattle around a bit.

I haven't had enough time to find a way to split inference, which is what I'm most interested in. Yours is also much better with the 1600 W supply. I have a hodgepodge.

reply
deoxykev
4 months ago
[-]
Are you able to run 405B? 4Bit quant vram requirements are just shy of 192GB.
reply
Tepix
4 months ago
[-]
So, how do you connect the 8th card if you have 7 PCIe 4.0 x16 slots available?
reply
manav
4 months ago
[-]
PCIe bifurcation - so splitting one of the x16 slots into two x8 or similar.
reply
metadat
4 months ago
[-]
Worth mentioning - this also cuts the available bandwidth to each card by 50%.
reply
angoragoats
4 months ago
[-]
While you're technically correct, assuming you're using PCIe 4.0 or higher, the performance difference between x8 and x16 is practically zero.
reply
metadat
4 months ago
[-]
Is this because the graphics cards are not using all available PCIe bandwidth? Or why?
reply
angoragoats
4 months ago
[-]
Yes. In fact, even running a video card in a 4x slot (again, assuming PCIe 4.0) results in only a modest (5-20%, depending on what you’re doing) drop in speeds.
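Here is a sketch of the numbers behind this, using PCIe's per-lane transfer rates and 128b/130b encoding (protocol overhead ignored). For inference with layers split across cards, the weights cross the bus roughly once at load time and per-token traffic is small. That is consistent with x8 versus x16 barely mattering. The 24 GB payload is an assumed full 3090.

```python
# Per-lane transfer rate in GT/s; PCIe 3.0 and later use 128b/130b encoding.
GT_PER_S = {3: 8, 4: 16, 5: 32}


def pcie_gb_per_s(gen: int, lanes: int) -> float:
    """Theoretical one-direction bandwidth in GB/s, ignoring protocol overhead."""
    return GT_PER_S[gen] * (128 / 130) / 8 * lanes


def load_seconds(payload_gb: float, gen: int, lanes: int) -> float:
    """Best-case time to copy a payload (e.g. model weights) to one card."""
    return payload_gb / pcie_gb_per_s(gen, lanes)


for lanes in (16, 8, 4):
    print(f"PCIe 4.0 x{lanes}: {pcie_gb_per_s(4, lanes):5.2f} GB/s, "
          f"24 GB in ~{load_seconds(24, 4, lanes):.1f} s")
```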
reply
Tepix
4 months ago
[-]
Even for training?
reply
angoragoats
4 months ago
[-]
I haven’t done a ton of training but everything I’ve heard and read indicates that PCIe 4.0 8x provides enough bandwidth for just about any application. You might see a negligible drop in performance, but no more than a few percent.
reply
tshadley
4 months ago
[-]
"Why PCIe Risers suck and the importance of using SAS Device Adapters, Redrivers, and Retimers for error-free PCIe connections."

I'm a believer! Can't wait to hear more about this.

reply
elorant
4 months ago
[-]
The motherboard has 7 PCIe slots and there are 8 GPUs. So where does the spare one connect? Is he using two GPUs in the same slot, limiting the bandwidth?
reply
ganoushoreilly
4 months ago
[-]
may be using an nvme to pcie adapter, common in the crypto mining world
reply
buildbot
4 months ago
[-]
It’s an epyc server board, it probably has actual U.2/MCIO pcie ports on the board that can be merged back into a 16x slot in the bios. I had/have several boards like that.
reply
lowbloodsugar
4 months ago
[-]
Sometimes I think about dropping $10k to $20k on a rig like this and then I remember I can rent 8xH100s and 8xA100s with 640GB VRAM for $20/hr.
reply
InsomniacL
4 months ago
[-]
When you moved in to your house, did you think you would finish a PC build with 192GB of VRAM before you would finish the plaster boarding?
reply
killingtime74
4 months ago
[-]
Maybe they removed it for better ventilation
reply
LetsGetTechnicl
4 months ago
[-]
Just an eye-watering amount of compute, electricity, and money just to run LLMs... this is insane. Very cool though!
reply
bogwog
4 months ago
[-]
Awesome! I've always wondered what something like this would look like for a home lab.

I'm excited to see your benchmarks :)

reply
Havoc
4 months ago
[-]
Very cool. But also a bit pricey unless you can actually utilize it 24/7 in some productive fashion.
reply
throwpoaster
4 months ago
[-]
Did you write this with the LLM running on the rig?
reply
emptiestplace
4 months ago
[-]
Does this post actually seem LLM generated to you?
reply
throwpoaster
4 months ago
[-]
It reads like an LLM draft with a human edit, yes.
reply
cranberryturkey
4 months ago
[-]
this is why we need an actual AI blockchain, so we can donate GPU and earn rewards for the p2p api calls using the distributed model.
reply
walterbell
4 months ago
[-]
> donate GPU .. earn rewards

Is a blockchain needed to sell unused GPU capacity?

reply
bschmidt1
4 months ago
[-]
That's actually interesting. While crypto GPU mining is "purposeless" or arbitrary, would be way cooler if to GPU mine meant to chunk through computing tasks in a free/open queue (blockchain).

Eventually there could be some tipping point where networks are fast enough and there are enough hosting participants it could be like a worldwide/free computing platform - not just for AI for anything.

reply
vunderba
4 months ago
[-]
I also think this idea has been explored at least a little in terms of GPU distribution networks for AI (Petals and Horde come to mind).

https://stablehorde.net

https://petals.dev

reply
yunohn
4 months ago
[-]
This idea has been brought up tons of times by grifters aiming to pivot from Crypto to AI. The reason that GPUs are used for blockchains is to compute large numbers or proofs - which are truly useless but still verifiable so they can be distributed and rewarded. The free GPU compute idea misses this crucial point, so the blockchain part is (still) useless unless your aim is to waste GPU compute instead.

IRL all you need is a simple platform to pay and schedule jobs on other’s GPUs.
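To make the "useless but verifiable" point concrete, here is a toy proof-of-work sketch. Finding a nonce takes many hashes on average, while checking one takes a single hash. The scheme is simplified for illustration: Bitcoin itself uses double SHA-256 over a block header compared against a numeric target, not a count of leading hex zeros.

```python
import hashlib


def pow_hash(data: bytes, nonce: int) -> str:
    """SHA-256 of the payload with an 8-byte big-endian nonce appended."""
    return hashlib.sha256(data + nonce.to_bytes(8, "big")).hexdigest()


def mine(data: bytes, difficulty: int) -> int:
    """Expensive: try nonces until the digest starts with `difficulty` hex zeros."""
    nonce = 0
    while not pow_hash(data, nonce).startswith("0" * difficulty):
        nonce += 1
    return nonce


def verify(data: bytes, nonce: int, difficulty: int) -> bool:
    """Cheap: checking a claimed nonce costs exactly one hash."""
    return pow_hash(data, nonce).startswith("0" * difficulty)


nonce = mine(b"block header", 4)  # about 16**4 = 65,536 attempts on average
print(nonce, verify(b"block header", nonce, 4))
```

The found nonce has no value beyond proving that the work was done, which is the point made above.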

reply
fragmede
4 months ago
[-]
folding@home predates Bitcoin by eight years. the concept isn't inherent to grifters
reply
yunohn
4 months ago
[-]
Folding at home does not use a blockchain, further proving non-grifters don’t need it. That was the point being discussed, not distributed computing as a concept.
reply
anonymousDan
4 months ago
[-]
I don't think you are being fair to the previous poster. As I read it, they are simply pointing out that there is precedent for such decentralized contribution of compute resources. However, Folding@home doesn't reward users for their contributions, AFAIK. So maybe if a blockchain-based reward system could be layered on top of that, it could increase participation. That's a big if, I grant you, but I don't see why it is completely inconceivable that such a thing might be possible.
reply
yunohn
4 months ago
[-]
I think the word blockchain confuses people, including you and the previous poster. Maybe you could clarify your “layering” idea and how it would work for further discussion.

Folding at home can track user contributions and issue micro/payments as they see fit. Crucially, this does not need an immutable chain of truth to do.

Instead, if we added a blockchain, then we would require 2 sets of participants - those who run the useful simulations for science, and those who run the useless calculations for the blockchain. A complete waste of resources.

reply
anonymousDan
4 months ago
[-]
No idea about the previous poster but I'm pretty sure I know how blockchain(s) work thanks. I don't claim to have a concrete proposal but the idea of proof of useful work has been around for a while as a research area (https://eprint.iacr.org/2017/203.pdf). Having a system that supports arbitrary computations might be hard, but perhaps any task for which solutions are easy-to-verify but difficult to compute might be a good fit. Alternatively, if creating an open/decentralized compute system is a goal then a proof of stake blockchain could allow users to post tasks with associated rewards (again in cases where solutions are easy-to-verify but hard to compute).
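Here is a toy illustration of the "easy to verify, hard to compute" property, using factoring as a stand-in task. The worker searches (here by naive trial division), while the verifier does one modulo check. This is not a proof-of-useful-work protocol, only the asymmetry such a scheme would rely on. The numbers are arbitrary.

```python
from typing import Optional


def find_factor(n: int) -> Optional[int]:
    """Worker side: naive trial division; cost grows with the smallest factor."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return None  # no nontrivial factor: n is prime or n < 2


def verify_factor(n: int, claimed: int) -> bool:
    """Verifier side: one modulo check, however long the search took."""
    return 1 < claimed < n and n % claimed == 0


n = 10007 * 10009  # arbitrary semiprime for the demo
factor = find_factor(n)  # roughly 10,000 loop iterations
print(factor, verify_factor(n, factor))
```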
reply
bschmidt1
4 months ago
[-]
I imagined if it was proof-of-work the mining would actually be the compute work requested. Everyone is racing to solve the problem just like in Bitcoin, except the problem is the requested GPU task. The fastest/first one to provide a result gets to update the ledger (and receives the reward).

Maybe you run a private platform too like git/GitHub if there are real world payments and user accounts, but I wonder why couldn't that technology be used? Does "blockchain" just have an irreparably bad name at this point?

reply
cloudking
4 months ago
[-]
Similar concept https://petals.dev/
reply
kcb
4 months ago
[-]
Problem is once you have to scale to multiple GPUs the interconnect becomes the primary bottleneck.
reply
rvnx
4 months ago
[-]
You could just buy a Mac Studio for 6500 USD, have 192 GB of unified RAM and have way less power consumption.
reply
lvl155
4 months ago
[-]
This is something people often say without even attempting to do a major AI task. If Mac Studio were that great they’d be sold out completely. It’s not even cost efficient for inference.
reply
vunderba
4 months ago
[-]
I'm seeing this misunderstanding a lot recently. There's TWO components to putting together a viable machine learning rig:

- Fitting models in memory

- Inference / Training speed

8 x RTX 3090s will absolutely CRUSH a single Mac Studio in raw performance.

reply
1123581321
4 months ago
[-]
Crush by what factor?
reply
lostmsu
4 months ago
[-]
80x-240x
reply
angoragoats
4 months ago
[-]
You could for sure, but the nVidia setup described in this article would be many times faster at inference. So it’s a tradeoff between power consumption and performance.

Also, modern GPUs are surprisingly good at throttling their power usage when not actively in use, just like CPUs. So while you need 3kW+ worth of PSU for an 8x3090 setup, it’s not going to be using anywhere near 3kW of power on average, unless you’re literally using the LLM 24x7.
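Here is a quick sketch of that averaging. The 350 W per card matches the RTX 3090's rated board power. The 25 W idle draw and the duty cycles are hypothetical, and the CPU, fans, and PSU losses are left out.

```python
def average_watts(idle_w: float, load_w: float, duty_cycle: float) -> float:
    """Time-weighted average draw for a machine busy `duty_cycle` (0..1) of the time."""
    if not 0.0 <= duty_cycle <= 1.0:
        raise ValueError("duty_cycle must be between 0 and 1")
    return idle_w * (1 - duty_cycle) + load_w * duty_cycle


# Assumed: 8 cards at 350 W under load and ~25 W each at idle (hypothetical).
CARDS = 8
for duty in (0.1, 0.5, 1.0):
    avg = average_watts(CARDS * 25, CARDS * 350, duty)
    print(f"busy {duty:.0%} of the time: ~{avg:.0f} W average")
```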

reply
exyi
4 months ago
[-]
Even if you are running it constantly, the per token power consumption is likely going to be in a similar range, not to mention you'd need 10+ macs for the throughput.
reply
robotnikman
4 months ago
[-]
I have a 3090 power capped at 65%, I only notice a minimal difference in performance
reply
cranberryturkey
4 months ago
[-]
Can Reflection:70b work on them?
reply
christianqchung
4 months ago
[-]
Pretty sure it'll work where any 70b model would, but it's probably not noticeably better than Llama 3.1 70b if the reports I'm reading now are correct.[1]

[1]https://x.com/JJitsev/status/1832758733866222011

reply
angoragoats
4 months ago
[-]
Maybe you meant to reply to a different comment? Work on what?

Edit: I guess to directly answer your question, I don’t see why you couldn’t run a 70b model at full quality on either a M2 192GB machine or on an 8x 3090 setup.

reply
steve_adams_86
4 months ago
[-]
I know it's a fraction of the size, but my 32GB studio gets wrecked by these types of tasks. My experience is that they're awesome computers in general, but not as good for AI as people expect.

Running llama3.1 70B is brutal on this thing. Responses take minutes. Someone running the same model on 32GB of GPU memory seems to have far better results from what I've read.

reply
irusensei
4 months ago
[-]
You are probably swapping. On M3 max with similar memory bandwidth the output is around 4t/s which is normally on par with most people's reading speed. Try different quants.
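Here is a rough sketch of why this happens, assuming single-stream decoding is memory-bandwidth bound (every generated token reads all the weights once). The 400 GB/s figure is Apple's stated M2 Max bandwidth, and the 4.5 bits per weight for a 4-bit-family quant is an assumption. The weights alone come out larger than 32 GB, which points to swapping. The bandwidth ceiling is around 10 tokens/s, so 4 to 5.5 tokens/s is a plausible real-world fraction of it.

```python
def max_tokens_per_s(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Decode ceiling if every generated token streams all weights from memory once."""
    return bandwidth_gb_s / weights_gb


M2_MAX_BANDWIDTH_GB_S = 400  # Apple's stated unified-memory bandwidth for M2 Max
PARAMS_BILLION = 70
BITS_PER_WEIGHT = 4.5  # assumption: rough average for a 4-bit-family quant
UNIFIED_MEMORY_GB = 32

weights_gb = PARAMS_BILLION * BITS_PER_WEIGHT / 8
print(f"weights ~{weights_gb:.1f} GB; fits in {UNIFIED_MEMORY_GB} GB: {weights_gb < UNIFIED_MEMORY_GB}")
print(f"bandwidth ceiling ~{max_tokens_per_s(M2_MAX_BANDWIDTH_GB_S, weights_gb):.1f} tokens/s")
```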
reply
steve_adams_86
4 months ago
[-]
I'm on an M2 max so I shouldn't be too far behind. I'm not actually sure how the model I'm using was quantized to be honest. I'll give it a try.
reply
flemhans
4 months ago
[-]
Are people running llama 3.1 405B on them?
reply
rspoerri
4 months ago
[-]
I'm running 70B models (usually in q4 .. q5_k_m, but possible to q6) on my 96Gbyte Macbook Pro with M2-Max (12 cpu cores, 38 gpu cores). This also leaves me with plenty of ram for other purposes.

I'm currently using reflection:70b_q4 which does a very good job in my opinion. It generates with 5.5 tokens/s for the response, which is just about my reading speed.

edit: I usually don't run larger models (q6) because of the speed. I'd guess a 405B model would just be awfully slow.

reply
throwthrowuknow
4 months ago
[-]
Not going to work for training from scratch which is what the author is doing.
reply
rspoerri
4 months ago
[-]
192GByte of RAM is not enough to train 405B models. Reflection 70B requires 140GByte of RAM in fp16; 405B would need ~810GByte of RAM.
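For scale, here is a sketch using the common rule of thumb that mixed-precision Adam training needs about 16 bytes per parameter: fp16 weights and gradients, an fp32 master copy, and two fp32 optimizer moments. Activations are excluded. By this estimate, fully training a model within 192 GB tops out around 12B parameters, unless optimizer state is offloaded to system RAM or stored at lower precision.

```python
FP16_WEIGHTS = 2  # bytes per parameter just to hold fp16 weights
# fp16 weights + fp16 grads + fp32 master copy + two fp32 Adam moments
ADAM_MIXED_PRECISION = 2 + 2 + 4 + 4 + 4


def memory_gb(params_billion: float, bytes_per_param: float) -> float:
    """Billions of parameters x bytes per parameter = decimal GB (activations excluded)."""
    return params_billion * bytes_per_param


for name, params in (("70B", 70), ("405B", 405)):
    print(f"{name}: ~{memory_gb(params, FP16_WEIGHTS):,.0f} GB to hold fp16 weights, "
          f"~{memory_gb(params, ADAM_MIXED_PRECISION):,.0f} GB to train with Adam")

print(f"largest model trainable this way in 192 GB: ~{192 / ADAM_MIXED_PRECISION:.0f}B params")
```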
reply
throwthrowuknow
4 months ago
[-]
Pretty sure he said he’s inferencing llama3 405 and training his own custom model from scratch. He didn’t say how big his custom model will be.
reply
kcb
4 months ago
[-]
and have way less power
reply