And suddenly you can start playing with distributed software, even though it's running on a single machine. For resiliency tests you can unplug one machine at a time with a single click. It will annihilate a Pi cluster in Perf/W as well, and you don't have to assemble a complex web of components to make it work. Just a single CPU, motherboard, m.2 SSD, and two sticks of RAM.
Naturally, using a high core count machine without virtualization will get you the best overall Perf/W in most benchmarks. What's also important, but often not highlighted in benchmarks, is idle wattage, if you'd like to keep your cluster running and only use it occasionally.
I run a K8s "cluster" on a single xcp-ng instance, but you don't even really have to go that far. Docker Machine could easily spin up docker hosts with a single command, but I see that project is dead now. Docker Swarm I think still lets you scale up/down services, no hypervisor required.
It was also how I learned to set up a Hadoop cluster and a Cassandra cluster (this was 10 years ago, when these technologies were hot).
Having knowledge of these systems, and being able to talk about how I set them up and simulated recovery, directly got me jobs that 2x'd and then 3x'd my salary. I would highly recommend that all medium-skilled developers set up systems like this and get practicing if they want to get up into the next level.
Or the oldie-but-goodie paper "Scalability! But at what COST?": https://www.usenix.org/system/files/conference/hotos15/hotos...
Long story short, performance considerations with parallelism go way beyond Amdahl's Law, because supporting scale-out also introduces a bunch of additional work that simply doesn't exist in a single node implementation. (And, for that matter, multithreading also introduces work that doesn't exist for a sequential implementation.) And the real deep down black art secret to computing performance is that the fastest operations are the ones you don't perform.
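A toy way to see this (a sketch, not a benchmark; the 5% serial fraction and 2% per-node coordination overhead are made-up numbers purely for illustration): once you charge each extra node a fixed coordination cost, the speedup curve flattens and then reverses long before Amdahl's Law alone would suggest.

    # Toy model: Amdahl's law vs. Amdahl's law plus a per-node coordination cost.
    # The 5% serial fraction and 2% per-node overhead are illustrative assumptions,
    # not measurements from any real system.

    def amdahl_speedup(n_nodes: int, serial_fraction: float) -> float:
        """Ideal speedup if the only limit is the serial fraction."""
        return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_nodes)

    def speedup_with_overhead(n_nodes: int, serial_fraction: float,
                              overhead_per_node: float) -> float:
        """Same model, but each extra node adds fixed coordination work
        (partitioning, serialization, network sync) that a single-node
        implementation never pays."""
        parallel_time = serial_fraction + (1.0 - serial_fraction) / n_nodes
        coordination = overhead_per_node * (n_nodes - 1)
        return 1.0 / (parallel_time + coordination)

    if __name__ == "__main__":
        for n in (1, 2, 4, 8, 16, 32):
            print(f"{n:>2} nodes: Amdahl {amdahl_speedup(n, 0.05):5.2f}x, "
                  f"with overhead {speedup_with_overhead(n, 0.05, 0.02):5.2f}x")

With these made-up constants the "with overhead" curve peaks around 4-8 nodes and then gets worse, which is exactly the COST paper's point.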
> After fixing the thermals, the cluster did not throttle, and used around 130W. At full power, I got 325 Gflops
I was sort of surprised to find that the top500 list on their website only goes back to 1993. I was hoping to find some ancient 70’s version of the list where his ridiculous Pi cluster could sneak on. Oh well, might as well take a look… I’ll pull from the sub-lists of
https://www.top500.org/lists/top500/
They give the top 10 immediately.
First list (June 1993):

  placement / name / Rpeak (GFlop/s)
  #1   CM-5/1024          131.00
  #10  Y-MP C916/16256     15.24

Last list he wins, I think (June 1996):

  #1   SR2201/1024        307.20
  #10  SX-4/32             64.00

First list he's bumped out of the top 10 (November 1997):

  #1   ASCI Red          1,830.40
  #10  T3E                 326.40
I think he gets bumped off the full top500 list around 2002-2003. Unfortunately I made the mistake of going by Rpeak here, but they sort by Rmax, and I don't want to go through the whole list. Apologies for any transcription errors.
Actually, pretty good showing for such a silly cluster. I think I’ve been primed by stuff like “your watch has more compute power than the Apollo guidance computer” or whatever to expect this sort of thing to go way, way back, instead of just to the 90’s.
But certainly don’t imitate his choices, his economics aren’t your economics!
Then look at Apple’s ARM offerings, and AWS Graviton if you need ARM with raw power.
If you need embedded/GPIO, you should consider an Arduino or a clone. If you need GPIOs and Internet connectivity, look at an ESP32. GPIOs, ARM, and wired Ethernet? Consider the STM32H.
Robotics/machine vision applications, needing IO and lots of compute power? Consider a regular PC with an embedded processor on serial or USB. Or nvidia jetson if you want to run CUDA stuff.
And take a good hard look at your assumptions, as mini PCs using the Intel N100 CPU are very competitive with modern Pis.
But a single-board computer with something external to do your GPIO is often way more compelling.
A lot of others are stuck in a loop where they essentially review tech for making more YouTube videos: render times, colour accuracy, camera resolution, audio fidelity.
They are essentially for kids to play around with, learning computers by blinking LEDs and integrating with circuit boards. The idea of building a high-performance cluster with Pis is dumb from day one.
Where a "kid" may be a 53-year-old with 30+ years of softdev experience who finally got to the stuff he'd wanted to do for quite some time, and the "blinking LEDs" are a bunch of servos programmatically controlled based on input from a bunch of sensors. While there are definitely better alternatives by various narrow metrics, especially when it comes to actual productization, the ease (and cheapness, so you don't think much about that spending) of starting with all the readily available RPi servo array driver boards and various IO port boards and all the available software makes it hard to imagine how it could be any easier/cheaper/more available than it already is, with all that actual compute power and a full-featured Linux environment.
I would be pretty regretful of just the first sentence in the article, though:
> I ordered a set of 10 Compute Blades in April 2023 (two years ago), and they just arrived a few weeks ago.
That's rough.
Somehow I've actually gotten every item I backed shipped at some point (which is unexpected).
Hardware startups are _hard_, and after interacting with a number of them (usually one or two people with a neat idea in an underserved market), it seems like more than half fail before delivering their first retail product. Some at least make it through delivering prototypes/crowdfunded boards, but they're already in complete disarray by the end of the shipping/logistics nightmares.
And then there's the sourcing problem. Components that looked like they were in plentiful supply when the hardware was specced can end up in short supply, or worse, end-of-lifed, while you're trying to get all the firmware working.
It's most fun when you can prove the vendor's datasheet is lying about some pin or some function, but they still don't update it after a decade or more. So everyone integrating the chip who hasn't before hits the exact same speed bump!
Faith in the perfect efficiency of the free market only works out over the long term. In the short term we have a lot of habits that serve as heuristics for doing a good job most of the time.
For those like me that don't know the joke:
Two economists are walking down the street. One of them says “Look, there’s a twenty-dollar bill on the sidewalk!” The other economist says “No there’s not. If there was, someone would have picked it up already.”
Competition is what creates efficiency. Without it you live in a lie.
... and even then it doesn't always prove true.
They're good for as long as the development costs dominate the total costs.
If one just wants a cheap desktop box to do desktop things with, then they're a terrible option, price-wise, compared to things like used corpo mini-PCs.
But they're reasonably cost-competitive with other new (not used!) small computers that are tinkerer-friendly, and unlike many similar constructs there's a plethora of community-driven support for doing useful things with the unusual interfaces they expose.
The current RPi 5 makes no sense to me in any configuration, given its pricing.
If your server has a lot of idle time, ARM will always win.
Nobody is really building CPU clusters these days.
The best option for DP throughput for hobbyists interested in HPC might be old AMD cards from before they, too, realized that scientific folks would pay through the nose for higher precision.
Frontier is right behind it with the same arrangement.
Having honest to god dedicated GPUs on their own data bus with their own memory isn't necessarily the fastest way to roll.
For comparison there are 9,988,224 GPU compute units in El Capitan and only 1,051,392 CPU cores. Roughly one CPU core to push data to 10 GPU CUs.
If your goal is to play with or learn on a cluster of Linux machines, the cost effective way to do it is to buy a desktop consumer CPU, install a hypervisor, and create a lot of VMs. It’s not as satisfying as plugging cables into different Raspberry Pi units and connecting them all together if that’s your thing, but once you’re in the terminal the desktop CPU, RAM, and flexibility of the system will be appreciated.
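As a hedged sketch of that workflow (the tool choice is mine, not the commenter's: it assumes Canonical's multipass is installed, and flag names like --memory vs. --mem can vary between multipass versions), standing up a handful of VMs on one desktop is a short loop:

    # Sketch: spin up a small "cluster" of VMs on a single desktop via the
    # multipass CLI. Node names, sizes, and count are arbitrary examples.
    import subprocess

    NODES = [f"node{i}" for i in range(1, 4)]  # three toy cluster nodes

    def launch(name: str) -> None:
        # --memory may be --mem on older multipass releases
        subprocess.run(
            ["multipass", "launch", "--name", name,
             "--cpus", "2", "--memory", "2G", "--disk", "10G"],
            check=True,
        )

    def teardown(name: str) -> None:
        subprocess.run(["multipass", "delete", "--purge", name], check=True)

    if __name__ == "__main__":
        for node in NODES:
            launch(node)
        # ... install k3s / Docker Swarm / whatever on the nodes, experiment ...
        # then clean up:
        # for node in NODES:
        #     teardown(node)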
Handy: https://700c.dk/?powercalc
My Pi CM4 NAS with a PCIe switch, SATA and USB3 controllers, 6 SATA SSDs, 2 VMs, 2 LXC containers, and a Nextcloud snap pretty much sits at 17 watts most of the time, hitting 20 when a lot is being asked of it, and 26-27W at absolute max with all I/O and CPU cores pegged. €3.85/mo if I pay ESB, but I like to think that it runs fully off the solar and batteries :)
Pretty sure most of us aren't running anywhere close to full load 24/7, but whoa, Irish power is expensive. In the central US I pay $0.14/kWh.
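If you want to sanity-check figures like these yourself, it's just watts times hours; a quick sketch using the 17 W / 27 W draws above and the $0.14/kWh rate (the 100 W case comes up further down the thread):

    # Back-of-envelope electricity cost: watts -> kWh per month -> dollars.
    # Wattages and the rate come from the comments above; swap in your own.

    HOURS_PER_MONTH = 24 * 365 / 12  # ~730 h

    def monthly_cost(watts: float, price_per_kwh: float) -> float:
        kwh = watts / 1000 * HOURS_PER_MONTH
        return kwh * price_per_kwh

    if __name__ == "__main__":
        for watts in (17, 27, 100):
            cost = monthly_cost(watts, 0.14)  # central-US rate quoted above
            print(f"{watts:>3} W continuous ~ ${cost:5.2f}/month at $0.14/kWh")

At that rate even a 100 W box is only on the order of $10/month; the higher figures quoted elsewhere in the thread come from much higher per-kWh rates.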
It's definitely not suited for production, but then, you won't find old blade servers in production either (due to the power-to-performance issue).
If the goal is a lot of RAM and you don’t care about noise, power, or heat then these can be an okay deal.
Don’t underestimate how far CPUs have come, though. That machine will be slower than AMD’s slowest entry-level CPU. Even an AMD 5800X will double its single core performance and even walk away from it on multithreaded tasks despite only having 8 cores. It will use less electricity and be quiet, too. More expensive, but if this is something you plan to leave running 24/7 the electricity costs over a few years might make the power hungry server more expensive over time.
That combo gives you the better part of a gigabyte of L3 cache and an aggregate memory bandwidth of 600 GB/s, while still below 1000W total running at full speed. Plus your NICs are the fancy kind that let you play around with RoCEv2 and such nifty stuff.
It would also be relevant to then learn how to do things properly with Slurm, Warewulf, etc., instead of a poor man's solution with Ansible playbooks like in these blog posts.
Commodity desktop cpus with 32 or 64GB RAM can do all of this in a low-power and quiet way without a lot more expense.
The only problem in practice is that server CPUs don't support S3 suspend, so putting the whole thing to sleep after finishing with it doesn't work.
Makes me wonder if I should unplug more stuff when on vacation.
Fuckin nutty how much juice those things tear through.
Rates have gone up enormously because the cost of wildfires is falling on ratepayers, not the utility owners.
Regulated monopolies are pretty great, aren't they? Heads I win, tails you lose.
That said, I'm of the opinion that power/water/internet should all be state/county/city run. I don't want my utility companies to have profit motives.
My water company just got bought up by a huge water company conglomerate and, you guessed it, immediate rate increases.
If your local regulators approved the merger and higher rates, your complaint is with them as much as the utility company.
Not saying that some regulators are not basically rubber stamps or even corrupt.
I did (as did others), in fact, write in comments and complaints about the rate increases and buyout. That went unheard.
https://core.coop/my-cooperative/rates-and-regulations/rate-...
https://www.servethehome.com/lenovo-system-x3650-m5-workhors...
What's the margin on unplugging vs just powering off?
$50/month for 100W continuous usage isn't totally mad, and that could climb even higher over the rest of the decade.
Still only $50/month, not $150, but I very much care about 100W loads doing no work.
That said, I am not sure those numbers are true. I am in California (PG&E with East Bay community generation), and my TOU rates are much lower than those.
Also, $150 for 100W is crazy; that's about $2 per kWh. It would cost about $150 a year at the (high) rates of southern Sweden.
Personally, it's cheaper for me to buy the hardware even though it spends most of its time idling. Fast turnaround on very large private datasets is the key.
It also means it performs like a 10 year old server CPU, so those 28 threads are not exactly worth a lot. The geekbench results, for whatever value those are worth, are very mediocre in the context of anything remotely modern: https://browser.geekbench.com/processors/intel-xeon-e5-2690-...
Like a modern 12-thread 9600x runs absolute circles around it https://browser.geekbench.com/processors/amd-ryzen-5-9600x
The homelab group on Reddit is full of people who don't understand any of this - they have full racks in their house that could be replaced with one high-end desktop.
A lot of that group is making use of the IO capabilities of these systems to run lots of PCI-E devices & hard drives. There's not exactly a cost-effective modern equivalent for that. If there were cost-effective ways to do something like take a PCI-E 5.0 x2 and turn it into a PCI-E 3.0 x8 that'd be incredible, but there isn't really. So raw PCI-E lane count is significant if you want cheap networking gear or HBAs or whatever, and raw PCI-E lane count is $$$$ if you're buying new.
Also these old systems mean cheap RAM in large, large capacities. Like 128GB RAM to make ZFS or VMs purr is much cheaper to do on these used systems than anything modern.
Like if you have a large media library, you need to push maybe 10MB/s, you don't need 128GB of RAM to do that...
It's mostly just hardware porn - perhaps there are a few legit use cases for the old hardware, but they are exceedingly rare in my estimate.
For just streaming a 4K Blu-ray you need more than 10MB/s; Ultra HD Blu-ray tops out at 144 Mbit/s (18 MB/s). Not to mention if that system is being hit by something else at the same time (backup jobs, etc...).
Is the 128GB of RAM just hardware porn? Eh, maybe, probably. But if you want 8+ bays for a decent sized NAS then you're already quickly into price points at which point these used servers are significantly cheaper, and 128GB of RAM adds very little to the cost so why not.
If anything, 2nd-hand AMD gaming rigs make more sense than old servers. I say that as someone with an always-off R720xd at home due to noise and heat. It was fun when I bought it during winter years ago, until summer came.
And what case are you putting them into? What if you want it rack mounted? What about >1gig networking? What if I want a GPU in there to do whisper for home assistant?
Used gaming rigs are great. But used servers also still have loads of value, too. Compute just isn't one of them.
Maybe one of the Fractal Design cases with a bunch of drive bays?
> What if you want it rack mounted?
Companies like Rosewill sell ATX cases that can scratch that itch.
> What about >1gig networking?
What about PCI Express card? Regular ATX computers are expandable.
> What if I want a GPU in there to do whisper for home assistant?
I mean... We started with a gaming rig, right? Isn't a GPU already implicit?
A lot of businesses are paying obscene money to cloud providers when they could have a pair of racks and the staff to support them.
Unless you're paying attention to the bleeding edge of the server market, to its costs (and better yet, its features and affordability), this sort of mistake is easy to make.
The article is by someone who does this sort of thing for fun, and views/attention, and I'm glad for it... it's fun to watch. But it's sad when this same sort of misunderstanding happens in professional settings, and it happens a lot.
So for $3000, that's 3000 hours, or 125 days (if you just wastefully leave them on all the time instead of turning them on when needed).
Say you wanted to play around for a couple of hours, that's like.. $3.
(That's assuming there's no bonus for joining / free tier, too.)
The desktop equivalent of your 10 T3 Micro instances is about $600 if you buy new. For example, a Lenovo ThinkCentre M75q Gen 2 Tiny 11JN009QGE has an 8-core 3.2GHz processor with hyperthreading. That's 16 virtual cores compared to the 20 vCPUs of the T3 instances, but with much faster cores. And 16GB RAM allows you to match the 1GB per instance.
If you don't have anything and feel generous throw in another $200 for a good monitor and keyboard plus mouse. But you can get a used crap monitor for $20. I'd give you one for free just to be rid of it.
That's a total of $800, or 33 days of forgetting to shut down the 10 VMs. Maybe half that if you buy used.
Granted, not everyone has $800 or even $400 to drop on hobby projects; renting VMs often does make sense.
I regularly rent this for a few hours at a time for learning and prototyping
This. Some cloud providers offer VMs with 4GB RAM and 2 virtual cores for less than $4/month. If your goal is to learn how to work with clusters, nothing beats firing up a dozen VMs when it suits your fancy, and shut them down when playtime is over. This is something you can pull off in a couple of minutes with something like an Ansible script.
In the cloud, worst case, I end up with a 5-6 digit bill.
And I know my ADD, 2 is not super unlikely.
But if you're someone like me who intends to actively use the hardware for real-world purposes, the cloud often simply can't compete on price. At home, I have a mini PC with a 5600G, 32GB of RAM, and a few TBs of NVME storage. The entire thing cost less than $600 a few years ago, and consumes around 20W of power on average.
Even on the cheapest cloud providers available, an equivalent setup would exceed that price in less than half a year. SSD storage in particular is disproportionately expensive on the cloud. For small VMs that don't need much storage, it does make sense, but as soon as you scale up, cloud prices quickly start ballooning.
You don’t need hardware to learn. Sure it helps but you can learn from a book and pen and paper exercises.
[1] The Framework Desktop is a beast:
https://news.ycombinator.com/item?id=44841262
[2] HP ZBook Ultra:
Also, the Mac Studio is a bit hampered by its low compute power, meaning you really can't use a 100B+ dense model; only a MoE is feasible without getting multi-minute prompt-processing times (assuming 500+ token prompts, etc.).
It was expensive, but it is not slow for small queries.
Now, if I want to bump the context window to something huge, it does take 10-20 seconds to respond for agent tasks, but it’s only 2-3x slower than paid cloud models, in my experience.
Still a little annoying, and the models aren’t as good, but the gap isn’t nearly as big as you imply, at least for me.
On my 96 GB DDR5-6000 + RTX 5090 box, I see ~20s prefill latency for a 65k prompt and ~40 tok/s decode, even with most experts on the CPU.
A Mac Studio will decode faster than that, but prefill will be tens of times slower due to much lower raw compute vs. a high-end GPU. For long prompts that can make it effectively unusable. That's what the parent was getting at. You will hit this long before 65k context.
If you have time, could you share numbers for something like:
llama-bench -m <path-to-gpt-oss-120b.gguf> -ngl 999 -fa 1 --mmap 0 -p 65536 -b 4096 -ub 4096
Edit: The only Mac Studio pp65536 datapoint I’ve found is this Reddit thread:
https://old.reddit.com/r/LocalLLaMA/comments/1jq13ik/mac_stu ...
They report ~43.2 minutes prefill latency for a 65k prompt on a 2-bit DeepSeek quant. Gpt-oss-120b should be faster than that, but still very slow.
Right now the Macs are viable purely because you can get massive amounts of unified memory. Be pretty great when they have the massive matrix FMA performance to complement it.
>cost effective
lmao
Did OP really think his fellow humans are so moronic that they just didn't figure out you can plug a couple of Raspberry Pis together?
Some Raspberry Pi products are sold at a loss, so I could see how it's in the realm of possibility.
YouTube is absolutely jam-packed with people pitching home "lab" sort of AI buildouts that are just catastrophically ill-advised, but it yields content that seems to be a big draw. For instance, Alex Ziskind's content. I worry that people are actually dumping thousands to have poor-performing, ultra-quantized local AIs that will have zero comparative value.
1) How much worse / more expensive are they than a conventional solution?
2) What kinds of weird esoteric issues pop up, and how do they get solved (e.g. the resizable BAR issue for GPUs attached to the RPi's PCIe slot)?
Another fun fact, the network module of the pi is actually connected to the USB bus, so there's some overhead as well as a throughput limitation.
Fun fact, the Pi does not have a power button, relying on software to shut down cleanly. If you lose access to the machine, it's not possible to avoid corrupted states on the disk.
Despite all of this, if you want to self-host some website, the Raspberry Pi is still an amazingly cost-effective choice; for anywhere between 2 and 20,000 monthly users, one Pi will be overprovisioned. And you can even get an absolutely overkill redundant Pi as a failover, but a single Pi can still reach 365 days of uptime with no problem, and as long as you don't reboot or lose power or lose internet, you can achieve more than a couple of nines of reliability.
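Rough sanity check on that 20,000-monthly-user figure (the 10 pageviews per user and the 10x peak factor are assumptions for illustration, not measurements): the average request rate is tiny.

    # Back-of-envelope: how much traffic is 20,000 monthly users, really?
    MONTHLY_USERS = 20_000
    PAGEVIEWS_PER_USER = 10        # assumption, for illustration only
    PEAK_FACTOR = 10               # assumption: peak traffic vs. the average
    SECONDS_PER_MONTH = 30 * 24 * 3600

    avg_rps = MONTHLY_USERS * PAGEVIEWS_PER_USER / SECONDS_PER_MONTH
    print(f"average: {avg_rps:.3f} req/s, peak: {avg_rps * PEAK_FACTOR:.2f} req/s")
    # -> roughly 0.08 req/s average, under 1 req/s at peak: trivial for one Pi.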
But if you are thinking of a third, much less a tenth, Raspberry Pi, you are probably scaling the wrong way; well before you reach the point where quantity matters (a third machine), it becomes more cost-effective to upgrade the quality of your one or two machines.
On the embedded side it's the same story: these are great for prototyping, but you are not going to order 10k and sell them in production. Maybe a small 100-unit test batch? But you will optimize and make your own PCB before a mass run.
It's really not though. I've been a Pi user and fan since it was first announced, and I have dozens of them, so I'm not hating on RPi here; we did the maths some time back here on HN when something else Pi related came up.
If you go for a Pi 5 with, say, 8GB of RAM, by the time you factor in an SSD + HAT + PSU + case + cooler (+ maybe a uSD), you're already in mini-PC price territory, and you can get something much more capable and feature-complete for about the same price. For a few £ more you get something significantly more capable: better CPU, iGPU, an RTC, proper networking, faster storage, more RAM, better cooling, etc., and you won't be using much more electricity either.
I went this route myself and have figuratively and literally shelved a bunch of Pis by replacing them with a MiniPC.
My conclusion, for my own use, after a decade of RPi use, is that a cheap mini PC is the better option these days for hosting/services/server duty and Pis are better for making/tinkering/GPIO related stuff, even size isn't a winner for the Pi any more with the size of some of the mini-PCs on the market.
The only 100% required thing on there is some sort of power supply, and an SD card, and I suspect a lot of people have a spare USB-C cable and brick lying around. A cooler is only recommended if you're going to be putting it under sustained CPU load, and they're like $10 on Amazon.
Particularly with Pi 5, any old brick that might be hanging around has a fair chance at not being able to supply sufficient power.
Zero of any of that is needed. The new Pi "works best" with a cooler sure but at standard room temps will be fine for serving web apps and custom projects and things. You do not need an SSD. You do not need a HAT for anything.
Apparently the Pi 5 8GB is $120 though, WTF.
What personal web site or web app or project can't run just fine on a Pi Zero 2 though? It's a little RAM starved but performance wise it should be sufficient.
Other than second-hand mini PCs, old laptops also make great home servers. They have built in UPS!
Also the other peripherals you consider are irrelevant, since you would need them (or not), in other setups. You can use a pi without a PSU for example. And if you use an SSD, you have to consider that cost in whatever you compare it to.
>I went this route myself and have figuratively and literally shelved a bunch of Pis
>and I have dozens of them,
Reread my post? I meant specifically that Pis are great in the 1-to-2 range; with 3 Pis you should change to something else. So I'm saying they are good at a $100-$200 budget, but bad anywhere above that.
From the official website:
> Does Raspberry Pi 5 need active cooling?
> Raspberry Pi 5 is faster and more powerful than prior-generation Raspberry Pis, and like most general-purpose computers, it will perform best with active cooling.
Starting with the Pi 4, they started saying that a cooler isn't required, but that it may thermal throttle without one if you keep the CPU pegged.
> Another fun fact, the network module of the pi is actually connected to the USB bus, so there's some overhead as well as a throughput limitation.
> Fun fact, the Pi does not have a power button, relying on software to shut down cleanly. If you lose access to the machine, it's not possible to avoid corrupted states on the disk.
With all these caveats in mind, a raspberry pi seems to be an incredibly poor choice for distributed computing
Exactly. This build sounds like the proverbial "1024 chickens" in Seymour Cray's famous analogy. If nothing else, the communications overhead will eat you alive.
> DO NOT TAKE HOME THE FREE 1U SERVER YOU DO NOT WANT THAT ANYWHERE A CLOSET DOOR WILL NOT STOP ITS BANSHEE WAIL TO THE DARK LORD AN UNHOLY CONDUIT TO THE DEPTHS OF INSOMNIA BINDING DARKNESS TO EVEN THE DAY
Currently the cloud providers are dumping second gen xeon scalables and those things are pigs when it comes to power use.
Sound-wise it's like someone running a hair dryer at full speed all the time, and it can be louder under load.
Don’t hate the player, hate the game.
The ones that are dead straight with no clickbait are, 10 times out of 10, the worst performers, and usually by a massive margin. Even with the same thumbnail.
The sad fact is, if you want your work seen on YouTube, you can't just say "I built a 10 node Raspberry Pi blade cluster and ran HPL and LLMs on it".
Some people are fine with a limited audience. And that's fine too! I don't have to write on my blog at all—I earn negative income from that, since I pay for hosting and a domain, but I hope some people enjoy the content in text form like I do.
Youtube demonstrably wants clickbait titles and thumbnails. They built tooling to automatically A/B test titles and thumbnails for you.
Youtube could fix this and stop it if they want, but that might lose them 1% of business so they never will.
They love that you blame creators for this market dynamic instead of the people who literally create the market dynamic.
Like what about the people who maintain the alpha/sparc/parisc linux kernels? Or the designers behind idk tilera or tenstorrent hardware.
I do get to see and play with a lot of interesting systems, but for most of them, I only get to go just under surface-level. It's a lot different seeing someone who's reverse engineered every aspect of an IBM PC110, or someone who's restored an entire old mainframe that was in storage for years... or the group of people who built an entire functional telephone exchange with equipment spread over 50 years (including a cell network, a billing system, etc.).
This is why I said "almost anyone." If I changed your words, I could disagree with you as well.
I greatly respect Jeff's work, but he's a professional YouTuber, so his projects will necessarily lean towards clickbait and riding trends (Jeff, I don't mean this as criticism!) He's been a great advocate for doing interesting things with RasPis, but "interesting" != "rational"
Thank you!
Slower. 4 times slower.
TL;DR: just buy one Framework Desktop and it's better than the OP's Pi AI cluster on every single metric, including cost, performance, efficiency, headache, etc.
And regarding efficiency, in CPU-bound tasks, the Pi cluster is slightly more efficient. (Even A76 cores on a 16nm node still do well there, depending on the code being run).
Unless you can keep your compute at 70% average utilization for 5 years, you will never save money purchasing your hardware compared to renting it.
$3,000 is well under many "oopsie billsies" from cloud providers.
And that's outside of the whole "I own it" side of the conversation, where things like latency, control, flexibility, & privacy are all compelling reasons to be willing to spend slightly more.
I still run quite a number of LLM services locally on hardware I bought mid-COVID (right around $3k for a dual RTX 3090 + 124GB system RAM machine).
It's not that much more than you'd spend if you're building a gaming machine anyways, and the nifty thing about hardware I own is that it usually doesn't stop working at the 5 year mark. I have desktops from pre-2008 still running in my basement. 5 year amortization might have the cloud win, but the cloud stops winning long before most hardware dies. Just be careful about watts.
Personally - I don't think pi clusters really make much sense. I love them individually for certain things, and with a management plane like k8s, they're useful little devices to have around. But I definitely wouldn't plan to get good performance from 10 of them in a box. Much better off spending roughly the same money for a single large machine unless you're intentionally trying to learn.
If it's for personal use, do whatever... there's nothing wrong with buying a $60,000 sports car if you get a lot of enjoyment out of driving it. (you could also lease if you want to trade up to the "faster model" next year) For business, renting (and managed hosting) makes more sense.
Like, if you buy that card it can still be processing things for you a decade from now.
Or you can get 3 months of rental time.
---
And yes, there is definitely a point where renting makes more sense because the capital outlay becomes prohibitive, and you're not reasonably capable of consuming the full output of the hardware.
But the cloud is a huge cash cow for a reason... You're paying exorbitant prices to rent compared to the cost of ownership.
But also, when it comes to Vast/RunPod, it can be annoying and genuinely become more expensive if you have to rent 2x the number of hours because you constantly have to upload and download data and checkpoints, pay ongoing storage costs, transfer data to another server because the GPU is no longer available, etc. It's just less of a headache if you have an always-available GPU with a hard drive plugged into the machine, and that's it.
Plus cloud gaming is always limited in range of games, there are restrictions on how you can use the PC (like no modding and no swapping savegames in or out).
2) Hardware optimization (the exact GPU you want may not always be available for some providers)
3) Not subject to price changes
4) Not subject to sudden Terms of Use changes
5) Know exactly who is responsible if something isn't working.
6) Sense of pride and accomplishment + Heating in the winter
I assumed this was a novelty, like building a RAID array out of floppy drives.
The economics of spending $3,000 on a video probably work out fine.
A lot of people (here, Reddit, elsewhere) speculate about how good/bad a certain platform or idea is. Since I have the means to actually test how good or bad something is, I try to justify the hardware costs for it.
Similar to testing various graphics cards on Pis, I've probably spent a good $10,000 on those projects over the past few years, but now I have a version of every major GPU from the past 3 generations to test on, not only on Pi, but other Arm platforms like Ampere and Snapdragon.
Which is fun, but also educational; I've learned a lot about inference, GPU memory access, cache coherency, the PCIe bus...
So a lot of intangibles, many of which never make it directly into a blog post or video. (Similar story with my time experiments).
Yeah, this is a now widely known issue with LLM processing. It can be remediated so that all nodes split the computation, but then you come back to the classic supercomputing problem of node interconnect latency/bandwidth bottlenecks.
It looks to me like many such interconnects emulate Ethernet cards. I wonder if that could be recreated using the M.2 slot, rather than using that slot for node-local data, and cost-effectively so (i.e., cheaper than a bunch of 10GbE cards and short DACs).
I believe the Raspberry Pi cluster is one of the cheapest multi-node / MPI machines you can buy. That's useful even if it isn't fast. You need to practice the programming interfaces, not necessarily make a fast computer.
However, NUMA is also a big deal. The various AMD Threadrippers with multi-die memory controllers are better in this regard. Maybe the aging Threadripper 1950X: yes, it's much slower than modern chips, but the NUMA issues are exaggerated (especially poor) on that old architecture.
That exaggerates the effects of good NUMA handling, and now you as a programmer can build more NUMA skills.
Of course, the best plan is to spend $20,000,000++ on your own custom NUMA nodes cluster out of EPYCs or something.
-------
But no. The best supercomputers are your local supercomputers that you should rent some time from. You need a local box to see various issues and learn to practice programming.
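For the MPI practice mentioned above, a minimal sketch (assuming mpi4py plus an MPI runtime such as OpenMPI on every node; the ring-latency exercise is my example, just the kind of small program that surfaces interconnect issues):

    # Minimal MPI practice: each rank passes a token around a ring and rank 0
    # reports the measured round-trip time. Run with e.g.: mpirun -np 4 python ring.py
    from mpi4py import MPI
    import time

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    size = comm.Get_size()

    dest = (rank + 1) % size
    source = (rank - 1) % size

    comm.Barrier()
    start = time.perf_counter()

    if rank == 0:
        comm.send(0, dest=dest)           # kick off the token
        token = comm.recv(source=source)  # receive it back after a full loop
    else:
        token = comm.recv(source=source)
        comm.send(token + 1, dest=dest)

    comm.Barrier()
    elapsed = time.perf_counter() - start

    if rank == 0:
        print(f"{size} ranks, token value {token}, "
              f"ring time {elapsed * 1e6:.1f} us "
              f"(~{elapsed * 1e6 / size:.1f} us per hop)")

Running the same thing on a Pi cluster over Ethernet versus on a single many-core box makes the interconnect cost very visible.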
I think the biggest problem with cluster products is that they just don't work out of the box. Vendors haven't really done the "last 2%" of development required to make them viable; it's left to us purchasers to get the final bits in place.
Still, it'll make a fun distributed computing experimental platform some day.
Just like the Inmos Transputer I've got somewhere, sitting in a box, waiting for a power supply ..
Not so good, and this is the sort of title you need to bring the punters in on YouTube.
I don't mean to sound too cynical, I appreciate Jeff's videos, just wanted to point out that if you've spent money and time on content you can either ditch it or make a regret video.
Just so long as the thumbnails don't have an arrow on them I'm happy.
Which got me thinking about how these frontier AI models work when you (as a user) run a query. Does your query just go to one big box with lots of GPUs attached and run in a similar way, but much faster? Do these AI companies write about how their infra works?
And yes, they basically have 1 Tbps+ interconnects and throw tens or hundreds of GPUs at queries. Nvidia was wise to invest so much in their networking side—they have massive bandwidth between machines and shared memory, so they can run massive models with tons of cards, with minimal latency.
It's still not as good as tons of GPU attached to tons of memory on _one_ machine, but it's better than 10, 25, or 40 Gbps networking that most small homelabs would run.
I’m finally at the point where I can dedicate time for building an AI with a specific use case in mind. I play competitive paintball and would like to utilize AI for a handful of things. Specifically hit detections in video streams. Pi’s were my natural choice simply because of low cost of entry and wide range of supported products to get a PoV running. I even thought about reaching out to Jeff and asking his input.
This post didn’t change my direction too much, but it did help level set some realistic expectations. So thanks for sharing.
Quickly learned that there is so much more to manage when you split a task up across systems, even when the system (like Cinema 4D) is designed for it.
Not at all the best, but they were cheap. If I WANTED the best or most reliable, I'd actually buy real products.
'Worth it any more'? At this size, never. A Pi is a Pi is a Pi!
A few are fine for toying around; beyond that, hah. Price:perf is rough, does not improve with multiplication [of units, cost, or complexity].
Was it fast? No. But that wasn't the point. I was learning about distributed computing.
Maybe I'm missing something.
But still can be decent for HPC learning, CI testing, or isolated multi-node smaller-app performance.
Getting some NUC-like machines makes a lot more sense to me. You'll get 2.5Gb/s Ethernet at the least, and way more FLOPS as well.
I don't know anyone who would think this actually.
You'd be surprised by the number of emails, Instagram DMs, YouTube comments, etc. I get—even after explicitly showing how bad a system is at a certain task—asking if a Pi would be good for X, or if they could run ChatGPT on their laptop...
Was it a learning experience?
More importantly, did you have some fun? Just a little? (=
Also no. The guy's a YouTuber.
On the other hand, will this make him 100+k views? Yes. It's bait - the perfect combo to attract both the AI crowd and the 'homelab' enthusiasts (of which the bulk are yet to find any use for their raspberry devices)...
Jeff has written a lot of useful OSS software that is used daily by many companies around the world, including mine. What have you created?
https://www.youtube.com/c/JeffGeerling
"978K subscribers 527 videos"
Jeff's had a pattern of embellishing controversies, misrepresenting what people say, and using his platform to create narratives that benefit his content's engagement. This is yet another example of farming outrage to get clicks. I don't understand why people drool over his content so much.
I then used many of his ansible playbooks on my day to day job, which paid my bills and made my career progress.
I don't check YouTube, so I didn't know that he was a "YouTuber". I do know his other side and how much I have leveraged his content/code in my career.
Not that it's a problem; I don't see why it would inherently be a negative thing. Dude seems to make some good content across a lot of different mediums. Cheers to Jeff.
https://www.jeffgeerling.com/projects
And the inference is that he is doing this for clicks, i.e. clickbait. The very title is disingenuous.
Your attack on the poster above you is childish.
Nothing that is not AGPL-licensed, so you and your company haven't taken advantage of it.
I am not sure how this relates to my comment though.
All you needed to do was buy 4x used 7900 XTX cards on eBay and build a four-node Raspberry Pi cluster using the external GPU setup you came up with in one of your previous blog posts [0].
[0] https://www.jeffgeerling.com/blog/2024/use-external-gpu-on-r...
the common denominator is always capital gain
capitalism is the reason why we haven't been able to go back to the moon and build bases there
blanket-blaming capitalism without good reasoning is becoming the new red-flag of "can't think critically"
private space companies, despite decades of hype and funding, have stagnated by comparison
the fact that SpaceX depends heavily on government contracts just to function is yet another proof: their "innovation" isn't self sustaining, it's underwritten by taxpayer money
are you denying that NASA landed on the Moon?
The Elon psyop doesn't work on me; I know who is behind it all. They need a charismatic salesman for the masses, just like Ford, Disney, Reagan, and all the rest, masking structural power with a digestible story for the masses.
> blanket-blaming capitalism without good reasoning is becoming the new red-flag of "can't think critically"
It's quite the opposite: people unable to take criticism of capitalism talk about "critical thinking". How is China doing?
It’s an overrated, overhyped little computer. Like ok it’s small I guess but why is it the default that everyone wants to build something new on? Because it’s cheap? Whatever happened to buy once, cry once? Why not just build an actual powerful rig? For your NAS? For your firewalls? For security cameras? For your local AI agents?
Cf. the various Beagle boards which have mainline linux and u-boot support right from release, together with real open hardware right down to board layouts you can customise. And when you come to manufacture something more than just a dev board, you can actually get the SoC from your normal distributor and drop it on your board - unlike the strange Broadcom SoCs rpi use.
I'm quite a lot more positive about rp2040 and rp2350, where they've at least partially broken free of that Broadcom ball-and-chain.
No, you are dismissive because you don't care about the use-cases.
The RPi 4, 400, and 500 are great models. Consider all the advantages together:
i= support for current Debian
ii= stellar community
iii= ease of use (UX), especially for people new to Debian and/or coding and/or Linux
iv= quiet, efficient, low power and passively cooled
v= robust enough to be left running for a long time
There are cheaper, more performant x86 and ARM dev boards and SOCs. But nothing compares to the full set of advantages.
That said, building a $3K A.I. cluster is just a senseless, expensive lark. (^;
I don't need to transcode + I need something I can leave on that draws little power.
I have a powerful rig, but the one time I get to turn it off is when I'd need the media server lol.
There's a lot of scenarios where power usage comes into play.
These clusters don't make much sense to me though.
I know for many who run SBCs (RK3588, Pi, etc.), "very little" is 1-2W idle, which is almost nothing (and doesn't even need a heatsink if you can stand some throttling from time to time).
Most of the Intel Mini PCs (which are about the same price, with a little more performance) idle at 4-6W, or more.
Unless you have a robot body for your potential RPi, don't buy one.
it's nice to take it on road trips / into hotels.
can't really imagine hauling a server around.
we probably have different definitions of "very little power".
> can't really imagine hauling a server around.
These two sentences contradict each other.
I can fit a raspberry pi and external ssd in my pocket.
I cannot do that for a server.
I could use a laptop, but simply plugging in a firestick to the hotel tv or a projector when camping is nicer.
No, I wouldn’t think.