I run a single-node K8s cluster on a dedicated server because it's way cleaner to manage than the previous mess: a mix of docker compose + traefik routing + random stuff installed as packages on the host.
I can create "vhosts" for practically anything in a declarative manner, and if the cluster blows up, I have five small scripts to bootstrap it; after that, all I need is `kubectl apply -k .`.
Uptime, self-healing, reproducibility, separating the system from the apps. There's probably a half dozen more benefits.
K8s certainly comes with a resource-consumption tax, but for anything beyond the trivial it's usually justified.
> Separate VM's for different apps works well for isolation
Sounds inefficient, and a lot more plumbing work than simply writing a hundred lines of YAML.
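To make that concrete, here's a minimal sketch of one such "vhost" (my illustration, not the parent's actual setup; the `blog` app, image, and hostname are hypothetical, and it assumes an ingress controller is already installed), applied with `kubectl apply -k .`:

```yaml
# kustomization.yaml, picked up by `kubectl apply -k .`
resources:
  - blog.yaml
---
# blog.yaml: one "vhost" = Deployment + Service + Ingress
apiVersion: apps/v1
kind: Deployment
metadata:
  name: blog
spec:
  replicas: 1
  selector:
    matchLabels: { app: blog }
  template:
    metadata:
      labels: { app: blog }
    spec:
      containers:
        - name: blog
          image: ghcr.io/example/blog:latest  # hypothetical image
          ports:
            - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: blog
spec:
  selector: { app: blog }
  ports:
    - port: 80
      targetPort: 8080
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: blog
spec:
  ingressClassName: nginx  # assumes e.g. ingress-nginx is installed
  rules:
    - host: blog.example.com  # the declarative "vhost"
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: blog
                port: { number: 80 }
```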
https://commaok.xyz/ai/just-in-time-software/
I mean, I don't do that, but I'll type a prompt.
let me draw this out the way i've been playing with it:
a classic vm exists and supports kvm (nested virtualization), which means you can run stuff like firecracker in there
an ssh server runs on this vm, and when you connect to it you're dropped into a repl/tui where you can list existing microvms, create new ones, or destroy them, and, of particular use, attach to one.
as an added nicety, if you connect with `ssh user+dev@example.com`, your connection skips the management interface and you are dropped straight into the `dev` machine; if it didn't exist yet, you wait 3s, and now it does
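a rough go sketch of that dispatch, just to show the shape (my illustration, not the actual code; `loadHostKey`, `attachToVM`, and `managementShell` are hypothetical helpers, stubbed below), using golang.org/x/crypto/ssh:

```go
package main

import (
	"log"
	"net"
	"strings"

	"golang.org/x/crypto/ssh"
)

func main() {
	config := &ssh.ServerConfig{
		// every presented public key is treated as an "account" for now
		PublicKeyCallback: func(conn ssh.ConnMetadata, key ssh.PublicKey) (*ssh.Permissions, error) {
			return &ssh.Permissions{
				Extensions: map[string]string{"pubkey-fp": ssh.FingerprintSHA256(key)},
			}, nil
		},
	}
	hostKey, err := loadHostKey()
	if err != nil {
		log.Fatal(err)
	}
	config.AddHostKey(hostKey)

	ln, err := net.Listen("tcp", ":22")
	if err != nil {
		log.Fatal(err)
	}
	for {
		nConn, err := ln.Accept()
		if err != nil {
			continue
		}
		go func(c net.Conn) {
			sconn, chans, reqs, err := ssh.NewServerConn(c, config)
			if err != nil {
				return
			}
			go ssh.DiscardRequests(reqs)
			// "user+dev" means: skip the management tui, attach to microvm "dev"
			if user, vm, ok := strings.Cut(sconn.User(), "+"); ok {
				attachToVM(user, vm, chans) // create-if-missing, then attach
			} else {
				managementShell(sconn.User(), chans) // list/create/destroy repl
			}
		}(nConn)
	}
}

// hypothetical helpers, stubbed here; the real versions would talk to firecracker
func loadHostKey() (ssh.Signer, error)                         { panic("stub") }
func attachToVM(user, vm string, chans <-chan ssh.NewChannel)  { panic("stub") }
func managementShell(user string, chans <-chan ssh.NewChannel) { panic("stub") }
```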
vms can talk to each other internally, can connect out, and persist across server restarts
what i don't have yet: proper multi-tenancy (it treats each ssh key as an account, which is fine since it's just me), incoming connections, an internal supervisor to keep services running inside each microvm, isolation inside firecracker, snapshotting or backups, and the rest of the shopping list that would make it an actual mvp
as argued by OP, you can see this happening with exe.dev, and less explicitly with sprites.dev
Maybe they're assuming some massive amount of compute will be necessary for future tasks? Self-hosted LLMs? I'm currently finding it difficult to come up with more uses for my VPS beyond hosting Trilium and some personal applications I've made.
The "cooperative task" they're engaged in is just, broadly, meeting your needs, whatever they are.
The isolation is a desirable property, and I agree this is much preferable to a highly inter-coupled bunch of machines, and also that this stretches the typical sense in which we refer to a "compute cluster", but I don't think it's an entirely invalid framing of the term.
Not really. In my experience clustering implies multiple compute elements serving the same function with a coordination mechanism to provide redundancy and/or enhanced capacity.
JBOD vs. RAID.
It sits on top of Kubernetes and seems very hand-wavy about how you create and manage those clusters.
The article itself reminds me of the enthusiasm I felt for plan9 when I first heard about it back in uni. I also thought everyone should have their own compute grids and that clustered computing was the future; of course now I realize there's a lot of reasons why that doesn't actually work. Considering this appears to be a start-up ad, I hope the author knows something I don't.
ClusterdOS appears to be a Kubernetes-in-a-box multi-node setup whose goal is to work so well that the user doesn't know or care what it's doing. I wouldn't trust an LLM with managing one machine by itself, let alone a whole cluster of them running the incredibly complex mess that Kubernetes is (and that's not even counting the 8 other layers of software in this stack), so this feels like an order of magnitude worse.
[0] Using LLMs for sysadmin research or boilerplate writing is one thing, but after a certain amount of use you're really just paying $X a month for Anthropic to manage your systems for you. I'd rather just pay a real person to do it at that point. I'd also rather people get over their pathological fear of learning how to run a server but I've given up on that.
> see CEO of Tailscale apenwarr's vibe-researched thread
“Vibe-research” is now a core part of my vocabulary.
A big advantage of clusters, and horizontal scaling in general, is the ability to easily dynamically scale to meet demand.
If you're running a system on a single machine that has N GB of memory and you need to scale to N+1, what do you do? Provision a new machine and migrate everything over?
No one operates online real-time systems like this. Clusters make it much easier and less expensive to handle this.
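As a hedged illustration of what that dynamic scaling looks like when it's declarative, here's a minimal Kubernetes HorizontalPodAutoscaler (the `web` deployment name and the thresholds are placeholders), which adds and removes replicas with load instead of anyone re-provisioning a bigger machine:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web          # placeholder workload
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 70   # scale out before memory runs short
```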
On top of that, it's probably true that in some pure numerical problem-count sense "most problems" don't need a cluster, but that's misleading. It's like saying "most businesses are mom-and-pop shops": perhaps true, but it ignores hundreds of thousands of larger businesses, and even small businesses that have big-data needs.
There are plenty of problems that involve large amounts of data, and that's increasingly true with ML applications.
I'm at a company of ~100 people which you've probably never heard of (classified as a "small" company in government stats, so not included in the hundreds of thousands figure I mentioned above). We have 1.9 PB of data for our main environment. When we run processes that deal with it all, the clusters scale to thousands of vCPUs and tens of terabytes of RAM.
Several processes that run daily scale to 500+ vCPUs and many TB of RAM. For those, the data itself could probably fit in RAM on one humongous machine, but the CPUs wouldn't fit in a single box. And we'd have to size the machines carefully every time we start them up. Clusters can scale up dynamically according to the demands of the jobs they're executing.
so I guess idk what you mean by 'elastic' here.
You can have more than one CPU and more than one storage device connected to one mainboard, and that works because the interconnect fabric is very fast.
We don't have the possibility to connect different computers at the same kind of speed that would let them work together seamlessly.
10Gbps is now very cheap and 100Gbps is viable at hobby scale. That's Ethernet. I don't know anything about CXL and so on.
we built machines with RDMA that allowed fast one-sided transfers between memories at a decent fraction of local memory bandwidth, and operating systems that ran services to present a unified operating-system interface on top of that.
there is a whole history of distributed operating systems if you're interested