FilterHN

Ask HN: I have 24 core server with 1TB of DDR4 RAM, what should I run?

24 points by pajeets 8 months ago | 26 comments

This is my first time doing colocation, and as an experiment I picked up a server with 1TB of DDR4 RAM and 24 cores (around 2.2 GHz each).

What should I do now? What stuff should I run?

I have a hello world Flask app running, but obviously it's not enough to use the server's full potential.

I'm thinking of running KVM and selling a few VDS to friends or companies.

I also thought of running thousands of Selenium browser tests, but I only do that maybe once a year, which is not enough to fully utilize the server 24/7.

Help! I might have gone overboard with server capacity. I will never have to pay for AWS again: I can literally run every single project, API, and database I want and still have capacity left over.

freedomben | 8 months ago

I would definitely run KVM, set up virt-manager locally, and connect to the server remotely over SSH. Whether or not you sell VPSs to friends, you can use it as a quick scratch pad for spinning up different OSes for testing purposes, both long-lived and short-lived.
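For reference, virt-manager and the libvirt Python bindings reach a remote KVM host through a `qemu+ssh://` connection URI. This is a minimal sketch under assumptions, not freedomben's setup: `build_kvm_uri` and `list_remote_vms` are hypothetical helper names, the user and host are placeholders, and `list_remote_vms` needs the `libvirt-python` package plus working SSH key access to the server.

```python
from typing import List


def build_kvm_uri(user: str, host: str, system: bool = True) -> str:
    """Build a libvirt connection URI that tunnels over SSH.

    'system' targets the privileged qemu:///system daemon on the server;
    'session' targets a per-user daemon instead.
    """
    scope = "system" if system else "session"
    return f"qemu+ssh://{user}@{host}/{scope}"


def list_remote_vms(uri: str) -> List[str]:
    """Return the names of all VMs (running or stopped) on the remote host.

    libvirt is imported lazily so build_kvm_uri works without the bindings.
    """
    import libvirt  # provided by the libvirt-python package

    conn = libvirt.open(uri)
    try:
        return [dom.name() for dom in conn.listAllDomains()]
    finally:
        conn.close()


if __name__ == "__main__":
    # Placeholder credentials; replace with your own user and server address.
    print(build_kvm_uri("admin", "colo.example.com"))
```

The same URI can be passed to `virt-manager --connect` or `virsh -c` on the local machine.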

I have similar hardware, but with 256 GB of RAM instead of a terabyte, and that is what I do. Being able to spin up VMs as needed has been incredibly convenient. I started creating separate VMs for purposes I would normally have just used the same host for, and have really enjoyed it.

I also run about a dozen personal services on there, such as Audiobookshelf, ArchiveBox, Jellyfin, Navidrome, and more. Surprisingly, the ArchiveBox instance uses quite a bit of memory and CPU, so the box does get a fair amount of exercise. I have not looked very closely, but I believe ArchiveBox uses that memory and compute mainly for running headless Chrome. I have a browser extension installed that automatically archives nearly every page I visit, so during busy browsing times especially, I keep that thing running pretty hot.

In the past I set up a self-hosted OpenShift instance on it, spread across six VMs. I loved it, and the only reason I'm no longer running it is that I broke it by messing around with risky, dangerous things that no sane person should ever do, and then didn't want to dedicate the time to rebuilding it. Someday I will recreate it.

Whatever you decide to do with it, this is a really awesome problem to have!

brodouevencode | 8 months ago

Chrome with 12 tabs open

solardev | 8 months ago

Or one tab with JavaScript enabled and uBlock off.

tomcam | 8 months ago

Here at HN we appreciate optimism, but not blatant fantasy.

pajeets | 8 months ago

gonna need quantum computing once you breach into the mid 20s

etcd | 8 months ago

I hear that quantum computing can run all the tabs in parallel.

throw4950sh06 | 8 months ago

All possible tabs exist in an open/not open state and the wave function collapses once you open a tab. Instant internet using your multiverse modem!

rozenmd | 8 months ago

Run a Rails server and a DB, and you probably won't need to worry about scale until you're earning millions a year.

stn8188 | 8 months ago

Here's a random idea: run a freely available electromagnetic simulation agent! My favorite EM solver has an "agent" that can run on a server; you simply point your local GUI to the agent and take advantage of more computational power.

https://www.simberian.com/

BizarroLand | 8 months ago

I have a similar setup, but with 20 cores, only 144 GB of RAM, and a 4070.

I run Proxmox on it, with servers for Pi-hole and networking, AI, Docker & Portainer, and Jellyfin.

I've just about run out of the things I want to do with it and I barely use 10% of its capacity.

cmacleod4 | 8 months ago

You could run hundreds of nodes for the Autonomi peer-to-peer distributed storage network, see https://autonomi.com/ .

austin-cheney | 8 months ago

Simulations of maximum concurrency.

With hardware like that I would research a couple of things:

* Maximum number of servers (HTTP/WS) I could run simultaneously. I am working on a server application to do this right now. Each server should use 1, 2, or 4 ports. You could run HTTP and WS on the same port (a WebSocket connection starts as an HTTP upgrade request), which allows both protocols over a single port. If you want to allow both TLS and insecure connections, it would be 2 ports, or 4 ports if you are isolating HTTP and WS from each other.

* Maximum number of simultaneous sockets. I would test for the maximum number of open sockets connected to a single server instance, and the maximum average number of sockets per server when running the maximum number of servers from the previous point.

* Once you have confidence in both prior points, I would then research the maximum amount of cross-talk. If your servers can talk to each other, they come close to achieving SMP. They could talk to each other over sockets, just as they talk to everything else, but IPC would be even faster.

* Once you have all that, you have sufficient infrastructure in place to investigate more precise performance concerns. For example, I found that in my own personal implementation I could send messages via WebSockets almost 11x faster than I could receive them, but it could be that my implementation is poorly executed. I also found that under ideal conditions my (likely poor) WebSocket implementation was still at least 8x faster than HTTP, and the advantage grew to over 80x at high message frequency and socket concurrency.

* Once you have your performance bottlenecks identified, you can research bottlenecks when transferring data from a large, frequently accessed data source. With those identified, you can train a self-learning AI on the data for concurrency simulations.
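As a starting point for the "maximum number of simultaneous sockets" experiment, here is a minimal sketch using Python's asyncio. It is not austin-cheney's server application: `measure_concurrent_sockets` is a hypothetical helper that opens N client connections to a throwaway local server and reports the peak number of connections the server held open at once. Each connection costs two file descriptors in this single-process test (client side and server side), so on a real run the per-process limit (`ulimit -n`) and kernel settings usually cap the result long before 1TB of RAM does.

```python
import asyncio


async def measure_concurrent_sockets(n_clients: int, host: str = "127.0.0.1") -> int:
    """Open n_clients TCP connections to a local server and return the peak
    number of connections the server had open at the same time."""
    open_now = 0
    peak = 0
    all_connected = asyncio.Event()

    async def handle(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
        nonlocal open_now, peak
        open_now += 1
        peak = max(peak, open_now)
        if open_now >= n_clients:
            all_connected.set()
        await reader.read()  # block until the client closes its end (EOF)
        open_now -= 1
        writer.close()
        await writer.wait_closed()

    # Port 0 lets the OS pick a free port; backlog sized for the burst of connects.
    server = await asyncio.start_server(handle, host, 0, backlog=max(n_clients, 100))
    port = server.sockets[0].getsockname()[1]
    clients = []
    try:
        # Connect one by one; every socket stays open until all are connected.
        for _ in range(n_clients):
            clients.append(await asyncio.open_connection(host, port))
        await asyncio.wait_for(all_connected.wait(), timeout=10)
    finally:
        for _, writer in clients:
            writer.close()
            await writer.wait_closed()
        server.close()
        await server.wait_closed()
    return peak


if __name__ == "__main__":
    print(asyncio.run(measure_concurrent_sockets(200)))
```

Raising `n_clients` step by step until connections start failing gives a rough ceiling for one server instance; the cross-talk and multi-server steps above would build on the same pattern.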

In the end, you can sell your research. Consider that data centers require their own power plants to operate. While the hardware in those data centers is likely assembled efficiently, the server software running on that hardware often is not, and that inefficiency costs millions of dollars a month in electricity alone.

kjok | 8 months ago

I have a proposal. I've created a hypervisor that can launch lightweight, efficient VMs, similar to containers. Workloads in these VMs behave like processes isolated from other VMs, so you can pack and sell 10x more VMs. Unlike with traditional VMs, you don't have to worry about turning them off to free resources. I would love to work with you if you're interested in using it.

ActorNightly | 8 months ago

For research, you can do things like PSO or genetic algorithm optimizations. These benefit more from CPUs than GPUs because there isn't much matrix math involved, especially with complex fitness functions that require sequences.

You could do this with a small Llama model, where the fitness function is basically its ability to generate correct code and detect its own errors, and then adjust the weights based on the optimization algorithm.
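To make the PSO suggestion concrete, here is a minimal global-best particle swarm optimizer in plain Python. It is a sketch under assumptions: the inertia and acceleration constants (`w`, `c1`, `c2`) are common illustrative defaults, the `sphere` test function stands in for a real fitness function, and nothing here touches Llama. The `map_fn` parameter is where the 24 cores come in, since fitness evaluation is the expensive, parallelizable step.

```python
import random
from typing import Callable, List, Sequence, Tuple

Fitness = Callable[[Sequence[float]], float]


def sphere(x: Sequence[float]) -> float:
    """Toy fitness function with its minimum (0.0) at the origin."""
    return sum(v * v for v in x)


def pso(fitness: Fitness, dim: int, n_particles: int = 30, iters: int = 200,
        bounds: Tuple[float, float] = (-5.0, 5.0), w: float = 0.7,
        c1: float = 1.5, c2: float = 1.5, seed: int = 0,
        map_fn: Callable = map) -> Tuple[List[float], float]:
    """Minimize `fitness` with a basic global-best particle swarm."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]               # each particle's best position
    pbest_score = list(map_fn(fitness, pos))  # ...and the score it achieved
    g = min(range(n_particles), key=lambda i: pbest_score[i])
    gbest, gbest_score = pbest[g][:], pbest_score[g]

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])  # pull toward own best
                             + c2 * r2 * (gbest[d] - pos[i][d]))    # pull toward swarm best
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))  # clamp to bounds
        # Score every particle; this is the step worth spreading across cores.
        for i, s in enumerate(map_fn(fitness, pos)):
            if s < pbest_score[i]:
                pbest[i], pbest_score[i] = pos[i][:], s
                if s < gbest_score:
                    gbest, gbest_score = pos[i][:], s
    return gbest, gbest_score
```

For example, `with concurrent.futures.ProcessPoolExecutor() as pool: pso(sphere, dim=10, map_fn=pool.map)` scores particles in parallel; the fitness function must be defined at module level so it can be pickled to worker processes. Parallelism only pays off when each evaluation is expensive, as with the code-generation fitness function described above.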

RGamma | 8 months ago

Maybe donate part of it to the NixOS Hydra build farm? (Not sure how easy that is.) With that much memory, you can compile everything in RAM.

changexd | 8 months ago

My colleague runs Proxmox on hardware like this, and you can do many things with it. My company currently runs Proxmox too, with many VMs set up as Kubernetes nodes. I always appreciate people who set up their own labs; that colleague of mine somehow has an enterprise-grade storage server, NICs, and much more inside his house, just to have fun.

5Qn8mNbc2FNCiVV | 8 months ago

Maybe you can scratch the experiment and downscale? You seem to be very, very, very far from needing something as beefy as that. Unless you got it very cheaply, I'd feel bad having so much money sitting in idle hardware.

pizza | 8 months ago

Fuzzers, caches, graph algorithms, climate models, massive compilations, in-memory filesystems, VMs, Llama 3 405B (will be slow).

maz1b | 8 months ago

I've been thinking about doing the same, seems much more cost effective. How much will you sell the VDS for?

evanjrowley | 8 months ago

Test Rowhammer attacks to gauge how resistant or susceptible your dense memory sticks are compared with other systems.

etcd | 8 months ago

Do some AI research that takes advantage of those characteristics.

brudgers | 8 months ago

Erlang?

yetihehe | 8 months ago

I run a smallish IoT "platform" (3k devices and hubs currently connected, about 15k sensors in total) on a single 16-core, 64 GB RAM server with Erlang, and it uses about 3-5% of the machine. Our stress testing indicated it could easily go up to 250k devices on one instance.

hulitu | 8 months ago

> Ask HN: I have 24 core server with 1TB of DDR4 RAM, what should I run?

Xbill.

dkekenflxlf | 8 months ago

What's the framerate in Q3A? That's the only benchmark.

jiehong | 8 months ago

Some databases?

Make it a kubernetes node?

Make it run long async jobs, like ETL.

workfromspace | 8 months ago

Doom and Far Cry are the two extremes I can think of.

evanjrowley | 8 months ago

Try llama.cpp with the biggest LLM you can find.

pajeets | 8 months ago

need a 3090 at least for that

kkielhofner | 8 months ago

llama.cpp and others can run purely on CPU[0]. Even production grade serving frameworks like vLLM[1].

There are a variety of other LLM inference implementations that can run on CPU as well.

[0] - https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#su...

[1] - https://docs.vllm.ai/en/v0.6.1/getting_started/cpu-installat...

pajeets | 8 months ago

wait this is crazy

what model can i run on 1TB and how many tokens per second?

for instance, Nvidia Nemotron Llama 3.1 quantized, at what speed? ill get a GPU too but not sure how much VRAM I need for the best value for your buck

kkielhofner | 8 months ago

> what model can i run on 1TB

With 1TB of RAM you can run nearly anything available (405B is essentially the largest at the moment). Llama 405B in FP8 precision fits on 8x H100, which is 640GB of VRAM. Quantization is a very deep and involved topic (far too much for an HN comment).
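The 640GB figure follows from simple arithmetic: weight memory is roughly parameter count times bytes per parameter. A back-of-the-envelope sketch, assuming decimal GB and counting weights only (KV cache, activations, and runtime overhead come on top, so real requirements are somewhat higher); `weights_gb` is a hypothetical helper.

```python
def weights_gb(params_billion: float, bits_per_param: float) -> float:
    """Approximate memory for model weights alone, in decimal GB.

    1e9 params * (bits / 8) bytes / 1e9 bytes-per-GB simplifies to
    params_billion * bytes_per_param.
    """
    return params_billion * bits_per_param / 8


if __name__ == "__main__":
    h100x8_gb = 8 * 80  # eight 80 GB H100s
    for label, bits in [("FP16", 16), ("FP8", 8), ("4-bit", 4)]:
        need = weights_gb(405, bits)
        print(f"405B {label}: ~{need:.1f} GB, "
              f"fits 8x H100: {need < h100x8_gb}, fits 1TB RAM: {need < 1000}")
```

By this estimate, 405B in FP8 needs about 405 GB for weights, which leaves headroom within 640GB, while FP16 (about 810 GB) would only fit in the 1TB of system RAM.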

I'm aware it "works", but I don't bother with CPU, GGUF, or even llama.cpp, so I can't really speak to it. They're just not remotely usable for my applications.

> tokens per second

Sloooowwww. With 405B it could very well be seconds per token, but this is where a lot of system factors come in. You can find benchmarks out there, but you'll see things like a very high-spec AMD EPYC bare-metal system with very fast DDR4/5, tons of memory channels, etc. doing low single-digit tokens per second with 70B.
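The "seconds per token" intuition comes from memory bandwidth: generating each token means streaming roughly all of a dense model's weights through the CPU once, so bandwidth divided by model size gives an upper bound on tokens per second. The sketch below assumes DDR4-2400 with 6 memory channels purely as an example; the OP's actual memory speed, channel count, and socket layout aren't stated, and real throughput lands below this ceiling because of compute limits, NUMA effects, and KV-cache reads. Both function names are hypothetical.

```python
def ddr_bandwidth_gbs(mega_transfers_per_s: float, channels: int, bus_bytes: int = 8) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (a DDR channel is 64 bits = 8 bytes wide)."""
    return mega_transfers_per_s * bus_bytes * channels / 1000  # MB/s -> GB/s


def max_tokens_per_s(model_gb: float, bandwidth_gbs: float) -> float:
    """Upper bound for a dense model: every token reads all weights once."""
    return bandwidth_gbs / model_gb


if __name__ == "__main__":
    bw = ddr_bandwidth_gbs(2400, channels=6)  # assumed configuration
    for name, size_gb in [("70B @ 8-bit", 70), ("405B @ 8-bit", 405)]:
        tps = max_tokens_per_s(size_gb, bw)
        print(f"{name}: <= {tps:.2f} tokens/s (>= {1 / tps:.1f} s/token)")
```

With these assumed numbers the ceiling is about 115 GB/s, which works out to under 2 tokens/s for 70B and roughly 3.5 seconds per token for 405B, in line with the figures above.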

> ill get a GPU too but not sure how much VRAM I need for the best value for your buck

Most of my experience is top-end GPU so I can't really speak to this. You may want to pop in at https://www.reddit.com/r/LocalLLaMA/ - there is much more expertise there for this range of hardware (CPU and/or more VRAM limited GPU configs).

dmitrygr | 8 months ago

An electron app (maybe two)

zoezoezoezoe | 8 months ago

Honestly, that is a crazy server to put in a datacenter just for fun lol, and I have no idea what you could do with it. Selling VDS space sounds like a good idea, especially if you can find someone who wants heaps of memory. I'm curious, though, if you don't mind answering a few questions:

- Where did you colocate (i.e. what city), and do you live there?
- How much did it cost to colocate (a ballpark is fine if you don't want to give the full amount)? Do you pay monthly, per U, or something else?
- What costs do you pay (e.g. electricity or bandwidth that you use)?

Thanks, and good luck xd

GiorgioG | 8 months ago

Dokku

pajeets | 8 months ago

so 4 vCPUs per customer + 40 gigs of RAM?
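Quick capacity math for that split, as a sketch: whichever resource runs out first caps the number of tenants. `max_tenants` is a hypothetical helper; treating the machine as 24 vCPUs and 1024 GB is an assumption (with hyperthreading it may expose 48 threads), and the CPU overcommit factor is a policy choice, not a recommendation.

```python
def max_tenants(host_vcpus: int, host_ram_gb: float, vcpus_each: int, ram_each_gb: float,
                cpu_overcommit: float = 1.0, host_reserved_gb: float = 0.0) -> int:
    """How many identical VMs fit before CPU or RAM runs out.

    cpu_overcommit > 1 sells more vCPUs than there are host threads, as many
    VPS providers do; RAM is not overcommitted in this sketch.
    """
    by_cpu = int(host_vcpus * cpu_overcommit // vcpus_each)
    by_ram = int((host_ram_gb - host_reserved_gb) // ram_each_gb)
    return min(by_cpu, by_ram)


if __name__ == "__main__":
    print(max_tenants(24, 1024, 4, 40))                    # 6: CPU-bound
    print(max_tenants(48, 1024, 4, 40))                    # 12: if SMT exposes 48 threads
    print(max_tenants(24, 1024, 4, 40, cpu_overcommit=4))  # 24: RAM (25 slots) nearly binds
```

At 4 vCPUs each, CPU runs out long before RAM (6 tenants versus 25), so 40 GB per customer leaves most of the 1TB idle unless vCPUs are overcommitted or each tenant gets a larger share of RAM.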


wutwutwat