How can I do my research as a GPU poor?
23 points | 1 month ago | 10 comments | HN
I need to train 70B-parameter LLMs for my research on world simulation and self-improving systems. Right now I can only train small models on my 8GB 3050, or for a couple of dollars on the cloud, but I lack the resources to train better, faster models. What's your advice?
protocolture
1 month ago
[-]
Have been looking at this myself and it seems the advice isn't good.

It's basically Cost/Speed/Size: pick 2, or maybe even 1.

Some people have been able to run large LLMs on older, slower CUDA GPUs. A lot of them are truly ancient and have found their way back to eBay simply due to market conditions. They aren't good, but they work for a lot of use cases.

There are a couple of janky implementations that run on AMD instead, but reviews have been so mixed that I decided not to even test them. Ditto multi-GPU setups. I thought that having access to sixteen 8GB AMD cards from old mining rigs would set me in good stead, but apparently it benches roughly the same as just using a server with heaps of RAM because of the way the jobs are split up.

The cloud services seem to be the best option at the moment. But if you're going to spend 1000 bucks rather than 100, it might be worth just forking out for the card.

Also honestly, hoping that someone else has a better idea in this thread because it will be useful to me too.

reply
gperkins978
1 month ago
[-]
A100s are cheap. Are those bad now?
reply
protocolture
1 month ago
[-]
Define cheap?

I'm seeing an average cost of $15k+ on feebay.

I think anyone with $15k could put together a rig with enough VRAM to handle a decent model.

The question is more focused on a budget that seems sub-$2k.

reply
luussta
1 month ago
[-]
thanks!
reply
worstspotgain
1 month ago
[-]
I'd say the first step is to rule LoRA in or out. If LoRA is an option, it buys you more than just the rig savings (rough sketch after the list):

- You can deploy multiple specialized LoRAs for different tasks

- It massively reduces your train-test latency

- You get upstream LLM updates for "free"; maybe you can even add the training to your CI
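
A minimal, untested sketch of what ruling LoRA in or out looks like in practice, using Hugging Face transformers + peft; the base model name is just a placeholder, pick whatever fits your VRAM:

    # LoRA sketch: freeze the base model, train small low-rank adapters on top.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import LoraConfig, get_peft_model

    base = "meta-llama/Llama-3.1-8B"  # placeholder; use a model that fits your card
    tokenizer = AutoTokenizer.from_pretrained(base)
    model = AutoModelForCausalLM.from_pretrained(
        base, torch_dtype=torch.float16, device_map="auto"
    )

    lora_cfg = LoraConfig(
        r=16,                                 # rank of the low-rank update matrices
        lora_alpha=32,                        # scaling factor for the adapters
        target_modules=["q_proj", "v_proj"],  # attention projections only
        lora_dropout=0.05,
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora_cfg)
    model.print_trainable_parameters()        # typically well under 1% of the weights

    # ...then run your usual Trainer / training loop; only the adapters get gradients.

Only the adapter weights train, so the gradients and optimizer state stay tiny compared to full fine-tuning, which is where most of the VRAM savings come from.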

reply
luussta
1 month ago
[-]
thanks!
reply
jebarker
1 month ago
[-]
Is there a different angle on the research you can take that doesn't require training models of that size?

When choosing research problems, it's important to follow not only what's interesting but also what's feasible given your resources. I frequently shelve research ideas because they're not feasible given my skills, time, data, resources, etc.

reply
RateMyPE
1 month ago
[-]
AWS, Azure, and GCP have programs for startups and for researchers that give free credits on their respective platforms. Try applying for those.

They usually give between $1000 and $5000 worth of credits, and they may have other requirements, like being enrolled in college, but you should check each program to find out more.

reply
_davide_
1 month ago
[-]
I have a couple of M40s with 24GB on a desktop computer; I had to tape the PCIe connector to "scale it down to 1x". It's OK for inference and playing around with small training runs, but I can barely run inference on a quantized 70B model. Training anything bigger than 3B parameters on a human timescale is impossible. Either you scale it down or ask for a sponsor. It's frustrating, because IT has always been approachable, until now...
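
For reference, a rough, untested sketch of what squeezing a quantized 70B onto a 24GB card looks like, assuming transformers + bitsandbytes + accelerate are installed, with a placeholder model id:

    # 4-bit quantized inference; whatever doesn't fit on the GPU spills to CPU RAM.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    model_id = "meta-llama/Llama-2-70b-hf"  # placeholder 70B checkpoint
    bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)

    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=bnb,
        device_map="auto",  # offloads the layers that don't fit onto CPU -- slow
    )
    inputs = tok("Hello", return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=32)
    print(tok.decode(out[0], skip_special_tokens=True))

Even in 4-bit, a 70B is roughly 35GB of weights, so most of it ends up offloaded, which is why generation crawls.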
reply
8organicbits
1 month ago
[-]
Are you part of a university? Do they have resources or funding you can access?
reply
luussta
1 month ago
[-]
I'm not part of any university
reply
RecycledEle
1 month ago
[-]
Look into buying a used Dell C410x GPU chassis (they're $200 on my local Craigslist) and a Dell R720. Then add some cheap GPUs.

For a few thousand dollars you might get some processing power.

But you will never be able to pay your electric bill.

reply
thijson
1 month ago
[-]
Is there a cheap GPU you'd suggest? I was looking at NVIDIA P40s, as they have 24 GB of RAM and cost a few hundred dollars.
reply
gperkins978
1 month ago
[-]
I was going to recommend an A100, but they are not available cheaply anymore. I set someone up with two of them that I managed to score used from Amazon for like $400 each. They had 40GB of RAM, but it was HBM and super fast. There were mining cards based on these as well, but I have no idea what was stripped down to make them cheaper. The only issue is that you need to add heatsinks if they are not going in a server, and in a rack-mounted server there needs to be proper cooling (big dog fans).

But I just looked again, and A100s are not around at reasonable prices anymore. I cannot even find the old mining equivalents (they used to be everywhere, and CHEAP). Perhaps many people are building similar systems now. After the Ethereum merge it was a great time to build.

reply
NBJack
1 month ago
[-]
Yes, just ensure you can keep them cool. I have an old M40 whose 24GB of VRAM opened up my options, installed in a traditional case with a 3D-printed cooler adapter + fan. While it isn't always fun getting older cards to work, it is certainly viable (and scalable if needed).
reply
luussta
1 month ago
[-]
thanks!
reply
muzani
1 month ago
[-]
reply
dragonwriter
1 month ago
[-]
That project is for distributed inference; OP is looking for training solutions, not inference.
reply
Log_out_
1 month ago
[-]
Do an internship in Nvidia driver development?
reply
newsoul
1 month ago
[-]
Choose Your Weapon: Survival Strategies for Depressed AI Academics

https://arxiv.org/abs/2304.06035

reply