Exploring inference memory saturation effect: H100 vs. MI300x
57 points | 21 days ago | 6 comments | dstack.ai | HN
mufasachan
20 days ago
Great benchmark, very interesting. However, I'm not sure about the extrapolation of the H200 from the Lambda bench. From my understanding, Lambda's benchmark and theirs used different models - Llama 405B and Mistral 123B - with different benchmarks and inference libraries. Since the study focuses on memory-hungry scenarios, I'm really curious why they chose the H100 instead of the H200.
bihan_rana
20 days ago
Yes, it's a different model and backend, and obviously the extrapolation will never be as good as experimental values. But:

1. We only used the multiplier value of 3.4, not the exact throughput from Lambda's experiment.
2. We also used the same input/output sequence lengths as Lambda's experiment.
3. Our extrapolated value is in line with the H200's specs when compared to the MI300X.
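For readers following along, point 1 boils down to a simple scaling step. A minimal sketch, assuming only the 3.4 multiplier from Lambda's experiment; the H100 throughput figure below is a made-up placeholder, not a number from either benchmark:

```python
# Sketch of the H200 extrapolation: scale a measured H100 throughput by the
# ~3.4x H200/H100 multiplier taken from Lambda's experiment.
H200_OVER_H100_MULTIPLIER = 3.4  # from Lambda's experiment, per the comment

def extrapolate_h200_throughput(h100_tokens_per_sec: float) -> float:
    """Estimate H200 throughput (tokens/sec) from a measured H100 value."""
    return h100_tokens_per_sec * H200_OVER_H100_MULTIPLIER

# Placeholder H100 figure (tokens/sec), purely illustrative:
print(extrapolate_h200_throughput(1000.0))  # ~3400 tokens/sec
```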
mufasachan
20 days ago
Thanks for the details!
byefruit
21 days ago
Bit weird to show $ per 1M tokens and not include the actual costs of the systems anywhere.

It would be interesting to know the outright prices for those systems as well as their hourly rental rates at the moment.
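A $/1M-tokens figure presumably falls out of an hourly rate and a sustained throughput. A minimal sketch of that arithmetic; the rate and throughput below are illustrative placeholders, not Lambda's or Hot Aisle's actual prices:

```python
def cost_per_million_tokens(hourly_rate_usd: float, tokens_per_sec: float) -> float:
    """$/1M tokens = hourly rental rate / tokens generated per hour * 1e6."""
    tokens_per_hour = tokens_per_sec * 3600
    return hourly_rate_usd / tokens_per_hour * 1_000_000

# Placeholder numbers: an $8/hr node sustaining 2,000 tokens/sec
print(round(cost_per_million_tokens(8.0, 2000.0), 3))  # ~1.111 $/1M tokens
```

This is also why the outright system price matters: the hourly rate bakes in the provider's amortization, so rental-rate-based $/token can look very different from ownership economics.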

latchkey
21 days ago
cheptsov
21 days ago
Yes, we should have included the price too—thanks for pointing that out. I forgot about it. Appreciate you adding the links here! And yes, Lambda's and Hot Aisle's prices were used in the calculation.
fngarrett
21 days ago
Great read. Have you compared performance with other Llama models (3, 3.2) or have you just done benchmarking with 3.1?

Is there some intuition as to why 3.1 might outperform 3.2 on MI300X?

cheptsov
21 days ago
In this one we only used 3.1 405B FP8. We took a single model to simplify the setup, since we were mostly looking at the memory saturation effect - so essentially we compared inference metrics of the same model across hardware. I suspect comparing 3.1 and 3.2 would be difficult, as they are entirely different models. But open to ideas.
JackYoustra
20 days ago
Glad to see hot aisle again providing support to the AMD research community!
latchkey
20 days ago
honzafafa
21 days ago
Cool!
sylware
20 days ago
Finally somebody tackling the issue of AI sleep.

May need a neural net with a third dimension, namely 3D.

Let's wait for the outcome, but they should have a look at this regime in the human brain.
