Inference cost at scale with napkin math
34 points | 4 days ago | 1 comment | injuly.in | HN
smalltorch
1 hour ago
>This largely depends on whether you own or rent your hardware. At $40,000 per B200, your lifetime cost per user is 40_000/num_users. In the 100% duty cycle case (worst for cost), that's $6k per user. Realistically, serving 300 users per GPU you'll spend a lifetime cost of about $133 per user, plus the datacenter/upkeep bill. If you rent the GPU, the cost is more straightforward. At an hourly rate of $4, your hourly cost per user is 4/num_users. For num_users=300 you get an hourly rate of about $0.013 per user, or $9.36 per month.
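
For anyone who wants to check the quoted napkin math, here is a rough sketch in Python. The function names are made up for illustration, the hourly rate is taken as $4 (which is what the 4/num_users expression and the $0.013 result imply), and a month is assumed to be 30 days of 24 hours. It ignores the datacenter/upkeep bill entirely.

```python
# Napkin math from the quoted passage. All names here are illustrative.
HOURS_PER_MONTH = 30 * 24  # assumption: 30-day month, GPU rented around the clock


def owned_cost_per_user(gpu_price: float, num_users: int) -> float:
    """Lifetime hardware cost per user when you buy the GPU outright."""
    return gpu_price / num_users


def rented_cost_per_user_hour(hourly_rate: float, num_users: int) -> float:
    """Hourly cost per user when you rent the GPU."""
    return hourly_rate / num_users


owned = owned_cost_per_user(40_000, 300)       # about 133.33
hourly = rented_cost_per_user_hour(4.0, 300)   # about 0.0133

# The quoted $9.36 comes from rounding the hourly figure to $0.013 first.
monthly_rounded = round(hourly, 3) * HOURS_PER_MONTH
# Without the intermediate rounding the figure is closer to $9.60.
monthly_exact = hourly * HOURS_PER_MONTH

print(f"owned, lifetime per user: ${owned:.2f}")
print(f"rented, per user per hour: ${hourly:.4f}")
print(f"rented, per user per month: ${monthly_rounded:.2f} (rounded) / ${monthly_exact:.2f} (exact)")
```

The gap between $9.36 and $9.60 is only a rounding artifact, so it doesn't change the argument either way.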

This leads me to believe you can buy a GPU but leave it at a data center?

Do people do this? I don't understand. Or are you equating upkeep bill to electricity on premises?

__s
1 hour ago
smalltorch
1 hour ago
So what's the cost separating them from placing this box on their own premises?

Network throughput?

namibj
50 minutes ago
Plus power and cooling.