Compute, bytes of RAM used, bytes in the model, bytes accessed per iteration, bytes of training data.
You can trade the balance if you can find another way to do things; extreme quantisation is but one direction to try. KANs were aiming for more compute and fewer parameters. Recent optimisation projects have been pushing at these various properties. Sometimes gains in one come at the cost of another, but that needn't always be the case.
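As a rough back-of-envelope sketch of how quantisation moves just the "bytes in model" number (the 7B parameter count here is purely illustrative, not any specific model):

```python
# Bytes in model at different quantisation levels.
# PARAMS is an assumed, illustrative figure.
PARAMS = 7e9

for name, bits in [("fp32", 32), ("fp16", 16), ("int8", 8), ("int4", 4)]:
    gigabytes = PARAMS * bits / 8 / 1e9  # bits -> bytes -> GB
    print(f"{name}: {gigabytes:.1f} GB")
```

So going from fp16 to 4-bit cuts the model from ~14 GB to ~3.5 GB, but that says nothing about compute, memory bandwidth per iteration, or quality, which is exactly where the trade-offs show up.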
Not many people today would settle for models comparable to what was SOTA two years ago.
To run models locally and get results as good as the models running in data centers, we need both efficiency gains and for AI improvement to hit a wall.
Neither of those two conditions seems likely to hold in the near future.