Reasoning is cost apparently our monthly Claude bill has become astronomical for the org. Nearly 3x our saas's cloud spend.
Apparently we are going to get limited access to codex at severely reduced plans.
I have tried some local models such as Kimi, however most are barely functional.
I am very concerned as the expectation of amount of work done is to remain consistent. Ignoring the fact teams have made entire workflows around Claude I am very worried and frustrated.
How can I help my team ease this transition? Are their local models that run well on local machines that only have 16gb ram?
Companies doing well and growing pass on crazy revenue opportunities because the ones they are focused on are even higher roi. They don’t get bogged down in incremental costs.
If they are cost cutting on dev tooling some bright boss is going to realize the real savings is in cutting down on devs. If you are an American dev team your health insurance costs smoke any amount of token spend.
Since my team is mostly remote running LLM on a cluster in the office is not really viable short term.
I was considering having something run locally within out building but the time when something like that would be avaliable is not near term so i am trying to make the best of what i can do.