So yeah, cool, caching all of that... but give it a couple of months and a better technique will come out - or more capable models.
Many years ago, when disk encryption on AWS was not an option, my team and I had to spend 3 months coming up with a way to encrypt the disks, and to do it well, because at the time there was no standard way. It was very difficult, as it required pushing encrypted images (as far as I remember). Soon after we started, AWS introduced standard disk encryption that you could turn on by clicking a button. We wasted 3 months for nothing. We should have waited!
What I've learned from this is that oftentimes it's better to do absolutely nothing.
1. If your agent needs to write a lot of code, it's really hard to beat Claude Code (cc) / Agent SDK. We've tried many approaches and frameworks over the past 2 years (e.g. PydanticAI), but using cc is the first that has felt like magic.
2. Vendor lock-in is a risk, but the bigger risk is having an agent that is less capable than what a user gets out of ChatGPT because you're hand-rolling every aspect of your agent.
3. cc is incredibly self-aware. When you ask cc how to do something in cc, it instantly nails it. If you ask it how to do something in framework xyz, it will take much more effort.
4. Give your agent a computer to use. We use e2b.dev, but Modal is great too. When the agent has a computer, it makes many complex features feel simple.
0 - For context, Definite (https://www.definite.app/) is a data platform with agents to operate it. It's like Heroku for data with a staff of AI data engineers and analysts.
For brownfield work, hard problems, or big, complex codebases, you'll save yourself a lot of pain if you use Codex instead of CC.
I'd steer clear of any LLM abstraction. There are so many companies with open source abstractions offering the panacea of a single interface, and they're crumbling under their own weight from the sheer futility of supporting every permutation of every SDK's evolution - all while trying to build revenue-generating businesses on top of them.
Have you experimented with using a semantic cache on the chain of thought (what we get back from the providers anyway) and sending that to a dumb model for similar queries to "simulate" thinking?
If you have to use some of the client side SDKs, another good idea is to have a proxy where you can also add functionality without having to change the frontend.
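To illustrate the proxy idea: the frontend keeps calling one stable endpoint, and the proxy enriches or rewrites requests before forwarding them upstream, so new functionality lands server-side without a frontend change. This is just a sketch under assumed names - `UPSTREAM`, the `/v1/chat` path, and the `model`/`metadata` fields are all hypothetical.

```python
import json
from urllib import request as urlrequest

UPSTREAM = "https://api.example.com"  # hypothetical upstream endpoint

def transform(payload: dict) -> dict:
    """Enrich the frontend's payload before forwarding.

    The frontend keeps sending the same shape; the proxy is where you
    add defaults, tags, logging hooks, caching, etc.
    """
    out = dict(payload)  # don't mutate the caller's dict
    out.setdefault("model", "default-model")       # assumed field name
    out["metadata"] = {**out.get("metadata", {}), "via": "proxy"}
    return out

def forward(payload: dict, timeout: float = 10.0) -> bytes:
    # Transform, then pass through to the real API.
    req = urlrequest.Request(
        UPSTREAM + "/v1/chat",
        data=json.dumps(transform(payload)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urlrequest.urlopen(req, timeout=timeout) as resp:
        return resp.read()
```

The nice property is that swapping providers or adding a cache later only touches `transform`/`forward`, never the client SDK calls.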
My bet is that agent frameworks and platforms will become more like game engines. You can spin up your own engine, for sure, and it's fun and rewarding. But AAA studios will most likely decide to use a ready-to-go platform with all the batteries included.
What they want is to get things done. The model is simply a means to an end. As long as the task is completed, everything else is secondary.
Caching is unrelated to memory; it's about not doing the same work over and over again due to the distributed nature of state. I wrote a post [1] that goes into this in detail, from first principles, with my current thoughts on the topic.
> Are these things really that forgetful?
No, not really. They can get side-tracked, which is why most agents do some form of in-context reinforcement.
> Why is there a virtual file system?
So that you don't have dead-end tools. If a tool creates or manipulates state (which we represent on a virtual file system), another tool needs to be able to pick up the work.
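A minimal sketch of the no-dead-end-tools idea (all names here are hypothetical, not our actual implementation): tools write their output to a shared virtual file system and return a path, so any other tool can pick the work up by path instead of the result dying inside one tool call.

```python
class VirtualFS:
    """In-memory file system shared by all tools.

    Any state a tool creates lives here, so no tool is a dead end:
    whatever one tool writes, another tool can read and continue from.
    """

    def __init__(self):
        self.files: dict[str, str] = {}

    def write(self, path: str, content: str) -> str:
        self.files[path] = content
        return path  # tools return the path, not the blob, to the model

    def read(self, path: str) -> str:
        return self.files[path]

    def ls(self, prefix: str = "") -> list[str]:
        return sorted(p for p in self.files if p.startswith(prefix))

# Tool A produces state on the VFS; tool B picks it up by path.
def run_query_tool(vfs: VirtualFS, sql: str) -> str:
    # Hypothetical query tool: stores its result instead of dead-ending.
    return vfs.write("/results/query_1.csv", "id,total\n1,42\n")

def chart_tool(vfs: VirtualFS, path: str) -> str:
    # Picks up whatever the previous tool left at `path`.
    rows = vfs.read(path).splitlines()
    return vfs.write("/charts/chart_1.txt", f"chart of {len(rows) - 1} rows")

vfs = VirtualFS()
result_path = run_query_tool(vfs, "SELECT ...")
chart_path = chart_tool(vfs, result_path)
```

The agent only ever sees paths, and `ls` lets it (or any tool) discover what state already exists.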
> Why can't the agent just know where the data is?
And where would that data be?