Shameless plug: we've been working on exactly this at Mem0, tackling the long-term memory problem for LLMs. GitHub: https://github.com/mem0ai/mem0
I checked your documentation, and the only way I can find to run mem0 is with a hosted model. You can use the OpenAI API, which many local backends support, but I don't see a way to point it at localhost. Unless I'm missing something, you'd need an intermediary service that intercepts the OpenAI API calls and reroutes them to a local backend.
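For what it's worth, the intercept-and-reroute step might not even need a separate proxy: if mem0 just instantiates the stock OpenAI Python client, the SDK's own base-URL override should be enough to point it at any OpenAI-compatible local server (Ollama, LM Studio, llama.cpp server, etc.). A rough sketch, assuming mem0 doesn't hard-code the base URL (I haven't verified that):

    # Assumes mem0 builds its client with the standard openai SDK,
    # which reads these environment variables by default.
    import os
    os.environ["OPENAI_BASE_URL"] = "http://localhost:11434/v1"  # e.g. Ollama's OpenAI-compatible endpoint
    os.environ["OPENAI_API_KEY"] = "not-needed-locally"          # SDK still wants a value

    from mem0 import Memory  # import after setting the env vars

    m = Memory()
    m.add("I prefer dark roast coffee", user_id="alice")
    print(m.search("what coffee does alice like?", user_id="alice"))

If mem0 constructs its client with its own base_url internally, this won't work, and you're back to a reverse proxy that rewrites api.openai.com to localhost.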
Instead of long-term memory, I'd be happy if it had short-term reliability. I've lost count of the number of times this week that Claude failed to process a prompt because it was down.
I've noticed a bug where, on mobile, new messages in long conversations time out because of processing time. In reality the prompt is sent and answered; the response just doesn't show up until you leave and return to the conversation.
Personally I use LangChain/Python for this; that way any new AI feature I build works across ALL LLMs, and my app just lets the end user pick the LLM they want to run it on. Every feature I have works on every LLM.
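Roughly the pattern I mean (a trimmed-down sketch; the provider and model names are just placeholders for whatever the user picks in the app):

    # One feature, any backend: the user's choice only decides which chat
    # model class gets instantiated; everything downstream sees the same
    # LangChain interface.
    from langchain_openai import ChatOpenAI
    from langchain_anthropic import ChatAnthropic
    from langchain_ollama import ChatOllama

    def make_llm(provider: str, model: str):
        if provider == "openai":
            return ChatOpenAI(model=model)
        if provider == "anthropic":
            return ChatAnthropic(model=model)
        if provider == "ollama":
            return ChatOllama(model=model)  # fully local
        raise ValueError(f"unknown provider: {provider}")

    # Every feature is written once against .invoke() and runs on all of them.
    llm = make_llm("ollama", "llama3")
    print(llm.invoke("Summarize why provider-agnostic code is useful.").content)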
Doubly baffling since the underlying project does support LLMs and this is clearly just a showcase piece.
And so now your interpretation of things is that I misinterpreted his misinterpretation. Great work and thanks for your helpful insights.