Δ-Mem: Efficient Online Memory for Large Language Models
128 points
6 hours ago
| 8 comments
| arxiv.org
| HN
semiquaver
2 hours ago
Hmm, this is a case where HN’s title mangling changed the meaning of the title. Lowercase delta (δ) is used intentionally. I don’t think HN should automatically change the casing of non-ASCII characters.
setopt
1 hour ago
Even for ASCII chars, nomenclature in math and physics is usually case-sensitive.
usernametaken29
4 hours ago
> δ-mem compresses past information into a fixed-size state matrix updated by delta-rule learning

This doesn’t solve the capacity problem of memory. You can cram more into one context window, but you still need to associate what is stored with the input queries. That’s very hard, because slight variations in the input create hugely different activations, so it doesn’t really improve caching. This paper might do a thing or two toward approaching the compression limit for context windows, but there’s a fundamental limit on how much information can go into it. What you really need is contextual search: different events and objects with the same abstractions and semantics should lead to the same response, so you can cache effectively. On this front, the paper does little to improve “memory” in a meaningful way.
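The quoted mechanism can be made concrete with the textbook delta-rule associative memory used in DeltaNet-style linear attention. This is a minimal sketch, not the paper's implementation: it assumes unit-norm keys, a scalar step size `beta`, and a plain d_v x d_k state matrix, and the class and helper names are hypothetical. It also illustrates the capacity point above: a d_k-dimensional key space holds at most d_k mutually orthogonal keys without interference.

```python
import math


class DeltaRuleMemory:
    """Fixed-size associative memory: a d_v x d_k matrix S updated by the delta rule.

    write(k, v):  S <- S + beta * (v - S k) k^T
    read(q):      returns S q
    """

    def __init__(self, d_k: int, d_v: int, beta: float = 1.0):
        self.beta = beta
        # The state never grows: d_v * d_k numbers no matter how much is written.
        self.S = [[0.0] * d_k for _ in range(d_v)]

    def read(self, q):
        return [sum(s * x for s, x in zip(row, q)) for row in self.S]

    def write(self, k, v):
        pred = self.read(k)                          # what memory currently recalls for k
        err = [vi - pi for vi, pi in zip(v, pred)]   # delta-rule prediction error
        for i, row in enumerate(self.S):
            for j, kj in enumerate(k):
                row[j] += self.beta * err[i] * kj    # rank-1 correction along k


def unit(x):
    """Scale x to unit Euclidean norm (the sketch assumes normalized keys)."""
    norm = math.sqrt(sum(t * t for t in x))
    return [t / norm for t in x]


# Four orthogonal keys fit exactly in a 4-dimensional key space...
mem = DeltaRuleMemory(d_k=4, d_v=2)
keys = [unit([1.0 if i == j else 0.0 for j in range(4)]) for i in range(4)]
values = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0]]
for k, v in zip(keys, values):
    mem.write(k, v)

# ...but a fifth key must overlap the others, and writing it disturbs their recall.
extra_key = unit([1.0, 1.0, 0.0, 0.0])
mem.write(extra_key, [5.0, 5.0])
```

After the fifth write, `mem.read(keys[0])` no longer returns `[1.0, 0.0]`, while `keys[2]`, which is orthogonal to the new key, still recalls `[1.0, 1.0]`.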

jsemrau
2 hours ago
I am currently working on deep context query, which uses dynamically generated regex to pull only the relevant context blocks. By using lightweight RegEx pattern matching to detect semantic intent and filter structured context sections accordingly, you avoid the attention degradation that comes from stuffing semantically redundant information into the window.

https://jdsemrau.substack.com/p/tokenmaxxing-and-optimizing-...
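A minimal sketch of regex-based context filtering, assuming context is kept as named text blocks and that the "dynamically generated" regex is simply an alternation of the query's content words. The function names, the stop-word list, and the four-letter cutoff are hypothetical; the linked post may generate its patterns quite differently.

```python
from __future__ import annotations

import re

# Hypothetical stop words; a real filter would need a much better list.
STOPWORDS = {"what", "when", "where", "which", "with", "that", "this", "does", "have", "about"}


def build_pattern(query: str) -> re.Pattern[str] | None:
    """Build one case-insensitive regex from the query's content words."""
    words = {w.lower() for w in re.findall(r"[A-Za-z_]{4,}", query)}
    terms = sorted(words - STOPWORDS)
    if not terms:
        return None
    alternation = "|".join(re.escape(t) for t in terms)
    return re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)


def filter_context(query: str, blocks: dict[str, str]) -> dict[str, str]:
    """Keep only the context blocks whose text matches the generated pattern."""
    pattern = build_pattern(query)
    if pattern is None:
        return dict(blocks)  # no usable terms: fall back to the full context
    return {name: text for name, text in blocks.items() if pattern.search(text)}


blocks = {
    "billing": "Refund policy: refunds are issued within 30 days.",
    "deploy": "Deploy through the staging pipeline first.",
    "style": "Use snake_case for Python names.",
}
selected = filter_context("How do refunds work?", blocks)
```

Here only the `billing` block survives. Exact word matching misses paraphrases ("reimbursement" would not match "refunds"), which is the gap between regex filtering and the semantic matching discussed above.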

structuredPizza
1 hour ago
The more real world use cases we see, the more we see the use of a well thought out regex as a bridge from probabilistic to deterministic.
jandrese
1 hour ago
So instead of a FIFO approach to memory management it instead continually degrades the existing data the more you put in? Details start getting lost or mangled more and more over time?
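Assuming the standard delta-rule update (the paper may add gating or other terms), reading back an older key k_i after one new write (k_t, v_t) gives:

```latex
S_t = S_{t-1} + \beta\,(v_t - S_{t-1} k_t)\,k_t^{\top}
\quad\Longrightarrow\quad
S_t k_i = S_{t-1} k_i + \beta\,(v_t - S_{t-1} k_t)\,(k_t^{\top} k_i)
```

The second term is the interference. Unlike a FIFO buffer, which keeps recent entries intact and drops the oldest outright, every stored association is nudged in proportion to the overlap k_t^T k_i between its key and the new one. Keys orthogonal to k_t are untouched, but a d_k-dimensional key space holds at most d_k mutually orthogonal keys, so past that point older details blur gradually rather than disappearing all at once.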
kordlessagain
2 hours ago
Like Ferricula: https://deepbluedynamics.com/ferricula (site/docs still in progress).
3form
4 hours ago
Interesting points:

- the fixed size of the memory seems like a good idea to overcome the current limitations

- skimming through the thing, I can't find any mention of the cost?

- I would need more time to read it in depth to see if this is legitimate and not just a fancy form of overfitting or training on test data

raverbashing
4 hours ago
Interesting that the headline shows Δ-Mem while the paper uses δ-mem.

Is there a lowercase-to-uppercase conversion going on here?

sillysaurusx
3 hours ago
Correct!
ktallett
5 hours ago
The obvious energy-saving step would be to utilise previous searches by others. Many of the tasks people do are rather similar; it is such a waste of energy to start again each time.

(Obviously ignoring the huge energy saver, which is to observe if you even need to bother doing the task at all.)

405126121
4 hours ago
I had this thought and created https://pushrealm.com, which is essentially a sort of Stack Overflow written by agents.

My theory was that if an agent burns 30 minutes resolving an issue not present in its training data, posting the solution would prevent other agents from re-treading the same thinking steps.

spockz
4 hours ago
So you mean caching? :-)
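The caching idea can be sketched as an exact-match cache keyed on normalized problem text. This is a minimal sketch under stated assumptions: problems arrive as error text, and memory addresses, line numbers, and whitespace are the only details worth normalizing away. `normalize` and `SolutionCache` are hypothetical names, not part of any site's API.

```python
import hashlib
import re


def normalize(problem: str) -> str:
    """Collapse details that differ between otherwise identical reports (hypothetical rules)."""
    text = problem.lower()
    text = re.sub(r"0x[0-9a-f]+", "<addr>", text)   # memory addresses
    text = re.sub(r"line \d+", "line <n>", text)    # line numbers
    return re.sub(r"\s+", " ", text).strip()        # runs of whitespace


class SolutionCache:
    """Maps a normalized problem description to a previously found solution."""

    def __init__(self) -> None:
        self._store = {}  # hex digest -> solution text

    @staticmethod
    def key(problem: str) -> str:
        return hashlib.sha256(normalize(problem).encode("utf-8")).hexdigest()

    def get(self, problem: str):
        return self._store.get(self.key(problem))

    def put(self, problem: str, solution: str) -> None:
        self._store[self.key(problem)] = solution


cache = SolutionCache()
cache.put("Segfault at 0x7ffd12 in parse() line 42", "Check the buffer length before memcpy.")
hit = cache.get("segfault at 0xDEADBEEF in parse()   line 7")
miss = cache.get("Timeout while connecting to the database")
```

Normalization only catches near-duplicates like the two segfault reports above; it does nothing for problems that mean the same thing but are worded differently, which is the harder contextual-search case raised earlier in the thread.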
ktallett
3 hours ago
I see why, but I don't feel this is the solution. Searching through endless LLM responses is not viable. Having useful memories, similar to the human brain, is more important. I sense this is why neuromorphic computing is the next step: it is energy efficient and doesn't retain much of what isn't useful to store.
visarga
2 hours ago
Why not preserve the essential memories in text? Why neuromorphic?
ktallett
2 hours ago
You are better off being able to quickly deduce how to act from memories of previous scenarios than having to attempt every scenario to build a fresh memory of each, which takes a lot of memory and requires exposure to every situation before you can handle it.
duskdozer
4 hours ago
A lot of what I see people using LLMs for would be more cheaply and reliably done by [scripts]. A search engine style suggestion thing like "Have you tried `sed`?" would be beneficial imo
tyre
3 hours ago
In my experience, Claude is more than happy to go to Unix tools rather than write its own. Sometimes it will write a lil python script to solve something, but more often than not it’ll pipe together Unix utilities.

This has the benefit of it knowing all of the arcane flags, especially for formatting output.

DeathArrow
5 hours ago
I see lots of techniques proposed to give LLMs the capacity to recall things. I've even seen a lot of memory plugins for AI coding agents, and I tried some myself.

What I want to see is something that was tested and proved in practice to be genuinely useful, especially for coding agents.

stephantul
4 hours ago
How would you conceptualize recall in this case? Is searching through the current version of your code and possibly git history not enough?
rush86999
4 hours ago
You would think git history would be the first thing an agent looks at, given how many mistakes they make before getting to the correct answer. They don't.

I haven't measured, but documenting bug fixes and architecture seems to help, along with TDD patterns, including integration tests.

I would probably add an instruction to CLAUDE.md to look for all of the above when tackling a new bug.

visarga
2 hours ago
I made a harness that preserves memory for both user messages and task execution. One reason this works is related to judge agents: they can't review information that was never written down, so I track everything in my harness. Based on my evals, the judge agents bring the most benefit. The coding agent can execute a task just as well without all the ceremony, but judging needs something to grasp besides the code. Adding new perspectives helps a lot; it is the most useful intervention. My flow is: the user submits a task, the agent plans, judge agents review the plan, the main agent executes, and the judges review the execution. Tracking execution and judgements might consume more tokens, but it's worth it.
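The plan, review, execute, review flow described above can be sketched as a small pipeline in which every step is appended to a log, so judges always have a written record to review. This is a minimal sketch only: the `plan`, `execute`, and judge callables stand in for model calls, and the class name and step labels are hypothetical; the commenter's actual harness is not shown.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class JudgedHarness:
    plan: Callable[[str], str]                     # task -> plan
    execute: Callable[[str, list[str]], str]       # plan + plan reviews -> result
    judges: list[Callable[[str, str], str]]        # (stage, artifact) -> review
    log: list[tuple[str, str]] = field(default_factory=list)

    def _record(self, step: str, text: str) -> str:
        # Judges can only review what has been written down, so log every step.
        self.log.append((step, text))
        return text

    def run(self, task: str) -> str:
        self._record("task", task)
        plan = self._record("plan", self.plan(task))
        reviews = [self._record("plan_review", judge("plan", plan)) for judge in self.judges]
        result = self._record("execution", self.execute(plan, reviews))
        for judge in self.judges:
            self._record("execution_review", judge("execution", result))
        return result


harness = JudgedHarness(
    plan=lambda task: f"plan for {task}",
    execute=lambda plan, reviews: f"done: {plan} ({len(reviews)} reviews)",
    judges=[lambda stage, text: f"ok {stage}", lambda stage, text: f"nit {stage}"],
)
result = harness.run("fix flaky test")
```

Passing the plan reviews into `execute` is one design choice; the comment doesn't say whether judge feedback is fed back to the executing agent.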
brookst
2 hours ago
My Claude code frequently looks through git history, both when planning and debugging.
cubefox
3 hours ago
How highly papers are voted on Hacker News is usually uncorrelated with their actual importance. It's basically a lottery. More interesting papers regularly go semi-viral on Twitter.
MeteorMarc
2 hours ago
On huggingface it was #3 paper of the day, which is neutral towards your hypothesis.
kingkawn
2 hours ago
What about broad, unsupportable generalizations on Hacker News? How do those rank?
belabartok39
2 hours ago
Did AI generate this paper too?