LCM: Lossless Context Management [pdf]
43 points | 10 hours ago | 4 comments | papers.voltropy.com | HN
ClintEhrlich
6 hours ago
Hi, I'm Clint, one of the co-authors of this paper.

I'd like to quickly summarize what is different about our approach and why it matters.

Our work was inspired by brilliant research done at MIT CSAIL on "Recursive Language Models" (RLMs). One of the controversies has been whether these models are just a formalization of what agents like Claude Code already do vs. whether they bring new capabilities to the table.

By outperforming Claude on the major long-context benchmark, we provide a strong signal that something fundamentally new is happening. (In other words, it's not "just Claude Code" because it demonstrably outperforms Claude Code in the long-context regime.)

Where our contribution, LCM, differs from RLMs is how we handle recursion. RLMs use "symbolic recursion" -- i.e., they have an LLM write a script to recursively call itself in order to manipulate the context, which is stored in a REPL. This provides maximum flexibility... but it often goes wrong, since the LLM may write imperfect scripts.
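For intuition, this is roughly the kind of script an RLM-style system would have the model write; llm() and context below are illustrative stand-ins, not the RLM authors' actual API.

    def llm(prompt: str) -> str:
        # Stand-in for a real model call; swap in an actual API client.
        return f"[summary of {len(prompt)} chars]"

    def recursive_summarize(text: str, max_chars: int = 8000) -> str:
        """Split and summarize recursively until each piece fits in one call."""
        if len(text) <= max_chars:
            return llm("Summarize the following:\n\n" + text)
        mid = len(text) // 2
        left = recursive_summarize(text[:mid], max_chars)
        right = recursive_summarize(text[mid:], max_chars)
        return llm("Combine these partial summaries:\n\n" + left + "\n\n" + right)

    context = "imagine a multi-million-token transcript stored in the REPL here ..."
    answer = recursive_summarize(context)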

LCM attempts to decompose the recursion from RLMs into deterministic primitives so that the control flow can be managed by an engine rather than left to the whims of the LLM. In practice, this means we replace bespoke scripts with two mechanisms: (1) A DAG-based context management system that works like paged virtual memory, except for managing conversations and files; and (2) Operator-level recursion, like "Map" for LLMs, which lets one tool call process thousands of tasks.
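To make the second mechanism concrete, here is a minimal sketch of what an operator-level map could look like (illustrative only, not the paper's actual interface): the engine, not the model, owns the loop and the parallelism.

    from concurrent.futures import ThreadPoolExecutor

    def llm(prompt: str) -> str:
        # Stand-in for a real model call.
        return f"[result for a prompt of {len(prompt)} chars]"

    def llm_map(prompt_template: str, items: list[str], max_workers: int = 8) -> list[str]:
        """Apply one prompt template to every item; results return in input order."""
        def run_one(item: str) -> str:
            return llm(prompt_template.format(item=item))
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            return list(pool.map(run_one, items))

    # One tool call from the model can fan out to thousands of such tasks:
    results = llm_map("List every date mentioned in this document:\n\n{item}",
                      ["document one ...", "document two ..."])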

An analogy we draw in the paper is the evolution from GO-TO statements (of Dijkstra's "Considered Harmful" fame) to structured programming. RLMs are maximally expressive, but all of that power comes with the risk of things going awry. We have built a more mechanistic system, which can provide stronger guarantees when deployed in production with today's models.

Happy to answer any questions! Thanks for taking a look at the paper!

jorl17
6 hours ago
Thank you so much for your work!

I've echoed the sentiment here on HN (and elsewhere) that these kinds of mechanisms seem to be a pathway to extending context longer and longer and longer and I wish I could toy around with this technology right now (can I?). I'm so excited!!

Your work is the shoulders-built-on-shoulders upon which other giants shall keep on building. Thank you so much.

ClintEhrlich
6 hours ago
Thanks for the kind words.

Yes, we think there is a ton of low-hanging fruit from taking lessons from OS/PL theory and applying them to LLM tooling.

This is our first contribution in that direction. There will be more!

ClintEhrlich
6 hours ago
Oh and to be clear YES you can try it!!!

Just bring an API key. :)

github.com/voltropy/volt

vessenes
6 hours ago
This looks super useful! And it's intellectually appealing to think that the LLM will be able to think back precisely, and that we can rely on DAG tooling to reason about and keep track of history (and correct history).

Have you considered making an openclaw plugin/PR for it? I understand you have your own coding CLI tool, but this doesn't look so hard to implement that it couldn't be done elsewhere.

Either way, thanks for sharing this.

ClintEhrlich
6 hours ago
Yes, that is actually the next thing we are shipping!

We have heard from a ton of OpenClaw users that the biggest barrier to them getting everything they want out of their agents is that memory is not a solved problem.

LCM could be a great solution to that. Stay tuned -- will ship it ASAP.

vessenes
5 hours ago
Riffing on this a little, there are a few things that would be useful:

1 - A global namespace for the gateway agent/coordinator would make inspecting the results of subagent tasks much safer and more efficient, and would give the main chat thread all the benefits of precision across compaction boundaries. I could see giving the subagents access to it, or just prompting them fresh and storing their results in the global memory - the second is probably better.

2 - Permissioned memory spaces: stuff that a given subagent should know without giving it global memory access. A gateway could then mark some stuff 'available' as part of prompting.

This would be a super useful set of primitives. From reading the paper, I think you could do this relatively cheaply, maybe with a tagging system for branches/nodes in the DAG. openclaw already keeps some track of what subagents should have access to in the form of skills, but I haven't looked into the actual permissions architecture.
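A sketch of what such tag-based gating on DAG nodes could look like (the node fields and scope names here are made up for illustration, not taken from the paper):

    from dataclasses import dataclass, field

    @dataclass
    class ContextNode:
        node_id: str
        content: str
        tags: set[str] = field(default_factory=set)  # e.g. {"global"} or {"task:refactor"}

    def visible_nodes(nodes: list[ContextNode], granted_scopes: set[str]) -> list[ContextNode]:
        """Return the subset of the DAG a given subagent is allowed to read."""
        return [n for n in nodes if n.tags & granted_scopes]

    dag = [
        ContextNode("n1", "project overview", {"global"}),
        ContextNode("n2", "gateway-only planning notes", {"gateway"}),
        ContextNode("n3", "notes for the refactor task", {"task:refactor"}),
    ]
    # A refactor subagent granted {"global", "task:refactor"} sees n1 and n3 but not n2.
    print([n.node_id for n in visible_nodes(dag, {"global", "task:refactor"})])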

belisarius222
5 hours ago
Did somebody say 'global namespace'? I spent years working on one of those as part of Urbit... In general, I think you're right. Each conversation is an append-only log at the lowest layer, and I see no reason not to expose that fact as a global namespace, as long as permissions are handled gracefully.

Of course getting permissions to work well might be easier said than done, but I like this direction.

ClintEhrlich
5 hours ago
Just passed this on to my co-author who is working on the plug-in. Really appreciate the suggestions!

We will probably ship a fairly basic version to start, but I think there are a lot of cool things that can be added.

vessenes
6 hours ago
Love it. Yes, compaction is a huge pain point in openclaw, and it EATS tokens.

quotemstr
5 hours ago
Cool. I agree (consistent with your GOTO analogy) that imposing structure on the model (or a human) can constrain the search space and lead to better choices given a fixed decision budget.

> deterministic primitives

Are agent-map and LLM-map the only two options you've given the model for recursive invocations? No higher-level, er, reduction operators to augment the map primitives?

belisarius222
5 hours ago
Hi, I'm the other author on this paper. You've asked a good question. I had originally planned on writing an agentic_reduce operator to complement the agentic_map operator, but the more I thought about it, the more I realized I couldn't come up with a use case for it that wasn't contrived. Instead, having the main agent write scripts that perform aggregations on the result of an agentic_map or llm_map call made a lot more sense.

It's quite possible that's wrong. If so, I would write llm_reduce like this: it would spawn a sub-task for every pair of elements in the list, which would call an LLM with a prompt telling it how to combine the two elements into one. The output type of the reduce operation would need to be the same as the input type, just like in normal map/reduce. This allows for a tree of operations to be performed, where the reduction is run log(n) times, resulting in a single value.

That value should probably be loaded into the LCM database by default, rather than putting it directly into the model's context, to protect the invariant that the model should be able to string together arbitrarily long sequences of maps and reduces without filling up its own context.

I don't think this would be hard to write. It would reuse the same database and parallelism machinery that llm_map and agentic_map use.
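A rough sketch of that pairwise, log(n)-depth reduction, with llm() as a stub rather than shipped code:

    def llm(prompt: str) -> str:
        # Stand-in for a real model call.
        return f"[combined result from {len(prompt)} chars]"

    def llm_reduce(items: list[str], combine_instructions: str) -> str:
        """Pairwise tree reduction: log2(n) rounds, output type equals input type."""
        if not items:
            raise ValueError("nothing to reduce")
        level = items
        while len(level) > 1:
            next_level = []
            for i in range(0, len(level) - 1, 2):
                # Each pair could run as a parallel sub-task on the llm_map machinery.
                next_level.append(llm(combine_instructions + "\n\nA:\n" + level[i] + "\n\nB:\n" + level[i + 1]))
            if len(level) % 2 == 1:
                next_level.append(level[-1])  # odd element carries into the next round
            level = next_level
        # The final value would be loaded into the LCM database rather than the model's context.
        return level[0]

    summary = llm_reduce(["chunk 1 ...", "chunk 2 ...", "chunk 3 ..."],
                         "Merge these two partial summaries into one.")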

quotemstr
4 hours ago
Cool! It'll be interesting to follow your work. I've been thinking, as well, about quorum and voting systems that might benefit from some structure. The primitives you've described are great for the "do N things one time each" case, but sometimes I (and the AI) want "do one thing N times: pick the best somehow". (I mean, you can express that with map/reduce over small integers or something, but still: different flavor.) You can even bring public choice theory into it.
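A small sketch of that "do one thing N times, pick the best" flavor, with a stubbed llm() and a simple majority vote standing in for a real judge:

    from collections import Counter

    def llm(prompt: str, seed: int = 0) -> str:
        # Stand-in; a real call would return varied samples per seed/temperature.
        return f"candidate-{seed % 2}"

    def best_of_n(prompt: str, n: int = 5) -> str:
        """Run the same prompt n times, then pick a winner by majority vote."""
        candidates = [llm(prompt, seed=i) for i in range(n)]
        winner, _count = Counter(candidates).most_common(1)[0]
        return winner

    print(best_of_n("Propose a name for the new module.", n=5))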

SafeDusk
1 hour ago
Very cool! Excited to incorporate this into https://toolkami.com, which is built upon RLM. Thanks for the great work!

carshodev
3 hours ago
Seems like this would be useful for subagents as well. You could still allow an agent down the line to inspect the thinking traces/steps of a subagent by creating a mapping of the content, thus keeping it compressed but accessible if requested.

ClintEhrlich
7 minutes ago
Our system uses sub-agents as a core part of its architecture.

That terminology can be confusing, because in other cases (and sometimes in our own architecture, like when executing thousands of operations via MAP) a sub-agent may be a smaller model given less complex individual tasks.

But the core mechanism we use for simulating unlimited context is to allow the main model to spin up instances of itself (sub-agents) with the previously summarized portion of the context expanded into its full, uncompressed state.

Expanding summaries into full text in sub-agents rather than the main thread is a critical part of our architecture, because it prevents the main context window from filling up.
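A loose sketch of that expand-in-sub-agent pattern (names and data structures here are illustrative only, not the actual implementation):

    def llm(prompt: str) -> str:
        # Stand-in for a real model call.
        return "[short answer]"

    # The main thread keeps only a summary plus a reference into the store.
    FULL_TEXT_STORE = {"conv:0-500": "the full, uncompressed transcript of turns 0-500 ..."}
    MAIN_CONTEXT = [{"summary": "Turns 0-500: user set up the repo and chose Postgres.",
                     "ref": "conv:0-500"}]

    def ask_subagent(ref: str, question: str) -> str:
        """Spawn a fresh instance with the referenced region expanded to full text."""
        full_text = FULL_TEXT_STORE[ref]  # expansion happens here, not in the main thread
        return llm(question + "\n\nFull context:\n" + full_text)

    # The main agent only ever sees the summary and the short answer that comes back:
    answer = ask_subagent(MAIN_CONTEXT[0]["ref"], "Which database did the user pick, and why?")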

dworks
2 hours ago
I built an RLM workflow as a coding agent skill: https://github.com/doubleuuser/rlm-workflow

It has two differences:

1. It does not store chat history, reasoning traces, etc., only workflow artifacts (requirements, codebase analysis, implementation plan, etc.). I frankly do not believe those things are relevant.

2. It is significantly simpler and more lightweight, using only markdown files.
