FilterHN

Context Sculpting

20 points

by perceptronblues

21 hours ago

| past

| 5 comments

| perceptiontheory.bearblog.dev

| HN

▲

theowaway213456

19 hours ago

[-]

Not a single mention of prompt caching in this article, which is a massive benefit of append-only context.

▲

unholiness

18 hours ago

[-]

If it were, I can in theory see situations where improving content cleanliness is worth blowing away the KV cache.

But I absolutely can't see how feeding the entire context into a more expensive model multiple times per task, just to propose context edits that might indirectly help, could ever be worthwhile.

▲

jauntywundrkind

17 hours ago

[-]

Cost wise yes, but in terms of getting the correct best work done? Meh, not helpful!

I think more what's missing here is the comparison of different tries, from the same head. And there prompt caching does help!

▲

JSR_FDED

20 hours ago

[-]

All this mucking about with harnesses and context is really just Markdown engineering.

▲

0gs

20 hours ago

[-]

i definitely considered something like this for the local-first harness i made ... i just don't think most people have the RAM to be able to run two good models yet. maybe i'm wrong though. but i also think a single "agent" can compartmentalize itself into subdivisions better than we imagine (i.e., much much better than any single human can). i ended up creating a broker, though, so at least the tool calls don't eat up as much context. and the auto-reset thing is definitely legit.

▲

andai

19 hours ago

[-]

See also: agent harness in 50 lines (based on mini-swe-agent).

https://minimal-agent.com/

I followed this tutorial earlier today and I'm having a lot of fun with it.

https://gist.github.com/a-n-d-a-i/cb5e929b4c87b8d185760d0264...

I added a 2nd while loop so that it takes user input. And vendored my tiny llm lib (so it's 150 lines now, and dependency free :)

---

As for context-sculpting, the economics are different when not touching the context gives you the >98% discount everyone's doing now. (Although it might be worth fiddling with the suffix... not sure yet!)

e.g. this issue: "ToolSearch saves ~15K tokens per request in prompt size, but at the cost of breaking prefix-based caching for models like DeepSeek that rely on stable prefixes. For heavy users of DeepSeek through OpenRouter, the savings from smaller prompts are dwarfed by the increased cost from cache misses."

https://github.com/QwenLM/qwen-code/discussions/4065

▲

saint-evan

7 hours ago

[-]

This is really interesting 'vibe research'. I also really like the term as well, it points to how much more easy exploration has become with these tools. Anyway, this idea of context sculpting is something I’ve been circling for a while too, especially since the edit button appeared in chat UIs. The takeaway is similar to yours but leans closer to something like user-side auto-compaction.

What tends to happen is that I’ll reach a junction with Claude where I’m reading its output and starting to manually make changes yeah?. The output is usually already very good, close enough to what I want that it's way more efficient for both of us if I just carry it the rest of the way myself. Or I 'ruin' the clean flow by veering off into questions, clarifications, or broader changes. After a while of this back-and-forth review process, after definitely attaining an 'improvement checkpoint', I scroll back up to what I think of as a 'conversational context checkpoint' and then summarise all the changes we’ve converged on as succinctly as possible including the 'why' (verrry important to always tell them the why). I edit that checkpoint message, and that becomes a new branch of the conversation starting from there. All the noise, all the intermediate negotiation, disappears, and what remains is a compressed version of only what I want the model to carry forward.

Sometimes, if the changes are too extensive to easily summarise, I go even further back, one message above where Claude started generating files. I edit that instead, rebuild the context with only what matters, and then present the corrected output as if I authored it myself sometimes even saying something like, 'So Claude I tries to implement this myself, what do you think? Does it match what we agreed on? Do you disagree with anything or have questions or see issues?'. It works surprisingly well up to even catching new subtler problems in that improved code. It’s very close to your idea of context sculpting, just executed through manual, human-intuited branching rather than model-side compaction.

The interesting constraint I want to point out is judgement. The model doesn’t really have it in the human sense. Knowing what is important is exactly the hard part. Prompting it into the right shape figuring that out for itself is almost like an art form, whereas for the human there’s already an intuitive sense of... salience? running underneath everything. That's the sauce really. You, the human, are the most efficient and effective OUTER_MODEL

Another difference is structural. In this workflow, you are effectively editing your own message and then forking the conversation from that point downwards. It behaves more like a git branch than a linear chat.

I’ve never fully bought into agents either, at least not in the sense of something that meaningfully replaces the human in the loop. I don't want to either 'cause what then? Even if agents work, the question becomes who maintains the code, who understands it deeply enough to extend it without decay. So even this manual technique still depends on full human involvement. Each prompt is modular, structured, information dense, and intentionally non-open-ended. That non-open-endedness is like avoiding a circular dependency where groups of related contexts are littered all over and yu can't pick a proper checkpoint for the chunk of work you're currently doing. This again assumes you're solving problems in logical bite-sizes anyway.

But I consider prompting as less like conversation and more like maintaining modular, versioned intent. It should feel like a sciency discipline rather than purely improvisation Al. Steer it, don't let it steer and never append either; wipe the slate with an edit or new conversation but an edit keeps more of the nuances of that specific model instance. It's very interesting research you've done here. Well done to yu and codex!

EDIT: typo!