Sem: New primitive for code understanding – not LSPs, but entities on top of Git
69 points | 7 hours ago | 10 comments | ataraxy-labs.github.io
andai
4 hours ago

  $ sem impact authenticateUser

  ⊕ function authenticateUser (src/auth/login.ts:26)

    → depends on:    db.findUser, rateLimiter.check
    ← used by:       loginRoute, authMiddleware
    ! 42 entities transitively affected
    ᛋ 7 tests affected
Okay, that is pretty cool. I appreciate this information as a human, too.

I got about halfway through reinventing something like this last year (minus the git part). I was trying to make a graph of dependencies in the codebase. (I actually got pretty far with a regex!)
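The two arrows in that output are one edge list read in both directions. "Depends on" is the forward edges, and "used by" is the same edges inverted. Below is a minimal sketch, assuming a hypothetical hand-written edge list that reuses the entity names from the sample output. sem derives its graph from parsed source; this does not.

```python
from collections import defaultdict

# Hypothetical "depends on" edges, reusing the names in the sample output:
# each key references every entity in its list.
DEPENDS_ON: dict[str, list[str]] = {
    "authenticateUser": ["db.findUser", "rateLimiter.check"],
    "loginRoute": ["authenticateUser"],
    "authMiddleware": ["authenticateUser"],
}


def build_used_by(depends_on: dict[str, list[str]]) -> dict[str, set[str]]:
    """Invert the edge list so each entity maps to the entities that use it."""
    used_by: dict[str, set[str]] = defaultdict(set)
    for caller, callees in depends_on.items():
        for callee in callees:
            used_by[callee].add(caller)
    return dict(used_by)


used_by = build_used_by(DEPENDS_ON)
print("depends on:", DEPENDS_ON["authenticateUser"])
print("used by:   ", sorted(used_by["authenticateUser"]))
# depends on: ['db.findUser', 'rateLimiter.check']
# used by:    ['authMiddleware', 'loginRoute']
```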

Scaevolus
2 hours ago
Something like https://www.kythe.io/?
rohanucla
4 hours ago
Ha, the regex approach is honestly how a lot of people start with this problem, and you can get surprisingly far with it until you hit the edge cases around aliased imports, re-exports, and nested scopes, where things start falling apart. That's basically why we went with tree-sitter under the hood: it gives you the actual parse tree, so you don't have to keep patching regex patterns for every new language construct.
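The sketch below makes the aliasing problem concrete. It shows the regex approach for TypeScript-style named imports, written in Python. The pattern and the `imported_names` helper are hypothetical. It handles `as` aliases explicitly but still silently misses the re-export, which is the kind of gap a real parse tree closes.

```python
import re

# Hypothetical pattern for: import { a, b as c } from "./module"
NAMED_IMPORT = re.compile(r"""import\s*\{([^}]*)\}\s*from\s*['"]([^'"]+)['"]""")


def imported_names(source: str) -> dict[str, tuple[str, str]]:
    """Map each local name to (module, original exported name)."""
    result: dict[str, tuple[str, str]] = {}
    for names, module in NAMED_IMPORT.findall(source):
        for part in names.split(","):
            part = part.strip()
            if not part:
                continue
            # "foo as bar" binds the local name bar to the export foo.
            original, _, alias = part.partition(" as ")
            result[(alias or original).strip()] = (module, original.strip())
    return result


src = '''
import { authenticateUser as auth, hashPassword } from "./auth/login";
export { authenticateUser } from "./auth/login";  // re-export: not matched
'''
print(imported_names(src))
```

Every construct like the re-export on the last line needs another pattern. That is the patching treadmill described above.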
jawns
3 hours ago
The "Try it. 10 seconds." section at the bottom of the page hijacks an existing tool (git diff) and installs a pre-commit hook.

But there are no instructions for how to reverse those actions if you don't like the tool. Feels a little user-hostile to me.

rohanucla
3 hours ago
I'm sorry, I should have put up a warning there. You can run sem unsetup to reverse it; if you go to the GitHub repo, you'll find more about how the reversal works.
bendjejdjdh
2 hours ago
Tone-deaf comment. "Read the docs to undo it" is user-hostile.
opem
1 hour ago
OP literally wrote `sem unsetup` in their comment, so I don't see what is "tone deaf" about it!
Brian_K_White
1 hour ago
"Ah yeah, you're right. I apologize for that. Here is what to do. I'll update the page."

What an asshole! Plus the uninstall step was a completely inconsiderate, single two-word command. Outrageous.

I can't even think of a better possible response.

myko
1 hour ago
IIUC they answered the question directly and then told them where to find further answers. Didn't seem tone-deaf at all.
hankbond
5 hours ago
I am interested in subtle ways in which we can change how we write software to get better outcomes out of harnesses (model + tools + skills). I'm imagining that use of Sem will be more effective on code written in some shapes than others.

Can you describe what those shapes might be, beyond just breaking code up into smaller functions?

An example of this: models tend to create unit tests that are mostly just mocks plus reimplementations of the imperative code in the functions they test. You could force behavioral testing by only allowing test-creation agents to access the function docstring, name/args/types, branch statements, and log events. That could potentially avoid these classes of weak tests. But it would mean your code has to be optimized to provide signal via those elements.

This is just an example; I'm not sure it would actually work.

hankbond
5 hours ago
Also, I keep seeing solutions in this space that do inheritance and call-stack dependency linkage, but I haven't seen the same level of exploration into data lifetime dependency. Not lifetime in the way it exists in Rust (to my knowledge), but including when you copy data and transform that copy. The motivation is "if I change this variable, enumerate all the areas that change would propagate to". The idea is similar: evaluate the blast radius of modifications. Ideally, something like this could make refactoring more token-efficient and consistent as well.

I don't know if you can reliably do that with static analysis, though. I would be interested in some sort of debugger-attachment-like process that does a code-coverage-style evaluation. If you can't tell, this is at least on the edge of (if not past) my depth of expertise.
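The "debug attachment" idea can be prototyped in Python with the standard `sys.settrace` hook. It reports each Python-level function call while a workload runs. This is a minimal sketch; `traced_calls` and the traced functions are hypothetical. It records which functions executed (coverage-style), not how data moved between them.

```python
import sys
from types import FrameType
from typing import Any, Callable


def traced_calls(workload: Callable[[], Any]) -> set[str]:
    """Run workload() and return the names of the Python functions it called."""
    seen: set[str] = set()

    def tracer(frame: FrameType, event: str, arg: Any):
        if event == "call":
            seen.add(frame.f_code.co_name)
        return None  # no per-line tracing inside each frame

    sys.settrace(tracer)
    try:
        workload()
    finally:
        sys.settrace(None)  # always detach, even if the workload raises
    return seen


# Hypothetical code under observation.
def normalize(x: int) -> int:
    return x * 2


def store(x: int) -> None:
    pass


def handler(x: int) -> None:
    store(normalize(x))


print(sorted(traced_calls(lambda: handler(3))))
```

A runtime approach like this only sees the paths the workload actually exercises. That is the trade-off against static analysis.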

rohanucla
4 hours ago
This is a really interesting direction. You're essentially talking about data-flow or taint analysis, where you track how a value propagates through copies and transformations rather than just following call edges. Honestly, pure static analysis gets you partway there. But it hits real limits once you run into dynamic dispatch, runtime branching, or serialization boundaries, where data gets written somewhere and read back in a completely different part of the codebase.

We're on the structural side right now with call graphs and dependency edges, but a hybrid approach that combines the static graph with runtime instrumentation to fill in the gaps is definitely something I'd love to explore. Thanks for the feedback.
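This toy example shows the difference from following call edges. Treat each assignment as an edge from the variables read to the variable written, then propagate a "changed" mark until nothing new is reached. It is a minimal sketch over a hypothetical straight-line program. Real taint analysis also has to handle aliasing, branches, and the serialization boundaries mentioned above.

```python
# Hypothetical straight-line program as (target, sources) assignments:
#   config  = load()
#   timeout = config.timeout
#   retry   = timeout * 2
#   backup  = copy(config)
#   name    = user.name
ASSIGNMENTS: list[tuple[str, set[str]]] = [
    ("config", set()),
    ("timeout", {"config"}),
    ("retry", {"timeout"}),
    ("backup", {"config"}),
    ("name", {"user"}),
]


def propagate(changed: str, assignments: list[tuple[str, set[str]]]) -> set[str]:
    """Return every variable whose value may be derived from `changed`."""
    tainted = {changed}
    updated = True
    while updated:  # iterate to a fixed point so statement order doesn't matter
        updated = False
        for target, sources in assignments:
            if target not in tainted and sources & tainted:
                tainted.add(target)
                updated = True
    return tainted


print(sorted(propagate("config", ASSIGNMENTS)))
# ['backup', 'config', 'retry', 'timeout']
```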

hankbond
1 hour ago
https://en.wikipedia.org/wiki/Taint_checking

I'm sorry for distracting from your engaging and thoughtful reply but I can't help but giggle at the name of this concept.

rohanucla
4 hours ago
What I've been more interested in lately is the field of structural intelligence as a whole.

Things break with LLMs because our infra was always designed for analyzing lines (tools like grep, fuzzy matching) and for working on fairly small sections of code. LLMs struggle when they have to analyze different parts of a codebase. Either they get too much context, where you're throwing whole files at them, or too little, where they only see the function in isolation. Then they have no real understanding of how the pieces actually connect to each other.

That's really the gap sem is trying to fill. With sem impact, you can give an agent the precise blast radius of a change instead of guessing which files matter. And sem diff --patch lets you enforce that a change only touches specific functions and reject anything that bleeds outside that boundary, something that's really hard to do with line-level diffs.
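Here is a rough sketch of that kind of boundary check. It assumes Python source and uses the stdlib `ast` module as a stand-in for tree-sitter. The helper names are hypothetical, and this is not sem's implementation or its `--patch` format. It compares top-level functions between two versions and reports any changed function outside an allowed set.

```python
import ast


def function_bodies(source: str) -> dict[str, str]:
    """Map each top-level function name to a layout-independent dump of its AST."""
    tree = ast.parse(source)
    return {
        node.name: ast.dump(node)  # excludes line/column info by default
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }


def changed_functions(before: str, after: str) -> set[str]:
    """Names of functions added, removed, or modified between two versions."""
    old, new = function_bodies(before), function_bodies(after)
    return {name for name in old.keys() | new.keys() if old.get(name) != new.get(name)}


def check_boundary(before: str, after: str, allowed: set[str]) -> set[str]:
    """Return the functions that changed outside `allowed` (empty means OK)."""
    return changed_functions(before, after) - allowed


BEFORE = '''
def parse(x):
    return x.strip()

def render(x):
    return f"<p>{x}</p>"
'''
AFTER = '''
def parse(x):
    return x.strip().lower()

def render(x):
    return f"<div>{x}</div>"
'''

violations = check_boundary(BEFORE, AFTER, allowed={"parse"})
print(violations or "ok")  # {'render'}
```

The check keys on entity names rather than line ranges. Because of that, moving a function within the file does not count as a change, but editing its body does.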

Your testing idea is actually closer than you might think. sem already extracts entity signatures, dependencies, and call graphs, so you could build a harness that gives the test-writing agent only the function signature with its dependency graph and behavioral contract, while withholding the implementation entirely. That would force the agent toward behavioral tests because it literally can't see the internals to mock them. I haven't built this harness myself yet but sem graph and sem inspect expose everything you'd need.
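For Python code, the "signature plus contract, no implementation" view can be approximated with the standard `inspect` module. This is a minimal sketch; `contract_view` and `apply_discount` are hypothetical. It is not how sem graph or sem inspect expose their data.

```python
import inspect


def contract_view(func) -> str:
    """Render a function's name, signature and docstring, but not its body."""
    doc = inspect.getdoc(func) or "(no docstring)"
    return f"def {func.__name__}{inspect.signature(func)}:\n    \"\"\"{doc}\"\"\""


# Hypothetical function under test; only its contract is shown to the agent.
def apply_discount(price: float, percent: float = 10.0) -> float:
    """Return price reduced by percent; raises ValueError if percent > 100."""
    if percent > 100:
        raise ValueError("percent must be <= 100")
    return round(price * (1 - percent / 100), 2)


print(contract_view(apply_discount))
```

A test-writing agent given only this string can see the documented behavior (including the `ValueError` case) but has no implementation details to mirror in mocks.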

The general principle is that sem gives you a structural map of the codebase to both constrain and validate what the model produces, rather than treating code as flat text and hoping the model figures out the relationships on its own.

Another use case is finding dead code in the codebase.
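Dead-code detection falls out of the same graph: anything not reachable from an entry point is a candidate. This is a minimal sketch over a hypothetical call graph. You would have to supply the set of entry points (routes, exported APIs, tests) yourself. Dynamic calls can also make live code look dead.

```python
from collections import deque

# Hypothetical call graph: caller -> callees.
CALLS: dict[str, set[str]] = {
    "main": {"loadConfig", "serve"},
    "serve": {"handleRequest"},
    "handleRequest": {"authenticateUser"},
    "authenticateUser": set(),
    "loadConfig": set(),
    "legacyExport": {"formatCsv"},
    "formatCsv": set(),
}


def unreachable(calls: dict[str, set[str]], entry_points: set[str]) -> set[str]:
    """Entities never reached by following call edges from the entry points."""
    seen = set(entry_points)
    queue = deque(entry_points)
    while queue:  # breadth-first walk over forward (call) edges
        for callee in calls.get(queue.popleft(), set()):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return set(calls) - seen


print(sorted(unreachable(CALLS, {"main"})))  # ['formatCsv', 'legacyExport']
```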

Edit: One last thing. I started working on this while trying to solve the fundamental issue of why merge conflicts occur with git. So you might also like the merge driver I open-sourced in the same GitHub org: Weave.
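The intuition behind entity-level merging can be shown with a toy three-way merge keyed by function name rather than by line. This is a hypothetical sketch, not Weave's algorithm. Branches that edit different functions merge cleanly, even where a line-based merge would see adjacent or overlapping hunks. Only divergent edits to the same entity conflict.

```python
Entities = dict[str, str]  # hypothetical model: entity name -> source text


def merge_entities(base: Entities, ours: Entities, theirs: Entities) -> tuple[Entities, set[str]]:
    """Three-way merge per entity. Returns (merged, conflicting entity names)."""
    merged: Entities = {}
    conflicts: set[str] = set()
    for name in base.keys() | ours.keys() | theirs.keys():
        b, o, t = base.get(name), ours.get(name), theirs.get(name)  # None = absent
        if o == t:        # both sides agree (including both deleted)
            result = o
        elif o == b:      # only theirs changed it
            result = t
        elif t == b:      # only ours changed it
            result = o
        else:             # both changed it differently
            conflicts.add(name)
            result = o    # keep ours as a placeholder for manual resolution
        if result is not None:
            merged[name] = result
    return merged, conflicts


base = {"parse": "v1", "render": "v1"}
ours = {"parse": "v2", "render": "v1"}
theirs = {"parse": "v1", "render": "v2", "validate": "v1"}
merged, conflicts = merge_entities(base, ours, theirs)
print(dict(sorted(merged.items())), conflicts)
# {'parse': 'v2', 'render': 'v2', 'validate': 'v1'} set()
```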

hankbond
1 hour ago
> a structural map of the codebase to both constrain and validate what the model produces

I think this is an apt and concise description of what this is trying to accomplish. I feel like we've had some really great gains in model improvements, at both the top end and the bottom, over the last 6-7 months, but the next period is likely to be defined by harness improvements. I appreciate that your effort is being applied to this particular problem set. I think it's far more fundamental to improving agentic performance in codebases than yet another memory framework.

qudat
5 hours ago
I really like this idea and have been experimenting with it for a week or so.

I think there’s an opportunity to use an AST diff system for code forges where you don’t present the user with line diffs in the UI — or at least not as the first diff the user sees.

I firmly believe code review should happen in your editor.
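One reason the AST-diff idea is attractive: formatting-only edits show up in a line diff but disappear once both versions are parsed. This is a minimal sketch using Python's `difflib` and `ast` modules (Python-only, not sem). Note that `ast.dump` also drops comments, so comment-only edits are invisible to this check.

```python
import ast
import difflib

OLD = "def total(items):\n    return sum(i.price for i in items)\n"
NEW = "def total(items):\n    return sum(\n        i.price\n        for i in items\n    )\n"


def changed_lines(a: str, b: str) -> int:
    """Count lines added or removed by a unified line diff."""
    diff = difflib.unified_diff(a.splitlines(), b.splitlines(), lineterm="")
    return sum(
        1 for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    )


def same_ast(a: str, b: str) -> bool:
    """True if both sources parse to the same syntax tree (ignoring layout)."""
    return ast.dump(ast.parse(a)) == ast.dump(ast.parse(b))


print(changed_lines(OLD, NEW), same_ast(OLD, NEW))  # 5 True
```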

rohanucla
4 hours ago
Really glad you've been using it, and yeah, that's exactly the direction I've been thinking about. The line diff as the default view in code forges has always felt like an accident of history. It was easy to compute, but it's not what's actually useful for understanding what changed.
jiggunjer
11 minutes ago
Another potential use case: this may help jujutsu auto-split a large revision into small, orthogonal revs.

Sometimes an agent makes a monolithic commit, and it's a lot of work to manually split code you didn't write. After such an auto-split, I can manually squash related revs into feature- or ticket-level ones.
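One plausible way to propose such a split is to take the entities changed in the big revision and connect any two that depend on each other. Each connected component then becomes a candidate smaller revision. This is a hypothetical sketch using union-find. It doesn't call jj or sem, and the changed entities and edges are made up.

```python
def split_groups(changed: list[str], edges: list[tuple[str, str]]) -> list[set[str]]:
    """Group changed entities into connected components of the dependency graph."""
    parent = {name: name for name in changed}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in edges:
        # Only edges between two changed entities tie them into one group.
        if a in parent and b in parent:
            parent[find(a)] = find(b)

    groups: dict[str, set[str]] = {}
    for name in changed:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())


# Hypothetical revision contents and dependency edges.
changed = ["parseResponse", "testParseResponse", "renderHeader", "headerStyles", "bumpVersion"]
edges = [
    ("testParseResponse", "parseResponse"),
    ("renderHeader", "headerStyles"),
    ("parseResponse", "httpClient"),  # httpClient is unchanged, so ignored
]
groups = split_groups(changed, edges)
print(sorted(sorted(g) for g in groups))
# [['bumpVersion'], ['headerStyles', 'renderHeader'], ['parseResponse', 'testParseResponse']]
```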

docheinestages
4 hours ago
I doubt this actually solves a real problem for humans or agents, especially in complex projects. It might help if the examples showed scenarios where this tool and its commands make a difference.
rohanucla
4 hours ago
Lemme give you an example. Say you're working in a 100K-file TypeScript monorepo and you change a utility function that parses API responses. git diff tells you that you changed n lines in that function. What it doesn't tell you is which services, components, and tests actually depend on that function across the repo. You're left grepping for the function name, hoping nobody aliased the import or re-exported it through a barrel file. sem impact gives you that full downstream dependency list in seconds, so you know exactly what to review and test before you ship.
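The "transitively affected" count in the sample sem impact output at the top of the thread is essentially a breadth-first walk over reverse dependency edges. This is a minimal sketch with two assumptions. First, it assumes a hypothetical reverse-edge map that already resolves aliases and barrel re-exports; that is the hard part, and this does not attempt it. Second, it assumes a hypothetical `.spec` naming convention for tests.

```python
from collections import deque

# Hypothetical reverse edges: entity -> entities that depend on it directly.
USED_BY: dict[str, set[str]] = {
    "parseApiResponse": {"fetchOrders", "fetchUsers", "parseApiResponse.spec"},
    "fetchOrders": {"OrdersPage", "fetchOrders.spec"},
    "fetchUsers": {"UsersPage"},
    "OrdersPage": {"App"},
    "UsersPage": {"App"},
}


def impact(entity: str, used_by: dict[str, set[str]]) -> set[str]:
    """All entities that depend on `entity`, directly or transitively."""
    affected: set[str] = set()
    queue = deque([entity])
    while queue:
        for dependent in used_by.get(queue.popleft(), set()):
            if dependent not in affected:  # visit each entity once, even with cycles
                affected.add(dependent)
                queue.append(dependent)
    return affected


affected = impact("parseApiResponse", USED_BY)
tests = {name for name in affected if name.endswith(".spec")}  # assumed convention
print(len(affected), sorted(tests))
```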
awoimbee
4 hours ago
The benchmarks aren't great; they're super specific to sem's output. Why would I ask Claude how many "entities" a commit modified, and do I need a tool specifically for that request? Note that an "entity" is a sem-specific concept...
rohanucla
4 hours ago
Thanks for pointing it out. I agree with you here: my testing process was quite specific to sem's output. I'd love any suggestions on how you would design the testing process for this kind of tool.

I can also share my thought process: I was more interested in figuring out what the model finds and understands on its own, without sem.

onlyrealcuzzo
3 hours ago
Okay, this looks great, but for the love of God... please cut this out:

> AI agents are 2.3x more accurate when given sem output vs raw line diffs. See the benchmark.

No... This is not convincing of anything. These are not real world tasks.

You're trying to pretend like your tool makes AI agents 2.3x better at coding or bug fixing.

It doesn't.

Your benchmark doesn't prove that.

Your tool is cool. Sell it for what it is. Not for what it's not.

Animats
5 hours ago
Is this for checking what Claude Code just did to your repo?
rohanucla
5 hours ago
It can do that, but that's a small slice of what it does. sem parses your codebase into entities (functions, classes, methods) and builds a dependency graph across files.

So instead of line-level analysis, the granularity for viewing and tracking changes shifts to entities. That helps direct your agent's attention and lets you track changes faster.

LSPs have been doing this for a long time, but tree-sitter is faster. Type awareness isn't great with this approach, but overall, working across multiple languages with a single tool can be quite helpful.
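The sketch below shows what "entities" means here. It lists classes, functions, and methods with their line numbers, similar in spirit to the `function authenticateUser (src/auth/login.ts:26)` line in the sample output. It uses Python's stdlib `ast` on Python source as a stand-in; sem uses tree-sitter, whose API this does not show. `list_entities` is a hypothetical helper.

```python
import ast
from typing import Optional


def list_entities(source: str, path: str = "<memory>") -> list[str]:
    """Return 'kind name (path:line)' for every class, function and method."""
    entities: list[str] = []

    def visit(node: ast.AST, owner: Optional[str] = None) -> None:
        for child in ast.iter_child_nodes(node):
            if isinstance(child, ast.ClassDef):
                entities.append(f"class {child.name} ({path}:{child.lineno})")
                visit(child, owner=child.name)  # defs directly inside become methods
            elif isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef)):
                if owner:
                    kind, name = "method", f"{owner}.{child.name}"
                else:
                    kind, name = "function", child.name
                entities.append(f"{kind} {name} ({path}:{child.lineno})")
                visit(child, owner=None)  # nested defs are reported as plain functions
            else:
                visit(child, owner)

    visit(ast.parse(source))
    return entities


SRC = '''
class Session:
    def refresh(self):
        pass

def authenticate_user(name):
    return Session()
'''
print("\n".join(list_entities(SRC, "auth.py")))
```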

throw1234567891
3 hours ago
The tool looks great! Thanks for sharing.
dboreham
2 hours ago
A step in the right direction, and interesting in that it layers over existing git rather than requiring a whole new (unfamiliar, untested) SCCS.
rohanucla
2 hours ago
Git is actually great, and it doesn't have as many issues as the world says it does. Building complementary layers that make it even stronger is the best bet, I guess.