Formal Verification Gates for AI Coding Loops
63 points
4 hours ago
| 5 comments
| reubenbrooks.dev
| HN
alexpotato
30 minutes ago
[-]
I work in DevOps at a firm that has been very enthusiastic about using LLMs (in the good sense).

The phases were basically:

- try out having the LLM do "a lot"

- now even more

- now run multiple agents

- back to single agents but have the agents build tools

- tools that are deterministic AND usable by both the humans (EDIT: and the LLMs)

The reasons:

1. Deterministic tools (for both deployments and testing) get you a binary answer and it's repeatable

2. In the event of an outage, you can always fall back to the tool that a human can run

3. It's faster. A quick script can run in <30 seconds but "confabulating" always seemed to take 2-3 minutes.

Really, we are back to this article: https://spawn-queue.acm.org/doi/10.1145/3194653.3197520 aka "make a list of tasks, write scripts for each task, combine the scripts into functions, functions become a system"

reply
appstorelottery
2 hours ago
[-]
I think it's all about keeping state in the determinant space. I've come across the same issue, the key was to not rely on LLM performing workflow - the runtime needs to enforce.
reply
singron
1 hour ago
[-]
These guard types are great and I've heavily used them in the past. But why codegen them?

E.g. the jwt auth example has some major problems since the verification rules aren't fully specified in the spec. The jwt-token verified rule only checks that the string isn't empty, but it doesn't actually verify that it is correctly parsed, non-expired, and signed by a trusted key. The authenticated-user rule doesn't check that the user-id actually came from the jwt. If you hand-wrote your constructor, you would ensure these things. Similarly, all the other constructors allow passing in whatever values you like instead of checking the connections of the real objects.

By calling the constructor for these types, you are making an assertion about the relationship of the parameter values. If AI is calling the constructor, then it's able to make it's own assertions and derive whatever result it wants. That seems backwards. AI should use the result of tenant-access to deduce that a user is a member of tenant, but if they can directly call `(tenant-access user-id tenant-id true)`, then they can "prove" tenant-access for anything. In the past, we have named the constructors for these types `TenantAccess.newUnverified`, and then heavily scrutinized all callers (typically just jwt-parsers and specific database lookups). You can then use `TenantAccess.{userId,tenantId}` without scrutiny elsewhere.

reply
eximius
2 hours ago
[-]
So, capabilities/type systems. Building code architecture guardrails steep enough the AI won't jump the fence/take shortcuts.
reply
pyrex41
4 hours ago
[-]
Author here. The TL;DR: move rules from prompts into types the compiler refuses to violate, then bounce the AI coding loop off those refusals. The repo is github.com/pyrex41/Shen-Backpressure. Builds a lot on Geoff Huntley's backpressure idea -- none of this is rocket science, just an effort to apply sound programming principles in a world of LLM coding agents.
reply
Mikhail_K
2 hours ago
[-]
Thank you, interesting work. Please, clarify what is possibly a naive question - your README states that the constraints imposed by your tool are weaker than the formal verification guarantees. Why not implement the backpressure as the full formal verification barrier? Too complex to implement?
reply