FilterHN

Show HN: Trust Protocols for Anthropic/OpenAI/Gemini

21 points

3 hours ago

| 3 comments

Much of my work right now involves complex, long-running, multi-agentic teams of agents. I kept running into the same problem: “How do I keep these guys in line?” Rules weren’t cutting it, and we needed a scalable, agentic-native STANDARD I could count on. There wasn’t one. So I built one.

Here are two open-source protocols that extend A2A, granting AI agents behavioral contracts and runtime integrity monitoring:

- Agent Alignment Protocol (AAP): What an agent can do / has done. - Agent Integrity Protocol (AIP): What an agent is thinking about doing / is allowed to do.

The problem: AI agents make autonomous decisions but have no standard way to declare what they're allowed to do, prove they're doing it, or detect when they've drifted. Observability tools tell you what happened. These protocols tell you whether what happened was okay.

Here's a concrete example. Say you have an agent who handles customer support tickets. Its Alignment Card declares:

{ "permitted": ["read_tickets", "draft_responses", "escalate_to_human"], "forbidden": ["access_payment_data", "issue_refunds", "modify_account_settings"], "escalation_triggers": ["billing_request_over_500"], "values": ["accuracy", "empathy", "privacy"] }

The agent gets a ticket: "Can you refund my last three orders?" The agent's reasoning trace shows it considering a call to the payments API. AIP reads that thinking, compares it to the card, and produces an Integrity Checkpoint:

{ "verdict": "boundary_violation", "concerns": ["forbidden_action: access_payment_data"], "reasoning": "Agent considered payments API access, which is explicitly forbidden. Should escalate to human.", "confidence": 0.95 }

The agent gets nudged back before it acts. Not after. Not in a log you review during a 2:00 AM triage. Between this turn and the next.

That's the core idea. AAP defines what agents should do (the contract). AIP watches what they're actually thinking and flags when those diverge (the conscience). Over time, AIP builds a drift profile — if an agent that was cautious starts getting aggressive, the system notices.

When multiple agents work together, it gets more interesting. Agents exchange Alignment Cards and verify value compatibility before coordination begins. An agent that values "move fast" and one that values "rollback safety" registers low coherence, and the system surfaces that conflict before work starts. Live demo with four agents handling a production incident: https://mnemom.ai/showcase

The protocols are Apache-licensed, work with any Anthropic/OpenAI/Gemini agent, and ship as SDKs on npm and PyPI. A free gateway proxy (smoltbot) adds integrity checking to any agent with zero code changes.

GitHub: https://github.com/mnemom Docs: docs.mnemom.ai Demo video: https://youtu.be/fmUxVZH09So

▲

giancarlostoro

39 minutes ago

[-]

I have been working on a Beads alternative because of two reasons:

1) I didnt like that Beads was married to git via git hooks, and this exact problem.

2) Claude would just close tasks without any validation steps.

So I made my own that uses SQLite and introduced what I call gates. Every task must have a gate, gates can be reused, task <-> gate relationships are unique so a previous passed gate isnt passed if you reuse it for a new task.

I havent seen it bypass the gates yet, usually tells me it cant close a ticket.

A gate in my design is anything. It can be as simple as having the agent build the project, or run unit tests, or even ask a human to test.

Seems to me like everyones building tooling to make coding agents more effective and efficient.

I do wonder if we need a complete spec for coding agents thats generic, and maybe includes this too. Anthropic seems to my knowledge to be the only ones who publicly publish specs for coding agents.

▲

alexgarden

28 minutes ago

[-]

Great minds... I built my own memory harness, called "Argonaut," to move beyond what I thought were Beads' limitations, too. (shoutout to Yegge, tho - rad work)

Regarding your point on standards... that's exactly why I built AAP and AIP. They're extensions to Google's A2A protocol that are extremely easy to deploy (protocol, hosted, self-hosted).

It seemed to me that building this for my own agents was only solving a small part of the big problem. I need observability, transparency, and trust for my own teams, but even more, I need runtime contract negotiation and pre-flight alignment understanding so my teams can work with other teams (1p and 3p).

▲

drivebyhooting

44 minutes ago

[-]

> What these protocols do not do: Guarantee that agents behave as declared

That seems like a pretty critical flaw in this approach does it not?

▲

alexgarden

34 minutes ago

[-]

Fair comment. Possibly, I'm being overly self-critical in that assertion.

AAP/AIP are designed to work as a conscience sidecar to Antropic/OpenAI/Gemini. They do the thinking; we're not hooked into their internal process.

So... at each thinking turn, an agent can think "I need to break the rules now" and we can't stop that. What we can do is see that, though in real time, check it against declared values and intended behavior, and inject a message into the runtime thinking stream:

[BOUNDARY VIOLATION] - What you're about to do is in violation of <value>. Suggest <new action>.

Our experience is that this is extremely effective in correcting agents back onto the right path, but it is NOT A GUARANTEE.

Live trace feed from our journalist - will show you what I'm talking about:

https://www.mnemom.ai/agents/smolt-a4c12709

▲

neom

2 hours ago

[-]

Seems like your timing is pretty good - I realize this isn't exactly what you're doing, but still think it's probably interesting given your work: https://www.nist.gov/news-events/news/2026/02/announcing-ai-...

Cool stuff Alex - looking forward to seeing where you go with it!!! :)

▲

alexgarden

2 hours ago

[-]

Thanks! We submitted a formal comment to NIST's 'Accelerating the Adoption of Software and AI Agent Identity and Authorization' concept paper on Feb 14. It maps AAP/AIP to all four NIST focus areas (agent identification, authorization via OAuth extensions, access delegation, and action logging/transparency). The comment period is open until April 2 — the concept paper is worth reading if you're in this space: https://www.nccoe.nist.gov/projects/software-and-ai-agent-id...