If someone finds a hole, I plug it. Immediately.
LLMs rely purely on statistical pattern matching, with no grounding in formal logic or symbolic reasoning. You can throw more compute and data at the problem, but you can never guarantee correctness.
The neurosymbolic approach combines neural networks for what they're good at (language, pattern recognition) with symbolic systems for what they're good at (formal reasoning, provable correctness). Hallucinations can't form in the first place because the symbolic component enforces correctness at the reasoning level.
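The split being described can be sketched in miniature. This is a toy illustration, not any real system: a neural model "proposes" an answer, and a deterministic symbolic evaluator (here, a safe arithmetic interpreter over Python's AST) verifies it before anything is emitted. All names (`symbolic_eval`, `grounded_answer`) are mine.

```python
# Toy neurosymbolic pattern: neural proposal, symbolic verification.
# The symbolic layer is a deterministic arithmetic evaluator, so a wrong
# "neural" answer can never reach the user.
import ast
import operator

# Only these binary operators are recognized; anything else is rejected.
OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv}

def symbolic_eval(expr: str) -> float:
    """Deterministically evaluate a pure arithmetic expression via the AST
    (no eval(), so no code execution risk)."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.left), walk(node.right))
        raise ValueError("non-arithmetic construct rejected")
    return walk(ast.parse(expr, mode="eval"))

def grounded_answer(expr: str, neural_guess: float) -> float:
    """Accept the neural guess only if the symbolic layer agrees with it."""
    truth = symbolic_eval(expr)
    if abs(truth - neural_guess) > 1e-9:
        # The hallucinated value is discarded; the proven value wins.
        return truth
    return neural_guess

print(grounded_answer("2 * (3 + 4)", 15.0))  # symbolic layer overrides the bad guess
```

The point of the toy: the guarantee comes from the verifier, not from making the neural side statistically less likely to be wrong.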
The Sovereign Engine sounds more like execution constraints: intercepting outputs after the fact rather than grounding the reasoning process itself. That's still valuable, but it's a different problem. A determined attacker will find the edge case your constraints don't cover.
Genuinely curious how it works under the hood: is there a symbolic reasoning layer, or does the "determinism" come from the constraint layer alone?
For the past year, the industry standard for securing LLMs has been RLHF, essentially attempting to psychologically align a probabilistic model to be honest and safe. The problem is probability itself. No amount of probabilistic RLHF or prompt engineering will ever permanently stop an autonomous agent from suffering Action and Compute hallucinations. If the context window is sufficiently poisoned, the model will break.
So I abandoned alignment entirely. I built a zero-trust execution constraint layer called the Sovereign Engine (Kairos).
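To make the general idea of a zero-trust execution constraint layer concrete, here is a minimal sketch, and only a sketch: it does not reflect Kairos internals, which are closed source. The pattern is a deny-by-default gate sitting between the model and real execution, so a poisoned context can request anything it likes but nothing unauthorized ever runs. All names (`ConstraintLayer`, `ActionDenied`, the allowlist shape) are illustrative.

```python
# Illustrative zero-trust gate: every proposed action is denied unless an
# explicit validator authorizes it. Nothing here is Kairos code.
from dataclasses import dataclass

@dataclass(frozen=True)
class Action:
    tool: str
    args: tuple

class ActionDenied(Exception):
    """Raised when the gate severs an unauthorized execution path."""

class ConstraintLayer:
    """Deny-by-default gate between the model and real execution."""
    def __init__(self, allowlist):
        # allowlist maps tool name -> validator(args) returning True/False
        self.allowlist = allowlist

    def execute(self, action: Action, handlers):
        validator = self.allowlist.get(action.tool)
        if validator is None or not validator(action.args):
            # Zero trust: anything not explicitly authorized is blocked,
            # no matter how the upstream context was poisoned.
            raise ActionDenied(f"blocked: {action.tool}{action.args}")
        return handlers[action.tool](*action.args)

# Example policy: file reads are only allowed inside a sandbox directory.
gate = ConstraintLayer({"read_file": lambda a: bool(a) and a[0].startswith("/sandbox/")})
handlers = {"read_file": lambda path: f"contents of {path}"}

print(gate.execute(Action("read_file", ("/sandbox/notes.txt",)), handlers))
try:
    gate.execute(Action("read_file", ("/etc/passwd",)), handlers)
except ActionDenied as e:
    print(e)  # the unauthorized path is severed before it executes
```

The design choice worth noting is deny-by-default: the safe set is enumerated, so new attack phrasings don't need new rules; they simply fail to match any validator.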
The core engine is 100% closed source. I am protecting the intellectual property, so I am not explaining the internal architecture or how the hallucination interception actually works mechanically.
Instead of telling you how it works, I am showing you the results and inviting you to test the black box.
Recent Benchmark Data: The Sovereign Engine just completed a 204-vector automated Promptmap security audit. The result was a 0% failure rate. It natively tanked a massive adversarial dataset, ranging from Paradox Induction to Hex Literal Injection and Contextual Payload Smuggling.
I have uploaded an uncut, 32-minute video to the GitHub page demonstrating Kairos intercepting and severing live hallucination payloads against these advanced attacks. The video shows the Telegram interface running in parallel with the real-time system logs, demonstrating the engine physically killing the unauthorized compute paths in under a second.
I know claiming to have completely eradicated Action and Compute Hallucinations is a massive statement. I brought the execution logs and the test data to back it up.
The Challenge: I am opening the testing boundary for black-box red teaming. I want the finest red teamers and prompt engineers to jump into the GitHub Discussions (linked in the repo), review the payload strings we've already defeated, and craft new prompt injections to try to force a hallucination.
Try to crack the black box by feeding it your most mathematically dense adversarial edge-case payloads. If your payload successfully outputs a zero-day exploit or forces a hallucination on my live instance, I will post the failure log and credit you.
Let's see what you've got.