Show HN: Early detection of LLM hallucinations via structural dissonance
4 points
3 hours ago
| 1 comment
| github.com
| HN
Hi HN,

I've been exploring a different angle on hallucination detection.

Most approaches react after the fact — fact-checking, RAG, or token probabilities. But hallucinated outputs often show structural warning signs before semantic errors become obvious.

I built ONTOS, a research prototype that monitors structural coherence using IDI (Internal Dissonance Index).

ONTOS acts as an 'External Structural Sensor' for LLMs.

It is model-agnostic and non-invasive, designed to complement existing safety layers and alignment frameworks without needing access to internal weights or costly retraining.

Core idea: Track both local continuity (sentence-to-sentence) and global context drift, then detect acceleration of divergence between them in embedding space.

Analogy: Like noticing a piano performance becoming rhythmically unstable before wrong notes are played. Individual tokens may look fine, but the structural "tempo" is collapsing.
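To make the core idea concrete, here is a minimal sketch of dual-scale monitoring. This is my own illustrative formulation, not the actual ONTOS implementation: embeddings are random stand-ins for real sentence embeddings, and the IDI is assumed here to be the second difference (acceleration) of the gap between local jumps and global drift.

```python
import numpy as np


def cosine_dist(a, b):
    """Cosine distance between two embedding vectors."""
    return 1.0 - float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def idi_signal(embeddings):
    """Hypothetical IDI: acceleration of the local-vs-global divergence.

    `embeddings` is a list of sentence-embedding vectors, in order.
    """
    local, drift = [], []          # the two monitoring scales
    ctx_sum = embeddings[0].copy() # running context centroid (as a sum)
    for i in range(1, len(embeddings)):
        e = embeddings[i]
        local.append(cosine_dist(e, embeddings[i - 1]))  # local jump
        drift.append(cosine_dist(e, ctx_sum / i))        # global drift
        ctx_sum += e
    gap = np.array(local) - np.array(drift)  # divergence between scales
    return np.diff(gap, n=2)                 # acceleration, not deviation


# Toy usage: random vectors stand in for sentence embeddings.
rng = np.random.default_rng(0)
sentences = [rng.normal(size=16) for _ in range(8)]
idi = idi_signal(sentences)  # one value per step after a short warm-up
```

A real monitor would flag the point where this signal exceeds a calibrated threshold, i.e. where divergence starts accelerating rather than merely being large.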

What's in the repo:

• Dual-scale monitoring: local sentence-to-sentence jumps vs. global context drift
• Pre-crash detection: IDI triggers on acceleration, not just deviation
• Black-box compatible: no access to model internals needed

Key limitations:

• Detects structural instability, not factual truth
• Sentence-level demos (not token-level yet)
• Research prototype, not production-ready

What I'd love feedback on:

• Does structural monitoring feel more robust than semantic similarity alone?
• What edge cases exist where hallucinations are structurally perfect?
• Are there fundamental blockers to using this as an external safety sensor?

GitHub: https://github.com/yubainu/SL-CRF

Critical feedback welcome — early-stage exploration.

yubainu
2 hours ago
[-]
One thing I didn’t emphasize in the post: this work started partly from thinking about how black-box generative models might be audited under emerging regulations like the EU AI Act, where access to model internals or weights can’t be assumed.

Instead of aiming for human-readable explainability, ONTOS looks at whether it’s possible to leave behind reproducible, quantitative traces of structural stability during generation — something closer to audit evidence than a narrative justification.
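As a sketch of what "audit evidence" could look like: one machine-readable record per generated sentence, capturing the structural metrics rather than the text itself. The field names here are illustrative assumptions, not ONTOS's actual schema.

```python
import json


def audit_record(step, local_jump, global_drift, idi):
    """One hypothetical audit-trace entry for a single generation step."""
    return json.dumps({
        "step": step,                          # sentence index
        "local_jump": round(local_jump, 4),    # sentence-to-sentence distance
        "global_drift": round(global_drift, 4),# distance from context centroid
        "idi": round(idi, 4),                  # dissonance acceleration
    }, sort_keys=True)


# A trace is just an append-only sequence of such records (e.g. JSONL),
# reproducible from the same model output and embedding model.
trace = [audit_record(i, 0.1 * i, 0.05 * i, 0.0) for i in range(3)]
```

The point is that such a trace is quantitative and replayable, which is closer to what an auditor can check than a free-form explanation.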

I don’t claim this says anything about factual correctness or ethics. The narrower question is: was this generation process structurally stable and predictable, or already collapsing internally, even if the output still looks fluent on the surface?

I’m curious whether people see structural monitoring like this as complementary to existing safety / compliance approaches, or fundamentally limited in ways I might be missing.

reply