The standard fix is post-hoc validation — check after writing, fix manually. That doesn't scale past a few dozen files.
So I built a pipeline where the commit gate is the product:
Prompt → LLM → Validation Engine → Error Normalizer → Retry Controller → Commit Gate → File
The LLM is the only non-deterministic component; everything else is pure functions. If output fails schema checks, it never touches disk: the normalizer converts error codes into correction instructions and sends them back to the LLM. If the same error fires twice on the same field, the pipeline aborts instead of looping, because that pattern means your schema has a boundary problem, not the model.

The taxonomy lives in an external akf.yaml, not compiled into the tool:
enums:
  domain: [ai-system, api-design, devops, security]
  level: [beginner, intermediate, advanced]
  status: [draft, active, completed, archived]
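The pattern here is validation driven by data rather than code. A minimal sketch of how an enum check against such a taxonomy might look (hypothetical names, not the package's actual API; the dict below stands in for the parsed akf.yaml, e.g. from yaml.safe_load, to keep the example self-contained):

```python
# Sketch: enum validation driven by an external taxonomy.
# In practice TAXONOMY would come from parsing akf.yaml;
# it is inlined here so the example runs on its own.
TAXONOMY = {
    "enums": {
        "domain": ["ai-system", "api-design", "devops", "security"],
        "level": ["beginner", "intermediate", "advanced"],
        "status": ["draft", "active", "completed", "archived"],
    }
}

def check_enums(frontmatter: dict, taxonomy: dict = TAXONOMY) -> list:
    """Return (code, field, message) tuples; E001 mirrors the post's error codes."""
    errors = []
    for field, allowed in taxonomy["enums"].items():
        value = frontmatter.get(field)
        if value is not None and value not in allowed:
            errors.append(("E001", field, f"{value!r} is not one of {allowed}"))
    return errors

print(check_enums({"domain": "ml-ops", "level": "beginner"}))
# → [('E001', 'domain', "'ml-ops' is not one of ['ai-system', 'api-design', 'devops', 'security']")]
```

Because the allowed values live in the config, swapping in a new ontology changes behavior without touching this function.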
Change your ontology without touching code or redeploying.

What it catches: wrong enum values (E001), missing required fields (E002), bad date formats (E003), type mismatches like tags: "security" instead of tags: [security] (E004), and domain values outside your taxonomy (E006).
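The retry controller with the abort-on-repeat rule can be sketched like this (llm and validate are stand-ins for the real components, not the package's API):

```python
# Sketch of the normalize-and-retry loop: failed output never reaches
# disk, and a repeated (error_code, field) pair aborts instead of looping.

def normalize(errors):
    # Turn error tuples into correction instructions for the LLM.
    return "\n".join(f"{code} on '{field}': {msg}" for code, field, msg in errors)

def generate_with_retries(prompt, llm, validate, max_retries=3):
    seen = set()
    output = llm(prompt)
    for _ in range(max_retries):
        errors = validate(output)
        if not errors:
            return output  # only validated output may touch disk
        for code, field, _ in errors:
            if (code, field) in seen:
                raise RuntimeError(
                    f"{code} repeated on '{field}': schema boundary problem, not the model"
                )
            seen.add((code, field))
        output = llm(prompt + "\n\nFix these issues:\n" + normalize(errors))
    raise RuntimeError("retry budget exhausted")
```

With stub functions, a first bad generation followed by a corrected one returns the corrected output; a validator that keeps reporting the same field aborts on the second hit.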
Interfaces: CLI, Python API, REST (FastAPI), MCP server in progress.
pip install ai-knowledge-filler
akf generate "Write a guide on Docker networking"
akf validate ./vault/
Works with Claude, GPT-4, Gemini, and Ollama. 560 tests, 91% coverage. MIT license.

GitHub: github.com/petrnzrnk-creator/ai-knowledge-filler
PyPI: pypi.org/project/ai-knowledge-filler
Curious whether others have hit this — what are you doing when AI-generated content drifts out of spec?