I'd run the following 5-10 times with one model, then again with a 2nd model.
"Verify the correctness and completeness of all security configs/rules in SETUP.md. Consider if anything is missing, and if anything is not needed. Do not modify any files; only write potential findings to report.txt"
"Verify all findings and claims in report.txt."
Replace "SETUP.md" with whatever you're working on.
It's both terrifying and incredible watching what the models get correct and what they get completely wrong.
However, after enough runs they tend to settle on a state they claim does not need any more edits. And that result is generally useful with much fewer errors/hallucinations compared to a single run.
Why not just look through the actual Claude code codebase and use your own AI to deconstruct it