Anyone else? How did you manage it?
The trap is when correctness is subjective. Tone, phrasing, whether something 'sounds right' - no automated check helps there, so you're back to reviewing everything.
For structured data like invoices, I've found pattern-matching against known values beats LLMs anyway. Less hallucination risk, faster, and when it fails, at least it fails obviously rather than being confidently wrong.
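A minimal sketch of what I mean (the field names, regexes, and vendor list are made up for illustration; you'd adapt them to your actual invoice layout and master data):

```python
import re

# Hypothetical known-values table; in practice this would come from
# your vendor master data or ERP system.
KNOWN_VENDORS = {"ACME Corp", "Globex Ltd"}

INVOICE_NO_RE = re.compile(r"Invoice\s*#\s*(\d{4,10})")
AMOUNT_RE = re.compile(r"Total:\s*\$?(\d{1,3}(?:,\d{3})*\.\d{2})")

def extract_invoice(text: str) -> dict:
    """Pattern-match invoice fields; fail loudly instead of guessing."""
    result = {}

    m = INVOICE_NO_RE.search(text)
    if not m:
        raise ValueError("invoice number not found")  # obvious failure
    result["invoice_no"] = m.group(1)

    m = AMOUNT_RE.search(text)
    if not m:
        raise ValueError("total amount not found")
    result["total"] = float(m.group(1).replace(",", ""))

    # Validate against known values rather than trusting free-form output.
    vendor = next((v for v in KNOWN_VENDORS if v in text), None)
    if vendor is None:
        raise ValueError("vendor not in known list")
    result["vendor"] = vendor

    return result
```

The point is the failure mode: an unmatched regex raises immediately, whereas an LLM would happily invent a plausible-looking total.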
I treat it like hiring a consultant. They do a lot of work, but I still review the output before making a decision or passing it on.
Sending something with errors to my boss or peers makes me look stupid. Saying the errors came from unreviewed AI output makes me look stupider.
To minimize hallucinations, yes, the AI should be set up for deterministic behaviour where the use case calls for it - for example, a recruiting assistant should produce the same evaluation for the same candidate every time. Secondly, having another AI check for hallucinations can be a good starting point; assigning scores and penalizing the first AI's response can also lead to more grounded outputs.
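A rough sketch of both ideas using the OpenAI Python SDK (the model names, prompts, and 1-10 rubric are my own choices, not a recommendation, and `seed` is best-effort repeatability, not a hard guarantee):

```python
from openai import OpenAI

client = OpenAI()

def evaluate_candidate(profile: str) -> str:
    # temperature=0 plus a fixed seed makes outputs as repeatable as
    # the API allows, so the same candidate gets the same evaluation.
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user",
                   "content": f"Evaluate this candidate:\n{profile}"}],
        temperature=0,
        seed=42,
    )
    return resp.choices[0].message.content

def groundedness_score(profile: str, evaluation: str) -> int:
    # A second model acts as a judge: score 1-10 for how well the
    # evaluation is supported by the profile text alone.
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": (
                "Score 1-10 how well this evaluation is grounded in "
                "the profile. Reply with just the number.\n\n"
                f"Profile:\n{profile}\n\nEvaluation:\n{evaluation}"
            ),
        }],
        temperature=0,
    )
    return int(resp.choices[0].message.content.strip())
```

You'd then reject or regenerate any evaluation scoring below some threshold, which is the "penalizing" part in practice.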
This is a valuable read: https://www.ufried.com/blog/ironies_of_ai_1/
So before anyone concludes that coding agents prove AI can be useful everywhere, find use cases with similar characteristics - namely cheap, objective verification of the output.
Part of the reason I like Perplexity is because of the embedded references, and I always, always, double check the sources and holler at the Perp AI when it is clearly confabulating or misinterpreting. Still gives me insights and is useful, but trust-but-verify isn't just about arms control ;)
Not at all