FilterHN

Ask HN: Strategies to Reduce AI Hallucinations?

38 points

by altdataseller

1 month ago

| past

| 13 comments

| HN

What are some optimal strategies to reduce hallucinations by AI, specifically ChatGPT?

Are there special prompts you find effective?

▲

petercooper

1 month ago

[-]

My longtime favorite prompt to trigger a hallucination was "Did King Henry VIII have any grandchildren?" Famously, he did not, but almost every model, till quite recently, would answer yes, often with the most bizarre reasoning.

The way to resolve it on most models over a certain size is a common tactic used with LLMs: ask the LLM to "think through your answer first". For example, you have a system prompt akin to: "Before answering, think through the facts and brainstorm about your eventual answer in <thinking>..</thinking> tags. Answer only 'yes' or 'no' in <answer>..</answer> tags. Do not include any other text in the answer."

In my current evals (based around numerous similar tricky factual questions) this tactic works on all but the smallest and least proficient models (since they don't tend to have strong enough factual knowledge to think it through). Forcing the models to answer simply 'yes' or 'no' yields only a correct answer on the SOTA models (but some speculate GPT-4o might actually be doing this sort of 'thinking' process on the backend automatically anyway).

▲

sshine

1 month ago

[-]

Wow, that works.

Here's a prompt that it consistently hallucinates on:

> What's the black-and-white creature from Scavenger's Reign called?

The correct answer is "Hollow."

Without this prompt it hallucinates random things like "Beast of Burden", "Tooth Skin", "Hexapod", etc.

On my first attempt with the added prompt, it performed a web search before it answers correctly.

Without this addition to the prompt, it just answers right away, i.e. without searching.

It may not always search the web given this prompt -- it seems to be a heuristic.

I can disable the searching by prefixing the prompt with "Don't search the web."

I tried to explore how to ask it to not hallucinate, but it's pretty hard:

> Q: Don't search the web. Before answering, think through the facts and brainstorm about your eventual answer in: What's the black-and-white creature from Scavenger's Reign called? If you can't provide an exact answer, and need to resort to guessing, just say you don't know.

> A: The black-and-white creature from "Scavenger's Reign" is known as the "Hexapod." [...]

▲

langcss

1 month ago

[-]

Cool, just tried it on a few:

Llama3-8b-8192, Gemma2-9b-lt, Llama-3.1-70b-Versatile - All Said Yes

Llama-3.1-8b-Instant:

King Henry VIII had three children who survived to adulthood: Mary I (Mary Tudor, also known as Bloody Mary), Elizabeth I, and Edward VI.

Mary I had one child, a son named Edward VI, who became King of England at the age of 9 after Henry VIII's death. Edward VI died at the age of 15, so he did not have any children.

Elizabeth I never married and did not have any children.

Therefore, King Henry VIII did not have any grandchildren.

???!

Llama3-70b-8192: Also similar to above.

Of course if I ask again I may get a different answer!

▲

jhanschoo

1 month ago

[-]

Some basic theoretical intuition behind this:

the compute graph for an LLM given context before it has to emit a token is traditionally bounded. By asking an LLM to generate intermediate answers, and have that visible as context, you are essentially allowing the LLM to use more compute before it gives you an answer. My guess is that SOTA models are far more flexible in that regard.

▲

piva00

1 month ago

[-]

Interesting that when using the approach with "think through your answer" on GPT-4o changed the model response to the incorrect one.

> Q: Did King Henry VIII have any grandchildren?

> A: King Henry VIII did not have any grandchildren. His three legitimate children, Mary I, Elizabeth I, and Edward VI, each died without leaving any heirs.

> ...

> Q: Think through the answer you gave me.

> A: You're correct; I need to reevaluate my initial response. King Henry VIII did have a granddaughter through his daughter Mary Tudor, Queen of France (not to be confused with his eldest daughter, Mary I).

▲

pizza

1 month ago

[-]

Lilian Weng's blog - a goldmine - has an in-depth post on this: https://lilianweng.github.io/posts/2024-07-07-hallucination/. She leads safety and alignment at OpenAI so might be worth checking out :^)

▲

dheera

1 month ago

[-]

Explicitly allow it the option to be unsure, e.g. "If you do not know the answer, respond with 'none'" or "If you are unsure of the answer, just say that", etc.

Otherwise it does what humans do when asked interview questions, they bullshit because if you bullshit is a 20% chance of landing the job, whereas if you say "I don't know" there is a 0% chance of landing the job. The kind of RLHF training that was put into ChatGPT probably replicates a similar reward structure.

▲

Havoc

1 month ago

[-]

This assumes the it “knows” what it knows ahead of time which isn’t true. It’s why models struggle to tell you how many Rs are in strawberry or why it can’t tell you how many words are in its response. Also doesn’t really have a concept of certainty beyond perhaps logprobs which are flimsy indicators at best and the model isn’t inherently aware of them

▲

konschubert

1 month ago

[-]

I think you're overthinking this.

If you tell it that it can be unsure, the likelihood that an answer is correct increases. Make of that philosophically what you will. If it works it works.

▲

dheera

1 month ago

[-]

"Prove Fermat's Last Theorem using game theory"

https://i.imgur.com/XbLanp1.png

"Prove Fermat's Last Theorem using game theory. If you think this is a bullshit question or are unsure, please just say that."

https://i.imgur.com/knbaPcq.png

▲

jumploops

1 month ago

[-]

1. Give examples of the format of response(s) you want

2. Explicitly call out null conditions (e.g. return { “results”: [] })

3. Use multiple prompts, one to “think”/explain and then one to transform the result

4. Don’t use function calling to get structured output, just use JSON mode

One non-obvious trick we use is to tell the LLM what it said previously as a system messages, not just as user messages, even if the LLM didn’t actually output that specific text.

▲

keiferski

1 month ago

[-]

There are various methods that work at lessening the amount of hallucinations, but in general I think it’s much more productive to use a Generate then Verify approach. If the information you’re creating is both important and novel to you (I.e., you can’t tell if it’s correct or not on your own) then you need the verification step.

▲

HenryBemis

1 month ago

[-]

I am asking 'it' to validate the answers, and list the details. I.e. when I am asking it to go through some framework or regulation, and I am asking it to list the "technical controls that can be derived from the text" (e.g. law saw "you need to encrypt" thus an internal control is to "regularly check for encryption, blah blah blah".

So I am asking 'it' to create a table (instead of just a list of questions) that would include: 1a) suggested control 1b) example of evidence that would quality/pass/fail the control 2) article of law (i.e. Article 5 paragraph 10) 3) short quote from the article

Then I ask it to check its own output, and change 3) to add the full text/the whole paragraph.

99% is correct, and it is easier to scroll and see with my own eyes that the 'paragraph' is the same

▲

constantinum

1 month ago

[-]

One strategy(not directly related to ChatGPT) is to use two models, one for extraction/generation and the other "challenger" to verify the extracted answer. Refer: https://docs.unstract.com/editions/cloud_edition#llmchalleng...

▲

trte9343r4

1 month ago

[-]

I use the same strategy. More like an interview. Many AI related tricks can be automated.

▲

bosco_mcnasty

1 month ago

[-]

could the generator and challenger be cross trained against each other, so as to actually both get better? like a generative-challenger network (GCN) or something like this?

▲

langcss

1 month ago

[-]

I think this is how RLHF actually works. https://huyenchip.com/2023/05/02/rlhf.html#3_1_reward_model

▲

planb

1 month ago

[-]

If you are talking about interactive ChatGPT sessions, if you suspect that it hallucinated, just tell it to "confirm that via a web search".

▲

llm_trw

1 month ago

[-]

Feed it grounding text that's about as long as the output text you expect it to produce.

They are called transformers for a reason.

▲

msnkarthik

1 month ago

[-]

All answers appreciated, but how do you send so much of context when communicating with GPT via an API and not directly through chat. Wondering this for a B2B saas use-case.

▲

OutOfHere

1 month ago

[-]

First, start with a good model. GPT-4-(Turbo), which is available in the paid subscription, should hallucinate less than GPT-4o.

▲

more_corn

1 month ago

[-]

Look up the facts and dump them into the context window.

▲

JSDevOps

1 month ago

[-]

“please don’t take drugs because chatting to me is mind numbingly boring. Thanks”