FROM hn.hacker_news
LIMIT 100
"Oops, I forgot the LIMIT clause and now owe MotherDuck and OpenAI $93 billion."

My understanding is that this is mainly down to how floating-point arithmetic works. Any performant LLM executes a large number of floating-point operations in parallel (usually on a GPU), and the order in which those operations finish can very slightly affect the result.
[1] https://github.com/ggerganov/llama.cpp/blob/master/examples/...
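A minimal Python sketch of why evaluation order matters: floating-point addition is not associative, so two reductions over the same numbers can round differently depending on which pairs get summed first — exactly the situation a parallel GPU reduction creates.

```python
# Floating-point addition is not associative: regrouping the same
# three numbers changes which intermediate results get rounded.
a, b, c = 1e16, -1e16, 1.0

left = (a + b) + c   # (1e16 + -1e16) = 0.0, then + 1.0 -> 1.0
right = a + (b + c)  # -1e16 + 1.0 rounds back to -1e16, so the sum is 0.0

print(left, right)   # prints different values for mathematically equal sums
```

In a parallel reduction, the grouping is effectively decided by scheduling, so run-to-run differences in the low-order bits are expected unless the kernel enforces a deterministic reduction order.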
It seems to be possible to work around it by varying the very start of your prompt (e.g., with an iteration number), but it has messed up some of our workflows that rely on running the same prompt multiple times and gathering a consensus output.
Would be great if they let us disable it.
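The workaround described above can be sketched roughly like this; `call_model` is a hypothetical stand-in for whatever completion API is in use, and the `[run i]` prefix is just one way to make each prompt's opening differ:

```python
from collections import Counter

def consensus(prompt, call_model, n=5):
    """Run the same prompt n times and return the majority answer.

    Prepending a per-iteration tag keeps the start of each prompt
    unique, so no two calls share the same prefix. call_model is a
    hypothetical function: it takes a prompt string and returns the
    model's answer as a string.
    """
    answers = [call_model(f"[run {i}] {prompt}") for i in range(n)]
    # most_common(1) gives [(answer, count)]; take the answer
    return Counter(answers).most_common(1)[0][0]
```

The obvious cost is that the varied prefix defeats any prompt caching, which is presumably why being able to disable the behavior directly would be preferable.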