The Prompt() Function: Use the Power of LLMs with SQL
41 points
by sebg
12 hours ago
| 3 comments
| motherduck.com
delichon
11 hours ago
[-]

  FROM hn.hacker_news
  LIMIT 100
"Oops I forgot the limit clause and now owe MotherDuck and OpenAI $93 billion."
reply
domoritz
11 hours ago
[-]
I love the simplicity of this. Hurray for small models for small tasks.
reply
korkybuchek
11 hours ago
[-]
Interesting -- is there any impact from LLM outputs not being deterministic?
reply
drdaeman
10 hours ago
[-]
SQL functions are allowed to be non-deterministic. The SQL:2003 grammar defines a DETERMINISTIC | NOT DETERMINISTIC characteristic for CREATE FUNCTION, and PostgreSQL has the IMMUTABLE | STABLE | VOLATILE volatility categories.
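
A quick PostgreSQL sketch of declaring volatility (illustrative only, not how MotherDuck implements prompt()):

  -- VOLATILE tells the planner the function may return different results
  -- for the same arguments, so it must be re-evaluated on every call.
  CREATE FUNCTION roll_die() RETURNS int
  LANGUAGE sql VOLATILE
  AS $$ SELECT 1 + floor(random() * 6)::int $$;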
reply
yen223
2 hours ago
[-]
That's what allows something like `clock_timestamp()` to be supported. It returns the timestamp at the moment it is called, so its value can change even within a single query if it's called multiple times: https://www.postgresql.org/docs/current/functions-datetime.h...
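
For example, in PostgreSQL (illustrative; the exact values depend on timing):

  -- now() is STABLE: fixed for the whole statement.
  -- clock_timestamp() is VOLATILE: re-read on every call.
  SELECT now() AS stmt_time, clock_timestamp() AS call_time
  FROM generate_series(1, 3);
  -- stmt_time is identical on all three rows; call_time typically differs row to row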
reply
korkybuchek
10 hours ago
[-]
Nice, TIL. Thanks!
reply
xnx
10 hours ago
[-]
Aren't LLM outputs deterministic given the same inputs?
reply
simonw
9 hours ago
[-]
Not at all. Even the ones that provide a "seed" parameter don't generally 100% guarantee you'll get back the same result.

My understanding is that this is mainly down to how floating point arithmetic works. Any performant LLM executes a huge amount of floating point arithmetic in parallel (usually on a GPU), and because floating point addition isn't associative, the order in which those operations finish and get combined can very slightly affect the result.
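
A tiny illustration of that non-associativity (SQL only because it's handy here; the same holds for IEEE-754 floats anywhere):

  -- Floating point addition is not associative, so the grouping chosen by a
  -- parallel reduction can change the low-order bits of the result.
  SELECT (0.1::float8 + 0.2::float8) + 0.3::float8 AS grouped_left,   -- 0.6000000000000001
          0.1::float8 + (0.2::float8 + 0.3::float8) AS grouped_right; -- 0.6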

reply
Lockal
3 hours ago
[-]
Classic implementations of LLMs (like llama.cpp) and diffusion image models let you specify a seed, and as long as the same code runs on the same hardware with the same parallelism level, the result will be the same. This is even checked in autotests[1]. The thing that produces randomized results in floating point operations (excluding bugs) is known as "stochastic rounding": it is pretty novel (from an implementation standpoint) and it too can be controlled by a seed. Other than that, I've never seen hardware with non-deterministic (maybe stochastic) output, but maybe we will see it in the next few years.

[1] https://github.com/ggerganov/llama.cpp/blob/master/examples/...

reply
simonw
3 hours ago
[-]
Do you know why OpenAI are unable to provide a "seed" parameter that's fully deterministic? I had assumed it was for the reason I described, but I'm not confident in my assertion there.
reply
darkteflon
7 hours ago
[-]
Funny wrinkle here: unless I’ve misread the OpenAI API docs[1], the recently added prompt caching feature cannot be explicitly disabled and automatically applies to all input prompts over 1024 tokens for roughly a few minutes.

It seems to be possible to work around it by varying the very start of your prompt (e.g., with an iteration number), but it’s messed up some of our workflows that rely on running the same prompt multiple times to gather a consensus output.

Would be great if they let us disable it.
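
In prompt() terms, the cache-busting prefix trick looks roughly like this (a sketch; the generate_series usage and the prompt text are just illustrative):

  -- Prefix each repetition with a run number so the first tokens differ and
  -- the cached prefix isn't silently reused across the N consensus samples.
  SELECT i AS run,
         prompt('run ' || i::varchar || ': summarize why the sky is blue in one sentence') AS answer
  FROM generate_series(1, 5) AS t(i);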

[1]: https://platform.openai.com/docs/guides/prompt-caching

reply
korkybuchek
10 hours ago
[-]
Not necessarily, especially when using commercial providers, who may change models, fine-tunes, privacy layers, and all kinds of other non-foundational-model things without notice.
reply