Why are your models so big? (2023)

15 points

by jxmorris12

3 days ago

| 3 comments

| pawa.lt

unleaded

2 minutes ago

Still relevant today. Many problems people throw at LLMs can be solved more efficiently with text completion than by begging a model 20x the size (and probably more than 20x the cost) to produce the right structured output. https://www.reddit.com/r/LocalLLaMA/comments/1859qry/is_anyo...

semiinfinitely

3 minutes ago

I honestly don't get why anyone still owns a "computer." Like, have you heard of an iPhone? It fits in your pocket and is literally a supercomputer that connects to the internet. Why would you waste space on a desk with a keyboard and mouse just to browse the web or send an email? It feels totally outdated to sit in a chair to do stuff you can just do on your phone while lying in bed.

siddboots

36 minutes ago

I think I have almost the opposite intuition. The fact that attention models are capable of making sophisticated logical constructions within a recursive grammar, even for a simple DSL like SQL, is kind of surprising. I think it’s likely that this property does depend on training on a very large and more general corpus, and hence demands the full parameter space that we need for conversational writing.