https://neurips.cc/virtual/2024/poster/94849
The underlying data has been open-sourced, as discussed on his blog here: https://timothynguyen.org/2024/11/07/open-sourced-my-work-on...
I think there will also be benefits to that both in interpretability and hardware acceleration. In time, maybe cheaper pretraining of useful models.
[0] https://transformer-circuits.pub/2021/framework/index.html
1: https://docs.vllm.ai/en/latest/features/spec_decode.html#spe...
This, as I found out from this repo [0] (linked in the Twitter thread in the documentation, which for some reason they didn't just link to directly), seems to be a regular Markov chain over the context, if it even builds a stochastic matrix. See the algorithm below.
Current prompt
"Article: (CNN)French striker Bafetimbi Gomis, who has a history of [...]
Summary: French stri"
Prompt lookup algorithm
1. Get last few tokens from prompt - "French stri"
2. Search for "French stri" in prompt
3. Match found - return next k tokens after match as candidate completion - "ker Bafetimbi Gomis, who has"
Candidate tokens
"ker Bafetimbi Gomis, who has"
[0] https://github.com/apoorvumang/prompt-lookup-decoding

> we find that for 79% and 68% of LLM next-token distributions on TinyStories and Wikipedia, respectively, their top-1 predictions agree with those provided by our N-gram rulesets
Two prediction methods may have completely different mechanisms but still agree some of the time, because they are both predicting the same thing.
It seems a fairly large proportion of language can be predicted by a simpler model. But it's the remaining percentage that's the difficult part, which simple `n-gram` models are bad at and transformers are really good at.
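To make the "top-1 agreement" idea concrete, here is a small sketch of how one could measure agreement between a simple n-gram predictor and any other next-token predictor. This is my own illustration, not the paper's code; the function names, whitespace-token representation, and the choice to score on the same corpus used to build the table are all assumptions.

```python
from collections import Counter, defaultdict

def build_ngram_table(tokens, n=3):
    """Count which token follows each (n-1)-token context in the corpus."""
    table = defaultdict(Counter)
    for i in range(len(tokens) - n + 1):
        context = tuple(tokens[i:i + n - 1])
        table[context][tokens[i + n - 1]] += 1
    return table

def ngram_top1(table, context):
    """Most frequent continuation of `context`, or None if the context is unseen."""
    counts = table.get(tuple(context))
    return counts.most_common(1)[0][0] if counts else None

def top1_agreement(tokens, other_top1, n=3):
    """Fraction of positions where the n-gram top-1 prediction matches
    `other_top1`, any callable mapping a context to its own top-1 token."""
    table = build_ngram_table(tokens, n)
    hits = total = 0
    for i in range(n - 1, len(tokens)):
        context = tokens[i - n + 1:i]
        pred = ngram_top1(table, context)
        if pred is None:
            continue
        total += 1
        hits += int(pred == other_top1(context))
    return hits / max(total, 1)
```

Plugging a transformer's argmax prediction in as `other_top1` would give an agreement rate in the spirit of the 79%/68% numbers quoted above.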
I just like to think of it as a high-dimensional view of the relationships between various words: the output is the result of continuing a path through that high-dimensional space, where each point's probability of selection changes with each token in the sequence.
Unfortunately, there's no thought or logic really going on there in the simplest cases, as far as I can understand it. Though for more complex models or different architectures, anything that fundamentally changes the way the model explores a path through that space could be implementing thought/logic, I suppose.
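A toy way to picture that "path" is a plain autoregressive sampling loop, where the next-token distribution is recomputed from the whole sequence so far. This is an illustrative sketch only; `logits_fn` is a stand-in for whatever model produces the next-token logits.

```python
import numpy as np

def sample_path(logits_fn, prompt_ids, steps=20, temperature=1.0, seed=None):
    """Autoregressive sampling: after every new token, the distribution over
    the vocabulary is recomputed from the whole sequence so far."""
    rng = np.random.default_rng(seed)
    ids = list(prompt_ids)
    for _ in range(steps):
        logits = np.asarray(logits_fn(ids), dtype=float) / temperature
        probs = np.exp(logits - logits.max())  # numerically stable softmax
        probs /= probs.sum()
        # Each appended token reshapes the probabilities, i.e. bends the "path".
        ids.append(int(rng.choice(len(probs), p=probs)))
    return ids
```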
It's why they need to outsource mathematics for the most part.
On topic: couldn't one, in theory, re-publish this kind of paper for different kinds of LLMs, since the textual corpus upon which LLMs are built is ultimately based, at some level, on human effort and human input, whether it be writing or typing?
I think one cause is hobbyists upvoting submissions that might be valuable to people in a specific field. We understand just enough to think it could be important but defer to subject matter experts on the rest. That's why I upvoted it.
The author submitted like 10 papers this May alone. Is that weird?
https://arxiv.org/search/cs?searchtype=author&query=Nguyen,+...
Wikipedia mentions that up to ~40% of the Vietnamese population (~40,000,000 people) carries the name Nguyen:
https://en.wikipedia.org/wiki/Nguyen
For the paper itself, as someone working in the field, I find it interesting enough to consider reading at some point (I have not read that many analysis papers recently, but this one looks better than most). As for your accusation that it claims large language models are simply n-gram models, read the abstract until you realise that your accusation is very much unfair to the work.
Chances are, you just assumed all the search results for 'Nguyen, T' refer to the same author.