Show HN: I built a tiny LLM to demystify how language models work
88 points | 2 hours ago | 4 comments | github.com
Built a ~9M param LLM from scratch to understand how they actually work. Vanilla transformer, 60K synthetic conversations, ~130 lines of PyTorch. Trains in 5 min on a free Colab T4. The fish thinks the meaning of life is food.

Fork it and swap the personality for your own character.
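For readers curious what a "vanilla transformer in ~130 lines of PyTorch" looks like, here is a minimal sketch of a tiny decoder-only language model. This is not the author's code; class name, hyperparameters, and structure are all illustrative assumptions, sized loosely toward the single-digit-millions parameter range described above.

```python
# Hypothetical sketch of a tiny decoder-only transformer LM in PyTorch.
# Names and hyperparameters are illustrative, not taken from the repo.
import torch
import torch.nn as nn


class TinyLM(nn.Module):
    def __init__(self, vocab_size=256, d_model=64, n_heads=4,
                 n_layers=2, max_len=128):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, d_model)   # token embeddings
        self.pos = nn.Embedding(max_len, d_model)      # learned positions
        layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model,
            batch_first=True, norm_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, vocab_size)     # next-token logits

    def forward(self, idx):
        # idx: (batch, seq_len) integer token ids
        B, T = idx.shape
        x = self.tok(idx) + self.pos(torch.arange(T, device=idx.device))
        # Causal mask so each position attends only to earlier positions.
        mask = nn.Transformer.generate_square_subsequent_mask(T)
        x = self.blocks(x, mask=mask)
        return self.head(x)  # (batch, seq_len, vocab_size)


if __name__ == "__main__":
    model = TinyLM()
    logits = model(torch.randint(0, 256, (2, 16)))
    print(logits.shape)  # (2, 16, 256)
```

Training would then just be cross-entropy between these logits and the input shifted by one token; swapping the "personality" amounts to changing the synthetic conversations it is trained on.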

ordinarily
11 minutes ago
[-]
It's genuinely a great introduction to LLMs. I built my own a while ago based on Milton's Paradise Lost: https://www.wvrk.org/works/milton
reply
nullbyte808
58 minutes ago
[-]
Adorable! Maybe a personality that speaks in emojis?
reply
SilentM68
46 minutes ago
[-]
Would have been funny if it were called "DORY," given the fish's memory-recall issues vs. LLMs' similar recall issues :)
reply
AndrewKemendo
1 hour ago
[-]
I love these kinds of educational implementations.

I want to really praise the (unintentional?) nod to Nagel: by limiting the model's capabilities to the representation of a fish, the user immediately understands the constraints. It can only talk like a fish because it's very simple.

Especially compared to public models, that's a really simple correspondence to grok intuitively (small LLM → only as verbose as a fish, larger LLM → more verbose), so kudos to the author for making that simple and fun.

reply
dvt
43 minutes ago
[-]
> the user is immediately able to understand the constraints

Nagel's point was quite literally the opposite[1] of this, though. We can't understand what it must "be like to be a bat" because a bat's mental model is so fundamentally different from ours. So using all the human language tokens in the world can't get us to truly understand what it's like to be a bat, or a guppy, or whatever. In fact, Nagel's point is arguably even stronger: there's no possible mental mapping between the experience of a bat and the experience of a human.

[1] https://www.sas.upenn.edu/~cavitch/pdf-library/Nagel_Bat.pdf

reply
AndrewKemendo
31 minutes ago
[-]
Different argument

I’m not going to argue, other than to say that you need to view the point from a third-party perspective evaluating "fish" vs. "more verbose thing," such that the composition determines the complexity of the interaction (which has unique qualia, per Nagel).

Hence it’s an "unintentional nod," not an instantiation.

reply