The Geometry of Categorical and Hierarchical Concepts in Large Language Models
120 points | 4 days ago | 8 comments | arxiv.org
cs702
4 days ago
Very nice. Well-written. Feels "natural."

Besides helping with interpretability, my immediate thought is that maybe we could pretrain models faster by adding regularization terms to the objective function that push representations of distinct categories into mutually orthogonal subspaces, and representations of subcategories into orthogonal subspaces that can form polytopes. The data necessary for doing so is readily available: WordNet synsets. Induce representations of synsets to be orthogonal to each other and representations of hierarchically related synsets to be arranged in polytopes. There's already some evidence that we can leverage WordNet synsets to pretrain some models faster. Take a look at https://news.ycombinator.com/item?id=40160728 for example.
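
To make the regularization idea above concrete, here is a minimal sketch of one possible penalty term. It is an illustration, not anything taken from the paper or the linked thread: the function name `orthogonality_penalty` is hypothetical, the category labels stand in for WordNet synsets, and it uses the plain Euclidean inner product, which may differ from the notion of orthogonality the paper makes precise.

```python
import numpy as np

def orthogonality_penalty(embeddings, labels, eps=1e-12):
    """Hypothetical regularizer: sum of squared cosine similarities
    between the mean vectors of distinct categories.

    Returns 0.0 when all category means are mutually orthogonal.
    """
    embeddings = np.asarray(embeddings, dtype=float)
    labels = np.asarray(labels)
    categories = np.unique(labels)

    # One mean vector per category (assumption: means summarize a category).
    means = np.stack([embeddings[labels == c].mean(axis=0) for c in categories])

    # Normalize to unit length so the Gram matrix holds cosine similarities.
    norms = np.maximum(np.linalg.norm(means, axis=1, keepdims=True), eps)
    means = means / norms

    gram = means @ means.T
    off_diag = gram - np.diag(np.diag(gram))

    # Each unordered pair appears twice in the symmetric matrix.
    return float(np.sum(off_diag ** 2) / 2.0)

# Two categories whose means lie along different axes incur no penalty.
toy_embeddings = [[1.0, 0.0], [2.0, 0.0], [0.0, 1.0], [0.0, 3.0]]
toy_labels = [0, 0, 1, 1]
penalty = orthogonality_penalty(toy_embeddings, toy_labels)
```

In a real training loop a term like this would be scaled by a small coefficient and added to the usual loss; whether it actually speeds up pretraining is the open question raised above.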

Thank you for sharing this on HN.

esafak
4 days ago
> We find a remarkably simple structure: simple categorical concepts are represented as simplices, hierarchically related concepts are orthogonal in a sense we make precise, and (in consequence) complex concepts are represented as polytopes constructed from direct sums of simplices, reflecting the hierarchical structure.

It's satisfying that the structure is precisely the one you would hope for.
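
A toy check of the structure described in the quote, as a sketch under stated assumptions: the vectors are made up by hand (not extracted from any model), the helper names are hypothetical, "orthogonal" is read as "the parent vector is orthogonal to each child-minus-parent offset", and the inner product is plain Euclidean rather than whichever one the paper uses.

```python
import numpy as np

def is_simplex(vertices, tol=1e-9):
    """True if the k points are affinely independent, i.e. they span a (k-1)-simplex."""
    v = np.asarray(vertices, dtype=float)
    diffs = v[1:] - v[0]  # edge vectors from the first vertex
    return bool(np.linalg.matrix_rank(diffs, tol=tol) == len(v) - 1)

def offsets_orthogonal_to_parent(parent, children, tol=1e-9):
    """True if every (child - parent) offset is orthogonal to the parent vector."""
    parent = np.asarray(parent, dtype=float)
    children = np.asarray(children, dtype=float)
    dots = (children - parent) @ parent
    return bool(np.all(np.abs(dots) < tol))

# Hand-built toy vectors: a parent concept and three child concepts.
parent = np.array([1.0, 0.0, 0.0, 0.0])        # e.g. "animal"
children = np.array([
    [1.0, 1.0, 0.0, 0.0],                      # e.g. "mammal"
    [1.0, 0.0, 1.0, 0.0],                      # e.g. "bird"
    [1.0, 0.0, 0.0, 1.0],                      # e.g. "fish"
])

children_form_simplex = is_simplex(children)
hierarchy_is_orthogonal = offsets_orthogonal_to_parent(parent, children)
```

Here the three children span a triangle (a 2-simplex) that sits in directions orthogonal to the parent vector, which is the kind of picture the abstract describes.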

empath75
4 days ago
Yeah, you could explain this paper to Aristotle and he would not be that surprised by it.
empath75
4 days ago
Beautiful paper, relatively well written and accessible, too.

I think everyone _knew_ in some sense that categorical information encoded in vectors must be hierarchical and have this general kind of structure, but they managed to formalize that intuition into just a few theorems that seem sort of inevitable only in retrospect.

Animats
4 days ago
Wow. This seems really important, because LLMs have been such black boxes.

Is this result useful only for basic concepts backed by huge numbers of cases in the training data, or is it more general than that?

Comments?

zmgsabst
4 days ago
This is generally true of type theories:

A type theory corresponds to a complex diagram, as outlined in topos theory. (Note: complex as in CW-complexes.)

I think it's fascinating that LLMs ended up with a similar structure, though perhaps not entirely surprising. There have been similar results, e.g., a topological covering can generate an ML model.

mjhay
4 days ago
There's been a decent amount of work using simplicial complexes and related ideas to generalize graph neural networks, e.g. [0], [1]. If LLMs obey a similar geometry, it could be a promising direction for multimodal models and more principled RAGs with better inductive biases.

[0] https://arxiv.org/pdf/2010.03633

[1] https://arxiv.org/pdf/2012.06333

mdp2021
4 days ago
zyklu5
3 days ago
Well, if concepts turn out to be simplicial (or cellular) complexes maybe philosophy can be made into applied algebraic topology.
mjhay
3 days ago
You may be interested in some of the work of the great Bill Lawvere, especially around formalization of Hegelian dialectics:

https://ncatlab.org/nlab/show/William+Lawvere#RelationToPhil...

zyklu5
2 days ago
Thank you. It's a bit surprising to see Hegel here; I was thinking more along the lines of analytic philosophy. Of course, many early 20th-century mathematicians who were interested in philosophy, such as Weyl or Gian-Carlo Rota, would not have thought much of such a distinction.
mjhay
2 days ago
Yeah, it didn't use to be that big of a divide. Nowadays it seems like analytic philosophers are doing endless retreads, and continental ones are also doing endless retreads, but with more confusing sentence structure.

From a 10,000-foot view, I think nailing down a more "objective" understanding of dialectics (idealist, material, whatever) is a promising direction to ameliorate this meta-problem. People arguing in journals is pretty much a dialectic problem, so understanding it could go a long way toward understanding issues beyond it.

100ideas
4 days ago
Reminds me of Anthropic's recent work on identifying the neuron sets that correlate with various semantic concepts in Claude: https://news.ycombinator.com/item?id=40429540 "Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet"
szvsw
4 days ago
OpenAI also just published similar work, though Anthropic did beat them to the punch.

https://openai.com/index/extracting-concepts-from-gpt-4/

https://news.ycombinator.com/item?id=40599749

cabidaher
4 days ago
In the same vein, Refusal in LLMs is mediated by a single direction: https://www.lesswrong.com/posts/jGuXSZgv6qfdhMCuJ/refusal-in...
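
For anyone wondering what "mediated by a single direction" could look like mechanically, here is a minimal sketch of the general idea: estimate a direction as a difference of mean activations between two sets of inputs, then project it out. The function names and the toy activations are hypothetical and not taken from the linked post; real experiments operate on a model's internal activations rather than hand-written arrays.

```python
import numpy as np

def difference_of_means_direction(acts_a, acts_b):
    """Unit vector pointing from the mean of acts_b to the mean of acts_a."""
    d = np.mean(np.asarray(acts_a, dtype=float), axis=0) \
        - np.mean(np.asarray(acts_b, dtype=float), axis=0)
    return d / np.linalg.norm(d)

def ablate_direction(acts, direction):
    """Remove each activation's component along `direction` (assumed unit length)."""
    acts = np.asarray(acts, dtype=float)
    return acts - np.outer(acts @ direction, direction)

# Toy activations: set A differs from set B only along the first axis.
acts_a = np.array([[2.0, 0.0], [4.0, 0.0]])
acts_b = np.array([[0.0, 0.0], [0.0, 0.0]])

direction = difference_of_means_direction(acts_a, acts_b)
ablated = ablate_direction(np.array([[3.0, 5.0]]), direction)
```

After ablation the activation has no component left along the estimated direction, while everything orthogonal to it is untouched.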