I wonder if there is a closed-form solution for those kinds of initialization methods (call them pre-training if you wish). A solution that would allow attention heads to detect a variety of patterns, while being more structured than random init.
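To make the idea concrete, here is a minimal sketch of what a "more structured than random" init could look like: seeding each head's Q/K projections from a distinct sinusoidal basis (plus a small random perturbation) so heads start out tuned to different frequency bands. The choice of sinusoids and the per-head frequency schedule are my assumptions, not a known closed form:

```python
import numpy as np

def structured_qk_init(n_heads, d_model, d_head, seed=0):
    """Per-head Q/K init from sinusoidal bases, so each head starts out
    sensitive to a different frequency band instead of i.i.d. noise.
    (Illustrative assumption, not a known closed-form scheme.)"""
    rng = np.random.default_rng(seed)
    t = np.arange(d_model)
    heads = []
    for h in range(n_heads):
        freq = 2 ** h  # each head gets its own frequency band (assumption)
        basis = np.stack(
            [np.sin(freq * (k + 1) * t / d_model) for k in range(d_head)]
        )  # shape (d_head, d_model)
        # small noise so heads can still differentiate during training
        W_q = basis / np.sqrt(d_model) + 0.01 * rng.standard_normal((d_head, d_model))
        W_k = basis / np.sqrt(d_model) + 0.01 * rng.standard_normal((d_head, d_model))
        heads.append((W_q, W_k))
    return heads

heads = structured_qk_init(n_heads=4, d_model=64, d_head=8)
```

Whether something like this actually beats random init at scale is exactly the open question.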
I have a pet theory that the visual cortex is linked to some such mechanism as it develops. You just need proteins that create some sort of resonating signal that feeds into the neurons as they grow (obviously this is hand-wavy), but similar feedback loops guide nervous system growth in zebrafish, for example.
> "The core hypothesis: what makes language useful for pre-training is its structure, not its semantics."
As a layman, I've always held the intuition that semantics are the only meaningful thing. "Structure without semantics" = form without function, i.e. symmetric/regular noise, right?
My naive bet is on compressing semantics into media more expressive/information-dense than text, like how some languages have single words/symbols that represent entire sentence-long concepts.
This is a remarkable paper. This is the first time I've heard of someone training on the actual thing we're trying to get this stuff to do!
---
> This raises a radical question: Is natural language the only path to intelligence?
Of course not! We have octopuses, ravens, etc., which in many domains display higher intelligence than frontier AIs.
"Embodied reasoning" (genetic algorithm brute force solving physical tasks for a billion years, to name one solution) is definitely one very practical form of intelligence, although we're taking some shortcuts in replicating it.
I'm wondering if simplified analog tasks like Box2D puzzles would help too (or perhaps even simpler ones? Hanoi? Block worlds?). I know many companies are using simulations of 3D worlds for that.
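Part of the appeal of tasks at the Hanoi end of the spectrum is how cheaply you can generate supervised move sequences. A minimal sketch (the `(src, dst)` move encoding is my own choice):

```python
def hanoi(n, src="A", aux="B", dst="C", moves=None):
    """Classic recursive Towers of Hanoi solver.
    Returns the optimal move list as (from_peg, to_peg) pairs."""
    if moves is None:
        moves = []
    if n == 0:
        return moves
    hanoi(n - 1, src, dst, aux, moves)  # clear the way to the largest disk
    moves.append((src, dst))            # move the largest remaining disk
    hanoi(n - 1, aux, src, dst, moves)  # restack on top of it
    return moves

moves = hanoi(3)  # 2**3 - 1 = 7 optimal moves
```

With a generator like this you can emit arbitrarily many (state, optimal-move) pairs, which is exactly the sort of cheap structured data this kind of pre-training wants.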
What I don't understand is how that can integrate with the LLM (physical intelligence would seem to require specialized circuitry, if only for the latency). But maybe once we have good specialized models, LLMs can be trained on their synthetic data?
I'm working on a theoretical/computational framework, the Functional Universe, intended for modeling physical reality as functional state evolution. I would say it could be used to replicate your CA process. Won't link it here to signal my good faith discussing this issue - it's on my GH.
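For reference, the kind of CA process in question (a 1-D elementary cellular automaton) takes only a few lines to reproduce; a minimal sketch, with rule number and grid sizes chosen arbitrarily here:

```python
import numpy as np

def run_eca(rule, width=64, steps=32, seed=0):
    """Run a 1-D elementary cellular automaton with wraparound edges.
    `rule` is the Wolfram rule number (0-255); bit i of `rule` gives the
    next state for the 3-cell neighborhood whose bits encode integer i."""
    table = [(rule >> i) & 1 for i in range(8)]
    rng = np.random.default_rng(seed)
    row = rng.integers(0, 2, width)
    history = [row.copy()]
    for _ in range(steps):
        left, right = np.roll(row, 1), np.roll(row, -1)
        idx = (left << 2) | (row << 1) | right  # neighborhood as 0..7
        row = np.array([table[i] for i in idx])
        history.append(row.copy())
    return np.array(history)  # (steps + 1, width) grid of 0/1 states

grid = run_eca(rule=110)
```

Rule 110 is the usual example because it sits at the "edge of chaos" and is Turing-complete, which is presumably why it makes interesting pre-training data.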
Short answer: it’s close, but incomplete. It’s not that time organizes a log of reality; rather, reality is the accumulation of committed transitions. What you’re calling a ‘log’ is the ontological structure itself.
I gather you're basically saying: what we see as a transition ≠ what’s actually happening at the fundamental level. This is a legitimate and deep problem.
You’re right that observed transitions may not compose cleanly. In the Functional Universe, composition is a property of fundamental transitions. What we observe are often coarse-grained projections of many underlying transitions, which can obscure compositional structure.
But is that correct? I think organisms also come with a partial built-in understanding of nature at birth.
I agree. Most organisms are quite pre-trained: they have “instincts” and natural behaviors.
E.g. newly hatched turtles know to crawl towards the ocean immediately when they hatch. They don’t learn that on their way.
It seems to me that most lifeforms come into this world pre-trained.