Despite using only unitary operations and no attention mechanism, a 1024×32 model achieves coherent TinyStories generation after less than 1.8 hours of training on a single consumer GPU.
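To make the "unitary operations" claim concrete: a unitary transform satisfies U Uᴴ = I, so it preserves vector norms exactly, which is what makes it attractive for low-dissipation (e.g. optical) hardware. This is a hypothetical minimal sketch, not the author's actual layer; it simply builds a random unitary matrix via QR decomposition and checks the norm-preserving property:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 32  # toy width; chosen to match the 1024x32 model's narrow dimension

# QR decomposition of a random complex matrix yields a unitary Q.
M = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
U, _ = np.linalg.qr(M)

# Unitarity: U @ U^H = I, so applying U preserves vector norms exactly.
x = rng.standard_normal(d) + 1j * rng.standard_normal(d)
print(np.allclose(U @ U.conj().T, np.eye(d)))                 # True
print(np.isclose(np.linalg.norm(U @ x), np.linalg.norm(x)))   # True
```

A trainable version would typically parameterize U (e.g. as the exponential of a skew-Hermitian matrix, or a product of Givens rotations) rather than fixing it, but the norm-preservation property shown here is the same.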
This is Part 1 - the next step is physical implementation with $50 of optics from AliExpress.
I apologize for not being clearer.
The goal isn't actually "zero power"; the goal is "so little power that heat dissipation in orbit is easy".
If it does work, I think one of the biggest challenges will be adding enough complexity to it for it to do real, useful computation. Running the equivalent of GPT-2 is a cool tech demo, but if there's not an obvious path to scaling it up, it's a bit of a dead end.
I expect to have an answer this week...