Two thoughts about where this could go. First, the internal world state would need to be learned to transfer to real-life robotics, since you can't query the internals of a game engine in training. Second, an enormous challenge for many of these world models is going to be truly unbounded environmental interactivity: Agora is still mostly about a few agents interacting in a static environment. Learning interaction will be hard, because the interactions in games are deliberately added by hand. But we (human learners) acquire a strong model of environmental interaction very efficiently, which is part of what helps us generalise so effectively.
I'm not sure how to imagine their use in education or gaming, but it's clear that they have real potential for use in military programs.
It's nightmarish to think these could be trained on shooting-game footage and thrown into real-life scenarios in some form or another.