I don't get this analogy at all. Instead of a human, the information flows through a neural network, which alters it.
> Every lifelike detail in the final world is only there because my phone recorded it.
I might be wrong here, but I don't think this is true. A detail might also be there because the network inferred it from previous data.
Imo this just takes the human out of an artistic process (creating video game worlds), and I'm not sure that's worth achieving.
These days most photos are also stored using lossy compression, which alters the information.
You can think of this as a form of highly lossy compression of an image of this forest in time and space.
Most lossy compression is 'subtractive', in that detail is subtracted from the image in order to compress it, so the kinds of alterations are limited. However, there have been non-subtractive forms of compression (e.g., fractal compression) that were criticised for making up details, which is certainly something a neural network will do. That said, if the network is trained only on this forest data, rather than being trained on other data and then fine-tuned, then in some sense it represents only this forest, rather than giving an 'informed impression' the way a human artist would.
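As a toy illustration of what 'subtractive' means here (a sketch assuming numpy/scipy; real JPEG adds quantization tables, chroma subsampling, and entropy coding on top of this):

    # Subtractive lossy compression in miniature: zero out high-frequency
    # DCT coefficients of an 8x8 block, roughly what JPEG-style codecs do.
    # Detail is discarded, never invented.
    import numpy as np
    from scipy.fft import dctn, idctn

    block = np.random.rand(8, 8)        # stand-in for an 8x8 image block
    coeffs = dctn(block, norm="ortho")  # frequency-domain representation

    mask = np.zeros_like(coeffs)
    mask[:4, :4] = 1                    # keep only the low frequencies

    reconstructed = idctn(coeffs * mask, norm="ortho")
    # `reconstructed` is a blurred version of `block`: the artifacts are
    # smoothing and ringing, not made-up detail like a generative model's.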
I noticed this in some photos I see online starting maybe 5-10 years ago.
I'd click through to a high-res version of the photo and, instead of sensor noise or JPEG artefacts, I'd see these bizarre snakelike formations, as though the image had been put through style transfer.
There is no previous data. This network is exclusively trained on the data he collected from the scene.
More interesting is that you made an easy-to-use environment authoring tool that (though I haven't tried it yet) seems really slick.
Both of those are impressive alone but together that’s very exciting.
edit: I see now that you mention a price point of 100 GPU-hours, roughly $100. My mistake.
But there's one thing that I'm a little bit worried about: I was getting a stable 8 FPS or so on my three-year-old flagship phone. My concern is that these models aren't optimized to run on this type of hardware, which may or may not make hardware obsolete quicker than planned. And it's not that these phones aren't powerful; they really are.
I wonder if there are any computer vision projects that take a similar world emulation approach?
Imagine you also collected depth data.
https://en.wikipedia.org/wiki/Convolutional_neural_network#H...
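One speculative sketch of how that might be fed in (assuming PyTorch; this is my guess, not the post's architecture): stack depth as a fourth input channel alongside RGB.

    # Speculative sketch: depth recorded alongside RGB, stacked as a
    # 4th input channel for a convolutional layer. Not from the post.
    import torch
    import torch.nn as nn

    rgb = torch.rand(1, 3, 64, 64)    # camera frame (B, C, H, W)
    depth = torch.rand(1, 1, 64, 64)  # depth map, e.g. from a LiDAR phone

    x = torch.cat([rgb, depth], dim=1)                 # (1, 4, 64, 64)
    conv = nn.Conv2d(4, 16, kernel_size=3, padding=1)  # accepts 4 channels
    features = conv(x)                                 # (1, 16, 64, 64)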
What could go wrong?
Jokes aside, this is insanely cool!
I didn't see it in an obvious place on your GitHub. Do you have any plans to open-source the training code?
Link to the demo in case people miss it [1]
> using a customized camera app which also recorded my phone’s motion
Using the phone's gyro as a proxy for "controls" is very clever (rough sketch of the idea below).
[1] https://madebyoll.in/posts/world_emulation_via_dnn/demo/
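The way I picture it (names and shapes are my own assumptions, not the author's pipeline): per-frame orientation deltas become the "action" the model is conditioned on, much like mouse-look deltas in a game.

    # Hypothetical sketch: turn recorded phone orientation into per-frame
    # control inputs. Not the author's actual code.
    import numpy as np

    def orientation_to_controls(yaw, pitch):
        """yaw, pitch: arrays of per-frame orientation angles (radians)."""
        d_yaw = np.diff(yaw, prepend=yaw[0])        # turn since last frame
        d_pitch = np.diff(pitch, prepend=pitch[0])  # tilt since last frame
        # Each frame's "control" is just the camera motion since the
        # previous frame, analogous to mouse-look input in a game engine.
        return np.stack([d_yaw, d_pitch], axis=-1)  # shape (T, 2)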
Imagine a similar technique but with productivity software.
And a pre-trained network that adapts quickly.
Is OP the blog's author? In the post, the author said that the purpose of the project is to show why NNs are truly special, and I wanted a more articulate view of why they think that. Good work anyway!
The special aspect of NNs (in the context of simulating worlds) is that they can mimic entire worlds from videos alone, without access to the source code (in the case of Pokémon), or even without the source code ever having existed (as is the case for the real-world forest trail mimicked in this post). They mimic the entire interactive behavior of the world, not just the geometry (note e.g. the not-programmed-in autoexposure that appears when you look at the sky). There's a toy sketch of the basic recipe after the next paragraph.
Although the neural world in the post is a toy project, and quite far from generating photorealistic frames with "trees that bend in the wind, lilypads that bob in the rain, birds that sing to each other", I think getting better results is mostly a matter of scale. See e.g. the GAIA-2 results (https://wayve.ai/wp-content/uploads/2025/03/generalisation_0..., https://wayve.ai/wp-content/uploads/2025/03/unsafe_ego_01_le...) for an example of what world models can do without the realtime-rendering-in-a-browser constraints :)
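For anyone wondering what "mimicking from video alone" boils down to, here's a toy version of the recipe (assuming PyTorch; all names and shapes are illustrative, and the real model, conditioning, and loss are more elaborate): predict the next recorded frame from the current frame plus the recorded controls.

    # Toy next-frame-prediction world model:
    # next_frame ~ f(current_frame, controls), learned from recordings.
    import torch
    import torch.nn as nn

    class TinyWorldModel(nn.Module):
        def __init__(self):
            super().__init__()
            # 3 RGB channels + 2 control channels broadcast over the image
            self.net = nn.Sequential(
                nn.Conv2d(5, 32, 3, padding=1), nn.ReLU(),
                nn.Conv2d(32, 3, 3, padding=1),
            )

        def forward(self, frame, controls):
            b, _, h, w = frame.shape
            ctrl = controls.view(b, 2, 1, 1).expand(b, 2, h, w)
            return self.net(torch.cat([frame, ctrl], dim=1))

    model = TinyWorldModel()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)

    frame = torch.rand(4, 3, 64, 64)   # recorded frame t
    ctrl = torch.rand(4, 2)            # recorded gyro controls at t
    target = torch.rand(4, 3, 64, 64)  # recorded frame t+1

    loss = nn.functional.mse_loss(model(frame, ctrl), target)
    opt.zero_grad()
    loss.backward()
    opt.step()

At inference time you'd feed the model's own output back in as the current frame, which is exactly where the drift/hallucination concerns upthread come from.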
https://netron.app/?url=https://madebyoll.in/posts/world_emu...