Project Genie: Experimenting with infinite, interactive worlds
125 points
2 hours ago
| 19 comments
| blog.google
in-silico
13 minutes ago
[-]
Everyone here seems too caught up in the idea that Genie is the product, and that its purpose is to be a video game, movie, or VR environment.

That is not the goal.

The purpose of world models like Genie is to be the "imagination" of next-generation AI and robotics systems: a way for them to simulate the outcomes of potential actions in order to inform decisions.
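Concretely, the loop is something like model-predictive control: imagine many candidate action sequences inside the world model, score them, and act on the best one. A minimal sketch, assuming a hypothetical world_model with step() and reward() methods (not any real Genie API):

    import numpy as np

    def plan(world_model, state, horizon=10, n_candidates=256, n_actions=4):
        # Random-shooting planner: score imagined rollouts, keep the best.
        best_score, best_first_action = -np.inf, None
        for _ in range(n_candidates):
            actions = np.random.randint(n_actions, size=horizon)
            s, score = state, 0.0
            for a in actions:
                s = world_model.step(s, a)      # imagined transition, no real-world rollout
                score += world_model.reward(s)  # task-specific objective
            if score > best_score:
                best_score, best_first_action = score, actions[0]
        return best_first_action  # execute for real, observe, then replan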

reply
avaer
4 minutes ago
[-]
Soft disagree; if you wanted imagination, you wouldn't need to make a video model. You probably wouldn't need to decode the latents at all. That seems pretty far from information-theoretic optimality, the kind you want in a good+fast AI model.

The whole reason LLMs inference human-processable text, and "world models" inference human-interactive video, is precisely so that humans can connect in and debug the thing.

I think the purpose of Genie is to be a video game, but it's a video game for AI researchers developing AIs.

I do agree that the entertainment implications are kind of the research exhaust of the end goal.

reply
ollin
46 minutes ago
[-]
Really great to see this released! Some interesting videos from early-access users:

- https://youtu.be/15KtGNgpVnE?si=rgQ0PSRniRGcvN31&t=197 walking through various cities

- https://x.com/fofrAI/status/2016936855607136506 helicopter / flight sim

- https://x.com/venturetwins/status/2016919922727850333 space station, https://x.com/venturetwins/status/2016920340602278368 Dunkin' Donuts

- https://youtu.be/lALGud1Ynhc?si=10ERYyMFHiwL8rQ7&t=207 simulating a laptop computer, moving the mouse

- https://x.com/emollick/status/2016919989865840906 otter airline pilot with a duck on its head walking through a Rothko-inspired airport

reply
WarmWash
33 minutes ago
[-]
The actual breakthrough with Genie is being able to turn around, look back, and see the same scene that was there before. A few other labs have similar world simulators, but they all struggle badly with keeping things that are out of view coherent. That's why they always walk forward and never look around.
reply
nozbufferHere
3 minutes ago
[-]
Still amazed it took ML people so long to realize they needed an explicit representation to cache stuff.
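A minimal sketch of the idea, with generate_chunk as a hypothetical stand-in for the generative model:

    # Key generated scene content by quantized camera pose, so looking
    # back re-reads the cache instead of re-hallucinating the scene.
    scene_cache = {}

    def view(pose, generate_chunk, cell=2.0):
        key = tuple(round(c / cell) for c in pose)   # discretize position
        if key not in scene_cache:
            scene_cache[key] = generate_chunk(pose)  # generate once, reuse after
        return scene_cache[key]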
reply
sfn42
20 minutes ago
[-]
And what if I go somewhere then go back there a week later?
reply
jsheard
17 minutes ago
[-]
Best they can do is 60 seconds, for now at least.
reply
phailhaus
48 minutes ago
[-]
I have no idea why Google is wasting their time with this. Trying to hallucinate an entire world is a dead-end. There will never be enough predictability in the output for it to be cohesive in any meaningful way, by design. Why are they not training models to help write games instead? You wouldn't have to worry about permanence and consistency at all, since they would be enforced by the code, like all games today.

Look at how much prompting it takes to vibe code a prototype. And they want us to think we'll be able to prompt a whole world?

reply
asim
16 minutes ago
[-]
Take the positive spin. What if you could put in all the inputs and it could simulate real-world scenarios you can walk through to benefit mankind, e.g. disaster scenarios, events, plane crashes, traffic patterns? There are a lot of useful applications for it.

I don't like the framing at this time, but I also get where it's going. The engineer in me is drawn to it, but the Muslim in me is very scared to hear anyone talk about creating worlds... But again, I have to separate my view from the reality that this could have very positive real-world benefits when you can simulate scenarios. So I could put in a 2-page or 10-page scenario that gets played out or simulated and walk through it. Not just predictive stuff, but also things that have already happened, so I can map crime scenes or anything like that.

In the end, this performance art exists because they are a product company being benchmarked by Wall Street, and they'll need customers for the technology. But at the same time, they probably already have uses for it internally.
reply
jsheard
12 minutes ago
[-]
> What if you could put in all the inputs and it can simulate real world scenarios you can walk through to benefit mankind e.g disaster scenarios, events, plane crashes, traffic patterns.

This is only a useful premise if it can do any of those things accurately, as opposed to vibing up something kinda plausible based on an amalgam of vaguely related YouTube videos.

reply
MillionOClock
7 minutes ago
[-]
A hybrid approach could maybe work: have a more or less standard game engine for coherence, and use this kind of generative AI as a short-term rendering and physics sim engine.
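Roughly this loop (a sketch only; engine and renderer are hypothetical interfaces, not anything that has been announced):

    def game_loop(engine, renderer, get_input, display, dt=1 / 30):
        # The engine owns the authoritative state, so permanence is
        # enforced by code; the generative model only paints the frame.
        state = engine.initial_state()
        while True:
            action = get_input()
            state = engine.step(state, action, dt)  # deterministic simulation
            frame = renderer.render(state)          # model conditioned on true state
            display(frame)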
reply
seedie
11 minutes ago
[-]
Imo they explain pretty well what they are trying to achieve with SIMA and Genie in the Google DeepMind Podcast[1]. They see it as the way to get to AGI: letting AI agents learn for themselves in simulated worlds, kind of like how they let AlphaGo train on an enormous number of simulated games of Go. A rough sketch of that loop is below the link.

[1] https://youtu.be/n5x6yXDj0uo
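The loop they describe is roughly Dyna/Dreamer-style: a little real experience to keep the world model honest, and lots of cheap imagined experience to train the agent. A sketch with hypothetical interfaces:

    def train(agent, world_model, real_env, iters=1000, imagined_per_real=16):
        for _ in range(iters):
            traj = real_env.rollout(agent)          # scarce, expensive real experience
            world_model.fit(traj)                   # keep the simulator grounded
            for _ in range(imagined_per_real):
                dream = world_model.rollout(agent)  # cheap imagined experience
                agent.update(dream)                 # policy improvement happens here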

reply
krunck
20 minutes ago
[-]
The more of this I see, the more I want to spend time away from screens, doing the things I love in the real world.
reply
MillionOClock
11 minutes ago
[-]
I love AI but I also hope it will paradoxically make people realize the value of real life experiences and human relationships.
reply
sy26
1 hour ago
[-]
I have been confused for a long time about why FB is not motivated enough to invest in world models; it IS the key to unlocking their "metaverse" vision. And instead they let Yann LeCun go.
reply
observationist
51 minutes ago
[-]
LeCun wasn't producing results. He was obstinate and insistent on his own theories and ideas, which weren't, and possibly aren't, going anywhere. He refused to engage with LLMs and compete in the market that exists, and spent all his effort and energy on unproven ideas and research, which split the company's mission and competitiveness. They lost their place as one of the top 4 AI companies and are now a full generation behind, in part due to the split efforts and lack of enthusiastic participation across the Meta AI team. If you look at the chaos and churn at the highest levels across the industry, there's not a lot of room for mission creep by leadership, and LeCun thoroughly demonstrated he wasn't suited for the mission Meta wanted.

I think he's lucky he got out with his reputation relatively intact.

reply
qwertyi0k
4 minutes ago
[-]
To be fair, this was his job description: Fundamental AI Research (FAIR) lab, not the AI products division. You can't expect marketable products from a fundamental AI research lab.
reply
ezst
6 minutes ago
[-]
Since one hot take is as good as the next: LLMs are, by the day, more and more clearly understood as a "local maximum": flawed capabilities, limited efficiency, a trillion dollars plus a large chunk of the USA's GDP wasted, with nobody turning a profit from it, nor anyone able to build something that can't be reproduced for free within 6 months.

When the right move (strategically, economically) is to not compete, the head of the AI division acknowledging the above and deciding to focus on the next breakthrough seems absolutely reasonable.

reply
halfmatthalfcat
42 minutes ago
[-]
Were you there or just an attentive outsider?
reply
observationist
36 minutes ago
[-]
Attentive outsider and acquaintance of a couple of people who are or were employed there. Nothing I'm saying is particularly inside baseball, though; it's pretty well covered by all the blogs and podcasts.
reply
richard___
22 minutes ago
[-]
What podcast?
reply
qwertyi0k
13 minutes ago
[-]
Most serious researchers want to work on interesting problems like reinforcement learning, robotics, RNNs, or a dozen other avant-garde subjects. None want to work on "boring" LLM technology, which requires significant engineering effort and huge amounts of dataset wrangling.
reply
observationist
3 minutes ago
[-]
This is true - Ilya got an exit and is engaged in serious research, but research is by its nature unpredictable. Meta wanted a product and to compete in the AI market, and JEPA was incompatible with that. Now LeCun has a lab and resources to pursue his research, and Meta has refocused efforts on LLMs and the marketplace - it remains to be seen if they'll be able to regain their position. I hope they do - open models and relatively open research are important, and the more serious AI labs that do this, the more it incentivizes others to do the same, and keeps the ones that have committed to it honest.
reply
general_reveal
13 minutes ago
[-]
You are beyond correct. World models are what saves their Reality Labs investment. I would say if Reality Labs cannot productize world models, then that entire project needs to be scrapped.
reply
qwertox
32 minutes ago
[-]
Isn't it more like this: JEPA looks at the video ("a dog walks out of the door, the mailman comes, the dog is happy"), and the next frame would need to look like "the mailman moves to the mailbox, the dog runs happily towards him," which an image/video generator would then need to render.

Genie looks at the video: "when this group of pixels looks like this and the user presses 'jump', I will render the group differently in this way in the next frame."

Genie is an artist drawing a flipbook. To tell you what happens next, it must draw the page. If it doesn't draw it, the story doesn't exist.

JEPA is a novelist writing a summary. To tell you what happens next, it just writes "The car crashes." It doesn't need to describe what the twisted metal looks like to know the crash happened.
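In loss terms, the contrast is roughly this (a toy sketch with hypothetical modules, PyTorch-style):

    import torch.nn.functional as F

    def pixel_loss(model, frame, action, next_frame):
        pred = model.predict_pixels(frame, action)  # full next frame: the flipbook page
        return F.mse_loss(pred, next_frame)

    def jepa_loss(encoder, predictor, frame, action, next_frame):
        z_next = encoder(next_frame).detach()       # compact latent target, no pixels
        pred = predictor(encoder(frame), action)    # "the car crashes", as a vector
        return F.mse_loss(pred, z_next)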

reply
phailhaus
52 minutes ago
[-]
Most people don't like putting on VR headsets, no matter what the content is. It just never broke out of the tech enthusiast niche.
reply
0xcb0
1 hour ago
[-]
I keep repeating myself, but it feels like I'm living in the future. Can't wait to hook this up to my old Oculus and let Genie create a fully realistic sailing simulator for me, where I can practice sailing in realistic conditions, on boats I'd love to sail.

If making games out of these simulations works, it'd be the end for a lot of big studios, and might be a renaissance for small-to-one-person game studios.

reply
jsheard
1 hour ago
[-]
Isn't this still essentially "vibe simulation" inferred from videos? Surface-level visual realism is one thing, but expecting it to figure out the exact physical mechanics of sailing just by looking at boats, and usefully abstract them into a "gamified" form, is another thing entirely.
reply
falcor84
2 minutes ago
[-]
Why wouldn't it just hook it into something like physx?
reply
nsilvestri
1 hour ago
[-]
The bottleneck for games of any size is always whether they are good. There are plenty of small indies which do not put out good games. I don't see world models improving game design or fun factors.

If I am wrong, then the huge supply of fun games will completely saturate demand, and it will be no easier for indie game devs to stand out.

reply
bdbdbdb
58 minutes ago
[-]
It's very impressive tech but subject to the same limitations as other generative AI: Inconsistency, inaccurate physics, limited time, lag, massively expensive computation.

You COULD create a sailing sim but after ten minutes you might be walking on water, or in the bath, and it would use more power than a small ferry.

There's no way this tech can run on a PS5 or anything close to it.

reply
WarmWash
29 minutes ago
[-]
Five years is nothing to wait for tech like this. I'm sure we will see the first crop, however small, of "terminally plugged in" humans on the back of this in the relatively near future.
reply
ziofill
52 minutes ago
[-]
You raise good points, but I think the “it’s not good enough” stance won’t last for long.
reply
Avicebron
28 minutes ago
[-]
Honestly, getting a Sunfish is probably cheaper than a VR headset if you want to "train sailing".
reply
neom
1 hour ago
[-]
...and then, the pneumatics in your living room.
reply
montebicyclelo
1 hour ago
[-]
Reminds me of this [1] HN post from 9 months ago, where the author trained a neural network to do world emulation from video recordings of their local park — you can walk around in their interactive demo [2].

I don't have access to the DeepMind demo, but from the video it looks like it takes the idea up a notch.

(I don't know the exact lineage of these ideas, but a general observation is that it's a shame that it's the norm for blog posts / indie demos to not get cited.)

[1] https://news.ycombinator.com/item?id=43798757

[2] https://madebyoll.in/posts/world_emulation_via_dnn/demo/

reply
ofrzeta
56 minutes ago
[-]
I don't know ... it's impressive and all but the result always looks kind of dead.
reply
saberience
29 minutes ago
[-]
This sort of comment reminds me about the comments by programmers two years ago.

"Sure it can write a single function but the code is terrible when it tries to write a whole class..."

reply
api
41 minutes ago
[-]
It's super cool, but I see it as a much more flexible, open-ended take on the idea of procedurally generated worlds, where hard-coded deterministic math and rendering parameters are replaced by promptable models.

The deadness you're talking about is there in procedural worlds too, and it stems from the fact that there's not actually much "there." Think of it as a kind of illusion or a magic trick with math. It replicates some of the macro structure of the world but the true information content is low.

Search YouTube for procedural landscape examples. Some of them are actually a lot more visually impressive than this, but without the interactivity. It's a popular topic in the demo scene too where people have made tiny demos (e.g. under 1k in size) that generate impressive scenes.

I expect to see generative AI techniques like this show up in games, though it might take a bit due to their high computational cost compared to traditional procedural generation.

reply
nickandbro
1 hour ago
[-]
This could be the future of film. Instead of prompting where you don't know what the model will produce, you could use fine-grained motion controls to get the shot you are looking for. If you want to adjust the shot after, you could just checkpoint the model there, by taking a screenshot, and rerun. Crazy.
reply
JKCalhoun
1 hour ago
[-]
I feel like people are already doing this. Essentially storyboarding first.

This guy a month ago for example: https://youtu.be/SGJC4Hnz3m0

reply
cloudflare728
20 minutes ago
[-]
We will probably see Ready Player One in a few decades. Hoping to stay alive till then.
reply
lexandstuff
10 minutes ago
[-]
The mass-poverty and climate-change-ravaged world parts, I could definitely see.
reply
HardCodedBias
17 minutes ago
[-]
Decades?

I mean, yes, the probability of having that level of tech in decades is quite high.

But the technology is moving very fast right now. It sounds crazy, but I think there is a 50% chance of having Ready Player One-level technology much sooner than that.

It's absolutely possible it will take more time to become economical.

reply
meetpateltech
1 hour ago
[-]
Google Deepmind Page: https://deepmind.google/models/genie/

Try it in Google Labs: https://labs.google/projectgenie

(Project Genie is available to Google AI Ultra subscribers in the US, ages 18+.)

reply
mosquitobiten
1 hour ago
[-]
Every character only goes forward; permanence is apparently still out of reach.
reply
mikelevins
1 hour ago
[-]
I've been experimenting with that from a slightly different angle: teaching Claude how to play and referee a pencil-and-paper RPG that I developed over about 20 years starting in the mid 1970s. Claude can't quite do it yet for reasons related to permanence and learning over time, but it can do surprisingly well up until it runs into those problems, and it's possible to help it past some obstacles.

The game is called "Explorers' Guild", or "xg" for short. It's easier for Claude to act as a player than a director (xg's version of a dungeon master or game master), again mainly because of permanence and learning issues, but to the extent that I can help it past those issues it's also fairly good at acting as a director. It does require some pretty specific stuff in the system prompt to, for example, avoid confabulating things that don't fit the world or the scenario.

But to really build a version of xg on Claude it needs better ways to remember and improve what it has learned about playing the game, and what it has learned about a specific group of players in a specific scenario as it develops over time.

reply
ge96
43 minutes ago
[-]
Damn, that was crazy: the picture of the tabletop setup/cardboard robot, and it becomes 3D and interactive.
reply
srameshc
1 hour ago
[-]
What’s the endgame here? For a small gaming studio, what are the actual implications?
reply
in-silico
10 minutes ago
[-]
The endgame has nothing to do with gaming.

The goal of world models like Genie is to be a way for AI and robots to "imagine" things. Then, they could practice tasks inside of the simulated world or reason about actions by simulating their outcome.

reply
xyzsparetimexyz
1 hour ago
[-]
It means you should go the other way. Open worlds winning against smaller, handcrafted environments and stories was generally a mistake, and so is this.
reply
mediaman
25 minutes ago
[-]
What does it mean that open worlds winning was a mistake? That the market is wrong, and people's preferences were incorrect, and they should prefer small handcrafted environments instead of what they actually seem to buy?
reply
aurumque
1 hour ago
[-]
I would think that building an environment which can be managed by a game engine is the first pass. In a few years, when we are able to render more than 60 seconds, it could very well replace the game engine entirely by just rendering everything in realtime based on user interactions. The final phase is prompts which turn directly into interactive games, maybe even multiplayer.

When I see the progress we've made on things like DOOM, where it can infer the proper rendering of actions like firing weapons and even updating scores on hits, it doesn't feel like we're very far off; a few years at most. For a game studio that could mean cutting out almost everything between keyboard and display, but for now just replacing the asset pipeline is huge.
reply
mikewittie
3 minutes ago
[-]
We seem to think that Genie is good at the creative part, but bad at the consistency and performance part. How hard would it be to take 60 seconds of Genie output and pipe it into a model that generates a consistent and performant 3D environment?
reply
educasean
1 hour ago
[-]
I understand the ultimate end goal to be simulation of life. A near perfect replica of the real world we can use to simulate and test medicine, economy, and social impact.
reply
hiccuphippo
1 hour ago
[-]
It seems to be generating images in real time, not 3d scenes. It might still be useful for prototyping.
reply
saberience
29 minutes ago
[-]
There are collisions, though, and seemingly physics, so it doesn't seem to be a huge stretch that this could be used for games.
reply
moohaad
22 minutes ago
[-]
Everyone will make their own game now.
reply
RivieraKid
1 hour ago
[-]
This would be really cool if polished and integrated with VR.
reply
anxtyinmgmt
1 hour ago
[-]
Demis stays cooking
reply
JaiRathore
21 minutes ago
[-]
I now believe we live in a simulation
reply