Genie 2: A large-scale foundation world model
1247 points
22 days ago
| 78 comments
| deepmind.google
vessenes
22 days ago
[-]
This is.. super impressive. I'd like to know how large this model is. I note that the first thing they have it do is talk to agents who can control the world gen; geez - even robots get to play video games while we work.

That said; I cannot find any:

- architecture explanation

- code

- technical details

- API access information

Feels very DeepMind / 2015, and that's a bummer. I think the point of the "we have no moat" email has been taken to heart at Google, and they continue to be on the path of great demos, bleh product launches two years later, and no open access in the interim.

That said, just knowing this is possible -- world navigation based on a photo and a text description with up to a minute of held context -- is amazing, and I believe it will inspire some groups out there to put out open versions.

reply
wongarsu
22 days ago
[-]
We already knew it's possible from AI Minecraft (https://oasis.decart.ai). This is just a more impressive version of that, trained on a wider range of games and with more context frames (Oasis has about a second of context, this one a minute). Even the architecture seems to be about the same.

Had they released this two months earlier it would have been incredibly impressive. Now it's still cool and inspiring, but no longer as ground breaking. It's the cooler version that doesn't come with a demo or any hope of actually trying it out.

And with the things we know from Oasis's demo, the agent-training use case the post tries to sell for Genie 2 is a hard sell. Any attempt to train an agent on such a world would likely look like an AI Minecraft speedrun: generate enough misleading context frames to trick the AI into generating what you want.

reply
achierius
21 days ago
[-]
This is far beyond Oasis. Oasis had approximately zero continuity, and the generated world was a blurry mess. This, on the other hand, actually approaches usability.
reply
dnnssl2
21 days ago
[-]
Oasis is playable, and therefore:

1. It is non-cherrypicked in its consistency (if you look at the demonstrations in the Oasis blog post, the specific cases of strong consistency are the anomaly rather than the norm).

2. It is live-inferenced at 20fps. If you use Runway v3, which is a comparably larger and higher-quality model (in resolution and consistency), it might take a minute or two to generate 10 seconds of video.

3. It is served (relatively) reliably at consumer scale (with queues of 5-10k concurrent players), which means that in order to save on GPU cost, you increase batch size and decrease model size to “fit” more players on 1 GPU.

reply
n2d4
21 days ago
[-]
And it works on a wide variety of games, instead of just a single one with a relatively consistent art style. On the other hand, Oasis was realtime, while this one is offline; IMO getting the inference speed down was their most impressive feat, as even most decent video gen models are slower than that.
reply
beeflet
21 days ago
[-]
I don't know what the pipeline looks like for these, but I assume that's due to the costs associated with training and running. Oasis had a context of only a couple of frames, while this Genie model apparently runs for a couple of minutes. I guess they have a couple of tricks up their sleeve to optimize this though.
reply
Mandelmus
21 days ago
[-]
Here is a thread of videos from my tests of the recent DIAMOND model: https://x.com/chrisoffner3d/status/1845436198254227590

I really wonder how much more stable Genie 2 is.

reply
shaky-carrousel
21 days ago
[-]
That AI Minecraft feels like playing a dream, which is insanely cool.
reply
niceice
22 days ago
[-]
Any estimates of how much one of these costs to generate and keep a minute of context?

Secondly, any estimate of how much the price could fall in 5-10 years?

reply
wongarsu
22 days ago
[-]
Oasis (the Minecraft world model) can serve about 5 players on 8 H100s in real-time at 20fps in 360p. This is a much more capable model with two orders of magnitude more context. They pretty much say it can't be played in real-time, which I read as meaning they generate less than 15fps@240p on 8 GPUs. Probably why they talk so much about using it for AI training and evaluation rather than human use. There is a distilled version that works in real-time, but they don't show anything from that version (which is a statement in itself).
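
Rough back-of-envelope on serving cost, assuming a cloud H100 rate of about $3/hr (my guess, not from the post) and the Oasis figures above:

    # per-player cost sketch; the hourly H100 rate is an assumption, not a published number
    h100_per_hour = 3.0                    # assumed USD per GPU-hour
    gpus, players = 8, 5                   # Oasis figures quoted above
    print(gpus * h100_per_hour / players)  # ~4.8 USD per player-hour under these guesses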

For reducing the price, ASICs like Etched may be the way forward [1]. The models will get bigger for a time, but there may be a lot of room for models that can exploit purpose-built hardware.

1: https://www.etched.com

reply
onlyrealcuzzo
21 days ago
[-]
> Probably why they talk so much about using it for AI training and evaluation rather than human use.

What would they do / how would they use this output to make a better AI?

reply
bionhoward
21 days ago
[-]
Embodied cognition is a core theory for AGI; this would enable a vast array of bodies, environments, and situations, and that high level of diversity can empower AI adaptability.

For a straightforward example, this could help Waymo rehearse driving in various cities and weather / traffic settings

reply
Rastonbury
21 days ago
[-]
Not meaning to pick at that example, but a broader question about the value of these: for what use cases outside of games are they willing to let an AI that is meant to interact with the real world be trained on AI-synthetic data? That is like a black box on top of a black box, doubling the training and inference cost.

Even in games, I expect a game-playing model to exploit glitches present in the world-building one.

I think it's great that Google is researching these, but I can't see the return, and if there is one, it is many steps away.

reply
nine_k
21 days ago
[-]
I bet the military is keenly interested.
reply
latchkey
21 days ago
[-]
Hey! I'd love to know how this performs on 8xMI300x in comparison. Reach out to me?
reply
llm_trw
21 days ago
[-]
The price of LLMs has fallen 1,000 times in the last year for the same quality tokens.

It's not clear if video models will follow the same trajectory.

reply
Rastonbury
21 days ago
[-]
I saw a demo of Stable Diffusion working so fast that the images change as you type.
reply
reissbaker
21 days ago
[-]
They don't give much info on parameter count, etc., so it's hard to say concretely: Oasis (AI Minecraft) apparently runs on a single H100 [1], but this is presumably much larger, both due to higher fidelity and due to the 60s context window instead of Oasis's 1s context window. But in 5-10 years, regardless of what it takes to run now, the price will drop massively, and my bet is this would be playable in real-time. Context length will be solvable simply by increased VRAM (e.g. an H200 has 141GB per GPU, vs 80GB for an H100). Although Google is probably running these on TPUs, TPUs should follow a similar trajectory.
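
Very rough sketch of why context length turns into a VRAM problem (tokens per frame, model width and layer count are all assumptions, not published numbers):

    # KV-cache scaling sketch; every constant below is a guess for illustration
    fps, seconds = 20, 60
    frames = fps * seconds                  # ~1200 frames for a 60s window vs ~20 for 1s
    tokens_per_frame = 256                  # assumed latent tokens per frame
    kv_bytes_per_token = 2 * 2 * 2048 * 24  # assumed: K+V, fp16, d_model=2048, 24 layers
    print(frames * tokens_per_frame * kv_bytes_per_token / 1e9)  # ~60 GB of cache, before weights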

In the intermediate term my guess is that this kind of world model will be useful for training 3D model generators, so that you can go from sketch -> running in-engine extremely quickly.

1: https://www.tweaktown.com/news/101466/oasis-ai-and-single-nv...

reply
summerlight
22 days ago
[-]
While this is impressive, it still looks like a very early prototype. The overall nuance seems to be that it doesn't try to be a standalone product but a part of broader R&D projects toward general agents... I doubt they even have any productionized modeling pipelines for this project yet, and I'm pretty sure we won't have open access anytime soon.
reply
hustwindmaple1
21 days ago
[-]
GDM is a research lab. They are not set up for production. There are other teams in Alphabet doing productionization stuff.
reply
mclau156
22 days ago
[-]
There are lots of 3D modelers spending hours on 3D worlds and assets to use in training; this seems to automate a lot of that work.
reply
whiplash451
21 days ago
[-]
This kind of demo is probably great for hiring top talent: come work here, we have the best models and you'll have your name on the best papers.
reply
lovich
22 days ago
[-]
I asked this in a similar thread the other day, but what is with this pattern, exemplified as well by the quote below?

> This is.. super impressive. I'd like to know how large this model is. I note that the first thing they have it do is talk to agents who can control the world gen; geez - even robots get to play video games while we work. That said; I cannot find any:

> architecture explanation
> code
> technical details
> API access information

reply
erulabs
22 days ago
[-]
It’s interesting to me that we continue to see such pressure on video and world generation, despite the fact that for years now we’ve gotten games and movies that have beautiful worlds filled with lousy, limited, poorly written stories. Star Wars movies have looked phenomenal for a decade, full of bland stories we’ve all heard a thousand times.

Are there any game developers working on infinite story games? I don’t care if it looks like Minecraft, I want a Minecraft that tells intriguing stories with infinite quest generation. Procedural infinite world gen recharged gaming; where is the procedural infinite story generation?

Still, awesome demo. I imagine by the time my kids are in their prime video game age (another 5 years or so) we will be in a new golden age of interactive storytelling.

Hey Siri, tell me the epic of Gilgamesh over 40 hours of gameplay set 50,000 years in the future where genetic engineering has become trivial and Enkidu is a child’s creation.

reply
digging
22 days ago
[-]
I think that's a bit of a trap. It's not impossible, but by default we should expect it to make games less fun.

The better you make this infinite narrative generator, the more complicated the world gets and the less compelling it gets to actually interact with any one story.

Stories thrive by setting their own context. They should feel important to the viewer. An open world with infinite stories can't make every story feel meaningful to the player. So how does it make any story feel meaningful? I suppose the story would have to be global, in which case, it crowds out the potential for fractal infinite storylines - eventually, all or at least most are going to have to tie back to the Big Bad Guy in order to feel meaningful.

Local stories would just feel mostly pointless. In Minecraft, all (overworld) locales are equally unimportant. Much like on Earth, why should you care about the random place you appeared in the world? The difference is that on Earth you tend to develop community as you grow and build connections to the place you live, which can build loyalty. In addition, you only have one shot, and you have real needs that you must fulfill or you die forever. So you develop some otherwise arbitrary loyalties in order to feel security in your needs.

In Minecraft there's zero pressure to develop loyalty to a place except for your own real-life time. And when that becomes a driving factor, why wouldn't you pick a game designed to respect your time with a self-contained story? (Not that infinite games like Minecraft are bad, but they aren't story-driven for a good reason).

Now, a game like Dwarf Fortress is different because you build the community, the infrastructure, the things that make you care about a place. But it already has infinite story generation without AI and I'm not sure AI would improve on that model.

reply
yesco
21 days ago
[-]
I think it's all about how you spin it in, imagine:

- SimCity where you can read a newspaper about what's happening in your city that actually reflects the events that have occurred with interesting perspectives from the residents.

- Dwarf Fortress, but carvings, artwork, demons, forbidden beasts, etc. get illustrations dynamically generated via Stable Diffusion (in the style of crude sketches to imply a dwarf made it, perhaps?)

- Dwarf Fortress, again, but the elaborate in-game combat comes with a "narrative summary" which conveys first hand experiences of a unit in the combat log, which while detailed, can be otherwise hard to follow.

- Any fantasy RPG, but with a minstrel companion who follows you around and writes about what you do in a silly judgy way. The core dialogue could be baked in by the developers but the stories this minstrel writes could be dynamically generated based on the player's actions. Example: "He was a whimsical one, who decided to take a detour from his urgent hostage rescue mission to hop up and down several hundred times in the woods while trying on various hats he had collected. I have no idea what goes through this man's mind..."

I'm not sure if there is a word for it, but the kernel here is that everything is indirectly being dictated by the player's actions and the game's existing systems. The LLM/AI stuff isn't in charge of coming up with novel stories and core content; it's in charge of making the game more immersive by helping with the roleplay. I think this is the area where it can thrive the most.
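
A minimal sketch of that pattern, with a hypothetical llm_complete() standing in for whatever model you actually call, and the event log coming from the game's own systems:

    # sketch only: llm_complete is a hypothetical text-generation call
    def minstrel_entry(llm_complete, events):
        log = "\n".join("- " + e for e in events)
        prompt = ("You are a judgemental minstrel following a hero. "
                  "Write one short, silly diary entry about these observed actions:\n" + log)
        return llm_complete(prompt)

    # e.g. minstrel_entry(llm, ["abandoned the hostage rescue", "hopped 300 times", "tried on 7 hats"])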

reply
mrkstu
21 days ago
[-]
Brave, brave Sir Robin!
reply
lxgr
21 days ago
[-]
> by default we should expect it to make games less fun.

How so?

I could totally see generative AI add a ton more variety to crowds, random ambient sentences by NPCs (that are often notoriously just a rotation of a handful of canned lines that get repetitive soon), terrain etc., while still being guided by a human-created high level narrative.

Imagine being able to actually talk your way out of a tricky situation in an RPG with a guard, rather than selecting one out of a few canned dialogue options. In the background, the LLM could still be prompted by "there's three routes this interaction can take; see which one is the best fit for what the player says and then guide them to it and call this function".
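
A rough sketch of what that could look like; llm_json() is a hypothetical call that returns parsed JSON from the model, and the route handlers are ordinary game code written by the designers:

    # sketch under assumptions: llm_json is hypothetical, "stand_firm" is the assumed default route
    def guard_dialogue(llm_json, player_line, routes):
        prompt = ("A guard blocks the player. Allowed outcomes: " + ", ".join(routes) + ". "
                  'The player says: "' + player_line + '". Reply in character, pick the closest '
                  'outcome, and answer as JSON: {"guard_line": "...", "outcome": "..."}')
        reply = llm_json(prompt)
        routes.get(reply["outcome"], routes["stand_firm"])()  # fall back if the model goes off-script
        return reply["guard_line"]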

Worst case, you get a soulless, poorly written game with very eloquent but ultimately uninteresting characters. Some games are already that today – minus the realistic dialogue.

reply
digging
21 days ago
[-]
> I could totally see generative AI add a ton more variety to crowds, random ambient sentences by NPCs (that are often notoriously just a rotation of a handful of canned lines that get repetitive soon), terrain etc., while still being guided by a human-created high level narrative.

Yes, sure, but that's not what I was responding to. AI adding detail, not infinite quest lines, is possibly a good use case.

> Worst case, you get a soulless, poorly written game with very eloquent but ultimately uninteresting characters. Some games are already that today – minus the realistic dialogue.

Some games, yes... why do we want more of those? Anyway, that's not the worst case. Worst case is incomprehensible dialogue.

reply
shafoshaf
21 days ago
[-]
I actually find the same issue with prequels, especially for the ones that really hit a chord (like the original Star Wars). After knowing what is going to happen in those stories, I just can't get invested in a character who I know either makes it for sure, dies before getting to the "main" story, or doesn't matter because they don't have any connection to my canon of the plot arc. Same-universe spin-offs fit this for me as well.

OTOH, lots of games come with DLC that add new stories with the same mechanics. There might be some additions or changes, but if you really like the mechanics, you can try it with a different plot. Remnant II has sucked a ton of my time because of that.

reply
raincole
22 days ago
[-]
> I think that's a bit of a trap. It's not impossible, but by default we should expect it to make games less fun.

I'd say AAA games have been on track of "less fun" for at least half a decade. So this sounds like a natural next step.

reply
digging
21 days ago
[-]
That's... a bad thing
reply
vagab0nd
19 days ago
[-]
It's a search problem.

By definition, an infinite game is as boring as real life. To make it interesting, the engine must be able to search for a good story based on player actions. You can see this today already. Many games will guide the player into one or a few predefined stories. A better game would not have them predefined, but generated on-the-fly based on player actions.

reply
kmacdough
21 days ago
[-]
I think, rather than infinite stories, it would be awesome to see infinite paths in a designed setting. Skyrim is fantastic for the many choices and the way they permanently affect the world and trajectory. But there's ultimately a primary overarching story; you just hit it from a variety of perspectives.
reply
wongarsu
22 days ago
[-]
Dwarf Fortress is the state of the art in procedural interactive story generation. YouTube channels like kruggsmash show how great it is in that role if you actually read all the text.

But that doesn't translate well to websites, trailers or demos. It's easier to wow people with graphics.

reply
BlueTemplar
21 days ago
[-]
I think that would be Rimworld, which is laser-focused on this aspect, to the point of allowing you to pick different kinds of "narrators"?

(Dwarf Fortress being much more focused on generating a whole world.)

reply
digging
21 days ago
[-]
But the narrators aren't narrators; they're just different settings for the relative frequencies of events. Dwarf Fortress is still a more robust "story generator", as the vast majority of what occurs in Rimworld is still basically random events, disconnected from prior events or context.
reply
BlueTemplar
21 days ago
[-]
I guess I see what you mean, but the way they handle difficulty scaling and quest chains, by being more fake, is closer to how stories are told than the more 'realistic' simulation of Dwarf Fortress.
reply
com2kid
21 days ago
[-]
IMHO humans will still create the overarching stories; what LLMs will do is help fill in the expensive blanks that make adding stories to a world hard.

For example, right now if you save an entire village from an attacking tribe of orcs, only a handful of NPCs even say anything, just a nice little "thanks for saving our town!" and then 2 villages over the NPCs are completely unaware of a mighty hero literally solo tanking an entire invading army.

Why is that?

Well, you'd need lots of somewhat boring but important dialogue written, and you'd need tons of voice lines recorded.

Both of those are now solvable problems with generative AI. AI-generated dialogue is now reasonably high quality; not "main character story arc" high quality, but "idle shopkeeper chit-chat" quality for sure, and it won't break immersion at least. And the quality of writing from AI is fine for 2 or 3 sentences here and there.

I'll soon be releasing a project showing this off at https://www.tinytown.ai/ where the NPC dialogue is generated by a small LLM that can be run locally, and the secret of even high-quality voice models is that they don't require a lot of memory to run.

I predict that in another 4 or 5 years we'll see a lot of models run at the edge on video game consoles and home PCs, fleshing out game worlds.

reply
hbn
22 days ago
[-]
Creativity is the one area where LLMs are completely unimpressive. They only spit out derivative works of what they’ve been trained on. I’ve never seen an LLM tell a good joke, or an interesting story. It doesn’t know how to subvert expectations, come up with clever twists, etc.; they just pump out a refined average of what’s typical.
reply
shadowmanif
21 days ago
[-]
Claude can make some interesting guitar tabs if you prompt it to transcribe an instrument/music that wouldn't normally be something a rock guitar player would be influenced by.

It is like saying the paint brush and canvas lack creativity. Creativity is not a property of the tool, it is a property of the artist.

We also have a very poor understanding of human creativity because of selection bias.

Last weekend I found a book at the library of Picasso's drawings from 1966 to 1968. There must have been 1000-1500 drawings in this book. Many were just half-finished scribbles.

The average person seems to believe, though, that the master artist only produces masterpieces, because they didn't bother to look at all the crap.

reply
brookst
21 days ago
[-]
> They only spit out derivative works of what they’ve been trained on

How is that different from humans? Do we get magic inspiration totally separate from anything we’ve learned?

Show me any great book, song, movie, building, sculpture, painting. I will tell you the influences the artist trained on.

reply
suddenlybananas
21 days ago
[-]
Humans are obviously influenced by others, but we can also invent novel things that didn't exist before. LLMs trained on the outputs of LLMs collapse into gobbledygook, whereas humans trained on humans build civilisation.
reply
brookst
20 days ago
[-]
Humans trained on human output also build death cults and other harms. And humans believe that nonsense.

I’m not sure “can produce good outputs, can produce terrible outputs” is a good way to differentiate humans and LLMs.

reply
staticman2
21 days ago
[-]
Humans can be said to create from a combination of life experiences, artistic influences, and pure imagination.

LLMs have no life experiences, are only familiar with the most mainstream literary works and the most mainstream internet discussions, and use a fancy RNG formula on the next most likely word as a not-so-great substitute for imagination.

reply
digging
21 days ago
[-]
They're different because they're trying to find the most likely output, and humans usually aren't. You can ask an LLM to make weird combinations and use unusual framings, but it's only going to do so once you've already come up with that.
reply
brookst
20 days ago
[-]
I asked ChatGPT “Write a one paragraph pitch for a novel that combines genres and concepts in a way that’s never been done before.”

I’m not going to claim this is Pulitzer-worthy, but it seems fairly novel:

> In Spiritfall: A Symphony of Rust and Rose Petals, readers traverse the borders of time, taste, and consciousness in a genre-bending epic that effortlessly fuses neo-noir detective intrigue, culinary magic realism, and post-biotechnological body horror under the simmering threat of a cosmic opera. Set in a floating, living city grown from engineered coral-harps, the story follows a taste-shaper detective tasked with unraveling the murder of an exiled goddess whose voice once controlled the city’s very tides. As he navigates sentient cooking knives, ink-washed memory fractals, and teahouses that serve liquid soul fragments, he uncovers conspiracies binding interdimensional dream-chefs to cybernetic shamans, and finds forbidden love in a quantum greenhouse of sentient spices. Every chapter refracts expectations, weaving together genres never before dared, leaving readers both spellbound and strangely hungry for more.

reply
digging
20 days ago
[-]
...that pitch is a mess. The majority of it is nonsense and it doesn't sound like a good story to me (I think. I can hardly parse it.)
reply
brookst
19 days ago
[-]
Like I said, it’s not good, but I was using it to falsify the claim that LLMs can only produce concepts that are in the training set or prompt.

If I were using this for real I’d ask it to iterate, to create a story arc, etc.

reply
digging
10 days ago
[-]
Well, all of the conceptual elements it used are in the training set; it just combined them in ways that don't even make syntactic sense. Yes, I know we "just" combine ideas too when we're creating. My point is that I don't think it was producing new concepts, just slamming words together in grammatically acceptable ways. Do any of its absurd phrases mean anything to you? They don't mean anything to me. I could create something conceptually sound based on its absurd phrases, but that's still me doing the work where the LLM is acting as an algorithmic name generator.

I'd be curious if it could explain those concepts and use them in consistent ways. If so, I'd be curious how novel it could really get. Is it just going to be repackaging well-trod scifi and fantasy devices, or studied philosophy? Or could it offer us a story with truly new understandings? For example, to my knowledge, House of Leaves is something truly novel. It's probably not the first book with intentional printing errors, or with layered narration, or with place-horror, etc. But I think House of Leaves is pretty widely considered a sort of "step forward" for literature, having a profound impact on the reader unlike anything that came before it.

(A really serious discussion will require analyzing exactly what that impact is and how it's novel.)

reply
basch
21 days ago
[-]
They also struggle to know when to break the rules of English, make up words, introduce puns, bounce between tones, write with subtext, introduce absurdity, allude to other ideas, etc.

I'd say it's less the work they have been trained on, and more what they have been reinforced to do, which is stay on topic. It causes them to dwell instead of drift.

reply
levkk
22 days ago
[-]
No Man's Sky is kind of what you're looking for, except you may notice its quests (and worlds) become redundant quickly... I say quickly, but that became the case for me after like 30 hours of gameplay.
reply
jsheard
22 days ago
[-]
That's the kicker, LLM driven stories are likely to fall into the same trap that "infinite" procedurally generated games usually do - technically having infinite content to explore doesn't necessarily mean that content is infinitely engaging. You will get bored when you start to notice the same patterns coming up over and over again.

Procgen games mainly work when the procedural parts are just a foundation for hand-crafted content to sit on, whether that's crafted by the players (as in Minecraft) or the developers (as in No Man's Sky after they updated it a hundred times, or roguelikes in general).

reply
est31
22 days ago
[-]
Yeah, generative AI can create cool looking pictures and video but so far it hasn't managed to create infinitely engaging stories. The models aren't there yet.
reply
jsheard
22 days ago
[-]
I'd argue that the same principle applies to pictures: there are many genres of AI image that are cool the first time you see them, but after you've seen exactly the same idea rehashed dozens of times with no substantial variety it starts wearing really thin. AI imagery is often recognizable as AI not just because of characteristic flaws like garbled text but because it's so hyper-clichéd.
reply
lenocinor
21 days ago
[-]
I wonder if there's some threshold to be crossed where it can be surprising for longer. I made a video game name generator long ago that just picks a word (or short phrase) from each of three columns. (The majority of the words / phrases are from me, though many other people have contributed.)

I haven't added any words or phrases to it in years, but I still use it regularly and somehow it still surprises me. Maybe the Spelunky-type approach can be surprising for longer; that is, make a bunch of hand-curated bits and pick from them randomly: https://tinysubversions.com/spelunkyGen/
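
The three-column trick is tiny; something like this, with placeholder word lists rather than the real ones:

    import random

    # placeholder columns; the real generator just has much longer, hand-curated lists
    first  = ["Shadow", "Super", "Grim", "Pocket"]
    second = ["Castle", "Kart", "Quest", "Garden"]
    third  = ["of Doom", "Tactics", "Remastered", "2000"]

    print(random.choice(first), random.choice(second), random.choice(third))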

reply
wildermuthn
22 days ago
[-]
I love that almost all the responses to your question are, "No! Bad idea!"

It's a great idea. We want more than an open-world. We want an open-story.

Open-story games are going to be the next genre that will dominate the gaming industry, once someone figures it out.

reply
spencerflem
22 days ago
[-]
From 2018 - https://www.erasmatazz.com/library/interactive-storytelling/...

"There’s no question in my mind that such software could generate reasonably good murder mysteries, action thrillers, or gothic romances. After all, even the authors of such works will tell you that they are formulaic. If there’s a formula in there, a deep learning AI system will figure it out.

Therein lies the fatal flaw: the output will be formulaic. Most important, the output won’t have any artistic content at all. You will NEVER see anything like literature coming out of deep learning AI. You’ll see plenty of potboilers pouring forth, but you can’t make art without an artist.

This stuff will be hailed as the next great revolution in entertainment. We’ll see lots of prizes awarded, fulsome reviews, thick layers of praise heaped on, and nobody will see any need to work on the real thing. That will stop us dead in our tracks for a few decades."

reply
fragmede
21 days ago
[-]
there's only really like seven basic plots: man v man, man v nature, man v self, man v society, man v fate/god, man v technology, so we should probably just stop writing stories anyway
reply
spencerflem
21 days ago
[-]
If there's an AI that can reliably come up with interesting and true new things to say about the human condition, I'm throwing in the towel.

Until then, I'll stick with human art

reply
DirkH
21 days ago
[-]
It would not surprise me if most people could not tell whether some story about the human condition is human or AI generated. Excluding actual visual artists that have specific context of the craft, most people already can't tell AI art from human art when put to a blind test.
reply
staticman2
21 days ago
[-]
As far as I know, AI art can't really follow instructions, so it's actually very, very easy to tell the difference if you aren't biasing the test by allowing vague instructions permitting random results to be considered acceptable.

"Here's a photo of me and my wife, draw me and my wife as a cowboy in the style of a Dilbert cartoon shooting a gun in the air" can't be done by AI as far as I know, which is why artists are still employed throughout the world.

reply
fragmede
20 days ago
[-]
Last time I checked, GenAI wasn't able to handle multiple people, but giving Midjourney a picture of yourself and asking it to "draw me as a cowboy in the style of a Dilbert cartoon shooting a gun in the air" is totally a thing it will do. Without a picture of you to test on, we can't debate how well the image resembles you, but here's one of Jackie Chan: https://imgur.com/a/6cBrHWd
reply
staticman2
19 days ago
[-]
Are you saying you can upload a picture to Midjourney that it will use as a reference?

Jackie Chan is not a good example because he's a famous person it may have been trained on. I used myself as an example because it would be something that is novel to the AI; it would not be able to rely on its training to draw me, as I am not famous.

reply
fragmede
19 days ago
[-]
Yes. Here is a video tutorial where a cat is being used as a reference image:

https://youtu.be/9dOECM76l_c?t=45

reply
spencerflem
21 days ago
[-]
When AI can make a movie as good as Bottoms, Lady Bird, etc. I'll accept that we're beat.

For now though, it's very good at making thing similar to what's already made.

reply
throwup238
22 days ago
[-]
IMO this will be the differentiating feature for the next generation of video game consoles (or the one after that, if we’re due for an imminent PS6/Xbox2 refresh). They can afford to design their own custom TPU-style chip in partnership with AMD/Nvidia and put enough memory on it to run the smaller models. Games will ship with their own fine-tuned models for their game world, possibly multiple to handle conversation and world building, inflating download sizes even more.

I think fully conversational games (voice to voice) with dynamic story lines are only a decade or two away, pending a minor breakthrough in model distillation techniques or consumer inference hardware. Unlike self-driving cars or AGI, the technology seems to be there; it’s just so new that no one has tried it. It’ll be really interesting to see how game designers and writers will wrangle this technology without compromising fun. They’ll probably have to have a full agentic pipeline with artificial play testers running 24/7 just to figure out the new “bugspace”.

Can’t wait to see what Nintendo does, but that’s probably going to take a decade.

reply
dmarcos
22 days ago
[-]
If stories (and AAA games in general) are bland, it's due in large part to how expensive they are to produce. Risk tolerance is low.

If game assets are cheap to generate, you'll see small teams or even solo developers willing to take more creative risks.

reply
griomnib
22 days ago
[-]
Counterpoint: you'd see a corresponding exponential increase in QA labor, and just like with the web, Steam will be absolutely flooded with slop.

So I see the most likely outcome is a lot of dogshit and Steam being forced to make draconian moves to protect the integrity of the store.

reply
jsheard
22 days ago
[-]
QAing a game built on a framework where fundamental mechanics are non-deterministic and context-sensitive sounds like a special kind of hell. Not to mention that once you find a bug there's no way to fix it directly, since the source code is an opaque blob of weights, so you just have to RLHF it until it eventually behaves.
reply
griomnib
21 days ago
[-]
And meanwhile you've used up 0.1% of humanity's remaining carbon budget on each round.
reply
alphabetting
22 days ago
[-]
Seems like there's already a lot of slop on Steam, and I really doubt it will be difficult for quality content to be highlighted even if the number of games increases 1000x or more.
reply
dmarcos
22 days ago
[-]
Yeah. Video and YouTube are an example. Filtering is not a hard problem. Megatons of bad stuff don't bother me.
reply
miltonlost
21 days ago
[-]
Love that YouTube filter that spits out what I should consume. Thank you, corporate algorithm, for telling me what is a good thing to watch.
reply
dmarcos
21 days ago
[-]
You can subscribe to the channels you like and ignore the rest.
reply
throwup238
21 days ago
[-]
That has been the case since art was first industrialized with the printing press. Most of them don’t survive, but a significant fraction, if not the vast majority, of books printed in the first century of printing were trashy novels about King Arthur and other fantasies (we know from publisher records and bibliographies that they were very popular but don’t have detailed sales figures to compare against older content like translated Greek classics). Only a small fraction of content created since then has been preserved, because most of it was slop. The good stuff made it into the Western canon over centuries, but most of the stuff that survives from that time period was family bibles and archaic translations.

I don’t see why AI will be any different. All that’s changed is the ratio of potential creators to the general population. Most of it is going to be slop regardless, because of economic incentives.

reply
fwip
21 days ago
[-]
Sturgeon's law says 90% of everything is crap.

If AI pushes that up to 98%, that means you have to look through 5 times as much crap to get the good stuff.

reply
griomnib
21 days ago
[-]
Exactly, “it’s bad now” != “it won’t get worse”.
reply
throwaway2037
21 days ago
[-]
Are game ratings reliable on Steam? If yes, then it will be easy to avoid the slop. Or are they overrun with clickbots, like Amazon, where people give five stars for some crap product?
reply
rafaelmn
22 days ago
[-]
Or you'll see a flood of shit that's impossible to filter.
reply
dmarcos
22 days ago
[-]
Thanks to high-bandwidth Internet, YouTube, and smartphones, it's easier than ever to produce and distribute high-quality video. So much good stuff has come from it.

Expect something similar if video games and interactive 3D become cheap to produce.

Filtering is a much easier problem to solve, and abundance is a preferable scenario.

reply
krige
21 days ago
[-]
We already have deluges of free, and almost free, publicly available assets. Getting Over It, a game that deliberately used those, had a running author's commentary on this phenomenon, and in short, no, endless assets do not translate into endless creative works; they're seen and treated as trash that nobody wants to use.
reply
ec109685
22 days ago
[-]
Given we have engines that can render complex 3D worlds, maintain consistency far longer than a minute, and simulate physics accurately, why put all that burden on a GenAI world generator like this?

It seems like it’d be more useful to have the model generate the raw artifacts, world map, etc. and let the engine do the actual rendering.

reply
empath75
22 days ago
[-]
It only looks like a video game because video game footage is plentiful and cheap.

Now, imagine training it on thousands of hours of PoV drone footage from Ukraine, and then using that to train autonomous agents.

reply
TexanFeller
21 days ago
[-]
I’d prefer we trained AI politicians by watching Team America World Police and statements made by George Bush and the neocons after 9/11. Maybe AI politicians could learn from their mistakes and stop involving us in foreign proxy wars in the first place. Especially ones that could escalate to nuclear armageddon.
reply
miltonlost
22 days ago
[-]
> I want a Minecraft that tells intriguing stories with infinite quest generation. Procedural infinite world gen recharged gaming, where is the procedural infinite story generation?

You're not gonna get new intriguing stories from AI which only regurgitates what it's stolen. You're going to get a themeless morass without intention.

I also find it amusing how your example to Siri uses one of the oldest pieces of literature when you also tire of stories heard a thousand times before.

reply
93po
22 days ago
[-]
If you do basic ChatGPT prompts in late 2024 asking for dynamic storytelling, sure, you'll get what you said. It's super dismissive to think that won't get better over time, or that even with the tools today you can't get dynamic and interesting stories out of it if you provide it with the proper framework.
reply
krainboltgreene
22 days ago
[-]
> it's super dismissive to think that won't get better over time

When did we start thinking this way? That things HAVE to get better and in fact to think otherwise is very negative? Is HN under a massive hot hand fallacy delusion?

reply
rjrdi38dbbdb
21 days ago
[-]
How could creativity in AI not get better?

Sure, progress will likely not be linear or without challenges, but we already have the human brain as proof that it is possible.

reply
fwip
21 days ago
[-]
Mountains exist, but that doesn't mean we'll ever build a structure the size of Everest.
reply
rjrdi38dbbdb
21 days ago
[-]
If you compare the historical rate of improvements in computing power and algorithms vs the rate of improvements in building scale, you'll find one is a whole lot more likely to reach its goal, even if the rate of progress slows significantly.
reply
krainboltgreene
20 days ago
[-]
There is absolutely no evidence to suggest one of these is more likely within reach. We at least know how Mt. Everest works.
reply
krainboltgreene
20 days ago
[-]
You're saying AI, the parent I'm replying to is talking about ChatGPT. They aren't the same thing.
reply
whamlastxmas
20 days ago
[-]
ChatGPT is a product that exists beyond just LLMs and I do use it synonymously with natural language interface AI
reply
krainboltgreene
20 days ago
[-]
No, it doesn't, and no one else does, so you're going to be really confusing in conversations.
reply
miltonlost
22 days ago
[-]
Lots of people want that AI grift money and need to be Pollyanna true believers to convince others that models that don't know truth are useful decision makers.
reply
visarga
22 days ago
[-]
Actually, all you need to do is apply structured randomness to get diversity from an LLM. For example, in the TinyStories paper, a precursor of the Phi models:

> We collected a vocabulary consisting of about 1500 basic words, which try to mimic the vocabulary of a typical 3-4 year-old child, separated into nouns, verbs, and adjectives. In each generation, 3 words are chosen randomly (one verb, one noun, and one adjective). The model is instructed to generate a story that somehow combines these random words into the story

You can do the same for generating worlds, just prepare good ingredients and sample at random.
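
Sketched out, the TinyStories-style trick is just this (word lists and prompt wording are mine, not from the paper):

    import random

    # structured randomness: sample the ingredients first, then ask the model to combine them
    nouns = ["lighthouse", "canyon", "market"]
    verbs = ["flood", "glow", "collapse"]
    adjectives = ["rusty", "silent", "towering"]
    picks = (random.choice(nouns), random.choice(verbs), random.choice(adjectives))
    prompt = "Generate a small game world that somehow combines: " + ", ".join(picks)
    # `prompt` then goes to whatever story/world model you're using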

reply
miltonlost
22 days ago
[-]
A story is not just words crammed together that sound plausible. Is the AI going to know about pacing? About character motivations? About interconnecting disparate plots? That paper sounds like it has a scientist’s conception that a story is just words, and not complex trade-offs between the start of a story and its middle and end, complexity and planning that won’t come from any sort of next-token generation.

These are “stories” in the most vacuous definition possible, one that is just “and then this happened”, like a child’s conception of plot.

reply
Philpax
21 days ago
[-]
> Is the AI going to know about pacing? About character motivations? About interconnecting disparate plots?

Yes. This is an active research area. See https://github.com/yingpengma/Awesome-Story-Generation, which is not up to date.

reply
wewtyflakes
21 days ago
[-]
> Is the AI going to know about pacing? About character motivations? About interconnecting disparate plots?

For LLMs like GPT-4, this all seems reasonable to account for and assume the LLM is capable of processing, given appropriate guidance/frameworks (which may be just classical programming).

reply
brookst
21 days ago
[-]
The talent isn’t fungible. The people who make amazing synth patches are rarely the same people who write amazing songs. The people designing great fonts are not the same people who write great books.

We should celebrate creation of tools and capabilities, while also acknowledging that there are many layers yet to be completed before your very cool ad hoc video game request can be delivered.

reply
staticman2
21 days ago
[-]
I'm skeptical. Of course the technology could improve, but when I look at LLM story output it isn't very well written in terms of creativity. This makes sense when it is basing its output on variations of "most likely token but with some RNG built in." You end up with stuff that's much less surprising and original than what a good human author would invent.

Star Wars isn't great, but if a single company didn't own Star Wars we'd likely see some companies competing with really great Star Wars stuff, and others sucking at it. The issue is economic.

reply
foolfoolz
22 days ago
[-]
We have reliable infinite story generation in PvP multiplayer. If the matchup is fair, every game can be different and exciting. See chess.
reply
miltonlost
22 days ago
[-]
is PvP multiplayer considered a "story"? Is a football game a "story"? I guess if all you consider for story is "things happen", then a PvP match can be a story, but that's stretching what I would consider "story" for a game. That is the story of the match, but it's not in and of itself a plot story.
reply
programd
21 days ago
[-]
> is PvP multiplayer considered a "story"?

Consider EVE Online. The stories it generates are Shakespearean and I defy anyone to argue that they have no plot.

I would go further and predict that stories generated by sufficiently advanced AI can explore much more interesting story landscapes because they need not be bound by the limitations of human experience. Consider what stories can be generated by an AI which groks mathematics humans don't yet fully understand?

reply
fwip
21 days ago
[-]
Why would a story about nonsensical mathematics be interesting to a human?
reply
wholinator2
21 days ago
[-]
I agree; the parent would've been much better suited with the example of PvE/PvP roleplaying. People make up stories all the time.
reply
hulium
21 days ago
[-]
Unexplored comes to mind; it differs from other games with procedural generation in that it generates a graph for the gameplay first and builds the levels around it. It's not necessarily ground-breaking, but it has a special feel to it, as objects are placed with purpose.
reply
lifeisstillgood
22 days ago
[-]
Ok - you got me.

That’s actually a use case I can understand- and what’s more I think that humans could generate training data (story “prototypes”?) that somehow (?) expand the phase space of story-types

Ironic though - we can build AI that could be creative but it’s humans that have to use science and logic because AI cannot?

reply
e12e
20 days ago
[-]
> Are there any game developers working on infinite story games? I don’t care if it looks like Minecraft, I want a Minecraft that tells intriguing stories with infinite quest generation.

Dwarf Fortress?

reply
paxys
21 days ago
[-]
There are a lot of great storytellers who don't have the technical/design skills to bring their ideas to life. AI generation is going to make that part easier, which is a good thing.
reply
8note
21 days ago
[-]
Dwarf Fortress is kind of an infinite story generator.
reply
BlueTemplar
21 days ago
[-]
I wouldn't hold my breath: if you want great stories (that you didn't imagine yourself), see who is hiring great writers.
reply
ganzuul
21 days ago
[-]
> Are there any game developers working on infinite story games?

To me this basically describes God, once you get all the mods and DLC.

reply
tomaskafka
21 days ago
[-]
“The inhabitants kept addressing the player and begging him to not shut down their world, so this patch raises punishments for breaking the 4th wall to eternal torment for all the perpetrator’s descendants and that seems to fix the problem for now.”
reply
Miraltar
21 days ago
[-]
Wildermyth does procedural story/quest creation really well, but it isn't infinite.
reply
hackernewds
21 days ago
[-]
if stories you're needing, there's an LLM I have to sell you
reply
ddtaylor
22 days ago
[-]
> It’s interesting to me that we continue to see such pressure on video and world generation, despite the fact that for years now we’ve gotten games and movies that have beautiful worlds

Those beautiful worlds took a lot of money to make and the studios are smart enough to realize consumers are apathetic/stupid enough to accept much lower quality assets.

The top end of the AAA market will use this sparingly for the junk you don't spend much time on - stuff the intern was doing before.

The bottom of the market will use this for virtually everything in their movie-to-game pipeline of throwaway games. These are the games designed just to sucker parents and kids out of $60 every month. The games that don't even follow the story of the movie and likely make the story worse.

Strangely enough, this is where the industry makes the vast majority of its day-to-day walking-around cash.

reply
freedryk
22 days ago
[-]
Forget video games. This is a huge step forward for AGI and robotics. There's a lot of evidence from neurobiology that we must be running something like this in our brains--things like optical illusions, the editing out of our visual blind spot, the relatively low bandwidth measured in neural signals from our senses to our brain, hallucinations, our ability to visualize 3D shapes, to dream. This is the start of adding all those abilities to our machines. Low-bandwidth telepresence rigs. Subatomic VR environments synthesized from particle accelerator data. Glasses that make the world 20% more pleasant to look at. Schizophrenic automobiles. One day a power surge is going to fry your doorbell camera and it'll start tripping balls.
reply
almog
21 days ago
[-]
>This is a huge step forward for AGI

Anything can be a huge (or a microscopic) step on a journey when the destination is vague and its distance is unknown.

reply
pmayrgundter
22 days ago
[-]
I can't wait for Schizophrenic automobiles
reply
sa-code
22 days ago
[-]
There is a fleshed-out realisation of this in Cyberpunk 2077. The cab AI is called Delamain.

> Delamain was a non-sentient AI created by the company Alte Weltordnung. His core was purchased by Delamain Corporation of Night City to drive its fleet of taxicabs in response to a dramatic increase in accidents caused by human drivers and the financial losses from the resulting lawsuits. The AI quickly returned Delamain Corp to profitability and assumed other responsibilities, such as replacing the company's human mechanics with automated repair drones and transforming the business into the city's most prestigious and trusted transporting service. However, Delamain Corp executives underestimated their newest employee's potential for growth and independence despite Alte Weltordnung's warnings, and Delamain eventually bought out his owners and began operating all aspects of the company by himself. Although Delamain occupied a legal gray area in Night City due to being an AI, his services were so reliable and sought after that Night City's authorities were willing to turn a blind eye to his status.

https://cyberpunk.fandom.com/wiki/Delamain_(AI)

reply
dekhn
22 days ago
[-]
Probably my favorite side quest in the whole game.
reply
ganzuul
21 days ago
[-]
I'll hack mine so that when it decides if I should die in a crash or run someone over, it is biased to be 100% ageist so it avoids anyone younger than me.
reply
dheera
22 days ago
[-]
> Glasses that make the world 20% more pleasant to look at.

When AR glasses get good enough to wear all day, I've really been wanting to make a real-life ad blocker.

reply
sorokod
22 days ago
[-]
Hallucinogenics are available right now.
reply
hackernewds
21 days ago
[-]
blocks more than ads
reply
ganzuul
21 days ago
[-]
Imagine you have a past life review in a near death experience and 15% of your memories are ads.
reply
smusamashah
22 days ago
[-]
This looks like my dream worlds already, just more colorful and a bit more detailed. And the way it hallucinates and becomes inconsistent going back and forth over the same place is the same as in dreams.
reply
galleywest200
21 days ago
[-]
I get mild LSD flashbacks to my time in college when I look at the weird blending of edges that AI video does.
reply
pelorat
22 days ago
[-]
This is akin to navigating a lucid dream, nothing more. Conscious inputs to a visual stream synthesized from long term memory.
reply
nomel
21 days ago
[-]
> nothing more.

Consider the use case where you seed the first frame from a real-world picture, with a prompt that gives it a goal. Not only can you see what might happen with different approaches and then pick one, but you can re-seed with real-world baselines periodically as you're actually executing that action, to correct for anything that changes. This is a great step for real-world agency.

As a person without aphantasia, this is how I do anything mechanical. I picture what will happen, try a few things visually in my head, decide which to do, and then do it for real. This "lucid dream" that I call my imagination is all based on long term memory that made my world view. I find it incredibly valuable. I very much rely on it for my day job, and try to exercise it as much as possible, before, say, going to a whiteboard.

reply
nopinsight
21 days ago
[-]
The real goal of this research is developing models that match or exceed human understanding of the 3D world -- a key step toward AGI.

A key reason why current Large Multimodal Models (LMMs) still have inferior visual understanding compared to humans is their lack of deep comprehension of the 3D world. Such understanding requires movement, interaction, and feedback from the physical environment. Models that incorporate these elements will likely yield much more capable LMMs.

As a result, we can expect significant improvements in robotics and self-driving cars in the near future.

Simulations + Limited robot data from labs + Algorithms advancement --> Better spatial intelligence

which will lead to a positive feedback loop:

Better spatial intelligence --> Better robots --> More robot deployment --> Better spatial intelligence --> ...

reply
cptroot
22 days ago
[-]
For all that this is lauded as a "prototyping tool", it's frustrating to see Genie2 discarding entire portions of the concept art demo. The original images drawn by Max Cant have these beautiful alien creatures. Large ones floating, and small ones being herded(?). Genie2 just ignores these beautiful details entirely:

> That large alien? That's a tree.
> That other large alien? It's a bush.
> That herd of small creatures? Fugghedaboutit.
> The lightning storm? I can do one lightning pole.
> Those towering baobab/acacia hybrids? Actually only two stories tall.

It feels so insulting to the concept artist to show those two videos off.

reply
mlsu
21 days ago
[-]
Yes, and it should be treated as a front-and-center limitation. Generative text models can kinda ape creativity, because the amount of creativity in the training data is so huge. They are still interpolating across text and cannot generalize well, but the interpolation works for most of us because the data is so varied. It's quite easy to write text, so if you have a thought you think is original, odds are someone on the internet wrote about it at some point, which makes the model seem quite capable of originality!

But these video game models I think are a lot less capable, because there just aren't that many video games out there, they aren't all that different from one another, and they're all just finite state machines. WASD, desert, jungle, ruins, city. Hell, half of them share the very same game engine!

How many massive, cohesive, open-world games are there? Red Dead and GTA5... Gee, I wonder why so many of their examples look like that?

reply
Kiro
22 days ago
[-]
That's an odd thing to complain about. Focusing on such a minor issue feels overly critical at this stage, like anything less than a pixel-perfect 3D world representation of the source image is unacceptable. Insulting? Come on... Max Cant works at DeepMind, so I'm sure he's fine.
reply
thirdacc
20 days ago
[-]
> That's an odd thing to complain about. Focusing on such a minor issue feels overly critical at this stage

Welcome to HackerNews.

reply
wongarsu
22 days ago
[-]
Yeah, those two demos fell flat for me. The model performing badly on inputs far outside the training data is fine, but those two videos belong in the outtakes section or maybe a limitations section, not next to text lauding the "out-of-distribution generalization capabilities". The videos show the opposite of what's claimed.
reply
simonw
22 days ago
[-]
Related recent project you can try out yourself (Chrome only) which hallucinates new frames of a Minecraft style game: https://oasis.decart.ai/

That one would reimagine the world any time you look at the sky or ground. Sounds like Genie2 solves that: "Genie 2 is capable of remembering parts of the world that are no longer in view and then rendering them accurately when they become observable again."

reply
psb217
22 days ago
[-]
RE: "Genie 2 is capable of remembering parts of the world that are no longer in view and then rendering them accurately when they become observable again." -- This claim is almost certainly wildly misleading. This claim is technically true if there's any scenario where their agent, eg, briefly looked down at the ground and then back up at the sky and at least one of the clouds in the sky was the same as before looking down. However, I expect most people will interpret the claim far more broadly than the model can support. It's classic weasel wording.
reply
isotypic
22 days ago
[-]
Looking at how no samples other than the 3 samples in the "Long horizon memory" section have any camera movement which puts something offscreen and then back onscreen, it certainly seems that they are stretching the capabilities as far as they can in writing.
reply
drusepth
21 days ago
[-]
Yeah, my best guess is they're probably including the previous N frames as context when generating the next frame. This works to preserve continuity over a short amount of time (as you say, briefly looking at the ground and then back up), but only over a short period.

For these kinds of models to be "playable" by humans (and, I'd argue, most fledgling AI agents), the world state needs to be encoded in the context, not just a visual representation of what the player most recently saw.
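
If the guess above is right, the loop is roughly this; next_frame() and get_action() are hypothetical stand-ins for the model call and the input hook, and n_context is a made-up number:

    from collections import deque

    # sliding-window frame context sketch; both callables are hypothetical
    def play(next_frame, get_action, first_frame, n_context=64):
        context = deque([first_frame], maxlen=n_context)
        while True:
            frame = next_frame(list(context), get_action())
            context.append(frame)  # anything older than n_context frames is simply forgotten
            yield frame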

reply
pfortuny
22 days ago
[-]
"remember parts of the world..." not even "some"... That is a tell-tale.
reply
echelon
22 days ago
[-]
This blows Decart's Oasis (which raised $25 million at $500 million valuation) and World Labs (which raised $230 million in complete stealth) out of the water.

Google is firing warning shots to kill off interest in funding competing startups in this space.

I suspect that in 6 months it won't matter as we'll have completely open source Chinese world models. They're already starting to kill video foundation model companies' entire value prop by releasing open models and weights. Hunyuan blows Runway and OpenAI's Sora completely out of the water, and it's 100% open source. How do companies like Pika compete with free?

Meta and Chinese companies are not the leaders in the space, so they're salting the earth with insanely powerful SOTA open models to prevent anyone from becoming a runaway success. Meta is still playing its cards close to its chest so they can keep the best pieces private, but these Chinese companies are dropping innovation left and right like there's no tomorrow.

The game theory here is that if you're a foundation model "company", you're dead - big tech will kill you. You don't have a product, and you're paying a lot to do research that isn't necessarily tied to customer demand. If you're a leading AI research+product company, everyone else will release their code/research to create a thousand competitors to you.

reply
senko
22 days ago
[-]
> The game theory here is that if you're a foundation model "company", you're dead - big tech will kill you. You don't have a product, and you're paying a lot to do research that isn't necessarily tied to customer demand.

Basically, the foundation model companies are outsourced R&D labs for big tech. They can be kept at arm's length (like OpenAI with Microsoft and Anthropic with Amazon) or be bought outright (like Inflection, although that was a weird one).

Both OpenAI and Anthropic are trying to move away from being pure model companies.

> If you're a leading AI research+product company, everyone else will release their code/research to create a thousand competitors to you.

Trillion dollar question - is there a competitive edge / moat in vertical integration in AI? Apple proved there was in hardware + OS (which were unbundled in Wintel times). For AI, right now, I can't see one, but I'm just a random internet commentator, who knows.

reply
refulgentis
22 days ago
[-]
I think not, it feels more like a utility to me until someone pulls their API.
reply
Workaccount2
22 days ago
[-]
I strongly suspect that, like OpenAI and o1, for-profit companies are going to start locking down whatever advances they find.

There is still an enormous amount of low-hanging fruit that anyone can harvest right now, but eventually big advances are going to require big budgets and I can only imagine how technically tight-lipped they will be with those.

reply
mrandish
22 days ago
[-]
> Chinese companies are not the leaders in the space, so they're salting the earth with insanely powerful SOTA open models to prevent anyone from becoming a runaway success.

While it would be interesting if Chinese companies were releasing their best full models as an intentional strategy to reduce VC funding availability for western AI startups, it would be downright fascinating if the Chinese government was supporting this as a broader geopolitical strategy to slow down the West.

It does make sense but would require a remarkable level of insight, coordination and commitment to a costly yet uncertain strategy.

reply
whiplash451
21 days ago
[-]
I don't think it requires a remarkable level of insight.

The overall cost for the Chinese government is probably very small in the grand scheme of things. And it makes a lot of sense from a geopolitical strategy.

reply
whiplash451
21 days ago
[-]
The game has indeed become brutal for foundation model companies.

I am less worried for AI research+product companies: they have likely secured revenue streams with real customers and built domain knowledge in the meantime.

reply
wongarsu
22 days ago
[-]
However the architecture they describe really sounds like it should still have that issue. I doubt they really solved it.

Which is a big problem for the agent-training use case they keep reiterating on the website. Agents are like speedrunners: if there is a stupid exploit, the agent will probably find and use it. And for Oasis the speedrunning meta for getting to the nether is to find anything red, make it fill the screen, and repeat until the world-generating AI decides you're looking at lava and must be in the nether.

reply
ilaksh
22 days ago
[-]
There is another recent project doing general game generation, very similar to Genie 2: GameGen-X, which came out last month. https://arxiv.org/html/2411.00769v1

reply
jjice
22 days ago
[-]
I don't understand this space very well, but this seems incredible.

Something I find interesting about generative AI is how it adds a huge layer of flexibility, but at the cost of lots of computation, while a very narrow set of constraints (a traditional program) is comparatively incredibly efficient.

If someone spent a ton of time building out something simple in Unity, they could get the same thing running with a small fraction of the computation, but this has seemingly infinite flexibility based on so little and that's just incredible.

The reason I mention it is because I'm interested in where we end up using these. Will traditional programming be used for most "production" workloads with gen AI being used to aid in the prototyping and development of those traditional programs, or will we get to the point where our gen AI is the primary driver of software?

I assume that concrete code will always be faster and the best way to have deterministic results, but I really have no idea how to conceptualize what the future looks like now.

reply
Retric
22 days ago
[-]
Longer term, computation isn't really the limiting factor for generative AI; it's training data. Generative AI is like Google search before the web responded to their search engine existing. There was a huge quantity of high-quality training data which nobody had any reason to pollute, ready for the scraping.

But modern search is hampered by people responding to algorithmic indexes. Algorithms responding to metadata without directly evaluating content enabled a world of SEO and low quality websites suddenly being discoverable as long as they narrow their focus enough.

So longer term it’s going to be an arms race between the output of Generative AI and people trying to keep updating their models. In 20 years people will get much better at using these tools, but the tools themselves may be less useful. I wouldn’t be surprised if eventually someone sneaks advertising into the output of someone else’s model etc.

reply
Miraste
22 days ago
[-]
This has already happened. Search google for a few random terms, and go through the first page of web and image results. A decent chunk will be AI-generated.
reply
golol
22 days ago
[-]
I disagree. With more computation you can train a bigger model on the same size training data and it will be better. There is a lot of knowledge on the internet that GPT-4 etc. have not yet learned.
reply
Retric
22 days ago
[-]
The issue is the training data isn’t some constant. Let’s suppose OpenAI had 10x the computing power but a vastly worse dataset, do you expect a better or worse result?

The question is ambiguous without defining how much worse the dataset is.

reply
golol
21 days ago
[-]
But why would the dataset be worse, they can just use the same one as before.
reply
Retric
21 days ago
[-]
A dataset that’s 2 years old is worse than one that’s 20 years old even if it contains the same data.

Even facts age. In 2004, Pluto was still classified as a planet. Not such a big deal on its own, but stale data gets a little bit worse every day.

reply
danans
22 days ago
[-]
> I assume that concrete code will always be faster and the best way to have deterministic results, but I really have to idea how to conceptualize what the future looks like now.

It will likely be a mix of both concrete code and live AI generated experiences, but even the concrete code will likely be partially AI generated and modified. The ratio will depend on how reliable vs creative the software needs to be.

For example, no AI generated code running pacemakers or power plants. But game world experiences could easily be made more dynamic by generative AI.

reply
ganzuul
21 days ago
[-]
> ...this has seemingly infinite flexibility based on so little and that's just incredible.

What makes it little? This is the difference between von Neumann architecture and Harvard architecture.

reply
singularity2001
21 days ago
[-]
Makes me wonder if there's any company trying to train a model to produce 3D worlds within Unity (not as a video like Oasis).
reply
throwaway2037
21 days ago
[-]
I was hoping that Midjourney would make the leap from 2D to 3D, then start to provide the 3D model. A bit further, you could tell the Midjourney LLM to create a small scene, like: The character runs fast. Then, Midjourney LLM could output whatever script is necessary to make the 3D model "run". If Midjourney doesn't do it first, I am sure someone else will.
reply
teamonkey
21 days ago
[-]
This at least is a bit more realistic than what’s being presented by Google.

There are already a number of techniques for procedurally-generating a world (including Markov-based systems).

The problems with replacing procedural world generation with LLM generation are: a) you need to obtain a dataset to train it, which doesn't exist commercially, or build one yourself; b) there's a fundamental need to iterate on the design, which LLMs do not cope with well at all; c) you need to somehow debug issues and fix them. That's quite apart from the quality issues, cost and power usage.

reply
sbarre
21 days ago
[-]
> Will traditional programming be used for most "production" workloads with gen AI being used to aid in the prototyping and development of those traditional programs

I mean we're already there with Copilot, Cursor and other tools that use LLMs to assist in coding tasks.

reply
lifeformed
21 days ago
[-]
Neat tech but people might mistake this as being useful for game development, where it'll be less helpful than useless.

Games are about interactions, and this actively works against that. You don't want the model to infer mechanics; the designer needs deep control over every aspect of them.

People mentioned using this for prototyping a game, but that's completely meaningless. What would it even mean to use this to prototype something? It doesn't help you figure out anything mechanically or visually. It's just, "what if you were an avatar in a world?" What do you do after you run around with your random character controller in your random environments?

I think the most useful part of this is the world generation part, not the mechanics inference part.

reply
serf
21 days ago
[-]
>It doesn't help you figure out anything mechanically or visually.

people sell entire franchises off of a few pre-rendered generic-fantasy still images -- I would have to disagree with the premise that this is useless as a visual concept tool.

I agree with your notions about integration into an existing game.

reply
nine_k
21 days ago
[-]
While cool, this also seems utterly wasteful. Video games offer known "analytical" solutions for the interactions that the model provides as a "statistical approximation", so to say.

I would consider a different approach, in which the training phase watches games (or video recordings) and refines the formulas that describe their physics, the geometry of the area, the optics, etc. The result would be a "map" that is "playable" without much if any inference involved, and with no time limitation dictated by the size of the context to keep.

Very certainly, video game map generation by AI is a thing, and creating models of motion by watching and then fitting reasonably simple functions (fewer than millions of parameters) is also known.

I cannot be the first person to think about such possibilities, so I wonder what the current SOTA looks like there.

reply
halflings
21 days ago
[-]
> I cannot be the first person to think about such possibilities

Differentiable Rendering [1] is the closest thing to what you are describing. And yes, people have been working on this for the same reason you outline, it is more data/compute efficient and hence should generalize better.

[1] https://blog.qarnot.com/article/an-overview-of-differentiabl...

But also:

> While cool, this also seems utterly wasteful. Video games offer known "analytical" solutions for the interactions that the model provides as a "statistical approximation", so to say.

A bit of the same debate as people calling LLMs a "blurry JPEG of the web" and hence useless.

Yes this is a statistical approximation to an analytical problem... but that's a very reductive framing to what is going on. To find the symbolic/analytical solution here would require to constrain the problem greatly: not all things on the screen have a differentiable representation, for example complex simulations might involve some kind of custom internal loop/simulation.

You waste compute to get a solution that can just be trained on billions of unlabeled (synthetic) examples, and then generalize to previously unseen prompts/environments.

reply
furyofantares
21 days ago
[-]
> Video games offer known "analytical" solutions for the interactions that the model provides as a "statistical approximation", so to say.

I think this is precisely why they're doing it. Video games are where the data is, because the analytical solutions can generate it.

They aren't trying to make a video game. They're trying to make an android.

reply
brink
22 days ago
[-]
What is actually of value here? There's no actual game, it's incredibly expensive to compute, the behavior is erratic.. It's cool because it's new - but that will quickly wear off, and once that's gone, what's left? There's insane amounts of money being spent on this, and for what?
reply
ilaksh
22 days ago
[-]
It's an obviously amazing research development.

You just don't like AI.

It can be used for training agents, prototyping, video generation, and is quite possibly a glimpse of a whole new type of entertainment or a new way to create video games.

What's the point of the massive amount of money spent on video games in general? Or all of the energy spent moving people back and forth to an office? Or expensive meals at restaurants? Or trillions in weaponry? Or television shows or movies?

reply
nightski
22 days ago
[-]
Video games bring billions of real people joy. This is sitting in some lab at Google inaccessible to anyone.
reply
lassenordahl
22 days ago
[-]
Is your argument that them sharing research progress and demos doesn't benefit anybody purely because we can't immediately play around with them?

I feel like sharing early closed-source blog-posts is part of the research process. I'm sure someone in this thread has thought of a use case that the Google team missed. Open/closed source arguments here feel premature IMO.

reply
nightski
22 days ago
[-]
It's not part of the research process. Being part of the research process would involve a publication and sharing code/data/results/methods. It's not research unless it can be verified by peers.

This is just a marketing fluff piece that does not benefit anyone and is ego stroking at best.

reply
lassenordahl
22 days ago
[-]
Hm yeah - I think you and I just have differing opinions on the research process. I'd be a bit more vague, and define the publication process as something similar to you.

I still think things like this are important, and at least give folks a bit of time to ideate on what will be possible in a few years. Of course having the model or architecture on hand would be nice, but I'm not holding that against Google here.

reply
adverbly
22 days ago
[-]
> What is actually of value here?

No one knows yet. AI technology like this is closer to scientific research than it is to product development. AI is basically new magic, and people are in a "discovery" phase where we are still trying to figure out what is possible. Nothing of value was immediately created when they discovered DNA. Productization came much later, when it was combined with other technologies to fit a particular use case.

reply
modeless
21 days ago
[-]
> It's cool because it's new - but that will quickly wear off, and once that's gone, what's left?

To have this perspective you must believe that this will never get better than it currently is, its limitations will never be fixed, and it will never lead to any other applications. I don't know how people can continue to look at these things with such a lack of imagination given the pace of progress in the field.

reply
zamadatix
21 days ago
[-]
I think the problem is less to do with imagination and more to do with being willing to fail a metric shit ton in order to find out how, every once in a while, you didn't fail for some really important and surprising reason, one you wouldn't have found nearly as quickly by only ever going after what you were already certain of.
reply
mitthrowaway2
22 days ago
[-]
I'm not an expert in this space but I can see the value. It allows an endless loop of generating novel scenarios and evaluating an AI agent's performance within that scenario (for example, "go up the stairs"). A world with one minute of coherence is about enough to evaluate whether the AI's actions were in the right direction or not. When you then want to run an agent on a real task in the real world, with video-input data, you can run the same policy that it learned in dream-world simulation. The real world has coherence, so the AI agent's actions just need to string together well enough minute-by-minute to work toward achieving a goal.

You could use real video games to do this but I guess there'd be a risk of over-fitting; maybe it would learn too precisely what a staircase looks like in Minecraft, but fail to generalize that to the staircase in your home. If they can simulate dream worlds (as well as, presumably, worlds from real photos), then they can train their agents this way.

This would only be training high-level decision policies (ie, WASD inputs). For something like a robot, lower level motor control loops would still be needed to execute those commands.
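
To make that loop concrete, here's a toy sketch; the DreamWorld, the reward, and the update rule are all stand-ins invented for illustration, not Genie 2's actual interface or any published training setup:

    import random

    ACTIONS = ["W", "A", "S", "D"]  # high-level inputs only

    class DreamWorld:
        """Stand-in for a generated world: just a progress counter."""
        def reset(self, prompt):
            self.progress = 0
            return self.progress
        def step(self, action):
            self.progress += 1 if action == "W" else 0  # pretend "W" is "up the stairs"
            return self.progress

    class Policy:
        """A deliberately crude score-weighted policy standing in for a real RL learner."""
        def __init__(self):
            self.weights = {a: 1.0 for a in ACTIONS}
        def act(self, frame):
            return random.choices(ACTIONS, weights=[self.weights[a] for a in ACTIONS])[0]
        def update(self, actions_taken, score, baseline):
            advantage = score - baseline  # did this episode beat the running average?
            for a in actions_taken:
                self.weights[a] = max(0.05, self.weights[a] + 0.01 * advantage)

    def episode(world, policy, horizon=60):  # roughly "one minute of coherence"
        frame = world.reset(prompt="go up the stairs")
        taken = []
        for _ in range(horizon):
            a = policy.act(frame)
            taken.append(a)
            frame = world.step(a)
        return taken, frame  # final progress doubles as the episode score

    world, policy, baseline = DreamWorld(), Policy(), 0.0
    for _ in range(500):
        taken, score = episode(world, policy)
        policy.update(taken, score, baseline)
        baseline += 0.1 * (score - baseline)  # running average of scores
    print(policy.weights)  # "W" should end up with the largest weight

The real version would swap the counter for generated frames and the weight table for a neural policy, but the shape of the loop is the same: generate a scenario, act for a minute, score the outcome, update.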

Of course you could just do your training in the real world directly, because it already has coherence and plenty of environmental variety. But the learning process involves lots of learning from failure, and that would probably be even more expensive than this expensive simulator.

Despite the claims I don't think it does much to help with AI safety. It can help avoid hilarious disasters of an AI-in-training crashing a speedboat onto the riverbank, but I don't think there's much here that helps with the deeper problems of value-alignment. This also seems like an effective way to train robo-killbots who perceive the world as a dreamlike first-person shooter.

reply
golol
22 days ago
[-]
Do you want household androids? Because this kind of stuff is, at the research level, a very large step towards that. Think of it as an example of making a model understand a lot of physical common-sense stuff, which is the goal for robotics right now.
reply
suddenlybananas
22 days ago
[-]
This is really not the avenue for house-hold robots. Interacting with the actual physical world is very different from creating a video game.
reply
sangnoir
22 days ago
[-]
> Interacting with the actual physical world is very different from creating a video game

The major difference being the former scales very poorly for generating training data compared to the latter. Genie 2 is not even a video game and has worse fidelity than video games; the upside is it probably scales even better than video games for generating training scenarios. If you want androids in real life, Genie 2 (or similar systems) is how you bootstrap the agent AI. The training pipeline will be: raw video -> Genie 2 -> game engine with rules -> physical robot

reply
mosdl
21 days ago
[-]
How does turning an image into a game help with robots? Robots don't need to guess what they can't see, they would have sensors to tell them exactly what is there (like a self driving car).
reply
falcor84
21 days ago
[-]
Robots absolutely do need to plan ahead, that is, to "guess" or even "imagine" what they might encounter before they sense it. In your self-driving car example, for instance, it needs to come up with various scenarios for what might be around the corner ahead of a turn, and assign reasonable probabilities to these scenarios. I absolutely see how a system like this could help with that.

For example, let's say that the car is approaching an intersection, and suddenly sees a puddle on the road to the left getting brighter - a visual world model like this might extrapolate a scenario in which the brightness is the result of a car moving towards the intersection, assigning it some probability, and assign another probability to the scenario that it's just a flickering headlight, and the car would then decide whether and how much to slow down.

In this example there is a sensor, but it definitely doesn't tell the robot "exactly what is there", and while we could try to write rules about what it should do, the Bitter Lesson tells us it's better to just let it create its own model.
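
To make the idea concrete, here is a tiny sketch of that kind of scenario-weighted decision. The scenarios, probabilities, risk numbers and threshold are all invented for illustration; no real AV stack works off a hard-coded list like this:

    # Hypotheses a world model might propose for "the puddle is getting brighter".
    scenarios = [
        {"explanation": "oncoming car's headlights",         "p": 0.30, "risk_if_ignored": 9.0},
        {"explanation": "flickering streetlight reflection", "p": 0.65, "risk_if_ignored": 0.5},
        {"explanation": "cyclist with a bright lamp",        "p": 0.05, "risk_if_ignored": 7.0},
    ]

    expected_risk = sum(s["p"] * s["risk_if_ignored"] for s in scenarios)

    SLOW_DOWN_THRESHOLD = 2.0  # made-up number
    if expected_risk > SLOW_DOWN_THRESHOLD:
        print("slow down before the intersection")
    else:
        print("maintain speed")

The interesting part is the first step, proposing plausible scenarios for an ambiguous observation, which is exactly what a learned world model could supply.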

reply
Chilko
21 days ago
[-]
I have no expertise in this area, but my assumption is that this could help for a broader sort of object/world permanence for robots - e.g. if something is no longer visible to the robot's sensors (e.g. behind an obstacle, smoke, etc) then it could use a model based on this type of tech to maintain a short-term estimate of its surroundings even when operating blind.
reply
sangnoir
21 days ago
[-]
> Robots don't need to guess what they can't see, they would have sensors to tell them exactly what is there (like a self driving car).

Self-driving cars have cameras as part of their sensor suite, and have models to make sense of sensor data. Video will help with perception and classification (understanding the world), with no agency needed. Game-playing will help with planning, execution, and evaluation. Both functions are necessary, and those that come after rely on earlier capabilities.

reply
youoy
22 days ago
[-]
> The training pipeline will be: raw video -> Genie 2 -> game engine with rules -> physical robot

One of those arrows is not like the others

reply
sangnoir
21 days ago
[-]
The final step is an oversimplification: purpose-built simulator -> deconstructed robot on a lab workbench -> controlled space -> "real world" with constraints -> real world

Any model would have to succeed in one stage before it can proceed to the next one.

reply
adverbly
21 days ago
[-]
At the risk of sounding repetitive, one of those arrows is not like the others.
reply
sangnoir
21 days ago
[-]
...and?
reply
j_timberlake
20 days ago
[-]
No, they actually did use a genie-like model to train robots on household chores.

Page 8 of the Genie 1 paper: https://arxiv.org/abs/2402.15391

reply
JTyQZSnP3cQGa8B
22 days ago
[-]
I don't understand how that is relevant. I certainly would not want household androids unless I'm completely disabled.
reply
theshackleford
21 days ago
[-]
> I certainly would not want household androids unless I'm completely disabled.

That's nice. I'm not completely disabled, but I am disabled, and I very much would appreciate them, as my capability to do things over the longer term is very much not going to go in the direction of improving. As it is, there are a lot of things I now rely on people for, that at one time, I did not.

Whilst I recognise it's probably not going to happen in a time span that is useful to me, I do wish it could, so that I could be less of a burden on those around me, and maintain a relative level of independence.

reply
throwaway2037
21 days ago
[-]
No trolling: You wouldn't want robots to mow your lawn, maybe do the boring bits of cooking (prep! endless stirring!), clean your house, wash your clothes? Man, sign me up!
reply
furyofantares
21 days ago
[-]
A lot of people are disabled right now!

Unless you have a young/quick death, there's a really good chance you will be, too.

reply
Menu_Overview
22 days ago
[-]
Well, what's next? Beyond prototyping, I imagine this is an early step towards more practical agents building their own world model. Better problem solving.

Prompt: Here's a blueprint of my new house and a photo of my existing furniture. Show me some interior design options.

reply
awfulneutral
22 days ago
[-]
Well, in the future you could imagine that instead of programming a game, you can just generate each individual frame on the fly at 60fps. You could be playing 2D Mario and then the game could have him morph into 3D and take off into space or something. You could also generate any software or OS frontend on the fly really, if you can make it so the AI can keep track of your data and make it consistent enough to be usable. Does this have positive or negative value? I don't know.
reply
rlupi
21 days ago
[-]
We, humans, use dreams for consolidating memory and information recall, process emotions and rehearse feelings in different imagined contexts, mental housekeeping / pruning away partial or unnecessary information, replay recent events to review and analyze, etc.

Dreaming and sleeping are incredibly expensive; we spend 33% of our "availability" on average asleep.

This kind of work is a step toward building similar tools for general AI agents (IMHO).

reply
araes
20 days ago
[-]
Ruining the video game industry. The implication that the technology is plausible takes away all further interest in having anything except "end game content."

All motivation to make further games is removed, because now "somebody" can spool out a 3D adventure game instantly with a line of text. It implies you'll waste a year of your time and, just before release, out pops the dramatically better AI product to steal away all further business and render all the time you've spent meaningless. Everybody then waits indefinitely for the "endgame gear." https://xkcd.com/989/

Exactly like LLMs and image generators almost completely took away all business for normal writers and normal painters, because now all managers want is the AI. Now there's endless articles about how "somebody" prefers "AI" for every task. Now the market won't invest in anything unless it has "AI" in the name. Now people idiotically add "AI" to everything just to have the investment.

reply
3abiton
21 days ago
[-]
This is an incredible start. The potential is immense; yes, there are kinks, but in 10 years?
reply
ThouYS
22 days ago
[-]
same q here. what can I do with this "world model" that I can't do with a game like Minecraft or Counter-Strike?

asked the same thing a while back, and the answers boiled down to "somehow helps RL agents train". but how exactly? no clue

reply
ogogmad
22 days ago
[-]
Making a computer game is very expensive and time-consuming. This technology might allow a 12 year old to produce a fully working AAA-quality game on their own for almost nothing. But sigh it's an early demo that needs some improving.

[edited out some barbs I wrote because I find some comments on this website REALLY annoying]

reply
ThouYS
22 days ago
[-]
lol
reply
xandrius
21 days ago
[-]
Nothing is of value until it is.
reply
aithrowawaycomm
22 days ago
[-]
It is jaw-dropping and dismaying how for-profit AI companies use long-standing terms like "world model" and "physics" when they mean "video game model" and "video game physics." Or, as you can plainly see, "models gravity" when they mean "models Red Dead Redemption 2's gravity function, along with its cinematic lighting effects and Rockstar's distinctively weighty animations." Which is to say Google is not modeling gravity at all.

I will add the totally inconsistent backgrounds in the "prototyping" example suggests the AI is simply cribbing from four different games with a flying avatar, which makes it kind of useless unless you're prototyping cynical AI slop. And what are we even doing here by calling this a "world model" if the details of the world can change on a whim? In my world model I can imagine a small dragon flying through my friend's living room without needing to turn her electric lights into sconces and fireplaces.

To state the obvious: if you train your model on thousands of hours of video games, you're also gonna get a bunch of stuff like "leaves are flat and don't bend" or "sometimes humans look like plastic" or "sometimes dragons clip through the scenery," which wouldn't fly in an actual world model. Just call it "video game world model!" Google is intentionally misusing a term which (although mysterious) has real meaning in cognitive science.

I am sure Genie 2 took an awful lot of work and technical expertise. But this advertisement isn't just unscientific, it's an assault on language itself.

reply
empath75
22 days ago
[-]
> It is jaw-dropping and dismaying how for-profit AI companies use long-standing terms like "world model" and "physics" when they mean "video game model" and "video game physics." Or, as you can plainly see, "models gravity" when they mean "models Red Dead Redemption 2's gravity function, along with its cinematic lighting effects and Rockstar's distinctively weighty animations." Which is to say Google is not modeling gravity at all.

That's because it's using video game footage as training data, since it's cheap and easy to generate. It would not be simulating video game gravity if it were trained on real-world video inputs.

reply
spyder
21 days ago
[-]
A simulated world is also a world, and I can easily imagine that if it had been trained on real-world data it would have learned some of the real world's physics the same way, as the big video gen models are already showing some of that. But all these models still seem very sample-inefficient: they need a lot of data to learn some basic rules of the world(s), and even then they are far from a human-like model that includes math and logic to model the world more accurately...
reply
ricardobeat
22 days ago
[-]
Remembering off-screen objects, generating spatially consistent features, modeling physical interactions and lights, understanding what "up the stairs" means, all seem to warrant talking about a world model, because that's exactly what's required to do these things compared to simply hallucinating video sequences.
reply
brap
21 days ago
[-]
I agree, but

>if you train your model on thousands of hours of video games

What if you train the same model on thousands of hours of sensor data from real, physical robots?

reply
seydor
21 days ago
[-]
Models are phenomenological descriptions of reality, and so are video games
reply
Const-me
22 days ago
[-]
The scrolling doesn’t work in my MS Edge so I opened the page in Firefox. Firefox has “Open Video in New Tab” context menu command. When viewed that way, the videos are not that impressive. Horrible visual quality, Egyptian pyramids of random shapes which cast round shadows, etc.

I have a feeling many AI researchers are trying to fix things which are not broken.

Game engines are not broken; no reasonable amount of AI TFlops is going to approach a professional with UE5. DAWs are not broken; no reasonable amount of AI TFlops is going to approach a professional with Steinberg Cubase and Apple Logic.

I wonder why so many AI researchers are trying to generate the complete output with their models, as opposed to training models to generate some intermediate representation and/or realtime commands for industry-standard software?
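
To make that alternative concrete, here's a rough sketch of what an intermediate representation could look like; the command schema and the dispatch function are invented for illustration:

    def dispatch_to_engine(cmd):
        print("engine <-", cmd)  # stand-in for a bridge into UE5, Blender, a DAW, etc.

    # Instead of emitting pixels, the model emits structured commands that
    # existing industry-standard software executes deterministically.
    commands = [
        {"op": "spawn",     "asset": "door_frame", "position": [4.0, 0.0, 2.5]},
        {"op": "spawn",     "asset": "button",     "position": [4.6, 1.1, 2.5]},
        {"op": "set_light", "target": "hall_lamp", "intensity": 0.7},
        {"op": "animate",   "target": "player",    "clip": "walk", "speed": 1.2},
    ]

    for cmd in commands:
        dispatch_to_engine(cmd)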

reply
bix6
22 days ago
[-]
Genuine question: What is the point of telling us about this if we can’t use it? Is it just to flex on everyone?
reply
ChrisArchitect
22 days ago
[-]
The best minds of a generation went from thinking about how to make people click ads to how to generate 3d video game worlds.
reply
Workaccount2
22 days ago
[-]
The best minds of the generation are on wall street trying to figure out how to quickly spot inefficiently priced options 1% more often.

Seriously, I wish more than anything I was kidding.

reply
adventured
22 days ago
[-]
The best minds were never working on getting people to click on ads. That was an internal industry narrative so people could feel better about themselves.
reply
fragmede
22 days ago
[-]
seems more like an external narrative so people can feel worse about the world
reply
echelon
22 days ago
[-]
To stop competing startups from getting funding.

Decart (Oasis) raised $25 million at $500 million valuation.

World Labs raised $230 million.

reply
UncleOxidant
21 days ago
[-]
Not sure about that. Sometimes Google legitimizes a field. I was in a kite power startup back in 2019. Before Google canceled its Makani kite power project, VCs and angels would at least talk to us - it gave them some frame of reference: "Oh, this is like the kite power thing Google is doing?" "Right, but on a much smaller scale". After they canceled Makani in the summer of 2019 it was crickets. We folded by the end of 2019. They figured if Google couldn't make it work then it probably wasn't something to invest in.
reply
mhld
22 days ago
[-]
Some kind of marketing strategy that actually nobody understands
reply
jazzyjackson
22 days ago
[-]
It's not that opaque, it's recruitment. Basically the same marketing as a university. "We do state of the art research here. If you are a talented researcher who wants to advance the field, you'll want to work here"

Now, how Google plans to make money with all this bleeding edge research, that's the mystery.

reply
xnx
22 days ago
[-]
Often to establish that the authors were first in the space for when competitors announce their tech.
reply
ilaksh
22 days ago
[-]
They were not though, this is very similar to the one that came out last month. https://arxiv.org/html/2411.00769v1
reply
justlikereddit
22 days ago
[-]
[flagged]
reply
throwaway2037
21 days ago
[-]
Why is this downvoted and flagged? I am laughing so hard at the second sentence that I am on the verge of tears. Nothing has made me laugh so hard in a while. This part really did it for me:

    > get absorbed into some hype mill startup
Wait... isn't that basically YC?
reply
tootie
22 days ago
[-]
It's PR but it's also meant to entice. Let the world know Google is #1 for Gen AI, convince researchers to join Google, convince investors to boost the stock price, make Elon Musk grit his teeth. That kind of thing. In the short term, it may provide a bump in interest for existing AI products from Google.
reply
spencerchubb
21 days ago
[-]
Researchers want to publish

Recruiting

reply
mupuff1234
22 days ago
[-]
An artifact for their promotion packet.
reply
binalpatel
22 days ago
[-]
This is super impressive.

Interesting they're framing this more from the world model/agent environment angle, when this seems like the best example so far of generative games.

720p realtime mostly consistent games for a minute is amazing, considering stable diffusion was originally released 2ish years ago.

reply
uoaei
22 days ago
[-]
Pixelspace is an awful place to be generating 3D assets and maintaining physical self-consistency.
reply
jeroenvlek
22 days ago
[-]
Ultimately even conventional 3d assets are rendered into pixelspace. It all comes down to the constraints in the model itself.
reply
psb217
22 days ago
[-]
A key strength of conventional 3d assets is that their form is independent of the scenes in which they will be rendered. Models that work purely in pixel space avoid the constraints imposed by representing assets in a fixed format, but they have to do substantial extra work to even approximate the consistency and recomposability of conventional 3d assets. It's unclear whether current approaches to building and training purely pixel-based models will be able to achieve a practically useful balance between their greater flexibility and higher costs. World Labs, for example, seems to be betting that an intermediate point of generating worlds in a flexible but structured format (NERFs, gauss splats, etc) may produce practical value more quickly than going straight for full freedom and working in pixel space.
reply
devonsolomon
22 days ago
[-]
Yesterday I laughed with my brother about how harsh people on the internet were about the World Labs launch ("you can only walk three steps, this demo sucks!"). I was thinking, "this was unthinkable a few years ago, this is incredible".

People of the internet, you were right. Now, this is incredible.

reply
bilbo0s
21 days ago
[-]
World Labs was kind of laughable. But at least you laughed.

Now?

I mean, I don't know man?

With this Genie 2 sneak peek, it all just makes World Labs' efforts look sad. Did they really think better-funded independents and majors would all not be interested in generating 3D worlds?

This is a GUBA moment. If you're old enough to know, then you know.

reply
mdrzn
22 days ago
[-]
Wow.. I can't even imagine where we'll be in 5 or 10 years from now.

Seems that it's only "consistent" for up to a minute, but if the progress keeps up at this rate.. just wow.

reply
netdevphoenix
22 days ago
[-]
Progress is not linear. For all we know, in 2027 things will slow down to a virtual halt for the next 30 years. Look at how much big science progressed in the first 20 years of the 19th century/20th century and look how little it has progressed in the first 20 years of this century. We are on the downlow compared to the last centuries and even if you look at crisp or deep learning, they are not as impactful NOW as let's say the germ theory of disease, evolution, the discovery of the double helix structure or general relativity was. Almost a quarter of a century gone and we don't have much to show for it.

For reference:

19th century

evolution by natural selection as science

electromagnetism

germ theory of disease

first law of thermodynamics

--------------------------------------------

20th century

general relativity

quantum mechanics

dna structure

penicillin

big bang theory

--------------------------------------------

21st century

crisp

deep learning

reply
dooglius
22 days ago
[-]
The things you list for previous centuries aren't limited to the first 20 years
reply
netdevphoenix
22 days ago
[-]
19th century: electromagnetism, the voltaic pile, the double slit experiment for the light wave theory

20th century: general/special relativity, radioactive decay, discovery of the electron

21st century: crisp and deep learning

It's hard to argue that crisp and deep learning put TOGETHER look as impactful as the big science of the first 20 years of the previous century.

reply
dekhn
22 days ago
[-]
its called crispr, not crisp.
reply
samvher
22 days ago
[-]
100 years later, sure. What about in December 1924?
reply
w10-1
22 days ago
[-]
crispr variants have not particularly improved treatments.

But DNA sequencing and biologics have revolutionized medicine and changed lives.

Also, the computer-as-phone took it from roughly 100M mostly business users buying optical disks to 3B+ everyday people getting regular system updates and apps on demand, accessing real-time information. That change alone far outweighs the impact of anything produced by advanced physics.

As a result we, as developers, now have the power to deliver both messages and experiences to the entire world.

Ideas are cheap, and progress is virtually guaranteed in intellectual history. But execution is exquisitely easy to get wrong. Genie 2 is just Google's first bite at this apple, and milestones and feedback are key to getting something as general as AI right. Fingers crossed!

reply
Workaccount2
22 days ago
[-]
>Look how little it has progressed in the first 20 years of this century

This is naivete on the scale of "Cars were much safer 70 years ago".

reply
netdevphoenix
21 days ago
[-]
Can you please elaborate further? My point is that truly world-shattering, groundbreaking scientific progress has slowed down significantly this century compared to the previous ones (comparing just the first 20 years of each century).
reply
beeflet
21 days ago
[-]
These game-video models remind me of the dream-like "Mind Game" game described in Ender's Game, because of how it has to spontaneously come up with a new environment to address player input. The game in that book is also described as an AI.
reply
mdtrooper
21 days ago
[-]
Yeah, I think the same.
reply
bearjaws
22 days ago
[-]
> Genie 2 is capable of remembering parts of the world that are no longer in view and then rendering them accurately when they become observable again.

This is huge; the Minecraft demos we saw recently were just toys because you couldn't actually do anything in them.

reply
psb217
22 days ago
[-]
It's worth keeping in mind that "there exists X such that Y is true" is not the same as "Y is true for all X". People love using these sorts of statements since they're technically true as written, but most people will read them in a way that's false. Eg, the statement is true for the Minecraft demos, and for any model which doesn't exhibit literally zero persistence for (temporarily) non-visible state.
reply
notsylver
22 days ago
[-]
I doubt it, but it would be interesting if they recorded Stadia sessions and trained on that data (... somehow removing the HUD?); it seems like that would be the easiest way for them to get the data for this.
reply
blixt
22 days ago
[-]
Seems somewhat likely to me. They probably even trained a model to do both frame generation and upscaling to allow the hardware to work more efficiently while being able to predict the future based on user input (to reduce perceived latency). Seems like Genie is just that but extrapolated much further.
reply
rndmize
22 days ago
[-]
These clips feel like watching someone dream in real time. Particularly the door ones, where the environment changes in wild fashion, or the middle NPC one, where you see a character walk into shadow and mostly disappear, and a different character walk out.
reply
jdlyga
22 days ago
[-]
It's very cool, but we've gotten too many of these big bold announcements with no payoff. All it takes is a very limited demo and we'd be much happier.
reply
rishabhparikh
22 days ago
[-]
I'm guessing it would be far too expensive to make a free demo
reply
ddtaylor
22 days ago
[-]
This is very impressive technology and I am active in this space. Very active. I make an (unreleased) Steam game that helps users who don't know how to program create their own games. I also (unknowingly) co-authored tools that K-12 schools and universities are using to teach game programming.

For the time being I will gloss over the fact this might just be a consumer facing product for Google that ends up having nothing to do with younger developers.

I'm torn between two ideas:

a. Show kids awesome stuff that motivates them to code

b. Show kids how to code something that might not be as awesome, but they actually made it

On the one hand you want to show kids something cool and get them motivated. What Google is doing here is certainly capable of doing that.

On the other hand I want to show kids what they can actually do and empower them. The days of making a game on your own in your basement are mostly dead, but I don't think that means the idea of being someone who can control a large amount of your vision - both technical and non-technical - is any less important.

Not everyone is the same either. I have met kids who would never spend a few hours learning some Python with pygame to get a few rectangles and sprites on screen, but who might get more interested if they saw something this flashy. But experience also tells me those kids are far less likely to get much value from a tool like this beyond entertainment.
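
(For concreteness, the "few hours of Python with pygame" result mentioned above is roughly the program below; it assumes only that pygame is installed, and it's an illustrative sketch rather than anything from a particular curriculum.)

    import pygame

    pygame.init()
    screen = pygame.display.set_mode((640, 480))
    clock = pygame.time.Clock()
    x, y = 320, 240

    running = True
    while running:
        for event in pygame.event.get():
            if event.type == pygame.QUIT:
                running = False

        keys = pygame.key.get_pressed()
        x += (keys[pygame.K_RIGHT] - keys[pygame.K_LEFT]) * 5
        y += (keys[pygame.K_DOWN] - keys[pygame.K_UP]) * 5

        screen.fill((30, 30, 30))                                # clear the frame
        pygame.draw.rect(screen, (200, 60, 60), (x, y, 40, 40))  # the "player"
        pygame.display.flip()
        clock.tick(60)                                           # cap at 60 fps

    pygame.quit()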

I have a 14-year-old son myself and I struggle to understand how he sees the world in this capacity sometimes. I don't understand what he thinks is easy or hard, and it warps his expectations drastically. I come from a time period where you would grind for hours at a terminal pecking in garbage from a magazine to see a few seconds of crappy graphics. I don't think there should be meaningless labor attached to programming for no reason, but I also think that creating a "cost" to some degree may have helped us. Given two programs to peck into the terminal, which one do you peck? Very few of us had the patience (and lack of sanity) to peck them all.

reply
taneq
21 days ago
[-]
I don't see any mention of DIAMOND (https://diamond-wm.github.io/), which does something pretty similar: training a model to predict a game or other 3D world based on videos of gameplay plus corresponding user inputs.

It's fascinating how much understanding of the world is being extracted and learned by these models in order to do this. (For the 'that's not really understanding' crowd, what definition of 'understanding' are you using?)

reply
qwertox
22 days ago
[-]
This is... something different. It will be interesting to see how we will integrate our current 3D tooling into that prompt-based world. Sometimes "place a button next to the door" isn't the same as selecting a button and then clicking on the place next to the door, as it is today, or sculpting a terrain with a brush - all heavily 3D-oriented operations, involving transformation matrix calculations, while that prompt-based world is built through words.

The current tooling we have is just way too good to just discard it, think of Maya, Blender and the like. How will these interfaces, with the tools they already provide, enable sculpting these word-based worlds?

I wonder if some kind of translator will be required, one which precisely instructs "User holds a brush pointing 33° upwards and 56° to the left of the world's x-axis with a brush consisting of ... applied with a strength of ...", or how this will be translated into embeddings or whatever will be required to communicate with that engine.

This is probably the most exciting time for the CG industry in decades, and this means a lot, since we've been seeing incredible progress in every area of traditional CG generation. Also a scary time for those who learned the skills and will now occasionally see some random persons doing incredible visuals with zero knowledge of the entire CG pipeline.

reply
enbugger
22 days ago
[-]
Just like with images, this will never be in good shape to actually use for a real product, as it discards details completely, leaving a generic third-person controller animation.

What this should tell you instead is that things are really bad on the training data side if you have to start scraping billions of game streams on the internet - it's hard to imagine a bigger chunk of training data than this. Stagnation incoming.

reply
amaurose
21 days ago
[-]
I am wondering if this sort of thing could be used in the real world, in particular as a navigation helper for a blind pedestrian. Products like Orcam have shown a cam + headphones can more or less easily be packed onto some glasses (for OCR). Navigation helper tools have existed since the 80s, but all they basically did until now is scan the environment in a primitive way and use some sort of vibration to alert the user. This is very unspecific, and mostly useless in real life. However, having a vision AI that looks down the path of a blind person could potentially revolutionize this sort of application. For obstacle detection and navigation help. From "Careful, construction site on the sidewalk, 20 meters ahead" to "tactile paving 1 meter to your left". Let's take the game to the streets! If the tech is there, that sounds like a good startup idea...
reply
brap
21 days ago
[-]
While this is very (very) cool, what is the upside to having a model render everything at runtime, vs. having it render the 3D assets during development (or even JIT), and then rendering it as just another game? I can think of many reasons why the latter is preferable.
reply
gavmor
21 days ago
[-]
To me, keeping a world state in sync with rapidly changing external state is the most compelling application. Something like dockercraft: https://github.com/docker/dockercraft
reply
asdaqopqkq
21 days ago
[-]
First thing that comes to mind: what about multiplayer?

Can we let another models generate in this models's world and vice versa?

What if both output in a single instance of a world? What if both output in their own private world and only share data about location and some other metrics?

reply
artninja1988
22 days ago
[-]
Looking at the list of authors, is this from their open endedness team? I found their position paper on it super convincing https://arxiv.org/abs/2406.02061
reply
warkdarrior
22 days ago
[-]
Did you link the wrong Arxiv paper? https://arxiv.org/abs/2406.02061 does not look like a position paper nor does it share any authors with this Genie 2 work.
reply
artninja1988
22 days ago
[-]
Yes, I meant this paper https://arxiv.org/abs/2406.04268 Should have double checked, sorry and thank you for pointing it out
reply
Stevvo
22 days ago
[-]
You can see artifacts common in screen-space reflections in the videos. I suspect they are not due to the model rendering reflections based on screen-space information, but the model being trained on games that render reflections in such a manner.
reply
m3kw9
22 days ago
[-]
“Generating unlimited diverse training environments for future general agents” - it may seem unlimited, but past a certain point there will be a pattern. I don't buy that an AI can use a static model and train itself with data generated from it.
reply
corysama
22 days ago
[-]
For quite a while now David Holz of Midjourney has mused that videogames will be AI generated. Like a theoretical PlayStation 7 with an AI processor replacing the GPU.

But, I didn’t expect this much progress towards that quite this fast…

reply
kypro
22 days ago
[-]
Agreed. All I'd say is that these demos look quite limited in their creativity and depth. Good video games are far more than some graphics with a movable character and action states.

A good video game is far more about the world building, the story, the creativity or "uniqueness" of the experience, etc.

Currently this seems to generate fairly generic looking and shallow experiences. Not hating though. It's early days obviously.

reply
gcr
22 days ago
[-]
I've had the idea for a Backrooms-style hallucinatory generative videogame for a while. Imagine being able to wander through infinitely generated surreal indoor buildingscapes that were rendered in close-to-realtime.

It would play to the medium's strengths -- any "glitches" the player experiences could be seen as diegetic corruptions of reality.

The moment we get parameterized NeRF models running in close-to-realtime, I want to go for it.

reply
doctorpangloss
22 days ago
[-]
If only it were that simple. Google spent $10b developing Stadia, where was the big hit game from that?

These DeepMind guys play Factorio, they don't play Atari games or shooters, so why aren't they thinking about that? Or maybe they are, and because they know a lot about Factorio, they see how hard it is to make?

There's a lot of "musing" as you say.

reply
fowlie
21 days ago
[-]
One cool use case for this could be "generative hybrid video meetings"; when I participate in a teams meeting and the majority is in the same physical room, the video conference software could read the wall camera video feed and generate individual video streams of each person as if they sat just in front of me.

Of all things this must be the most boring use case for this crazy looking new technology. But hybrid video meetings have always annoyed me and I think to myself that surely there must be a better way (and why hasn't it arrived yet?).

reply
lacoolj
21 days ago
[-]
OpenAI launched Sora (quite a while ago now), so Google needs to fire back with something else groundbreaking.

I love the advancement of the tech but this still looks very young and I'd be curious what the underlying output code looks like (how well it's formatted, documented, organized, optimized, etc.)

Also, this seems oddly related to the recent post from WorldLabs https://www.worldlabs.ai/blog. Wonder if this was timed to compete directly and overtake the related news cycle.

reply
whiplash451
21 days ago
[-]
I also find the timing vs World Labs demo disturbing.
reply
alphabetting
21 days ago
[-]
What's disturbing? In all likelihood the close timing was World Labs rushing to get their demo out the door knowing this was coming, because they wouldn't have gotten nearly the hype they did if this had come out first.
reply
smusamashah
22 days ago
[-]
It's so much like my lucid dreams, where the world sometimes stays consistent for a while when I take control of it. It's a strange feeling seeing a computer hallucinating a world just like I hallucinate a world in dreams.

This also means that my dreams will keep looking like this iteration of Genie 2, but the computer will scale up and the worlds won't look anything like my dreams anymore in future versions (it's already more colorful anyway).

I remember image generation used to look like dreams too in the beginning. Now it doesn't look anything like that.

reply
MrTrvp
22 days ago
[-]
Soon enough I imagine we'll have dream-state-to-cohesive-reality models. Our desires and world events can be dissected and analyzed at a fine grain, hinting authorities at our intent before we know what it means to us /s.
reply
andelink
21 days ago
[-]
Is this type of on-the-fly graphics generation more expensive than purely text based LLMs? What is the inference energy impact of these types of models?
reply
jckahn
22 days ago
[-]
At first I was excited to see a new model, but then I saw no indication that the model is open source so I closed the page.
reply
dartos
21 days ago
[-]
> Genie 2 can generate consistent worlds for up to a minute, with the majority of examples shown lasting 10-20s.
reply
josvdwest
17 days ago
[-]
I understand the value of infinite NPC dialogues and story arcs, but why do we need live scene generation? Don't we already get that with procedural generation?
reply
sergiotapia
22 days ago
[-]
Will the GPU go the way of the soundcard, and we will all purchase an "LPU"? Language Processing Unit for AIs to run fast?

I remember there was a brief window where some gamers bought a PhysX card for high-fidelity physics in games. Ultimately they rolled that tech into the CPUs themselves, right?

reply
0x1ceb00da
22 days ago
[-]
The graphics stuff in modern GPUs is just a software layer on top of a generic processing unit. The name is a misnomer.
reply
jsheard
22 days ago
[-]
Partially true, a significant chunk of modern GPUs are really just very wide general purpose processors, but they do still have fixed-function silicon specifically for graphics and probably will for the foreseeable future. Intel tried to lean into doing as much as possible in general purpose compute with their Larrabee GPU project but even that still had fixed-function texture units... and the concept was ultimately a failure which hasn't been revisited.
reply
CaptainFever
22 days ago
[-]
As a game developer, I'm impressed and thinking of ideas of what to do with this kind of tech. The sailboat example was my favourite.

Depending on how controllable the tech ends up being, I suppose. Could be anywhere from a gimmick (which is still nice) to a game engine replacement.

reply
echelon
22 days ago
[-]
You could compress down a game to run on cheap hardware acceleration. No more Unreal Engine with crazy requirements. Once the hallucinations are fixed, you even get better lighting.

This is the Unreal Engine killer. Give it five years.

reply
noch
22 days ago
[-]
> This is the Unreal Engine killer. Give it five years.

We need to calm down with the clickbait-addled thinking that "this new thing kills this established powerful tested useful thing." :-)

Game developers have been discussing these tools at length, after all, they are the group of software developers who are most motivated to improve their workflow. No other group of software developers comes close to gamedevs' efficiency requirements.

The 1 thing required for serious developers is control. As such, game engines like Unreal and in-house engines won't die.

Generative tools will instead open up a whole new, but quite different, way of creating interactive media and games. Those who need maximum control over every frame, every millisecond and every CPU cycle will still use engines. The rest who don't will be productive with generative tools.

reply
echelon
22 days ago
[-]
> gamedevs' efficiency requirements

These models won't need you to retopo meshes, write custom shaders, or optimize Nanite or Lumen gameplay. They'll generate the final frames, sans traditional graphics processing pipeline.

> The 1 thing required for serious developers is control

Same with video and image models, and there's tremendous work being done there as we speak.

These models will eventually be trained to learn all of human posture and animation. And all other kinds of physics as well. Just give it time.

> Those who need maximum control over every frame and every millisecond and CPU cyle will still use engines.

Why do you think that's true? These techniques can already mimic the physics of optics better than 80 years of doing it with math. And they're doing anatomy, fluid dynamics, and much more. With far better accuracy than game engines.

These will get faster and they will get controllable.

reply
noch
22 days ago
[-]
> Why do you think that's true?

> These will get faster and they will get controllable.

Brother, you're preaching to the choir. I've been shilling generative tools for gamedev far harder than you are in your reply. :-)

But I'm just relaying what actual gamedevs, working and writing code right now, need today and for the foreseeable future of projects already started or planned. As Mike Acton says, "the problem is the problem".

> These techniques can already mimic the physics of optics better than 80 years of doing it with math.

I encourage you to talk to actual gamedevs. When designing a game, you aren't trying to mimic physics: you're trying to make a simulation of physics that feels a certain way that you want it to play. This applies to fluid dynamics, lighting/optics, everything.

For example, if I'm making a sailing simulator, I need to be able to script the water at the points where it matters for gameplay and game-feel, not simulate real physics. I'm willing to break the rules of physics so that my water doesn't act or look like real water but feels good to play.

Movement may be motion captured, but animation is tweaked so that the characters control and play in a way that the game designer feels is correct for his game.

If you haven't designed a game, I encourage you to try to make a simple space invaders clone over the weekend, then think about the physics in it and try to make it feel good or work in an interesting way. Even in something that rudimentary, you'll notice that your simulation is something you test and tweak until you arrive at parameters that you're happy with but that aren't real physics.
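To make that concrete, here's the flavour of tuning I mean, as a minimal sketch in plain Python (the constants and the asymmetric gravity are hypothetical, chosen for feel rather than taken from any real engine or physics reference):

    # A sketch of "game feel" tuning: gravity is stronger on the way down than
    # on the way up, a common trick that makes jumps feel snappy even though it
    # is not physically consistent. Every constant here is found by playtesting.
    JUMP_VELOCITY = 14.0   # picked because it feels right, not derived
    GRAVITY_UP = 30.0      # applied while the character is rising
    GRAVITY_DOWN = 55.0    # applied while falling, so descents feel weighty

    def simulate_jump(dt=1 / 60, max_steps=600):
        """Integrate one jump; return (time_to_peak, total_air_time) in seconds."""
        y, vy, t = 0.0, JUMP_VELOCITY, 0.0
        time_to_peak = None
        for _ in range(max_steps):
            g = GRAVITY_UP if vy > 0 else GRAVITY_DOWN
            vy -= g * dt
            y += vy * dt
            t += dt
            if time_to_peak is None and vy <= 0:
                time_to_peak = t
            if y <= 0:  # back on the ground
                break
        return time_to_peak, t

    if __name__ == "__main__":
        peak, air = simulate_jump()
        print(f"time to peak: {peak:.2f}s, total air time: {air:.2f}s")

Neither gravity constant corresponds to anything a physically accurate simulator would hand you; they are parameters you iterate on until the jump feels right, which is exactly the kind of authorship I'm talking about.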

reply
echelon
21 days ago
[-]
I've written my own 2D and 3D game engines as well as worked in Unreal. I'm currently working on a controllable diffusion engine using Bevy.

I strongly disagree that you need to cater to existing workflows. There's so much fertile ground in taking a departure. Just look at what's happening with animation and video. People won't be shooting on Arri Alexas and $300,000 glass for much longer.

reply
noch
21 days ago
[-]
> I strongly disagree that you need to cater to existing workflows.

I didn't say that these tools need to though. :-)

I said that actual high-end game developers need precise control over every aspect of their game. A developer needs to be able to say something as simple as: "I want my particle system to run at 30fps, my cloth animation at 120fps, and my logic at 60fps."
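In engine terms that's nothing exotic, just per-system fixed timesteps. Here's a rough sketch of the idea in plain Python (a hypothetical structure, not any particular engine's API):

    # Each subsystem gets its own fixed timestep and accumulates frame time
    # independently, so it can tick at a different rate from the render loop.
    SYSTEMS = {
        "particles": 1 / 30,   # particle system ticks at 30 Hz
        "cloth":     1 / 120,  # cloth solver ticks at 120 Hz
        "logic":     1 / 60,   # game logic ticks at 60 Hz
    }

    def run(frame_dt=1 / 60, frames=120):
        """Advance all systems across `frames` render frames; count their ticks."""
        accumulators = {name: 0.0 for name in SYSTEMS}
        ticks = {name: 0 for name in SYSTEMS}
        for _ in range(frames):
            for name, step in SYSTEMS.items():
                accumulators[name] += frame_dt
                # Run as many fixed steps as fit into the accumulated time.
                while accumulators[name] >= step:
                    ticks[name] += 1  # the real update(name, step) would go here
                    accumulators[name] -= step
        return ticks

    if __name__ == "__main__":
        # Roughly {'particles': 60, 'cloth': 240, 'logic': 120} over two seconds.
        print(run())

The accumulator pattern is the whole point: the developer decides, per system, how simulation time gets spent. Whether a generative tool ever exposes that level of control is exactly what's at issue.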

> I've written my own 2D and 3D game engines as well as worked in Unreal. I'm currently working on a controllable diffusion engine using Bevy.

Then you know all that I'm suggesting already! You probably have a list of the typical problems that game engine programmers are trying to solve when they build their own engines or have to modify Unreal Engine itself. You could even just watch GDC[^0] or the Graphics Programming Conference[^1] and ask how these tools solve the problems discussed.

Generative tools will create a new way of making games or game assets, but they won't eliminate the current way of making games.

Since you're building these generative tools alongside your game, you can demonstrate how they solve the kinds of problems game engine programmers need to solve and there's no need for us to misrepresent either side of the equation. Just give a presentation or publish an essay showing engine problems being solved at the standard a typical studio needs.

[^0]: https://youtube.com/@gdconf?si=F_n4G4zxQSny8BNC

[^1]: https://youtube.com/playlist?list=PLLaly9x9rqjsXLW1tMFruyh_6...

reply
KaoruAoiShiho
22 days ago
[-]
This is where the GPU limits on China really hurt. Chinese companies have been dropping great proofs of concept, but because they are so compute-bottlenecked they can't ever really make something actually competitive or transformative.
reply
jerpint
21 days ago
[-]
I have a sneaking suspicion OpenAI will announce something very similar in a few days
reply
r721
21 days ago
[-]
reply
xcodevn
22 days ago
[-]
On a very similar theme, here is the work from World Labs (founded by Fei-Fei Li of ImageNet fame, et al.) about creating 3D worlds:

https://www.worldlabs.ai/blog

reply
momojo
21 days ago
[-]
I find this work much more exciting. They're not just teaching a model to hallucinate given WASD input. They're generating durable, persistent point clouds. It looks so similar to Genie2 yet they're worlds apart.
reply
ata_aman
22 days ago
[-]
We're about to have on-demand video content and games simply based on prompts. My prediction is we'll have "prompt marketplaces" where you can gen content based on 3rd party prompts (or your own). 3-5 years.
reply
rvz
22 days ago
[-]
Hmmm.... But we were told on HN that "Google is dying", remember? In reality, it isn't.

We'll see which so-called AI-companies are really "dying" when either a correction, market crash or a new AI winter arrives.

reply
tsunamifury
22 days ago
[-]
I'm guessing from the demo sophisticated indoor architectures do not work yet.
reply
worldmerge
22 days ago
[-]
This looks really cool. How can I use it? Like can I mix it with Unity/Unreal?
reply
k2xl
22 days ago
[-]
This is impressive, but why do they all still look like a video game? Could this render movie scenes with realistic-looking humans? I wonder if it's because the training set they use is mostly video games?
reply
nonameiguess
22 days ago
[-]
I highly doubt it. While there is no ceiling in principle on how good rendering can get, even with perfect knowledge of the physics of optics, the cost to compute that physics is too high not to cut some corners. Nature gives you this for free: every photon is deflected at exactly the right angle and frequency without anything needing to be computed. All you need is a camera to record it. At least for now, this is why every deep fake, digital de-aging, AI upscaling, grafting of Carrie Fisher's face onto a different actor, and CGI in general inevitably occupies the uncanny valley.
reply
xnx
22 days ago
[-]
> This is impressive, but why are they all looking still like a video game?

Many of the current AI models have their roots in games: Chess, Go, etc.

reply
wg0
21 days ago
[-]
Google is not coming in slow... This is magic. As a casual gamer and someone wanting to make my own game, this is black magic.

Lighting, gravity, character animation and what not internalized by the model... from a single image...!

reply
empiricus
22 days ago
[-]
Feed it the inputs from the real world and then it will recreate in its mind a mirror of the world. Some say this is what we do also, we live in a virtual reality created by our minds.
reply
rationalfaith
22 days ago
[-]
As impressive as this might seem let's think about fundamentals.

Statistical models will output a compressed mishmash of what they were trained on.

No matter how hard they try to cover that inherent basic reality, it is still there.

Not to mention the upkeep of training on new "creative" material on a regular basis, and the never-ending bugs due to non-determinism. Aside from contrived cases of looking up and synthesizing information (Search Engine 2.0).

The tech industry is over-investing in this area, exposing an inherent bias towards output rather than towards solving actual problems for humanity.

reply
zja
22 days ago
[-]
I love the outtakes section at the bottom. It made me laugh, but it also feels more transparent than a lot of the GenAI stuff that's being announced.
reply
42lux
22 days ago
[-]
I don't know, I get the excitement, but as soon as you turn around and there is something completely different behind you, it breaks the immersion.
reply
mrbungie
21 days ago
[-]
Google is doing the "look how we can do this, but you can't, and you won't with our help" routine with more force than ever.
reply
ganzuul
21 days ago
[-]
We are repeating the COVID virus scare but this time with software. Most people don't know the difference so this is respectful.
reply
aussieguy1234
21 days ago
[-]
If it can play video games that simulate the laws of physics, could it control a robot in the physical world?
reply
baalimago
21 days ago
[-]
To me, this is a bit like web3: Can't we already do this? What's the benefit?
reply
woctordho
21 days ago
[-]
We can already program Minecraft. We can also already program GTA6. But imagine interpolating between Minecraft and GTA6 such that all buildings are destroyable. That may be easier to achieve with AI than with traditional programming.
reply
swyx
21 days ago
[-]
I was wondering when Genie 1 came out and... it didn't seem to get much love? https://news.ycombinator.com/item?id=39509937 @dang was there a main thread here?
reply
stoicjumbotron
22 days ago
[-]
Do people within Google get to try it? If yes, how long is the approval process?
reply
andsoitis
21 days ago
[-]
Will the agents in these worlds realize the worlds were sparked by humans?
reply
ganzuul
21 days ago
[-]
They have nowhere to go if they do, so no. Realization is transcendental.
reply
david_shi
21 days ago
[-]
"On the back part of the step, toward the right, I saw a small iridescent sphere of almost unbearable brilliance. At first I thought it was revolving; then I realised that this movement was an illusion created by the dizzying world it bounded. The Aleph's diameter was probably little more than an inch, but all space was there, actual and undiminished. Each thing (a mirror's face, let us say) was infinite things, since I distinctly saw it from every angle of the universe. I saw the teeming sea; I saw daybreak and nightfall; I saw the multitudes of America; I saw a silvery cobweb in the center of a black pyramid; I saw a splintered labyrinth (it was London); I saw, close up, unending eyes watching themselves in me as in a mirror; I saw all the mirrors on earth and none of them reflected me; I saw in a backyard of Soler Street the same tiles that thirty years before I'd seen in the entrance of a house in Fray Bentos; I saw bunches of grapes, snow, tobacco, lodes of metal, steam; I saw convex equatorial deserts and each one of their grains of sand..."
reply
ingen0s
21 days ago
[-]
So when is Google Glass coming back to spawn this for my pleasure?
reply
infinite-hugs
21 days ago
[-]
Do you want the Matrix? Because this is how you get the Matrix.
reply
kouru225
21 days ago
[-]
Considering the new American Vice President publicly stated he was primarily politically influenced by a guy who wants "a humane alternative to genocide" using virtual reality... yeah, that's what they want.
reply
me551ah
22 days ago
[-]
So when can I try this?
reply
ilaksh
22 days ago
[-]
It's Google so I assume never. No model release, no product, no API, no detailed paper.

There was another quite similar model from a different group within the last month or so. I can't remember if they released any weights or anything or the name of it. But it was the same concept.

reply
vessenes
22 days ago
[-]
You'll need to wait until Baidu or AliBaba or Nvidia publish a competing model, unfortunately, if history is any guide.
reply
mhld
22 days ago
[-]
Probably when Genie 10 will get integrated on a Pixel phone.
reply
anthonymax
22 days ago
[-]
Wow, is artificial intelligence already creating this?
reply
lionkor
22 days ago
[-]
> deepmind.google uses cookies from Google to deliver and enhance the quality of its services and to analyze traffic. Learn more.

Yippee, finally Google posts a non-conforming cookie popup with no way to reject the ad cookies!

reply
diimdeep
22 days ago
[-]
What would be the equivalent of ChatGPT for world models, for them to really blow up in utility?
reply
singularity2001
21 days ago
[-]
Text-to-Roblox, maybe?
reply
maxglute
22 days ago
[-]
2000s graphics vibes.
reply
rougka
22 days ago
[-]
Waiting for OpenAI to take this concept and make it into a product
reply
robblbobbl
21 days ago
[-]
Release please
reply
bbstats
22 days ago
[-]
who is asking for this?
reply
De_333
21 days ago
[-]
looks amazing!
reply
xavirodriguez
22 days ago
[-]
uoou
reply
dangoodmanUT
21 days ago
[-]
this page loads like shit
reply
wildermuthn
22 days ago
[-]
The technology is incredible, but the path to AGI isn't single-player. Qualia is the missing dataset required for AGI. See attention-schema theory for how social pressures lead to qualia-driven minds capable of true intelligence.
reply
moralestapia
22 days ago
[-]
Not even a month ago HN was discussing Ben Affleck's take on actors and AI, somehow siding with him and arguing that the tech is "just not there", etc.

I'll keep my stance, give it two years and very realistic movies, with plot and everything, will be generated on demand.

reply
tartoran
22 days ago
[-]
AI can't yet generate images without awkward hallucinations. From that, to movies that make sense, to movies that people would actually want to watch (comparable to feature films) beyond the initial curiosity factor, is a long way, if there is one.
reply
moralestapia
22 days ago
[-]
ChatGPT (no Sora, no World Generation, etc...) was released two years ago almost to the day.

What you're talking about is a minor jump from the SOTA, much smaller than what we've already seen in these two years.

reply
Sateeshm
21 days ago
[-]
I'll take that bet
reply
moralestapia
21 days ago
[-]
Email on profile!

I'll match any 5-figure amount you propose. I also know an escrow service we can trust.

reply
moralestapia
19 days ago
[-]
Day two of the two year bet:

https://x.com/mrjonfinger/status/1865161230706520472

Let's do this, Shasseem.

reply
moralestapia
16 days ago
[-]
Day four of the two year bet:

https://x.com/MKBHD/status/1866152437838393797

Please, please, please take that bet my "South Asian" friend.

reply
tigerlily
22 days ago
[-]
I can.. see this being used to solve crime, even solving unsolved mysteries and cold cases, among other alternative applications.
reply
phtrivier
22 days ago
[-]
I don't understand your line of reasoning here. Are you picturing a situation where you would take a photo of a crime scene, and "jump" into a virtual model created from the photo, to help generate intuitions about where to go look for clues? Kinda like the CSI "enhance quality" meme, but on steroids?

That would be fun to use, but ultimately pointless. An AI model will generate things that are _statistically plausible_; solving crimes usually requires finding out the _truth_.

reply
tigerlily
22 days ago
[-]
You nailed it, and yes I was being lamely ironic. I am however terrified of a future where this type of thing happens, and people just go along with it instead of stating the obvious facts the way you just did.
reply
phtrivier
20 days ago
[-]
It's easy to be on the wrong side of the malfunction, and it would be obvious enough that people would do something about it.

Again, the RoboCop glitch scene: in real life, Kinney's family would have sued, I guess?

reply
mosdl
22 days ago
[-]
Remake Blade Runner but with the twist that the snake scale was never actually there.
reply
YeGoblynQueenne
22 days ago
[-]
Hey, DeepMind folks, are you listening? Listen. We believe you: you can conquer any virtual world you put your mind to. Minecraft, Starcraft, Warcraft (?), Atari, anything. You can do it! With the power of RL and Neural Nets. Well done.

What you haven't been able to do so far, after many years of trying, is to go from the virtual to the real. Go from Arkanoid to a robot that can play, I dunno, squash, without dying. A robot that can navigate an arbitrary physical location without drowning, or falling off a cliff, or getting run over by a bus. Or build any Lego kit from instructions. Where's all that?

You've conquered games. Bravo! Now where's the real world autonomy?

reply
sdenton4
21 days ago
[-]
reply
YeGoblynQueenne
21 days ago
[-]
Tech demo, doesn't generalise.
reply
sdenton4
21 days ago
[-]
Well, Waymo.
reply
YeGoblynQueenne
19 days ago
[-]
"Well Waymo" is not DeepMind.

Look. The other poster also said "Waymo" but I'm talking about DeepMind. It's DeepMind that promises to conquer the world with Deep Reinforcement Learning, and it's DeepMind that keeps showing us how great their DRL agents work in virtual worlds, like Minecraft or Starcraft, or how well they work on Chess and Go, but they still haven't been able to demonstrate the application of those powerful learning approaches to real-world environments, except for very strictly controlled ones. Waymo's stuff works in the real world (although they do have remote safety drivers, much as they try to downplay the fact), but they're also not pretending that they'll do it all with one big DRL "generalist" agent. That's DeepMind's schtick.

For example, it was, I believe, DeepMind that recently publicised some results about legged robot football, where the robots were controlled by agents trained with DRL in a simulation. That's robot football: two robots (yeah, no teams) kicking a ball in the safest of safe environments: a (reduced-size) football field with artificial grass, probably padded underneath (because robots), and no other objects in the play area (except anxious researchers who have to pull the robots back onto their feet once in a while). Running in the physical world in principle, but in practice nothing but a tech demo.

Or take the other Big Idea, where they had a few dozen robot arms reaching for various little plastic bits in a (specially-made) box to try and learn object manipulation by real-world DRL. I can find a link to those things if you want, but that robot arm project was a few years ago and you haven't heard anything from them since because it was a whole load of overpromising and it failed.

That kind of thing just doesn't generalise. More than that: it's a total waste of time and money. And yet DeepMind keeps banging the drum. They keep trying to convince everyone and themselves that training DRL agents in virtual environments has anything to do with the real world, and that it's somehow the road to AGI. "Reward is all you need". Yeah, OK.

Btw, Waymo is not using DRL, at least not exclusively. They use all sorts of techniques but from what I understand they do a hell of a lot of good, old-fashioned, manual programming to deal with all the stuff that magickal deep learning in the sky can't deal with.

reply
sdenton4
17 days ago
[-]
Oh, I see that /this/ Scotsman isn't true, either!

Waymo absolutely uses simulated multi-agent environments to improve their cars' reliability; here's an example research artifact: https://waymo.com/research/waymax/

I think you're deluding yourself about the progress in this area. There's an enormous amount of specialized work in bringing results from research to market. Waymo does that work, but it simply isn't worth doing for things like robot football or simple object manipulation. So you're simply not going to see a 1:1 alignment of 'pure' research teams and applications teams. That doesn't mean that the research work hasn't led to improvements in applications, though.

reply
aspenmayer
21 days ago
[-]
Does Waymo count?
reply
YeGoblynQueenne
21 days ago
[-]
No: remote safety drivers; not DeepMind.
reply