Oasis: A Universe in a Transformer
256 points | 2 months ago | 35 comments | oasis-model.github.io
redblacktree
2 months ago
"If you were dreaming in Minecraft" is the impression that I get. It feels very much like a dream with the lack of object permanence. Also interesting is light level. If you stare at something dark for a while or go "underwater" and get to the point where the screen is black, it's difficult to get back to anything but a black screen. (I didn't manage it in my one playthrough)

Very odd sensation indeed.

robotresearcher
2 months ago
I don't see how you design and ship a game like this. You can't design a game by setting model weights directly. I do see how you might clone a game, eventually without all the missing things like object permanence and other long-term state. But the inference engine is probably more expensive to run than the game engine it (somewhat) emulates.

What is this tech useful for? Genuine question from a long-time AI person.

naed90
2 months ago
Yep! Which is why a key point for our next models is to get to a state that you can "code" a new world using "prompting". I agree that these tools become insanely useful only once there is a very good way for creators to "develop" new worlds/games on top of these systems and then users could interact with those worlds.

At the end of the day, it should provide the same "API" as a game engine does: creators develop worlds, users interact with those worlds. The nice thing is that if AI can actually fill this role, then it would be: 1. Potentially much easier to create worlds/games (you could just "talk" to the AI -- "add a flying pink elephant here") 2. Users could interact with a world that could change to fit each game session -- this is truly infinite worlds

Last point: are we there yet? Ofc not! Oasis v1 is a first POC. Wait just a bit more for v2 ;)

notfed
2 months ago
Obviously this tool is not going to generate a "ship"pable game for you. AI is a long way off from that. As for "design", I don't find it very hard to see how incredibly useful being able to rapidly prototype a game would be, even if it requires massive GPU usage. And papers like these are only stepping stones to getting there.
robotresearcher
2 months ago
> being able to rapidly prototype a game would be

I don't see how this does that, or is a step towards that. Help me see it?

kixpanganiban
2 months ago
Right now it looks like Oasis is only trained on Minecraft. Imagine if it was trained on thousands of hours of other games as well, of different genres and styles.

Ostensibly, a game designer can then just "prompt" a new game concept they want to experiment with, and Oasis can dream it into a playable game.

For example, "an isometric top-down shooter, with Maniac mechanics, and Valheim graphics and worldcrafting, set in an ancient Nordic country"

And then the game studio will start building the actual game based on some final iteration of the concept prompt. This is already similar to how concept art is "seeded" through Midjourney/SD/Flux today.

robotresearcher
2 months ago
Thanks! That’s such an ambitious endgame that it didn’t occur to me.
sangnoir
2 months ago
I found the visual artifacts annoying. I wonder if anyone has trained models on pre-rasterization game engine output like meshes/materials, camera, or even just the raw OpenGL calls. An AI that generates inputs to an actual renderer/engine would solve the visual fidelity problem.
stale2002
2 months ago
> I don't see how you design and ship a game like this. You can't design a game by setting model weights directly. I do see how you might clone a game

Easy. You do the same thing that we did with AI images, except with video game world models.

I.e., you combine several of them: taking bits and pieces of each game's "world model" and putting them together is almost like creating an entirely new game.

> eventually without all the missing things like object permanence and other long-term state.

Well, just add in those other things with a much smaller set of variables. You are already sending in the whole previous frame, plus user input into the weights. I see no reason why you couldn't send in a simplified game state as well.

andoando
2 months ago
I suppose you could potentially take a movie like Avatar and create a somewhat interactive experience with it?
reissbaker
2 months ago
I think it's an interesting tech demo. You're right that as-is it's not useful. Here are some long-term things I could imagine:

1. Scale it up so that it has a longer context length than a single frame. If it could observe the last million frames, for example, that would allow significantly more temporal consistency.

2. RAG-style approaches. Generate a simple room-by-room level map (basically just empty bounding boxes), and allow the model to read the map as part of its input rather than simply looking at frames. And when your character is in a bounding box the model has generated before, give N frames of that generation as context for the current frame generation (perhaps specifically the frames with the same camera direction, even, or the closest to that camera direction). That would probably result in near-perfect temporal consistency even over very long generations and timeframes, assuming the frame context length was long enough.

3. Train on larger numbers of games, and text modalities, so that you can describe a desired game and get something similar to your description (instead of needing to train on a zillion Minecraft runs just to get... Minecraft.)
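A rough sketch of the retrieval step in point 2, under loud assumptions: the names `StoredFrame`, `RoomMemory`, `remember`, and `recall` are hypothetical and not part of Oasis, rooms are plain string ids, and an integer `frame_id` stands in for pixel data. It only shows how one might pick the N previously generated frames whose camera direction is closest to the current one.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class StoredFrame:
    yaw_degrees: float  # camera direction when the frame was generated
    frame_id: int       # hypothetical stand-in for the actual pixel data


@dataclass
class RoomMemory:
    # Maps a room id (one bounding box on the level map) to its past frames.
    frames_by_room: Dict[str, List[StoredFrame]] = field(default_factory=dict)

    def remember(self, room: str, frame: StoredFrame) -> None:
        """Store a generated frame under the room it was generated in."""
        self.frames_by_room.setdefault(room, []).append(frame)

    def recall(self, room: str, yaw_degrees: float, n: int) -> List[StoredFrame]:
        """Return up to n stored frames from this room, closest camera yaw first."""

        def angular_distance(frame: StoredFrame) -> float:
            # Shortest distance around the circle, so 350 and 10 are 20 apart.
            diff = abs(frame.yaw_degrees - yaw_degrees) % 360.0
            return min(diff, 360.0 - diff)

        candidates = self.frames_by_room.get(room, [])
        return sorted(candidates, key=angular_distance)[:n]


memory = RoomMemory()
memory.remember("kitchen", StoredFrame(yaw_degrees=0.0, frame_id=1))
memory.remember("kitchen", StoredFrame(yaw_degrees=90.0, frame_id=2))
memory.remember("kitchen", StoredFrame(yaw_degrees=350.0, frame_id=3))
context = memory.recall("kitchen", yaw_degrees=10.0, n=2)
```

The recalled frames would then be passed to the frame generator as extra context; how a real model would consume them is left open here.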

That being said I think in the near-term it'll be much more fruitful to generate game assets and put them in a traditional game engine — or generate assets, and have an LLM generate code to place them in an engine — rather than trying to end-to-end go from keyboard+mouse input to video frames without anything structured in between.

Eventually the end-to-end model will probably win unless scaling limits get hit, as per the Bitter Lesson [1], but that's a long eventually, and TBH at that scale there really may just be fundamental scaling issues compared to assets+code approaches.

It's still pretty cool though! And seems useful from a research perspective to show what can already be done at the current scale. And there's no guarantee the scaling limits will exist such that this will be impossible; betting against scaling LLMs during the gpt-2 era would've been a bad bet. Games in particular are very nice to train against, since you have easy access to near-infinite ground truth data, synthetically, since you can just run the game on a computer. I think you could also probably do some very clever things to get better-than-an-engine results by training on real-life video as well, with faked keyboard/mouse inputs, such that eventually you'd just be better both in terms of graphics and physics than a game engine could ever hope to be.

1: http://www.incompleteideas.net/IncIdeas/BitterLesson.html

thrance
2 months ago
> It's a video game, but entirely generated by AI

I ctrl-F'ed the webpage and saw 0 occurrences of "Minecraft". Why? This isn't a video game, this is a poor copy of a real video game you didn't even bother to say the name of, let alone credit.

armchairhacker
2 months ago
It seems like an interesting attempt to get around legal issues.

They can't say "Minecraft" because that's a Microsoft trademark, but they can use Minecraft images as training data, because everyone (including Microsoft) is using proprietary data to train diffusion models. There's the issue that the outputs obviously resemble Minecraft, but Microsoft has its own problems with Bing and DALL-E generating images that obviously resemble trademarked things (despite guardrails).

chefandy
2 months ago
Avoiding legal and ethical issues. This stuff was made by a bunch of real people, and people still get their name in movie and game credits even if they got a paycheck from it. Microsoft shamelessly vacuuming up proprietary content didn't change the norms of the way people get credited in these mediums. It's sad to see how thoughtlessly so many people using generative AI disregard the models' source material as "data" while the models (and therefore their creators) almost always get prominently credited for putting in a tiny fraction of the effort. The dubious ethical defense against crediting source works, that models learn about media the same way humans do and adapt it to suit their purposes, is obliterated when a model is trained on one work to reproduce that work. That this is equated to generating an image on Midjourney is a blatant example of a common practice: people want to get credit for other people's work, but when it's time to take responsibility the way a human artist would have to, "The machine did it! It's not my responsibility!"
woah
2 months ago
The point of this paper is to demonstrate a method of training an AI to output interactive game footage. They could have trained it for similar results with DOOM videos. Presumably the footage they trained on was OK to train on (I don't think a video of someone playing a video game is copyrighted by the video game's author), but they could have used a variety of other games.
rodiger
2 months ago
Surprisingly, Nintendo has a long history of copyright-striking videos of folks playing their games.

https://www.ign.com/articles/2013/05/16/nintendo-enforces-co...

stared
2 months ago
Algorithms trained on DOOM, or using DOOM as a showcase, mention DOOM.
FeepingCreature
2 months ago
Do you really think this would be materially different if they used Minetest? To be frank, nothing in Minecraft as a game (rather than the code) deserves intellectual property protection; it copies games that came before and was copied by games that came after. It is an excellent, well-designed implementation of a very basic concept.
chefandy
2 months ago
> nothing in Minecraft as a game (rather than the code) deserves intellectual property protection

> excellent, well-designed implementation

And there we see the problem laid bare. Excellent designs that are well-executed are not worthless facets of the real product. As we can see from Minecraft's success, that is the real product. People play video games for the experience, not to execute some logical construct or a formal proof showing that it's fun. The reason that this demo uses Minecraft as opposed to a Minecraft knockoff is because Minecraft is better, and they are capitalizing on that. Even if that game is based on a well-worn concept, the many varied design problems you solve when making a game are harder than development, which is why damn near every game that starts open source is a knockoff of something other people already designed. It's not like Mojang was some marketing powerhouse that knocked Infiniminer off its perch without merit.

FeepingCreature
2 months ago
> And there we see the problem laid bare. Excellent designs that are well-executed are not worthless facets of the real product. As we can see from Minecraft's success, that is the real product.

Which is why I said "as a game, rather than the code", specifically. My whole point is that the elements which were assembled into it are not the valuable part!

I mean, what is Minecraft? Mine blocks, craft items. Fight skeletons, spiders, zombies and exploding zombies. The end boss is a dragon. It's Generic The Game.

The thing that the AI is training on is the thing without value - the look. Mojang gave that away in the billions of stream-hours, to their benefit.

chefandy
1 month ago
The interface, look and feel, controls, color palette, textures, animation, camera angles and movement, icons, branding, music, sfx, vfx... Those are all part of the experience and why it succeeded. Especially in games: people play games for the experience, and the look and feel is a huge part of the experience.

Developers always think that these things have no value, yet as someone that's done a whole lot of work in both design and back-end development, the design and interface affect how people feel about the software far more than any technical underpinning. People play games because it makes them feel things, not because they want to interact with some novel game mechanic or use a technically superior engine.

And that is why pretty much the only open-source user-facing software with broad support— eg Firefox, blender, signal— are the ones that are foundation-backed with product managers that prioritize design. That's the core reason Mastodon failed to replace Twitter despite an incredible amount of momentum, the reason so few people use Linux desktops, and the reason that you'll have a hard time finding a professional photographer that's never tried gimp and an even harder time finding one that used it more than once. I've seen so many people pour thousands of collective hours into high-quality software with some cool conceptual ideas only for it to languish, unused. Developers know how much effort it takes to create software, and since they have a working mental model of how software works under the hood, they're far more tolerant to shitty interfaces. To most people, the interface IS the software. Developers, largely, have no clue what it takes to create high-quality design, and therefore undervalue it, and blame lack of FOSS adoption on commercial software marketing. User feedback doesn't support that assumption.

People buy things that look great just because they look great. When the pleasure of interaction is the whole point, technically or conceptually exceptional software with a substandard look and feel has no value.

amelius
2 months ago
Maybe they are even __trying__ to get sued to set a legal precedent against Copilot.
kiloDalton
2 months ago
There is one mention of Minecraft in the second paragraph of the Architecture section, "...We train on a subset of open-source Minecraft video data collected by OpenAI[9]." I can't say whether this was added after your comment.
stared
2 months ago
It is weird - compare and contrast with https://diamond-wm.github.io/, which explicitly mentions Counter Strike.

When a scientific work uses some work and does not credit it, it is academic dishonesty.

Sure, they could have trained the model on a different dataset. No matter which source was used, it should be cited.

dartos
2 months ago
Yeah, it is strange how they make the model sound like it can generate any environment, but only show demos of the most data-available game ever.
blixt
2 months ago
Super cool, and really nice to see the continuous rapid progress of these models! I have to wonder how long-term state (building a base and coming back later) as well as potentially guided state (e.g. game rules that are enforced in traditional code, or multiplayer, or loading saved games, etc) will work.

It's probably not by just extending the context window or making the model larger, though that will of course help, because fundamentally external state and memory/simulation are two different things (right?).

Either way it seems natural that these models will soon be used for goal-oriented imagination of a task – e.g. imagine a computer agent that needs to find a particular image on a computer, it would continuously imagine the path between what it currently sees and its desired state, and unlike this model which takes user input, it would imagine that too. In some ways, to the best of my understanding, this already happens with some robot control networks, except without pixels.

aithrowawaycomm
2 months ago
There's not even the slightest hint of state in this demo: if you hold "turn left" for a full rotation you don't end up where you started. After a few rotations the details disappear and you're left in the middle of a blank ocean. There's no way this tech will ever make a playable version of Mario, let alone Minecraft.
blixt
2 months ago
There's plenty of evidence of state, just a very short-term memory. Examples:

- The inventory bar is mostly consistent throughout the play

- State transitions in response to key presses

- Block breakage over time is mostly consistent

- Toggling doors / hatches works as expected

- Jumping progresses with correct physics

Turning around and seeing approximately the same thing you saw a minute ago is probably just a matter of extending a context window, but it will inherently have limits when you get to the scale of an entire world even if we somehow can make context windows have excellent compression of redundant data (which would greatly help LLM transformers too). And I guess what I'm mostly wondering about is how would you synchronize this state with a ground truth so that it can be shared between different instances of the agent, or other non-agent entities.

And again, I think it's important to remember games are just great for training this type of technology, but it's probably more useful in non-game fields such as computer automation, robot control, etc.

naed90
2 months ago
Hey, developer of Oasis here! You are very correct. Here are a few points:

1. We trained the model on a context window as long as 30 sec. What's the problem? It barely pays any attention to frames beyond the past few. This certainly makes sense, as it's a question of the model's loss function during training. We are now running many different training runs to experiment with a better loss func (and datamix) to solve this issue. You'll see newer versions soon!

2. In the long term, we believe the "ultimate" solution is 2 models: 1 model that maintains game state + 1 model that turns that state into pixels. Think of the first model as something more like an LLM that gets the current state + user action and produces the new state, and the second model as a diffusion model that takes this state and maps it to pixels. This would get the best of both worlds.
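A minimal sketch of the two-model split described above, assuming a toy grid world. Everything here is hypothetical and for illustration only: `GameState`, `state_model`, `pixel_model`, and `step` are made-up names, the "state model" is hand-written rules rather than a learned model, and the "pixel model" returns a string instead of an image. The point is only the data flow: state plus action gives new state, and pixels are a function of state alone.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class GameState:
    x: int
    y: int
    yaw: int  # camera heading in degrees: 0, 90, 180 or 270


def state_model(state: GameState, action: str) -> GameState:
    """Stand-in for the model mapping (current state, user action) -> new state."""
    if action == "turn_left":
        return replace(state, yaw=(state.yaw - 90) % 360)
    if action == "turn_right":
        return replace(state, yaw=(state.yaw + 90) % 360)
    if action == "forward":
        dx, dy = {0: (0, 1), 90: (1, 0), 180: (0, -1), 270: (-1, 0)}[state.yaw]
        return replace(state, x=state.x + dx, y=state.y + dy)
    return state  # unknown actions leave the state unchanged


def pixel_model(state: GameState) -> str:
    """Stand-in for the diffusion model mapping state -> pixels (here, a string)."""
    return f"frame(x={state.x}, y={state.y}, yaw={state.yaw})"


def step(state: GameState, action: str) -> Tuple[GameState, str]:
    """One tick: update the state first, then render pixels from the new state."""
    new_state = state_model(state, action)
    return new_state, pixel_model(new_state)


state = GameState(x=0, y=0, yaw=0)
first_frame = pixel_model(state)
for _ in range(4):
    state, frame = step(state, "turn_left")
```

Because the frame depends only on the explicit state, four left turns in this toy version lead back to the same state and therefore the same frame.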
throwaway314155
2 months ago
This stuff is all fascinating to me from a computer vision perspective. I'm curious - if you have a second model tasked with learning just the game state - does that mean you would be using info from the game itself (say, via a mod or with the developer console) as training data? Or is the idea that the model somehow learns the state (and only the state) on its own as it does here?
naed90
2 months ago
That's a great question -- lots of experiments will be going into the future versions of Oasis. There are quite a few different possibilities here and we'll have to experiment with them a lot.

The nice thing is that we can run tons of experiments at once. For Oasis v1, we ran over 1000 experiments (end-to-end training a 500M model) on the model arch, datamix, etc., before we created the final checkpoint that's deployed on the site. At Decart (we just came out of stealth yesterday: https://www.theinformation.com/articles/why-sequoias-shaun-m...) we have 2 teams: Decart Infrastructure and Decart Experiences. The first team provides insanely fast infra for training/inferencing (writes from scratch everything from CUDA to redoing the python garbage collector) -- we are able to get a 500M model to converge during training in ~20h instead of 1-2 weeks. Then, Decart Experiences uses this infra to create these new types of end-to-end "Generated Experiences"

bongodongobob
2 months ago
Nah, it doesn't even track which direction you're looking. Looking straight ahead, walk into some sugar cane so your whole screen is green. Now look up. It thinks you were looking down.
blixt
2 months ago
I guess it comes down to your definition of state. I'm not saying there's enough state for this to be playable, but there is clearly state, and I think it's important to point out how impressive the amount of temporal consistency and coherence this model is capable of is, considering that not long ago the state of the art here rapidly decohered into completely noisy pixels.
FeepingCreature
2 months ago
In other words: there's enough state now that the lack of state stands out. It works well enough for its failures to be notable.
bongodongobob
2 months ago
I guess, if you consider knowing what color the pixels were in the last frame "state". That's not a definition anyone would use though. Spinning around and having the world continuously regenerate, or looking at the sky and back down and having it regenerate randomly, is the opposite of state. It's complete incoherence.
bubblyworld
2 months ago
Just a thought - complete incoherence is a noise function, no? Successive frames here are far more correlated than that, which is pretty remarkable.

My definition of state is something like reified bits of information, for which previous frames and such certainly count (knowing the current frame tells you a lot of information about the next frame vs not knowing the current frame).

golol
2 months ago
Between the first half and the last sentence of your post is a giant leap of conclusion.
blixt
2 months ago
Yeah probably, it remains to be seen if these models can actually help guide a live session towards the goal. At least it's been shown that these types of world models can help a model become better at achieving a goal, in lieu of a hard coded simulation environment or the real world, for when those options are not tractable.

My favorite example is: https://worldmodels.github.io/ (Not least of all because they actually simulate these simplified world models in your browser!)

GaggiX
2 months ago
>There's no way this tech will ever make a playable version of Mario

Wait a few months, if someone is willing to use their 4090 to train the model, the technology is already here. If you could play a level of Doom then Mario should be even easier.

duendefm
2 months ago
It's not a videogame, it's a fast minecraft screenshot simulator where the prompt between each frame is the state of the input and the previous frames, with something of a resemblance of coherence.
jiwidi
2 months ago
So they basically trained a model on Minecraft. This is not general at all. It's not like the game comes from a prompt; it probably comes from a bunch of finetuning and gigadatasets from playing Minecraft.

Would love to see some work like this but with world/games coming from a prompt.

naed90
2 months ago
wait for Oasis v2, coming out soon :) (Disclaimer: I'm from the Oasis team)
whism
2 months ago
Allow the user to draw into the frame buffer during play and feed that back, and you could have something very interesting.
dartos
2 months ago
It’d probably break wildly since it’s really hard to draw Minecraft by hand.
brap
2 months ago
Waiting line is too long so I gave up. Can anyone tell me, are the pixels themselves generated by the model, or does it just generate the environment which is rendered by “classical” means?
yorwba
2 months ago
If it were to generate an environment rendered by classical means, it would most likely have object permanence instead of regenerating something new after briefly looking away: https://oasis-model.github.io/3_second_memory.webp
naed90
2 months ago
Every pixel is generated! User actions go in, pixels come out -- and there is only a transformer in the middle :)

Why is this interesting? Today, not too interesting (Oasis v1 is just a POC). In the future (and by future I literally mean a few months away -- wait for future versions of Oasis coming out soon), imagine that every single pixel you see will be generated, including the pixels you see as you're reading this message. Why is that interesting? It's a new interface for communication between humans and machines. It's like why LLMs are interesting for chat, because they provide humans and machines an ability to interact in a way humans are used to (chat) -- here, computers will be able to see the world as we do and show back stuff to us in a way we are used to. TLDR: imagine telling your computer "create a pink elephant" and just seeing it popup in a game you're playing.

yokto
2 months ago
It generates the pixels, including the blurry UI at the bottom.
xyzal
2 months ago
Maybe we should train models on Mario games to make Nintendo fight for the "Good Cause".
gessha
2 months ago
I find this extremely disappointing. A diffusion transformer trained on Minecraft frames and accelerated on an ASIC... Okay?

From the demo (that doesn't work on Firefox) you can see that it's overfit to the training set and it doesn't have a consistent state transition.

If you define it as a Markov decision process with states being images, actions being keyboard/mouse inputs, the probability transition being the transformer model, the model is a very poor one. Turning the mouse around shouldn't result in a completely different world, it should result in the exact same point of space from different camera orientation. You can fake it by fudging with the training data and augmenting with walking a bit, doing a 360 camera rotation and continuing the exploration but that will just overfit to that specific seed.
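The consistency property described here can be written down as a small check. This is a toy sketch under stated assumptions: `returns_after_full_rotation`, `consistent_transition`, and `drifting_transition` are hypothetical names, the "observation" is just a yaw angle instead of an image, and both transition functions are hand-written stand-ins for a learned model.

```python
from typing import Callable

# A transition maps (current observation, action) -> next observation.
Transition = Callable[[int, str], int]


def returns_after_full_rotation(
    transition: Transition, start: int, turn_action: str, turns_per_circle: int
) -> bool:
    """Apply the turn action for one full circle and check we are back at the start."""
    observation = start
    for _ in range(turns_per_circle):
        observation = transition(observation, turn_action)
    return observation == start


def consistent_transition(yaw: int, action: str) -> int:
    """A transition with a stable notion of heading: each turn is exactly 90 degrees."""
    return (yaw + 90) % 360 if action == "turn_left" else yaw


def drifting_transition(yaw: int, action: str) -> int:
    """A transition whose turns overshoot by 10 degrees, so the error compounds."""
    return (yaw + 100) % 360 if action == "turn_left" else yaw


consistent_ok = returns_after_full_rotation(consistent_transition, 0, "turn_left", 4)
drifting_ok = returns_after_full_rotation(drifting_transition, 0, "turn_left", 4)
```

With image observations the equality test would have to become some similarity measure, which is left out of this sketch.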

The page says their ASICs model inference supports 60+ players. Where are they shown playing together? What's the point of touting multiplayer performance when realistically, the poor state transition will mean those 60+ players are playing single player DeepDream Minecraft?

jmartin2683
2 months ago
Why? Seems like a very expensive way to vaguely clone a game.
haccount
2 months ago
It's an early demo of interactive realtime inference but it appears to have a promise of promptable game worlds and mechanisms. Or "scriptable dynamic imagination" if you will.

The answer to "why?" when DeepDream demoed hallucinated dog faces in 2015 was contemporary diffusion models.

hnlmorg
2 months ago
The dog demo was introducing something new. This isn't.

I don't want to be negative about someone else's project but I can completely understand why people are underwhelmed by this.

What I think will be the real application for AI in gaming isn't creating poorer versions of native code, it will be creating virtual novels that evolve with the player. Where characters are actual AI rather than following a predefined script. Where you, as the player, can change and shape the story as you wish. Think Star Trek Holodeck "holo-novels" or MMORPGs but can be played fully offline.

Rendering the pixels is possibly the worst application for AI at this stage because AI lacks reliable frame-by-frame continuity, rendering speed, and an understanding of basic physics, which are all the bare minimum for any modern game engine.

stale2002
2 months ago
> What I think will be the real application for AI in gaming isn't creating poorer versions of native code

"Why would you use this basic image diffusion model? It just creates a poorer version of existing images!" Is what your statement sounds like.

Obviously, you'd do the same thing with this game engine as is done with images. You combine together multiple world models, and thats almost the same as creating a new game.

Imagine playing minecraft and you could add a prompt "Now make complex rules for building space ships that can fly to different planets!" and it just mostly works, on the fly.

hnlmorg
2 months ago
> “Why would you use this basic image diffusion model? It just creates a poorer version of existing images!" Is what your statement sounds like.

I’m not the OP and that’s not what my reply “sounds like”. You’re not reading my comment charitably.

> Obviously, you'd do the same thing with this game engine as is done with images. You combine together multiple world models, and thats almost the same as creating a new game.

That was actually my point.

Adding some VFX to a game engine doesn’t alter the game engine’s state.

What some of the proponents of this project are describing is akin to saying “ray tracing alters the game mechanics”.

It doesn’t matter how smart or performant the AI gets at drawing pixels, it’s still just stateless pixels. This is why I discussed practical applications of AI that can define the game engine’s state rather than just what you see on screen.

Image diffusion models specifically are not the right applications of AI to create and alter a persistent game state. And that’s why people are underwhelmed by this project.

dartos
2 months ago
> it appears to have a promise of promptable game worlds

Is it a world if there’s no permanence?

We’ve seen demos like this for a while now (granted not as fast) but the core problem is continuity. (Or some kind of object permanence)

It’s a problem for image generators as well.

I’d be more interested if that were any closer to being solved than in having a real-time Minecraft screenshot generator.

I may have missed it, but I didn’t see anything about prompting. I’d be surprised if this model could generalize beyond Minecraft at all.

thirdacc
2 months ago
>It’s a problem for image generators as well.

It was, about a year ago. It's a solved problem.

dartos
2 months ago
I haven’t seen any image generator maintain spatial consistency for more than a few seconds of video (maybe a min for runway actually)

Definitely nothing that maintains it for hours on end.

If you have, please link them. I’m very interested.

int_19h
2 months ago
It also seems like a pretty decent way to investigate emergent world models in NNs.
piperly
2 months ago
From a research perspective, this approach isn’t new; David Ha and Danijar Hafner explored similar ideas years ago. However, the technique itself and the achievement of deploying it for testing by hundreds of users is commendable. It feels more like an experimental prototype than a viable replacement for mainstream gaming.
shanim_
2 months ago
Could you explain how the interaction between the spatial autoencoder (ViT-based) and the latent diffusion backbone (DiT-based) enables both rapid response to real-time input and maintains temporal stability across long gameplay sequences? Specifically, how does dynamic noising integrate with these components to mitigate error compounding over time in an autoregressive setup?
vannevar
2 months ago
If anyone has ever read Tad Williams' Otherland series, this is basically the core idea. "The dream that is dreaming us."
djhworld
2 months ago
I think this is really cool as a sort of art piece? It's very dreamlike and unsettling, especially with the music
0xTJ
2 months ago
Seems like a neat idea, but too bad that the demo doesn't work on Firefox.
tayiorrobinson
2 months ago

  // ==UserScript==
  // @name         oasis.decart.ai
  // @match        https://oasis.decart.ai/*
  // @run-at       document-start
  // ==/UserScript==
  
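  // Define a global `chrome` so the page's Chrome-only check (presumably a test for `window.chrome`) passes in other browsers.
  chrome = true;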
piperly
2 months ago
Haha, it really works. Thanks!
naed90
2 months ago
we really wanted it to! but WebRTC was giving us lots of trouble on FF :( trust me, most of the team here is on FF too, and we're bummed we can't play it there haha
amiramer
2 months ago
So cool! Curious to see how it evolves.. seems like a portal into fully generated content, 0 applications. So exciting. Will it also be promptable at some point?
joshdavham
2 months ago
Incredible work! I think once we’re able to solidly emulate these tiny universes, we can then train agents within them to make even more intelligent AI.
aaladdin
2 months ago
How would you verify that real world physics actually hold here? Otherwise, such breaches could be maliciously and unfairly exploited.
mrtnl
2 months ago
Very cool tech demo! Curious to see if we continue to generate environments at this level or move more to generating the physics
GaggiX
2 months ago
Kinda hyped to see how this model (or a much bigger one) will run on Etched's transformer ASIC, Sohu, if it ever comes out.
th0ma5
2 months ago
This feels like a nice preview at the bottom of the kinds of unsolvable issues these things will always have to some degree.
TalAvner
2 months ago
This is next level! I can't believe it's all AI generated in real time. Can't wait to see what's next.
goranim
2 months ago
Love it! This virtual world looks so good, and it is also changing really fast, so it seems like a very powerful model!
Daroar
2 months ago
I can see where they are going with it and wow! Truly the proof that we are all indeed in a simulation.
drdeca
2 months ago
This apparently currently only supports chrome. I hope it will support non-chrome browsers in the future.
therein
2 months ago
Queue makes it untestable. It isn't running client-side? What's with the queueing?
pka
2 months ago
Negative comments are so weird, it's like people forgot what GPT 2 was like. I know this isn't completely new, but it's a world simulation inside a goddamn LLM. Not perfect, not coherent over longer time periods, but still insane. I swear if tomorrow magic turned out to be real and wizards start controlling the literal fabric of the universe people will be like "meh" before the week ends :D
gunalx
2 months ago
Really cool tech demo. What impressed me most is the inference speed. But I don't really see any use for this unless there is a way to store world state, to avoid the issue of it forgetting what it just generated.
petersonh
2 months ago
Very cool - has a very dreamlike quality to it
jhonj
2 months ago
Tried their not-a-game and it was SICK to play knowing it's not a game engine. Really sick. When did these Decart ppl start working on that? Must be f genius ppl.
duan2112
2 months ago
Love it!!!
keidartom
2 months ago
So cool!
robblbobbl
2 months ago
Me gusta!
hesyechter
2 months ago
Very very cool, I love it. Good luck!