FilterHN

Waypoint-1: Real-Time Interactive Video Diffusion from Overworld

36 points

by avaer

6 hours ago

| past

| 6 comments

| huggingface.co

| HN

▲

dsrtslnd23

13 minutes ago

[-]

10,000 hours training data seems quite low for a world model?

▲

roskelld

1 hour ago

[-]

The context seemed to last a few seconds. I went from a mock up screenshot of a fantasy video game, complete with first person weapon. Then as I moved forward the weapon became part of the scenery and the whole world blurred and blended until it became some sort of sci-fi abstract space. Spinning the camera completely changed look and style.

I ended up with a UI that closely resembled the Cyberpunk 2077 one complete with VO modal popup. I guess it must have featured a lot in the training data.

Really not sure what to make of this, seems to have no constraints on concept despite the prompt (I specifically used the word fantasy), no spatial memory, no collision, or understanding of landscape features in order to maintain a sense of place.

▲

avaer

1 hour ago

[-]

Accurate to my experience hacking on this model today, but I don't think anyone's blowing smoke about it.

Thinking back to where GPT-3 was 5 years ago, I can't help but be a little bit excited. And unlike GPT-3 this is Apache.

▲

sampton

34 minutes ago

[-]

It's an acid dream.

▲