FilterHN

Terrain Diffusion: A Diffusion-Based Successor to Perlin Noise

154 points

by kelseyfrog

3 months ago

| past

| 11 comments

| arxiv.org

| HN

▲

xandergos

3 months ago

[-]

Hi everyone, I wrote this paper. Cool to see it has been posted here already.

I want to clarify some points on things other people have mentioned:

- This architecture is not as fast as Perlin noise. IMO it is unlikely we will see any significant improvement on Perlin noise without a significant increase in compute, at least for most applications. Nonetheless, this system is not too slow for real-time use. In the Minecraft integration, for instance, the bottleneck in generation speed is by far Minecraft's own generation logic (on one RTX 3090 Ti).

- I agree that this is not "production-ready" for most tasks. The main issue is that (1) terrain is generated at realistic scales, which are too big for most applications, and (2) the only control the user has is the initial elevation map, which is very coarse. Thankfully, I expect both of these issues to be fixed pretty quickly. (1) is more specific to terrain generation, but I have a number of ideas on how to fix it. (2) is mostly an issue simply because I did not have the time to engineer a system with this many features (and as-is, the system is quite dense). I believe a lot of existing work on diffusion conditioning could be adapted here.

- The post title misses one key part of the paper title: "in Infinite, Real-Time Terrain Generation." I don't expect this to replace perlin noise in other applications. And for bounded generation, manual workflows are still superior.

- The top level input is perlin noise because it is genuinely the best tool for generating terrain at continental scale. If I had more time on my hands, I would like to use some sort of plate tectonics simulator to generate that layout, but for something simple, reasonably realistic, and infinite, perlin noise is pretty much unbeatable. Even learned methods perform on-par with perlin noise at this scale because the data is so simple.

▲

semi-extrinsic

3 months ago

[-]

The paper is cool, and this is not a criticism, but Fig. 4 gives strong "draw the rest of the owl" vibes.

▲

kelseyfrog

3 months ago

[-]

> in Infinite, Real-Time Terrain Generation

My sincerest apologies. The submission disallowed the title in its entirety. It's generally unclear if the guidance for submitters favors omission or rewording. I take full responsibility for omitting those qualifiers.

▲

xandergos

3 months ago

[-]

No worries! It's worded fine as-is given the restrictions. Just wanted to clear up any misunderstandings.

▲

sieste

3 months ago

[-]

Cool work, congratulations.

Why did you put "real-time" in the title though when generation takes > 7 seconds?

▲

demarq

3 months ago

[-]

The author points out that the sheer scale of what you generate in that 7 seconds is so vast that you’ll have plenty of time to generate the next tile even when moving at extreme speeds. So your only problem is the first tile, which you can pre compute at the very beginning.

Latency vs Throughput

▲

treyd

3 months ago

[-]

I think it's pretty neat that, in implementing this for the game, you wrote a terrain gen mod lets you call out to a server running as a separate process. I can't believe nobody had the thought of doing that before (to my knowledge), but it makes so much sense.

▲

noodletheworld

3 months ago

[-]

The irony of this paper is that the part that isn't geographic (towns, roads, fields, etc) which have non-random ordered structure are the parts that are most suitable for this approach.

▲

xandergos

3 months ago

[-]

Could be! I think maps are definitely the most interesting direction for future research to take.

▲

unconed

3 months ago

[-]

1) A system that needs _seconds per tile_ is not suitable for real-time anything imo.

The irony is that you explicitly posited your thing as a successor to Perlin noise when in fact, it's just a system that hallucinates detail on top of Perlin (feature) noise. This is dishonest paper bait and the kind of AI hubris that will piss off veterans in the scene.

2) I'm also disappointed that nowhere is there any mention of Rune Johansen's LayerGen which is the pre-AI tech that is the real precedent here.

Every time I see a paper from someone trying to apply AI to classic graphics tech, it seems they haven't done the proper literature study and just cite other AI papers. It seems they also haven't talked to anyone who knows the literature either. https://runevision.com/tech/layerprocgen/

3) >The top level input is perlin noise because it is genuinely the best tool for generating terrain at continental scale

This is a non-sense statement. I don't know what you are thinking here at all, except maybe that you are mistakenly using "Perlin" as a group noun for an entire style of functions.

Perlin has all sorts of well-known issues, from the overall "sameyness" (due to the mandatory zero-crossings and consistent grid size) as well as the vertical symmetry which fails to mimic erosion. Using it as the input to a feature vector isn't going to change that at all.

The idea of using plate tectonics is much better, but, vastly _different_ from what you have done. And btw, every plate tectonics simulation that I've seen does not look convincing. If you treat it as a simple transport problem, the result just looks like a Civilization 1 map. But if you want to treat it seriously, then the tectonics have to be the source of all your elevation changes, and not just some AI hallucination on top afterwards. The features would all have to make sense.

Your abstract states that classic terrains are "fundamentally limited in coherence"... but even to my non-geologist eye, your generated heightmaps seem incredibly blobby and uncanny. This makes me think that a real geologist would immediately spot all sorts of things that don't make any sense. For example, if you added water and rivers to the terrain, would it work, or would you end up with non-sense loops and Escher-like watersheds?

(mostly I'm disappointed that the level of expertise in AI tech is so low that all these things have to be pointed out instead of being things you already knew)

▲

nathan_douglas

3 months ago

[-]

> And btw, every plate tectonics simulation that I've seen does not look convincing.

It's an amazing problem! I haven't spent much time on it - maybe 20-30 hours spread out over several years - but it _is_ something I come back to from time to time. And it usually ends up with me sitting there, staring at my laptop screen, thinking, "but what if I... no, crap. Or if we... well... no..."

TBH it's one of the things that excites me, because it makes it clear how far we still have to go in terms of figuring out these planet-scale physical processes, simulating them, deriving any meaningful conclusions, etc. Still so much to learn!

▲

euleriancon

3 months ago

[-]

I worked on something very similar for my master's degree.

The problem I could never solve was the speed, and from reading the paper it doesn't seem like they managed to solve that either.

In the end, for my work, and I expect for this work, it is only usable for pre generated terrains and in that case you are up against very mature ecosystems with a lot of tooling to manipulate and control terrain generation.

It'll be interesting to see of the authors follow up this paper with research into even stronger ability to condition and control terrain outputs.

▲

reactordev

3 months ago

[-]

I came here to say this. My masters was on procedural generation. Perlin, fBm, etc. The things these noise functions have that an LLM doesn’t is speed. 1-D perlin is just a dozen or so multiplications with a couple random coefficients. The GPU can do 4-D Perlin all day long every frame taking up a 4096x4096x32 texture volume.

While I do like the erosion effects and all, having a few height texture brushes that have those features that you can multiply on the GPU is trivial. I still welcome these new approaches but like you said, it’s best for pre generation.

▲

yunnpp

3 months ago

[-]

My masters was also on procedural generation. Now I wonder how many of us are out there.

At any rate, given that this paper divides the terrain in regions and apparently seeds each region deterministically, it looks like one could implement a look-ahead that spawns the generation on async compute in Vulkan and lets it cook as the camera flies about.

▲

DonHopkins

3 months ago

[-]

Now days there are a whole lot of people procedurally generating their masters degrees, but that's a different thing entirely.

▲

yunnpp

3 months ago

[-]

lol, this got me good.

▲

daemonologist

3 months ago

[-]

I think it's catnip for programmers, myself included. (See also: boids, path traced renderers, fluid simulations, old fashioned "generative"/plotter art, etc. - stuff with cool visual output)

▲

tonyarkles

3 months ago

[-]

Boids is so satisfying!

▲

VikingCoder

3 months ago

[-]

Boids, Game of Life, Genetic Algorithms, Pixel Shaders...

All so satisfying to play with.

One of my favorites was when I was sure I was right about the Monty Hall problem, so I decided to write a simulator, and my fingers typed the code... and then my brain had to read it, and realize I was wrong. It was hilarious. I knew how to code the solution better than I could reason about it. I didn't even need to run the program.

▲

reactordev

3 months ago

[-]

Which is what a sane terrain system would do. Just beyond the far plane you would load/gen the tile/chunk and as you got closer, increase the resolution/tessalation/etc. (or you start with high and each level away you skip vertices with a wider index march for a lower lod).

In any case, like I said, I welcome any new advances in this area. Overhang being the biggest issue with procedural gen quad terrain. Voxel doesn’t have that issue but then suffers from lack of fine detail.

▲

anentropic

3 months ago

[-]

I am curious - why would you want to keep generating the same perlin noise every frame? why not pre-generate the terrain?

▲

orbital-decay

3 months ago

[-]

Convincing AND useful procedural terrain is usually hard-simulated along some manually placed guides, which is typically faster and more versatile than a diffusion model. I don't see any model being used in practice for this, at least not until it has good controlnets trained specifically for this task. However something like this can be useful for texture generation, especially with geometry/camera position/lighting as additional inputs.

▲

lawlessone

3 months ago

[-]

It's interesting but i can't see it replacing Perlin or Simplex noise which are used for more than just terrain generation.

edit: I don't think i have the vocabulary to describe other issues i have other than it doesn't feel like the right way to "solve" this problem.

I'd prefer something that was entirely code rather than requiring training, and possibly retraining to get what i want.

edit2: Also is this entirely flat ? or can it be applied to a sphere (planet) , or terrain inside a cylinder (rotating space habitat) ?

▲

noodletheworld

3 months ago

[-]

> Also is this entirely flat?

Yes.

The output in this case is a 90m heightmap (ie. 2d grayscale image).

▲

vessenes

3 months ago

[-]

This is really awesome, actually! It looks great, very diverse, and clearly is scalable to extremely large maps. Props for testing the generation out on Minecraft - where terrain generation really matters.

▲

strongbond

3 months ago

[-]

No it's not. The whole presentation of it is confusing. Sorry.

▲

vessenes

3 months ago

[-]

This is a low quality, extremely low quality comment. What in particular did you find confusing?

▲

Fraterkes

3 months ago

[-]

I’m not sure if I understand the usecase: for a lot of generated worlds (of eg games) you don’t just want downsampled “realistic” topology, you want specific stylization and fine-grained artistic control. For those cases this is worse than “raw” noise. If all you wanted was to generate plausible, earth-like maps, Gemini, or Gpt would do a comparable job (with more control Id wager)

▲

liamzebedee

3 months ago

[-]

I wonder if you could use this to generate the fractal terrain by the ocean like in Sydney. We already have "artificial reefs". We have built beauty in the build environment (architecture), some in the natural environment (gardens, forests), but comparatively less when it comes to the ocean.

▲

stevefan1999

3 months ago

[-]

Not just this could improve Minecraft world generation (heh) but this could also have good use on 3D surface material generation as well, namely on the layering of different materials and generation using multi diffusions, if you look at the surface as a microscopic terrain

▲

Imnimo

3 months ago

[-]

I wonder what it would take to adapt a model like this to generate non-Earthlike terrain. For example, if you were using it to make planets without atmospheres and without water cycles, or planets like Io with rampant volcanism.

▲

DonHopkins

3 months ago

[-]

Since 1996, Ken Perlin has published a whole bunch of extremely cool Java applet demos on his web page, which he uses to teach his students at NYU and anyone who wanted to learn Java and computer graphics. One of his demos was a procedural planet generator!

I learned a lot from his papers and demo code, and based the design of The Sims character animation system on his Improv project.

https://mrl.cs.nyu.edu/~perlin/ (expired https cert)

https://web.archive.org/web/20001011065024/http://mrl.nyu.ed...

Here's a more recent blog post about a new one using WebGL, Dragon Planet:

https://blog.kenperlin.com/?p=12821

Here's another blog post about how he's been updating his classic Java applets by rewriting them in JavaScript:

https://blog.kenperlin.com/?p=27980

▲

euleriancon

3 months ago

[-]

In practice you can use 2d generation on spheres with simple UV mapping techniques. Your pixel height becomes distance from the sphere origin.

▲

lawlessone

3 months ago

[-]

Will it not get all bunched up near the poles though? and maybe have seam where the ends of the tiles meet?

edit: Perlin noise and similar noise functions can be sampled in 3d which sorta fixes the issues i mention , and higher dimensions but i am not sure how that would be used.

▲

DonHopkins

3 months ago

[-]

Yes, you can use a 3d Perlin noise field and sample it on the surface of the sphere, to get seamless texture without any anomalies at the poles or projection distortion. That applies to any 3d shape, not just spheres -- it's like carving a solid block of marble. And use 4d Perlin noise to animate it!

It's easy to add any number of dimensions to Perlin noise to control any other parameters (like generating rocks or plants, or modulating biomes and properties like moisture across the surface of the planet, etc).

Each dimension has its own scale, rotation, and intensity (a transform into texture space), and for any dimension you typically combine multiple harmonics and amplitudes of Perlin noise to generate textures with different scales of detail.

The art is picking and tuning those scales and intensities -- you'd want grass density to vary faster than moisture, but larger moist regions to have more grass, dry regions are grassless, etc.

▲

xandergos

3 months ago

[-]

I've thought about this before, and I think there is some way you could find to do it. For example, you could generate on the mercator projection of the world, and then un-project. But the mercator distorts horizontal length approaching the poles. I think it would be complex to implement, but you could use larger windows closer to the poles to negate this.

▲

MindSpunk

3 months ago

[-]

You're still going to run into problems with mercator because under mercator the poles project to infinity, so you'd need an infinitely large texture or you special-case the poles. Many renderers do this so it is viable!

There isn't a zero tradeoff 2D solution, it's all just variations on the "squaring the circle" problem. An octahedral projection would be a lot better as there are no singularities and no infinities, but you still have non linear distortion. Real-time rendering with such a height map would still be a challenge as an octahedral projection relies on texture sampler wrapping modes, however for any real world dataset you can't make a hardware texture big enough (even virtual) to sample from. You'd have to do software texture sampling.

▲

pezezin

3 months ago

[-]

Yes, to generate a whole planet it is a much better idea to use something like a cube map.

▲

noodletheworld

3 months ago

[-]

Mm. This paper makes it hard to understand what they've done.

For example:

> MultiDiffusion remains confined to bounded domains: all windows must lie within a fixed finite canvas, limiting its applicability to unbounded worlds or continuously streamed environments.

> We introduce InfiniteDiffusion, an extension of MultiDiffusion that lifts this constraint. By reformulating the sampling process to operate over an effectively infinite domain, InfiniteDiffusion supports seamless, consistent generation at scale.

…but:

> The hierarchy begins with a coarse planetary model, which generates the basic structure of the world from a rough, procedural or user-provided layout. The next stage is the core latent diffusion model, which transforms that structure into realistic 46km tiles in latent space. Finally, a consistency decoder expands these latents into a high-fidelity elevation map.

So, the novel thing here is slightly better seemless diffusion image gen.

…but, we generate using a heirsrchy based on a procedural layout.

So basocally, tldr: take perlin noise, resize it, and then image-2-image use it as a seed to generate detailed tiles?

People have already been doing this.

Its not novel.

The novel part here is making the detailed tiles slightly nicer.

Eh. :shrug:

The paper obfuscates this, quite annoyingly.

Its unclear to me why you cant just use multi diffusion for this, given your top level input is already bounded (eg. User input) and not infinite.

▲

treyd

3 months ago

[-]

They built a system that can be trained on real-world data so it can output more real-world-like data. It's much more sophisticated than what you could hand-tune with Perlin noise and can have coherent terrain features across multiple scales which are harder to do in classical Perlin noise.

▲

wafngar

3 months ago

[-]

Very common in ML research these days - claim novelty / cite prior work in an obfuscated way and so on.

▲

bombcar

3 months ago

[-]

Sounds like we need to add it to GTNH (only half joking)