Bolt3D: Generating 3D Scenes in Seconds
290 points
7 months ago
| 11 comments
| szymanowiczs.github.io
| HN
bhouston
7 months ago
[-]
It doesn't seem to work that well. Once you move off the primary camera axis, like rotate around, you notice that there are many regions with only sparse resolution and there are gaps everywhere. It is totally unusable for anything.

Sure, it solves for the primary view, but this is claiming it is a 3D scene reconstruction/inference technique and in that claim it only sort of works.

For example: https://i.postimg.cc/43tj36jv/Screenshot-2025-03-20-at-8-52-...

reply
gcapu
7 months ago
[-]
Where is this image from? Were you able to try the model?
reply
burgerone
7 months ago
[-]
What's the use case for putting AI into everything? Pretty much every AI product so far has been and still is subject to hallucinations and inaccuracies and on top of that it's hugely computing intensive. Sure, it's the best we have right now and it allows us to do things that were previously next to impossible with manual programming work, but it's far from being something that's actually viable. And what would be the use case for turning a picture into an approximated 3d mesh that is only really complete from one angle? LIDAR does a stunningly accurate job at that already, reproducibly (although granted that this cannot retroactively be applied to existing photos).
reply
graypegg
7 months ago
[-]
So I agree with you, but to be fair it is neat, and I think academia should be allowed to try things with little to no *immediate* commercial value. Being "neat" is enough IMO if there's enough resources to go around.

In the long run, yeah this *exact* application is sort of pointless. I expected to see the lens parameters factored into the process. They're not. This would mean that everything is not only dimensionally inaccurate since there's no reference measurement, but also proportionally inaccurate to other things in the scene. You can actually see the effect of that on the "flower car" example. (the entire shape of the car is warped) Let alone the fact that the entire part of the scene that can't be seen in the original photo is made up.

Maybe someone would use this to make game assets? But you'd need to fix them up a ton before using them. Other sibling comments make the point that there's no wireframes... so we can assume the polygon count here is insane.

Either way... it's just neat.

reply
AlexeyBelov
7 months ago
[-]
> What's the use case for putting AI into everything?

Money.

reply
scyzoryk_xyz
7 months ago
[-]
I’m imagining this approach being combined down the line with your typical photogrammetry approach for reinforcing quality.
reply
diggan
7 months ago
[-]
Show. Us. The. Wireframes!

Every single time a new "Generate 3D" thing appears, they never show the wireframes of the objects/scenes up front; you always need to download and inspect things yourself. How is this not standard practice already?

Not displaying the wireframes at all, or even offering sample files so we could at least see them ourselves, just makes it look like you already know that the generated results are unusable...

reply
jsheard
7 months ago
[-]
They usually don't show the material channels either, which I assume is because there aren't any, and instead the lighting is statically baked into the asset. That works for a demo where you just wiggle the camera in a circle, but it'll immediately fall apart if the lighting environment changes or anything in the scene moves.
reply
Legend2440
7 months ago
[-]
Think of it more like a 3D picture than an animation model.

There are no materials channels or wireframe. It’s a volumetric 3D representation, like a picture made up of color blobs.

reply
kookamamie
7 months ago
[-]
And thus unusable for most things 3D models and scenes would be used for today.
reply
hwillis
7 months ago
[-]
Apples, not oranges. This isn't for 3D models and scenes. Think of it as a fancy version of Street View or an apartment walkthrough. Those project pictures onto a sphere around you. Gaussian splats are an improvement that uses multiple images to interpolate viewpoints: take two pictures from different angles, and Gaussian splats let you take a look from between those views, or above or below them.

This method uses ai to generate even more unseen structure, so with relatively few images you can still represent a real scene with some level of fidelity. It will never need dynamic lights or animation because the point is just to look as close as possible to a still image. Splats do that FAR better and more efficiently than you ever could with dynamic lighting, triangulated models, and visual effects.
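
To make the "no dynamic lights needed" point concrete, here is a minimal sketch of how a splat renderer could shade one pixel: each splat carries a fixed, baked-in color, and the pixel is a front-to-back alpha blend of the splats covering it. The `Splat2D` class, its fields, and the isotropic screen-space Gaussian are simplifying assumptions for illustration; real 3D Gaussian splatting projects anisotropic 3D Gaussians and uses view-dependent color, and none of this is taken from the Bolt3D code.

```python
import math
from dataclasses import dataclass


@dataclass
class Splat2D:
    # Hypothetical, simplified splat: an isotropic Gaussian already
    # projected into screen space, with its color baked in.
    x: float          # center, in pixels
    y: float
    sigma: float      # standard deviation, in pixels
    opacity: float    # peak opacity in [0, 1]
    color: tuple      # (r, g, b), each in [0, 1]
    depth: float      # distance from the camera, used only for sorting


def splat_alpha(s, px, py):
    """Opacity contributed by splat s at pixel (px, py)."""
    d2 = (px - s.x) ** 2 + (py - s.y) ** 2
    return s.opacity * math.exp(-0.5 * d2 / (s.sigma ** 2))


def shade_pixel(splats, px, py, background=(0.0, 0.0, 0.0)):
    """Front-to-back alpha compositing of all splats at one pixel."""
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light still passing through
    for s in sorted(splats, key=lambda s: s.depth):  # nearest first
        a = splat_alpha(s, px, py)
        for c in range(3):
            color[c] += transmittance * a * s.color[c]
        transmittance *= 1.0 - a
    # Whatever light is left over comes from the background.
    return tuple(color[c] + transmittance * background[c] for c in range(3))


if __name__ == "__main__":
    red = Splat2D(0.0, 0.0, 2.0, 0.8, (1.0, 0.0, 0.0), depth=1.0)
    blue = Splat2D(1.0, 0.0, 2.0, 0.8, (0.0, 0.0, 1.0), depth=2.0)
    print(shade_pixel([blue, red], 0.5, 0.0))
```

There is no light source anywhere in this loop, which is the trade-off being described: the lighting is whatever the input photos captured.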

reply
ForTheKidz
7 months ago
[-]
I see your point, but also consider that interactivity comes to mind in part because 3d models are so expensive to describe compared to 2d shapes that they're largely worth it for interactive stuff. We might see more innovation on that front with a low-cost barrier to entry.

Plus, scaling dynamic lighting up has always been the Big Bad of computer graphics, and precomputation will always give us an amazing heuristic to use against it. Everything else basically tends towards not mattering: we can only absorb a finite number of details, but we live in a world with virtually infinite lights.

reply
teamonkey
7 months ago
[-]
Honestly, I thought this was the most practical and usable example of AI generation I've seen to date. I actually found it refreshing after all the guff we usually see.

I bet in a couple of years it'll be standard for estate agents to show 3D views like this on their web sites, architects converting quick paintovers of existing sites to 3D models, improvements to Street View, and so on. Anywhere where you want a quick 3D view of a space based on a few photos taken on a smartphone and where accuracy isn't 100% important.

For things like games, it still follows the existing photogrammetry workflow (with all of those problems), but it might reduce the number of photos needed to create a point cloud.

reply
ajwin
7 months ago
[-]
My understanding is that it is not a mesh, it’s Gaussian Splatting. There are tools to convert splats into meshes though.
reply
diggan
7 months ago
[-]
Yeah, but isn't the expected outcome still to end up with actual 3D objects, not point clouds? Or did people start integrating point clouds into their 3D workflows already? Besides stuff like volumes and the like, I think most of us are still stuck working with polygons for 3D.
reply
tracerbulletx
7 months ago
[-]
No geometry in the conventional sense. I did a demo of rendering a Gaussian splat in React Three Fiber here; you can open the linked splat file (it's hosted on Hugging Face) if you want to see the data format. https://codesandbox.io/p/sandbox/3d-gaussian-splat-in-react-... I also have this YouTube video about creating that demo: https://www.youtube.com/watch?v=6tVcCTazmzo
reply
lmpdev
7 months ago
[-]
I use pointclouds all the time in Rhino/Lastools/Meshlab

I much prefer pointclouds and nurbs over meshes

Not everything is gamedev

reply
diggan
7 months ago
[-]
> Not everything is gamedev

Agree, I'm not sure why you'd think that's the only use case for 3D, unless I misunderstand your argument here.

How would you handle visual effects with point clouds, for example? There are so many use cases for proper 3D, and all I can think of as a use case for point clouds are environments with static lighting, which seems like a really small part of what people generally consider "3D scenes".

reply
lmpdev
7 months ago
[-]
> Visual effects

Maybe I missed the mark on “gamedev”, but 3D is larger than just “aesthetically pleasing 3D VFX” for its own sake

Often I’m trying to use something as a reference for a design where a 3D model isn’t the actual end goal, or I’m performing analytics on a 3D object (say in my case for a lot of GIS and simulation work)

The whole “mesh is the be all and end all of 3D modelling” irks me as while yes it’s a really important way of representing an object (especially with real time constraints), it doesn’t do justice to the full landscape of techniques and uses for 3D

It would be like 2D sprite artists from the gamedev world saying “what’s the point of all this vector art you illustrators are doing” or “what’s the point of all these wireframe designs you graphic designers are doing” - “these aren’t raster images!”

I suppose my snipe was trying to communicate the idea that 3D is larger than just a vehicle for entertainment production. It intersects many industries that may eschew polygons because real time rendering is irrelevant

3D tooling has uses beyond producing 3D scenes, just as Photoshop is used for more than touching up photographs

Edit: for anyone stuck in a rut with meshes come join the dark side with nurbs - it makes you think about modelling in a radically different way (unfortunate side effect is it makes working with meshes feel so so “dirty”)

reply
CyberDildonics
7 months ago
[-]
> The whole “mesh is the be all and end all of 3D modelling”

No one said this, it seems like you are making up fake questions and not dealing with the actual questions that the person you replied to asked.

You can view point clouds and you can warp them around, but working with them and tracing rays becomes a different story.

Once you need something as a jumping off point to start working with, point clouds are not going to work out anymore. People use polygons for a reason. They have flexible UVs, they can be traced easily, they can be worked with easily, their data is direct, standard and minimal.

reply
MITSardine
7 months ago
[-]
Games are the least of it, the vast majority of scientific applications to do with physics use meshes rather than point clouds.

This is because a point cloud does not represent a surface or a volume until the points are connected to form, well, a surface or a volume.

And physical problems are most often defined over surfaces or volumes. For instance, waves don't propagate over sparse sets of points, but within continuous domains.

However, for applications where geometric accuracy is needed, I think you wouldn't want to use a method based on a minimal number of photographs anyways. For instance, the Lascaux cavern was mapped in 3D a decade ago based on "good old" algorithms (not machine learning) and instruments (more sophisticated than a phone camera). So these critiques are missing the point, in my opinion. These Gaussian Splatting methods are very impressive for the constraints they operate under!

reply
Legend2440
7 months ago
[-]
You can convert splats into meshes using a simple marching cubes algorithm.

But the meshes produced are not easy to edit.
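
As a rough illustration of what "convert splats into meshes with marching cubes" starts from, the sketch below sums a set of Gaussians into a density field, samples it on a voxel grid, and finds the cells that the iso-surface passes through. This is only the first stage: full marching cubes would then emit triangles per cell from a 256-entry case table, which is omitted here. The isotropic `(center, sigma, weight)` Gaussians and the function names are assumptions made for the example, not any particular tool's format.

```python
import math
from itertools import product


def density(point, gaussians):
    """Sum of isotropic Gaussians; each entry is (center, sigma, weight)."""
    total = 0.0
    for (cx, cy, cz), sigma, weight in gaussians:
        d2 = (point[0] - cx) ** 2 + (point[1] - cy) ** 2 + (point[2] - cz) ** 2
        total += weight * math.exp(-0.5 * d2 / sigma ** 2)
    return total


def surface_cells(gaussians, lo, hi, n, iso):
    """Return indices of grid cells that straddle the iso-surface.

    The cube [lo, hi]^3 is split into n*n*n cells. A cell straddles the
    surface when some, but not all, of its 8 corners have density >= iso.
    """
    step = (hi - lo) / n
    # Classify every grid corner once as inside (True) or outside (False).
    inside = {}
    for i, j, k in product(range(n + 1), repeat=3):
        p = (lo + i * step, lo + j * step, lo + k * step)
        inside[(i, j, k)] = density(p, gaussians) >= iso

    cells = []
    for i, j, k in product(range(n), repeat=3):
        corners = [inside[(i + di, j + dj, k + dk)]
                   for di, dj, dk in product((0, 1), repeat=3)]
        if any(corners) and not all(corners):
            cells.append((i, j, k))
    return cells


if __name__ == "__main__":
    blob = [((0.0, 0.0, 0.0), 1.0, 1.0)]
    cells = surface_cells(blob, lo=-3.0, hi=3.0, n=12, iso=0.5)
    print(len(cells), "cells would receive triangles")
```

Every straddling cell gets its own triangles regardless of how flat the surface is there, which is one reason the resulting meshes tend to be dense and awkward to edit by hand.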

reply
dheera
7 months ago
[-]
Generating good meshes sounds like a problem for a completely different machine learning algorithm to me.
reply
MITSardine
7 months ago
[-]
Meshing has been around long before machine learning came to prominence, there's plenty of methods to improve surface meshes already.
reply
Legend2440
7 months ago
[-]
None of which work.

There is no good method to take a 3D scan and make a sensible mesh out of it. They tend to have far more vertices than necessary and lack structure.

reply
MITSardine
7 months ago
[-]
I don't know what you mean by lacking structure, but perhaps you are not aware of all the tools that exist, because fixing surface meshes is a rather classic problem. Just type "surface remeshing" or "surface mesh optimization" on google scholar and you'll see thousands of results.

This is a separate problem from triangulation (turning point clouds into meshes) done with entirely different algorithms. It's likely the software you used for this assumes the user will then turn to other software to improve their surface mesh.

Even for operations that are naturally in sequence, you will often find the software to carry out those steps is separated. For instance turning CAD into a surface mesh is one software, turning a surface mesh into a tetrahedral volume mesh another (if those are hexahedra, then yet another), and then optimizing or adapting those meshes is done by yet another piece of software. And yet these steps are carried out each time an engineer goes from CAD to physical simulation. So it's entirely possible the triangulation software you used does not implement any kind of surface optimization and assumes the user will then find something else to deal with that.

reply
hwillis
7 months ago
[-]
If you wanted to show someone a walkaround of the Sistine chapel or David, would you be better off using triangles and PBR and raycast lighting? You don't really gain anything from all that; you're doing a tremendous amount of computation just to recapture the particular lighting at an exact time. If you want the same detail that a few good pictures capture -tens of millions of pixels- you need to have many billions of triangles onscreen.

With splats you can have incredibly high fidelity with identical lighting and detail built in already. If you want to make a game or a movie, don't use splats. If you want to recreate a static scene from pictures, splats work very well.

reply
text0404
7 months ago
[-]
splats augment 3D scenes, they don't replace them. i've seen them used for AR/VR, photogrammetry, and high-performance 3D. going from splats to a 3D model would be a downgrade in terms of performance.
reply
dmarcos
7 months ago
[-]
What’s the best use of splats that you’ve seen so far that I can try? AR/VR or regular 3D
reply
nine_k
7 months ago
[-]
Meshes are editable. Are Gaussian splats?
reply
dmarcos
7 months ago
[-]
What kind of edits do you mean? You can crop / combine splats easily in your browser with supersplat (not affiliated)

https://superspl.at/editor

reply
text0404
7 months ago
[-]
kinda but not really in a meaningful way, at least not yet. there's some plugins for popular 3D software but it's still early days.
reply
therealpygon
7 months ago
[-]
Yea, someone can say, “Look, we have just created the first color computer and it displays images. Look at this first ever real life photo on this digital screen!” There will always be the people who ask, “Yeah, but does it run Photoshop?”
reply
lallysingh
7 months ago
[-]
Expected by whom? Other researchers in this space? That's the audience for this work.
reply
dheera
7 months ago
[-]
Not necessarily.

If you're using it to render video you don't need to go into the mesh world.

reply
eMPee584
7 months ago
[-]
Isn't https://svraster.github.io/ just superseding Gaussians? Voxels are also not meshes, but might they not prove even more useful for coming rendering engines..?
reply
text0404
7 months ago
[-]
splats don't have wireframes, and they have an embedded webgpu viewer in the linked page.
reply
quitit
7 months ago
[-]
I think there should be a standard set of images for comparison, because I've never seen a mesh generator readme that wasn't impressive. I test each one I get my hands on and the results are often disappointing.
reply
slowtrek
7 months ago
[-]
Is anything like this available locally yet?
reply
emmelaich
7 months ago
[-]
Here's the repo: https://github.com/szymanowiczs/splatter-image

Apparently you can clone and run the demo locally. But it wasn't clear at a glance how much runs locally and what hardware is required.

reply
echelon
7 months ago
[-]
Your link above (Splatter Image) is not the same code / paper / research as Bolt3D.

This is a previous paper/work by the lead author a year before they interned at Google Research and produced Bolt3D.

Bolt3D appears to be his intern research project done in conjunction with a bunch of other Google and DeepMind researchers.

I don't suspect there will ever be publicly available code for this.

reply
kombine
7 months ago
[-]
This is previous work. Bolt3D uses the same principle, of predicting a per-pixel Gaussian splatting representation but it also trains a diffusion model, which is only feasible if you have substantial compute available.

Given that this work was done at Google, I would not expect them to release source code. But it will be reproduced by someone else soon enough.
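
For readers unfamiliar with what a "per-pixel" representation means, here is a minimal sketch of one common way to get one Gaussian center per pixel: lift each pixel into 3D using a depth value and pinhole camera intrinsics. This is an assumption for illustration only; the exact parameterization Bolt3D predicts is not described in this thread, and `unproject_pixels` and its parameters are hypothetical names.

```python
def unproject_pixels(depth, fx, fy, cx, cy):
    """Lift every pixel of a depth map to a 3D point in camera space.

    depth  : list of rows, depth[v][u] is the distance along the optical axis
    fx, fy : focal lengths in pixels (assumed known)
    cx, cy : principal point in pixels (assumed known)
    Returns one (X, Y, Z) tuple per pixel, in row-major order. Each point
    could serve as the center of one Gaussian.
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


if __name__ == "__main__":
    depth = [[2.0, 2.0, 2.0],
             [2.0, 1.5, 2.0]]
    # Same depth map, two different assumed focal lengths.
    for f in (2.0, 4.0):
        pts = unproject_pixels(depth, fx=f, fy=f, cx=1.0, cy=0.5)
        print(f, pts[0], pts[-1])
```

Running it with two focal lengths shows the same depth map producing differently proportioned geometry, which is the kind of distortion you would expect when lens parameters are unknown or guessed.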

reply
emmelaich
7 months ago
[-]
True, I should've said related work.
reply
ashikns
7 months ago
[-]
Isn't it generating in the browser using webgpu?
reply
gessha
7 months ago
[-]
I assume that’s for interactive viewing only, not for generation.

> Our method takes 6.25 seconds to reconstruct one scene on a single H100 NVIDIA GPU or 15 seconds on an A100.

reply
dvrp
7 months ago
[-]
I mean, it's the same author but seems like co-authors are different.

How do you know it's the actual implementation?

reply
noduerme
7 months ago
[-]
I'm very unclear as to what is supposed to be happening locally here, but as soon as the demo finishes loading, it crashes firefox on my phone.
reply
lukan
7 months ago
[-]
I assume it's WebGPU related? FF lags behind in support and I assume they use it under the hood. Meaning I would try Chrome. (I know, I know)
reply
antonkar
7 months ago
[-]
We’ll hopefully convert an LLM into a 3D haunted house and finally democratize AI interpretability by having millions of gamers walking in them
reply
oplane
7 months ago
[-]
Can 3D generation happen for terrain views on a map ?
reply
flykespice
7 months ago
[-]
Good good, now show me the topology
reply
marianaenhn
7 months ago
[-]
ok, thx
reply
tmilard
7 months ago
[-]
Impressed with the Bolt3D AI model! - Speed of the 3D generation, - Accurate 3D mesh deduction. It's a wonderful shock.

I agree, this is the way forward: - "some photos" as input. - Convenient, a camera is in every pocket (smartphone).

On weekends, I have been trying for years to generate 3D from photos. My tool now works well, but there is still the big problem of the time it takes to "recreate" the 3D mesh from photos. Remember that photos are in... 2D. Not convenient. Here is an example of my tool's output: https://free-visit.net/fr/demo01

Here, Bolt3D turns 4 hours of cumbersome work into an automatic process. Wahoo!

So bravo to the Bolt3D team of researchers.

reply