I want to shout out Nial Ashley (aka Llainwire) for doing this in 2023 as a solo act and doing the visuals himself as well - https://www.youtube.com/watch?v=M1ZXg5wVoUU
A shame that kid was slept on. Allegedly (according to Discord) he abandoned this because so many artists reached out to have him do this style of music video instead of wanting to collaborate on music.
Hi,
I'm David Rhodes, Co-founder of CG Nomads, developer of GSOPs (Gaussian Splatting Operators) for SideFX Houdini. GSOPs was used in combination with OTOY OctaneRender to produce this music video.
If you're interested in the technology and its capabilities, learn more at https://www.cgnomads.com/ or AMA.
Try GSOPs yourself: https://github.com/cgnomads/GSOPs (example content included).
I’m fascinated by the aesthetic of this technique. I remember early versions that were completely glitched out and presented 3d clouds of noise and fragments to traverse through. I’m curious if you have any thoughts about creatively ‘abusing’ this tech? Perhaps misaligning things somehow or using some wrong inputs.
There's a ton of fun tricks you can perform with Gaussian splatting!
You're right that you can intentionally under-construct your scenes. This can create a dream-like effect.
It's also possible to stylize your Gaussian splats to produce NPR effects. Check out David Lisser's amazing work: https://davidlisser.co.uk/Surface-Tension.
Additionally, you can intentionally introduce view-dependent ghosting artifacts. In other words, if you take images from a certain angle that contain an object, and remove that object for other views, it can produce a lenticular/holographic effect.
The ghost effect is pretty cool, too! https://www.youtube.com/watch?v=DQGtimwfpIo
Would such a plugin be possible for DaVinci Resolve, to merge scenes captured from two iPhones with spatial data into a single 3D scene? With the M4, that shouldn't be a problem?
Yes: https://irrealix.com/plugin/gaussian-splatting-davinci-resol...
(I'm not the author.)
You can train your own splats using Brush or OpenSplat
Hi David, have you looked into alternatives to 3DGS like https://meshsplatting.github.io/ that promise better results and faster training?
I have. Personally, I'm a big fan of hybrid representations like this. An underlying mesh helps with relighting, deformation, and effective editing operations (a mesh is a sparse node graph for an otherwise unstructured set of data).
However, surface-based constraints can prevent thin surfaces (hair/fur) from reconstructing as well as vanilla 3DGS. It might also inhibit certain reflections and transparency from being reconstructed as accurately.
Great work! I’d love to see a proper BTS or case study.
Stay tuned
I do believe a BTS is being developed.
Random question, since I see your username is green.
How did you find out this was posted here?
Also, great work!
My friend and colleague shared a link with me. Pretty cool to see this trending here. I'm very passionate about Gaussian splatting and developing tools for creatives.
And thank you!
From the article:
>Evercoast deployed a 56 camera RGB-D array
Do you know which depth cameras they used?
We (Evercoast) used 56 RealSense D455s. Our software can run with any camera input, from depth cameras to machine vision to cinema REDs. But for this, RealSense did the job. The higher end the camera, the more expensive and time consuming everything is. We have a cloud platform to scale rendering, but it’s still overall more costly (time and money) to use high res. We’ve worked hard to make even low res data look awesome. And if you look at the aesthetic of the video (90s MTV), we didn’t need 4K/6K/8K renders.
You may have explained this elsewhere, but if not: what kind of post-processing did you do to upscale or refine the RealSense video?
Can you add any interesting details on the benchmarking done against the RED camera rig?
Aha: https://www.red.com/stories/evercoast-komodo-rig
So likely RealSense D455.
I was not involved in the capture process with Evercoast, but I may have heard somewhere they used RealSense cameras.
I recommend asking https://www.linkedin.com/in/benschwartzxr/ for accuracy.
Couldn't you just use iPhone Pros for this? I developed an app specifically for photogrammetry capture using AR and the depth sensor, as it seemed like a cheap alternative.
EDIT: I realize a phone is not on the same level as a RED camera, but I just saw iPhones as a massively cheaper option than the alternatives in the field I worked in.
ASAP Rocky has a fervent fanbase that's been anticipating this album, so I'm assuming that whatever record label he's signed to gave him the budget.
And when I think back to another iconic hip hop video (iconic for that genre) where they used practical effects and military helicopters chasing speedboats in the waters off Santa Monica... I bet they had change to spare.
Is there any reason to think https://thebaffler.com/salvos/the-problem-with-music doesn't apply here?
A single camera only captures the side of the object facing the camera. Knowing how far away the camera-facing side of a Rubik's Cube is helps if you're making educated guesses (novel view synthesis), but it won't solve the problem of actually photographing the back side.
There are usually six sides on a cube, which means you need a minimum of six iPhones around an object to capture all of its sides before you can freely move around it. You might as well look at open-source alternatives rather than relying on Apple surprise boxes for that.
If your subject is static, such as a building, then of course you can wave a single iPhone around it for the same effect, with results comparable to more expensive rigs.
I think it's because they already had proven capture hardware and data harvesting and processing workflows.
But yes, you can easily use iPhones for this now.
Looks great, by the way. I was wondering if there's a file format for volumetric video captures.
https://developer.apple.com/av-foundation/
https://developer.apple.com/documentation/spatial/
Edit: As I'm digging, this seems to be focused on stereoscopic video as opposed to actual point clouds. It appears applications like cinematic mode use a monocular depth map, and their lidar outputs raw point cloud data.
A LIDAR point cloud from a single point of view is a monocular depth map. Unless the LIDAR in question is, like, using supernova-level gamma rays or neutrino generators for the laser part to get density and albedo volumetric data across its whole distance range.
You just can't see the back of a thing by knowing the shape of the front side with current technologies.
Some companies have a proprietary file format for compressed 4D Gaussian splatting. For example: https://www.gracia.ai and https://www.4dv.ai.
Check this project, for example: https://zju3dv.github.io/freetimegs/
Unfortunately, these formats are currently locked behind cloud processing, so adoption is rather low.
Before Gaussian splatting, textured mesh caches would be used for volumetric video (e.g. Alembic geometry).
Recording point clouds over time is what I mean, I guess. I'm not going to pretend to understand video compression, but could the movement-tracking aspect be done in 3D the same way it is in 2D?
Why would they go for the cheapest option?
It was more the point that the technology is much cheaper. The company I worked for had completely missed it while trying to develop in-house solutions.
Kinect Azure
> high-quality 3D content
Would have been nice to see some in the video.
Never did I think I would ever see anything close to related to A$AP on HN. I love this place.
Hah, for the past day, I've been trying to somehow submit the Helicopter music video / album as a whole to HN. Glad someone figured out the angle was Gaussian.
I run a programming company, and one of my salespeople was surprised to see I liked SoundCloud rap. I was like:
What did you expect?
>Classical music?
Nah I like hype, helps when things are slow.
Prokofiev's Alexander Nevsky goes hard if you do want something in the classical world though.
Is he wearing... hair curlers?
That's what one does when they want some fiyah curls.
And nearly a Carti post at the top of HN
One day we'll see an Osamason or Xaviersobased post on HN
I'm taking the opportunity to FWAEH in here
What do you mean?
Helicopter had a Carti feature that was pulled but leaked, and a promo photoshoot with the two of them for it.
yeah that had me do a double take lol
Why is that “cool” or desirable?
Because expertise, love, and care cut across all human endeavor, and noticing those things across domains can be a life affirming kind of shared experience.
Perfect comment, but it’s very funny to me that you even needed to say it. Some folks on here talk like moon people who have never met humans before.
Desirable because it’s a rare culture + tooling combo. I’m into both and HN is one of the few places I would see them come together. So yeah, “cool”
Super cool to read, but can someone ELI5 what Gaussian splatting is (and/or radiance fields?), specifically with respect to how the article talks about it finally being "mature enough"? What's changed that this is now possible?
1. Create a point cloud from a scene (either via lidar, or via photogrammetry from multiple images)
2. Replace each point of the point cloud with a fuzzy ellipsoid that has a bunch of parameters for its position + size + orientation + view-dependent color (via spherical harmonics up to some low order)
3. If you render these ellipsoids using a differentiable renderer, then you can subtract the resulting image from the ground truth (i.e. your original photos), and calculate the partial derivatives of the error with respect to each of the millions of ellipsoid parameters that you fed into the renderer.
4. Now you can run gradient descent using the differentiable renderer, which makes your fuzzy ellipsoids converge to something closely reproducing the ground truth images (from multiple angles).
5. Since the ellipsoids started at the 3D point cloud's positions, the 3D structure of the scene will likely be preserved during gradient descent, thus the resulting scene will support novel camera angles with plausible-looking results.
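If you want to see steps 3-5 in code, here's a minimal toy version in PyTorch: it fits a handful of 2D Gaussians to a single target image with a differentiable renderer and gradient descent. Real 3DGS adds the 3D-to-2D projection, per-splat opacity and rotation, spherical harmonics, and densification/pruning heuristics; the counts, shapes, and learning rate below are arbitrary illustration values, not anything from the paper.

    import torch

    # Toy "scene": N fuzzy 2D Gaussians with learnable centers, sizes, and colors.
    N, H, W = 64, 32, 32
    target = torch.rand(H, W, 3)                     # stand-in for one ground-truth photo

    centers = torch.rand(N, 2, requires_grad=True)   # (x, y) positions in [0, 1]
    log_scales = torch.full((N, 1), -2.5, requires_grad=True)
    colors = torch.rand(N, 3, requires_grad=True)

    ys, xs = torch.meshgrid(torch.linspace(0, 1, H), torch.linspace(0, 1, W), indexing="ij")
    pixels = torch.stack([xs, ys], dim=-1).reshape(-1, 2)          # (H*W, 2)

    opt = torch.optim.Adam([centers, log_scales, colors], lr=1e-2)

    for step in range(500):
        # Differentiable "renderer": every pixel accumulates each Gaussian's color,
        # weighted by an isotropic Gaussian falloff from that Gaussian's center.
        d2 = ((pixels[:, None, :] - centers[None, :, :]) ** 2).sum(-1)     # (H*W, N)
        weights = torch.exp(-0.5 * d2 / torch.exp(log_scales).T ** 2)      # (H*W, N)
        image = (weights @ torch.sigmoid(colors)).reshape(H, W, 3)

        loss = ((image - target) ** 2).mean()   # step 3: compare render to ground truth
        opt.zero_grad()
        loss.backward()                         # step 3: partial derivatives w.r.t. every parameter
        opt.step()                              # step 4: gradient descent nudges the Gaussians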
You... you must have been quite some 5 year old.
ELI5 has meant friendly simplified explanations (not responses aimed at literal five-year-olds) since forever, at least on the subreddit where the concept originated.
Now, perhaps referring to differentiability isn't layperson-accessible, but this is HN after all. I found it to be the perfect degree of simplification personally.
Lol. Def not for 5 year olds but it's about exactly what I needed
How about this:
Take a lot of pictures of a scene from different angles, do some crazy math, and then you can later pretend to zoom and pan the camera around however you want
Some things would be literally impossible to properly explain to a 5 year old.
If one actually tried to explain it to a five-year-old, they could use things like analogy, simile, metaphor, and other forms of rhetoric. This was just a straight-up technical explanation.
Thanks.
How hard is it to handle cases where the starting positions of the ellipsoids in 3D are not correct (too far off)? How common is such a scenario with the state of the art? E.g., with only a stereoscopic image pair, the correspondences are often inaccurate.
Thanks.
Great explanation/simplification. Top quality contribution.
Or: Matrix bullet time with more viewpoints and less quality.
Gaussian splatting is a way to record 3-dimensional video. You capture a scene from many angles simultaneously and then combine all of those into a single representation. Ideally, that representation is good enough that you can then, post-production, simulate camera angles you didn't originally record.
For example, the camera orbits around the performers in this music video are difficult to imagine in real space. Even if you could pull it off using robotic motion control arms, it would require that the entire choreography is fixed in place before filming. This video clearly takes advantage of being able to direct whatever camera motion the artist wanted in the 3d virtual space of the final composed scene.
To do this, the representation needs to estimate the radiance field, i.e. the amount and color of light visible at every point in your 3d volume, viewed from every angle. It's not possible to do this at high resolution by breaking that space up into voxels; those scale badly, O(n^3). You could attempt to guess at some mesh geometry and paint textures onto it compatible with the camera views, but that's difficult to automate.
Gaussian splatting estimates these radiance fields by assuming that the radiance is built from millions of fuzzy, colored balls positioned, stretched, and rotated in space. These are the Gaussian splats.
Once you have that representation, constructing a novel camera angle is as simple as positioning and angling your virtual camera and then recording the colors and positions of all the splats that are visible.
It turns out that this approach is pretty amenable to techniques similar to modern deep learning. You basically train the positions/shapes/rotations of the splats via gradient descent. It's mostly been explored in research labs but lately production-oriented tools have been built for popular 3d motion graphics tools like Houdini, making it more available.
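To make the "position a virtual camera and record the visible splats" part concrete, here's a rough toy sketch in NumPy: pinhole-project the splat centers, then alpha-blend isotropic blobs from back to front. A real 3DGS renderer projects full anisotropic covariances, sorts per tile on the GPU, and evaluates spherical harmonics for view-dependent color, so treat every name and number here as made up for illustration.

    import numpy as np

    def render_novel_view(centers, colors, alphas, cam_pos, focal=400.0, size=(256, 256)):
        """Toy splat renderer: pinhole-project splat centers, then alpha-blend
        isotropic Gaussian blobs back to front. Illustration only, not real 3DGS."""
        H, W = size
        image = np.zeros((H, W, 3))
        ys, xs = np.mgrid[0:H, 0:W]

        rel = centers - cam_pos                   # camera at cam_pos, looking down -z, no rotation
        order = np.argsort(rel[:, 2])             # most negative z (farthest) first

        for i in order:
            x, y, z = rel[i]
            if z >= -1e-3:                        # splat behind the camera: not visible
                continue
            u = focal * x / -z + W / 2            # perspective projection onto the image plane
            v = focal * y / -z + H / 2
            sigma = focal * 0.05 / -z             # apparent size shrinks with distance
            falloff = np.exp(-0.5 * ((xs - u) ** 2 + (ys - v) ** 2) / sigma**2)
            a = np.clip(alphas[i] * falloff, 0.0, 1.0)[..., None]
            image = image * (1 - a) + colors[i] * a   # painter's-algorithm "over" compositing

        return image

    # A made-up cloud of 50 splats in front of the origin, viewed from an offset camera.
    rng = np.random.default_rng(0)
    centers = rng.uniform(-1, 1, (50, 3)) + np.array([0.0, 0.0, -5.0])
    img = render_novel_view(centers, rng.uniform(0, 1, (50, 3)),
                            rng.uniform(0.2, 0.8, 50), cam_pos=np.array([0.5, 0.0, 0.0]))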
> Gaussian splatting is a way to record 3-dimensional video.
I would say it's a 3D photo, not a 3D video. But there are already extensions to dynamic scenes with movement.
See 4D splatting.
Brain dances!
It's a point cloud where each point is a semitransparent blob that can have a view-dependent color: the color changes depending on the direction you look at it from. That allows it to capture reflections, iridescence…
You generate the point clouds from multiple images of a scene or an object, plus some machine learning magic.
This 2-minute video is a great intro to the topic https://www.youtube.com/watch?v=HVv_IQKlafQ
I think this tech has become "production-ready" recently due to a combination of research progress (the seminal paper was published in 2023 https://repo-sam.inria.fr/fungraph/3d-gaussian-splatting/) and improvements to differentiable programming libraries (e.g. PyTorch) and GPU hardware.
This is a REALLY good video explaining it. https://www.youtube.com/watch?v=eekCQQYwlgA
https://aras-p.info/blog/2023/09/05/Gaussian-Splatting-is-pr... and for a visual demo of the result https://antimatter15.com/splat/
For the ELI5, Gaussian splatting represents the scene as millions of tiny, blurry colored blobs in 3D space and renders by quickly "splatting" them onto the screen, making it much faster than computing an image by querying a neural net model like radiance fields.
I'm not up on how things have changed recently
I found this VFX breakdown of the recent Superman movie to have a great explanation of what it is and what it makes possible: https://youtu.be/eyAVWH61R8E?t=232
tl;dr eli5: Instead of capturing spots of color as they would appear to a camera, they capture spots of color and where they exist in the world. By combining multiple cameras doing this, you can build a 3D world from the footage that you can then move a virtual camera around.
I also spoke to the vfx team from Superman on how they achieved the reconstructions! (I’m also the author for the Helicopter article here).
https://radiancefields.com/gaussian-splatting-in-superman
Really amazing video. Unfortunately this article is like 60% over my head. Regardless, I actually love reading jargon-filled statements like this that are totally normal to the initiated but are completely inscrutable to outsiders.
"That data was then brought into Houdini, where the post production team used CG Nomads GSOPs for manipulation and sequencing, and OTOY’s OctaneRender for final rendering. Thanks to this combination, the production team was also able to relight the splats."
Hi, I'm one of the creators of GSOPs for SideFX Houdini.
The gist is that Gaussian splats can replicate reality quite effectively with many 3D ellipsoids (stored as a type of point cloud). Houdini is software that excels at manipulating vast numbers of points, and renderers (such as Octane) can now leverage this type of data to integrate with traditional computer graphics primitives, lights, and techniques.
Can you put "Gaussian splats" in some kind of real-world metaphor so I can understand what it means? Either that, or explain why "Gaussian" and why "splat".
I am vaguely aware of stuff like Gaussian blur on Photoshop. But I never really knew what it does.
Sure!
Gaussian splatting is a bit like photogrammetry. That is, you can record video or take photos of an object or environment from many angles and reproduce it in 3D. Gaussians have the capability to "fade" their opacity based on a Gaussian distribution. This allows them to blend together in a seamless fashion.
The splatting process is achieved by using gradient descent from each camera/image pair to optimize these ellipsoids (Gaussians) such that they reproduce the original inputs as closely as possible. Given enough imagery and sufficient camera alignment, performed using Structure from Motion, you can faithfully reproduce the entire space.
Read more here: https://towardsdatascience.com/a-comprehensive-overview-of-g....
> explain why "Gaussian" and why "splat".
Happily. Gaussian splats are a technique for 3D images, related to point clouds. They do the same job (take a 3D capture of reality and generate pictures later from any point of view "close enough" to the original).
The key idea is that instead of a bunch of points, it stores a bunch of semi-transparent blobs, or "splats". The transparency increases quickly with distance from each blob's center, following a normal distribution, also known as the "Gaussian distribution."
Hence, "Gaussian splats".
> I am vaguely aware of stuff like Gaussian blur on Photoshop. But I never really knew what it does.
Blurring is a convolution or filter operation. You take a small patch of the image (say, 5x5 pixels) and you convolve it with another fixed matrix, called a kernel. Convolution says: multiply element-wise and sum. You replace the center pixel with the result.
https://en.wikipedia.org/wiki/Box_blur is the simplest kernel: all ones, divided by the kernel size. Every pixel becomes the average of itself and its neighbors, which looks blurry. Gaussian blur is calculated in an identical way, but the matrix elements follow the "height" of a 2D Gaussian with some amplitude. It results in a bit more smoothing, as farther pixels have less influence. The bigger the kernel, the blurrier the result. There are a lot of these basic operations:
https://en.wikipedia.org/wiki/Kernel_(image_processing)
If you see "Gaussian", it implies the distribution is used somewhere in the process, but splatting and image kernels are very different operations.
For what it's worth I don't think the Wikipedia article on Gaussian Blur is particularly accessible.
How can you expect someone to tailor a custom explanation when they don't know your level of mathematical understanding, or even your level of curiosity? You don't know what a Gaussian blur does; do you know what a Gaussian is? How deeply do you want to understand?
If you’re curious start with the Wikipedia article and use an LLM to help you understand the parts that don’t make sense. Or just ask the LLM to provide a summary at the desired level of detail.
There's a Corridor Digital video being shared that explains it perfectly. With very little math.
https://youtube.com/watch?v=cetf0qTZ04Y
> How can you expect someone to tailor a custom explanation, when they don’t know your level of mathematical understanding, or even your level of curiosity.
The other two replies did a pretty good job!
My bad! I am the author. Gaussian splatting allows you to take a series of normal 2D images or a video and reconstruct very lifelike 3D from it. It’s a type of radiance field, like NeRFs or voxel based methods like Plenoxels!
Corridor has done some great stuff with Gaussian Splats, I recommend this video for a primer!
https://youtube.com/watch?v=cetf0qTZ04Y
Reminds me of Kurtwood Smith’s piping sales pitch in The Patriot
To be honest, it looks like it was rendered in an old version of Unreal Engine. That may be an intentional choice - I wonder how realistic Gaussian splatting can look? Can you redo lights and shadows, or remove or move parts of the scene, while preserving the original fidelity and realism?
The way TV/movie production is going (record 100s of hours of footage from multiple angles and edit it all in post) I wonder if this is the end state. Gaussian splatting for the humans and green screens for the rest?
The aesthetic here is at least partially an intentional choice to lean into the artifacts produced by Gaussian splatting, particularly dynamic (4DGS) splatting. There is temporal inconsistency when capturing performances like this, which is exacerbated by relighting.
That said, the technology is rapidly advancing and this type of volumetric capture is definitely sticking around.
The quality can also be really good, especially for static environments: https://www.linkedin.com/posts/christoph-schindelar-79515351....
I wonder if you are thinking of the Source engine? I was getting serious Skibidi Toilet vibes during several parts of this video.
Knowing what I know about the artist in this video, this was probably more about the novelty of the technology and the creative freedom it offers than about budget.
For me it felt more like a higher-detail version of Teardown, the voxel-based 3D demolition game. Sure, it's splats and not voxels, but the camera and the lighting give it a strong voxel-game vibe.
We would be able to have IMAX-level 3D today, technically, if you feed it the correct data.
Yes, they talk about this in the article and that’s exactly what they did.
It wasn't clear to me how much this was intentional vs. being the limits of the technology at the moment.
Several of ASAP's videos have a lo-fi retro vibe or specific effects, such as simulating MPEG A/V corruption. Check out A$AP Mob - Yamborghini High (https://www.youtube.com/watch?v=tt7gP_IW-1w)
Hello! I'm Chris Rutledge, the post EP / CG supervisor at Grin Machine. Happy to answer any questions. Glad people are enjoying this video; it was so fun to get to play with this technique and help break it into some mainstream production.
Awesome work, incredibly well done! What was the process like for setting the direction on use of these techniques with Rakim? Were you basically just trusted to make something great or did they have a lot of opinions on the technicalities?
Grin Machine knocked this out of the park!
Great job, Chris and crew!
Tangential, but I've been exploring gaussian splatting as a photographic/artistic medium for a while, and love the expressionistic quality of the model output when deprived of data.
https://bayardrandel.com/gaussographs/
Loving this; great work! Do you talk about the process anywhere in more depth?
Thanks! I'm using the KIRI Engine in Blender to render splats from my photos (https://github.com/Kiri-Innovation/3dgs-render-blender-addon) and then process the image as I would my photography in Lightroom. There are lots of different photogrammetry tools for generating plys (the point cloud) like PolyCam (https://poly.cam).
Cool aesthetic!
Be sure to watch the video itself* - it’s really a great piece of work. The energy is frenetic and it’s got this beautiful balance of surrealism from the effects and groundedness from the human performances.
* (Mute it if you don’t like the music, just like the rest of us will if you complain about the music)
Similarly, the music video for Taylor Swif[0] (another track by A$AP Rocky) is just as surrealistic and weird in the best way possible, but with an Eastern European flavor to it (which is obviously intentional and makes sense, given the filming location and being very on-the-nose with the theme).
0. https://youtu.be/5URefVYaJrA
Holy shit that’s great. I need to check a few more of his videos.
Watch the video to the very end: the final splat is not a gaussian one.
Direct link to the music video: https://www.youtube.com/watch?v=g1-46Nu3HxQ
Good idea - we'll put that link in the toptext as well. Thanks!
I can't really respect the artist though, after the assault on a random bystander in Stockholm in 2019 — for which he was convicted. He got off too easy.
I would have refused to work on this.
Too bad, but I managed to watch about 30 seconds of the video before getting motion sickness.
Seems like a really cool technology, though.
I wonder if anyone else got the same response, or it's just me.
I loved the video. Didn't get the motion sickness myself.
My wife said the same thing, but it gets better after the intro.
You could also pull a Michel Gondry and do it with practical effects. https://www.youtube.com/watch?v=s5FyfQDO5g0&list=RDs5FyfQDO5...
The end result is really interesting. As others have pointed out, it looks sort of like it was rendered by an early 2000s game engine. There’s a cohesiveness to the art direction that you just can’t get from green screens and the like. In service of some of the worst music made by human brains, but still really cool tech.
Dang, it's been cool watching gaussian splats go from tech demo to real workflow.
For sure!
A$AP Rocky's music videos have some really good examples of how AI can be used creatively and not just to generate slop. My favorite is Taylor Swif; it's a super fun video to watch.
https://www.youtube.com/watch?v=5URefVYaJrA
They really said it’s capturing everything when A$AP Rocky’s Gaussian splatted mouth in that video be looking worse than AI generated video lol
Both of my worlds are colliding with this article. I love reading about how deeply technical products/artifacts get used in art.
This reminds me of how Soulja Boy just used a cracked copy of Fruity Loops and a cheap microphone and recorded all his songs that made him millions.[1] Edit: OK, this was a big team of VFX producers who did this. Still, prices are coming down dramatically in general, but yeah, that idea is a bit of an underfit for this case.
[1] https://www.youtube.com/watch?v=f1rjhVe59ek
You might consider why this article which has nothing to do with AI as you know it (except for the machine learning aspects of Gaussian splatting), and was produced by a huge team of vfx professionals, has made you think about AI democratising culture (despite the fact that music videos and films have been cheap to make for decades). Don’t just look for opportunities to discuss your favourite talking points.
Fair point actually. touché.
I really don’t see the connection. A$AP isn’t a noob
Can somebody explain to me what was actually scanned? Only the actors doing movements like push ups, or whole scenes / rooms?
The texture of Gaussian splatting always looks off to me. It looks like the entire scene has been textured or has a bad, uniform film-grain filter. Everything looks a little off in an unpleasing way -- things that should be sharp aren't, and things that should be blurry are not. It's uncanny valley, and not in a good way. I don't get what all the rage is about; it always looks like really poor B-roll to me.
A$AP Rocky's music videos have always been great.
In another setting, it looks like ass, but lo-fi, glitchy shit is perfectly compatible with the hip-hop aesthetic. Good track, though.
I think in 2026 it's hard to make a video look this "bad" without it being a clear aesthetic choice, so I'm not sure you could find this video in another setting.
The technology is impressive, but the end result… Weapons-grade brainrot.
I’m curious what other artists end up making with it.
I really disagree with the label brainrot. Brainrot is low-quality garbage with no artistic merit and very little thought behind its creation, which does nothing but make you briefly pause while scrolling before you scroll away, with no lasting impression made on your mind (besides increased boredom and an inability to focus).
This is clearly an artistic statement, whether you like the art or not. A ton of thought and time was put into it. And people will likely be thinking and discussing this video for some time to come.
The article said it was by design...
"The team also used Blender heavily for layout and previs, converting splat sequences into lightweight proxy caches for scene planning."
How did Rihanna look him in the eyes and say "yes babe, good album, release it, this is what the people wanted after 7 years, it is pleasing to listen to and enjoyable"?
I prefer when artists make music they intrinsically want to make — not what others want them to make.
The real question is how much of the art is their own and how much is outside expectations and their reactions to them.
And it's not always giving in to those voices; sometimes it's going in the opposite direction specifically to subvert those voices and expectations, even if that ends up going against your initial instincts as an artist.
With someone like A$AP Rocky, there is a lot of money on the line wrt the record execs but even small indie artists playing to only a hundred people a night have to contend with audience expectation and how that can exert an influence on their creativity.
It seems the numerous leaks and trials took their toll.
I don’t disagree with you—I felt “Tailor Swif,” “DMB,” and “Both Eyes Closed” were all stronger than the tracks that made it onto this album.
But sometimes you’ve gotta ship the project in the state it’s in and move on with your life.
Maybe now he can move forward and start working on something new. And perhaps that project will be stronger.
I'm sure it was more like, "hey babe, can I get a few million to go into the studio and experiment/make some art?" And then she was like, "yeah, go for it! Make some weird shit."
If I was in his position I’d probably be doing the same. Why bother with another top hit that pleases the masses.
Because it was awesome. But also, leaks probably.
> One recurring reaction to the video has been confusion. Viewers assume the imagery is AI-generated. According to Evercoast, that couldn’t be further from the truth. Every stunt, every swing, every fall was physically performed and captured in real space. What makes it feel synthetic is the freedom volumetric capture affords.
No, it’s simply the framerate.
Wow, back to music videos. It’s been years. This is a great one.
so basically, despite the higher resource requirements (like 10TB of data for 30 minutes of footage), the compositing is so much faster and more flexible, those resources can be deleted or moved to long-term storage in the cloud very quickly, and the project can move on
fascinating
I wouldn't have normally read this and watched the video, but my Claude sessions were already executing a plan
the tl;dr is that all the actors were scanned into a 3D point cloud system and then "NeRF"'d, which means extrapolating any missing data about their transposed 3D model
this was then more easily placed into the video than trying to compose and place 2D actors layer by layer
Gaussian splatting is not NeRF (neural radiance field), but it is a type of radiance field, and supports novel view synthesis. The difference is in an explicit point cloud representation (Gaussian splatting), versus a process that needs to be inferred by a neural network.
It's not a type of radiance field.
It's literally in the name of Gaussian splatting: 3D Gaussian Splatting for Real-Time Radiance Field Rendering
https://repo-sam.inria.fr/fungraph/3d-gaussian-splatting/
> and then "NeRF"'d which means to extrapolate any missing data about their transposed 3D model
Not sure if it's you or the original article but that's a slightly misleading summary of NeRFs.
I'm all for the better summary
aren't music videos supposed to have music?
it does, it just doesn't have video
Pretty sure most of this could be filmed with a camera drone and preprogrammed flight path...
Did the Gaussian splatting actually make it any cheaper? Especially considering that it needed 50+ fixed camera angles to splat properly, and extensive post-processing work both computationally and human labour, a camera drone just seems easier.
> Pretty sure most of this could be filmed with a camera drone and preprogrammed flight path
This is a “Dropbox is just ftp and rsync” level comment. There’s a shot in there where Rocky is sitting on top of the spinning blades of a helicopter and the camera smoothly transitions from flying around the room to solidly rotating along with the blades, so it’s fixed relative to Rocky. Not only would programming a camera drone to follow this path be extremely difficult (and wouldn’t look as good), but just setting up the stunt would be cost prohibitive.
This is just one example of the hundreds you could come up with.
Drones and 2d compositing could do a lot. They would excel in some areas used in the video, require far more resources than this technique in others, and be completely infeasible on a few.
They would look much better in a very "familiar" way. They would have much less of the glitch and dynamic aesthetic that makes this so novel.
If it was achievable, cheaper, and of equal quality then it would have been done that way. Surely it would’ve been done that way a long time ago too. Drone paths have been around a lot longer than this technology.
There’s no proof of your claim and this video is proof of the opposite.
A drone path would not allow for such seamless transitions, never mind the planning required to nail all that choreography, effects, etc.
This approach is 100% flexible, and I'm sure at least part of the magic came from the process of play and experimentation in post.
Flying a camera drone with such proximity and acceleration would be a safety nightmare.
I think you’re missing the point
Volumetric capture like this allows you to decide on the camera angles in post-production
it gives you flexibility, options
This might be the first time I'm stumbling on Dunning Kruger on HN, no offense.
It's fucking cool. That's why.
This tech is moving along at breakneck pace and now we're all talking about it. A drone video wouldn't have done that.