Wednesday, November 6, 2024

NVIDIA’s Instant NeRF AI technology renders 3D scenes from 2D images in seconds

In context: Nvidia has been playing with NeRFs. No, the researchers were not shooting each other with foam darts. NeRF is an acronym for Neural Radiance Field, a technique that uses artificial intelligence to reconstruct a 3D scene from a set of 2D still images (a process known as inverse rendering). Depending on the complexity of the scene, it typically takes hours or even days to see results.

Nvidia’s AI research division has been working on inverse rendering and has developed a neural radiance field it calls Instant NeRF because it renders a 3D scene roughly 1,000 times faster than other NeRF techniques. The AI model needs only a few seconds to train on a few dozen still images taken from multiple angles, and then just tens of milliseconds to render the resulting 3D scene.
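To make the idea concrete, here is a minimal conceptual sketch, in PyTorch, of what a radiance field model learns: a function from a 3D point and a viewing direction to a color and a density. This is purely illustrative and is not Nvidia's implementation; Instant NeRF's speed comes from a heavily optimized CUDA multiresolution hash encoding that is omitted here.

```python
# Minimal conceptual sketch of a radiance field model (illustrative only,
# not Nvidia's optimized Instant NeRF implementation).
import torch
import torch.nn as nn

class TinyRadianceField(nn.Module):
    """Maps a 3D point plus a viewing direction to an RGB color and a density."""
    def __init__(self, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 + 3, hidden), nn.ReLU(),   # xyz position + view direction
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),                  # RGB + density
        )

    def forward(self, xyz: torch.Tensor, view_dir: torch.Tensor):
        out = self.net(torch.cat([xyz, view_dir], dim=-1))
        rgb = torch.sigmoid(out[..., :3])          # colors in [0, 1]
        density = torch.relu(out[..., 3:])         # non-negative opacity
        return rgb, density

# Training fits this function so that rays cast from the known camera poses
# reproduce the pixel colors of the few dozen input photos.
model = TinyRadianceField()
rgb, density = model(torch.rand(1024, 3), torch.rand(1024, 3))
```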

Since the process is essentially the reverse of taking a Polaroid (instantly turning a 3D scene into a 2D image), Nvidia built its demo around a recreation of an iconic photo of Andy Warhol taking an instant photo. The research team presented the Instant NeRF demo at Nvidia GTC this week (below).

Nvidia said, “NeRF can be used to create avatars or scenes of virtual worlds, to capture video conference participants and their 3D environments, or to reconstruct scenes for digital 3D maps.” The company added, “Collecting data to feed a NeRF is a bit like being a red carpet photographer trying to capture a celebrity’s outfit from all angles: the neural network requires a few dozen images taken from multiple positions around the scene, as well as the camera position for each of those shots.”
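The sketch below shows what that captured data might look like on disk: an image path plus a camera pose per shot. The layout follows the transforms.json convention used by several open-source NeRF tools; the field names and values are assumptions for illustration, not a format quoted from Nvidia.

```python
# Hypothetical sketch of the per-shot data a NeRF pipeline consumes:
# an image path plus the camera pose (a 4x4 camera-to-world matrix) for
# each photo. Field names follow a common community convention and are
# an assumption here, not Nvidia's spec.
import json

capture = {
    "camera_angle_x": 0.69,            # horizontal field of view in radians
    "frames": [
        {
            "file_path": "images/shot_000.jpg",
            "transform_matrix": [      # camera-to-world pose for this shot
                [1.0, 0.0, 0.0, 0.0],
                [0.0, 1.0, 0.0, 0.0],
                [0.0, 0.0, 1.0, 2.5],
                [0.0, 0.0, 0.0, 1.0],
            ],
        },
        # ...a few dozen more frames taken from different positions
    ],
}

with open("transforms.json", "w") as f:
    json.dump(capture, f, indent=2)
```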

NeRF builds the 3D scene from these few dozen viewpoints, filling in the blanks where necessary. It can even compensate for occlusions: if an object blocks the view of the subject in one of the frames, the AI can still fill in that angle even though it could barely see the subject, or not see it at all, from that position.
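The sketch below illustrates that fill-in behavior, under the same assumptions as the earlier model sketch: to render a pixel, a NeRF marches along the camera ray, queries the learned field, and alpha-composites the samples. Because the field models the whole 3D volume, a viewpoint that was occluded in some photos can still be rendered.

```python
# Minimal sketch of how a NeRF renders a pixel: sample the learned field
# along the camera ray and alpha-composite the colors (illustrative only).
import torch

def render_ray(model, origin, direction, near=0.1, far=4.0, n_samples=64):
    t = torch.linspace(near, far, n_samples)                # depths along the ray
    points = origin + t[:, None] * direction                # 3D sample positions
    dirs = direction.expand(n_samples, 3)
    rgb, density = model(points, dirs)                      # query the field
    delta = t[1] - t[0]                                     # spacing between samples
    alpha = 1.0 - torch.exp(-density.squeeze(-1) * delta)   # opacity per sample
    trans = torch.cumprod(1.0 - alpha + 1e-10, dim=0)       # accumulated transparency
    trans = torch.cat([torch.ones(1), trans[:-1]])          # light reaching each sample
    weights = alpha * trans
    return (weights[:, None] * rgb).sum(dim=0)              # composited pixel color

# Example use with the TinyRadianceField sketched earlier:
# pixel = render_ray(model, torch.zeros(3), torch.tensor([0.0, 0.0, 1.0]))
```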


The only drawback of the technique has to do with moving objects.

“In a scene that includes people or other moving elements, the quicker these shots are captured, the better,” Nvidia said. “If there is too much motion during the 2D image capture process, the AI-generated 3D scene will be blurry.”

For more technical details, check out the Nvidia blog. You can also watch the rest of Jensen Huang’s GTC keynote on YouTube.
