Splatfacto-W: A Nerfstudio Implementation of Gaussian Splatting for Unconstrained Photo Collections.

Congrong Xu1,2 , Justin Kerr1 , Angjoo Kanazawa1,
1 UC Berkeley, 2 ShanghaiTech University

Abstract

Novel view synthesis from unconstrained in-the-wild image collections remains a significant yet challenging task due to photometric variations and transient occluders that complicate accurate scene reconstruction. Previous methods have approached these issues by integrating per-image appearance feature embeddings into Neural Radiance Fields (NeRFs). Although 3D Gaussian Splatting (3DGS) offers faster training and real-time rendering, adapting it to unconstrained image collections is non-trivial due to its substantially different architecture. In this paper, we introduce Splatfacto-W, an approach that integrates per-Gaussian neural color features and per-image appearance embeddings into the rasterization process, along with a spherical harmonics-based background model to represent varying photometric appearances and better depict backgrounds. Our key contributions include latent appearance modeling, efficient transient object handling, and precise background modeling. Splatfacto-W delivers high-quality, real-time novel view synthesis with improved scene consistency in in-the-wild scenarios. Our method improves the Peak Signal-to-Noise Ratio (PSNR) by an average of 5.3 dB compared to 3DGS, trains 150 times faster than NeRF-based methods, and achieves a rendering speed similar to 3DGS.

Appearance Embedding Interpolation

The latent appearance representation of a scene lies in a continuous space, so interpolating smoothly between two embeddings captures variation in appearance without affecting the 3D geometry.
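The interpolation itself is a simple linear blend in the embedding space. The sketch below illustrates the idea with plain Python lists; in practice the embeddings are learned torch tensors (the 4-dimensional vectors and function name here are illustrative, not the actual Splatfacto-W API):

```python
def interpolate_appearance(embed_a, embed_b, t):
    """Linearly blend two per-image appearance embeddings.

    Because the appearance space is continuous, intermediate embeddings
    render plausible in-between lighting while the underlying Gaussians
    (and thus the 3D geometry) stay fixed.
    """
    return [(1.0 - t) * a + t * b for a, b in zip(embed_a, embed_b)]

# Hypothetical embeddings learned for two training images.
sunny = [0.2, -0.1, 0.5, 0.0]
dusk = [0.8, 0.3, -0.4, 0.1]

halfway = interpolate_appearance(sunny, dusk, 0.5)
```

Sweeping `t` from 0 to 1 produces a smooth appearance transition between the two source images.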

Real-time Changing Appearance in Viewer

You can change the appearance of the scene to match any training image in real time in the viewer.



Background Model

The background model is a three-layer MLP that predicts the background of the scene without introducing any 2D model. It can be trained jointly with the Gaussian splatting and cleans up floaters in the scene.

In further experiments, we found that directly optimizing a set of spherical harmonics coefficients achieves a similar result while training faster.
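Because the background depends only on the viewing direction, it can be represented as a low-degree spherical harmonics expansion evaluated per ray. The sketch below shows a degree-1 evaluation in pure Python; the coefficient layout and the +0.5 shift-and-clamp decoding are common conventions in splatting codebases, but the exact parameterization in Splatfacto-W may differ:

```python
# Real spherical harmonics basis constants (degrees 0 and 1).
SH_C0 = 0.28209479177387814
SH_C1 = 0.4886025119029199

def background_color(direction, sh_coeffs):
    """Evaluate an SH background color for one unit ray direction.

    direction: unit vector (x, y, z).
    sh_coeffs: four RGB triples (one per basis function, degree <= 1),
               the directly optimized background parameters.
    """
    x, y, z = direction
    basis = [SH_C0, -SH_C1 * y, SH_C1 * z, -SH_C1 * x]
    rgb = [0.0, 0.0, 0.0]
    for b, coeff in zip(basis, sh_coeffs):
        for c in range(3):
            rgb[c] += b * coeff[c]
    # Shift and clamp to [0, 1], a common SH color decoding convention.
    return [min(max(v + 0.5, 0.0), 1.0) for v in rgb]
```

With only the degree-0 (DC) coefficient set, every ray receives the same color; the degree-1 terms add smooth directional variation such as a sky gradient.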

A new alpha loss term is introduced to prevent the Gaussians from occupying the background.
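One plausible form of such a regularizer is sketched below: wherever the background model alone already explains the ground-truth pixel, the accumulated Gaussian opacity is pushed toward zero so that those pixels are left to the background. This is an illustrative sketch on per-pixel scalars; the threshold, names, and exact formulation of the loss in Splatfacto-W may differ:

```python
def alpha_loss(accum_alpha, gt, background, threshold=0.05):
    """Penalize foreground opacity in pixels the background explains.

    accum_alpha: per-pixel accumulated opacity from rasterized Gaussians.
    gt:          ground-truth pixel values.
    background:  background-model predictions for the same pixels.
    Pixels whose background error is below `threshold` contribute their
    opacity to the loss; minimizing it keeps Gaussians off the background.
    """
    total, count = 0.0, 0
    for a, g, b in zip(accum_alpha, gt, background):
        if abs(g - b) < threshold:
            total += a
            count += 1
    return total / max(count, 1)
```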

Side-by-Side Comparisons
(Left: without background model, Right: with background model)

More Accurate Depth Rendering

The background model also provides a more accurate depth rendering of the scene.

Splatfacto

With Background Model

Model Components

Pipeline

We begin by predicting the color of each Gaussian with the Appearance Model. These Gaussians are then rasterized to generate the foreground, while the Background Model predicts the background from the ray directions. The foreground and background are merged via alpha blending to produce the final image. This final image is compared with the masked ground-truth image and processed through the Robust Mask to update the model parameters.
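The final compositing and supervision steps can be sketched as follows. For brevity this operates on per-pixel scalars with an L1 photometric term; the real pipeline works on RGB image tensors, and the actual loss in Splatfacto-W may combine additional terms:

```python
def composite_and_loss(fg, alpha, bg, gt, mask):
    """Alpha-blend foreground over background, then masked L1 loss.

    fg:    rasterized foreground color per pixel (premultiplied by alpha).
    alpha: accumulated Gaussian opacity per pixel.
    bg:    background-model prediction per pixel.
    gt:    ground-truth pixel values.
    mask:  robust mask, 1 for static pixels, 0 for transient occluders,
           which are excluded from supervision.
    """
    loss, n = 0.0, 0
    for f, a, b, g, m in zip(fg, alpha, bg, gt, mask):
        pred = f + (1.0 - a) * b  # foreground over background
        if m:                     # only static pixels drive the update
            loss += abs(pred - g)
            n += 1
    return loss / max(n, 1)
```

Gradients of this loss flow into the Gaussian parameters, the appearance embeddings, and the background coefficients jointly.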