Novel view synthesis from unconstrained in-the-wild image collections remains a significant yet challenging task due to photometric variations and transient occluders that complicate accurate scene reconstruction. Previous methods have approached these issues by integrating per-image appearance embeddings into Neural Radiance Fields (NeRFs). Although 3D Gaussian Splatting (3DGS) offers faster training and real-time rendering, adapting it to unconstrained image collections is non-trivial due to its substantially different architecture. In this paper, we introduce Splatfacto-W, an approach that integrates per-Gaussian neural color features and per-image appearance embeddings into the rasterization process, along with a spherical harmonics-based background model to represent varying photometric appearances and better depict backgrounds. Our key contributions include latent appearance modeling, efficient transient object handling, and precise background modeling. Splatfacto-W delivers high-quality, real-time novel view synthesis with improved scene consistency in in-the-wild scenarios. Our method improves the Peak Signal-to-Noise Ratio (PSNR) by an average of 5.3 dB compared to 3DGS, trains 150 times faster than NeRF-based methods, and achieves a rendering speed similar to 3DGS.
The latent representation of a scene lies in a continuous space, so interpolating smoothly between two embeddings captures variation in appearance without affecting the 3D geometry.
You can change the appearance of the scene to match any image in real time in the viewer.
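Below is a minimal sketch of how per-Gaussian features and per-image embeddings could be combined to predict Gaussian colors, and how two embeddings can be interpolated. The module name, feature sizes, and MLP layout are assumptions for illustration, not the exact Splatfacto-W implementation.

```python
import torch
import torch.nn as nn

class AppearanceModel(nn.Module):
    """Hypothetical appearance model: per-image embedding + per-Gaussian feature -> RGB."""

    def __init__(self, num_images: int, embed_dim: int = 48, feat_dim: int = 72):
        super().__init__()
        self.embeddings = nn.Embedding(num_images, embed_dim)  # one latent per training image
        self.mlp = nn.Sequential(
            nn.Linear(embed_dim + feat_dim, 256), nn.ReLU(),
            nn.Linear(256, 3), nn.Sigmoid(),  # RGB in [0, 1]
        )

    def forward(self, gaussian_feats: torch.Tensor, embed: torch.Tensor) -> torch.Tensor:
        # gaussian_feats: (N, feat_dim) per-Gaussian neural features, embed: (embed_dim,)
        e = embed.expand(gaussian_feats.shape[0], -1)
        return self.mlp(torch.cat([gaussian_feats, e], dim=-1))

# Because the embedding space is continuous, appearance can be interpolated between
# two training images without touching the Gaussian geometry:
model = AppearanceModel(num_images=100)
feats = torch.randn(10_000, 72)
t = 0.5
embed = (1 - t) * model.embeddings.weight[3] + t * model.embeddings.weight[7]
colors = model(feats, embed)  # (10000, 3) per-Gaussian colors at the blended appearance
```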
The background model is a three-layer MLP that predicts the background of the scene without introducing any 2D model. It can be trained effectively alongside the Gaussian splatting and cleans up floaters in the scene.
In further experiments, we found that directly optimizing a set of spherical harmonics coefficients achieves a similar result while training faster.
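The sketch below shows one way such a spherical-harmonics background could be evaluated from ray directions. The degree-2 truncation, the sigmoid activation, and the variable names are assumptions; only the idea of directly optimized SH coefficients comes from the text above.

```python
import torch

def sh_basis_deg2(d: torch.Tensor) -> torch.Tensor:
    """Real spherical harmonics basis up to degree 2 for unit directions d of shape (N, 3)."""
    x, y, z = d[:, 0], d[:, 1], d[:, 2]
    return torch.stack([
        torch.full_like(x, 0.28209479177387814),      # l = 0
        -0.4886025119029199 * y,                      # l = 1
         0.4886025119029199 * z,
        -0.4886025119029199 * x,
         1.0925484305920792 * x * y,                  # l = 2
        -1.0925484305920792 * y * z,
         0.31539156525252005 * (3.0 * z * z - 1.0),
        -1.0925484305920792 * x * z,
         0.5462742152960396 * (x * x - y * y),
    ], dim=-1)                                        # (N, 9)

# Learnable background SH coefficients (one RGB triplet per basis function),
# optimized jointly with the Gaussian parameters.
bg_sh = torch.zeros(9, 3, requires_grad=True)

def background_color(ray_dirs: torch.Tensor) -> torch.Tensor:
    return torch.sigmoid(sh_basis_deg2(ray_dirs) @ bg_sh)  # (N, 3) per-ray background RGB
```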
A new alpha loss term is introduced to prevent the Gaussians from occupying the background.
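One plausible form of such a term is sketched below: in pixels that should be explained by the background (given here by a hypothetical boolean mask), the accumulated alpha from the rasterizer is pushed toward zero. The exact formulation used by Splatfacto-W may differ; `accum_alpha`, `bg_mask`, and the weight are illustrative.

```python
import torch

def alpha_loss(accum_alpha: torch.Tensor, bg_mask: torch.Tensor, weight: float = 0.1) -> torch.Tensor:
    # accum_alpha: (H, W) accumulated opacity from the Gaussian rasterizer
    # bg_mask: (H, W) bool, True where the pixel should be pure background
    # Penalize foreground opacity in background pixels so Gaussians vacate the background.
    return weight * accum_alpha[bg_mask].mean()
```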
The background model also provides a more accurate depth rendering of the scene.
We begin by predicting the color of each Gaussian using the Appearance Model. These Gaussians are then rasterized to generate the foreground, while the Background Model predicts the background given ray directions. The foreground and background are merged using alpha blending to produce the final image. This final image is compared with the ground truth image through the Robust Mask to update the model parameters.
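A short sketch of the alpha-blending step described here, assuming the rasterizer outputs an accumulated (alpha-premultiplied) foreground color and an accumulated alpha map:

```python
import torch

def composite(fg_rgb: torch.Tensor, accum_alpha: torch.Tensor, bg_rgb: torch.Tensor) -> torch.Tensor:
    # fg_rgb: (H, W, 3) alpha-accumulated foreground color from the Gaussian rasterizer
    # accum_alpha: (H, W, 1) accumulated opacity
    # bg_rgb: (H, W, 3) per-pixel background color predicted from ray directions
    return fg_rgb + (1.0 - accum_alpha) * bg_rgb  # final image used for the masked loss
```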
There are many excellent works that have also explored Gaussians in the wild. Check them out!
WildGaussians: 3D Gaussian Splatting in the Wild
Wild-GS: Real-Time Novel View Synthesis from Unconstrained Photo Collections
Gaussian in the Wild: 3D Gaussian Splatting for Unconstrained Image Collections
SWAG: Splatting in the Wild images with Appearance-conditioned Gaussians