Missing values remain a common challenge for depth data across its wide range of applications, stemming from causes such as incomplete data acquisition and perspective changes. This work addresses the problem with DepthLab, a foundation depth inpainting model powered by image diffusion priors. Our model offers two notable strengths: (1) it is resilient to depth-deficient regions, providing reliable completion for both continuous areas and isolated points, and (2) it faithfully preserves scale consistency with the conditioned known depth when filling in missing values. Building on these advantages, our approach proves effective across a variety of downstream tasks, including 3D scene inpainting, text-to-3D scene generation, sparse-view reconstruction with DUST3R, and LiDAR depth completion, exceeding current solutions in both numerical performance and visual quality.
For 3D scene inpainting, we first inpaint the depth of the inpainted image regions from the posed reference views and then unproject the points into 3D space for initialization, which significantly improves both the quality and the speed of 3D scene inpainting (a minimal unprojection sketch follows).
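To make the unprojection step concrete, here is a minimal sketch. The function name and the camera convention (pinhole intrinsics `K`, camera-to-world pose `c2w`) are assumptions for illustration, not the released implementation.

```python
import numpy as np

def unproject_depth(depth, K, c2w):
    """Lift a completed depth map (H, W) into world-space 3D points.

    K   : (3, 3) pinhole intrinsics of the posed reference view.
    c2w : (4, 4) camera-to-world pose of that view.
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))                    # pixel grid
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).astype(np.float64)
    rays = pix @ np.linalg.inv(K).T                                   # camera-space rays
    pts_cam = rays * depth[..., None]                                 # scale rays by depth
    pts_hom = np.concatenate([pts_cam, np.ones((H, W, 1))], axis=-1)
    pts_world = pts_hom @ c2w.T                                       # camera -> world
    return pts_world[..., :3].reshape(-1, 3)                          # (H*W, 3) point cloud
```

The resulting point cloud can then serve as the initialization for the inpainted region of the 3D scene.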
Our method substantially improves 3D scene generation from a single image by eliminating the need for a separate alignment step. This effectively mitigates the disjointed edges that previously arose from geometric inconsistencies.
For sparse-view reconstruction with DUST3R, we first generate a mask of pixels that have no matches in any source image and then refine these unmatched regions with DepthLab (see the sketch below). This sharpens the initial DUST3R depth and substantially improves Gaussian splatting rendering quality.
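A minimal sketch of this masking-and-refinement step, assuming a per-pixel matching-confidence map exported from DUST3R; the confidence threshold and the `inpaint_fn` callable (standing in for the DepthLab inference call) are illustrative assumptions, not the exact interface of the released code.

```python
import numpy as np

def refine_unmatched_depth(image, depth_init, match_conf, inpaint_fn, conf_thresh=1.5):
    """Refine DUST3R depth in regions with no reliable match in any source image.

    image      : (H, W, 3) target view.
    depth_init : (H, W) initial depth from DUST3R.
    match_conf : (H, W) per-pixel matching confidence (assumed exported from DUST3R).
    inpaint_fn : callable(image, known_depth, mask) -> (H, W) completed depth;
                 stands in for the DepthLab inference call (assumed interface).
    """
    mask = (match_conf < conf_thresh).astype(np.float32)   # 1 = unmatched, 0 = trusted
    depth_refined = inpaint_fn(image, depth_init, mask)
    # Keep the trusted DUST3R depth where matches exist; fill only the masked holes.
    return np.where(mask > 0.5, depth_refined, depth_init)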
For LiDAR depth completion, unlike existing methods that are trained and tested on a single dataset such as NYUv2, our approach achieves comparable results in a zero-shot setting and delivers even better results with minimal fine-tuning.
First, we discuss potential downstream tasks where our model could be applied, such as 4D scene generation or reconstruction, robotic navigation, editing in VR/AR, and follow-up works building on DUST3R. In summary, any task requiring depth estimation that comes with inherent known information (either partial ground truth obtained through rendering or sensors, or depth warped from a changed camera pose, as sketched below) can leverage our model for more accurate depth estimation, thereby enhancing the results.
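As an example of the "warped depth" case, the sketch below forward-warps a source-view depth map into a new camera, producing a partial depth map plus a known-value mask of the kind a depth inpainting model can complete. It assumes a simple pinhole model and point splatting with a z-buffer; it is illustrative and not part of the released code.

```python
import numpy as np

def warp_depth_to_view(depth_src, K, src_c2w, tgt_c2w):
    """Forward-warp a source-view depth map into a target camera.

    Returns a partial target-view depth map (zeros where nothing projects)
    and a known-value mask, ready to be completed by a depth inpainting model.
    """
    H, W = depth_src.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).astype(np.float64)
    pts_cam = (pix @ np.linalg.inv(K).T) * depth_src[..., None]       # source camera frame
    pts_world = np.concatenate([pts_cam, np.ones((H, W, 1))], -1) @ src_c2w.T
    pts_tgt = (pts_world @ np.linalg.inv(tgt_c2w).T)[..., :3]         # target camera frame
    z = pts_tgt[..., 2]
    proj = pts_tgt @ K.T
    uv = proj[..., :2] / np.clip(proj[..., 2:3], 1e-6, None)
    valid = (z > 0) & (uv[..., 0] >= 0) & (uv[..., 0] < W) \
                    & (uv[..., 1] >= 0) & (uv[..., 1] < H)
    iu = uv[..., 0][valid].astype(int)
    iv = uv[..., 1][valid].astype(int)
    zv = z[valid]
    order = np.argsort(-zv)                    # splat far-to-near so nearer points overwrite
    depth_tgt = np.zeros((H, W), dtype=np.float64)
    depth_tgt[iv[order], iu[order]] = zv[order]
    known_mask = depth_tgt > 0
    return depth_tgt, known_mask
```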
Next, we outline some possible directions for further research:
@article{liu2024depthlab,
  author  = {Zhiheng Liu and Ka Leong Cheng and Qiuyu Wang and Shuzhe Wang and Hao Ouyang and Bin Tan and Kai Zhu and Yujun Shen and Qifeng Chen and Ping Luo},
  title   = {DepthLab: From Partial to Complete},
  journal = {CoRR},
  volume  = {abs/xxxx.xxxxx},
  year    = {2024},
}