Multi-View Stereo by Temporal Nonparametric Fusion


Problem: Accurate, real-time dense depth estimation from a monocular image sequence

Idea: Similar views should have similar representations in latent space

Solution: Soft-constrain the bottleneck layer of the depth estimation network using Gaussian Processes (GPs) with an appropriate pose kernel

First, we look at the problem of estimating depth for a single reference view (using multiview constraints from neighboring frames in this case)

Network Architecture for Single Frame Depth Estimation

Exact same architecture as in MVDepthNet: Real-time Multiview Depth Estimation Neural Network

Problem: Leverage multiview geometry in a real-time dense depth network (assuming known image poses from an odometry source)

Idea: Cost volumes can directly encode geometric constraints so that the network does not have to learn them

Solution: Feed the cost volume along with the input image into the network

Methods prior to this work:

Cost Volume Construction
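
As a rough illustration of how such a cost volume can be built, here is a minimal plane-sweep sketch. It assumes pinhole intrinsics `K`, a relative pose `(R, t)` taking reference-camera points into the source camera, nearest-neighbour sampling, and mean absolute colour difference as the matching cost. The function name, argument names, and these choices are hypothetical and not taken from the notes above.

```python
import numpy as np

def cost_volume(ref, src, K, R, t, inv_depths):
    """Plane-sweep cost volume: one cost slice per inverse-depth hypothesis.

    ref, src: (H, W, 3) float images; K: (3, 3) intrinsics (assumed pinhole);
    R (3, 3), t (3,): map reference-camera points into the source camera;
    inv_depths: (D,) strictly positive inverse depths in 1/m.
    """
    H, W, _ = ref.shape
    # Homogeneous pixel grid of the reference view, shape (3, H*W).
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u.ravel(), v.ravel(), np.ones(H * W)]).astype(float)
    rays = np.linalg.inv(K) @ pix  # back-projected rays at unit depth

    volume = np.empty((len(inv_depths), H, W))
    for i, inv_d in enumerate(inv_depths):
        # 3D points at this depth, moved to the source camera and projected.
        pts = R @ (rays / inv_d) + t[:, None]
        proj = K @ pts
        z = proj[2]
        valid = z > 1e-6
        safe_z = np.where(valid, z, 1.0)
        x = np.rint(proj[0] / safe_z).astype(int)
        y = np.rint(proj[1] / safe_z).astype(int)
        valid &= (x >= 0) & (x < W) & (y >= 0) & (y < H)
        # Nearest-neighbour lookup; pixels that fall outside the source
        # image are compared against a zero colour.
        warped = np.zeros((H * W, 3))
        warped[valid] = src[y[valid], x[valid]]
        cost = np.abs(warped - ref.reshape(-1, 3)).mean(axis=1)
        volume[i] = cost.reshape(H, W)
    return volume
```

With 64 inverse-depth hypotheses (for example `np.linspace(1 / 50.0, 1 / 0.5, 64)`, an illustrative range), the result has 64 slices, one per hypothesis. A real-time system would use bilinear sampling on the GPU rather than this loop.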

Network Architecture

Feed the RGB reference frame (3 channels) and the cost volume (64 channels) into an encoder-decoder network with skip connections
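
A minimal sketch of how the two inputs might be combined, assuming a channel-first layout and simple channel-wise concatenation (both are assumptions here, and `stack_network_input` is a hypothetical helper name):

```python
import numpy as np

def stack_network_input(ref_rgb, cost_volume):
    """Stack a (3, H, W) reference image and a (64, H, W) cost volume
    channel-wise into a single (67, H, W) encoder input."""
    if ref_rgb.shape[1:] != cost_volume.shape[1:]:
        raise ValueError("image and cost volume must share H and W")
    return np.concatenate([ref_rgb, cost_volume], axis=0)
```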

Data augmentation is now straightforward because all inputs and the ground truth are expressed with respect to the same reference frame
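
Because everything lives in the reference frame, a geometric augmentation only has to be applied identically to each array. A hedged sketch with a horizontal flip (the function name, the flip probability `p`, and the channel-first layout are illustrative assumptions):

```python
import numpy as np

def random_hflip(ref_rgb, cost_volume, gt_depth, rng, p=0.5):
    """Flip image (3, H, W), cost volume (D, H, W) and depth (H, W) together."""
    if rng.random() < p:
        # Reverse the width axis (last axis) of every array identically,
        # so pixels, costs and ground truth stay aligned.
        ref_rgb = ref_rgb[..., ::-1]
        cost_volume = cost_volume[..., ::-1]
        gt_depth = gt_depth[..., ::-1]
    return ref_rgb, cost_volume, gt_depth
```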

Notable Ablation Studies

Example of how adding RGB image to inputs provides finer details as compared to cost-volume alone:

Impact of number of auxiliary frames on accuracy of depth estimation:


Pose-Kernel Gaussian Process Prior

Gaussian Process Reminder

Figure generated using https://distill.pub/2019/visual-exploration-gaussian-processes/
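
As a quick refresher in code, the sketch below computes the standard GP regression posterior mean and variance for 1D inputs. The squared-exponential kernel, the hyperparameter values, and the function names are illustrative assumptions, not choices taken from these notes.

```python
import numpy as np

def rbf(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel between 1D input arrays a (N,) and b (M,)."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def gp_posterior(x_train, y_train, x_test, noise_var=1e-2):
    """Posterior mean and marginal variance of a zero-mean GP at x_test."""
    K = rbf(x_train, x_train) + noise_var * np.eye(len(x_train))
    K_s = rbf(x_test, x_train)
    K_ss = rbf(x_test, x_test)
    # Cholesky factorisation for a numerically stable solve.
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s @ alpha
    v = np.linalg.solve(L, K_s.T)
    cov = K_ss - v.T @ v
    return mean, np.diag(cov)
```

Near the training inputs the posterior mean follows the data and the variance shrinks; far from them the variance returns to the prior variance.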

Pose Kernel
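
The kernel definition itself did not survive in these notes. As a hedged sketch of what a pose kernel can look like, the code below assumes a pose distance that combines squared translation difference with a rotation term `(2/3) * tr(I - R_i^T R_j)`, fed into a Matérn 3/2 covariance. The distance weighting, the kernel family, and the names `pose_distance` and `pose_kernel` are all assumptions made for illustration.

```python
import numpy as np

def pose_distance(R_i, t_i, R_j, t_j):
    """Assumed distance between two camera poses (rotation 3x3, translation 3,)."""
    trans = np.sum((t_i - t_j) ** 2)
    rot = (2.0 / 3.0) * np.trace(np.eye(3) - R_i.T @ R_j)
    # Clamp tiny negative values caused by floating-point round-off.
    return np.sqrt(max(trans + rot, 0.0))

def pose_kernel(poses, magnitude=1.0, lengthscale=1.0):
    """Matern 3/2 covariance matrix over a list of (R, t) poses."""
    n = len(poses)
    K = np.empty((n, n))
    for i, (R_i, t_i) in enumerate(poses):
        for j, (R_j, t_j) in enumerate(poses):
            r = np.sqrt(3.0) * pose_distance(R_i, t_i, R_j, t_j) / lengthscale
            K[i, j] = magnitude**2 * (1.0 + r) * np.exp(-r)
    return K
```

Nearby poses get a covariance close to `magnitude**2`, and the covariance decays as the cameras move or rotate apart, which is what lets similar views share latent information.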

Gaussian Process Formulation

Graphical model overview (online method):


The batch method can account for relationships between all pairs of poses, while the online method only propagates information forward and has linear complexity in the number of frames

Batch (Offline)



Generally gives more accurate results than the online method

Mistakes made early in the online method can take more observations to correct since they are propagated forward
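
A minimal sketch of batch fusion as GP regression over per-frame latent codes. It assumes the encoder outputs are treated as noisy observations with Gaussian noise, an independent GP per latent dimension sharing one pose covariance matrix `K`, and a hypothetical function name; none of these details are given in the notes.

```python
import numpy as np

def fuse_latents_batch(K, Y, noise_var=0.1):
    """Posterior mean of the latent codes given all frames at once.

    K: (N, N) prior covariance between the N camera poses;
    Y: (N, D) encoder outputs, one flattened latent code per frame.
    Returns an (N, D) array of fused codes to pass to the decoder.
    """
    n = K.shape[0]
    # Solving an N x N system couples every frame with every other frame,
    # which is what makes the batch variant cubic in N.
    return K @ np.linalg.solve(K + noise_var * np.eye(n), Y)
```

With uncorrelated poses (`K` equal to the identity) each code is only shrunk towards zero, while strongly correlated poses pull the codes towards their shared average.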

Comparison of different kernels on results

This method with 2 frames outperforms other methods with 5 frames


Online Method Limitations


Graphical Models and Refinement of Latent Space

Interesting to think about ways of utilizing latent space in sequential estimation

Uncertainty Estimation