In this paper, we present a learning-based framework for improved sparse depth video completion. Given a sparse depth map and a color image, our approach constructs a cost volume for the current viewpoint over a set of depth hypothesis planes. To effectively handle sequential cost volumes from multiple viewpoints, we introduce a learning-based cost volume fusion framework, namely RayFusion, that leverages an attention mechanism for each pair of overlapping rays in the cost volumes. By exploiting feature statistics accumulated over time, our proposed framework consistently outperforms or rivals state-of-the-art approaches on diverse indoor and outdoor datasets, including the KITTI Depth Completion benchmark, the VOID Depth Completion benchmark, and the ScanNetV2 dataset, while using 94.5% fewer network parameters than the state-of-the-art approach, LRRU.
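The abstract does not specify the fusion operator in detail, but the core idea of attending between overlapping rays of sequential cost volumes can be illustrated with a minimal sketch. The snippet below is an assumption-based illustration, not the authors' implementation: the tensor layout `[B, C, D, H, W]`, the module name `RayAttentionFusion`, the feature dimension, and the use of `nn.MultiheadAttention` are all hypothetical choices, and the two cost volumes are assumed to be already warped to a common viewpoint so that rays at the same pixel overlap.

```python
# Hypothetical sketch of per-ray cross-attention fusion between two cost volumes.
# Layout, module names, and hyperparameters are assumptions for illustration only.
import torch
import torch.nn as nn


class RayAttentionFusion(nn.Module):
    """Fuse the current cost volume with an accumulated one, ray by ray.

    A ray is the set of D depth-hypothesis features at one pixel, so attention
    runs over short sequences of length D, one per pair of overlapping rays.
    """

    def __init__(self, feat_dim: int = 32, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(feat_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(feat_dim)

    def forward(self, cur: torch.Tensor, acc: torch.Tensor) -> torch.Tensor:
        # cur, acc: [B, C, D, H, W] cost volumes, assumed aligned to one viewpoint.
        B, C, D, H, W = cur.shape
        # Reshape so each pixel's ray becomes one attention sequence of length D.
        q = cur.permute(0, 3, 4, 2, 1).reshape(B * H * W, D, C)
        kv = acc.permute(0, 3, 4, 2, 1).reshape(B * H * W, D, C)
        fused, _ = self.attn(q, kv, kv)   # cross-attend current ray to accumulated ray
        fused = self.norm(fused + q)      # residual connection back to the current ray
        return fused.reshape(B, H, W, D, C).permute(0, 4, 3, 1, 2)


# Example usage: fuse a new cost volume into the running estimate at one time step.
fusion = RayAttentionFusion(feat_dim=32)
cur_cv = torch.randn(1, 32, 64, 48, 64)   # [B, C, D, H, W]
acc_cv = torch.randn(1, 32, 64, 48, 64)
out = fusion(cur_cv, acc_cv)              # [1, 32, 64, 48, 64]
```

Because attention is restricted to ray pairs rather than the full volume, the cost per frame in this sketch scales with the number of pixels times the number of depth planes, which is consistent with the abstract's emphasis on a compact model.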