In monocular depth estimation, acquiring large amounts of depth-annotated training data is challenging, which leads to a reliance on synthetic datasets. However, the inherent discrepancies between synthetic environments and the real world cause a domain shift and sub-optimal performance. In this paper, we introduce SEDiff, which leverages a diffusion-based generative model to extract the structural information essential for accurate depth estimation. SEDiff removes the domain-specific components of the synthetic data and enables structure-consistent image transfer, mitigating the performance degradation caused by the domain gap. Extensive experiments demonstrate the superiority of SEDiff over state-of-the-art methods in various scenarios for domain-adaptive depth estimation.