

Poster

Cascade-Zero123: One Image to Highly Consistent 3D with Self-Prompted Nearby Views

Yabo Chen · Jiemin Fang · Yuyang Huang · Taoran Yi · Xiaopeng Zhang · Lingxi Xie · Xinggang Wang · Wenrui Dai · Hongkai Xiong · Qi Tian

Thu 3 Oct 7:30 a.m. PDT — 9:30 a.m. PDT

Abstract:

Synthesizing multi-view 3D content from a single image is a significant but challenging task. Zero-1-to-3 methods have achieved great success by lifting a 2D latent diffusion model to the 3D scope: the target-view image is generated conditioned on a single-view source image and the camera pose. However, due to the high sparsity of the single input image, Zero-1-to-3 tends to produce geometric and appearance inconsistencies across views, especially for complex objects. To tackle this issue, we propose to supply more condition information to the generation model, but in a self-prompted way. We construct a cascade framework with two Zero-1-to-3 models, named Cascade-Zero123, which progressively extracts 3D information from the source image. Specifically, several nearby views are first generated by the first-stage model and then fed into the second-stage model, along with the source image, as generation conditions. With these amplified self-prompted condition images, Cascade-Zero123 generates more consistent novel-view images than Zero-1-to-3. The improvement is significant for various complex and challenging scenes involving insects, humans, transparent objects, stacks of multiple objects, etc.
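The following is a minimal sketch of the two-stage, self-prompted cascade described in the abstract: a first pose-conditioned model produces nearby views of the source image, and a second model then conditions on the source image plus those generated views to synthesize the target view. All names here (Pose, Zero123Model, generate, the specific nearby angles) are hypothetical placeholders for illustration, not the authors' actual API or hyperparameters.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Pose:
    azimuth: float    # degrees, relative to the source camera
    elevation: float  # degrees, relative to the source camera


class Zero123Model:
    """Stand-in for a pose-conditioned novel-view diffusion model (hypothetical)."""

    def generate(self, source_image, condition_images: List, target_pose: Pose):
        # A real model would run pose-conditioned latent diffusion here;
        # this stub simply echoes the source image so the sketch runs.
        return source_image


def cascade_zero123(source_image, target_pose: Pose,
                    stage1: Zero123Model, stage2: Zero123Model):
    # Stage 1: generate several nearby views around the source camera
    # (example angles; the actual choice of nearby poses is an assumption).
    nearby_poses = [Pose(azimuth=a, elevation=0.0) for a in (-30.0, 30.0)] + \
                   [Pose(azimuth=0.0, elevation=e) for e in (-20.0, 20.0)]
    self_prompts = [stage1.generate(source_image, [], p) for p in nearby_poses]

    # Stage 2: condition on the source image plus the self-prompted nearby
    # views to synthesize the (possibly distant) target view.
    return stage2.generate(source_image, self_prompts, target_pose)


if __name__ == "__main__":
    image = "source.png"  # placeholder for an actual image tensor
    novel_view = cascade_zero123(image, Pose(azimuth=90.0, elevation=10.0),
                                 Zero123Model(), Zero123Model())
```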
