Text-driven 3D texturing requires generating high-fidelity textures that conform to a given geometry and description. Recently, the high-quality text-to-image generation ability of 2D diffusion models has significantly advanced this task by converting it into a texture optimization process guided by multi-view synthesized images, where generating high-quality and multi-view consistent images becomes the key issue. State-of-the-art methods achieve consistency between different views by treating image generation on a novel view as image inpainting conditioned on the texture generated from previous views. However, due to the accumulated semantic divergence of local inpainting and the occlusion between object parts under sparse views, these inpainting-based methods often fail to maintain long-range texture consistency. To address this, we present P3G, a texturing approach based on learned Pseudo 3D Guidance. The key idea of P3G is to first learn a coarse but consistent texture that serves as global semantic guidance to encourage consistency between images generated on different views. To this end, we combine pre-trained text-to-image diffusion models with multi-view optimization to propagate accurate semantics globally when learning the guidance, and we design an efficient framework for high-quality, multi-view consistent image generation that integrates the learned semantic guidance. Quantitative and qualitative evaluation on various 3D shapes demonstrates the superiority of P3G in both consistency and overall visual quality.
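To make the two-stage idea in the abstract concrete, the following is a minimal conceptual sketch, not the authors' P3G implementation: it assumes a differentiable renderer and a pretrained text-to-image diffusion model behind hypothetical `renderer` and `diffusion` interfaces (`render`, `sample_random_camera`, `add_noise`, `predict_noise`, `img2img_generate`, `backproject` are placeholders), optimizes a coarse UV texture with a score-distillation-style multi-view objective, and then refines each view with the coarse render as semantic guidance before back-projecting into the final texture.

```python
# Conceptual sketch of pseudo-3D-guided texturing (assumptions, not the paper's code).
import torch


def learn_pseudo_3d_guidance(mesh, prompt, renderer, diffusion, n_steps=2000):
    """Stage 1 (assumed): optimize a coarse but multi-view consistent UV texture
    by propagating text-conditioned diffusion semantics over random viewpoints."""
    texture = torch.nn.Parameter(torch.rand(1, 3, 512, 512))  # coarse UV map
    opt = torch.optim.Adam([texture], lr=1e-2)
    for _ in range(n_steps):
        view = renderer.sample_random_camera()                 # random viewpoint
        image = renderer.render(mesh, texture, view)           # differentiable render
        t = torch.randint(50, 950, (1,))                       # random noise level
        noise = torch.randn_like(image)
        noisy = diffusion.add_noise(image, noise, t)
        with torch.no_grad():
            pred = diffusion.predict_noise(noisy, t, prompt)   # text-conditioned score
        # Score-distillation-style gradient: push the rendered view toward
        # the text-conditioned image distribution.
        grad = pred - noise
        loss = (grad.detach() * image).sum()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return texture.detach()


def texture_with_guidance(mesh, prompt, coarse_texture, renderer, diffusion, views):
    """Stage 2 (assumed): generate a high-fidelity image per view, conditioned on
    the render of the coarse texture so all views share global semantics, then
    back-project each image into the final high-resolution texture."""
    final_texture = torch.zeros(1, 3, 2048, 2048)
    weights = torch.zeros(1, 1, 2048, 2048)
    for view in views:
        guide = renderer.render(mesh, coarse_texture, view)
        # Refinement started from the coarse render (img2img-style) rather than
        # pure noise keeps each generated view semantically aligned with the guidance.
        image = diffusion.img2img_generate(guide, prompt, strength=0.6)
        final_texture, weights = renderer.backproject(
            mesh, view, image, final_texture, weights
        )
    return final_texture / weights.clamp(min=1e-6)
```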