Skip to yearly menu bar Skip to main content


Poster

ZoLA: Zero-Shot Creative Long Animation Generation with Short Video Model

Fu-Yun Wang · Zhaoyang Huang · Qiang Ma · Guanglu Song · Xudong LU · Weikang Bian · Yijin Li · Yu Liu · Hongsheng LI

# 229
Strong blind review: This paper was not made available on public preprint services during the review process Strong Double Blind
[ ] [ Paper PDF ]
Wed 2 Oct 7:30 a.m. PDT — 9:30 a.m. PDT

Abstract: Although video generation has made great progress in capacity and controllability and is gaining increasing attention, currently available video generation models still make minimal progress in the video length they can generate. Due to the lack of well-annotated long video data, high training/inference cost, and flaws in the model designs, current video generation models can only generate videos of $2 \sim 4$ seconds, greatly limiting their applications and the creativity of users. We present ZoLA, a zero-shot method for creative long animation generation with short video diffusion models and even with short video consistency models~(a new family of generative models known for the fast generation with top-performing quality). In addition to the extension for long animation generation~(dozens of seconds), ZoLA as a zero-shot method, can be easily combined with existing community adapters~(developed only for image or short video models) for more innovative generation results, including control-guided animation generation/editing, motion customization/alternation, and multi-prompt conditioned animation generation, etc. And, importantly, all of these can be done with commonly affordable GPU~(12 GB for 32-second animations) and inference time~(90 seconds for denoising 32-second animations with consistency models). Experiments validate the effectiveness of ZoLA, bringing great potential for creative long animation generation.

Live content is unavailable. Log in and register to view live content