World models, particularly in autonomous driving, have drawn extensive attention for their capacity to comprehend driving environments. A well-established world model holds immense potential for generating high-quality driving videos and for deriving driving policies that enable safe maneuvering. However, a critical limitation of relevant research is its predominant focus on gaming environments or simulated settings, which fail to represent real-world driving scenarios. Therefore, we introduce \textit{DriveDreamer}, a pioneering world model derived entirely from real-world driving scenarios. Because modeling the world in intricate driving scenes entails an overwhelming search space, we propose harnessing a powerful diffusion model to construct a comprehensive representation of the complex environment. Furthermore, we introduce a two-stage training pipeline: in the initial stage, \textit{DriveDreamer} acquires a deep understanding of structured traffic constraints, and the subsequent stage equips it with the ability to anticipate future states. Extensive experiments verify that \textit{DriveDreamer} enables both driving video generation and action prediction while faithfully capturing real-world traffic constraints. Moreover, videos generated by \textit{DriveDreamer} significantly enhance the training of driving perception methods.
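To make the two-stage idea concrete, the following is a minimal, hedged sketch (not the authors' implementation): stage one trains a conditional denoiser on latents with structured traffic conditions (e.g., encoded maps and boxes), and stage two trains a temporal module to anticipate future latent states. All module names, dimensions, and the simplified noising step are assumptions for illustration only.

\begin{verbatim}
# Illustrative sketch only; placeholder shapes and a simplified forward process.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConditionalDenoiser(nn.Module):
    """Toy stand-in for a diffusion denoiser conditioned on traffic structure."""
    def __init__(self, latent_dim=64, cond_dim=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + cond_dim + 1, 256), nn.SiLU(),
            nn.Linear(256, latent_dim),
        )

    def forward(self, noisy_latent, cond, t):
        # Predict the noise that was added to the latent.
        return self.net(torch.cat([noisy_latent, cond, t], dim=-1))

class FuturePredictor(nn.Module):
    """Toy temporal module that anticipates the next latent state."""
    def __init__(self, latent_dim=64, action_dim=2):
        super().__init__()
        self.rnn = nn.GRUCell(latent_dim + action_dim, latent_dim)

    def forward(self, latent, action):
        return self.rnn(torch.cat([latent, action], dim=-1), latent)

denoiser, predictor = ConditionalDenoiser(), FuturePredictor()
opt1 = torch.optim.Adam(denoiser.parameters(), lr=1e-4)
opt2 = torch.optim.Adam(predictor.parameters(), lr=1e-4)

# Stage 1: learn structured traffic constraints via conditional denoising.
for _ in range(100):
    latent = torch.randn(8, 64)        # frame latents (placeholder data)
    cond = torch.randn(8, 32)          # encoded map/box conditions (placeholder)
    t = torch.rand(8, 1)               # diffusion timestep
    noise = torch.randn_like(latent)
    noisy = latent + t * noise         # simplified forward (noising) process
    loss = F.mse_loss(denoiser(noisy, cond, t), noise)
    opt1.zero_grad(); loss.backward(); opt1.step()

# Stage 2: learn to anticipate future states from current state and action.
for _ in range(100):
    latent = torch.randn(8, 64)        # current latent (placeholder)
    action = torch.randn(8, 2)         # ego action (placeholder)
    target_next = torch.randn(8, 64)   # next-frame latent (placeholder)
    loss = F.mse_loss(predictor(latent, action), target_next)
    opt2.zero_grad(); loss.backward(); opt2.step()
\end{verbatim}

In practice, the sketch's random tensors would be replaced by encoded video frames, structured traffic conditions, and ego actions; the separation of the two optimization loops is the point being illustrated.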