Large-scale data is crucial for learning realistic and capable driving policies. However, it is impractical to naively scale dataset size with real data alone. The majority of real driving data is uninteresting for learning, and collecting more to cover the long tail of scenarios is expensive and potentially unsafe. We propose to use asymmetric self-play to scale learning beyond real-world data, with additional challenging, solvable, and realistic synthetic scenarios. In particular, we design two agents—a teacher and a student—with asymmetric objectives, so that the teacher learns to propose scenarios that it can pass but the student fails, and the student learns to solve those scenarios. When applied to traffic simulation, our approach learns realistic policies with significantly lower collision rates across urban, highway, and long-tail scenarios. Our approach also zero-shot transfers to generate more effective training data for learning end-to-end autonomy policies, significantly outperforming alternatives like training on adversarial scenarios or real data alone.
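To make the asymmetric objective concrete, below is a minimal sketch of one teacher-student self-play step. All names here (`Scenario`, `Policy`, `rollout`, `update`) are hypothetical stand-ins, not the paper's actual interfaces, and the rollouts and updates are stubbed so the control flow runs end to end; it illustrates only the reward asymmetry: the teacher is rewarded for proposing scenarios it can pass but the student fails, while the student is rewarded for solving them.

```python
# Hedged sketch of asymmetric self-play, not the paper's implementation.
import random
from dataclasses import dataclass


@dataclass
class Scenario:
    """A synthetic driving scenario parameterized by a difficulty knob (hypothetical)."""
    difficulty: float


class Policy:
    """Stand-in policy whose pass probability decays with scenario difficulty."""
    def __init__(self, skill: float):
        self.skill = skill

    def rollout(self, scenario: Scenario) -> bool:
        """Return True if the policy passes the scenario (stochastic stub)."""
        return random.random() < max(0.0, self.skill - scenario.difficulty)

    def update(self, reward: float) -> None:
        """Placeholder for a gradient step: nudge skill using the reward signal."""
        self.skill += 0.01 * reward


def self_play_step(teacher: Policy, student: Policy) -> None:
    # Teacher proposes a scenario; here we simply sample a difficulty level.
    scenario = Scenario(difficulty=random.random())

    teacher_passes = teacher.rollout(scenario)
    student_passes = student.rollout(scenario)

    # Asymmetric objectives: the teacher is rewarded only when the scenario
    # is solvable (teacher passes) yet still fails the student; the student
    # is rewarded simply for solving the proposed scenario.
    teacher_reward = 1.0 if (teacher_passes and not student_passes) else 0.0
    student_reward = 1.0 if student_passes else 0.0

    teacher.update(teacher_reward)
    student.update(student_reward)


if __name__ == "__main__":
    teacher, student = Policy(skill=0.9), Policy(skill=0.5)
    for _ in range(1000):
        self_play_step(teacher, student)
```

Under this reward structure, trivially easy scenarios earn the teacher nothing (the student passes them), and unsolvable ones earn nothing either (the teacher itself fails), which is what keeps the proposed curriculum both challenging and solvable.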