Poster
CoMusion: Towards Consistent Stochastic Human Motion Prediction via Motion Diffusion
Jiarui Sun · Girish Chowdhary
# 242
Stochastic Human Motion Prediction (HMP) aims to predict multiple possible future pose sequences from observed ones. Most prior works learn motion distributions through encoding-decoding in latent space, which does not preserve motion's spatial-temporal structure. While effective, these methods often require complex, multi-stage training and yield predictions that are inconsistent with the provided history and can be physically unrealistic. To address these issues, we propose CoMusion, a single-stage, end-to-end diffusion-based stochastic HMP framework. CoMusion is inspired by the insight that a smooth future pose initialization improves prediction performance, a strategy not previously utilized in stochastic models but evidenced in deterministic works. To generate such an initialization, CoMusion's motion predictor starts with a Transformer-based network for initial reconstruction of corrupted motion. Then, a graph convolutional network (GCN) is employed to refine the prediction, considering past observations in the discrete cosine transform (DCT) space. Our method, facilitated by the Transformer-GCN module design and a proposed variance scheduler, excels in predicting accurate, realistic, and consistent motions, while maintaining an appropriate level of diversity. Experimental results on benchmark datasets demonstrate that CoMusion surpasses prior methods in both accuracy and fidelity, achieving at least a 35% relative improvement in fidelity metrics, while demonstrating superior robustness.
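To make the "DCT space" idea concrete, the following is a minimal sketch of representing a pose sequence by temporal DCT coefficients and discarding the high-frequency ones to obtain a smoother trajectory. It is an illustration only, not the authors' implementation: the array layout `(frames, joints * 3)`, the function names `dct_basis`, `low_pass`, and `roughness`, and the choice of an orthonormal DCT-II are all assumptions made here.

```python
import numpy as np


def dct_basis(num_frames: int) -> np.ndarray:
    """Orthonormal DCT-II basis of shape (num_frames, num_frames).

    Row k holds the k-th cosine basis vector over time.
    """
    n = np.arange(num_frames)          # time index
    k = n.reshape(-1, 1)               # frequency index
    basis = np.cos(np.pi * (2 * n + 1) * k / (2 * num_frames))
    basis *= np.sqrt(2.0 / num_frames)
    basis[0] /= np.sqrt(2.0)           # rescale the constant (DC) row
    return basis


def low_pass(motion: np.ndarray, keep: int) -> np.ndarray:
    """Keep only the first `keep` temporal DCT coefficients.

    motion: hypothetical layout (frames, joints * 3).
    """
    basis = dct_basis(motion.shape[0])
    coeffs = basis @ motion            # forward DCT along the time axis
    coeffs[keep:] = 0.0                # drop high-frequency components
    return basis.T @ coeffs            # inverse DCT (basis is orthonormal)


def roughness(motion: np.ndarray) -> float:
    """Sum of squared frame-to-frame differences."""
    return float(np.sum(np.diff(motion, axis=0) ** 2))


# Synthetic stand-in for a pose sequence: 25 frames, 17 joints in 3D.
rng = np.random.default_rng(0)
motion = rng.normal(size=(25, 17 * 3))
smooth = low_pass(motion, keep=5)
```

Because the DCT-II basis vectors are the eigenvectors of the path-graph Laplacian, zeroing coefficients can never increase the frame-to-frame roughness measured above, which is one reason a truncated DCT representation is a convenient way to express smooth motion.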