Multimodal generative models have gained increasing popularity in recent years, and many works learn a shared latent representation across modalities. Such representations capture information common to the different domains, enabling more coherent joint and cross-modal generation. However, most of these works adopt a standard Gaussian or Laplacian prior, and such unimodal, non-informative distributions struggle to capture the rich structure shared across multiple data types. Meanwhile, energy-based models (EBMs) have proven effective on a range of tasks owing to their expressiveness and flexibility, but their capacity remains largely unexplored for multimodal generative modeling. In this paper, we propose a novel framework that trains multimodal latent generative models jointly with an energy-based model. The resulting prior is more expressive and informative and better captures the information shared across modalities. Our experiments show that the proposed model is effective, improving generation coherence and latent classification accuracy on several multimodal datasets.
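The abstract gives no implementation details, so the sketch below is only an illustrative assumption of how an EBM prior over a shared latent code is commonly parameterized and sampled: a small energy network tilts a Gaussian base distribution, and short-run Langevin dynamics draws approximate prior samples. The network sizes, the `EnergyPrior` and `langevin_sample` names, and the sampler settings are all hypothetical, not the authors' method.

```python
import torch
import torch.nn as nn


class EnergyPrior(nn.Module):
    """MLP energy function E(z) over the shared latent code.

    The EBM prior is defined, up to normalization, by tilting a standard
    Gaussian base:  p(z) proportional to exp(-E(z)) * N(z; 0, I).
    """

    def __init__(self, latent_dim: int = 16, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, hidden), nn.GELU(),
            nn.Linear(hidden, hidden), nn.GELU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        # One scalar energy per latent sample.
        return self.net(z).squeeze(-1)


def langevin_sample(energy: EnergyPrior, n: int, latent_dim: int = 16,
                    steps: int = 60, step_size: float = 0.1) -> torch.Tensor:
    """Short-run Langevin dynamics targeting the EBM-tilted prior,
    initialized from the Gaussian base distribution."""
    z = torch.randn(n, latent_dim)
    for _ in range(steps):
        z = z.detach().requires_grad_(True)
        # Negative log density of the tilted prior (up to a constant):
        # E(z) + 0.5 * ||z||^2.
        neg_logp = energy(z) + 0.5 * (z ** 2).sum(dim=-1)
        grad = torch.autograd.grad(neg_logp.sum(), z)[0]
        z = z - 0.5 * step_size * grad + (step_size ** 0.5) * torch.randn_like(z)
    return z.detach()


if __name__ == "__main__":
    prior = EnergyPrior()
    z_prior = langevin_sample(prior, n=8)  # latents to feed the per-modality decoders
    print(z_prior.shape)                   # torch.Size([8, 16])
```

In a multimodal latent variable model, samples like `z_prior` would be decoded by each modality-specific decoder, so a more expressive prior over this shared code can directly improve joint and cross-modal generation.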