

Poster

Learning Representations of Satellite Images From Metadata Supervision

Jules Bourcier · Gohar Dashyan · Karteek Alahari · Jocelyn Chanussot

Strong Double Blind: this paper was not made available on public preprint services during the review process.
Wed 2 Oct 1:30 a.m. PDT — 3:30 a.m. PDT

Abstract:

Self-supervised learning is increasingly applied to Earth observation problems that leverage satellite and other remotely sensed data. Within satellite imagery, metadata such as time and location often hold significant semantic information that improves scene understanding. In this paper, we introduce Satellite Metadata-Image Pretraining (SatMIP), a new approach for harnessing metadata in the pretraining phase through a flexible and unified multimodal learning objective. SatMIP represents metadata as textual captions and aligns images with metadata in a shared embedding space by solving a metadata-image contrastive task. Our model learns a non-trivial image representation that can effectively handle recognition tasks. We further enhance this model by combining image self-supervision and metadata supervision, introducing SatMIPS. As a result, SatMIPS improves over its image-image pretraining baseline, SimCLR, and accelerates convergence. Comparisons against four recent contrastive and masked autoencoding-based methods for remote sensing also highlight the efficacy of our approach. Furthermore, we find that metadata supervision yields better scalability to larger backbones and more robust hierarchical pretraining.
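The objective described in the abstract is a contrastive alignment between image embeddings and embeddings of metadata rendered as text, optionally combined with an image-image self-supervised term (SatMIPS). The following is a minimal sketch of that idea under stated assumptions: the caption template, the CLIP-style symmetric InfoNCE formulation, the temperature, and the SatMIPS loss weighting are illustrative choices, not the authors' exact implementation.

    # Sketch of a SatMIP-style metadata-image contrastive objective (PyTorch).
    # Caption format, temperature, and loss weighting are assumptions for
    # illustration only, not the paper's exact implementation.
    import torch
    import torch.nn.functional as F

    def metadata_caption(lat, lon, month, hour):
        # Assumed template: metadata fields rendered as a plain-text caption,
        # which is then fed to a text encoder.
        return f"latitude {lat:.2f}, longitude {lon:.2f}, month {month}, hour {hour}"

    def metadata_image_loss(img_emb, meta_emb, temperature=0.07):
        # Symmetric InfoNCE loss over a batch of (image, metadata) pairs.
        # img_emb, meta_emb: (B, D) L2-normalized embeddings from the image
        # encoder and the metadata (text) encoder, respectively.
        logits = img_emb @ meta_emb.t() / temperature              # (B, B) similarities
        targets = torch.arange(img_emb.size(0), device=img_emb.device)
        loss_i2m = F.cross_entropy(logits, targets)                # image -> metadata
        loss_m2i = F.cross_entropy(logits.t(), targets)            # metadata -> image
        return 0.5 * (loss_i2m + loss_m2i)

    def satmips_loss(img_emb, meta_emb, simclr_loss, weight=1.0):
        # Assumed SatMIPS composition: metadata supervision plus an
        # image-image SimCLR term computed elsewhere on augmented views.
        return metadata_image_loss(img_emb, meta_emb) + weight * simclr_loss

A CLIP-style formulation like this keeps the objective modality-agnostic: any metadata field that can be written as text can be added to the caption without changing the loss.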
