

Poster

Contrastive ground-level image and remote sensing pre-training improves representation learning for natural world imagery

Andy V Huynh · Lauren Gillespie · Jael Lopez-Saucedo · Claire Tang · Rohan Sikand · Moisés Expósito-Alonso

Strong Double Blind: This paper was not made available on public preprint services during the review process.
Tue 1 Oct, 1:30 a.m.–3:30 a.m. PDT

Abstract:

Multimodal image-text contrastive learning has shown that joint representations can be learned across modalities. Here, we show how leveraging multiple views of image data with contrastive learning can improve downstream fine-grained classification performance for species recognition, even when one view is absent at inference time. We propose ContRastive Image-remote Sensing Pre-training (CRISP), a new pre-training task for ground-level and aerial image representation learning of the natural world, and introduce Nature Multi-View (NMV), a dataset of natural world imagery including more than 3 million ground-level and aerial image pairs for over 6,000 plant taxa across the ecologically diverse state of California. The NMV dataset and accompanying material are available at hf.co/datasets/andyvhuynh/NatureMultiView.
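To make the pre-training objective concrete, below is a minimal PyTorch sketch of the CLIP-style symmetric contrastive loss that a ground-level/aerial pairing task like CRISP implies. The function name, the temperature value, and the use of L2-normalized projection embeddings are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn.functional as F

def crisp_contrastive_loss(ground_emb: torch.Tensor,
                           aerial_emb: torch.Tensor,
                           temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE over a batch of paired embeddings.

    ground_emb, aerial_emb: (batch, dim) projections from the ground-level
    and aerial encoders; row i of each tensor is assumed to come from the
    same geolocated observation.
    """
    # L2-normalize so dot products are cosine similarities.
    g = F.normalize(ground_emb, dim=-1)
    a = F.normalize(aerial_emb, dim=-1)

    # (batch, batch) similarity matrix; diagonal entries are the true pairs,
    # off-diagonal entries serve as in-batch negatives.
    logits = g @ a.t() / temperature
    targets = torch.arange(g.size(0), device=g.device)

    # Cross-entropy in both directions (ground->aerial and aerial->ground).
    loss_g2a = F.cross_entropy(logits, targets)
    loss_a2g = F.cross_entropy(logits.t(), targets)
    return 0.5 * (loss_g2a + loss_a2g)
```

Because the two encoders are trained into a shared embedding space, either encoder can be used on its own for downstream fine-grained classification, which is how a pre-training scheme of this kind can help even when one view is absent at inference time.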
