

Poster #111

VLAD-BuFF: Burst-aware Fast Feature Aggregation for Visual Place Recognition

Ahmad Khaliq · Ming Xu · Stephen Hausler · Michael J Milford · Sourav Garg

Strong Double Blind: this paper was not made available on public preprint services during the review process.
[ Paper PDF ]
Wed 2 Oct 7:30 a.m. PDT — 9:30 a.m. PDT

Abstract:

Visual Place Recognition (VPR) is a crucial component of many visual localization pipelines for embodied agents. VPR is often achieved by jointly learning local features and an aggregation method. The current state-of-the-art VPR methods rely on VLAD aggregation, which can be trained to learn a weighted contribution of features through their soft assignment to cluster centers. However, this process has two key limitations. First, the feature-to-cluster weighting does not account for over-represented repetitive structures within a cluster, e.g., shadows or window panes; this phenomenon is also referred to as the 'burstiness' problem, classically solved by discounting repetitive features before aggregation. Second, feature-to-cluster comparisons are compute-intensive for state-of-the-art image encoders with high-dimensional local features. This paper addresses these limitations by introducing VLAD-BuFF with two novel contributions: i) a self-similarity-based feature discounting mechanism to learn Burst-aware features within end-to-end VPR training, and ii) Fast Feature aggregation by reducing local feature dimensions through a learnable projection initialized with a PCA transform. We benchmark our method on 9 public datasets, where VLAD-BuFF sets a new state of the art and achieves perfect recall on St Lucia for the first time in VPR research. Our method maintains its high recall even with 12x-reduced local feature dimensions, thus enabling fast feature aggregation without compromising recall. Through additional qualitative studies, we show how our proposed weighting method effectively downweights the non-distinctive features. We will make the source code publicly available.
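To make the burstiness idea concrete, below is a minimal PyTorch sketch of self-similarity-based feature discounting. The function name, the temperature `tau`, and the square-root discounting rule are illustrative assumptions (sqrt discounting is the classical pre-aggregation remedy the abstract alludes to); the paper instead learns the discounting end-to-end.

```python
import torch

def burst_aware_weights(feats: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """Down-weight locally repeated ('bursty') features via self-similarity.

    feats: (N, D) L2-normalised local features from one image.
    Returns per-feature weights in (0, 1]; repeated features get lower weight.
    A hypothetical sketch, not the paper's learned formulation.
    """
    sim = feats @ feats.t()                    # (N, N) cosine self-similarity
    soft = torch.softmax(sim / tau, dim=-1)    # sharpen each row into a soft match distribution
    burstiness = soft.sum(dim=0)               # features matched by many others score high
    return burstiness.clamp(min=1.0).rsqrt()   # classical sqrt discounting of repeats

# Usage: scale features before VLAD-style residual aggregation.
feats = torch.nn.functional.normalize(torch.randn(500, 768), dim=-1)
weighted = feats * burst_aware_weights(feats).unsqueeze(-1)
```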
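The second contribution, a learnable projection initialized from a PCA transform, can be sketched similarly. The class name `PCAInitProjection`, the sample size, and the 768-to-64 reduction (a 12x factor, matching the reduction mentioned in the abstract) are hypothetical choices for illustration; the point is that the projection starts at the PCA solution and is then fine-tuned with the rest of the model.

```python
import torch

class PCAInitProjection(torch.nn.Module):
    """Learnable linear projection whose weights start from a PCA transform.

    A hypothetical sketch: fit PCA offline on a sample of local features,
    then fine-tune the projection end-to-end with the rest of the model.
    """
    def __init__(self, in_dim: int, out_dim: int, sample_feats: torch.Tensor):
        super().__init__()
        # Top `out_dim` principal directions of the feature sample.
        _, _, v = torch.pca_lowrank(sample_feats, q=out_dim)   # v: (in_dim, out_dim)
        self.proj = torch.nn.Linear(in_dim, out_dim, bias=False)
        with torch.no_grad():
            self.proj.weight.copy_(v.t())                      # (out_dim, in_dim)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        return self.proj(feats)                                # (N, out_dim)

# Example: reduce 768-D local features 12x to 64-D before aggregation.
sample = torch.randn(10_000, 768)
proj = PCAInitProjection(768, 64, sample)
reduced = proj(torch.randn(500, 768))                          # (500, 64)
```

Initializing at the PCA solution rather than randomly means the projection is already a good dimensionality reducer at the start of training, so the reduced features can be aggregated cheaply without first sacrificing recall.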
