

Poster

Exploring the Feature Extraction and Relation Modeling For Light-Weight Transformer Tracking

Jikai Zheng · Mingjiang Liang · Shaoli Huang · Jifeng Ning

# 218
Strong Double Blind: This paper was not made available on public preprint services during the review process.
Thu 3 Oct 1:30 a.m. PDT — 3:30 a.m. PDT

Abstract:

Recent advancements in transformer-based light-weight object tracking have set new standards across various benchmarks due to their efficiency and effectiveness. Despite these achievements, most current trackers rely heavily on pre-existing object detection architectures without optimizing the backbone network for the unique demands of object tracking. Addressing this gap, we introduce the Feature Extraction and Relation Modeling Tracker (FERMT), a novel approach that significantly enhances tracking speed and accuracy. At the heart of FERMT is a strategic decomposition of the conventional attention mechanism into four distinct sub-modules within a one-stream tracker. This design stems from our insight that the initial layers of a tracking network should prioritize feature extraction, whereas the deeper layers should focus on relation modeling between objects. Consequently, we propose an innovative, light-weight backbone specifically tailored for object tracking. Our approach is validated through meticulous ablation studies, confirming the effectiveness of our architectural decisions. Furthermore, FERMT incorporates a Dual Attention Unit for feature pre-processing, which facilitates global feature interaction across channels and enriches feature representation with attention cues. Benchmarking on GOT-10k, FERMT achieves a groundbreaking Average Overlap (AO) score of 69.6%, outperforming the leading real-time trackers by 5.6% in accuracy while boasting a 54% improvement in CPU tracking speed. This work not only sets a new standard for state-of-the-art (SOTA) performance in light-weight tracking but also bridges the efficiency gap between fast and high-performance trackers. The code and model will be available soon.
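
The abstract does not spell out the four attention sub-modules. One common way to read such a decomposition in a one-stream tracker, assumed here purely for illustration, is that attention over the concatenated template and search tokens splits into four score blocks: template-to-template, template-to-search, search-to-template, and search-to-search. Under that reading, the two within-image blocks would correspond to feature extraction and the two cross-image blocks to relation modeling. The NumPy sketch below shows only this block structure; the function name `score_blocks`, the token counts, and the random projection weights are hypothetical and not taken from FERMT.

```python
import numpy as np


def score_blocks(q, k, n_template):
    """Split pre-softmax attention scores into four blocks.

    Rows and columns are ordered as [template tokens, search tokens].
    This is an assumed reading of the decomposition, not FERMT's code.
    """
    scores = q @ k.T / np.sqrt(q.shape[-1])  # scaled dot-product scores
    t = n_template
    return {
        "template_to_template": scores[:t, :t],  # within-template block
        "template_to_search": scores[:t, t:],    # template queries, search keys
        "search_to_template": scores[t:, :t],    # search queries, template keys
        "search_to_search": scores[t:, t:],      # within-search block
    }


# Hypothetical sizes: 4 template tokens, 9 search tokens, 8 channels.
rng = np.random.default_rng(0)
n_template, n_search, dim = 4, 9, 8
tokens = rng.standard_normal((n_template + n_search, dim))
w_q = rng.standard_normal((dim, dim))  # stand-in query projection
w_k = rng.standard_normal((dim, dim))  # stand-in key projection
q, k = tokens @ w_q, tokens @ w_k

blocks = score_blocks(q, k, n_template)

# Each block can equally be computed from its own token groups alone,
# which is what makes it possible to enable or drop blocks per layer.
q_x, k_z = q[n_template:], k[:n_template]
direct = q_x @ k_z.T / np.sqrt(dim)
assert np.allclose(blocks["search_to_template"], direct)
```

Because each block depends only on its own query and key groups, a layer could in principle compute a subset of them, for example only the within-image blocks in early layers. Whether FERMT allocates its sub-modules this way is not stated in the abstract.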
