

Poster

Embracing Events and Frames with Hierarchical Feature Refinement Network for Object Detection

Hu Cao · Zehua Zhang · Yan Xia · Xinyi Li · Jiahao Xia · Guang Chen · Alois C. Knoll

Strong Double Blind: this paper was not made available on public preprint services during the review process.
Fri 4 Oct 1:30 a.m. PDT — 3:30 a.m. PDT

Abstract:

In frame-based vision, object detection suffers substantial performance degradation under challenging conditions because of the limited sensing capability of conventional cameras. Event cameras output sparse, asynchronous events, offering a potential solution to these problems. However, effectively fusing the two heterogeneous modalities remains an open issue. In this work, we propose a novel hierarchical feature refinement network for event-frame fusion. Its core is a coarse-to-fine fusion module, the cross-modality adaptive feature refinement (CAFR) module. In the initial phase, the bidirectional cross-modality interaction (BCI) part bridges information from the two distinct sources. Subsequently, the features are further refined by aligning channel-level mean and variance in the two-fold adaptive feature refinement (TAFR) part. We conducted extensive experiments on two benchmarks: the low-resolution PKU-DDD17-Car dataset and the high-resolution DSEC dataset. The results show that our method outperforms the second-best method by an impressive margin of 8.0% on the DSEC dataset. Furthermore, we applied 15 different corruption types to the frame images to assess the model's robustness. The proposed method exhibits significantly better robustness than a frames-only baseline (69.5% versus 38.7%). The code will be made available.
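The abstract outlines but does not formalize the CAFR module. For intuition only, here is a minimal PyTorch sketch of a coarse-to-fine fusion block of this kind. The sigmoid gating used for the BCI stage and the AdaIN-style statistics transfer used for the TAFR stage are assumptions inferred from the description (BCI "bridges information from two sources"; TAFR "aligns channel-level mean and variance"), not the authors' released implementation; `CAFRSketch` and `channel_stats` are hypothetical names.

```python
import torch
import torch.nn as nn

def channel_stats(x, eps=1e-5):
    # Per-sample, per-channel mean and std over spatial dims -> (B, C, 1, 1).
    mean = x.mean(dim=(2, 3), keepdim=True)
    std = x.var(dim=(2, 3), keepdim=True).add(eps).sqrt()
    return mean, std

class CAFRSketch(nn.Module):
    """Illustrative cross-modality adaptive feature refinement block.

    The internals are assumptions based on the abstract, not the
    paper's exact design.
    """
    def __init__(self, channels):
        super().__init__()
        # BCI (coarse stage): each modality gates the other via 1x1 convs.
        self.frame_to_event = nn.Conv2d(channels, channels, kernel_size=1)
        self.event_to_frame = nn.Conv2d(channels, channels, kernel_size=1)
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, f_frame, f_event):
        # Coarse stage: bidirectional cross-modality interaction (BCI).
        f_frame_ref = f_frame + torch.sigmoid(self.event_to_frame(f_event)) * f_frame
        f_event_ref = f_event + torch.sigmoid(self.frame_to_event(f_frame)) * f_event

        # Fine stage: two-fold adaptive feature refinement (TAFR).
        # Align each stream's channel-level mean/variance to the other
        # (AdaIN-style statistics transfer), then fuse.
        mu_f, sd_f = channel_stats(f_frame_ref)
        mu_e, sd_e = channel_stats(f_event_ref)
        frame_aligned = (f_frame_ref - mu_f) / sd_f * sd_e + mu_e
        event_aligned = (f_event_ref - mu_e) / sd_e * sd_f + mu_f

        return self.fuse(torch.cat([frame_aligned, event_aligned], dim=1))
```

For example, `CAFRSketch(channels=256)(torch.randn(2, 256, 64, 64), torch.randn(2, 256, 64, 64))` returns a fused `(2, 256, 64, 64)` feature map.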
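The robustness study applies 15 corruption types to the frame images. The abstract does not name them, but the count matches the standard ImageNet-C corruption set; assuming that set, the evaluation loop might look like the sketch below, using the `imagecorruptions` package (an assumption about tooling, not confirmed by the paper).

```python
import numpy as np
from imagecorruptions import corrupt, get_corruption_names

# get_corruption_names() returns the 15 standard ImageNet-C corruptions
# (gaussian_noise, motion_blur, fog, ...); severity ranges from 1 to 5.
frame = np.zeros((480, 640, 3), dtype=np.uint8)  # stand-in for a dataset frame
for name in get_corruption_names():
    corrupted = corrupt(frame, corruption_name=name, severity=3)
    # ...run the detector on `corrupted` and accumulate per-corruption mAP...
```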
