

Poster

Distribution-Aware Robust Learning from Long-Tailed Data with Noisy Labels

Jae Soon Baik · In Young Yoon · Kun Hoon Kim · Jun Won Choi

Strong Double Blind review: This paper was not made available on public preprint services during the review process.
Wed 2 Oct 1:30 a.m. PDT — 3:30 a.m. PDT

Abstract:

Deep neural networks have demonstrated remarkable advancements in various fields using large, well-annotated datasets. However, real-world data often exhibit a long-tailed distribution and label noise, both of which significantly degrade generalization performance. Recent studies addressing these issues have focused on sample selection methods that estimate the centroid of each class from a set of high-confidence samples within the corresponding target class. However, these methods are limited because, under long-tailed distributions and noisy labels, they use only a small portion of the training dataset for class centroid estimation. In this study, we introduce a novel feature-based sample selection method called Distribution-aware Sample Selection (DaSS). Specifically, DaSS leverages model predictions to produce class centroids that incorporate features not only from the target class but also from various other classes. By integrating this approach with temperature scaling, we can adaptively exploit informative training data, resulting in class centroids that more accurately reflect the true distribution under long-tailed noisy scenarios. Moreover, we propose confidence-aware contrastive learning to obtain a balanced and robust representation under long-tailed distributions and noisy labels. This method applies a semi-supervised balanced contrastive loss to high-confidence labeled samples, mitigating class bias by leveraging trustworthy label information, and a mixup-enhanced instance discrimination loss to low-confidence labeled samples, improving their representations, which are degraded by noisy labels. Comprehensive experimental results on CIFAR and real-world noisy-label datasets demonstrate the effectiveness and superior performance of our method over baselines.
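
The abstract does not give the exact DaSS formula, so the following is only a minimal sketch of one way prediction-weighted class centroids with temperature scaling could be computed. It assumes each sample contributes to every class centroid in proportion to its temperature-scaled softmax probability; the names `softmax`, `soft_class_centroids`, and `temperature` are hypothetical and not taken from the paper.

```python
import math


def softmax(logits, temperature):
    """Temperature-scaled softmax over one sample's logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def soft_class_centroids(features, logits, temperature=0.5):
    """Assumed weighting scheme, not the paper's exact method.

    features: list of N feature vectors, each of length D
    logits:   list of N logit vectors, each of length C
    Returns a list of C centroids, each of length D, where every sample
    contributes to every class in proportion to its predicted probability.
    """
    num_classes = len(logits[0])
    dim = len(features[0])
    sums = [[0.0] * dim for _ in range(num_classes)]
    mass = [0.0] * num_classes

    for x, z in zip(features, logits):
        probs = softmax(z, temperature)
        for c, p in enumerate(probs):
            mass[c] += p
            for d in range(dim):
                sums[c][d] += p * x[d]

    # Guard against a class that receives (almost) no probability mass.
    return [
        [s / max(mass[c], 1e-12) for s in sums[c]]
        for c in range(num_classes)
    ]
```

Under this assumption, a small `temperature` makes the weights nearly one-hot, so each centroid approaches the mean of the samples predicted for that class, while a large `temperature` spreads the weights so that every centroid drifts toward the global feature mean. Tail classes with few confident samples can therefore still draw on features from samples predicted mostly as other classes.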
