

Poster

Learning Scalable Model Soup on a Single GPU: An Efficient Subspace Training Strategy

Tao Li · Weisen Jiang · Fanghui Liu · Xiaolin Huang · James Kwok

Strong Double Blind: this paper was not made available on public preprint services during the review process.
Wed 2 Oct 1:30 a.m. PDT — 3:30 a.m. PDT

Abstract:

Pre-training followed by fine-tuning is widely adopted among practitioners. Performance can be improved by "model soups" (Wortsman et al., 2022), which explore various hyperparameter configurations. The Learned-Soup, a variant of model soups, significantly improves performance but suffers from substantial memory and time costs because it requires (i) loading all fine-tuned models simultaneously and (ii) building a computational graph that encompasses all fine-tuned models. In this paper, we propose Memory Efficient Hyperplane Learned Soup (MEHL-Soup) to tackle this issue by formulating the learned soup as a hyperplane optimization problem and introducing block coordinate gradient descent to learn the mixing coefficients. At each iteration, MEHL-Soup only needs to load a few fine-tuned models and build a computational graph with one combined model. We further extend MEHL-Soup to MEHL-Soup+ in a layer-wise manner. Experiments on various ViT models and data sets show that MEHL-Soup(+) outperforms Learned-Soup(+) in terms of test accuracy, and also achieves more than a 13x reduction in memory. Moreover, MEHL-Soup(+) can be run on a single GPU and achieves over a 9x reduction in soup construction time compared with the Learned-Soup.
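To make the block coordinate idea concrete, below is a minimal sketch of how mixing coefficients could be learned while keeping only a few fine-tuned checkpoints in memory at a time. This is an illustration of the general strategy described in the abstract, not the authors' implementation; in particular, `eval_loss` is a hypothetical callable that maps a flattened parameter vector to a differentiable validation loss, and the file paths, block size, and optimizer settings are placeholders.

```python
# Hypothetical sketch: block coordinate gradient descent over soup coefficients.
# Only the current block's checkpoints are resident during gradient steps.
import torch

def load_flat(path):
    """Load one fine-tuned checkpoint and flatten it into a single vector."""
    state = torch.load(path, map_location="cpu")
    return torch.cat([p.float().reshape(-1) for p in state.values()])

def learn_soup_coeffs(ckpt_paths, eval_loss, block_size=2, steps=20, lr=0.05):
    n = len(ckpt_paths)
    coeffs = torch.full((n,), 1.0 / n)          # one mixing coefficient per model

    for start in range(0, n, block_size):       # sweep over coordinate blocks
        block = list(range(start, min(start + block_size, n)))
        rest = [j for j in range(n) if j not in block]

        # Models outside the block are frozen for this sweep, so their weighted
        # sum can be accumulated one checkpoint at a time and then discarded.
        fixed = None
        for j in rest:
            theta_j = load_flat(ckpt_paths[j])
            fixed = coeffs[j] * theta_j if fixed is None else fixed + coeffs[j] * theta_j
            del theta_j
        if fixed is None:
            fixed = torch.zeros_like(load_flat(ckpt_paths[block[0]]))

        # Optimize only the block's coefficients by gradient descent on the
        # validation loss of the single combined model.
        thetas = [load_flat(ckpt_paths[i]) for i in block]
        alpha = coeffs[block].clone().requires_grad_(True)
        opt = torch.optim.SGD([alpha], lr=lr)
        for _ in range(steps):
            combined = fixed + sum(a * t for a, t in zip(alpha, thetas))
            loss = eval_loss(combined)          # differentiable w.r.t. `combined`
            opt.zero_grad()
            loss.backward()
            opt.step()
        coeffs[block] = alpha.detach()
        del thetas, fixed

    return coeffs
```

The key point the sketch tries to capture is that each outer step touches at most `block_size` checkpoints plus one cached vector for the frozen remainder, rather than holding all fine-tuned models and a joint computational graph in GPU memory at once.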
