Towards Open-ended Visual Quality Comparison

Haoning Wu · Hanwei Zhu · Zicheng Zhang · Erli Zhang · Chaofeng Chen · Liang Liao · Chunyi Li · Annan Wang · Wenxiu Sun · Qiong Yan · Xiaohong Liu · Guangtao Zhai · Shiqi Wang · Weisi Lin

Wed 2 Oct 1:30 a.m. PDT — 3:30 a.m. PDT


Comparative settings (e.g. pairwise choice, listwise ranking) have been adopted by a wide range of subjective studies for image quality assessment (IQA), as they inherently standardize the evaluation criteria across different observers and offer more clear-cut responses. In this work, we push the frontier of emerging large multi-modality models (LMMs) to advance visual quality comparison into open-ended settings that 1) can respond to open-range questions on quality comparison; 2) can provide detailed reasoning beyond direct answers. To this end, we propose Co-Instruct. To train this first-of-its-kind open-source open-ended visual quality comparer, we collect the Co-Instruct-562K dataset from two sources: (a) LLM-merged single-image quality descriptions, (b) GPT-4V “teacher” responses on unlabeled data. Furthermore, to better evaluate this setting, we propose MICBench, the first benchmark on multi-image comparison for LMMs. We demonstrate that Co-Instruct not only achieves on average 30% higher accuracy than state-of-the-art open-source LMMs, but also outperforms GPT-4V (its teacher) on both existing quality-related benchmarks and the proposed MICBench. We will publish our datasets, training scripts and model weights upon acceptance.
