

Poster

A Comprehensive Study of Multimodal Large Language Models for Image Quality Assessment

Tianhe Wu · Kede Ma · Jie Liang · Yujiu Yang · Yabin Zhang

#107
[ Project Page ] [ Paper PDF ]
Wed 2 Oct 1:30 a.m. PDT — 3:30 a.m. PDT

Abstract:

While Multimodal Large Language Models (MLLMs) have made significant advances in visual understanding and reasoning, their potential as powerful, flexible, and text-driven models for Image Quality Assessment (IQA) remains largely unexplored. In this paper, we conduct a comprehensive study of prompting MLLMs for IQA at the system level. Specifically, we first investigate nine system-level prompting methods for MLLMs, formed as the combinations of three standardized testing procedures in psychophysics (i.e., the single-stimulus, double-stimulus, and multiple-stimulus methods) and three popular prompting tricks in natural language processing (i.e., standard, in-context, and chain-of-thought prompting). We then propose a difficult sample selection procedure, taking into account sample diversity and human uncertainty, to further challenge MLLMs, each coupled with its optimal prompting method identified in the previous step. In our experiments, we assess three open-source MLLMs and one closed-source MLLM on several visual attributes of image quality (e.g., structural and textural distortions, color differences, and geometric transformations) under both full-reference and no-reference settings, and gain valuable insights into the development of better MLLMs for IQA.
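To make the 3 × 3 design concrete, the sketch below enumerates the nine system-level prompting methods as the cross product of the three psychophysical testing procedures and the three prompting tricks. The prompt templates and the `build_prompt` helper are illustrative assumptions for exposition only, not the authors' released code or exact wording.

```python
# Hypothetical sketch: the nine system-level prompting methods arise from
# crossing three psychophysical testing procedures with three NLP prompting
# tricks. All template strings here are assumed placeholders.

from itertools import product

TEST_PROCEDURES = {
    # Single-stimulus: rate one image in isolation.
    "single-stimulus": "Rate the quality of the given image on a scale of 1 to 5.",
    # Double-stimulus: judge a pair (e.g., distorted vs. reference image).
    "double-stimulus": "Which of the two images has better visual quality, the first or the second?",
    # Multiple-stimulus: order several images at once.
    "multiple-stimulus": "Rank the given images from best to worst visual quality.",
}

PROMPT_TRICKS = {
    # Standard prompting: the bare task instruction.
    "standard": "",
    # In-context prompting: prepend a few labeled exemplars (omitted here).
    "in-context": "Here are some example images with their quality ratings: ...",
    # Chain-of-thought prompting: ask for step-by-step reasoning first.
    "chain-of-thought": "Analyze the visible distortions step by step before giving your final answer.",
}

def build_prompt(procedure: str, trick: str) -> str:
    """Compose one of the 3 x 3 = 9 system-level prompting methods."""
    prefix = PROMPT_TRICKS[trick]
    task = TEST_PROCEDURES[procedure]
    return f"{prefix}\n{task}".strip()

# Enumerate all nine combinations studied at the system level.
for procedure, trick in product(TEST_PROCEDURES, PROMPT_TRICKS):
    print(f"[{procedure} / {trick}]")
    print(build_prompt(procedure, trick), end="\n\n")
```

In practice, each composed prompt would be sent to an MLLM together with the image(s) required by the chosen testing procedure (one for single-stimulus, two for double-stimulus, several for multiple-stimulus), under either a full-reference or no-reference setting.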
