Multimodal Large Language Models (MLLMs) excel at a wide range of tasks, yet they often suffer from modality bias: the model relies heavily on a single modality and overlooks critical information in the others, leading to misplaced focus and irrelevant responses. In this paper, we address modality bias through the paradigm of preference optimization, contributing RLAIF-V-Bias, a debiased preference optimization dataset, and Noise-Aware Preference Optimization (NaPO), a noise-robust training algorithm. Specifically, we construct the dataset by introducing perturbations that reduce the informational content of certain modalities, compelling the model to rely on a specific modality when generating negative responses. To handle the noise that is inevitable in automatically constructed data, NaPO combines the noise-robust Mean Absolute Error (MAE) with the Binary Cross-Entropy (BCE) used in Direct Preference Optimization (DPO) through a negative Box-Cox transformation, and dynamically adjusts its noise robustness according to the estimated noise level of the data. Extensive experiments validate our approach, showing that it not only mitigates modality bias but also plays a significant role in reducing hallucinations.
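As a hedged sketch of how such an interpolation can look (the exact NaPO objective may differ in detail), the negative Box-Cox transformation of a probability $p$ with coefficient $q$ gives the generalized cross-entropy form

\[
\mathcal{L}_q(p) \;=\; \frac{1 - p^{\,q}}{q}, \qquad q \in (0, 1],
\]

which recovers BCE in the limit $q \to 0$ (since $\lim_{q\to 0} \frac{1-p^{\,q}}{q} = -\log p$) and MAE at $q = 1$ (where it equals $1 - p$). Plugging in the DPO preference probability $p = \sigma\!\big(\beta \log \tfrac{\pi_\theta(y_w\mid x)}{\pi_{\mathrm{ref}}(y_w\mid x)} - \beta \log \tfrac{\pi_\theta(y_l\mid x)}{\pi_{\mathrm{ref}}(y_l\mid x)}\big)$ yields a DPO-style objective whose noise robustness grows with $q$.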
Examples of different types of modality-biased responses and their preferred counterparts. Left: The model relies excessively on prior knowledge, assuming the bear is brown while overlooking the image, which actually shows a polar bear. Right: Although the model answers the question correctly, it adds unnecessary image details that are irrelevant to the question.
First, biased responses are constructed by applying masking so that the base model over-relies on the text prompt when generating them. Next, NaPO performs noise-robust preference optimization to counteract the noise in the automatically constructed data, dynamically assessing the noise level of the data to compute NaPO's noise-robustness coefficient q. Since the original data is assumed to be of high quality, it is trained with standard DPO; a sketch of the NaPO loss follows below.
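To make the loss concrete, below is a minimal PyTorch sketch of a NaPO-style objective, assuming the generalized cross-entropy form above. The function name napo_loss and the way q is passed in are illustrative placeholders; the paper's exact rule for deriving q from the estimated noise level is not reproduced here.

```python
import torch


def napo_loss(policy_chosen_logps: torch.Tensor,
              policy_rejected_logps: torch.Tensor,
              ref_chosen_logps: torch.Tensor,
              ref_rejected_logps: torch.Tensor,
              beta: float = 0.1,
              q: float = 0.5) -> torch.Tensor:
    """Illustrative NaPO-style preference loss (sketch, not the official code).

    Replaces DPO's -log(sigmoid(margin)) with the negative Box-Cox
    (generalized cross-entropy) form (1 - sigmoid(margin)^q) / q,
    which interpolates between BCE (q -> 0) and MAE (q = 1).
    """
    # Implicit-reward margin between chosen and rejected responses.
    margin = beta * ((policy_chosen_logps - ref_chosen_logps)
                     - (policy_rejected_logps - ref_rejected_logps))
    p = torch.sigmoid(margin)  # preference probability
    return ((1.0 - p.pow(q)) / q).mean()


# Usage: q would be set per batch from an estimated noise level
# (higher noise -> larger q -> more MAE-like, more noise-robust);
# the estimation rule itself is specific to the paper and omitted here.
loss = napo_loss(torch.tensor([-12.3]), torch.tensor([-15.0]),
                 torch.tensor([-12.8]), torch.tensor([-14.2]),
                 beta=0.1, q=0.7)
```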
Comparison of different loss functions. Orange is BCE, green is MAE, and the remaining curves are NaPO with different values of q.
We evaluate our method, built on LLaVA-v1.5-7B, on bias and hallucination benchmarks, using DPO and MDPO as the primary comparisons. Because of differences in training data, model scale, and training strategy, additional results are included for reference only. Our method achieves an average improvement of 19% on the bias benchmark and a notable reduction in hallucinations across benchmarks.
@article{zhang2025debiasing,
  title={Debiasing Multimodal Large Language Models via Noise-Aware Preference Optimization},
  author={Zhang, Zefeng and Tang, Hengzhu and Sheng, Jiawei and Zhang, Zhenyu and Ren, Yiming and Li, Zhenyang and Yin, Dawei and Ma, Duohe and Liu, Tingwen},
  journal={arXiv preprint arXiv:2503.17928},
  year={2025}
}