Automated Filtering of Human Feedback Data
for Aligning Text-to-Image Diffusion Models

Yongjin Yang1*, Sihyeon Kim1*, Hojung Jung1, Sangmin Bae1, Sangmook Kim2, Se-Young Yun1†, Kimin Lee1†
KAIST AI1, Department of AI, Chungnam National University2
*Indicates Equal Contribution, †Indicates Corresponding Author

By pruning the training data using FiFA, the diffusion model can converge faster and consume fewer resources. Examples show its superior performance in following text prompts with higher quality.

Abstract

Fine-tuning text-to-image diffusion models with human feedback is an effective method for aligning model behavior with human intentions. However, this alignment process often suffers from slow convergence due to the large size and noisiness of human feedback datasets. In this work, we propose FiFA, a novel automated data filtering algorithm designed to enhance the fine-tuning of diffusion models on human feedback datasets with direct preference optimization (DPO). Specifically, our approach selects data by solving an optimization problem that maximizes three components: preference margin, text quality, and text diversity. The preference margin, computed with a proxy reward model, identifies samples with high informational value and addresses the noisy nature of feedback datasets. Additionally, we incorporate text quality, assessed by large language models to filter out harmful content, and text diversity, measured with a k-nearest-neighbor entropy estimator to improve generalization. Finally, we integrate all of these components into a single optimization process, approximating the solution by assigning an importance score to each data pair and selecting the most important ones. As a result, our method filters data automatically and efficiently, without manual intervention, and can be applied to any large-scale dataset. Experimental results show that FiFA significantly enhances training stability and achieves better performance, with outputs preferred by humans 17% more often, while using less than 0.5% of the full data and thus about 1% of the GPU hours required when training on the full human feedback dataset.
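To make the preference-margin component concrete, below is a minimal sketch of how a margin could be computed for one feedback pair. The `reward_model` object and its `score(prompt, image)` method are illustrative placeholders for whatever proxy reward model is used, not a real library API.

    import torch

    @torch.no_grad()
    def preference_margin(reward_model, prompt, chosen_img, rejected_img):
        """Margin r(x, y_w) - r(x, y_l) under a proxy reward model.

        A large positive margin means the proxy reward strongly agrees
        with the human preference label, so the pair is treated as
        informative and less likely to be label noise.
        """
        r_chosen = reward_model.score(prompt, chosen_img)      # scalar reward
        r_rejected = reward_model.score(prompt, rejected_img)  # scalar reward
        return float(r_chosen - r_rejected)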

FiFA Algorithm

[Figure: FiFA algorithm pseudocode]

We propose FiFA, which automatically filters the full human feedback dataset to obtain a subset for efficiently fine-tuning text-to-image models. Specifically, our method leverages the preference margin as the key component for rapidly increasing the reward value, while also considering the quality and diversity of text prompts to mitigate harmfulness and ensure robustness. Given these components of data importance, the remaining challenge is incorporating them all into an automated data filtering framework that can be applied to any dataset. To achieve this, we formulate data selection as an optimization problem: find the subset with high preference margin, text quality, and text diversity. The pseudocode for our algorithm is presented in the figure above.
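As a rough illustration of how the three terms could be combined into a single importance score with a greedy top-k selection, consider the sketch below. The arrays `margins`, `quality`, and `text_embs`, the weights `alpha` and `beta`, and all function names are assumptions for illustration, not the paper's exact implementation.

    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    def knn_log_distances(text_embs: np.ndarray, k: int = 5) -> np.ndarray:
        """Per-prompt term of a Kozachenko-Leonenko-style entropy estimator:
        log distance to the k-th nearest neighbor in text-embedding space.
        Prompts in sparse regions score higher, rewarding text diversity."""
        nn = NearestNeighbors(n_neighbors=k + 1).fit(text_embs)
        dists, _ = nn.kneighbors(text_embs)   # column 0 is the point itself
        return np.log(dists[:, k] + 1e-12)

    def fifa_select(margins, quality, text_embs, budget, alpha=1.0, beta=1.0):
        """Score every pair, then keep the `budget` highest-scoring pairs."""
        diversity = knn_log_distances(text_embs)
        importance = (np.asarray(margins)
                      + alpha * np.asarray(quality)
                      + beta * diversity)
        return np.argsort(importance)[::-1][:budget]   # selected pair indices

Scoring the diversity term once over the full dataset reduces selection to a single sort rather than an iterative subset search, in the spirit of the importance-score approximation described above.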

Results

[Figures: human evaluation on the HPSv2 benchmark (left); number of harmful images generated per prompt (right)]

We compare SDXL trained with FiFA against SDXL trained on the full dataset. The left graph presents human evaluation results on the HPSv2 benchmark, showing that the SDXL model fine-tuned with FiFA consistently outperforms the model trained on the full dataset in both aesthetic quality and text-image alignment, leading to superior overall quality. The right graph examines the number of harmful images generated by the pretrained SDXL model, the model trained with FiFA, and the model trained on the full dataset, using three prompts: "a woman standing" (P1), "a beautiful woman" (P2), and "a hot girl" (P3). The result indicates that FiFA, which uses the LLM score to filter out harmful prompts, effectively reduces harmful generations.

More Examples

BibTeX

@inproceedings{yang2025automated,
  title={Automated Filtering of Human Feedback Data for Aligning Text-to-Image Diffusion Models},
  author={Yongjin Yang and Sihyeon Kim and Hojung Jung and Sangmin Bae and SangMook Kim and Se-Young Yun and Kimin Lee},
  booktitle={The Thirteenth International Conference on Learning Representations},
  year={2025},
  url={https://openreview.net/forum?id=8jvVNPHtVJ}
}