VQ-Insight: Teaching VLMs for AI-Generated Video Quality Understanding via Progressive Visual Reinforcement Learning
Xuanyu Zhang*, Weiqi Li*, Shijie Zhao, Junlin Li, Li Zhang, Jian Zhang
We propose VQ-Insight, a reasoning-style vision-language model that accurately performs AIGC video preference comparison, AIGC video multi-dimension scoring, and natural video scoring, each accompanied by a detailed and well-grounded reasoning process. VQ-Insight can also be applied to the post-training of video generation models and to zero-shot content repair.
```shell
git clone https://github.com/xuanyuzhang21/VQ-Insight
bash setup.sh
```

Score a natural video:

```shell
cd demo
python demo_vqinsight_score.py \
    --video_path "../assets/demo_natural.mp4" \
    --video_type natural
```

Score an AIGC video:

```shell
cd demo
python demo_vqinsight_score.py \
    --video_path "../assets/demo_aigc.mp4" \
    --video_type aigc
```

Compare two AIGC videos:

```shell
cd demo
python demo_vqinsight_comp.py \
    --video_a "../assets/demo_comp1.mp4" \
    --video_b "../assets/demo_comp2.mp4" \
    --model_name_or_path Bytedance/Q-Insight
```
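To score a whole folder of videos, the single-video demo above can be driven from Python. The sketch below only builds the command lines; the helper name `build_score_commands` and the batching itself are our additions, not part of the repo. Run it from inside `demo/` and pass each command to `subprocess.run` to execute.

```python
from pathlib import Path


def build_score_commands(video_dir, video_type="aigc"):
    """Build one demo_vqinsight_score.py invocation per .mp4 in video_dir.

    Uses the script name and flags shown in the demo above; returns
    argument lists suitable for subprocess.run(cmd, check=True).
    """
    cmds = []
    for video in sorted(Path(video_dir).glob("*.mp4")):
        cmds.append([
            "python", "demo_vqinsight_score.py",
            "--video_path", str(video),
            "--video_type", video_type,
        ])
    return cmds
```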
Preference comparison: download the VisionReward dataset and run the script below. The training JSON is placed in `./data`.

```shell
bash ./src/scripts/run_grpo_video_comp.sh
```

AIGC multi-dimension scoring: download the LGVQ dataset and run the script below. The training JSON is placed in `./data`.

```shell
bash ./src/scripts/run_grpo_video_lgvq_aigc.sh
```

Natural video scoring: download the LSVQ dataset and run the script below. The training JSON is placed in `./data`.

```shell
bash ./src/scripts/run_grpo_video_lsvq.sh
```
We appreciate the open-sourced code of Video-R1.
If you find the code helpful in your research or work, please cite the following papers and ⭐ the repo:
@article{zhang2025vqinsight,
title={VQ-Insight: Teaching VLMs for AI-Generated Video Quality Understanding via Progressive Visual Reinforcement Learning},
author={Zhang, Xuanyu and Li, Weiqi and Zhao, Shijie and Li, Junlin and Zhang, Li and Zhang, Jian},
journal={Proceedings of the AAAI Conference on Artificial Intelligence (AAAI)},
year={2026}
}
