We recommend following the instructions in ms-swift and EasyR1 to install the environments. Alternatively, you can install the RL environment via:

```bash
git clone https://github.com/DocTron-hub/VinciCoder.git
cd VinciCoder
pip install -e .
```
The SFT dataset contains 1.6M samples and is available at VinciCoder_SFT_Data. The existing data are collected from the following works; we are very grateful for their excellent work and open-source data. We also optimize the data and generate new data; see the Hugging Face link above for all data sources.
| Domain | Paper |
|---|---|
| Chart-to-code | ChartCoder, MSRL, VisCodex |
| Web-to-HTML | Web2Code, Web2M, VisCodex |
| Image-to-SVG | UniSVG, StarVector |
| Image-to-Latex | DaTikZ, MathCoder-VL |
| Others | CoSyn |
The RL dataset contains 42k samples from five domains and is available at VinciCoder_RL_Data.
Our SFT stage uses ms-swift; please follow the official documentation for training.
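As a rough sketch only (the model identifier, dataset path, and hyperparameters below are placeholders, not our actual training recipe — consult the ms-swift documentation for the options that match your setup), an SFT run with ms-swift is typically launched like this:

```bash
# Hypothetical ms-swift SFT invocation; all paths and values are placeholders.
swift sft \
    --model Qwen/Qwen3-VL-8B-Instruct \
    --dataset /path/to/VinciCoder_SFT_Data \
    --train_type full \
    --output_dir ./output/vincicoder_sft
```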
Our RL stage is based on EasyR1. First modify the configuration in ./examples/qwen3vl_8b_vincicder.sh, check the reward settings in ./examples/reward_function/vincicoder.py, and then run:

```bash
bash ./examples/qwen3vl_8b_vincicder.sh
```
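The actual reward configuration lives in ./examples/reward_function/vincicoder.py. As a loose illustration of what a code-generation reward can look like (this is a hypothetical sketch, not the repository's implementation — the `compute_score` name, the format/similarity terms, and the 0.2/0.8 weighting are all assumptions), a reward can combine a format check with a similarity score against the reference code:

```python
import difflib
import re


def compute_score(predict: str, ground_truth: str) -> dict:
    """Hypothetical reward sketch (NOT the repository's actual reward):
    a format term checks that the response contains a fenced code block,
    and a similarity term compares the extracted code to the reference."""
    # Extract the first fenced code block from the model response, if any.
    match = re.search(r"```(?:\w+)?\n(.*?)```", predict, re.DOTALL)
    format_score = 1.0 if match else 0.0
    code = match.group(1) if match else predict
    # Character-level similarity between predicted and reference code.
    similarity = difflib.SequenceMatcher(None, code, ground_truth).ratio()
    return {
        "overall": 0.2 * format_score + 0.8 * similarity,
        "format": format_score,
        "similarity": similarity,
    }
```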
For any questions, you can contact 2429527z@gmail.com or open an issue.
If you find this work useful, consider giving this repository a star ⭐️ and citing 📝 our paper as follows:
```bibtex
@article{zhao2025vincicoder,
  title={VinciCoder: Unifying Multimodal Code Generation via Coarse-to-fine Visual Reinforcement Learning},
  author={Zhao, Xuanle and Jiang, Deyang and Zeng, Zhixiong and Chen, Lei and Qiu, Haibo and Huang, Jing and Zhong, Yufeng and Zheng, Liming and Cao, Yilin and Ma, Lin},
  journal={arXiv preprint arXiv:2511.00391},
  year={2025}
}

@article{chen2025breaking,
  title={Breaking the SFT Plateau: Multimodal Structured Reinforcement Learning for Chart-to-Code Generation},
  author={Chen, Lei and Zhao, Xuanle and Zeng, Zhixiong and Huang, Jing and Zheng, Liming and Zhong, Yufeng and Ma, Lin},
  journal={arXiv preprint arXiv:2508.13587},
  year={2025}
}

@article{zhao2025chartcoder,
  title={ChartCoder: Advancing Multimodal Large Language Model for Chart-to-Code Generation},
  author={Zhao, Xuanle and Luo, Xianzhen and Shi, Qi and Chen, Chi and Wang, Shuo and Liu, Zhiyuan and Sun, Maosong},
  journal={arXiv preprint arXiv:2501.06598},
  year={2025}
}
```
The training frameworks are based on ms-swift and EasyR1. Thanks for these great works and for open-sourcing them!
