We recommend following the instructions in ms-swift and EasyR1 to install the environments. Alternatively, you can install the RL environment via:

```bash
git clone https://github.com/DocTron-hub/VinciCoder.git
cd VinciCoder
pip install -e .
```
The SFT dataset contains 1.6M samples and is available at VinciCoder_SFT_Data. The existing data are collected from the following works; we are very grateful for their excellent work and open-source data. We also optimize the data and generate new data; see the Hugging Face link above for all data sources.
| Domain | Paper |
|---|---|
| Chart-to-code | ChartCoder, MSRL, VisCodex |
| Web-to-HTML | Web2Code, Web2M, VisCodex |
| Image-to-SVG | UniSVG, StarVector |
| Image-to-Latex | DaTikZ, MathCoder-VL |
| Others | CoSyn |
The RL dataset contains 42k samples from five domains and is available at VinciCoder_RL_Data.
Our SFT stage uses ms-swift; please follow the official documentation for training.
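As a rough sketch only (the model identifier, dataset path, and hyperparameters below are placeholders, not our actual training recipe — consult the ms-swift documentation for the options that match your setup), an SFT run with ms-swift is typically launched like this:

```bash
# Hypothetical ms-swift SFT invocation; all paths and values are placeholders.
swift sft \
    --model Qwen/Qwen3-VL-8B-Instruct \
    --dataset /path/to/VinciCoder_SFT_Data \
    --train_type full \
    --output_dir ./output/vincicoder_sft
```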
Our RL stage is based on EasyR1. First modify the configuration in ./examples/qwen3vl_8b_vincicder.sh, check the reward settings in ./examples/reward_function/vincicoder.py, and then run:

```bash
bash ./examples/qwen3vl_8b_vincicder.sh
```
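The actual reward configuration lives in ./examples/reward_function/vincicoder.py. As a loose illustration of what a code-generation reward can look like (this is a hypothetical sketch, not the repository's implementation — the `compute_score` name, the format/similarity terms, and the 0.2/0.8 weighting are all assumptions), a reward can combine a format check with a similarity score against the reference code:

```python
import difflib
import re


def compute_score(predict: str, ground_truth: str) -> dict:
    """Hypothetical reward sketch (NOT the repository's actual reward):
    a format term checks that the response contains a fenced code block,
    and a similarity term compares the extracted code to the reference."""
    # Extract the first fenced code block from the model response, if any.
    match = re.search(r"```(?:\w+)?\n(.*?)```", predict, re.DOTALL)
    format_score = 1.0 if match else 0.0
    code = match.group(1) if match else predict
    # Character-level similarity between predicted and reference code.
    similarity = difflib.SequenceMatcher(None, code, ground_truth).ratio()
    return {
        "overall": 0.2 * format_score + 0.8 * similarity,
        "format": format_score,
        "similarity": similarity,
    }
```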
For any questions, you can contact 2429527z@gmail.com or open an issue.
If you find this work useful, consider giving this repository a star ⭐️ and citing 📝 our paper as follows:
```bibtex
@article{zhao2025vincicoder,
  title={VinciCoder: Unifying Multimodal Code Generation via Coarse-to-fine Visual Reinforcement Learning},
  author={Zhao, Xuanle and Jiang, Deyang and Zeng, Zhixiong and Chen, Lei and Qiu, Haibo and Huang, Jing and Zhong, Yufeng and Zheng, Liming and Cao, Yilin and Ma, Lin},
  journal={arXiv preprint arXiv:2511.00391},
  year={2025}
}

@article{chen2025breaking,
  title={Breaking the SFT Plateau: Multimodal Structured Reinforcement Learning for Chart-to-Code Generation},
  author={Chen, Lei and Zhao, Xuanle and Zeng, Zhixiong and Huang, Jing and Zheng, Liming and Zhong, Yufeng and Ma, Lin},
  journal={arXiv preprint arXiv:2508.13587},
  year={2025}
}

@article{zhao2025chartcoder,
  title={ChartCoder: Advancing Multimodal Large Language Model for Chart-to-Code Generation},
  author={Zhao, Xuanle and Luo, Xianzhen and Shi, Qi and Chen, Chi and Wang, Shuo and Liu, Zhiyuan and Sun, Maosong},
  journal={arXiv preprint arXiv:2501.06598},
  year={2025}
}
```
The training frameworks are based on ms-swift and EasyR1. Thanks for these great works and for open-sourcing them!
