VinciCoder: Unifying Multimodal Code Generation via Coarse-to-fine Visual Reinforcement Learning

🤗 Model Collections 🤗 Dataset Collections 📑 Paper (arXiv:2511.00391) 🤖 机器之心 (Synced)

Installation

We recommend following the instructions in ms-swift and EasyR1 to set up the environments.

Alternatively, you can install the RL environment via:

git clone https://github.com/DocTron-hub/VinciCoder.git
cd VinciCoder
pip install -e .

Dataset

`data_construct`

SFT Dataset

The SFT dataset contains 1.6M samples and is available at VinciCoder_SFT_Data. The existing data are collected from the following works; we are very grateful for their excellent work and open-source data. We also optimize the collected data and generate new data — see the Hugging Face link above for the full list of data sources.

| Domain | Source Works |
| --- | --- |
| Chart-to-code | ChartCoder, MSRL, VisCodex |
| Web-to-HTML | Web2Code, Web2M, VisCodex |
| Image-to-SVG | UniSVG, StarVector |
| Image-to-LaTeX | DaTikZ, MathCoder-VL |
| Others | CoSyn |

RL Dataset

The RL dataset contains 42k samples from five domains and is available at VinciCoder_RL_Data.
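The released data can be browsed with the Hugging Face `datasets` library. A minimal sketch — the dataset ID and field names (`query`, `response`) below are assumptions; check the dataset cards linked above for the actual repo paths and schema:

```python
def to_chat_sample(sample, instruction_key="query", code_key="response"):
    """Convert one record into a chat-style (user, assistant) pair.

    Field names are assumptions -- adjust them to the dataset's real schema.
    """
    return [
        {"role": "user", "content": sample[instruction_key]},
        {"role": "assistant", "content": sample[code_key]},
    ]

if __name__ == "__main__":
    # Import here so the helper above has no hard dependency on `datasets`.
    from datasets import load_dataset

    # Streaming avoids downloading all 1.6M SFT samples up front.
    # "DocTron/VinciCoder_SFT_Data" is a placeholder repo ID.
    ds = load_dataset("DocTron/VinciCoder_SFT_Data", split="train", streaming=True)
    first = next(iter(ds))
    print(first.keys())
```

The same pattern applies to the RL dataset, which is small enough (42k samples) to load without streaming.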

Training Scripts

SFT stage

Our SFT stage uses ms-swift; please follow its official documentation for training.
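For reference, an ms-swift launch might look like the sketch below — the model name, dataset path, and hyperparameters are all placeholders, not the settings used in this work; consult the ms-swift documentation for the options that match your setup.

```shell
# Hypothetical SFT launch via the ms-swift CLI (v3-style flags).
# All values below are placeholders.
swift sft \
    --model Qwen/Qwen2.5-VL-7B-Instruct \
    --dataset /path/to/VinciCoder_SFT_Data \
    --train_type full \
    --num_train_epochs 1 \
    --output_dir ./output/vincicoder_sft
```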

RL stage

Our RL stage is based on EasyR1. First modify the configuration in ./examples/qwen3vl_8b_vincicder.sh, check the reward settings in ./examples/reward_function/vincicoder.py, and then run the following script:

bash ./examples/qwen3vl_8b_vincicder.sh

Contact

For any questions, you can contact 2429527z@gmail.com or open an issue.

Citation

If you find this work useful, consider giving this repository a star ⭐️ and citing 📝 our paper as follows:

@article{zhao2025vincicoder,
  title={VinciCoder: Unifying Multimodal Code Generation via Coarse-to-fine Visual Reinforcement Learning},
  author={Zhao, Xuanle and Jiang, Deyang and Zeng, Zhixiong and Chen, Lei and Qiu, Haibo and Huang, Jing and Zhong, Yufeng and Zheng, Liming and Cao, Yilin and Ma, Lin},
  journal={arXiv preprint arXiv:2511.00391},
  year={2025}
}

@article{chen2025breaking,
  title={Breaking the sft plateau: Multimodal structured reinforcement learning for chart-to-code generation},
  author={Chen, Lei and Zhao, Xuanle and Zeng, Zhixiong and Huang, Jing and Zheng, Liming and Zhong, Yufeng and Ma, Lin},
  journal={arXiv preprint arXiv:2508.13587},
  year={2025}
}

@article{zhao2025chartcoder,
  title={Chartcoder: Advancing multimodal large language model for chart-to-code generation},
  author={Zhao, Xuanle and Luo, Xianzhen and Shi, Qi and Chen, Chi and Wang, Shuo and Liu, Zhiyuan and Sun, Maosong},
  journal={arXiv preprint arXiv:2501.06598},
  year={2025}
}

Acknowledgement

The training frameworks are based on ms-swift and EasyR1. Thanks to the authors for these great works and for open-sourcing them!
