OoDBag/VisTA

VisTA

VisualToolAgent (VisTA): A Reinforcement Learning Framework for Visual Tool Selection


🎯 Overview

VisTA is a reinforcement learning framework for visual tool selection in multimodal AI systems. It trains agents to select and use the appropriate visual tools for complex reasoning tasks.


📋 Environment Setup

Installation

conda create -n vista python=3.11
conda activate vista

pip install -e ".[dev]"
pip install wandb==0.18.3
pip install qwen_vl_utils torchvision
pip install flash-attn --no-build-isolation
pip install vllm==0.7.2

pip install git+https://github.com/huggingface/transformers.git@336dc69d63d56f232a183a3e7f52790429b871ef
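After installation, it can help to confirm that the pinned versions actually ended up in the environment. The following is a minimal sketch, not part of the repository: the `check_pins` helper and the `PINNED` table are hypothetical names, and the pins are simply copied from the commands above. It assumes each package reports its version through standard package metadata.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the install commands above (assumption: no local version suffix).
# Unpinned packages (qwen_vl_utils, torchvision, flash-attn) are left out.
PINNED = {"wandb": "0.18.3", "vllm": "0.7.2"}


def check_pins(pins):
    """Return {package: (expected, installed_or_None, matches)} for each pin."""
    report = {}
    for name, expected in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (expected, installed, installed == expected)
    return report


if __name__ == "__main__":
    for name, (expected, installed, ok) in check_pins(PINNED).items():
        status = "ok" if ok else "MISMATCH"
        print(f"{name}: expected {expected}, found {installed} [{status}]")
```

Run it inside the activated vista environment; a MISMATCH line suggests that a later install step replaced a pinned package.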

📊 Dataset Support

  • Currently Supported

  • Upcoming Support
      • Additional visual reasoning datasets with a unified tooling interface (coming soon)


🔧 Training

Tool Selection Model Training

To train the visual tool selection model on ChartQA:

cd src/r1-v
./run_grpo.sh

🧪 Inference and Evaluation

Generate Tool Predictions

Set model_name_or_path in run_grpo_test.sh to the path of your trained model, then run:

./run_grpo_test.sh

Evaluate Tool-Based Reasoning

Run the following commands to evaluate:

python test_chartqa_gpt.py
python relax_test.py
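The contents of relax_test.py are not shown here. ChartQA results are commonly reported with a "relaxed accuracy" metric, in which a numeric answer counts as correct when it is within 5% of the target and any other answer must match exactly. The sketch below illustrates that metric on the assumption that the script computes something similar; `relaxed_match` and `relaxed_accuracy` are hypothetical names, and the repository's own normalization rules may differ.

```python
def _to_float(text):
    """Parse a numeric answer, returning None when the text is not a number."""
    try:
        return float(text.strip().replace(",", ""))
    except ValueError:
        return None


def relaxed_match(prediction, target, tolerance=0.05):
    """Numeric answers match within a relative tolerance; others must match exactly (case-insensitive)."""
    pred_num, target_num = _to_float(prediction), _to_float(target)
    if pred_num is not None and target_num is not None:
        if target_num == 0:
            return pred_num == 0  # relative error is undefined for a zero target
        return abs(pred_num - target_num) / abs(target_num) <= tolerance
    return prediction.strip().lower() == target.strip().lower()


def relaxed_accuracy(predictions, targets):
    """Fraction of predictions that relaxed-match their targets."""
    pairs = list(zip(predictions, targets))
    if not pairs:
        return 0.0
    return sum(relaxed_match(p, t) for p, t in pairs) / len(pairs)
```

For example, a prediction of "10.4" against a target of "10" counts as correct (4% relative error), while "11" does not (10% relative error).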

📚 Citation

If you use VisTA in your research, please cite:

@misc{huang2025visualtoolagentvistareinforcementlearning,
  title={VisualToolAgent (VisTA): A Reinforcement Learning Framework for Visual Tool Selection}, 
  author={Zeyi Huang and Yuyang Ji and Anirudh Sundara Rajan and Zefan Cai and Wen Xiao and Junjie Hu and Yong Jae Lee},
  year={2025},
  eprint={2505.20289},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2505.20289},
}
