CoreMatching: Co-adaptive Sparse Inference Framework for Comprehensive Acceleration of Vision-Language Models
We propose CoreMatching, a co-adaptive sparse inference framework that leverages the synergy between token and neuron sparsity to enhance inference efficiency. For the first time, we theoretically prove why token selection metrics based on angle cosine similarity are superior to metrics based on attention scores. On an NVIDIA Titan Xp, CoreMatching achieves a 5× FLOPs reduction and a 10× overall speedup.
Paper Link: https://arxiv.org/abs/2505.19235
Schematic diagram of CoreMatching. In the pre-filling stage, CoreMatching computes Core Neurons in the FFN block based on their activations; Core Neurons are the most frequently activated group of neurons. CoreMatching then matches the neurons activated by each token against the Core Neurons and selects the group of tokens with the largest intersection as the Core Tokens. Only the Core Tokens are passed to the subsequent layers. During the decoding stage, the model computes with the Core Neurons only, and the KV cache holds only Core Tokens. CoreMatching thereby achieves comprehensive acceleration of VLM inference.
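To make the matching concrete, here is a minimal PyTorch sketch of the idea described above. It is not the released implementation: the function name, the keep ratios, and the plain top-k selection over activation frequency and neuron overlap are illustrative assumptions.

```python
# Minimal sketch of Core Neuron / Core Token matching (illustrative, not the released code).
import torch

def core_matching(acts: torch.Tensor, neuron_keep: float = 0.3, token_keep: float = 0.3):
    """acts: [num_tokens, ffn_dim] post-activation FFN values for one layer."""
    activated = acts > 0                                  # which neurons each token fires
    freq = activated.float().sum(dim=0)                   # activation frequency per neuron
    k_n = max(1, int(neuron_keep * acts.shape[1]))
    core_neurons = freq.topk(k_n).indices                 # most frequently activated neurons
    # score each token by how many of its activated neurons fall in the core set
    overlap = activated[:, core_neurons].float().sum(dim=1)
    k_t = max(1, int(token_keep * acts.shape[0]))
    core_tokens = overlap.topk(k_t).indices               # tokens with the largest intersection
    return core_neurons, core_tokens

# Example: 576 image tokens, 11008-dim FFN (LLaVA-1.5 / Llama-7B sized), random activations
acts = torch.relu(torch.randn(576, 11008))
neurons, tokens = core_matching(acts)
```

Scoring tokens by their overlap with the core neuron set is what couples the two kinds of sparsity: the tokens that fire the neurons the model relies on most are the ones that get kept.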
The current release version includes:
✅ Fast Inference: You can input any image and question and quickly get results with CoreMatching.
✅ Visualization of Core Tokens: We provide simple Jupyter notebooks to reproduce the visualization results in our paper.
✅ Performance Evaluation: Our code architecture is based on LLaVA. You can run performance evaluation on GQA, MM-Vet, MMBench, MME, POPE, ScienceQA, SEED-Bench, TextVQA, VizWiz, and VQAv2 with simple scripts.
- Clone the repo and navigate to corematching:
```bash
git clone https://github.com/wangqinsi1/corematching.git
cd corematching
```
- Set up the environment:
```bash
conda create -yn corematching python=3.10
conda activate corematching
pip install -e .
```

We integrate CoreMatching into `transformers/models/llama`. You can directly input any image and question to run fast inference and get the answer.

```bash
python inference.py --image [URL/PATH OF IMAGE] --question [QUESTION]
```

Example:

```bash
python inference.py --image "https://huggingface.co/datasets/huggingface/documentation-images/resolve/0052a70beed5bf71b92610a43a52df6d286cd5f3/diffusers/rabbit.jpg" --question "What color clothes is the rabbit wearing?"
```

We provide simple Jupyter notebooks in `notebooks/` to reproduce the visualization results in our paper. CoreMatching dynamically captures different subsets of tokens for different questions.
Core Tokens under different inputs. The left shows a schematic of the maximum-geometric-distance method for selecting the threshold. The right shows the Core Tokens retained under the score distribution of the corresponding image above.
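For reference, the following NumPy sketch shows what the maximum-geometric-distance thresholding in the left panel appears to do: sort the token scores, connect the first and last points of the sorted curve with a straight line, and take the score whose point lies farthest from that line as the threshold. The function name and details such as normalization are assumptions; consult the notebooks for the actual implementation.

```python
# Sketch of maximum-geometric-distance threshold selection (illustrative only).
import numpy as np

def max_geometric_distance_threshold(scores: np.ndarray) -> float:
    s = np.sort(scores)[::-1]                      # descending score curve
    n = len(s)
    x = np.arange(n, dtype=np.float64)
    p1 = np.array([0.0, s[0]])                     # first point of the curve
    p2 = np.array([n - 1.0, s[-1]])                # last point of the curve
    d = p2 - p1
    d /= np.linalg.norm(d)                         # unit direction of the chord
    vec = np.stack([x, s], axis=1) - p1
    dist = np.abs(vec[:, 0] * d[1] - vec[:, 1] * d[0])   # perpendicular distance to the chord
    knee = int(np.argmax(dist))
    return float(s[knee])                          # tokens scoring above this are kept

scores = np.random.rand(576)
tau = max_geometric_distance_threshold(scores)
keep = scores >= tau
```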
You can quickly start evaluating task performance. Our code is built on the LLaVA-v1.5 repository, so you can follow its instructions exactly to perform evaluation.
VQAv2:
- Download `test2015` and put it under `./playground/data/eval/vqav2`.
- Multi-GPU inference.
```bash
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 bash scripts/v1_5/eval/vqav2.sh
```
- Submit the results to the evaluation server: `./playground/data/eval/vqav2/answers_upload`.

GQA:
- Download the data and evaluation scripts following the official instructions and put them under `./playground/data/eval/gqa/data`. You may need to modify `eval.py` as this due to the missing assets in the GQA v1.2 release.
- Multi-GPU inference.
```bash
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 bash scripts/v1_5/eval/gqa.sh
```

VizWiz:
- Download `test.json` and extract `test.zip` to `test`. Put them under `./playground/data/eval/vizwiz`.
- Single-GPU inference.
```bash
CUDA_VISIBLE_DEVICES=0 bash scripts/v1_5/eval/vizwiz.sh
```
- Submit the results to the evaluation server: `./playground/data/eval/vizwiz/answers_upload`.
ScienceQA:
- Under `./playground/data/eval/scienceqa`, download `images`, `pid_splits.json`, and `problems.json` from the `data/scienceqa` folder of the ScienceQA repo.
- Single-GPU inference and evaluate.
```bash
CUDA_VISIBLE_DEVICES=0 bash scripts/v1_5/eval/sqa.sh
```

TextVQA:
- Download `TextVQA_0.5.1_val.json` and the images, and extract them to `./playground/data/eval/textvqa`.
- Single-GPU inference and evaluate.
```bash
CUDA_VISIBLE_DEVICES=0 bash scripts/v1_5/eval/textvqa.sh
```

POPE:
- Download `coco` from POPE and put it under `./playground/data/eval/pope`.
- Single-GPU inference and evaluate.
```bash
CUDA_VISIBLE_DEVICES=0 bash scripts/v1_5/eval/pope.sh
```

MME:
- Download the data following the official instructions here.
- Download the images to `MME_Benchmark_release_version`.
- Put the official `eval_tool` and `MME_Benchmark_release_version` under `./playground/data/eval/MME`.
- Single-GPU inference and evaluate.
```bash
CUDA_VISIBLE_DEVICES=0 bash scripts/v1_5/eval/mme.sh
```

MMBench:
- Download `mmbench_dev_20230712.tsv` and put it under `./playground/data/eval/mmbench`.
- Single-GPU inference.
```bash
CUDA_VISIBLE_DEVICES=0 bash scripts/v1_5/eval/mmbench.sh
```
- Submit the results to the evaluation server: `./playground/data/eval/mmbench/answers_upload/mmbench_dev_20230712`.
MMBench-CN:
- Download `mmbench_dev_cn_20231003.tsv` and put it under `./playground/data/eval/mmbench`.
- Single-GPU inference.
```bash
CUDA_VISIBLE_DEVICES=0 bash scripts/v1_5/eval/mmbench_cn.sh
```
- Submit the results to the evaluation server: `./playground/data/eval/mmbench/answers_upload/mmbench_dev_cn_20231003`.
SEED-Bench:
- Follow the official instructions to download the images and the videos. Put the images under `./playground/data/eval/seed_bench/SEED-Bench-image`.
- Extract the middle frame of each downloaded video and put the frames under `./playground/data/eval/seed_bench/SEED-Bench-video-image`. We provide our script `extract_video_frames.py`, modified from the official one (a rough illustration of the middle-frame idea follows this list).
- Multi-GPU inference and evaluate.
```bash
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 bash scripts/v1_5/eval/seed.sh
```
- Optionally, submit the results to the leaderboard: `./playground/data/eval/seed_bench/answers_upload`, using the official Jupyter notebook.
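For orientation only, here is a rough OpenCV sketch of the middle-frame extraction step. Use the provided `extract_video_frames.py` for actual evaluation; the source video directory and the output naming below are assumptions.

```python
# Illustration of "take the middle frame of each video"; not the repo's script.
import glob
import os

import cv2

src_dir = "./playground/data/eval/seed_bench/SEED-Bench-video"        # assumed location of videos
dst_dir = "./playground/data/eval/seed_bench/SEED-Bench-video-image"
os.makedirs(dst_dir, exist_ok=True)

for path in glob.glob(os.path.join(src_dir, "*.mp4")):
    cap = cv2.VideoCapture(path)
    if not cap.isOpened():
        continue
    n_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    cap.set(cv2.CAP_PROP_POS_FRAMES, n_frames // 2)                    # seek to the middle frame
    ok, frame = cap.read()
    if ok:
        name = os.path.splitext(os.path.basename(path))[0] + ".png"
        cv2.imwrite(os.path.join(dst_dir, name), frame)
    cap.release()
```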
MM-Vet:
- Extract `mm-vet.zip` to `./playground/data/eval/mmvet`.
- Single-GPU inference.
```bash
CUDA_VISIBLE_DEVICES=0 bash scripts/v1_5/eval/mmvet.sh
```
- Evaluate the predictions in `./playground/data/eval/mmvet/results` using the official Jupyter notebook.
Q-Bench:
- Download `llvisionqa_dev.json` (for the `dev` subset) and `llvisionqa_test.json` (for the `test` subset). Put them under `./playground/data/eval/qbench`.
- Download and extract the images, and put them all directly under `./playground/data/eval/qbench/images_llviqionqa`.
- Single-GPU inference (change `dev` to `test` to evaluate on the test set).
```bash
CUDA_VISIBLE_DEVICES=0 bash scripts/v1_5/eval/qbench.sh dev
```
- Submit the results following the instructions here: `./playground/data/eval/qbench/llvisionqa_dev_answers.jsonl`.
More technical details can be found in our paper. If you find CoreMatching useful or relevant to your project and research, please kindly cite our paper:
```bibtex
@misc{wang2025corematchingcoadaptivesparseinference,
      title={CoreMatching: A Co-adaptive Sparse Inference Framework with Token and Neuron Pruning for Comprehensive Acceleration of Vision-Language Models},
      author={Qinsi Wang and Hancheng Ye and Ming-Yu Chung and Yudong Liu and Yueqian Lin and Martin Kuo and Mingyuan Ma and Jianyi Zhang and Yiran Chen},
      year={2025},
      eprint={2505.19235},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2505.19235},
}
```
