✨SPARC✨: Score Prompting and Adaptive Fusion for Zero-Shot Multi-Label Recognition in Vision-Language Models
Published in CVPR 2025 (paper) (supplementary) (poster)
After cloning our repo, please install scipy and scikit-learn (numpy is pulled in automatically as a dependency of both), e.g.
pip install scipy
pip install scikit-learn
For ease of use, we provide all of the CLIP cosine similarities needed to reproduce our main results (Tab. 2 in our paper). We computed these similarities using this code, although the process is fairly standard and is described in our paper. The data files also include image filenames, ground-truth labels, classnames, and compound prompts.
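For readers unfamiliar with the "fairly standard" step, here is a minimal sketch of how cosine similarities between image and text embeddings are typically computed. It is not our extraction code: the function name `cosine_similarities` and the random stand-in embeddings are assumptions for illustration, and real embeddings would come from a CLIP image/text encoder.

```python
import numpy as np


def cosine_similarities(image_embs, text_embs):
    """Return an (n_images, n_prompts) matrix of cosine similarities.

    Hypothetical helper for illustration; inputs are 2D arrays whose rows
    are embedding vectors of the same dimensionality.
    """
    image_embs = np.asarray(image_embs, dtype=np.float64)
    text_embs = np.asarray(text_embs, dtype=np.float64)
    # L2-normalize each row so the dot product equals the cosine similarity.
    image_embs = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    text_embs = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    return image_embs @ text_embs.T


# Random stand-ins for CLIP embeddings (assumption: 4 images, 3 prompts, dim 8).
rng = np.random.default_rng(0)
image_embs = rng.normal(size=(4, 8))
text_embs = rng.normal(size=(3, 8))
sims = cosine_similarities(image_embs, text_embs)
print(sims.shape)  # (4, 3)
```

Each row of `sims` corresponds to one image and each column to one prompt, which matches the general shape of data a score-based pipeline would consume; the exact layout of our provided files may differ.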
Our pipeline requires just a CPU! 😻
python run_SPARC.py --input_dir=<path to data folder> --dataset_name=<COCO2014, VOC2007, or NUSWIDE> --model_type=<ViT-L14336px, ViT-L14, ViT-B16, ViT-B32, RN50x64, RN50x16, RN50x4, RN101, or RN50> --output_prefix=<prefix for output filenames>
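If you want to run every dataset and backbone combination, the following sketch builds one command per combination using only the flags documented above. The helper `build_commands`, the `./data` and `./results` paths, and the output-prefix naming scheme are assumptions for illustration; adjust them to your setup.

```python
import shlex
import sys

# Values taken from the command-line options listed above.
DATASETS = ["COCO2014", "VOC2007", "NUSWIDE"]
MODEL_TYPES = [
    "ViT-L14336px", "ViT-L14", "ViT-B16", "ViT-B32",
    "RN50x64", "RN50x16", "RN50x4", "RN101", "RN50",
]


def build_commands(input_dir, output_dir):
    """Build one run_SPARC.py command per (dataset, model) pair.

    Hypothetical helper; the output prefix naming is an assumption.
    """
    commands = []
    for dataset in DATASETS:
        for model in MODEL_TYPES:
            prefix = f"{output_dir}/{dataset}_{model}"
            commands.append([
                sys.executable, "run_SPARC.py",
                f"--input_dir={input_dir}",
                f"--dataset_name={dataset}",
                f"--model_type={model}",
                f"--output_prefix={prefix}",
            ])
    return commands


commands = build_commands("./data", "./results")
for cmd in commands:
    # Dry run: print the command. To execute it from the repo root instead,
    # use subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```

The sketch only prints the commands, so it is safe to run before the data folder is in place.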
The script will produce a .csv file with mAPs, and a .pkl file with both mAPs and individual class APs.
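To take a quick look at the two output files, something like the sketch below may help. The internal layout of the files is not documented here, so the code makes no assumption about it and only reports what it finds. The filenames `my_prefix.csv` and `my_prefix.pkl` and the helpers `load_results` and `describe` are hypothetical; substitute the files actually produced for your `--output_prefix`.

```python
import csv
import pickle
from pathlib import Path


def load_results(csv_path, pkl_path):
    """Load the CSV as a list of rows and the pickle as a Python object."""
    with open(csv_path, newline="") as f:
        rows = list(csv.reader(f))
    # Only unpickle files you generated yourself; pickle can execute code.
    with open(pkl_path, "rb") as f:
        payload = pickle.load(f)
    return rows, payload


def describe(payload):
    """Summarize an object without assuming a particular structure."""
    if isinstance(payload, dict):
        return f"dict with keys {sorted(map(str, payload))}"
    return type(payload).__name__


# Hypothetical filenames; replace with the files written by run_SPARC.py.
csv_path = Path("my_prefix.csv")
pkl_path = Path("my_prefix.pkl")
if csv_path.exists() and pkl_path.exists():
    rows, payload = load_results(csv_path, pkl_path)
    print(f"{len(rows)} CSV rows; first row: {rows[0] if rows else None}")
    print("Pickle contents:", describe(payload))
else:
    print("Output files not found; run run_SPARC.py first.")
```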
Code for noise model analysis
Code for ablations and other results