Sarthak Kumar Maharana¹, Baoming Zhang¹, Leonid Karlinsky², Rogerio Feris², and Yunhui Guo¹

¹The University of Texas at Dallas  ²MIT-IBM Watson AI Lab
ICCV 2025
Although open-vocabulary classification models like Contrastive Language-Image Pretraining (CLIP) have demonstrated strong zero-shot learning capabilities, their robustness to common image corruptions remains poorly understood. Through extensive experiments, we show that zero-shot CLIP lacks robustness to common image corruptions at test time, necessitating the adaptation of CLIP to unlabeled corrupted images using test-time adaptation (TTA). However, we found that existing TTA methods have severe limitations in adapting CLIP due to their unimodal nature. To address these limitations, we propose BATCLIP, a bimodal online test-time adaptation method to improve the robustness of CLIP to common image corruptions.
To use the repository, we provide a conda environment:

```bash
conda update conda
conda env create -f environment.yml
conda activate tta
```

## Features
- **Datasets**
  - `cifar10_c` → CIFAR10-C
  - `cifar100_c` → CIFAR100-C
  - `imagenet_c` → ImageNet-C
- **Models**
  - It is also possible to use the models provided by OpenCLIP (see the loading sketch after this list).
- **Settings**
  - `reset_each_shift`: resets the model state after adapting to each domain. We follow this setting.
- **Mixed Precision Training**
  - Almost all of the aforementioned methods (except SAR and GTTA) can be trained with mixed precision, which greatly speeds up experiments and reduces memory usage. Note, however, that all benchmark results were generated with fp32. A generic AMP sketch is shown after this list.
- **Modular Design**
  - Adding new methods should be rather simple, thanks to the modular design.
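For the **Models** bullet above, here is a minimal sketch of loading a CLIP backbone through OpenCLIP; the `ViT-B-16`/`openai` tags mirror the `MODEL.ARCH`/`MODEL.WEIGHTS` values in the example run below, but the repository's config files drive the actual choice:

```python
import torch
import open_clip
from PIL import Image

# Sketch: load a CLIP model and its preprocessing transform via OpenCLIP.
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-16", pretrained="openai"
)
tokenizer = open_clip.get_tokenizer("ViT-B-16")

# Dummy zero-shot forward pass on a blank image.
image = preprocess(Image.new("RGB", (224, 224))).unsqueeze(0)
text = tokenizer(["a photo of a cat", "a photo of a dog"])
with torch.no_grad():
    logits = model.encode_image(image) @ model.encode_text(text).T
```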
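Likewise, for the **Mixed Precision Training** bullet: the snippet below is the generic PyTorch AMP pattern, shown only to illustrate the mechanism; it is not the repository's exact adaptation loop.

```python
import torch

scaler = torch.cuda.amp.GradScaler()  # rescales gradients to avoid fp16 underflow

def adaptation_step(model, optimizer, loss_fn, batch):
    """One hypothetical mixed-precision update on an unlabeled test batch."""
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():   # run the forward pass in reduced precision
        loss = loss_fn(model, batch)
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
    return loss.item()
```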
Once you've obtained any missing datasets, update the root data directory in `conf.py` by setting `_C.DATA_DIR = "./data"`. If your individual dataset folders use names other than those defined in the `complete_data_dir_path` mapping (also in `conf.py`), simply edit that dictionary to match your directory names.
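For illustration, the relevant part of `conf.py` might look roughly like the sketch below; the dictionary keys and folder names here are assumptions based on the dataset list above, so check them against the actual file:

```python
# conf.py (sketch) -- root directory that holds all benchmark datasets
_C.DATA_DIR = "./data"

# Hypothetical mapping from dataset identifiers to folder names on disk;
# edit the right-hand sides if your local folders are named differently.
complete_data_dir_path = {
    "cifar10_c": "CIFAR-10-C",
    "cifar100_c": "CIFAR-100-C",
    "imagenet_c": "ImageNet-C",
}
```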
Example run:

```bash
python test_time.py --cfg cfgs/imagenet_c/ours.yaml MODEL.ARCH VIT-B-16 MODEL.WEIGHTS openai MODEL.USE_CLIP True SETTING reset_each_shift
```

You can head over to the config files to change the parameters.
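The trailing `KEY VALUE` pairs override entries from the YAML config. Assuming `conf.py` is built on a yacs `CfgNode` (the `_C` naming suggests this, but it is an assumption), the override mechanism works roughly as follows:

```python
from yacs.config import CfgNode as CN

# Sketch of a yacs config and a command-line style override.
_C = CN()
_C.SETTING = "continual"
_C.MODEL = CN()
_C.MODEL.ARCH = "RN50"
_C.MODEL.WEIGHTS = "openai"
_C.MODEL.USE_CLIP = False

cfg = _C.clone()
# Equivalent to the trailing arguments of the test_time.py command above.
cfg.merge_from_list(["MODEL.ARCH", "VIT-B-16", "MODEL.WEIGHTS", "openai",
                     "MODEL.USE_CLIP", "True", "SETTING", "reset_each_shift"])
print(cfg.MODEL.ARCH, cfg.MODEL.USE_CLIP)  # -> VIT-B-16 True
```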
- Key results and visualizations: to be added.
- Framework overview: pending.
Citation:

```bibtex
@inproceedings{maharana2025batclip,
  title={BATCLIP: Bimodal Online Test-Time Adaptation for CLIP},
  author={Maharana, Sarthak Kumar and Zhang, Baoming and Karlinsky, Leonid and Feris, Rogerio and Guo, Yunhui},
  booktitle={International Conference on Computer Vision (ICCV)},
  year={2025}
}
```