$\texttt{BATCLIP}$: Bimodal Online Test-Time Adaptation for CLIP

Sarthak Kumar Maharana¹, Baoming Zhang¹, Leonid Karlinsky², Rogerio Feris², and Yunhui Guo¹
¹The University of Texas at Dallas  ²MIT-IBM Watson AI Lab
ICCV 2025

✍🏻 Paper 🔗 Project

Abstract

Although open-vocabulary classification models like Contrastive Language-Image Pretraining (CLIP) have demonstrated strong zero-shot learning capabilities, their robustness to common image corruptions remains poorly understood. Through extensive experiments, we show that zero-shot CLIP lacks robustness to common image corruptions at test time, necessitating the adaptation of CLIP to unlabeled corrupted images using test-time adaptation (TTA). However, we find that existing TTA methods have severe limitations in adapting CLIP due to their unimodal nature. To address these limitations, we propose $\texttt{BATCLIP}$, a bimodal $\textbf{online}$ TTA method designed to improve CLIP's robustness to common image corruptions. The key insight of our approach is not only to adapt the visual encoders for improving image features but also to strengthen the alignment between image and text features by promoting a stronger association between the image class prototype, computed using pseudo-labels, and the corresponding text feature. We evaluate our approach on benchmark image corruption datasets and achieve state-of-the-art results in online TTA for CLIP. Furthermore, we evaluate our proposed TTA approach on various domain generalization datasets to demonstrate its generalization capabilities.
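
The prototype-to-text alignment idea can be sketched in a few lines of PyTorch. This is a minimal, illustrative sketch, not the exact $\texttt{BATCLIP}$ objective; the function and variable names here are assumptions.

import torch
import torch.nn.functional as F

def prototype_alignment_loss(image_feats, text_feats):
    # image_feats: (B, D) L2-normalized features from CLIP's image encoder.
    # text_feats:  (C, D) L2-normalized class text features.
    logits = image_feats @ text_feats.t()        # (B, C) cosine similarities
    pseudo_labels = logits.argmax(dim=1)         # hard pseudo-label per image
    loss = image_feats.new_tensor(0.0)
    classes = pseudo_labels.unique()
    for c in classes:
        proto = image_feats[pseudo_labels == c].mean(dim=0)  # class prototype
        proto = F.normalize(proto, dim=0)
        loss = loss + (1.0 - proto @ text_feats[c])  # pull prototype toward its text feature
    return loss / classes.numel()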

Prerequisites

We provide a conda environment for working with the repository. Create and activate it as follows:

conda update conda
conda env create -f environment.yml
conda activate tta 

Usage

$\texttt{BATCLIP}$ is heavily built on Mario Doebler's test-time adaptation codebase. Thanks, Mario Doebler!

Features
  • Datasets

  • Models

    • It is also possible to use the models provided by OpenCLIP (see the loading sketch after this list).
  • Settings

    • reset_each_shift: resets the model state after adapting to each domain. We follow this setting.
  • Mixed Precision Training

    • Almost all of the supported methods (except SAR and GTTA) can be trained with mixed precision, which greatly speeds up experiments and requires less memory. However, all benchmark results are generated with fp32. A minimal mixed-precision sketch follows this list.
  • Modular Design

    • Adding new methods should be rather simple, thanks to the modular design.
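
Below is a brief sketch of loading a CLIP backbone through OpenCLIP, as mentioned in the Models item above. The model and pretrained tags are examples; consult the OpenCLIP model zoo for the available combinations.

import open_clip

model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-16", pretrained="openai"   # example tags from OpenCLIP's model zoo
)
tokenizer = open_clip.get_tokenizer("ViT-B-16")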
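
And here is a minimal sketch of one mixed-precision adaptation step using torch.cuda.amp, as referenced in the Mixed Precision Training item. The loss callable is a hypothetical placeholder, not the repository's actual TTA objective.

import torch

def adapt_step_amp(model, optimizer, scaler, batch, compute_tta_loss):
    # compute_tta_loss is a placeholder for the method's adaptation loss.
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = compute_tta_loss(model, batch)
    scaler.scale(loss).backward()   # scale the loss to avoid fp16 underflow
    scaler.step(optimizer)          # unscales gradients, then takes the step
    scaler.update()                 # adjusts the loss scale for the next step
    return loss.detach()

Create the scaler once with scaler = torch.cuda.amp.GradScaler() and reuse it across steps.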

Get Started

Once you’ve obtained any missing datasets, update the root data directory in conf.py by setting _C.DATA_DIR = "./data". If your individual dataset folders use names other than those defined in the complete_data_dir_path mapping (also in conf.py), simply edit that dictionary to match your directory names.
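
For reference, the relevant edits look roughly like this. Only _C.DATA_DIR and complete_data_dir_path are named above; the example folder names are hypothetical.

# conf.py (sketch)
_C.DATA_DIR = "./data"  # root directory that holds all dataset folders

# Edit this mapping if your folder names differ (entries are hypothetical):
# complete_data_dir_path = {"imagenet_c": "ImageNet-C", "cifar10_c": "CIFAR-10-C"}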

Run Experiments

Example run:

python test_time.py --cfg cfgs/imagenet_c/ours.yaml MODEL.ARCH ViT-B-16 MODEL.WEIGHTS openai MODEL.USE_CLIP True SETTING reset_each_shift

Head over to the config files in cfgs/ to change the parameters.
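
Options after the YAML path are KEY VALUE pairs that override the config, so other settings can be changed directly from the command line. For example, with a larger backbone (ViT-L-14 is an assumed OpenCLIP architecture tag; check the configs for supported values):

python test_time.py --cfg cfgs/imagenet_c/ours.yaml MODEL.ARCH ViT-L-14 MODEL.WEIGHTS openai MODEL.USE_CLIP True SETTING reset_each_shift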

TODO

  • Key results and viz.
  • Framework pending

Citation

@inproceedings{maharana2025batclip,
  title={BATCLIP: Bimodal Online Test-Time Adaptation for CLIP},
  author={Maharana, Sarthak Kumar and Zhang, Baoming and Karlinsky, Leonid and Feris, Rogerio and Guo, Yunhui},
  booktitle={International Conference on Computer Vision (ICCV)},
  year={2025}
}
