Self-Adaptive Vision-Language Tracking With Context Prompting (SAVLT)

PyTorch implementation of "Self-Adaptive Vision-Language Tracking With Context Prompting" (IEEE TIP)

The paper can be found here.

Introduction

To bridge the substantial gap between the vision and language modalities, and to address the mismatch between fixed language descriptions and dynamic visual information, we propose a self-adaptive vision-language tracking framework that leverages the pre-trained multi-modal CLIP model to obtain well-aligned visual-language representations. A novel context-aware prompting mechanism dynamically adapts the linguistic cues to the evolving visual context during tracking. The framework employs a unified one-stream Transformer architecture that supports joint training for both vision-only and vision-language tracking.

[SAVLT framework figure]
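To illustrate the idea of context-aware prompting, here is a minimal PyTorch sketch in the spirit of CoCoOp-style conditional prompting: a small meta-network maps the current visual feature to a bias that shifts learnable prompt tokens, so the language cue adapts frame by frame. All module names and dimensions are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch (assumed names/dims), not the released SAVLT code.
import torch
import torch.nn as nn

class ContextPrompter(nn.Module):
    def __init__(self, vis_dim=512, prompt_dim=512, n_ctx=4):
        super().__init__()
        # Learnable "static" context tokens shared across frames.
        self.ctx = nn.Parameter(torch.randn(n_ctx, prompt_dim) * 0.02)
        # Meta-net: pooled visual feature -> per-frame token shift.
        self.meta_net = nn.Sequential(
            nn.Linear(vis_dim, vis_dim // 16),
            nn.ReLU(inplace=True),
            nn.Linear(vis_dim // 16, prompt_dim),
        )

    def forward(self, vis_feat):
        # vis_feat: (B, vis_dim) visual feature of the current frame.
        bias = self.meta_net(vis_feat)  # (B, prompt_dim)
        # Broadcast the frame-conditioned bias over all context tokens,
        # yielding prompts of shape (B, n_ctx, prompt_dim).
        return self.ctx.unsqueeze(0) + bias.unsqueeze(1)

prompter = ContextPrompter()
prompts = prompter(torch.randn(2, 512))  # (2, 4, 512) adapted prompt tokens
```

The adapted tokens would then be prepended to the text embedding of the language description before it enters the (frozen) CLIP text encoder, which is how conditional-prompting methods typically inject visual context.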

Install the environment

Please refer to install.sh for environment installation, and set your own project/model/data paths.

Training and Testing

Please see eval.sh to find the commands for training and testing. Commands for language-only tracking can be found in eval_nl.sh.

Models and Results

The required pretrained models are provided here [pwd: c5ie]; please download, extract, and place them in your project directory.

We also release our trained models here [pwd: jpj8] and raw results here [pwd: nrkw].

Acknowledgments

We acknowledge the excellent prior works SUTrack and CoCoOp, which inspired our methodology. If you find this work helpful to your research, please consider citing our paper.

@article{zhaoself,
  title={Self-Adaptive Vision-Language Tracking with Context Prompting},
  author={Zhao, Jie and Chen, Xin and Li, Shengming and Bo, Chunjuan and Wang, Dong and Lu, Huchuan},
  journal={IEEE Transactions on Image Processing},
  year={2026}
}
