PyTorch implementation of "Self-Adaptive Vision-Language Tracking With Context Prompting" (IEEE TIP)
The paper can be found here.
To bridge the substantial gap between the vision and language modalities, and to address the mismatch between fixed language descriptions and dynamic visual information, we propose a self-adaptive vision-language tracking framework that leverages the pre-trained multi-modal CLIP model to obtain well-aligned visual-language representations. A novel context-aware prompting mechanism dynamically adapts the linguistic cues to the evolving visual context during tracking. The framework employs a unified one-stream Transformer architecture that supports joint training for both vision-only and vision-language tracking.
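The context-aware prompting idea can be illustrated with a minimal CoCoOp-style sketch: a lightweight meta-network maps the current visual context feature to a per-frame shift that is added to a set of learnable prompt tokens, so the language cue adapts as the target's appearance evolves. This is not the released code; the module name, dimensions, and fusion step below are illustrative assumptions.

```python
# Minimal sketch (assumed, not the authors' exact implementation) of
# context-conditioned prompting: learnable prompt tokens are shifted by a
# meta-network conditioned on the current visual context, CoCoOp-style.
import torch
import torch.nn as nn


class ContextPrompt(nn.Module):
    def __init__(self, embed_dim=512, n_prompts=4):
        super().__init__()
        # Static learnable prompt tokens, shared across frames.
        self.prompts = nn.Parameter(torch.randn(n_prompts, embed_dim) * 0.02)
        # Meta-network: visual context feature -> per-frame prompt shift.
        self.meta_net = nn.Sequential(
            nn.Linear(embed_dim, embed_dim // 16),
            nn.ReLU(inplace=True),
            nn.Linear(embed_dim // 16, embed_dim),
        )

    def forward(self, visual_ctx):
        # visual_ctx: (B, D) pooled feature of the current search region.
        shift = self.meta_net(visual_ctx)                       # (B, D)
        # Broadcast the shift over all prompt tokens.
        return self.prompts.unsqueeze(0) + shift.unsqueeze(1)   # (B, P, D)


if __name__ == "__main__":
    B, D = 2, 512
    prompter = ContextPrompt(embed_dim=D, n_prompts=4)
    visual_ctx = torch.randn(B, D)          # stand-in for a CLIP image feature
    text_tokens = torch.randn(B, 16, D)     # stand-in for CLIP text embeddings
    dynamic_prompts = prompter(visual_ctx)  # (B, 4, D)
    # Prepend the adapted prompts to the text token sequence before the
    # text encoder / fusion Transformer consumes them.
    conditioned = torch.cat([dynamic_prompts, text_tokens], dim=1)
    print(conditioned.shape)  # torch.Size([2, 20, 512])
```

In practice the conditioned token sequence would be fed through the CLIP text encoder (or the unified one-stream Transformer), so the prompt shift is trained end-to-end with the tracking loss.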

Please refer to install.sh for environment installation, and set your own project/model/data paths.
See eval.sh for the training and testing commands; commands for language-only tracking are in eval_nl.sh.
The required pretrained models are provided here [pwd: c5ie]; please download, extract, and place them in your project directory.
We also release our trained models here [pwd: jpj8] and our results here [pwd: nrkw].
We thank the excellent prior works SUTrack and CoCoOp for inspiring our methodology. If you find this work helpful for your research, please consider citing our paper:
@article{zhaoself,
  title={Self-Adaptive Vision-Language Tracking with Context Prompting},
  author={Zhao, Jie and Chen, Xin and Li, Shengming and Bo, Chunjuan and Wang, Dong and Lu, Huchuan},
  journal={IEEE Transactions on Image Processing},
  year={2026}
}