- Repository under construction...
- This is the official implementation of our paper, AITtrack: Attention-based Image-Text Alignment for Visual Tracking.
Our proposed AITrack simplifies vision-language model (VLM)-based tracking using attention-based visual and textual alignment modules. It uses a region-of-interest (ROI) text-guided encoder that leverages existing pre-trained language models to implicitly extract and encode textual features, and a simple image encoder to encode visual features. A simple alignment module combines the encoded visual and textual features, exposing the semantic relationship between the template and search frames and their surroundings and providing rich encodings for improved tracking performance. A simple decoder takes past predictions as spatiotemporal cues to model changes in target appearance, without the need for complex customized post-processing or prediction heads.
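To make the idea of attention-based image-text alignment concrete, here is a minimal, dependency-free sketch of scaled dot-product cross-attention in which visual tokens attend to text tokens. It is an illustration only and not the code in this repository: the function names, the token shapes, the choice of visual tokens as queries, the residual connection, and the absence of learned projection weights are all assumptions made for brevity.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix (lists of lists)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def cross_attention(visual, text):
    """Hypothetical alignment step: visual tokens query the text tokens.

    visual: (num_visual_tokens x dim) matrix used as queries.
    text:   (num_text_tokens x dim) matrix used as keys and values.
    Returns the text-conditioned visual tokens and the attention weights.
    Learned query/key/value projections are omitted (an assumption).
    """
    dim = len(visual[0])
    scores = matmul(visual, transpose(text))
    weights = [softmax([s / math.sqrt(dim) for s in row]) for row in scores]
    attended = matmul(weights, text)
    # Residual connection: keep the visual content and add the text context.
    aligned = [[v + a for v, a in zip(v_row, a_row)]
               for v_row, a_row in zip(visual, attended)]
    return aligned, weights


random.seed(0)
dim = 4
visual_tokens = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(6)]
text_tokens = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(3)]

aligned, weights = cross_attention(visual_tokens, text_tokens)
print(len(aligned), len(aligned[0]))  # 6 4
```

Each row of `weights` sums to 1 and shows how strongly one visual token attends to each text token. A practical model would use a tensor library with learned projections and multiple heads.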
- We propose an ROI-based text-guided encoder that leverages existing pre-trained language models to implicitly extract and encode textual descriptions.
- We propose a simple image-text alignment module that encodes the semantic relationship between the template and search regions and their surroundings, providing rich and meaningful representations for improved visual object tracking (VOT) performance.
- We also incorporate a simple decoder that leverages the spatiotemporal representations to effectively model variations in the target object's appearance across video frames, without the need for complex customized post-processing or prediction heads.
- We perform rigorous experimental evaluations on seven publicly available VOT benchmark datasets to show the advantages of our proposed AITrack.
- Trackers with Only Bounding Box (BB) Initialization
- Trackers with Bounding Box (BB) and Natural Language (NL) Initialization
- Use Anaconda (CUDA 11.3):
    conda env create -f environment.yml
    conda activate aitrack
- Clone this repository:
    git clone https://github.com/BasitAlawode/AITrack AITrack
    cd AITrack
- Modify project paths by editing these two files:
    lib/train/admin/local.py # paths about training
    lib/test/evaluation/local.py # paths about testing
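As a quick sanity check after the setup steps above, the following small script reports whether the two path files are present. It is a convenience sketch and not part of the repository: the helper name `missing_local_configs` is hypothetical, and the script only checks that the files exist, not that the paths inside them are valid.

```python
from pathlib import Path

# The two files named in the setup steps above.
LOCAL_CONFIGS = (
    "lib/train/admin/local.py",      # paths about training
    "lib/test/evaluation/local.py",  # paths about testing
)


def missing_local_configs(repo_root="."):
    """Return the config files (relative paths) not found under repo_root."""
    root = Path(repo_root)
    return [rel for rel in LOCAL_CONFIGS if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_local_configs()
    if missing:
        print("Missing path files:", ", ".join(missing))
    else:
        print("Both path files found.")
```

Run it from the repository root (the `AITrack` directory created by the clone step).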
- To be updated....
- To be updated....
- To be updated....
- Our work is based on
- ARTrack,
- Alpha-CLIP, and
- RTS for the segmentation mask.
We thank the authors for making their code available.
If you find our work useful, please consider citing:
@ARTICLE{basit_aitrack25,
author={Alawode, Basit and Javed, Sajid},
journal={IEEE Access},
title={AITtrack: Attention-based Image-Text Alignment for Visual Tracking},
year={2025},
volume={},
number={},
pages={1-1},
doi={10.1109/ACCESS.2025.3555816}}



