suhwan-cho/FindTrack

FindTrack

This is the official PyTorch implementation of our paper:

Find First, Track Next: Decoupling Identification and Propagation in Referring Video Object Segmentation, ICCVW 2025
Suhwan Cho*, Seunghoon Lee*, Minhyeok Lee, Jungho Lee, Sangyoun Lee
Link: [ICCVW] [arXiv]

You can also explore other related works at awesome-video-object-segmentation.

Demo Video

demo.mp4

Abstract

Existing referring VOS methods typically fuse visual and textual features in a highly entangled manner, processing multi-modal information jointly. However, this entanglement often leads to challenges in resolving ambiguous target identification and maintaining consistent mask propagation across frames. To address these issues, we propose a decoupled framework that explicitly separates object identification from mask propagation. The key frame is adaptively selected based on segmentation confidence and vision-text alignment, establishing a reliable anchor for propagation.
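The key-frame selection described above can be illustrated with a small sketch. This is not the repository's implementation: the function name `select_key_frame`, the `weight` parameter, and the weighted-sum scoring rule are all assumptions made for illustration. The abstract only states that segmentation confidence and vision-text alignment are both taken into account.

```python
from typing import Sequence


def select_key_frame(
    seg_confidence: Sequence[float],
    text_alignment: Sequence[float],
    weight: float = 0.5,
) -> int:
    """Return the index of the frame with the highest combined score.

    seg_confidence: per-frame segmentation confidence (hypothetical input).
    text_alignment: per-frame vision-text alignment score (hypothetical input).
    weight: share given to segmentation confidence; the weighted sum is an
            assumed scoring rule, not the rule used in the paper.
    """
    if len(seg_confidence) != len(text_alignment):
        raise ValueError("score lists must have the same length")
    if len(seg_confidence) == 0:
        raise ValueError("at least one frame is required")

    # Combine the two per-frame cues into a single score.
    scores = [
        weight * conf + (1.0 - weight) * align
        for conf, align in zip(seg_confidence, text_alignment)
    ]
    # The frame with the highest score serves as the anchor for propagation.
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    confidence = [0.62, 0.91, 0.88]
    alignment = [0.40, 0.55, 0.80]
    print(select_key_frame(confidence, alignment))
```

In this toy example the second frame has the highest segmentation confidence, but the third frame wins once alignment with the text is taken into account, which is the behaviour the decoupled design is meant to encourage.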

Setup

1. Download the datasets: Ref-YouTube-VOS, Ref-DAVIS17, MeViS.

2. Download the Alpha-CLIP weights and place them in the weights/ directory.

Running

Training (optional)

FindTrack works well in a training-free manner, but fine-tuning on specific datasets can improve performance further.

For the Ref-YouTube-VOS dataset:

deepspeed --num_gpus 4 train_ytvos.py 

For the MeViS dataset:

deepspeed --num_gpus 4 train_mevis.py 

Testing

For the Ref-YouTube-VOS dataset:

python run_ytvos.py

For the MeViS dataset:

python run_mevis.py

Verify the following before running:
✅ Testing dataset selection
✅ GPU availability and configuration
✅ Pre-trained model path
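The checklist above can be partly automated with a short pre-flight script. This is a minimal sketch and not part of the repository: the function name `preflight`, the `datasets` default path, and the use of `nvidia-smi` on the PATH as a rough GPU check are assumptions. Only the weights/ directory comes from the setup steps.

```python
import shutil
from pathlib import Path
from typing import List


def preflight(
    weights_dir: str = "weights",
    dataset_root: str = "datasets",  # hypothetical location; adjust to your setup
    check_gpu: bool = True,
) -> List[str]:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems: List[str] = []

    # Pre-trained model path: the directory must exist and contain at least one file.
    weights = Path(weights_dir)
    if not weights.is_dir() or not any(weights.iterdir()):
        problems.append(f"no weights found in '{weights_dir}'")

    # Testing dataset selection: the chosen dataset root must exist.
    if not Path(dataset_root).is_dir():
        problems.append(f"dataset root '{dataset_root}' does not exist")

    # GPU availability: a coarse check that the NVIDIA driver tools are installed.
    if check_gpu and shutil.which("nvidia-smi") is None:
        problems.append("nvidia-smi not found on PATH; a GPU may not be available")

    return problems


if __name__ == "__main__":
    issues = preflight()
    for issue in issues:
        print("WARNING:", issue)
    if not issues:
        print("All pre-flight checks passed.")
```

Running it from the repository root before `python run_ytvos.py` or `python run_mevis.py` gives an early warning instead of a failure partway through inference.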

Gradio Demo

You can use the web demo with your own video!

Run the Gradio demo with:

python demo.py

Attachments

Pre-computed results

Contact

Code and models are only available for non-commercial research purposes.
For questions or inquiries, feel free to contact:

E-mail: suhwanx@gmail.com
