Long-RVOS: A Comprehensive Benchmark for Long-term Referring Video Object Segmentation

Tianming Liang¹   Haichao Jiang¹   Yuting Yang¹   Chaolei Tan¹   Shuai Li²   Wei-Shi Zheng¹   Jian-Fang Hu¹*

¹Sun Yat-sen University   ²Shandong University

🎯 Overview

Long-RVOS is the first large-scale long-term referring video object segmentation benchmark, containing 2,000+ videos with an average duration exceeding 60 seconds.

📦 Dataset Download

The Long-RVOS dataset is available on HuggingFace Hub. Use our download script:

python scripts/download_dataset.py \
    --repo_id iSEE-Laboratory/Long-RVOS \
    --output_dir data

Or download the dataset manually from Google Drive and extract it so that the directory layout is:

data/
├── long_rvos/
│   ├── train/
│   │   ├── JPEGImages/
│   │   ├── Annotations/
│   │   └── meta_expressions.json
│   ├── valid/
│   │   ├── JPEGImages/
│   │   ├── Annotations/
│   │   └── meta_expressions.json
│   └── test/
│       ├── JPEGImages/
│       ├── Annotations/
│       └── meta_expressions.json
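
After downloading, it can help to confirm that the extracted files match the tree above before starting training. The following is a minimal sketch, not part of the repository; the function name `missing_entries` is hypothetical, and the default root `data/long_rvos` is taken from the layout shown above.

```python
from pathlib import Path

# Split names and per-split entries, copied from the directory tree above.
SPLITS = ("train", "valid", "test")
REQUIRED = ("JPEGImages", "Annotations", "meta_expressions.json")


def missing_entries(root="data/long_rvos"):
    """Return the expected paths that do not exist under `root`."""
    root = Path(root)
    return [
        str(root / split / name)
        for split in SPLITS
        for name in REQUIRED
        if not (root / split / name).exists()
    ]


if __name__ == "__main__":
    missing = missing_entries()
    if missing:
        print("Missing entries:")
        for path in missing:
            print("  " + path)
    else:
        print("Dataset layout looks complete.")
```

The check only tests for existence; it does not validate the contents of the folders or the JSON files.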

🚀 Environment Setup

# Clone the repo
git clone https://github.com/iSEE-Laboratory/Long_RVOS.git
cd Long_RVOS

# [Optional] Create a clean Conda environment
conda create -n long_rvos python=3.10 -y
conda activate long_rvos

# PyTorch 
conda install pytorch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 pytorch-cuda=12.4 -c pytorch -c nvidia

# MultiScaleDeformableAttention
cd models/GroundingDINO/ops
python setup.py build install
python test.py
cd ../../..

# Other dependencies
pip install -r requirements.txt
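
To confirm that the environment picked up the pinned PyTorch packages, a quick version check can be run after installation. This is a minimal sketch, not part of the repository: the helper names `base_version` and `check_pins` are hypothetical, and it assumes the conda packages register their metadata under the distribution names `torch`, `torchvision`, and `torchaudio`.

```python
from importlib import metadata

# Versions pinned by the conda install command above.
PINNED = {"torch": "2.5.1", "torchvision": "0.20.1", "torchaudio": "2.5.1"}


def base_version(version):
    """Drop a local build suffix such as '+cu124' from a version string."""
    return version.split("+", 1)[0]


def check_pins(pinned=PINNED):
    """Map each package name to (installed_version_or_None, matches_pin)."""
    report = {}
    for name, wanted in pinned.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Package metadata not found in the active environment.
            report[name] = (None, False)
            continue
        report[name] = (installed, base_version(installed) == wanted)
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_pins().items():
        status = "ok" if ok else "MISMATCH"
        print(f"{name}: installed={installed} pinned={PINNED[name]} [{status}]")
```

The script reads package metadata only, so it does not import PyTorch or verify that CUDA is usable.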

Install SAM2

ReferMo uses SAM2 for mask propagation. Please install SAM2 following the official instructions:

cd sam2
pip install -e .
cd ..

Download SAM2 checkpoints and put them in sam2/checkpoints/:

cd sam2/checkpoints
bash download_ckpts.sh
cd ../..

Download Pretrained GroundingDINO

Download pretrained GroundingDINO weights and put them in the pretrained directory:

mkdir -p pretrained
cd pretrained

wget https://github.com/longzw1997/Open-GroundingDino/releases/download/v0.1.0/gdinot-1.8m-odvg.pth # default
wget https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth
wget https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha2/groundingdino_swinb_cogcoor.pth
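
If `wget` is not available, the same files can be fetched with the Python standard library. The following is a minimal sketch, not part of the repository; the helper names `target_path` and `download_missing` are hypothetical, and the URLs are copied from the commands above. It is meant to be run from the repository root and skips any file that already exists.

```python
import urllib.request
from pathlib import Path
from urllib.parse import urlparse

# URLs copied from the wget commands above; the first one is the default.
WEIGHT_URLS = [
    "https://github.com/longzw1997/Open-GroundingDino/releases/download/v0.1.0/gdinot-1.8m-odvg.pth",
    "https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth",
    "https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha2/groundingdino_swinb_cogcoor.pth",
]


def target_path(url, out_dir="pretrained"):
    """Local path for a URL: the output directory plus the URL's file name."""
    return Path(out_dir) / Path(urlparse(url).path).name


def download_missing(urls=WEIGHT_URLS, out_dir="pretrained"):
    """Download every file that is not yet present; return the new paths."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    fetched = []
    for url in urls:
        dest = target_path(url, out_dir)
        if dest.exists():
            continue  # already downloaded, nothing to do
        urllib.request.urlretrieve(url, str(dest))
        fetched.append(dest)
    return fetched


if __name__ == "__main__":
    for path in download_missing():
        print("downloaded", path)
```

An interrupted download leaves a partial file behind that this sketch would then treat as complete, so delete any truncated `.pth` file before re-running.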

Motion Extraction

If you need to extract motion frames from videos, use:

python scripts/extract_motion.py --data_dir data/long_rvos --output_dir motions

Alternatively, download our preprocessed motions from Google Drive and extract them so that the directory layout is:

motions/
├── train/
│   ├── motions/
│   └── frame_types.json
├── valid/
│   ├── motions/
│   └── frame_types.json
└── test/
    ├── motions/
    └── frame_types.json

🌟 Get Started

Training

python main.py -c configs/lrvos_swint.yaml -rm train -bs 2 -ng 8 --version refermo --epochs 6

Note: you can download our checkpoint refermo_swint.pth and put it in the ckpt directory.

Inference

PYTHONPATH=. python eval/inference_lrvos_with_motion.py \
    -ng 8 \
    -ckpt ckpt/refermo_swint.pth \
    --split valid \
    --version refermo

📌 The results will be saved at output/long_rvos/{split}/{version}.

📌 We also provide a script eval/inference_lrvos.py for ReferDINO-style inference, which does not use motions.

Evaluation

After inference, evaluate the results:

bash run_eval.sh output/long_rvos/valid/refermo valid
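
When evaluating several splits or versions, the result directory and the evaluation command can be derived from the same two values. This is a minimal sketch, not part of the repository: the helper names `result_dir` and `eval_command` are hypothetical, and it assumes `run_eval.sh` takes the result directory followed by the split, as in the command above.

```python
import shlex
from pathlib import Path


def result_dir(split, version, root="output/long_rvos"):
    """Directory where inference results are saved: {root}/{split}/{version}."""
    return Path(root) / split / version


def eval_command(split, version):
    """Argument list for the evaluation script (argument order assumed)."""
    return ["bash", "run_eval.sh", result_dir(split, version).as_posix(), split]


if __name__ == "__main__":
    cmd = eval_command("valid", "refermo")
    # Prints: bash run_eval.sh output/long_rvos/valid/refermo valid
    print(shlex.join(cmd))
    # To actually run it from the repository root:
    # import subprocess; subprocess.run(cmd, check=True)
```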

🙏 Acknowledgements

Our code is built upon ReferDINO, GroundingDINO, and SAM2. We sincerely appreciate these efforts.

📝 Citation

If you find our work helpful for your research, please consider citing our paper:

@article{liang2025longrvos,
  title={Long-RVOS: A Comprehensive Benchmark for Long-term Referring Video Object Segmentation},
  author={Liang, Tianming and Jiang, Haichao and Yang, Yuting and Tan, Chaolei and Li, Shuai and Zheng, Wei-Shi and Hu, Jian-Fang},
  journal={arXiv preprint arXiv:2505.12702},
  year={2025}
}

📄 License

This project is licensed under the MIT License. Please refer to the LICENSE file for details.
