Tianming Liang¹ Haichao Jiang¹ Yuting Yang¹ Chaolei Tan¹ Shuai Li² Wei-Shi Zheng¹ Jian-Fang Hu¹*
¹Sun Yat-sen University ²Shandong University
Long-RVOS is the first large-scale long-term referring video object segmentation benchmark, containing 2,000+ videos with an average duration exceeding 60 seconds.
The Long-RVOS dataset is available on HuggingFace Hub. Use our download script:
python scripts/download_dataset.py \
--repo_id iSEE-Laboratory/Long-RVOS \
    --output_dir data
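Alternatively, the dataset can be fetched with the huggingface_hub Python API. A minimal sketch, assuming the data is hosted as a dataset repo (the download script above remains the recommended route):

# Hypothetical alternative to scripts/download_dataset.py
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="iSEE-Laboratory/Long-RVOS",
    repo_type="dataset",   # assumption: hosted as a dataset repo
    local_dir="data",
)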
Or manually download from Google Drive and extract:

data/
├── long_rvos/
│   ├── train/
│   │   ├── JPEGImages/
│   │   ├── Annotations/
│   │   └── meta_expressions.json
│   ├── valid/
│   │   ├── JPEGImages/
│   │   ├── Annotations/
│   │   └── meta_expressions.json
│   └── test/
│       ├── JPEGImages/
│       ├── Annotations/
│       └── meta_expressions.json
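To get a feel for the annotations, the expression file can be inspected with a few lines of Python. A minimal sketch, assuming a Ref-YouTube-VOS-style schema (videos → expressions); check the file itself for the exact keys:

import json

# Assumed schema: {"videos": {video_id: {"expressions": {exp_id: {"exp": ...}}}}}
with open("data/long_rvos/valid/meta_expressions.json") as f:
    meta = json.load(f)

for video_id, video in list(meta["videos"].items())[:3]:
    for exp_id, exp in video["expressions"].items():
        print(video_id, exp_id, exp["exp"])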
# Clone the repo
git clone https://github.com/iSEE-Laboratory/Long_RVOS.git
cd Long_RVOS
# [Optional] Create a clean Conda environment
conda create -n long_rvos python=3.10 -y
conda activate long_rvos
# PyTorch
conda install pytorch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 pytorch-cuda=12.4 -c pytorch -c nvidia
# MultiScaleDeformableAttention
cd models/GroundingDINO/ops
python setup.py build install
python test.py
cd ../../..
# Other dependencies
pip install -r requirements.txt
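With the environment set up, a quick optional check that PyTorch can see your GPUs (helpful before multi-GPU training):

# Optional sanity check for the PyTorch/CUDA setup.
import torch
print(torch.__version__)          # expect 2.5.1
print(torch.version.cuda)         # expect 12.4
print(torch.cuda.device_count())  # number of visible GPUs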
ReferMo uses SAM2 for mask propagation. Please install SAM2 following the official instructions:

cd sam2
pip install -e .
cd ..

Download SAM2 checkpoints and put them in sam2/checkpoints/:
cd sam2/checkpoints
bash download_ckpts.sh
cd ../..
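To confirm that SAM2 and its checkpoints load correctly, you can try building a video predictor. A minimal sketch; the config and checkpoint names below are assumptions, so match them to the files actually fetched by download_ckpts.sh:

# Optional check that SAM2 builds a video predictor from a downloaded checkpoint.
from sam2.build_sam import build_sam2_video_predictor

predictor = build_sam2_video_predictor(
    "configs/sam2.1/sam2.1_hiera_l.yaml",       # assumed config name
    "sam2/checkpoints/sam2.1_hiera_large.pt",   # assumed checkpoint path
)
print(type(predictor).__name__)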
Download pretrained GroundingDINO weights and put them in the pretrained directory:

mkdir pretrained
cd pretrained
wget https://github.com/longzw1997/Open-GroundingDino/releases/download/v0.1.0/gdinot-1.8m-odvg.pth # default
wget https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth
wget https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha2/groundingdino_swinb_cogcoor.pth
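The downloaded weights can be sanity-checked by loading them on CPU (illustrative only; the key layout differs between checkpoints):

# Optional: confirm a GroundingDINO checkpoint is readable.
import torch

ckpt = torch.load("pretrained/gdinot-1.8m-odvg.pth", map_location="cpu")
print(list(ckpt.keys()))  # usually includes a 'model' state dict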
If you need to extract motion frames from videos, use:
python scripts/extract_motion.py --data_dir data/long_rvos --output_dir motions

Or you can download our processed motions from Google Drive and extract:
motions/
├── train/
│   ├── motions/
│   └── frame_types.json
├── valid/
│   ├── motions/
│   └── frame_types.json
└── test/
    ├── motions/
    └── frame_types.json

Train ReferMo on Long-RVOS with:

python main.py -c configs/lrvos_swint.yaml -rm train -bs 2 -ng 8 --version refermo --epochs 6

Note: you can download our checkpoint from refermo_swint.pth and put it in the ckpt directory. For inference, run:
PYTHONPATH=. python eval/inference_lrvos_with_motion.py \
-ng 8 \
-ckpt ckpt/refermo_swint.pth \
--split valid \
--version refermo

📌 The results will be saved at output/long_rvos/{split}/{version}.
📌 We also provide a script eval/inference_lrvos.py for ReferDINO-style inference, which does not use motions.
After inference, evaluate the results:
bash run_eval.sh output/long_rvos/valid/refermo valid
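For intuition about what the evaluation compares, below is an illustrative per-frame IoU between a predicted and a ground-truth binary mask. This is not the metric code used by run_eval.sh; the PNG mask format and file names are assumptions:

import numpy as np
from PIL import Image

def mask_iou(pred_path, gt_path):
    # Binarize both masks and compute intersection-over-union.
    pred = np.array(Image.open(pred_path).convert("L")) > 0
    gt = np.array(Image.open(gt_path).convert("L")) > 0
    union = np.logical_or(pred, gt).sum()
    return np.logical_and(pred, gt).sum() / union if union > 0 else 1.0

# Hypothetical file names, purely for illustration.
print(mask_iou("pred_00000.png", "gt_00000.png"))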
Our code is built upon ReferDINO, GroundingDINO, and SAM2. We sincerely appreciate these efforts.

If you find our work helpful for your research, please consider citing our paper:
@article{liang2025longrvos,
title={Long-RVOS: A Comprehensive Benchmark for Long-term Referring Video Object Segmentation},
author={Liang, Tianming and Jiang, Haichao and Yang, Yuting and Tan, Chaolei and Li, Shuai and Zheng, Wei-Shi and Hu, Jian-Fang},
journal={arXiv preprint arXiv:2505.12702},
year={2025}
}

This project is licensed under the MIT License. Please refer to the LICENSE file for details.
