This repository contains the code for the paper OneVOS: Unifying Video Object Segmentation with All-in-One Transformer Framework [ECCV'2024].
[2024/07] X-Prompt: Our new work X-Prompt: Multimodal Visual Prompt for Video Object Segmentation has been accepted by [ACMMM'2024]. This work proposes a novel multimodal VOS approach, leveraging OneVOS as the RGB-VOS foundation model and incorporating Multi-modal Adaptation Experts to integrate additional modality-specific knowledge. The proposed method achieves SOTA performance across 4 benchmarks.
Our trained models, benchmark scores, and pre-computed results reproduced by this project can be found in MODEL_ZOO.md.
```shell
conda create -n vos python=3.9 -y
conda activate vos
pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117
pip install -r requirements.txt
git clone https://github.com/ClementPinard/Pytorch-Correlation-extension.git
cd Pytorch-Correlation-extension
python setup.py install
cd -
```

We follow the same data preparation steps as AOT, covering the Static image datasets, DAVIS, and YouTube-VOS. In addition, we use the more complicated MOSE dataset and the long-term VOS dataset LVOS V1 for training and testing.
```
├── OneVOS
├── datasets
│   ├── Static
│   │   └── ...
│   ├── DAVIS
│   │   └── ...
│   ├── YOUTUBE-VOS
│   │   └── ...
│   ├── MOSE
│   │   └── ...
│   ├── LVOS
│   │   └── ...
│   ├── ...
```

We initialize OneVOS using the weights of ConvMAE-Base as the backbone. You can download the pretrained weights directly from pretrain_backbone and place them in OneVOS/pretrain_weights/.
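A quick sanity check of the layout above can be scripted before launching training. The snippet below is a hedged sketch: the `check_layout` helper and its root-path argument are illustrative, not part of the OneVOS codebase.

```python
from pathlib import Path

# Expected top-level dataset folders, as listed in the tree above.
EXPECTED = ["Static", "DAVIS", "YOUTUBE-VOS", "MOSE", "LVOS"]

def check_layout(root: str) -> list[str]:
    """Return the names of expected dataset folders missing under <root>/datasets."""
    base = Path(root) / "datasets"
    return [name for name in EXPECTED if not (base / name).is_dir()]
```

For example, `check_layout("OneVOS")` returns an empty list when every dataset folder is in place, and the names of any missing ones otherwise.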
Stages:

- `PRE`: the pre-training stage with static images.
- `PRE_YTB_DAV`: the main-training stage with YouTube-VOS and DAVIS.
- `PRE_YTB_DAV_MOSE`: the main-training stage with YouTube-VOS, DAVIS, and MOSE.
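As a hedged illustration of the curriculum above, the stage-to-datasets relationship can be written down as a small table. This mapping is inferred from the stage descriptions in this README and is not code from the repository.

```python
# Illustrative only: maps each training stage (names from this README)
# to the datasets it trains on. Not part of the OneVOS codebase.
STAGE_DATASETS = {
    "PRE": ["Static"],                                     # pre-training on static images
    "PRE_YTB_DAV": ["YouTube-VOS", "DAVIS"],               # main training
    "PRE_YTB_DAV_MOSE": ["YouTube-VOS", "DAVIS", "MOSE"],  # main training incl. MOSE
}

def datasets_for(stage: str) -> list[str]:
    """Look up which datasets a given training stage uses."""
    return STAGE_DATASETS[stage]
```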
Example training scripts are provided in train_examples_pre_ytb_dav.sh and train_examples_pre_ytb_dav_mose.sh.
Example inference scripts are provided in eval_examples_pre_ytb_dav.sh and eval_examples_pre_ytb_dav_mose.sh.
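A typical run chains a training script with its matching evaluation script. The driver below is a hypothetical sketch: the script names are the ones shipped in this repo, but the wrapper itself (and its `RUN` toggle) is an assumption, not repository code. With `RUN` unset it only prints the plan.

```shell
#!/usr/bin/env bash
# Hypothetical driver: prints the planned commands, and only executes the
# repo's example scripts when RUN=1 is set in the environment.
set -euo pipefail
for s in train_examples_pre_ytb_dav.sh eval_examples_pre_ytb_dav.sh; do
  echo "plan: bash ${s}"
  if [ "${RUN:-0}" = "1" ]; then bash "${s}"; fi
done
```

Invoke it as `RUN=1 bash run_all.sh` (a hypothetical filename) once the datasets and pretrained backbone are in place.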
If you find this repository useful, please consider giving a star and citation:
```bibtex
@inproceedings{li2025onevos,
  title={{OneVOS}: Unifying Video Object Segmentation with All-in-One Transformer Framework},
  author={Li, Wanyun and Guo, Pinxue and Zhou, Xinyu and Hong, Lingyi and He, Yangji and Zheng, Xiangyu and Zhang, Wei and Zhang, Wenqiang},
  booktitle={European Conference on Computer Vision},
  pages={20--40},
  year={2025},
  organization={Springer}
}
```
