Full-length paper accepted at IJCV 2025; short paper accepted at the ML4AD Workshop at NeurIPS 2021.
- Software:
  - torch==1.9.0
For both training and testing, metrics monitoring is done through visdom_logger (https://github.com/luizgh/visdom_logger). To install this package with pip, use the following command:
pip install git+https://github.com/luizgh/visdom_logger.git
pip install git+https://github.com/youtubevos/cocoapi.git
pip install -r requirements.txt
- I use a modified version of visdom_logger, so it is best to keep visdom_port: -1 in the config files to disable it, and to use wandb instead for monitoring training.
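For reference, the relevant config entry might look like the snippet below. Only the `visdom_port` key is taken from the note above; check the actual files in config_files/ for the real layout and any wandb-related keys.

```yaml
# Disable the (modified) visdom_logger; monitoring is then done via wandb.
visdom_port: -1   # -1 turns off visdom monitoring
```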
- Download processed data here
- Expected folder structure:
VSPW/data
├── seq1
│   ├── origin
│   ├── mask
│   └── flow
└── seq2
    ├── origin
    ├── mask
    └── ...
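As a quick sanity check after downloading, a small helper like the following (hypothetical, not part of the repo) can report sequence folders that are missing the expected subdirectories. Only origin/ and mask/ are treated as mandatory here, since flow/ may not exist for every sequence:

```shell
# check_vspw_layout DIR: report sequence folders under DIR that are missing
# the origin/ or mask/ subdirectory (flow/ is treated as optional here).
check_vspw_layout() {
  local root="$1" status=0
  for seq in "$root"/*/; do
    for sub in origin mask; do
      if [ ! -d "${seq}${sub}" ]; then
        echo "missing: ${seq}${sub}"
        status=1
      fi
    done
  done
  return "$status"
}
```

For example, `check_vspw_layout VSPW/data` prints nothing and returns 0 when the layout matches the tree above.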
- Use the instructions from RePRI for downloading the [processed Pascal data](https://drive.google.com/file/d/1Lj-oBzBNUsAqA9y65BDrSQxirV8S15Rk/view?usp=sharing).
- Use the processed data from VSPW_480.
- Example explaining why we specifically pick PASCAL classes and ignore background (i.e., stuff classes): we show the effect of not removing stuff classes (such as road, building, ...) during training, and how it contaminates the learning process with the novel class boundaries (in our case, the person class). Removing the stuff classes ensures this does not occur, which explains our choice of classes.
- Use the 2019 YTVIS version, similar to DANet.
The train/val splits are directly provided in lists/.
First, you will need to download the ImageNet pre-trained backbones from RePRI here and put them under initmodel/. These are used if you decide to train your models from scratch.
- For Pascal-to-MiniVSPW: use RePRI's "full pre-trained models."
- For YTVIS: use these provided models
- For YTVIS (with auxiliary DCL): use these provided models
- For MiniVSPW-to-MiniVSPW: use these provided models
All the code is provided in src/. Default configuration files can be found in config_files/. Training and testing scripts are located in scripts/. The train/validation splits for each dataset are in lists/.
- Reproduce results for all tables
bash scripts/test_all_datasets.sh
You can train your own models from scratch with the scripts/train.sh script, as follows.
bash scripts/train.sh {data} {fold} {[gpu_ids]} {layers}

For instance, to train a ResNet-50-based model on fold 0 of Pascal-5i on GPU 1, use:

bash scripts/train.sh pascal 0 [1] 50

For standard training on YTVIS:

bash scripts/train.sh ytvis 0 [1] 50

Note that this code supports distributed training. To train on multiple GPUs, simply replace [1] in the previous examples with the list of GPU ids you want to use, e.g. [0,1].
We gratefully thank the authors of https://github.com/mboudiaf/RePRI-for-Few-Shot-Segmentation, whose code we build upon. We also rely on https://github.com/scutpaul/DANet for understanding their YouTube-VIS episodic version.
Please cite my paper if you find it useful in your research:
@article{siamijcv2025,
  author  = {Siam, Mennatullah},
  title   = {Temporal Transductive Inference for Few-Shot Video Object Segmentation},
  journal = {International Journal of Computer Vision},
  year    = {2025},
  issn    = {1573-1405},
  doi     = {10.1007/s11263-025-02390-x},
  url     = {https://doi.org/10.1007/s11263-025-02390-x}
}

