This is the official PyTorch implementation of our paper:
DepthFlow: Exploiting Depth-Flow Structural Correlations for Unsupervised Video Object Segmentation, ICCVW 2025
Suhwan Cho, Minhyeok Lee, Jungho Lee, Donghyeong Kim, Sangyoun Lee
Link: [ICCVW] [arXiv]
You can also find other related papers at awesome-video-object-segmentation.
In unsupervised VOS, the scarcity of training data has been a significant bottleneck to achieving high segmentation accuracy. Inspired by observations on two-stream approaches, we introduce a novel data generation method based on a depth-to-flow conversion process. With our flow synthesis protocol, large-scale image-flow-mask triplets can be leveraged during network training. To facilitate future research, we also prepare the DUTSv2 dataset, which contains pairs of original images and their corresponding synthetic flow maps.
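The core idea of depth-to-flow conversion can be sketched as follows. This is an illustrative simplification, not the paper's exact protocol: the function name and the assumed translational camera motion (`tx`, `ty`) are hypothetical. It captures the key structural correlation, namely that under camera translation, pixel displacement (parallax) is inversely proportional to scene depth.

```python
import numpy as np

def depth_to_flow(depth, tx=8.0, ty=0.0, eps=1e-6):
    """Synthesize a parallax-style flow map from a depth map.

    Assumes a simple translational camera motion (tx, ty): pixel
    displacement is inversely proportional to depth, so near objects
    move farther across the image than the distant background.
    """
    inv_depth = 1.0 / (depth + eps)
    # Normalize inverse depth to [0, 1] so the motion magnitudes
    # stay in a controlled range regardless of the depth scale.
    inv_depth = (inv_depth - inv_depth.min()) / (inv_depth.max() - inv_depth.min() + eps)
    flow = np.stack([tx * inv_depth, ty * inv_depth], axis=-1)  # (H, W, 2)
    return flow

# Toy example: a "near" object (depth 1) on a "far" background (depth 10).
depth = np.full((64, 64), 10.0)
depth[20:40, 20:40] = 1.0
flow = depth_to_flow(depth)
```

In this toy example the foreground square receives a larger horizontal displacement than the background, which is exactly the motion contrast a segmentation network can exploit.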
1. Download the datasets: DUTS, DAVIS, FBMS, YouTube-Objects, Long-Videos.
2. Estimate and save optical flow maps from the videos using RAFT.
3. For DUTS, simulate optical flow maps using DPT.
4. We also provide the pre-processed datasets: DUTSv2, DAVIS, FBMS, YouTube-Objects, Long-Videos.
Start DepthFlow training with:
python run.py --train
Verify the following before running:
✅ Training dataset selection and configuration
✅ GPU availability and configuration
✅ Backbone network selection
Run DepthFlow testing with:
python run.py --test
Verify the following before running:
✅ Testing dataset selection
✅ GPU availability and configuration
✅ Backbone network selection
✅ Pre-trained model path
Pre-trained model (mitb0)
Pre-trained model (mitb1)
Pre-trained model (mitb2)
Pre-computed results
Code and models are only available for non-commercial research purposes.
For questions or inquiries, feel free to contact:
E-mail: suhwanx@gmail.com