ZhengxyFlow/HMHI-Net


(ACM MM 25) Shallow Features Matter: Hierarchical Memory with Heterogeneous Interaction for Unsupervised Video Object Segmentation

ACM MM 2025 License: Apache 2.0

Official code repository for our ACM MM 2025 paper:

"Shallow Features Matter: Hierarchical Memory with Heterogeneous Interaction for Unsupervised Video Object Segmentation"
Xiangyu Zheng, Songcheng He, Wanyu Li, Xiaoqiang Li, Wei Zhang 🔗 [Paper Link]


📖 Introduction

This repository provides the official implementation of our ACM MM 2025 paper, "Shallow Features Matter: Hierarchical Memory with Heterogeneous Interaction for Unsupervised Video Object Segmentation" 🔗 [Paper Link].

In this work, we propose HMHI-Net, a novel method for Unsupervised Video Object Segmentation (UVOS) that stores shallow features in memory alongside high-level ones. The method features:

  • 🔧 A novel Hierarchical Memory Architecture that incorporates both shallow- and high-level features into memory, so that the memory banks supply UVOS with pixel-level detail as well as semantic richness.
  • A Heterogeneous Mutual Refinement Mechanism that performs interaction across the two memory banks through the pixel-guided local alignment module (PLAM) and the semantic-guided global integration module (SGIM), respectively.
  • ⚡ HMHI-Net achieves state-of-the-art results on common UVOS and VSOD benchmarks, with 89.8% J&F on DAVIS-16, 86.9% J on FBMS, and 76.2% J on YouTube-Objects.

[Figure: Pipeline Overview]
(a) Overall pipeline of HMHI-Net. (b) Memory readout mechanism to refine the current frame. (c) Pixel-guided local alignment module. (d) Semantic-guided global integration module. (e) Memory update mechanism with the reference encoder.

🎞️ Video Demo

Demo1: Car-roundabout_Davis16
Demo2: Dog_Davis16
Demo3: Drift-straight_Davis16
Demo4: Parkour_Davis16


🚀 Getting Started

1. Environment Setup

pip install -r requirements.txt

Thanks to 🔗 [Calledit] for providing a more detailed environment installation script!

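#!/bin/bash

# Make `conda activate` usable inside a non-interactive script.
eval "$(conda shell.bash hook)"

conda create -y -n env_name python=3.10
conda activate env_name

pip install torch numpy opencv-python timm mmcv bytecode IPython tensorboard scikit-image

# Install the Visualizer package from source.
git clone https://github.com/luo3300612/Visualizer
cd Visualizer/
python setup.py install
cd ..

# Download the MiT-B1 backbone weights.
# -O (capital) sets the output file; lowercase -o would write wget's log there instead.
mkdir -p checkpoint/pretrained/mit/
wget -O checkpoint/pretrained/mit/mit_b1.pth https://download.openmmlab.com/mmsegmentation/v0.5/segformer/segformer_mit-b1_512x512_160k_ade20k/segformer_mit-b1_512x512_160k_ade20k_20220620_112037-c3f39e00.pth

# Download the inference checkpoint from Google Drive.
# Newer gdown versions take the file ID directly; on older versions, add --id before it.
pip install gdown
gdown 1OG_Dla9f-sBuoi3Q6mF55Au3rU-Fc9Sg -O checkpoint/infermodel.pth

# Replace Your_eval_data_path with your own evaluation data directory.
mkdir -p Your_eval_data_path/FBMS2SEG_byvideo/frame/val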
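
After running the script, it can help to confirm that the packages are importable before starting training. The snippet below is a minimal sketch, not part of this repository: the helper name missing_modules is hypothetical, and the module list simply mirrors the pip install line above (note that opencv-python is imported as cv2 and scikit-image as skimage).

```python
import importlib.util

# Import names for the packages installed by the script above.
# Some differ from the pip name: opencv-python -> cv2, scikit-image -> skimage.
REQUIRED_MODULES = [
    "torch", "numpy", "cv2", "timm", "mmcv",
    "bytecode", "IPython", "tensorboard", "skimage",
]


def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules were found.")
```

The check only looks up each module without importing it, so it stays fast even for heavy packages such as torch.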

2. Data Preparation

▶️ Dataset Download

Dataset            Download Link
YouTube-VOS        🔗 Download
DAVIS-16           🔗 Download
FBMS               🔗 Download
Youtube-Objects    🔗 Download
DAVSOD             🔗 Download
ViSal              🔗 Download

▶️ Optical Flow Preparation

Following previous UVOS works, optical flow maps for both training and inference data are generated through 🔗 [RAFT].
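
A flow estimator such as RAFT takes two frames as input, so a first step is usually to enumerate frame pairs per video. The sketch below rests on assumptions this README does not state: that flow is computed between consecutive frames, and that frame files sort correctly by name (for example, zero-padded indices). The function name consecutive_frame_pairs is hypothetical.

```python
from pathlib import Path


def consecutive_frame_pairs(video_dir, suffixes=(".jpg", ".png")):
    """Return (frame_t, frame_t+1) path pairs for one video, sorted by file name."""
    frames = sorted(
        p for p in Path(video_dir).iterdir() if p.suffix.lower() in suffixes
    )
    # A video with fewer than two frames yields no pairs.
    return list(zip(frames[:-1], frames[1:]))
```

Each returned pair can then be passed to the flow estimator of your choice, with the output saved under the matching Flows/ video folder.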

▶️ Folder Structure

Please organize the data files as follows:

data/
  ├── DAVIS-16/
        ├── Images/
        |   ├── train/
        |   |   ├── video_name1/
        |   |   ├── video_name2/
        |   |   ...
        |   └── val/
        |       ├── video_name1/
        |       ├── video_name2/
        |       ...
        ├── Annotations/
        |   ├── train/
        |   |   ├── video_name1/
        |   |   ├── video_name2/
        |   |   ...
        |   └── val/
        |       ├── video_name1/
        |       ├── video_name2/
        |       ...
        └── Flows/
            ├── train/
            |   ├── video_name1/
            |   ├── video_name2/
            |   ...
            └── val/
                ├── video_name1/
                ├── video_name2/
                ...

  └── Youtube-VOS/
        ├── Images/
            ...
        ├── Annotations/
            ...
        └── Flows/
            ...
...
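
A mismatch between the Images, Annotations, and Flows folders is easy to miss until a run fails. The following sketch checks one dataset directory against the layout above. It is not part of this repository: the function name check_dataset_layout is hypothetical, and it assumes every video folder under Images/ should have a counterpart in the other two folders, which may not hold for every dataset or split.

```python
from pathlib import Path

SUBDIRS = ("Images", "Annotations", "Flows")
SPLITS = ("train", "val")


def check_dataset_layout(dataset_root):
    """Return a list of problems; an empty list means the layout looks fine."""
    root = Path(dataset_root)
    problems = []
    for split in SPLITS:
        videos = {}
        for sub in SUBDIRS:
            split_dir = root / sub / split
            if not split_dir.is_dir():
                problems.append(f"missing directory: {sub}/{split}")
                continue
            videos[sub] = {p.name for p in split_dir.iterdir() if p.is_dir()}
        # Compare video folders only when all three directories exist.
        if len(videos) == len(SUBDIRS):
            reference = videos["Images"]
            for sub in ("Annotations", "Flows"):
                for name in sorted(reference - videos[sub]):
                    problems.append(f"{sub}/{split} has no folder for video '{name}'")
    return problems
```

For example, check_dataset_layout("data/DAVIS-16") returns an empty list when the directory matches the structure shown above.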

3. Checkpoint Preparation

▶️ Download Pretrained Model

Download the pretrained models and save them in './checkpoint/pretrained/' for model training.

We adopt the Segformer models pretrained on ImageNet-1k.

Pretrained Model                 Model Link
🔗 Segformer (NeurIPS 21)        🔗 Mit_b0 - Mit_b5 or 🔗 GoogleDrive
🔗 Swin-Transformer (ICCV 21)    🔗 Swin-T - Swin-B

▶️ Download HMHI-Net Checkpoints

Task                Download Link
UVOS Checkpoints    🔗 DAVIS-16, 🔗 FBMS, 🔗 Youtube-Objects
VSOD Checkpoints    🔗 DAVIS-16, 🔗 DAVSOD, 🔗 FBMS, 🔗 ViSal

4. Training

# Certain config values in the file may require modification to suit your local setup.
bash scripts/train.sh

5. Fine-Tuning

Load the checkpoint that performed best on the corresponding dataset during the training stage, then start fine-tuning.

# Certain config values in the file may require modification to suit your local setup.
bash scripts/finetune.sh

6. Inference

# Certain config values in the file may require modification to suit your local setup.
bash scripts/infer.sh

7. Evaluation

# Certain config values in the file may require modification to suit your local setup.

# For UVOS tasks
python utils/val_zvos.py

# For VSOD tasks
python utils/val_vsod.py
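
For readers unfamiliar with the J score reported for UVOS, it is the region similarity, i.e. the Jaccard index (intersection over union) between a predicted mask and the ground-truth mask. The sketch below illustrates the metric on tiny masks in pure Python. It is not the code in utils/val_zvos.py; the function name region_similarity and the handling of two empty masks are assumptions made for illustration.

```python
def region_similarity(pred_mask, gt_mask):
    """Jaccard index (IoU) between two binary masks given as nested lists of 0/1."""
    intersection = 0
    union = 0
    for pred_row, gt_row in zip(pred_mask, gt_mask):
        for p, g in zip(pred_row, gt_row):
            intersection += 1 if (p and g) else 0
            union += 1 if (p or g) else 0
    # Convention assumed here: two empty masks count as a perfect match.
    return 1.0 if union == 0 else intersection / union


pred = [[0, 1, 1],
        [0, 1, 0]]
gt = [[0, 1, 0],
      [0, 1, 1]]
print(region_similarity(pred, gt))  # 2 overlapping pixels / 4 in the union = 0.5
```

A benchmark score is then typically the mean of this value over all annotated frames and videos.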

Acknowledgement

This repository is built upon [🔗 Isomer] and [🔗 SAM], originally proposed in:

  1. "Isomer: Isomerous Transformer for Zero-Shot Video Object Segmentation", Yichen Yuan, Yifan Wang, Lijun Wang, Xiaoqi Zhao, Huchuan Lu, Yu Wang, Weibo Su, Lei Zhang, ICCV, 2023. [🔗 Paper]

  2. "Segment Anything", Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alex Berg, Wan-Yen Lo, Piotr Dollár, Ross Girshick, arXiv, 2023. [🔗 Paper]

We reuse parts of their codebase, including:

  • The data loading pipeline

  • Model initialization logic

  • Training routines

  • Module formulation

License

The model is licensed under the Apache 2.0 license.

Citing HMHI-Net

@inproceedings{Zheng2025mm,
  title     = {Shallow Features Matter: Hierarchical Memory with Heterogeneous Interaction for Unsupervised Video Object Segmentation},
  author    = {Xiangyu Zheng and Songcheng He and Wanyu Li and Xiaoqiang Li and Wei Zhang},
  booktitle = {Proceedings of the ACM International Conference on Multimedia (ACM MM)},
  year      = {2025}
}

About

Official PyTorch implementation of the ACM MM 25 paper: HMHI-Net
