❄️ Boosting Vision State Space Model with Fractal Scanning

Haoke Xiao^*, Lv Tang^{*, 📧}, Peng-Tao Jiang, Hao Zhang,, Jinwei Chen and Bo Li^📧

^* Equal contribution ^📧 Corresponding author

vivo Mobile Communication Co., Ltd, Shanghai, China

📖 Abstract

Recently, foundational models have significantly advanced in different tasks, accompanied by Transformer as the general backbone. However, Transformer's quadratic complexity poses challenges for handling longer sequences and higher resolution images, which may limit foundational models further development. To alleviate this issue, various efficient State Space Models (SSMs) like Mamba have emerged, initially matching Transformer performance and gradually surpassing it. To improve the performance of SSMs in computer vision tasks, one crucial viewpoint is effective serialization of images. Existing vision Mambas, which rely on a linear scanning mechanism, often struggle to capture complex spatial relationships in 2D images. This results in feature loss during serialization and negatively impacts model performance. To overcome this limitation, we propose the use of fractal scanning curves for image serialization to enhance the Mambas' ability to accurately model complex spatial dependencies. Additionally, unlike existing vision Mambas, which are designed with various curve scanning directions that increase the complexity, contradicting the original intent of Mamba to enhance model performance. We novelty introduce the Fractal Fusion Pathway (FFP) for our FractalMamba, which can enhance its performance efficiently. Extensive experiments underscore the superiority of our proposed FractalMamba.

⚔️ State Space Model with Fractal Scanning

We first revisit the selective state space model and design an fractal scanning algorithm for state space modeling. With this superior algorithm, we develop a Fractal Mamba , which possess excellent spatial structure capture capability and multi-resolution adaptability.

⚙️ Environment Setup

conda create -n fractal-mamba python=3.9
conda activate fractal-mamba

# Install pytorch 
pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117

# Install other packages
pip install -r fractal-mamba/requirements.txt 

# Install Vision_Tree_Scanning
cd Fractal Mamba/third-party/TreeScan
pip install -v -e .

🍺 Model Zoo

ImageNet-1k Image Classification

name	pretrain	resolution	acc@1	#param	FLOPs	download
Fractal Mamba-T	ImageNet-1K	224x224	83.0	30M	4.8G	ckpt \| cfg
Fractal Mamba-T	ImageNet-1K	384x384	83.9	30M	8.5G	ckpt \| cfg
Fractal Mamba-T	ImageNet-1K	512x512	83.0	30M	15.1G	ckpt \| cfg
Fractal Mamba-T	ImageNet-1k	640x640	81.8	30M		ckpt \| cfg
Fractal Mamba-T	ImageNet-1k	768x768	80.3	30M	30M	ckpt \| cfg
Fractal Mamba-T	ImageNet-1k	1024x1024	76.3	30M	30M	ckpt \| cfg

COCO Object Detection and Instance Segmentation

backbone	method	schedule	box mAP	mask mAP	#param	FLOPs	download
Fractal Mamba-T	Mask R-CNN	1x	46.8	42.4	49M	266G	- \| cfg
Fractal Mamba-T	Mask R-CNN	3x	48.5	43.3	49M	266G	- \| cfg

ADE20K Semantic Segmentation

backbone	method	resolution	mIoU (ss/ms)	#param	FLOPs	download
Fractal Mamba-T	UperNet	512x512	48.0 / 48.9	53M	942G	- \| cfg

LEVIR-CD+ Remote Sensing Binary Change Detection

backbone	method	resolution	IoU	Precision	Recall	KC	F1	#param	FLOPs	download
Fractal Mamba-T	ChangeMamba	1024x1024	80.0	89.3	88.4	88.4	89.9	35M	55G	- \| cfg

🚀 Train & Evaluate

ImageNet-1k Image Classification

bash GrootV/scripts/train.sh

You need to modify the relevant path to your own.

⭐️ BibTeX

If you find this work useful for your research, please cite:

@article{xiao2024Fractal Mambal,
  title={Fractal MambaL: Tree Topology is All You Need in State Space Model},
  author={Xiao, Yicheng and Song, Lin and Huang, Shaoli and Wang, Jiangshan and Song, Siyu and Ge, Yixiao and Li, Xiu and Shan, Ying},
  journal={arXiv preprint arXiv:2406.02395},
  year={2024}
}

❤️ Acknowledgement

Code in this repository is built upon several public repositories. Thanks for the wonderful work GrootVL, InternImage and VMamba ! !

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
assets		assets
.DS_Store		.DS_Store
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

❄️ Boosting Vision State Space Model with Fractal Scanning

📖 Abstract

⚔️ State Space Model with Fractal Scanning

⚙️ Environment Setup

🍺 Model Zoo

🚀 Train & Evaluate

⭐️ BibTeX

❤️ Acknowledgement

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Folders and files

Latest commit

History

Repository files navigation

❄️ Boosting Vision State Space Model with Fractal Scanning

📖 Abstract

⚔️ State Space Model with Fractal Scanning

⚙️ Environment Setup

🍺 Model Zoo

🚀 Train & Evaluate

⭐️ BibTeX

❤️ Acknowledgement

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Packages