Haoke Xiao*, Lv Tang*📧, Peng-Tao Jiang, Hao Zhang, Jinwei Chen and Bo Li📧
* Equal contribution 📧 Corresponding author
vivo Mobile Communication Co., Ltd, Shanghai, China
Recently, foundational models have advanced significantly across a range of tasks, with the Transformer serving as the general backbone. However, the Transformer's quadratic complexity makes it difficult to handle longer sequences and higher-resolution images, which may limit the further development of foundational models. To alleviate this issue, various efficient State Space Models (SSMs) such as Mamba have emerged, initially matching Transformer performance and gradually surpassing it. A crucial factor in improving SSM performance on computer vision tasks is the effective serialization of images. Existing vision Mambas rely on a linear scanning mechanism and often struggle to capture complex spatial relationships in 2D images. This results in feature loss during serialization and degrades model performance. To overcome this limitation, we propose using fractal scanning curves for image serialization, which enhances Mamba's ability to accurately model complex spatial dependencies. In addition, existing vision Mambas are designed with multiple curve scanning directions, which increases complexity and runs counter to Mamba's original intent of enhancing model performance. In contrast, we introduce the Fractal Fusion Pathway (FFP) for our FractalMamba, which improves its performance efficiently. Extensive experiments demonstrate the superiority of the proposed FractalMamba.
We first revisit the selective state space model and design a fractal scanning algorithm for state space modeling. Building on this algorithm, we develop Fractal Mamba, which possesses excellent spatial structure capture capability and multi-resolution adaptability.
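To make the idea of fractal image serialization concrete, the sketch below orders the cells of a square patch grid along a Hilbert curve, a well-known fractal space-filling curve. This is only an illustration: the text above does not say which fractal curve Fractal Mamba uses, and the function names (`hilbert_d2xy`, `hilbert_scan_order`, `serialize`) are hypothetical and not part of this repository.

```python
def hilbert_d2xy(order, d):
    """Map position d along a Hilbert curve to (x, y) on a 2**order x 2**order grid."""
    n = 1 << order
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # Rotate/flip the current sub-square so the sub-curves join up.
        if ry == 0:
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def hilbert_scan_order(order):
    """Row-major flat indices of the grid cells, listed in Hilbert-curve order."""
    n = 1 << order
    return [y * n + x for x, y in (hilbert_d2xy(order, d) for d in range(n * n))]


def serialize(grid, order):
    """Flatten a square grid (list of rows) into a 1D sequence along the curve."""
    flat = [value for row in grid for value in row]
    return [flat[i] for i in hilbert_scan_order(order)]


if __name__ == "__main__":
    # Illustrative 4x4 grid of patch ids (order 2 means a 2**2 x 2**2 grid).
    patches = [[r * 4 + c for c in range(4)] for r in range(4)]
    print(serialize(patches, order=2))
```

Unlike a row-by-row (raster) scan, which jumps from the end of one row to the start of the next, consecutive elements of the Hilbert sequence are always adjacent cells in the 2D grid, so spatial neighbors tend to stay close together in the serialized sequence.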
conda create -n fractal-mamba python=3.9
conda activate fractal-mamba
# Install pytorch
pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117
# Install other packages
pip install -r fractal-mamba/requirements.txt
# Install Vision_Tree_Scanning
cd fractal-mamba/third-party/TreeScan
pip install -v -e .
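After installation, a quick check can confirm that the pinned PyTorch packages are the ones actually installed in the active environment. The following is a minimal sketch, not part of this repository: the helper `check_versions` is hypothetical, and the expected versions are copied from the `pip install` command above.

```python
from importlib import metadata

# Versions pinned by the pip install command above.
EXPECTED = {"torch": "1.13.1", "torchvision": "0.14.1", "torchaudio": "0.13.1"}


def check_versions(expected=EXPECTED):
    """Return {package: (installed_version_or_None, matches_expected)}."""
    report = {}
    for name, want in expected.items():
        try:
            have = metadata.version(name)
        except metadata.PackageNotFoundError:
            have = None
        # Local build tags such as "+cu117" are ignored in the comparison.
        ok = have is not None and have.split("+")[0] == want
        report[name] = (have, ok)
    return report


if __name__ == "__main__":
    for name, (have, ok) in check_versions().items():
        print(f"{name}: installed={have} expected={EXPECTED[name]} ok={ok}")
```

Run it inside the `fractal-mamba` conda environment; a package reported with `installed=None` was not found in that environment.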
ImageNet-1k Image Classification
| name | pretrain | resolution | acc@1 | #param | FLOPs | download |
|---|---|---|---|---|---|---|
| Fractal Mamba-T | ImageNet-1k | 224x224 | 83.0 | 30M | 4.8G | ckpt / cfg |
| Fractal Mamba-T | ImageNet-1k | 384x384 | 83.9 | 30M | 8.5G | ckpt / cfg |
| Fractal Mamba-T | ImageNet-1k | 512x512 | 83.0 | 30M | 15.1G | ckpt / cfg |
| Fractal Mamba-T | ImageNet-1k | 640x640 | 81.8 | 30M | - | ckpt / cfg |
| Fractal Mamba-T | ImageNet-1k | 768x768 | 80.3 | 30M | - | ckpt / cfg |
| Fractal Mamba-T | ImageNet-1k | 1024x1024 | 76.3 | 30M | - | ckpt / cfg |
COCO Object Detection and Instance Segmentation
ADE20K Semantic Segmentation
| backbone | method | resolution | mIoU (ss/ms) | #param | FLOPs | download |
|---|---|---|---|---|---|---|
| Fractal Mamba-T | UperNet | 512x512 | 48.0 / 48.9 | 53M | 942G | - / cfg |
LEVIR-CD+ Remote Sensing Binary Change Detection
| backbone | method | resolution | IoU | Precision | Recall | KC | F1 | #param | FLOPs | download |
|---|---|---|---|---|---|---|---|---|---|---|
| Fractal Mamba-T | ChangeMamba | 1024x1024 | 80.0 | 89.3 | 88.4 | 88.4 | 89.9 | 35M | 55G | - / cfg |
ImageNet-1k Image Classification
bash GrootV/scripts/train.sh
Before running, modify the relevant paths in the script to match your own setup.
If you find this work useful for your research, please cite:
@article{xiao2024grootvl,
title={GrootVL: Tree Topology is All You Need in State Space Model},
author={Xiao, Yicheng and Song, Lin and Huang, Shaoli and Wang, Jiangshan and Song, Siyu and Ge, Yixiao and Li, Xiu and Shan, Ying},
journal={arXiv preprint arXiv:2406.02395},
year={2024}
}
Code in this repository is built upon several public repositories. Thanks to the authors of GrootVL, InternImage and VMamba for their wonderful work!
