GitHub - cdb342/OccStudio: A unified framework for 3D Occupancy Prediction

Welcome to OccStudio, a unified framework for 3D Occupancy Prediction. This project unifies our previous works, including ALOcc, CausalOcc, and GDFusion, along with multiple classic methods into a single, standardized codebase to support research in autonomous driving, embodied AI, and other intelligent systems.

The framework is designed to handle both Semantic Occupancy and Occupancy Flow prediction, supporting a wide variety of input modalities, feature encoding methods, temporal fusion strategies, image backbones, etc. Our goal is to provide a flexible foundation to accelerate research in Spatial Intelligence across academia and industry.

🌟 Highlights

🏆 A Unified Framework: Provides a common codebase for multiple occupancy prediction methods, including ALOcc, CausalOcc, GDFusion, BEVDetOcc, FB-Occ, etc.
🔧 Flexible and Configurable Architecture: Supports multiple input modalities (e.g., images, depth), various types of 3D feature encoding (e.g., Volume-based, BEV-based), different temporal fusion methods (e.g., SoloFusion, GDFusion), and different image backbones (e.g., ResNet, InternImage, Swin-Transformer), all of which are switchable via configuration.
📚 Dataset Support: Provides full support for large-scale datasets like nuScenes and Waymo, and allows for seamlessly switching between different occupancy annotation formats (e.g., Occ3D, SurroundOcc, OpenOccupancy) for robust experimentation.

🛠 Model Zoo

OccStudio currently supports the following models:

Method	Task	Publication
ALOcc	Semantic Occupancy & Flow	ICCV 2025
GDFusion	Semantic Occupancy	CVPR 2025
BEVDetOcc	Semantic Occupancy	-
FB-Occ	Semantic Occupancy	ICCV 2023
SparseOcc	Semantic Occupancy	ECCV 2024

🚀 Get Started

1. Installation

We recommend using Conda for environment management.

# Clone this repository
git clone https://github.com/cdb342/OccStudio
cd OccStudio

# Create and activate the conda environment
conda create -n OccStudio python=3.8 -y
conda activate OccStudio

# Install PyTorch dependencies (for CUDA 11.8)
pip install torch==2.0.1+cu118 torchvision==0.15.2+cu118 -f https://download.pytorch.org/whl/torch_stable.html

# Install MMCV dependencies
git clone https://github.com/open-mmlab/mmcv
cd mmcv
git checkout 1.x # Use the stable 1.x branch
MMCV_WITH_OPS=1 pip install -e . -v
cd ..

# Install MMDetection and MMSegmentation
pip install mmdet==2.28.2 mmsegmentation==0.30.0

# Install the OccStudio framework itself
pip install -v -e .

# Install other dependencies
pip install torchmetrics timm dcnv4 ninja spconv transformers IPython einops numba
pip install numpy==1.23.4 # Pin numpy version for compatibility

# (Optional, only needed for SparseOcc) Build the extension under csrc
cd mmdet3d/models/sparseocc/csrc
pip install -v -e .
cd ../../../.. # Return to the repository root
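
After installation, it can help to confirm that the pinned versions are the ones actually active in the environment. The following is a minimal sketch, not part of OccStudio: the helper names `installed_version` and `find_mismatches` are hypothetical, and the pins are copied from the commands above. It only reads package metadata, so it does not import torch or mmcv.

```python
from importlib import metadata

# Version pins copied from the installation commands above.
PINNED = {
    "torch": "2.0.1",
    "torchvision": "0.15.2",
    "mmdet": "2.28.2",
    "mmsegmentation": "0.30.0",
    "numpy": "1.23.4",
}


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pins, lookup=installed_version):
    """Map each package whose version differs from its pin to (expected, found)."""
    mismatches = {}
    for package, expected in pins.items():
        found = lookup(package)
        # Ignore local build tags such as "+cu118" when comparing.
        base = found.split("+")[0] if found else None
        if base != expected:
            mismatches[package] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for package, (expected, found) in find_mismatches(PINNED).items():
        print(f"{package}: expected {expected}, found {found or 'not installed'}")
```

An empty output means every pinned package matches. Installing other packages after the numpy pin may change the numpy version, so running the check last is the safer order.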

2. Data Preparation

nuScenes

Download the full nuScenes dataset from the official website.
Download the Occ3D nuScenes annotations from the project page.
(Optional) Download other community annotations for extended experiments:
- OpenOcc_v2.1 Annotations
- OpenOcc_v2.1 Ray Mask
- SurroundOcc Annotations (rename to gts_surroundocc)
- OpenOccupancy-v0.1 Annotations

Please organize the data into the following directory structure:

├── data
│   ├── nuscenes
│   │   ├── maps, samples, sweeps, v1.0-test, v1.0-trainval
│   │   ├── gts                 # Occ3D annotations
│   │   ├── gts_surroundocc     # (Optional) SurroundOcc annotations
│   │   ├── openocc_v2          # (Optional) OpenOcc annotations
│   │   ├── openocc_v2_ray_mask # (Optional) OpenOcc ray mask
│   │   └── nuScenes-Occupancy-v0.1 # (Optional) OpenOccupancy annotations
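
The layout above can be checked before running any preprocessing. This is a minimal sketch under stated assumptions: `check_layout` is a hypothetical helper, not an OccStudio tool, and it only tests that the directories named in the tree exist, not that their contents are complete.

```python
from pathlib import Path

# Directory names copied from the tree above.
REQUIRED_DIRS = ["maps", "samples", "sweeps", "v1.0-test", "v1.0-trainval", "gts"]
OPTIONAL_DIRS = [
    "gts_surroundocc",
    "openocc_v2",
    "openocc_v2_ray_mask",
    "nuScenes-Occupancy-v0.1",
]


def check_layout(nuscenes_root):
    """Return (missing_required, missing_optional) directory names."""
    root = Path(nuscenes_root)
    missing_required = [n for n in REQUIRED_DIRS if not (root / n).is_dir()]
    missing_optional = [n for n in OPTIONAL_DIRS if not (root / n).is_dir()]
    return missing_required, missing_optional


if __name__ == "__main__":
    required, optional = check_layout("data/nuscenes")
    if required:
        print("Missing required directories:", ", ".join(required))
    if optional:
        print("Missing optional directories:", ", ".join(optional))
    if not required:
        print("Required layout looks complete.")
```

Run it from the repository root so that the relative path `data/nuscenes` resolves correctly.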

Finally, run the preprocessing scripts:

# 1. Extract semantic segmentation labels from LiDAR
python tools/nusc_process/extract_sem_point.py

# 2. Create formatted info files for the dataloader
PYTHONPATH=$(pwd):$PYTHONPATH python tools/create_data_bevdet.py

Alternatively, you can download the pre-processed segmentation labels and the train.pkl and val.pkl files from our Hugging Face Hub, and organize them as follows:

OccStudio/
├── data/
│   ├── lidar_seg
│   ├── nuscenes/
│   │   ├── train.pkl
│   │   ├── val.pkl
│   │   ...
..
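
To confirm that a downloaded info file is readable, a quick inspection can help. The sketch below assumes only that train.pkl and val.pkl are standard Python pickle files; their internal structure is not documented here, so the hypothetical helper `summarize_pickle` reports just the top-level type and its keys or length. Unpickling can execute arbitrary code, so only load files from a source you trust.

```python
import pickle


def summarize_pickle(path):
    """Load a pickle file and describe its top-level object."""
    with open(path, "rb") as f:
        data = pickle.load(f)
    summary = {"type": type(data).__name__}
    if isinstance(data, dict):
        # List the keys without assuming which ones are present.
        summary["keys"] = sorted(str(k) for k in data)
    elif isinstance(data, (list, tuple)):
        summary["length"] = len(data)
    return summary


if __name__ == "__main__":
    for name in ("data/nuscenes/train.pkl", "data/nuscenes/val.pkl"):
        try:
            print(name, summarize_pickle(name))
        except FileNotFoundError:
            print(name, "not found")
```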

Waymo

Download the Waymo Open Dataset from the official website.
Download the Occ3D Waymo annotations and pkl files from here.
Follow the official instructions to organize the files.

3. Pre-trained Models

For training, please download pre-trained image backbones from BEVDet GitHub, GeoMIM GitHub, or Hugging Face Hub. Place them in the ckpts/pretrain/ directory.

🎮 Usage

Training

Use the following script for distributed training.

# Syntax: bash tools/dist_train.sh [CONFIG_FILE] [WORK_DIR] [NUM_GPUS]
# Example: Train the ALOcc-3D model
bash tools/dist_train.sh configs/alocc/alocc_3d_256x704_bevdet_preatrain.py work_dir/alocc_3d 8

Testing

Download our pre-trained models from Hugging Face and run the testing script.

# Evaluate semantic occupancy (mIoU) or occupancy flow
# Syntax: bash tools/dist_test.sh [CONFIG_FILE] [CHECKPOINT_PATH] [NUM_GPUS]
# Example: Evaluate the ALOcc-3D model
bash tools/dist_test.sh configs/alocc/alocc_3d_256x704_bevdet_preatrain.py ckpts/alocc_3d_256x704_bevdet_preatrain.pth 8

# Evaluate semantic occupancy (RayIoU)
# Syntax: bash tools/dist_test_ray.sh [CONFIG_FILE] [CHECKPOINT_PATH] [NUM_GPUS]
# Example: Evaluate the ALOcc-3D model
bash tools/dist_test_ray.sh configs/alocc/alocc_3d_256x704_bevdet_preatrain_wo_mask.py ckpts/alocc_3d_256x704_bevdet_preatrain_wo_mask.pth 8

Note: When performing inference with temporal fusion, please use 1 or 8 GPUs. A sampler bug may cause duplicate sample counting with other GPU configurations.
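
For readers unfamiliar with the mIoU metric reported by the test script, the sketch below shows the standard definition on a toy example: per-class IoU is TP / (TP + FP + FN), and mIoU is the mean over classes. This is not OccStudio's evaluation code. The actual protocol (class set, camera visible mask, RayIoU) is defined by the repository's evaluation scripts, and the function names here are hypothetical.

```python
def confusion_matrix(pred, target, num_classes):
    """Count voxels by class: rows are ground truth, columns are predictions."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        matrix[t][p] += 1
    return matrix


def per_class_iou(matrix):
    """Return the IoU of each class, or None for a class that never occurs."""
    n = len(matrix)
    ious = []
    for c in range(n):
        tp = matrix[c][c]
        fn = sum(matrix[c]) - tp
        fp = sum(matrix[r][c] for r in range(n)) - tp
        denom = tp + fp + fn
        ious.append(tp / denom if denom else None)
    return ious


def mean_iou(matrix):
    """Average the IoU over classes that occur in prediction or ground truth."""
    valid = [v for v in per_class_iou(matrix) if v is not None]
    return sum(valid) / len(valid) if valid else float("nan")


if __name__ == "__main__":
    # Six voxels, three classes (toy labels, not real data).
    pred = [0, 0, 1, 1, 2, 2]
    target = [0, 1, 1, 1, 2, 0]
    matrix = confusion_matrix(pred, target, num_classes=3)
    print(per_class_iou(matrix))  # IoU per class: 1/3, 2/3, 1/2
    print(mean_iou(matrix))       # mean of the three values
```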

Benchmarking

We provide convenient tools to benchmark model FPS (Frames Per Second) and FLOPs.

# Benchmark FPS
# Syntax: python tools/analysis_tools/benchmark.py [CONFIG_FILE]
# Example: Benchmark the ALOcc-3D model
python tools/analysis_tools/benchmark.py configs/alocc/alocc_3d_256x704_bevdet_preatrain.py

# Calculate FLOPs
# Syntax: python tools/analysis_tools/get_flops.py [CONFIG_FILE] --modality image --shape 256 704
# Example: Calculate FLOPs for the ALOcc-3D model
python tools/analysis_tools/get_flops.py configs/alocc/alocc_3d_256x704_bevdet_preatrain.py --modality image --shape 256 704
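
The general pattern behind an FPS measurement is a warm-up phase followed by a timed loop. The sketch below illustrates that pattern with a stand-in workload. It is not the implementation of tools/analysis_tools/benchmark.py, and `measure_fps` and its parameters are hypothetical.

```python
import time


def measure_fps(step, num_warmup=10, num_iters=50):
    """Time repeated calls to `step` and return iterations per second."""
    # Warm-up iterations are excluded so one-off setup costs do not skew the result.
    for _ in range(num_warmup):
        step()
    start = time.perf_counter()
    for _ in range(num_iters):
        step()
    # For a GPU model, call torch.cuda.synchronize() before reading the clock,
    # because CUDA kernels run asynchronously.
    elapsed = time.perf_counter() - start
    return num_iters / elapsed


if __name__ == "__main__":
    # Stand-in workload; replace with a single forward pass of your model.
    fps = measure_fps(lambda: sum(i * i for i in range(100_000)))
    print(f"{fps:.1f} iterations per second")
```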

Visualization

First, ensure you have Mayavi installed. You can install it using pip:

pip install mayavi

Before you can visualize the output, you need to run the model on the test set and save the prediction results.

Use the dist_test.sh script with the --save flag. This will store the model's output in a directory.

# Example: Evaluate the ALOcc-3D model and save the predictions
bash tools/dist_test.sh configs/alocc/alocc_3d_256x704_bevdet_preatrain.py ckpts/alocc_3d_256x704_bevdet_preatrain.pth 8 --save

The prediction results will be saved in the test/ directory, following a path structure like: test/[CONFIG_NAME]/[TIMESTAMP]/.
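
With several saved runs, it can be convenient to locate the most recent one programmatically. The sketch below assumes the [TIMESTAMP] directory names sort chronologically as strings (for example a YYYYMMDD_HHMMSS pattern), which the text above does not state. `latest_prediction_dir` is a hypothetical helper, not part of OccStudio.

```python
from pathlib import Path


def latest_prediction_dir(save_root, config_name):
    """Return the last run directory under save_root/config_name, sorted by name."""
    config_dir = Path(save_root) / config_name
    runs = sorted(
        (p for p in config_dir.iterdir() if p.is_dir()),
        key=lambda p: p.name,
    )
    if not runs:
        raise FileNotFoundError(f"No saved runs found in {config_dir}")
    return runs[-1]


if __name__ == "__main__":
    try:
        print(latest_prediction_dir("test", "alocc_3d_256x704_bevdet_preatrain"))
    except FileNotFoundError as err:
        print(err)
```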

Once the predictions are saved, you can run the visualization script. This script requires the path to the prediction results and the path to the ground truth data.

# Syntax: python tools/visual.py [PREDICTION_PATH] [GROUND_TRUTH_PATH]
# Example:
python tools/visual.py work_dirs/alocc_3d_256x704_bevdet_preatrain/xxxxxxxx_xxxxxx/ your/path/to/ground_truth

Replace work_dirs/alocc_3d_256x704_bevdet_preatrain/xxxxxxxx_xxxxxx/ with the actual path to the prediction results you saved in the previous step.
Replace your/path/to/ground_truth with the path to the corresponding ground truth dataset.

This will launch an interactive Mayavi window where you can inspect and compare the 3D occupancy predictions.

📊 Main Results

Here are the performance benchmarks of models implemented in OccStudio.

🏆 Performance on nuScenes (Models on Occ3D Are Trained with Camera Visible Mask)

Model	Annotation	Backbone	Input	Input Size	mIoU	mIoU_D	IoU	FPS	Memory	Checkpoint	Config
BEVDetOcc-SF	Occ3D	R-50	C	`256x704`	41.9	34.4	75.1	6.5	10717	🤗 HF	config
BEVDetOcc-GF	Occ3D	R-50	C	`256x704`	43.6	36.1	77.8	7.0	3017	🤗 HF	config
FB-Occ	Occ3D	R-50	C	`256x704`	39.8	34.2	69.9	10.3	4099	🤗 HF	config
FB-Occ-GF	Occ3D	R-50	C	`256x704`	42.1	36.4	73.3	10.3	2879	🤗 HF	config
ALOcc-2D-mini	Occ3D	R-50	C	`256x704`	41.4	35.4	70.0	30.5	1605	🤗 HF	config
ALOcc-2D	Occ3D	R-50	C	`256x704`	44.8	38.7	74.3	8.2	5553	🤗 HF	config
ALOcc-3D	Occ3D	R-50	C	`256x704`	45.5	39.3	75.3	6.0	10793	🤗 HF	config
ALOcc-3D	Occ3D	R-50	C+D	`256x704`	54.5	50.6	85.2	6.0	13003	🤗 HF	config
ALOcc-3D	Occ3D	Intern-T	C+D	`256x704`	55.6	52.4	85.1	5.8	13015	🤗 HF	config
ALOcc-3D	Occ3D	Swin-Base	C+D	`512x1408`	60.0	57.8	87.8	1.5	26867	🤗 HF	config
ALOcc-3D-GF	Occ3D	R-50	C	`256x704`	46.5	40.2	77.4	6.2	4347	🤗 HF	config
ALOcc-3D-GF	Occ3D	R-50	C+D	`256x704`	54.9	51.4	85.9	6.2	6561	🤗 HF	config
ALOcc-2D-GF	OpenOccupancy	R-50	C	`900x1600`	17.9	13.7	28.6	0.8	13857	🤗 HF	config
ALOcc-2D-GF	OpenOccupancy	R-50	C+D	`900x1600`	24.5	21.6	34.5	0.8	13891	🤗 HF	config
ALOcc-2D-mini*	SurroundOcc	R-50	C	`900x1600`	21.5	19.5	31.5	5.8	2869	🤗 HF	config
ALOcc-3D*	SurroundOcc	R-50	C	`900x1600`	24.0	21.7	34.7	1.7	11117	🤗 HF	config
ALOcc-3D-GF	SurroundOcc	R-50	C	`900x1600`	25.5	22.5	38.2	0.9	11857	🤗 HF	config

🏆 Performance on nuScenes (Trained w/o Camera Visible Mask)

Model	Annotation	Backbone	Input	Input Size	mIoU	RayIoU	RayIoU_{1m, 2m, 4m}	FPS	Memory	Checkpoint	Config
BEVDetOcc-SF	Occ3D	R-50	C	`256x704`	24.3	35.2	31.2, 35.9, 38.4	6.5	10717	🤗 HF	config
FB-Occ	Occ3D	R-50	C	`256x704`	31.1	39.0	33.0, 39.9, 44.0	10.3	4099	🤗 HF	config
SparseOcc	Occ3D	R-50	C	`256x704`	26.6	32.5	26.2, 33.2, 38.1	-	5967	🤗 HF	config
ALOcc-2D-mini	Occ3D	R-50	C	`256x704`	33.4	39.3	32.9, 40.1, 44.8	30.5	1605	🤗 HF	config
ALOcc-2D	Occ3D	R-50	C	`256x704`	37.4	43.0	37.1, 43.8, 48.2	8.2	5553	🤗 HF	config
ALOcc-3D	Occ3D	R-50	C	`256x704`	38.0	43.7	37.8, 44.7, 48.8	6.0	10793	🤗 HF	config
ALOcc-3D-GF	Occ3D	R-50	C	`256x704`	38.4	44.1	38.1, 45.1, 49.3	6.2	4347	🤗 HF	config

🏆 Performance on OpenOcc (Semantic Occupancy and Flow)

Method	Annotation	Backbone	Input	Input Size	Occ Score	mAVE	mAVE_TP	RayIoU	RayIoU_{1m, 2m, 4m}	FPS	Checkpoint	Config
ALOcc-Flow-2D	Occ3D	R-50	C	`256x704`	41.9	0.530	0.431	40.3	34.3, 41.0, 45.5	7.0	🤗 HF	config
ALOcc-Flow-3D	Occ3D	R-50	C	`256x704`	43.1	0.549	0.458	41.9	35.6, 42.9, 47.2	5.5	🤗 HF	config

🤝 Contribution

We welcome contributions from the community! If you find a bug, have a feature request, or want to contribute new models/datasets to OccStudio, please feel free to open an issue or submit a pull request. You can also contact Dubing Chen via email (dobbin.chen@gmail.com).

🙏 Acknowledgement

We gratefully acknowledge the foundational work of the many excellent open-source projects that OccStudio builds on.

📜 Citation

If you find OccStudio useful in your research, please consider citing our relevant papers:

@InProceedings{chen2025rethinking,
    author    = {Chen, Dubing and Zheng, Huan and Fang, Jin and Dong, Xingping and Li, Xianfei and Liao, Wenlong and He, Tao and Peng, Pai and Shen, Jianbing},
    title     = {Rethinking Temporal Fusion with a Unified Gradient Descent View for 3D Semantic Occupancy Prediction},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2025},
    pages     = {1505-1515}
}

@InProceedings{chen2025alocc,
    author    = {Chen, Dubing and Fang, Jin and Han, Wencheng and Cheng, Xinjing and Yin, Junbo and Xu, Chenzhong and Khan, Fahad Shahbaz and Shen, Jianbing},
    title     = {ALOcc: Adaptive Lifting-Based 3D Semantic Occupancy and Cost Volume-Based Flow Prediction},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2025},
}

@InProceedings{chen2025semantic,
    author    = {Chen, Dubing and Zheng, Huan and Zhou, Yucheng and Li, Xianfei and Liao, Wenlong and He, Tao and Peng, Pai and Shen, Jianbing},
    title     = {Semantic Causality-Aware Vision-Based 3D Occupancy Prediction},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2025},
}
