Markus Gross1,2,📧, Aya Fahmy1, Danit Niwattananan2, Dominik Muhle2, Rui Song1,2, Daniel Cremers 2, Henri Meeß1
1 Fraunhofer IVI, 2 TU Munich
Neural Information Processing Systems (NeurIPS) 2025
- [2025/11]: Code released
- [2025/10]: Project page online
- [2025/09]: IPFormer accepted at NeurIPS 2025
- [2025/06]: Paper preprint available on arXiv
We present IPFormer, the first method that leverages context-adaptive instance proposals at train and test time to address vision-based 3D Panoptic Scene Completion. Recent Transformer-based 3D vision approaches such as DETR or Symphonies utilize a fixed set of learned queries to represent objects within the scene volume. Although these queries are typically updated with image context during training, they remain static at test time, limiting their ability to adapt dynamically to the observed scene. To overcome this limitation, IPFormer initializes these static, non-adaptive queries as instance proposals, which are adaptively derived from image context at both train and test time. Extensive experimental results show that our method achieves state-of-the-art in-domain performance, exhibits superior zero-shot generalization on out-of-domain data, and offers a runtime reduction exceeding 14x. Check out the IPFormer YouTube video for a full explanation, including audio.
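The contrast between static queries and context-adaptive proposals can be sketched in a few lines of plain Python. This is an illustrative simplification only, not the repository's implementation: `static_queries`, `context_adaptive_proposals`, and the chunk-pooling scheme below are hypothetical stand-ins for the proposal mechanism described in the paper.

```python
import random

def static_queries(num_queries, dim, seed=0):
    """DETR-style queries: learned parameters, identical for every test image."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_queries)]

def context_adaptive_proposals(image_features, num_queries):
    """Toy stand-in for instance proposals: initialize each query by pooling a
    slice of the observed image features, so the initialization depends on the
    scene at both train and test time."""
    chunk = max(1, len(image_features) // num_queries)
    dim = len(image_features[0])
    proposals = []
    for i in range(num_queries):
        block = image_features[i * chunk:(i + 1) * chunk] or image_features[-chunk:]
        proposals.append([sum(v[d] for v in block) / len(block) for d in range(dim)])
    return proposals
```

The key difference: `static_queries` returns the same vectors regardless of the input, while `context_adaptive_proposals` produces different initializations for different observed scenes.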
Follow install.md
- SemanticKITTI for training
- SSCBench-KITTI-360 for zero-shot generalization evaluation
Follow dataset.md to process and structure them correctly.
We use three pretrained models:
- MobileStereoNetV2 depth estimation model used for lifting 2D to 3D
- EfficientNet image backbone for image feature extraction
- CGFormer's pretrained depth refinement module
Create the necessary directories and get the checkpoints as follows:
```shell
# Create required folders
mkdir pretrain
mkdir ckpts

# Depth refinement module
cd pretrain
wget https://github.com/markus-42/IPFormer/releases/download/v1.0.0/pretrain_geodepth.pth
cd ..

# Image backbones and the IPFormer checkpoint (split into parts)
cd ckpts
wget https://github.com/markus-42/IPFormer/releases/download/v1.0.0/efficientnet-b7_3rdparty_8xb32-aa_in1k_20220119-bf03951c.pth
wget https://github.com/markus-42/IPFormer/releases/download/v1.0.0/swin_tiny_patch4_window7_224.pth
wget https://github.com/markus-42/IPFormer/releases/download/v1.0.0/ipformer_semkitti_v1.0.0.part_aa
wget https://github.com/markus-42/IPFormer/releases/download/v1.0.0/ipformer_semkitti_v1.0.0.part_ab
cat ipformer_semkitti_v1.0.0.part_* > ipformer_semkitti.ckpt
```
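The final `cat` step simply byte-concatenates the split checkpoint parts in name order. If you are on a platform without `cat`, the same reassembly can be done in Python; `reassemble_checkpoint` below is a small helper sketch, not part of the repository:

```python
from pathlib import Path

def reassemble_checkpoint(part_paths, out_path):
    """Byte-concatenate split checkpoint parts in sorted order
    (part_aa, part_ab, ...), equivalent to
    `cat ipformer_semkitti_v1.0.0.part_* > ipformer_semkitti.ckpt`."""
    with open(out_path, "wb") as out:
        for part in sorted(Path(p) for p in part_paths):
            out.write(part.read_bytes())
    return Path(out_path).stat().st_size
```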
Our framework follows a dual-head, two-stage pipeline: the first stage and its corresponding head address Semantic Scene Completion (SSC), effectively guiding the latent space towards geometry and semantics. The second training stage and its corresponding head register individual instances, addressing full Panoptic Scene Completion (PSC).
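The two-stage schedule can be summarized abstractly as follows. All names here (`ssc_head`, `instance_head`, the stage table) are hypothetical placeholders chosen for illustration, not the repository's API, and which modules each stage actually updates is configured in the repo's config files.

```python
# Hypothetical sketch: stage 1 optimizes geometry and semantics (SSC head);
# stage 2 additionally trains the instance head for full panoptic output (PSC).
STAGES = [
    ("stage1_ssc", {"backbone", "ssc_head"}),                    # semantic pretraining
    ("stage2_psc", {"backbone", "ssc_head", "instance_head"}),   # full panoptic
]

def heads_trained_in(stage_name):
    """Return which heads a given stage updates under this toy schedule."""
    for name, heads in STAGES:
        if name == stage_name:
            return heads
    raise KeyError(stage_name)
```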
First, run Stage 1 (SSC-only) to pretrain the semantic branch:
```shell
python train_stage1.py \
    --config_path ./configs/IPFormer_config.py \
    --log_folder IPFormer_SemanticKITTI \
    --seed 7240 \
    --log_every_n_steps 100
```
After Stage 1 finishes and the checkpoint is saved, continue with Stage 2 (full PSC):
```shell
python train_stage2.py \
    --config_path ./configs/IPFormer_config.py \
    --log_folder IPFormer_SemanticKITTI \
    --seed 7240 \
    --log_every_n_steps 100
```

To evaluate a trained checkpoint directly:

```shell
python eval.py \
    --config_path ./configs/IPFormer_config.py \
    --ckpt_path ./ckpts/ipformer_semkitti.ckpt \
    --output_dir ./outputs/ \
    --measure_time
```

We build upon and thank the following projects:
- CGFormer
- PaSCo
- VoxFormer
- BEVFormer
- MonoScene
- MobileStereoNet
- mmdet3d
- semantic-kitti-api
- EfficientNet
If your work has been missed, please contact us and we will be happy to include it.
If IPFormer has contributed to your work, we would appreciate you citing our paper and giving the repository a star.
```bibtex
@inproceedings{gross2025ipformer,
  title={{IPF}ormer: Visual 3D Panoptic Scene Completion with Context-Adaptive Instance Proposals},
  author={Markus Gross and Aya Fahmy and Danit Niwattananan and Dominik Muhle and Rui Song and Daniel Cremers and Henri Meeß},
  booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems (NeurIPS)},
  year={2025}
}
```
