Markus Gross1,2,📧, Aya Fahmy1, Danit Niwattananan2, Dominik Muhle2, Rui Song1,2, Daniel Cremers 2, Henri Meeß1
1 Fraunhofer IVI, 2 TU Munich
Neural Information Processing Systems (NeurIPS) 2025
- [2025/11]: Code released
- [2025/10]: Project page online
- [2025/09]: IPFormer accepted at NeurIPS 2025
- [2025/06]: Paper preprint available on arXiv
We present IPFormer, the first method that leverages context-adaptive instance proposals at train and test time to address vision-based 3D Panoptic Scene Completion. Recent Transformer-based 3D vision approaches such as DETR or Symphonies utilize a fixed set of learned queries to represent objects within the scene volume. Although these queries are typically updated with image context during training, they remain static at test time, limiting their ability to adapt dynamically to the observed scene. To overcome this limitation, IPFormer initializes these static, non-adaptive queries as instance proposals, which are adaptively derived from image context at both train and test time. Extensive experimental results show that our method achieves state-of-the-art in-domain performance, exhibits superior zero-shot generalization on out-of-domain data, and offers a runtime reduction exceeding 14x. Check out the IPFormer YouTube video for a full explanation, including audio.
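The contrast between static queries and context-adaptive proposals can be sketched in a few lines of plain Python. This is an illustrative simplification only, not the repository's implementation: `static_queries`, `context_adaptive_proposals`, and the chunk-pooling scheme below are hypothetical stand-ins for the proposal mechanism described in the paper.

```python
import random

def static_queries(num_queries, dim, seed=0):
    """DETR-style queries: learned parameters, identical for every test image."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_queries)]

def context_adaptive_proposals(image_features, num_queries):
    """Toy stand-in for instance proposals: initialize each query by pooling a
    slice of the observed image features, so the initialization depends on the
    scene at both train and test time."""
    chunk = max(1, len(image_features) // num_queries)
    dim = len(image_features[0])
    proposals = []
    for i in range(num_queries):
        block = image_features[i * chunk:(i + 1) * chunk] or image_features[-chunk:]
        proposals.append([sum(v[d] for v in block) / len(block) for d in range(dim)])
    return proposals
```

The key difference: `static_queries` returns the same vectors regardless of the input, while `context_adaptive_proposals` produces different initializations for different observed scenes.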
Follow install.md
- SemanticKITTI for training
- SSCBench-KITTI-360 for zero-shot generalization evaluation
Follow dataset.md to process and structure them correctly.
We use three pretrained models:
- MobileStereoNetV2 depth estimation model used for lifting 2D to 3D
- EfficientNet image backbone for image feature extraction
- CGFormer's pretrained depth refinement module
Create the necessary directories and get the checkpoints as follows:
```shell
# Create required folders
mkdir pretrain
mkdir ckpts

# Depth refinement module
cd pretrain
wget https://github.com/markus-42/IPFormer/releases/download/v1.0.0/pretrain_geodepth.pth
cd ..

# Image backbones and the IPFormer checkpoint (split into parts)
cd ckpts
wget https://github.com/markus-42/IPFormer/releases/download/v1.0.0/efficientnet-b7_3rdparty_8xb32-aa_in1k_20220119-bf03951c.pth
wget https://github.com/markus-42/IPFormer/releases/download/v1.0.0/swin_tiny_patch4_window7_224.pth
wget https://github.com/markus-42/IPFormer/releases/download/v1.0.0/ipformer_semkitti_v1.0.0.part_aa
wget https://github.com/markus-42/IPFormer/releases/download/v1.0.0/ipformer_semkitti_v1.0.0.part_ab
cat ipformer_semkitti_v1.0.0.part_* > ipformer_semkitti.ckpt
```
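The final `cat` step simply byte-concatenates the split checkpoint parts in name order. If you are on a platform without `cat`, the same reassembly can be done in Python; `reassemble_checkpoint` below is a small helper sketch, not part of the repository:

```python
from pathlib import Path

def reassemble_checkpoint(part_paths, out_path):
    """Byte-concatenate split checkpoint parts in sorted order
    (part_aa, part_ab, ...), equivalent to
    `cat ipformer_semkitti_v1.0.0.part_* > ipformer_semkitti.ckpt`."""
    with open(out_path, "wb") as out:
        for part in sorted(Path(p) for p in part_paths):
            out.write(part.read_bytes())
    return Path(out_path).stat().st_size
```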
Our framework follows a dual-head, two-stage pipeline: the first stage and its corresponding head address Semantic Scene Completion (SSC), effectively guiding the latent space towards geometry and semantics. The second training stage and its corresponding head register individual instances, addressing full Panoptic Scene Completion (PSC).
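The two-stage schedule can be summarized abstractly as follows. All names here (`ssc_head`, `instance_head`, the stage table) are hypothetical placeholders chosen for illustration, not the repository's API, and which modules each stage actually updates is configured in the repo's config files.

```python
# Hypothetical sketch: stage 1 optimizes geometry and semantics (SSC head);
# stage 2 additionally trains the instance head for full panoptic output (PSC).
STAGES = [
    ("stage1_ssc", {"backbone", "ssc_head"}),                    # semantic pretraining
    ("stage2_psc", {"backbone", "ssc_head", "instance_head"}),   # full panoptic
]

def heads_trained_in(stage_name):
    """Return which heads a given stage updates under this toy schedule."""
    for name, heads in STAGES:
        if name == stage_name:
            return heads
    raise KeyError(stage_name)
```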
First, run Stage 1 (SSC-only) to pretrain the semantic branch:
```shell
python train_stage1.py \
    --config_path ./configs/IPFormer_config.py \
    --log_folder IPFormer_SemanticKITTI \
    --seed 7240 \
    --log_every_n_steps 100
```
After Stage 1 finishes and the checkpoint is saved, continue with Stage 2 (full PSC):
```shell
python train_stage2.py \
    --config_path ./configs/IPFormer_config.py \
    --log_folder IPFormer_SemanticKITTI \
    --seed 7240 \
    --log_every_n_steps 100
```

To evaluate a trained checkpoint directly:

```shell
python eval.py \
    --config_path ./configs/IPFormer_config.py \
    --ckpt_path ./ckpts/ipformer_semkitti.ckpt \
    --output_dir ./outputs/ \
    --measure_time
```

We build upon and thank the following projects:
- CGFormer
- PaSCo
- VoxFormer
- BEVFormer
- MonoScene
- MobileStereoNet
- mmdet3d
- semantic-kitti-api
- EfficientNet
If your work has been missed, please contact us and we will be happy to include it.
If IPFormer has contributed to your work, we would appreciate you citing our paper and giving the repository a star.
```bibtex
@inproceedings{gross2025ipformer,
  title={{IPF}ormer: Visual 3D Panoptic Scene Completion with Context-Adaptive Instance Proposals},
  author={Markus Gross and Aya Fahmy and Danit Niwattananan and Dominik Muhle and Rui Song and Daniel Cremers and Henri Meeß},
  booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems (NeurIPS)},
  year={2025}
}
```
