
IPFormer: Visual 3D Panoptic Scene Completion with Context-Adaptive Instance Proposals

Markus Gross¹,²📧  Aya Fahmy¹  Danit Niwattananan²  Dominik Muhle²  Rui Song¹,²  Daniel Cremers²  Henri Meeß¹

¹ Fraunhofer IVI,  ² TU Munich

Neural Information Processing Systems (NeurIPS) 2025



🚀 News

  • [2025/11]: Code released
  • [2025/10]: Project page online
  • [2025/09]: IPFormer accepted at NeurIPS 2025
  • [2025/06]: Paper preprint available on arXiv

💡 Introduction

We present IPFormer, the first method that leverages context-adaptive instance proposals at both train and test time to address vision-based 3D Panoptic Scene Completion. Recent Transformer-based 3D vision approaches such as DETR or Symphonies use a fixed set of learned queries to represent objects within the scene volume. Although these queries are typically updated with image context during training, they remain static at test time, limiting their ability to adapt dynamically to the observed scene. To overcome this limitation, IPFormer initializes these static, non-adaptive queries as instance proposals, which are adaptively derived from image context at both train and test time. Extensive experimental results show that our method achieves state-of-the-art in-domain performance, exhibits superior zero-shot generalization on out-of-domain data, and reduces runtime by more than 14x. See the IPFormer YouTube Video for a full explanation, including audio.
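To make the core idea concrete, the following minimal PyTorch sketch contrasts static learned queries with proposals pooled from image context. This is an illustrative simplification, not the IPFormer implementation; all class names, shapes, and the pooling scheme are assumptions.

# Illustrative sketch only -- not the IPFormer implementation.
# Contrasts static learned queries (DETR-style) with queries
# initialized from image context; names and shapes are assumptions.
import torch
import torch.nn as nn

class StaticQueries(nn.Module):
    """DETR-style: a fixed set of learned queries, identical for every input."""
    def __init__(self, num_queries=100, dim=256):
        super().__init__()
        self.queries = nn.Embedding(num_queries, dim)

    def forward(self, image_features):  # image_features: (B, HW, C)
        b = image_features.shape[0]
        # The same learned queries are used regardless of the observed scene.
        return self.queries.weight.unsqueeze(0).expand(b, -1, -1)

class ContextAdaptiveProposals(nn.Module):
    """Queries initialized from image context at both train and test time."""
    def __init__(self, num_queries=100, dim=256):
        super().__init__()
        self.pool = nn.Linear(dim, num_queries)  # per-pixel score for each proposal

    def forward(self, image_features):  # image_features: (B, HW, C)
        attn = self.pool(image_features).softmax(dim=1)  # (B, HW, Q), pooled over pixels
        return attn.transpose(1, 2) @ image_features     # (B, Q, C) context-derived proposals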


⚙️ Environment Setup

1. Setup Script

Follow install.md to set up the environment.

📁 Datasets

Follow dataset.md to download the datasets and to process and structure them correctly.


📋 Download Checkpoints

We use three pretrained models:

  1. MobileStereoNetV2: depth estimation model used for lifting 2D features to 3D
  2. EfficientNet: image backbone used for image feature extraction
  3. CGFormer's pretrained depth refinement module

Create the necessary directories and get the checkpoints as follows:

# Create required folders
mkdir -p pretrain ckpts

# Depth refinement module (CGFormer)
cd pretrain
wget https://github.com/markus-42/IPFormer/releases/download/v1.0.0/pretrain_geodepth.pth
cd ..

# Image backbone weights and the split IPFormer checkpoint
cd ckpts
wget https://github.com/markus-42/IPFormer/releases/download/v1.0.0/efficientnet-b7_3rdparty_8xb32-aa_in1k_20220119-bf03951c.pth
wget https://github.com/markus-42/IPFormer/releases/download/v1.0.0/swin_tiny_patch4_window7_224.pth
wget https://github.com/markus-42/IPFormer/releases/download/v1.0.0/ipformer_semkitti_v1.0.0.part_aa
wget https://github.com/markus-42/IPFormer/releases/download/v1.0.0/ipformer_semkitti_v1.0.0.part_ab

# Merge the split checkpoint into a single file
cat ipformer_semkitti_v1.0.0.part_* > ipformer_semkitti.ckpt
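
As a quick sanity check that the split parts were merged correctly, you can try loading the resulting file. This assumes a standard PyTorch/Lightning checkpoint layout, which may differ in practice.

# Quick sanity check of the merged checkpoint (assumes a standard
# PyTorch/Lightning checkpoint dict; actual keys may differ).
import torch

ckpt = torch.load("ckpts/ipformer_semkitti.ckpt", map_location="cpu")
if isinstance(ckpt, dict):
    print(sorted(ckpt.keys())[:10])  # e.g. 'state_dict', 'hyper_parameters', ...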

💪 Training

Our framework follows a dual-head, two-stage pipeline: the first stage and its corresponding head address Semantic Scene Completion (SSC), effectively guiding the latent space towards geometry and semantics. The second training stage and its corresponding head register individual instances, addressing full Panoptic Scene Completion (PSC). A conceptual sketch of this pattern follows the commands below.

First, run Stage 1 (SSC-only) to pretrain the semantic branch:

python train_stage1.py \
  --config_path ./configs/IPFormer_config.py \
  --log_folder IPFormer_SemanticKITTI \
  --seed 7240 \
  --log_every_n_steps 100

After Stage 1 finishes and the checkpoint is saved, continue with Stage 2 (full PSC):

python train_stage2.py \
  --config_path ./configs/IPFormer_config.py \
  --log_folder IPFormer_SemanticKITTI \
  --seed 7240 \
  --log_every_n_steps 100
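
Conceptually, Stage 2 continues from the Stage 1 weights while adding the instance head. The following is a hypothetical sketch of that dual-head pattern; the actual module names and checkpoint paths are defined by the configs and training scripts.

# Hypothetical sketch of the dual-head, two-stage pattern; the real module
# names and checkpoint paths are defined in the configs/training scripts.
import torch
import torch.nn as nn

class DualHeadSketch(nn.Module):
    def __init__(self, dim=256, num_classes=20):
        super().__init__()
        self.encoder = nn.Linear(dim, dim)           # stand-in for the shared backbone
        self.ssc_head = nn.Linear(dim, num_classes)  # Stage 1: semantics + geometry
        self.psc_head = nn.Linear(dim, dim)          # Stage 2: instance registration

model = DualHeadSketch()
# Stage 2 would typically resume from the Stage 1 checkpoint, e.g.:
# state = torch.load("path/to/stage1.ckpt", map_location="cpu")  # hypothetical path
# model.load_state_dict(state["state_dict"], strict=False)       # PSC head is new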

🔢 Evaluation

To evaluate a trained checkpoint directly:

python eval.py \
  --config_path ./configs/IPFormer_config.py \
  --ckpt_path ./ckpts/ipformer_semkitti.ckpt \
  --output_dir ./outputs/ \
  --measure_time
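
The --measure_time flag reports inference runtime. For reference, a typical GPU timing pattern looks like the generic sketch below; it is not the internal logic of eval.py.

# Generic GPU timing pattern -- not the internals of eval.py.
import time
import torch

def measure_runtime(model, batch, warmup=5, iters=20):
    with torch.no_grad():
        for _ in range(warmup):       # warm-up excludes one-time startup costs
            model(batch)
        torch.cuda.synchronize()      # flush queued kernels before timing
        start = time.perf_counter()
        for _ in range(iters):
            model(batch)
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters  # seconds per forward pass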

🙏 Acknowledgements

We build upon and thank the following projects:

If your work has been missed, please contact us and we will be happy to include it.


📜 Citation

If IPFormer has contributed to your work, please consider citing our paper and giving the repository a star.

@inproceedings{gross2025ipformer,
  title={{IPF}ormer: Visual 3D Panoptic Scene Completion with Context-Adaptive Instance Proposals},
  author={Markus Gross and Aya Fahmy and Danit Niwattananan and Dominik Muhle and Rui Song and Daniel Cremers and Henri Meeß},
  booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems (NeurIPS)},
  year={2025}
}
