Pedro Hermosilla, Christian Stippel, and Leon Sick
This is the official repository of the CVPR 2025 paper *Masked Scene Modeling: Narrowing the Gap Between Supervised and Self-Supervised Learning in 3D Scene Understanding*. It is mainly used to share the inference and pre-training code, as well as the pre-trained weights for the models described in our paper.
*Self-Supervised Feature Visualization using PCA*

Self-supervised learning has transformed 2D computer vision by enabling models trained on large, unannotated datasets to provide versatile off-the-shelf features that perform similarly to models trained with labels. However, in 3D scene understanding, self-supervised methods are typically only used as a weight initialization step for task-specific fine-tuning, limiting their utility for general-purpose feature extraction. This paper addresses this shortcoming by proposing a robust evaluation protocol specifically designed to assess the quality of self-supervised features for 3D scene understanding. Our protocol uses multi-resolution feature sampling of hierarchical models to create rich point-level representations that capture the semantic capabilities of the model and, hence, are suitable for evaluation with linear probing and nearest-neighbor methods. Furthermore, we introduce the first self-supervised model that performs similarly to supervised models when only off-the-shelf features are used in a linear probing setup. In particular, our model is trained natively in 3D with a novel self-supervised approach based on a Masked Scene Modeling objective, which reconstructs deep features of masked patches in a bottom-up manner and is specifically tailored to hierarchical 3D models. Our experiments not only demonstrate that our method achieves competitive performance to supervised models, but also surpasses existing self-supervised approaches by a large margin.
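As a rough illustration of the multi-resolution sampling idea, the hypothetical sketch below gathers, for every input point, the feature of its nearest cell at each level of a hierarchical encoder and concatenates them into a single point-level representation. All tensor names here are ours, and the paper's actual sampling/interpolation scheme may differ:

```python
import torch

def multires_point_features(point_xyz, level_feats, level_xyz):
    """Concatenate, per point, nearest-cell features from every hierarchy level.

    point_xyz:   (N, 3) input point positions
    level_feats: list of (M_l, C_l) feature maps, one per encoder level
    level_xyz:   list of (M_l, 3) cell-center positions, one per level
    """
    per_level = []
    for feats, centers in zip(level_feats, level_xyz):
        nn_idx = torch.cdist(point_xyz, centers).argmin(dim=1)  # nearest cell per point
        per_level.append(feats[nn_idx])                         # (N, C_l)
    return torch.cat(per_level, dim=1)                          # (N, sum of C_l)
```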
We provide code and instructions for the training, inference, and visualization of our model:
```
msm
├── inference/
│   ├── model/
│   ├── transform/
│   ├── utils/
│   ├── README.md
│   ├── dependencies.sh
│   ├── inference.py
│   ├── model_cfg.py
│   ├── preprocessor.py
│   └── requirements.txt
├── training/
│   ├── configs/
│   ├── libs/
│   ├── pointcept/
│   ├── tools/
│   ├── README.md
│   ├── dependencies.sh
│   └── requirements.txt
├── visualization/
│   ├── README.md
│   ├── blender_load_colored_pc.py
│   ├── blender_template.blend
│   └── create_pca.py
├── LICENSE
└── README.md
```
We provide the code to train the self-supervised model yourself and to evaluate the pre-trained model on the experiments presented in the paper. You can find more information in the training folder.
Our model can generate semantic off-the-shelf feature embeddings for any 3D scene. For this, we provide an installation manual and an inference script that works out of the box. To generate embeddings for your scene using our pre-trained model, please refer to the inference folder.
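For orientation, here is a hypothetical sketch of what the extraction flow looks like; the forward signature and argument names are assumptions on our part, and the actual, documented entry point is inference/inference.py (see inference/README.md):

```python
# Hypothetical extraction flow; the model's real forward signature may differ
# (see inference/model_cfg.py and inference/README.md for the actual interface).
import torch

@torch.no_grad()
def extract_embeddings(model: torch.nn.Module,
                       coords: torch.Tensor,   # (N, 3) point positions
                       colors: torch.Tensor    # (N, 3) point colors
                       ) -> torch.Tensor:
    """Run a pre-loaded MSM model on one scene, returning (N, C) per-point features."""
    model.eval()
    return model(coords, colors)  # assumed call signature; adapt to the real model
```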
We also provide the code used to generate the scene visualizations in our paper.
If you want to visualize your point embeddings using PCA, follow the instructions in the visualization folder.
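A minimal sketch of the PCA coloring step is shown below; create_pca.py is the reference implementation, and the file names here are placeholders:

```python
# Project per-point embeddings to RGB with PCA for visualization.
import numpy as np
from sklearn.decomposition import PCA

feats = np.load("my_scene_feats.npy")            # (N, C) per-point embeddings
rgb = PCA(n_components=3).fit_transform(feats)   # (N, 3) principal components
rgb = (rgb - rgb.min(0)) / (rgb.max(0) - rgb.min(0) + 1e-8)  # scale to [0, 1]
np.save("my_scene_pca_colors.npy", rgb)          # load with the Blender script
```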
Future Releases:
- Inference model code
- Pre-trained weights
- Pre-training code
- Pre-training config files
- PCA feature visualization code
If you find our work useful in your research, please cite our paper:
```bibtex
@inproceedings{hermosilla2025msm,
  title={Masked Scene Modeling: Narrowing the Gap Between Supervised and Self-Supervised Learning in 3D Scene Understanding},
  author={Hermosilla, Pedro and Stippel, Christian and Sick, Leon},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2025}
}
```