The ability to abstract complex 3D environments into simplified and structured representations is crucial across various domains. 3D semantic scene graphs (SSGs) achieve this by representing objects as nodes and their interrelationships as edges, facilitating high-level scene understanding. Existing methods for 3D SSG generation, however, face significant challenges, including high computational demands and non-incremental processing, which hinder their suitability for real-time open-world applications. To address these challenges, we propose FROSS (Faster-than-Real-Time Online 3D Semantic Scene Graph Generation), an approach for online, faster-than-real-time 3D SSG generation that directly lifts 2D scene graphs to 3D space and represents objects as 3D Gaussian distributions. This framework eliminates the dependency on precise and computationally intensive point cloud processing. Furthermore, we extend the Replica dataset with inter-object relationship annotations, creating the ReplicaSSG dataset for comprehensive evaluation of FROSS. Experimental results on the ReplicaSSG and 3DSSG datasets show that FROSS achieves superior performance while operating significantly faster than prior 3D SSG generation methods.
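As a rough intuition for the lifting step (a minimal sketch under assumed conventions, not FROSS's actual implementation): a 2D detection can be lifted to a 3D Gaussian by back-projecting the depth pixels inside its bounding box with the camera intrinsics and pose, then fitting a mean and covariance to the resulting points. All function and parameter names below are illustrative.

import numpy as np

def lift_detection_to_gaussian(depth, bbox, K, T_wc):
    """Back-project depth pixels inside a 2D box and fit a 3D Gaussian.

    depth: (H, W) depth map in meters
    bbox:  (x0, y0, x1, y1) detection box in pixels
    K:     (3, 3) camera intrinsics
    T_wc:  (4, 4) camera-to-world pose
    """
    x0, y0, x1, y1 = [int(v) for v in bbox]
    us, vs = np.meshgrid(np.arange(x0, x1), np.arange(y0, y1))
    z = depth[vs, us]
    valid = z > 0  # discard pixels with missing depth
    us, vs, z = us[valid], vs[valid], z[valid]
    # Pinhole back-projection into camera coordinates
    x = (us - K[0, 2]) * z / K[0, 0]
    y = (vs - K[1, 2]) * z / K[1, 1]
    pts_cam = np.stack([x, y, z, np.ones_like(z)])      # (4, N) homogeneous
    pts_world = (T_wc @ pts_cam)[:3].T                  # (N, 3)
    return pts_world.mean(axis=0), np.cov(pts_world.T)  # Gaussian (mu, Sigma)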
- Installation
- Prepare Dataset
- Download Pretrained RT-DETR-EGTR Weights
- Pretrain RT-DETR Object Detector on 3RScan and ReplicaSSG (Optional)
- Train RT-DETR-EGTR 2D Scene Graph Generator on 3RScan and ReplicaSSG (Optional)
- Estimate Camera Trajectory for ReplicaSSG Using ORB-SLAM3 (Optional)
- Run FROSS
- Evaluate FROSS
- Visualize Output
- Citation
- References
Tested with Python 3.9 and CUDA 12.1 on Ubuntu 22.04.4.
Required system package:
- libvips-dev
git clone https://github.com/Howardkhh/FROSS.git
cd FROSS
pip install torch==2.3.0 torchvision==0.18.0 --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt -f https://data.pyg.org/whl/torch-2.3.0+cu121.html
cd EGTR/lib/fpn
sh make.sh
cd ../../..

Agree to the terms of use, get the download script from here, and save it as 3RScan.py.
You may want to parallelize the script for faster downloads; a hypothetical sketch follows the command below.
python 3RScan.py -o Datasets/3RScan/data
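One possible way to parallelize the download, assuming 3RScan.py accepts a per-scan --id flag (this flag is an assumption, not confirmed here; check the script's --help first):

import subprocess
from concurrent.futures import ThreadPoolExecutor

# Hypothetical scan IDs; the full list can be read from 3RScan.json.
scan_ids = ["<scan-id-1>", "<scan-id-2>"]

def download(scan_id):
    # --id is an assumed flag of the download script
    subprocess.run(
        ["python", "3RScan.py", "-o", "Datasets/3RScan/data", "--id", scan_id],
        check=True,
    )

with ThreadPoolExecutor(max_workers=8) as pool:
    list(pool.map(download, scan_ids))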
wget "http://campar.in.tum.de/public_datasets/3RScan/3RScan.json" -P Datasets/3RScan/data
wget "http://campar.in.tum.de/public_datasets/3DSSG/3DSSG/objects.json" -P Datasets/3RScan/data
wget "http://campar.in.tum.de/public_datasets/3DSSG/3DSSG/relationships.json" -P Datasets/3RScan/datagit clone https://github.com/WaldJohannaU/3RScan.git
cd 3RScan/c++

Build the rio_renderer (not rio_example) following the instructions in the 3RScan repository.
Encountering an error when building rio_renderer?
If you encounter an error similar to the one below when building rio_renderer with the make command:
[ 50%] Linking CXX executable rio_renderer
/usr/bin/ld: CMakeFiles/rio_renderer.dir/src/renderer.cc.o: warning: relocation against `__glewGenVertexArrays' in read-only section `.text._ZN5Model11processMeshEP6aiMeshPK7aiScene[_ZN5Model11processMeshEP6aiMeshPK7aiScene]'
/usr/bin/ld: CMakeFiles/rio_renderer.dir/src/renderer.cc.o: in function `RIO::Renderer::ReadRGB(cv::Mat&)': renderer.cc:(.text+0x1ed0): undefined reference to `__glewBindFramebuffer'
/usr/bin/ld: CMakeFiles/rio_renderer.dir/src/renderer.cc.o: in function `RIO::Renderer::Render(Model&, Shader&)': renderer.cc:(.text+0x2d95): undefined reference to `__glewUseProgram'
/usr/bin/ld: renderer.cc:(.text+0x2ddf): undefined reference to `__glewUniformMatrix4fv'
/usr/bin/ld: renderer.cc:(.text+0x2ed8): undefined reference to `__glewGetUniformLocation'
/usr/bin/ld: renderer.cc:(.text+0x3016): undefined reference to `__glewUniform1i'
/usr/bin/ld: renderer.cc:(.text+0x3042): undefined reference to `__glewGetUniformLocation'
/usr/bin/ld: renderer.cc:(.text+0x322e): undefined reference to `__glewActiveTexture'
/usr/bin/ld: renderer.cc:(.text+0x35aa): undefined reference to `__glewBindVertexArray'
/usr/bin/ld: renderer.cc:(.text+0x35cf): undefined reference to `__glewBindVertexArray'
/usr/bin/ld: renderer.cc:(.text+0x35ea): undefined reference to `__glewActiveTexture'

The undefined __glew* references indicate that the GLEW library is not being linked. Try patching the CMakeLists.txt file with the provided patch:
cd ../../.. # back to 3RScan directory
git apply ../Scripts/files/rio_renderer.patch
cd c++/rio_renderer/build
make # try building again

Render depth maps from the 3RScan dataset using the renderer.
You may need a VNC server to run the renderer in a headless environment.
(For example: vncserver && export DISPLAY=:1.0)
cd ../../../.. # back to FROSS directory
python3 Scripts/dataset/extract_and_preprocess_3RScan.py --path ./Datasets/3RScan/ --rio_renderer_path ./3RScan/c++/rio_renderer/build/

Check data integrity.
python Scripts/dataset/check.py --path Datasets/3RScan

The output should look like the example below.
Number of folders: 1482
Number of folders with sequence folder: 1482
Number of folders with all images: 1482
Number of images: 363555
Number of images with bounding box files: 363555
Number of rendered color images: 363555
Number of rendered depth images: 363555
Number of rendered label images: 363555
Number of visibility files: 363555
Number of instance files: 363555

cd Scripts
bash prepare_datasets.sh
cd ..

Download and process the ReplicaSSG dataset according to the instructions.
Move the dataset folder to ./Datasets
# For example
mv ~/ReplicaSSG/Replica ./Datasets

Then extract 2D scene graphs from the ReplicaSSG dataset.
python Scripts/dataset/boxes2coco.py --path ./Datasets/Replica --label_categories replica
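boxes2coco.py converts the box annotations to COCO-style JSON. For reference, a minimal example of the generic COCO detection layout (illustrative; the script's exact fields may differ):

x, y, w, h = 100, 120, 80, 60  # example box: top-left corner plus width/height, in pixels
coco = {
    "images": [{"id": 1, "file_name": "frame_000000.jpg", "width": 640, "height": 480}],
    "annotations": [{
        "id": 1, "image_id": 1, "category_id": 3,
        "bbox": [x, y, w, h], "area": w * h, "iscrowd": 0,
    }],
    "categories": [{"id": 3, "name": "chair"}],
}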
If you want to train RT-DETR-EGTR on the Visual Genome dataset, please download it according to these instructions.
The file structure should look like this:
Datasets
└── visual_genome
├── images
├── rel.json
├── test.json
├── train.json
└── val.json
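A quick sanity check of this layout before training (a minimal sketch using the paths listed above):

import os

# Verify the expected Visual Genome layout
root = "Datasets/visual_genome"
for name in ["images", "rel.json", "test.json", "train.json", "val.json"]:
    path = os.path.join(root, name)
    print(("OK      " if os.path.exists(path) else "MISSING ") + path)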
You can download the pretrained RT-DETR-EGTR weights from the following links:
Extract them into the weights/RT-DETR-EGTR directory. You may skip the next two steps if you have downloaded the pretrained weights.
mkdir -p weights/RT-DETR-EGTR
cd weights/RT-DETR-EGTR
# Put the downloaded weight zip files here
unzip 3RScan20.zip
unzip VG.zip
cd ../..

Export the model to ONNX and TensorRT formats:
# 3RScan dataset
PYTHONPATH=. python Scripts/tools/export_onnx_trt.py --artifact_path weights/RT-DETR-EGTR/3RScan20/egtr__RT-DETR__3RScan20__last.pth/batch__6__epochs__50_25__lr__2e-07_2e-06_0.0002__finetune/version_0
# ReplicaSSG dataset
PYTHONPATH=. python Scripts/tools/export_onnx_trt.py --artifact_path weights/RT-DETR-EGTR/VG/egtr__RT-DETR__VG__last.pth/batch__6__epochs__50_25__lr__2e-07_2e-06_2e-05__finetune/version_0
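To quickly verify that an exported ONNX file is well-formed, the onnx package can be used (the path below is a placeholder; use the file produced by export_onnx_trt.py):

import onnx

# Placeholder path: substitute the ONNX file written by the export script
model = onnx.load("path/to/exported_model.onnx")
onnx.checker.check_model(model)  # raises if the graph is malformed
print("ONNX graph is well-formed")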
mkdir -p weights/RT-DETR
wget https://github.com/lyuwenyu/storage/releases/download/v0.1/rtdetrv2_r50vd_m_7x_coco_ema.pth -P weights/RT-DETR/

# 3RScan dataset
export NUM_PROC=4 # number of GPUs
OMP_NUM_THREADS=4 torchrun --log_dir logs/RT-DETR -r 3 -t 3 --master_port=9909 --nproc_per_node=$NUM_PROC RT-DETR/rtdetrv2_pytorch/tools/train.py -c RT-DETR/rtdetrv2_pytorch/configs/rtdetrv2/rtdetrv2_r50vd_m_7x_3rscan20.yml -t weights/RT-DETR/rtdetrv2_r50vd_m_7x_coco_ema.pth --output-dir weights/RT-DETR/3RScan20 --use-amp --seed=0
# ReplicaSSG dataset
OMP_NUM_THREADS=4 torchrun --log_dir logs/RT-DETR -r 3 -t 3 --master_port=9909 --nproc_per_node=$NUM_PROC RT-DETR/rtdetrv2_pytorch/tools/train.py -c RT-DETR/rtdetrv2_pytorch/configs/rtdetrv2/rtdetrv2_r50vd_m_7x_vg.yml -t weights/RT-DETR/rtdetrv2_r50vd_m_7x_coco_ema.pth --output-dir weights/RT-DETR/VG --use-amp --seed=0

# 3RScan dataset
OMP_NUM_THREADS=4 torchrun --master_port=9909 --nproc_per_node=$NUM_PROC RT-DETR/rtdetrv2_pytorch/tools/train.py -c RT-DETR/rtdetrv2_pytorch/configs/rtdetrv2/rtdetrv2_r50vd_m_7x_3rscan20.yml -r weights/RT-DETR/3RScan20/last.pth --test-only
# ReplicaSSG dataset
OMP_NUM_THREADS=4 torchrun --master_port=9909 --nproc_per_node=$NUM_PROC RT-DETR/rtdetrv2_pytorch/tools/train.py -c RT-DETR/rtdetrv2_pytorch/configs/rtdetrv2/rtdetrv2_r50vd_m_7x_vg.yml -r weights/RT-DETR/VG/last.pth --test-only

cd EGTR
# 3RScan dataset
python train_rtdetr_egtr.py --data_path ../Datasets/3RScan/2DSG20 --output_path ../weights/RT-DETR-EGTR/3RScan20 --pretrained ../weights/RT-DETR/3RScan20/last.pth --gpus $NUM_PROC
# ReplicaSSG dataset
python train_rtdetr_egtr.py --data_path ../Datasets/visual_genome --output_path ../weights/RT-DETR-EGTR/VG --pretrained ../weights/RT-DETR/VG/last.pth --gpus $NUM_PROC --lr_initialized 2e-5
cd ..

Please change the artifact path to the path of the trained model.
# 3RScan dataset
PYTHONPATH=. python Scripts/tools/export_onnx_trt.py --artifact_path weights/RT-DETR-EGTR/3RScan20/egtr__RT-DETR__3RScan20__last.pth/batch__24__epochs__50_25__lr__2e-07_2e-06_0.0002__finetune/version_0
# ReplicaSSG dataset
PYTHONPATH=. python Scripts/tools/export_onnx_trt.py --artifact_path weights/RT-DETR-EGTR/VG/egtr__RT-DETR__VG__last.pth/batch__24__epochs__50_25__lr__2e-07_2e-06_2e-05__finetune/version_0

# 3RScan dataset
python EGTR/evaluate_rtdetr_egtr.py --data_path Datasets/3RScan/2DSG20 --artifact_path weights/RT-DETR-EGTR/3RScan20/egtr__RT-DETR__3RScan20__last.pth/batch__24__epochs__50_25__lr__2e-07_2e-06_0.0002__finetune/version_0
# ReplicaSSG dataset
python EGTR/evaluate_rtdetr_egtr.py --data_path Datasets/visual_genome/ --artifact_path weights/RT-DETR-EGTR/VG/egtr__RT-DETR__VG__last.pth/batch__24__epochs__50_25__lr__2e-07_2e-06_2e-05__finetune/version_0

1. Build ORB-SLAM3 following the instructions.
cd Scripts/dataset
python generate_association.py --replica_path ../../Datasets/Replica

We found that ORB_SLAM3::System sometimes crashes when closing the viewer. Consider turning off the viewer by setting the fourth argument to false on Line 62 of ORB_SLAM3/Examples/RGB-D/rgbd_tum.cc, then rebuild.
// Create SLAM system. It initializes all system threads and gets ready to process frames.
ORB_SLAM3::System SLAM(argv[1],argv[2],ORB_SLAM3::System::RGBD,false);

If you want to use the viewer for visualization, you may need a VNC server to run ORB-SLAM3 in a headless environment.
(For example: vncserver && export DISPLAY=:1.0)
In addition, patch the src/Sim3Solver.cc file in the ORB-SLAM3 repository to resolve NaN errors, as described in this issue.
# Example bash script to run ORB-SLAM3 on all scenes in ReplicaSSG
cd <orbslam_path> # ~/ORB_SLAM3
set -euxo pipefail
scenes=("room_0" "room_1" "room_2" "hotel_0" "frl_apartment_0" "frl_apartment_1" "frl_apartment_2" "frl_apartment_3" "frl_apartment_4" "frl_apartment_5" "apartment_0" "apartment_1" "apartment_2" "office_0" "office_1" "office_2" "office_3" "office_4")
replica_path=<replica_path> # path to the Replica dataset (e.g. ~/FROSS/Datasets/Replica)
# Activate a Python 2 virtual environment with NumPy and Matplotlib installed
source .py2_venv/bin/activate
for scene in "${scenes[@]}"
do
./Examples/RGB-D/rgbd_tum Vocabulary/ORBvoc.txt ${replica_path}/ReplicaSSG/ORBSLAM3_parameters.yaml ${replica_path}/data/${scene} ${replica_path}/data/${scene}/association.txt
mv CameraTrajectory.txt CameraTrajectory_${scene}.txt
mv KeyFrameTrajectory.txt KeyFrameTrajectory_${scene}.txt
python2 evaluation/evaluate_ate_scale.py ${replica_path}/data/${scene}/trajectory_gt.txt CameraTrajectory_${scene}.txt --plot ${scene}.pdf --verbose --verbose2 > ATE_scale_${scene}.txt
done

cd Scripts/dataset
python convert_SLAM_trajectory.py --replica_path ../../Datasets/Replica --orbslam_path <orbslam_path>
cd ../..
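For reference, rgbd_tum saves trajectories in the TUM format, one pose per line as timestamp tx ty tz qx qy qz qw. A minimal sketch of turning one such line into a 4x4 camera-to-world matrix (illustrative only; convert_SLAM_trajectory.py performs the actual conversion):

import numpy as np

def tum_line_to_matrix(line):
    """Convert one TUM-format trajectory line into a 4x4 camera-to-world matrix."""
    _, tx, ty, tz, qx, qy, qz, qw = map(float, line.split())
    # Rotation matrix from a unit quaternion (x, y, z, w convention)
    R = np.array([
        [1 - 2*(qy*qy + qz*qz), 2*(qx*qy - qz*qw),     2*(qx*qz + qy*qw)],
        [2*(qx*qy + qz*qw),     1 - 2*(qx*qx + qz*qz), 2*(qy*qz - qx*qw)],
        [2*(qx*qz - qy*qw),     2*(qy*qz + qx*qw),     1 - 2*(qx*qx + qy*qy)],
    ])
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = [tx, ty, tz]
    return T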
main.py parameters:
- --use_gt_sg: Use the ground truth 2D scene graph instead of RT-DETR-EGTR predictions.
- --not_use_gt_pose: Use the SLAM trajectory instead of the ground truth camera pose.
- --not_preload: Do not preload all images into memory before running each scene. Set this if you run out of RAM; leave it unset when measuring runtime performance.
cd Merging
# 3RScan dataset
python main.py --artifact_path ../weights/RT-DETR-EGTR/3RScan20/egtr__RT-DETR__3RScan20__last.pth/batch__6__epochs__50_25__lr__2e-07_2e-06_0.0002__finetune/version_0/ --dataset_path ../Datasets/3RScan
# With ground truth 2D scene graph
python main.py --dataset_path ../Datasets/3RScan --use_gt_sg
# ReplicaSSG dataset
python main.py --artifact_path ../weights/RT-DETR-EGTR/VG/egtr__RT-DETR__VG__last.pth/batch__6__epochs__50_25__lr__2e-07_2e-06_2e-05__finetune/version_0/ --dataset_path ../Datasets/Replica --label_categories replica
# With ground truth 2D scene graph
python main.py --dataset_path ../Datasets/Replica --label_categories replica --use_gt_sg
# With SLAM trajectory
python main.py --artifact_path ../weights/RT-DETR-EGTR/VG/egtr__RT-DETR__VG__last.pth/batch__6__epochs__50_25__lr__2e-07_2e-06_2e-05__finetune/version_0/ --dataset_path ../Datasets/Replica --label_categories replica --not_use_gt_pose
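The prediction filenames below encode the run's thresholds (e.g., obj0.7 for object confidence, hell0.85 presumably the Hellinger-distance threshold used when merging Gaussian objects). For intuition, a sketch of the squared Hellinger distance between two multivariate Gaussians (the standard formula, not FROSS's exact code):

import numpy as np

def hellinger_squared(mu1, cov1, mu2, cov2):
    """Squared Hellinger distance between two multivariate Gaussians."""
    cov_avg = (cov1 + cov2) / 2.0
    diff = mu1 - mu2
    # Determinant ratio measures overlap of the covariance shapes
    coeff = (np.linalg.det(cov1) ** 0.25 * np.linalg.det(cov2) ** 0.25
             / np.sqrt(np.linalg.det(cov_avg)))
    # Exponential term penalizes separation between the means
    expo = np.exp(-0.125 * diff @ np.linalg.solve(cov_avg, diff))
    return 1.0 - coeff * expo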
# 3RScan dataset
python evaluate.py --dataset_path ../Datasets/3RScan/ --prediction_path output/scannet/predictions_gaussian_obj0.7_rel10_hell0.85_kfnone_test_gtpose.pkl
# With ground truth 2D scene graph
python evaluate.py --dataset_path ../Datasets/3RScan/ --prediction_path output/scannet/predictions_gaussian_obj0.7_rel10_hell0.85_kfnone_test_gt2dsg_gtpose.pkl
# ReplicaSSG dataset
python evaluate.py --dataset_path ../Datasets/Replica/ --label_categories replica --prediction_path output/replica/predictions_gaussian_obj0.7_rel10_hell0.85_kfnone_test_gtpose.pkl
# With ground truth 2D scene graph
python evaluate.py --dataset_path ../Datasets/Replica/ --label_categories replica --prediction_path output/replica/predictions_gaussian_obj0.7_rel10_hell0.85_kfnone_test_gt2dsg_gtpose.pkl
# With SLAM trajectory
python evaluate.py --dataset_path ../Datasets/Replica/ --label_categories replica --prediction_path output/replica/predictions_gaussian_obj0.7_rel10_hell0.85_kfnone_test.pkl

To generate visualization videos as shown on our project page (using ReplicaSSG as an example):
1. Specify --visualize_folder in main.py.
python main.py --artifact_path ../weights/RT-DETR-EGTR/VG/egtr__RT-DETR__VG__last.pth/batch__6__epochs__50_25__lr__2e-07_2e-06_2e-05__finetune/version_0/ --dataset_path ../Datasets/Replica --label_categories replica --visualize_folder <visualization_output_folder>

2. Use the render.sh script in the Visualization folder to visualize the output.
cd Visualization
bash render.sh ../../Datasets/Replica/data ../<visualization_output_folder>
@InProceedings{hou2025fross,
author = {Hao-Yu Hou and Chun-Yi Lee and Motoharu Sonogashira and Yasutomo Kawanishi},
title = {{FROSS}: {F}aster-than-{R}eal-{T}ime {O}nline 3{D} {S}emantic {S}cene {G}raph {G}eneration from {RGB-D} {I}mages},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
month = {October},
year = {2025},
pages = {28818--28827}
}
