Tianye Ding1*,
Yiming Xie1*,
Yiqing Liang2*,
Moitreya Chatterjee3,
Pedro Miraldo3,
Huaizu Jiang1
1 Northeastern University, 2 Independent Researcher, 3 Mitsubishi Electric Research Laboratories
* Equal Contribution
- [2025-12-15] arXiv preprint released.
- Release framework codebase
- Release inference code
- Add data preparation instructions
- Release evaluation code
- Add Viser integration
- Release loop-closure demo
We propose LASER, a training-free framework that converts an offline reconstruction model into a streaming system by aligning predictions across consecutive temporal windows. We observe that a simple similarity-transformation (Sim(3)) alignment fails due to layer depth misalignment: monocular scale ambiguity causes the relative depth scales of different scene layers to vary inconsistently between windows. To address this, we introduce layer-wise scale alignment, which segments depth predictions into discrete layers, computes per-layer scale factors, and propagates them across both adjacent windows and timestamps.
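To make the core idea concrete, below is a minimal NumPy sketch of per-layer scale alignment between the depth predictions of two windows for a shared frame. It is illustrative only: the function name, the quantile-based layering, and the median scale estimator are our assumptions rather than the released implementation, and the propagation of scales across windows and timestamps is omitted.

```python
# Minimal sketch of layer-wise scale alignment (illustrative only; names and
# design choices below are assumptions, not the LASER codebase).
import numpy as np

def layerwise_scale_align(depth_prev, depth_curr, num_layers=4):
    """Align depth_curr to depth_prev on an overlapping frame by estimating
    one scale factor per depth layer instead of a single global Sim(3) scale.

    depth_prev, depth_curr: (H, W) depth maps predicted for the same frame by
    the previous and current temporal windows.
    """
    # 1. Segment the previous window's depth into discrete layers (here by quantiles).
    edges = np.quantile(depth_prev, np.linspace(0.0, 1.0, num_layers + 1))
    layer_ids = np.clip(np.searchsorted(edges, depth_prev, side="right") - 1,
                        0, num_layers - 1)

    # 2. Estimate one robust (median) scale factor per layer from overlapping pixels.
    scales = np.ones(num_layers)
    for k in range(num_layers):
        mask = layer_ids == k
        if mask.any():
            scales[k] = np.median(depth_prev[mask] / np.maximum(depth_curr[mask], 1e-6))

    # 3. Apply the per-layer scales to the current window's prediction.
    aligned = depth_curr * scales[layer_ids]
    return aligned, scales
```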
# 1. Clone the repository
git clone --recursive git@github.com:neu-vi/LASER.git
cd LASER
# 2. Create environment
conda create -n laser -y python=3.11
conda activate laser
# 3. Install dependencies
pip install -r requirements.txt
# 4. Compile cython modules
python setup.py build_ext --inplace
# 5. Install Viser
pip install -e viser

(Optional) Download checkpoints needed for loop-closure inference:
bash ./scripts/download_weights.sh

To run the inference code, you can use the following command:
export PYTHONPATH="./":$PYTHONPATH
python demo.py \
--data_path DATA_PATH \
--output_path "./viser_results" \
--cache_path "./cache" \
--sample_interval SAMPLE_INTERVAL \
--window_size WINDOW_SIZE \
--overlap OVERLAP \
--depth_refine
# example inference script
python demo.py \
--data_path "examples/titanic" \
--output_path "./viser_results" \
--cache_path "./cache" \
--sample_interval 1 \
--window_size 30 \
--overlap 10 \
--depth_refine

The results will be saved in the viser_results/SEQ_NAME directory for future visualization.
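The --sample_interval, --window_size, and --overlap arguments control how the input frames are grouped into overlapping temporal windows before alignment. A minimal sketch of one plausible windowing scheme (illustrative only; the released code may chunk frames differently):

```python
# Illustrative only: one plausible way to form overlapping temporal windows.
def make_windows(num_frames, sample_interval=1, window_size=30, overlap=10):
    frames = list(range(0, num_frames, sample_interval))  # subsample the input stream
    stride = window_size - overlap                        # new frames added per window
    windows = []
    for start in range(0, max(len(frames) - overlap, 1), stride):
        windows.append(frames[start:start + window_size])
    return windows

# With 100 frames and the example settings above, windows start at frames 0, 20, 40, 60, 80.
print(make_windows(100))
```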
To visualize the interactive 4D results, you can use the following command:
python viser/visualizer_monst3r.py --data viser_results/SEQ_NAME
# example visualization script
python viser/visualizer_monst3r.py --data viser_results/titanic

Please refer to MonST3R for dataset setup details.
Put all datasets in data/.
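For reference, an illustrative layout after preparation (the exact subfolder names follow the MonST3R preparation scripts and may differ):

```
data/
├── sintel/
├── bonn/
├── kitti/
├── scannet/
└── tum/
```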
Sintel
export PYTHONPATH="./":$PYTHONPATH
CUDA_VISIBLE_DEVICES=0 torchrun --nproc_per_node=1 --master_port=12345 eval_launch.py \
--mode=eval_pose \
--model=streaming_pi3 \
--eval_dataset=sintel \
--output_dir="outputs/video_depth/sintel_depth" \
--full_seq \
--no_crop
CUDA_VISIBLE_DEVICES=0 torchrun --nproc_per_node=1 --master_port=12345 depth_metric.py \
--eval_dataset=sintel \
--result_dir="outputs/video_depth/sintel_depth" \
--output_dir="outputs/video_depth"Bonn
export PYTHONPATH="./":$PYTHONPATH
CUDA_VISIBLE_DEVICES=0 torchrun --nproc_per_node=1 --master_port=12345 eval_launch.py \
--mode=eval_pose \
--model=streaming_pi3 \
--eval_dataset=bonn \
--output_dir="outputs/video_depth/bonn_depth" \
--no_crop
CUDA_VISIBLE_DEVICES=0 torchrun --nproc_per_node=1 --master_port=12345 depth_metric.py \
--eval_dataset=bonn \
--result_dir="outputs/video_depth/bonn_depth" \
--output_dir="outputs/video_depth"KITTI
export PYTHONPATH="./":$PYTHONPATH
CUDA_VISIBLE_DEVICES=0 torchrun --nproc_per_node=1 --master_port=12345 eval_launch.py \
--mode=eval_pose \
--model=streaming_pi3 \
--eval_dataset=kitti \
--output_dir="outputs/video_depth/kitti_depth" \
--no_crop \
--flow_loss_weight 0 \
--translation_weight 1e-3
CUDA_VISIBLE_DEVICES=0 torchrun --nproc_per_node=1 --master_port=12345 depth_metric.py \
--eval_dataset=kitti \
--result_dir="outputs/video_depth/kitti_depth" \
--output_dir="outputs/video_depth"Sintel
export PYTHONPATH="./":$PYTHONPATH
CUDA_VISIBLE_DEVICES=0 torchrun --nproc_per_node=1 --master_port=12345 eval_launch.py \
--mode=eval_pose \
--model=streaming_pi3 \
--eval_dataset=sintel \
--output_dir="outputs/cam_pose/sintel_pose"ScanNet
export PYTHONPATH="./":$PYTHONPATH
CUDA_VISIBLE_DEVICES=0 torchrun --nproc_per_node=1 --master_port=12345 eval_launch.py \
--mode=eval_pose \
--model=streaming_pi3 \
--eval_dataset=scannet \
--output_dir="outputs/cam_pose/scannet_pose"TUM
export PYTHONPATH="./":$PYTHONPATH
CUDA_VISIBLE_DEVICES=0 torchrun --nproc_per_node=1 --master_port=12345 eval_launch.py \
--mode=eval_pose \
--model=streaming_pi3 \
--eval_dataset=tum \
--output_dir="outputs/cam_pose/tum_pose"If you find this repository useful in your research, please consider giving a star ⭐ and a citation
@article{ding2025laser,
title={LASER: Layer-wise Scale Alignment for Training-Free Streaming 4D Reconstruction},
author={Ding, Tianye and Xie, Yiming and Liang, Yiqing and Chatterjee, Moitreya and Miraldo, Pedro and Jiang, Huaizu},
year={2025}
}

We would like to thank the authors of the following excellent open-source projects: VGGT, π3, MonST3R, CUT3R, VGGT-Long, and many other inspiring works in the community.