
LASER: Layer-wise Scale Alignment for Training-Free Streaming 4D Reconstruction

arXiv | Project Page

Tianye Ding¹*, Yiming Xie¹*, Yiqing Liang²*, Moitreya Chatterjee³, Pedro Miraldo³, Huaizu Jiang¹
¹ Northeastern University, ² Independent Researcher, ³ Mitsubishi Electric Research Laboratories
* Equal Contribution

📢 Updates

  • [2025-12-15] arXiv preprint released.

📝 To-Do List

  • Release framework codebase
  • Release inference code
  • Add data preparation instructions
  • Release evaluation code
  • Add Viser integration
  • Release loop-closure demo

💡 Abstract

We propose LASER, a training-free framework that converts an offline reconstruction model into a streaming system by aligning predictions across consecutive temporal windows. We observe that simple similarity transformation (Sim(3)) alignment fails due to layer depth misalignment: monocular scale ambiguity causes relative depth scales of different scene layers to vary inconsistently between windows. To address this, we introduce layer-wise scale alignment, which segments depth predictions into discrete layers, computes per-layer scale factors, and propagates them across both adjacent windows and timestamps.
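
To make the idea concrete, here is a minimal NumPy sketch of layer-wise scale alignment between two overlapping windows. The quantile-based layering, the median scale estimate, and all names below are our own illustrative assumptions, not the exact algorithm from the paper:

import numpy as np

def layerwise_scale_align(depth_prev, depth_curr, num_layers=4):
    """Toy sketch: align depth_curr to depth_prev layer by layer.

    depth_prev, depth_curr: (H, W) depth maps of a frame shared by two
    consecutive windows. Layer boundaries and per-layer median scales
    are illustrative choices, not LASER's exact method.
    """
    # Segment the reference depth into discrete layers (near to far)
    # using quantile boundaries.
    edges = np.quantile(depth_prev, np.linspace(0.0, 1.0, num_layers + 1))
    aligned = depth_curr.copy()
    scales = []
    for i in range(num_layers):
        if i < num_layers - 1:
            mask = (depth_prev >= edges[i]) & (depth_prev < edges[i + 1])
        else:
            mask = depth_prev >= edges[i]
        if not mask.any():
            scales.append(1.0)
            continue
        # Per-layer scale factor: robust median ratio between windows.
        s = float(np.median(depth_prev[mask] / np.maximum(depth_curr[mask], 1e-8)))
        aligned[mask] *= s
        scales.append(s)
    return aligned, scales

In LASER itself, these per-layer scale factors are further propagated across adjacent windows and timestamps, as described above.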

🛠️ Installation

# 1. Clone the repository
git clone --recursive git@github.com:neu-vi/LASER.git
cd LASER

# 2. Create environment
conda create -n laser -y python=3.11
conda activate laser

# 3. Install dependencies
pip install -r requirements.txt

# 4. Compile Cython modules
python setup.py build_ext --inplace

# 5. Install Viser
pip install -e viser

(Optional) Download checkpoints needed for loop-closure inference

bash ./scripts/download_weights.sh
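
After installation, a quick sanity check can confirm that PyTorch and CUDA are visible (a minimal script of our own; it assumes requirements.txt installs PyTorch, which the torchrun-based evaluation commands below rely on):

# sanity_check.py -- verify the environment before running the demo.
import torch

print("torch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))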

🚀 Usage

Inference

To run the inference code, you can use the following command:

export PYTHONPATH="./":$PYTHONPATH

python demo.py \
--data_path DATA_PATH \
--output_path "./viser_results" \
--cache_path "./cache" \
--sample_interval SAMPLE_INTERVAL \
--window_size WINDOW_SIZE \
--overlap OVERLAP \
--depth_refine

# example inference script
python demo.py \
--data_path "examples/titanic" \
--output_path "./viser_results" \
--cache_path "./cache" \
--sample_interval 1 \
--window_size 30 \
--overlap 10 \
--depth_refine

The results will be saved in the viser_results/SEQ_NAME directory for later visualization.
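
To process several sequences in one go, demo.py can be wrapped in a small driver script. This is a sketch of our own: it assumes each subdirectory of examples/ holds one sequence and reuses the flag values from the example above.

# run_all.py -- hypothetical batch driver over every sequence in examples/.
import subprocess
from pathlib import Path

for seq in sorted(Path("examples").iterdir()):
    if not seq.is_dir():
        continue
    subprocess.run([
        "python", "demo.py",
        "--data_path", str(seq),
        "--output_path", "./viser_results",
        "--cache_path", "./cache",
        "--sample_interval", "1",
        "--window_size", "30",
        "--overlap", "10",
        "--depth_refine",
    ], check=True)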

Visualization

To visualize the interactive 4D results, you can use the following command:

python viser/visualizer_monst3r.py --data viser_results/SEQ_NAME

# example visualization script
python viser/visualizer_monst3r.py --data viser_results/titanic
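
The visualizer starts a Viser server and prints a local URL (typically http://localhost:8080); open it in a browser to explore the reconstruction interactively. When running on a remote machine, forward that port (e.g., via SSH) to view it locally.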

Evaluation

Please refer to MonST3R for dataset setup details, and place all datasets under data/.

Video Depth

Sintel

export PYTHONPATH="./":$PYTHONPATH

CUDA_VISIBLE_DEVICES=0 torchrun --nproc_per_node=1 --master_port=12345 eval_launch.py \
--mode=eval_pose \
--model=streaming_pi3 \
--eval_dataset=sintel \
--output_dir="outputs/video_depth/sintel_depth" \
--full_seq \
--no_crop

CUDA_VISIBLE_DEVICES=0 torchrun --nproc_per_node=1 --master_port=12345 depth_metric.py \
--eval_dataset=sintel \
--result_dir="outputs/video_depth/sintel_depth" \
--output_dir="outputs/video_depth"

Bonn

export PYTHONPATH="./":$PYTHONPATH

CUDA_VISIBLE_DEVICES=0 torchrun --nproc_per_node=1 --master_port=12345 eval_launch.py \
--mode=eval_pose \
--model=streaming_pi3 \
--eval_dataset=bonn \
--output_dir="outputs/video_depth/bonn_depth" \
--no_crop

CUDA_VISIBLE_DEVICES=0 torchrun --nproc_per_node=1 --master_port=12345 depth_metric.py \
--eval_dataset=bonn \
--result_dir="outputs/video_depth/bonn_depth" \
--output_dir="outputs/video_depth"

KITTI

export PYTHONPATH="./":$PYTHONPATH

CUDA_VISIBLE_DEVICES=0 torchrun --nproc_per_node=1 --master_port=12345 eval_launch.py \
--mode=eval_pose \
--model=streaming_pi3 \
--eval_dataset=kitti \
--output_dir="outputs/video_depth/kitti_depth" \
--no_crop \
--flow_loss_weight 0 \
--translation_weight 1e-3

CUDA_VISIBLE_DEVICES=0 torchrun --nproc_per_node=1 --master_port=12345 depth_metric.py \
--eval_dataset=kitti \
--result_dir="outputs/video_depth/kitti_depth" \
--output_dir="outputs/video_depth"

Camera Pose

Sintel

export PYTHONPATH="./":$PYTHONPATH

CUDA_VISIBLE_DEVICES=0 torchrun --nproc_per_node=1 --master_port=12345 eval_launch.py \
--mode=eval_pose \
--model=streaming_pi3 \
--eval_dataset=sintel \
--output_dir="outputs/cam_pose/sintel_pose"

ScanNet

export PYTHONPATH="./":$PYTHONPATH

CUDA_VISIBLE_DEVICES=0 torchrun --nproc_per_node=1 --master_port=12345 eval_launch.py \
--mode=eval_pose \
--model=streaming_pi3 \
--eval_dataset=scannet \
--output_dir="outputs/cam_pose/scannet_pose"

TUM

export PYTHONPATH="./":$PYTHONPATH

CUDA_VISIBLE_DEVICES=0 torchrun --nproc_per_node=1 --master_port=12345 eval_launch.py \
--mode=eval_pose \
--model=streaming_pi3 \
--eval_dataset=tum \
--output_dir="outputs/cam_pose/tum_pose"

Citation

If you find this repository useful in your research, please consider giving it a star ⭐ and a citation:

@article{ding2025laser,
  title={LASER: Layer-wise Scale Alignment for Training-Free Streaming 4D Reconstruction},
  author={Ding, Tianye and Xie, Yiming and Liang, Yiqing and Chatterjee, Moitreya and Miraldo, Pedro and Jiang, Huaizu},
  journal={arXiv preprint},
  year={2025}
}

Acknowledgements

We would like to thank the authors of the following excellent open-source projects: VGGT, π3, MonST3R, CUT3R, and VGGT-Long, as well as many other inspiring works in the community.
