Hao Yu* • Haotong Lin* • Jiawei Wang* • Jiaxin Li • Yida Wang • Xueyang Zhang • Yue Wang • Xiaowei Zhou • Ruizhen Hu • Sida Peng
[2026-03] 🎉 Inference code of InfiniDepth (RGB Only & Depth Sensor Augmentation) is available now!
[2026-02] 🎉 InfiniDepth has been accepted to CVPR 2026! Code coming soon!
InfiniDepth supports three practical capabilities for single-image 3D perception and reconstruction:
| Capability | Input | Output |
|---|---|---|
| Monocular & Arbitrary-Resolution Depth Estimation | RGB Image | Arbitrary-Resolution Depth Map |
| Monocular View Synthesis | RGB Image | 3D Gaussian Splatting (3DGS) |
| Depth Sensor Augmentation (Monocular Metric Depth Estimation) | RGB Image + Depth Sensor | Metric Depth + 3D Gaussian Splatting (3DGS) |
Please see INSTALL.md for manual installation.
If you want to test InfiniDepth before running local CLI inference, start with the hosted demo:
- Hugging Face Space: https://huggingface.co/spaces/ritianyu/InfiniDepth
This repo also includes a Gradio Space entrypoint at `app.py`:

- Input: RGB image (required), depth map (optional)
- Task switch: Depth / 3DGS
- Model switch: `InfiniDepth` / `InfiniDepth_DepthSensor`

```bash
python app.py
```

- In this demo, `InfiniDepth_DepthSensor` requires a depth map input; RGB-only inference should use `InfiniDepth`.
- Supported depth formats in the demo upload: `.png`, `.npy`, `.npz`, `.h5`, `.hdf5`, `.exr`.
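If your depth comes from your own pipeline, `.npy`/`.npz` are the easiest upload formats to produce. A minimal sketch; the `depth` key used for the `.npz` file and the meters unit are assumptions, so check against the demo's loader if an upload is rejected:

```python
import numpy as np

# Hypothetical example: a 480x640 metric depth map, constant 2.5 m.
depth = np.full((480, 640), 2.5, dtype=np.float32)

# .npy stores the raw array; .npz wraps it under a key (key name assumed here).
np.save("office_depth.npy", depth)
np.savez("office_depth.npz", depth=depth)

# Round-trip check.
loaded = np.load("office_depth.npy")
print(loaded.shape, loaded.dtype)  # (480, 640) float32
```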
| If you want ... | Recommended command |
|---|---|
| Relative Depth from Single RGB Image | bash example_scripts/infer_depth/courtyard_infinidepth.sh |
| 3D Gaussian from Single RGB Image | bash example_scripts/infer_gs/courtyard_infinidepth_gs.sh |
| Metric Depth from RGB + Depth Sensor | bash example_scripts/infer_depth/eth3d_infinidepth_depthsensor.sh |
| 3D Gaussian from RGB + Depth Sensor | bash example_scripts/infer_gs/eth3d_infinidepth_depthsensor_gs.sh |
| Multi-View / Video Depth + Global Point Cloud | bash example_scripts/infer_depth/waymo_multi_view_infinidepth.sh |
1. Relative Depth from Single RGB Image (inference_depth.py)
Use this when you want a relative depth map from a single RGB image and, optionally, a point cloud export.
Required input
RGB image
Required checkpoints
- `checkpoints/depth/infinidepth.ckpt`
- `checkpoints/moge-2-vitl-normal/model.pt` (used to recover metric scale for point cloud export)
Optional checkpoint
- `checkpoints/sky/skyseg.onnx` (additional sky filtering)
Recommended command
```bash
python inference_depth.py \
  --input_image_path=example_data/image/courtyard.jpg \
  --model_type=InfiniDepth \
  --depth_model_path=checkpoints/depth/infinidepth.ckpt \
  --output_resolution_mode=upsample \
  --upsample_ratio=2
```

Replace `example_data/image/courtyard.jpg` with your own image path.
For the example above, outputs are written to
- `example_data/pred_depth/` for the colorized depth map
- `example_data/pred_pcd/` for the exported point cloud when `--save_pcd=True`
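If you want to colorize a raw depth array yourself when post-processing exported depth, here is a minimal stand-in for the repo's colorizer, using plain grayscale normalization rather than whatever color map `inference_depth.py` actually applies:

```python
import numpy as np

def colorize_depth(depth):
    """Map a float depth map to uint8 grayscale for quick inspection.
    (A stand-in: the repo's own visualizer likely uses a color map.)"""
    d = np.asarray(depth, dtype=np.float64)
    lo, hi = d.min(), d.max()
    norm = np.zeros_like(d) if hi == lo else (d - lo) / (hi - lo)
    return (norm * 255).astype(np.uint8)

# Toy 3x4 depth ramp from 1 m to 5 m.
depth = np.linspace(1.0, 5.0, 12).reshape(3, 4)
vis = colorize_depth(depth)
print(vis.min(), vis.max())  # 0 255
```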
Example scripts
```bash
bash example_scripts/infer_depth/courtyard_infinidepth.sh
bash example_scripts/infer_depth/camera_infinidepth.sh
bash example_scripts/infer_depth/eth3d_infinidepth.sh
bash example_scripts/infer_depth/waymo_infinidepth.sh
```

Most useful options
| Argument | What it controls |
|---|---|
| `--output_resolution_mode` | Choose `upsample`, `original`, or `specific`. |
| `--upsample_ratio` | Used when `output_resolution_mode=upsample`. |
| `--output_size` | Explicit output size (H,W) when `output_resolution_mode=specific`. |
| `--save_pcd` | Export a point cloud alongside the depth map. |
| `--fx_org --fy_org --cx_org --cy_org` | Camera intrinsics in the original image resolution. |
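The `--fx_org --fy_org --cx_org --cy_org` flags expect intrinsics expressed at the original image resolution. If your calibration was done at a different resolution than the image you actually pass in, the standard pinhole rescaling applies; a minimal sketch (the resolutions below are made-up example values, not tied to any dataset in this repo):

```python
def scale_intrinsics(fx, fy, cx, cy, calib_hw, image_hw):
    """Rescale pinhole intrinsics from the calibration resolution (H, W)
    to the resolution of the image passed to the inference script."""
    sy = image_hw[0] / calib_hw[0]
    sx = image_hw[1] / calib_hw[1]
    return fx * sx, fy * sy, cx * sx, cy * sy

# Hypothetical: calibrated at 1048x1584, image downscaled to 524x792.
fx, fy, cx, cy = scale_intrinsics(866.39, 866.04, 791.5, 523.81,
                                  calib_hw=(1048, 1584), image_hw=(524, 792))
print(fx, fy, cx, cy)  # all values halved
```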
2. 3D Gaussian + Novel-View Video from Single RGB Image (inference_gs.py)
Use this when you want a 3D Gaussian export from a single RGB image and an optional novel-view video.
Required input
RGB image
Required checkpoints
- `checkpoints/depth/infinidepth.ckpt`
- `checkpoints/gs/infinidepth_gs.ckpt`
- `checkpoints/moge-2-vitl-normal/model.pt` (used to recover metric scale for 3D Gaussian export)
Optional checkpoint
- `checkpoints/sky/skyseg.onnx` (additional sky filtering)
Recommended command
```bash
python inference_gs.py \
  --input_image_path=example_data/image/courtyard.jpg \
  --model_type=InfiniDepth \
  --depth_model_path=checkpoints/depth/infinidepth.ckpt \
  --gs_model_path=checkpoints/gs/infinidepth_gs.ckpt
```

Replace `example_data/image/courtyard.jpg` with your own image path.
For the example above, outputs are written to
- `example_data/pred_gs/InfiniDepth_courtyard_gaussians.ply`
- `example_data/pred_gs/InfiniDepth_courtyard_novel_orbit.mp4`
If --render_size is omitted, the novel-view video is rendered at the original input image resolution.
Example scripts
```bash
bash example_scripts/infer_gs/courtyard_infinidepth_gs.sh
bash example_scripts/infer_gs/camera_infinidepth_gs.sh
bash example_scripts/infer_gs/fruit_infinidepth_gs.sh
bash example_scripts/infer_gs/eth3d_infinidepth_gs.sh
```

Most useful options
| Argument | What it controls |
|---|---|
| `--render_novel_video` | Turn novel-view rendering on or off. |
| `--render_size` | Output video resolution (H,W). |
| `--novel_trajectory` | Camera motion type: `orbit` or `swing`. |
| `--sample_point_num` | Number of sampled points used for gaussian construction. |
| `--enable_skyseg_model` | Enable sky masking before gaussian sampling. |
| `--sample_sky_mask_dilate_px` | Dilate the sky mask before filtering. |
The exported `.ply` files can be visualized in 3D viewers such as SuperSplat.
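For a quick sanity check without opening a viewer, you can read the `.ply` header to see how many gaussians were exported. A self-contained sketch; the toy header below is written by the script itself, and its property names follow common 3DGS conventions rather than this repo's exact export schema:

```python
def ply_header_info(path):
    """Read a .ply header and return (format, {element_name: count})."""
    elements, fmt = {}, None
    with open(path, "rb") as f:
        assert f.readline().strip() == b"ply"
        for raw in f:
            line = raw.decode("ascii").strip()
            if line.startswith("format"):
                fmt = line.split()[1]
            elif line.startswith("element"):
                _, name, count = line.split()
                elements[name] = int(count)
            elif line == "end_header":
                break
    return fmt, elements

# Tiny synthetic header in the style of a 3DGS export (payload omitted,
# since only the header is read here).
header = b"""ply
format binary_little_endian 1.0
element vertex 3
property float x
property float y
property float z
property float opacity
end_header
"""
with open("toy_gaussians.ply", "wb") as f:
    f.write(header)

print(ply_header_info("toy_gaussians.ply"))  # ('binary_little_endian', {'vertex': 3})
```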
3. Depth Sensor Augmentation (Metric Depth and 3D Gaussian from RGB + Depth Sensor)
Use this mode when you have an RGB image plus metric depth from a depth sensor.
Required inputs
- RGB image
- Sparse depth in `.png`, `.npy`, `.npz`, `.h5`, `.hdf5`, or `.exr`
Required checkpoints
- `checkpoints/depth/infinidepth_depthsensor.ckpt`
- `checkpoints/moge-2-vitl-normal/model.pt`
- `checkpoints/gs/infinidepth_depthsensor_gs.ckpt`
Required flags
- `--model_type=InfiniDepth_DepthSensor`
- `--input_depth_path=...`
Metric Depth Inference Command
```bash
python inference_depth.py \
  --input_image_path=example_data/image/eth3d_office.png \
  --input_depth_path=example_data/depth/eth3d_office.npz \
  --model_type=InfiniDepth_DepthSensor \
  --depth_model_path=checkpoints/depth/infinidepth_depthsensor.ckpt \
  --fx_org=866.39 \
  --fy_org=866.04 \
  --cx_org=791.5 \
  --cy_org=523.81 \
  --output_resolution_mode=upsample \
  --upsample_ratio=1
```

3D Gaussian Inference Command
```bash
python inference_gs.py \
  --input_image_path=example_data/image/eth3d_office.png \
  --input_depth_path=example_data/depth/eth3d_office.npz \
  --model_type=InfiniDepth_DepthSensor \
  --depth_model_path=checkpoints/depth/infinidepth_depthsensor.ckpt \
  --gs_model_path=checkpoints/gs/infinidepth_depthsensor_gs.ckpt \
  --fx_org=866.39 \
  --fy_org=866.04 \
  --cx_org=791.5 \
  --cy_org=523.81
```

Example scripts
```bash
bash example_scripts/infer_depth/eth3d_infinidepth_depthsensor.sh
bash example_scripts/infer_depth/waymo_infinidepth_depthsensor.sh
bash example_scripts/infer_gs/eth3d_infinidepth_depthsensor_gs.sh
bash example_scripts/infer_gs/waymo_infinidepth_depthsensor_gs.sh
```

Most useful options
| Argument | What it controls |
|---|---|
| `--fx_org --fy_org --cx_org --cy_org` | Strongly recommended when you know the sensor intrinsics. |
| `--output_resolution_mode` | Output behavior for `inference_depth.py`. |
| `--render_size` | Video resolution for `inference_gs.py`. |
| `--output_ply_dir` | Custom output directory for gaussian export. |
4. Multi-View / Video Depth + Global Point Cloud (inference_multi_view_depth.py)
Use this when you want sequence-level depth inference from an RGB image folder or video, plus per-frame aligned point clouds and one merged global point cloud. The script runs DA3 once on the whole sequence, then aligns each InfiniDepth depth map to the corresponding DA3 depth map before export.
Required inputs
- RGB image directory, single RGB image, or video
- Sparse depth directory / single file / depth video when `--model_type=InfiniDepth_DepthSensor`
Required checkpoints / dependencies
- `checkpoints/depth/infinidepth.ckpt` for RGB-only inference
- `checkpoints/depth/infinidepth_depthsensor.ckpt` for RGB + depth sensor inference
- `checkpoints/moge-2-vitl-normal/model.pt` to recover metric scale for RGB-only frame inference
- `depth-anything-3` installed in the current environment; the default DA3 model is `depth-anything/DA3-LARGE-1.1`
Optional checkpoint
- `checkpoints/sky/skyseg.onnx` (additional sky filtering)
RGB-Only Multi-View / Video Command
```bash
python inference_multi_view_depth.py \
  --input_path=example_data/multi-view/waymo/image \
  --model_type=InfiniDepth \
  --depth_model_path=checkpoints/depth/infinidepth.ckpt
```

RGB + Depth Sensor Multi-View / Video Command
```bash
python inference_multi_view_depth.py \
  --input_path=example_data/multi-view/waymo/image \
  --input_depth_path=example_data/multi-view/waymo/depth \
  --model_type=InfiniDepth_DepthSensor \
  --depth_model_path=checkpoints/depth/infinidepth_depthsensor.ckpt
```

For video input, replace `--input_path` with a video file. When `--model_type=InfiniDepth_DepthSensor`, `--input_depth_path` can also be a depth video and must contain the same number of frames as the RGB input.
For the RGB-only example above, outputs are written to
- `example_data/multi-view/waymo/pred_sequence/image/frames/depth/` for aligned raw depth maps
- `example_data/multi-view/waymo/pred_sequence/image/frames/depth_vis/` for colorized depth maps
- `example_data/multi-view/waymo/pred_sequence/image/frames/pcd/` for per-frame aligned point clouds
- `example_data/multi-view/waymo/pred_sequence/image/frames/meta/` for per-frame camera and alignment metadata
- `example_data/multi-view/waymo/pred_sequence/image/da3/sequence_pose.npz` for cached DA3 predictions
- `example_data/multi-view/waymo/pred_sequence/image/merged/sequence_merged.ply` for the merged global point cloud
Example scripts
```bash
bash example_scripts/infer_depth/waymo_multi_view_infinidepth.sh
bash example_scripts/infer_depth/waymo_multi_view_infinidepth_depthsensor.sh
```

Most useful options
| Argument | What it controls |
|---|---|
| `--input_path` | RGB image directory, single image, or video path. |
| `--input_depth_path` | Depth directory, single depth file, or depth video; required for `InfiniDepth_DepthSensor`. |
| `--input_mode` | Force `images` or `video` instead of auto detection. |
| `--align_to_da3_depth` | Align each InfiniDepth depth map to the corresponding DA3 depth map before export. |
| `--save_frame_pcd` | Save one aligned point cloud per frame. |
| `--save_merged_pcd` | Save the merged global point cloud across the whole sequence. |
| `--da3_scale_align_conf_threshold` | Minimum DA3 confidence used during per-frame scale estimation. |
| `--output_root` | Override the default `pred_sequence/<sequence_name>/` output directory. |
5. Common Argument Conventions
| Argument | Used in | Description |
|---|---|---|
| `--input_image_path` | depth + gs | Path to the input RGB image. |
| `--input_path` | multi-view | Path to an RGB image directory, single image, or video. |
| `--input_depth_path` | depth + gs + multi-view | Optional metric depth prompt; required for `InfiniDepth_DepthSensor`. In multi-view mode this can be a depth directory, single depth file, or depth video. |
| `--model_type` | depth + gs + multi-view | `InfiniDepth` for RGB-only, `InfiniDepth_DepthSensor` for RGB + sparse depth. |
| `--depth_model_path` | depth + gs | Path to the depth checkpoint. |
| `--gs_model_path` | gs only | Path to the gaussian predictor checkpoint. |
| `--moge2_pretrained` | depth + gs | MoGe-2 checkpoint used when `--input_depth_path` is missing. |
| `--fx_org --fy_org --cx_org --cy_org` | depth + gs | Camera intrinsics in original image resolution. Missing values fall back to MoGe-2 estimates or image-size defaults. |
| `--input_size` | depth + gs | Network input size (H,W) used during inference. |
| `--enable_skyseg_model` | depth + gs + multi-view | Enable sky masking before depth or gaussian sampling. |
| `--sky_model_ckpt_path` | depth + gs | Path to the sky segmentation ONNX checkpoint. |
Depth output modes
- `--output_resolution_mode=upsample`: output size = `input_size * upsample_ratio`
- `--output_resolution_mode=original`: output size = original input image size
- `--output_resolution_mode=specific`: output size = `output_size`
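The three modes amount to a small size resolver. A sketch, where the `(518, 518)` input size below is an arbitrary example rather than the scripts' actual default:

```python
def output_size(mode, input_size, original_size, upsample_ratio=1, specific_size=None):
    """Resolve the (H, W) output size for the three --output_resolution_mode values."""
    if mode == "upsample":
        return (input_size[0] * upsample_ratio, input_size[1] * upsample_ratio)
    if mode == "original":
        return original_size
    if mode == "specific":
        return specific_size
    raise ValueError(f"unknown mode: {mode}")

print(output_size("upsample", (518, 518), (1080, 1920), upsample_ratio=2))  # (1036, 1036)
print(output_size("original", (518, 518), (1080, 1920)))                    # (1080, 1920)
print(output_size("specific", (518, 518), (1080, 1920), specific_size=(2160, 3840)))
```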
Default output directories
| Script | Default directory |
|---|---|
| `inference_depth.py` depth images | `pred_depth/` next to your input data folder |
| `inference_depth.py` point clouds | `pred_pcd/` next to your input data folder |
| `inference_gs.py` gaussians and videos | `pred_gs/` next to your input data folder |
| `inference_multi_view_depth.py` sequence outputs | `pred_sequence/<sequence_name>/` next to your input data folder |
We thank Yuanhong Yu, Gangwei Xu, Haoyu Guo and Chongjie Ye for their insightful discussions and valuable suggestions, and Zhen Xu for his dedicated efforts in curating the synthetic data.
If you find InfiniDepth useful in your research, please consider citing:
@article{yu2026infinidepth,
  title={InfiniDepth: Arbitrary-Resolution and Fine-Grained Depth Estimation with Neural Implicit Fields},
  author={Yu, Hao and Lin, Haotong and Wang, Jiawei and Li, Jiaxin and Wang, Yida and Zhang, Xueyang and Wang, Yue and Zhou, Xiaowei and Hu, Ruizhen and Peng, Sida},
  journal={arXiv preprint},
  year={2026}
}