PixARMesh: Autoregressive Mesh-Native Single-View Scene Reconstruction
Xiang Zhang*,1,2,† · Sohyun Yoo*,1 · Hongrui Wu*,1,‡ · Chuan Li2 · Jianwen Xie2 · Zhuowen Tu1
1UC San Diego · 2Lambda, Inc.
CVPR 2026
* Equal contribution
† Work partially done during internship at Lambda.
‡ H. Wu contributed during internship at UC San Diego.
Project Page | Paper | arXiv
PixARMesh is a mesh-native autoregressive framework for single-view 3D scene reconstruction.
Instead of reconstructing scenes through intermediate volumetric or implicit representations, PixARMesh models each instance directly as a native mesh: object poses and meshes are predicted in a single unified autoregressive sequence.
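As a rough illustration of the "unified autoregressive sequence" idea, the sketch below interleaves per-object pose tokens with mesh tokens. All token names here are hypothetical; the actual vocabulary and ordering are defined by the tokenizer and training configs, not by this sketch.

```python
# Hypothetical sketch: one flat sequence per scene, where each object
# contributes its pose tokens first, followed by its mesh tokens.
def build_sequence(objects):
    """objects: list of dicts with 'pose_tokens' and 'mesh_tokens' lists."""
    seq = ["<bos>"]
    for obj in objects:
        seq += ["<obj>"] + obj["pose_tokens"] + obj["mesh_tokens"]
    seq.append("<eos>")
    return seq

scene = [
    {"pose_tokens": ["p0", "p1"], "mesh_tokens": ["m0", "m1", "m2"]},
    {"pose_tokens": ["q0", "q1"], "mesh_tokens": ["n0"]},
]
print(build_sequence(scene))
```

Because everything lives in one sequence, a single decoder can condition each object's mesh on all previously generated poses and meshes.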
This repository contains the official implementation for PixARMesh (CVPR 2026).
We recommend using our pre-built Docker image:
```
docker pull zx1239856/trl-runner:0.2.0
```

Alternatively, you can build the environment manually using the provided Dockerfile.
Key requirements:
- Install dependencies from:
requirements.txt
requirements-no-iso.txt
- Install the EdgeRunner tokenizer.
- PyTorch ≥ 2.10 recommended.
Download the packed dataset from HuggingFace:
https://huggingface.co/datasets/zx1239856/3d-front-ar-packed
Flatten the dataset to ensure uniform instance sampling across scenes:
```
python -m scripts.flatten_dataset
```

This will generate:
datasets/3d-front-ar-packed-flattened
Flattening prevents instances from scenes with many objects from being under-sampled during training.
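The idea behind flattening can be sketched as follows. The data layout below is hypothetical (the real script works on the packed dataset files), but the principle is the same: expanding all (scene, instance) pairs into one flat list makes uniform sampling uniform over instances rather than over scenes.

```python
import random

# Hypothetical packed layout: scene id -> list of instance ids.
packed = {
    "scene_a": ["inst_0"],                      # 1 object
    "scene_b": ["inst_1", "inst_2", "inst_3"],  # 3 objects
}

# Flatten to (scene, instance) pairs.
flattened = [(s, i) for s, insts in packed.items() for i in insts]

# Sampling uniformly from the flat list gives scene_b's instances 3x the
# probability of scene_a's single instance, matching their share of the
# data. Sampling a scene first would give each scene equal weight and
# under-sample instances from object-rich scenes.
random.seed(0)
print(len(flattened), random.choice(flattened))
```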
Download the following items and unzip them inside the datasets/ directory:
- 3D-FUTURE-model-ply: Ground-truth object meshes (undecimated)
- ar-eval-gt-undecimated: Ground-truth scene meshes (undecimated)
- depth_pro_aligned_npy: Aligned Depth Pro predictions used for inference
- grounded_sam: Segmentation masks generated with Grounded-SAM
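Assuming each archive unpacks into a folder of the same name, the datasets/ directory would look roughly like this (the exact layout is an assumption, not verified against the archives):

```
datasets/
├── 3d-front-ar-packed-flattened/
├── 3D-FUTURE-model-ply/
├── ar-eval-gt-undecimated/
├── depth_pro_aligned_npy/
└── grounded_sam/
```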
launch.py is a wrapper around accelerate launch that automatically configures the environment.
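A wrapper of this kind typically just assembles and runs the underlying accelerate command. The sketch below is purely illustrative: the function name, the NUM_GPUS environment variable, and the chosen flags are assumptions, and the real launch.py may configure different options.

```python
import os
import sys

def build_launch_cmd(script, extra_args, num_gpus):
    """Assemble an `accelerate launch` invocation (illustrative only)."""
    cmd = ["accelerate", "launch", "--num_processes", str(num_gpus), script]
    return cmd + list(extra_args)

if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. python launch.py train.py --config-name=...
    gpus = int(os.environ.get("NUM_GPUS", "1"))  # hypothetical env var
    cmd = build_launch_cmd(sys.argv[1], sys.argv[2:], gpus)
    print("would run:", " ".join(cmd))
    # A real wrapper would exec this, e.g. subprocess.run(cmd, check=True).
```

The benefit of such a wrapper is that training commands stay short while multi-GPU settings are derived from the environment instead of being repeated on every invocation.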
PixARMesh is trained in two stages:
- Stage 1: Layout prediction
- Stage 2: Full autoregressive sequence training
Stage 1 (layout only):

```
python launch.py train.py --config-name=edgerunner_3d_front_global_obj_pose_w_img_ctx_layout_only
```

Stage 2 (full sequence), with model.local_path replaced by the checkpoint path produced by Stage 1:

```
python launch.py train.py --config-name=edgerunner_3d_front_global_obj_pose_w_img_ctx model.local_path=outputs/edgerunner-3d-front-global-obj-pose-w-img-ctx-layout-only/1/checkpoints/final
```

Distributed inference is supported via Accelerate.
You may either:
- Use the pretrained model from HuggingFace
- Provide a path to a local checkpoint
- Inference
```
accelerate launch --module scripts.infer --model-type edgerunner --run-type obj --checkpoint zx1239856/PixARMesh-EdgeRunner --output outputs/inference
```

- Evaluation
```
accelerate launch --module scripts.eval_obj --pred-dir outputs/inference/obj/edgerunner/gt_layout_gt_mask_pred_depth --save-dir outputs/evaluations-obj/edgerunner
```

- Inference
```
accelerate launch --module scripts.infer --model-type edgerunner --run-type scene --checkpoint zx1239856/PixARMesh-EdgeRunner --output outputs/inference
```

- Compose Scene Meshes
```
python -m scripts.compose_scene --pred-dir outputs/inference/scene/edgerunner/pred_layout_pred_mask_pred_depth
```

- Evaluation
```
accelerate launch --module scripts.eval_scene --pred-dir outputs/inference/scene/edgerunner/pred_layout_pred_mask_pred_depth/scenes --save-dir outputs/evaluation-scene/edgerunner
```

This repository is released under the CC-BY-SA 4.0 License.
PixARMesh builds upon several excellent open-source projects:
Core libraries and frameworks:
- HuggingFace Transformers
- DepR - evaluation pipeline
- EdgeRunner - pre-trained weights
- BPT - pre-trained weights
We also use physically-based renderings from the 3D-FRONT scenes provided by InstPIFu, along with additional processed assets from DepR.
If you find PixARMesh useful in your research, please consider citing:
@article{zhang2026pixarmesh,
title={PixARMesh: Autoregressive Mesh-Native Single-View Scene Reconstruction},
author={Zhang, Xiang and Yoo, Sohyun and Wu, Hongrui and Li, Chuan and Xie, Jianwen and Tu, Zhuowen},
journal={arXiv preprint arXiv:2603.05888},
year={2026}
}