PixARMesh: Autoregressive Mesh-Native Single-View Scene Reconstruction
Xiang Zhang*,1,2,† · Sohyun Yoo*,1 · Hongrui Wu*,1,‡ · Chuan Li2 · Jianwen Xie2 · Zhuowen Tu1
1UC San Diego · 2Lambda, Inc.
CVPR 2026
* Equal contribution
† Work partially done during internship at Lambda.
‡ H. Wu contributed during internship at UC San Diego.
Project Page | Paper | arXiv
PixARMesh is a mesh-native autoregressive framework for single-view 3D scene reconstruction.
Instead of reconstructing scenes through intermediate volumetric or implicit representations, PixARMesh models each instance directly as a native mesh: object poses and meshes are predicted in a single unified autoregressive sequence.
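As a rough illustration of the "unified autoregressive sequence" idea, the sketch below interleaves per-object pose tokens with mesh tokens. All token names here are hypothetical; the actual vocabulary and ordering are defined by the tokenizer and training configs, not by this sketch.

```python
# Hypothetical sketch: one flat sequence per scene, where each object
# contributes its pose tokens first, followed by its mesh tokens.
def build_sequence(objects):
    """objects: list of dicts with 'pose_tokens' and 'mesh_tokens' lists."""
    seq = ["<bos>"]
    for obj in objects:
        seq += ["<obj>"] + obj["pose_tokens"] + obj["mesh_tokens"]
    seq.append("<eos>")
    return seq

scene = [
    {"pose_tokens": ["p0", "p1"], "mesh_tokens": ["m0", "m1", "m2"]},
    {"pose_tokens": ["q0", "q1"], "mesh_tokens": ["n0"]},
]
print(build_sequence(scene))
```

Because everything lives in one sequence, a single decoder can condition each object's mesh on all previously generated poses and meshes.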
This repository contains the official implementation for PixARMesh (CVPR 2026).
We recommend using our pre-built Docker image:
```
docker pull zx1239856/trl-runner:0.2.0
```

Alternatively, you can build the environment manually using the provided Dockerfile.
Key requirements:
- Install dependencies from:
requirements.txt
requirements-no-iso.txt
- Install the EdgeRunner tokenizer.
- PyTorch ≥ 2.10 recommended.
Download the packed dataset from HuggingFace:
https://huggingface.co/datasets/zx1239856/3d-front-ar-packed
Flatten the dataset to ensure uniform instance sampling across scenes:
```
python -m scripts.flatten_dataset
```

This will generate:
datasets/3d-front-ar-packed-flattened
Flattening prevents instances from scenes with many objects from being under-sampled during training.
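The idea behind flattening can be sketched as follows. The data layout below is hypothetical (the real script works on the packed dataset files), but the principle is the same: expanding all (scene, instance) pairs into one flat list makes uniform sampling uniform over instances rather than over scenes.

```python
import random

# Hypothetical packed layout: scene id -> list of instance ids.
packed = {
    "scene_a": ["inst_0"],                      # 1 object
    "scene_b": ["inst_1", "inst_2", "inst_3"],  # 3 objects
}

# Flatten to (scene, instance) pairs.
flattened = [(s, i) for s, insts in packed.items() for i in insts]

# Sampling uniformly from the flat list gives scene_b's instances 3x the
# probability of scene_a's single instance, matching their share of the
# data. Sampling a scene first would give each scene equal weight and
# under-sample instances from object-rich scenes.
random.seed(0)
print(len(flattened), random.choice(flattened))
```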
Download the following items and unzip them inside the datasets/ directory:
- 3D-FUTURE-model-ply: Ground-truth object meshes (undecimated)
- ar-eval-gt-undecimated: Ground-truth scene meshes (undecimated)
- depth_pro_aligned_npy: Aligned Depth Pro predictions used for inference
- grounded_sam: Segmentation masks generated with Grounded-SAM
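Assuming each archive unpacks into a folder of the same name, the datasets/ directory would look roughly like this (the exact layout is an assumption, not verified against the archives):

```
datasets/
├── 3d-front-ar-packed-flattened/
├── 3D-FUTURE-model-ply/
├── ar-eval-gt-undecimated/
├── depth_pro_aligned_npy/
└── grounded_sam/
```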
launch.py is a wrapper around accelerate launch that automatically configures the environment.
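A wrapper of this kind typically just assembles and runs the underlying accelerate command. The sketch below is purely illustrative: the function name, the NUM_GPUS environment variable, and the chosen flags are assumptions, and the real launch.py may configure different options.

```python
import os
import sys

def build_launch_cmd(script, extra_args, num_gpus):
    """Assemble an `accelerate launch` invocation (illustrative only)."""
    cmd = ["accelerate", "launch", "--num_processes", str(num_gpus), script]
    return cmd + list(extra_args)

if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. python launch.py train.py --config-name=...
    gpus = int(os.environ.get("NUM_GPUS", "1"))  # hypothetical env var
    cmd = build_launch_cmd(sys.argv[1], sys.argv[2:], gpus)
    print("would run:", " ".join(cmd))
    # A real wrapper would exec this, e.g. subprocess.run(cmd, check=True).
```

The benefit of such a wrapper is that training commands stay short while multi-GPU settings are derived from the environment instead of being repeated on every invocation.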
PixARMesh is trained in two stages:
- Stage 1: Layout prediction
- Stage 2: Full autoregressive sequence training
Stage 1 (layout only):

```
python launch.py train.py --config-name=edgerunner_3d_front_global_obj_pose_w_img_ctx_layout_only
```

Stage 2 (full sequence), with model.local_path replaced by the checkpoint path produced by Stage 1:

```
python launch.py train.py --config-name=edgerunner_3d_front_global_obj_pose_w_img_ctx model.local_path=outputs/edgerunner-3d-front-global-obj-pose-w-img-ctx-layout-only/1/checkpoints/final
```

Distributed inference is supported via Accelerate.
You may either:
- Use the pretrained model from HuggingFace
- Provide a path to a local checkpoint
- Inference
```
accelerate launch --module scripts.infer --model-type edgerunner --run-type obj --checkpoint zx1239856/PixARMesh-EdgeRunner --output outputs/inference
```

- Evaluation
```
accelerate launch --module scripts.eval_obj --pred-dir outputs/inference/obj/edgerunner/gt_layout_gt_mask_pred_depth --save-dir outputs/evaluations-obj/edgerunner
```

- Inference
```
accelerate launch --module scripts.infer --model-type edgerunner --run-type scene --checkpoint zx1239856/PixARMesh-EdgeRunner --output outputs/inference
```

- Compose Scene Meshes
```
python -m scripts.compose_scene --pred-dir outputs/inference/scene/edgerunner/pred_layout_pred_mask_pred_depth
```

- Evaluation
```
accelerate launch --module scripts.eval_scene --pred-dir outputs/inference/scene/edgerunner/pred_layout_pred_mask_pred_depth/scenes --save-dir outputs/evaluation-scene/edgerunner
```

This repository is released under the CC-BY-SA 4.0 License.
PixARMesh builds upon several excellent open-source projects:
Core libraries and frameworks:
- HuggingFace Transformers
- DepR - evaluation pipeline
- EdgeRunner - pre-trained weights
- BPT - pre-trained weights
We also use physically-based renderings from the 3D-FRONT scenes provided by InstPIFu, along with additional processed assets from DepR.
If you find PixARMesh useful in your research, please consider citing:
@article{zhang2026pixarmesh,
title={PixARMesh: Autoregressive Mesh-Native Single-View Scene Reconstruction},
author={Zhang, Xiang and Yoo, Sohyun and Wu, Hongrui and Li, Chuan and Xie, Jianwen and Tu, Zhuowen},
journal={arXiv preprint arXiv:2603.05888},
year={2026}
}