Jiyuan Wang1,2,3 Chunyu Lin1,✉ Lei Sun2,✝ Zhi Cao1 Yuyang Yin1 Lang Nie4 Zhenlong Yuan2 Xiangxiang Chu2 Yunchao Wei1 Kang Liao3 Guosheng Lin3,✉
1BJTU
2AMap, Alibaba Group
3NTU
4CQUPT
✉Corresponding author
✝Project leader
We propose RL3DEdit, a novel RL-based single-pass framework for 3D scene editing. Our core insight is that while generating multi-view consistent 3D content is highly challenging, verifying 3D consistency is tractable — naturally positioning reinforcement learning as a feasible solution. We leverage the 3D foundation model VGGT as a geometry-aware reward model and employ GRPO to effectively anchor the 2D editor's prior onto the 3D consistency manifold.
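As a rough illustration of the GRPO step mentioned above, the sketch below computes group-relative advantages from a set of scalar rewards: each sampled edit is scored, and its reward is normalized against the mean and standard deviation of its own group. The reward values and the function name `group_relative_advantages` are hypothetical, and the use of the population standard deviation with a small `eps` is an assumption. This is not the released RL3DEdit code, and the actual VGGT-based reward is not reproduced here.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one sampled group: (r - mean) / (std + eps).

    `rewards` holds one scalar per sampled edit in the group. In RL3DEdit the
    scalar would come from the geometry-aware reward model; here the values
    are placeholders.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption, variants exist
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical consistency scores for a group of four sampled edits.
rewards = [0.2, 0.5, 0.5, 0.8]
advantages = group_relative_advantages(rewards)

for r, a in zip(rewards, advantages):
    print(f"reward={r:.2f}  advantage={a:+.3f}")
```

Samples that score above their group's mean receive a positive advantage and are reinforced. Samples below the mean are discouraged. A group whose rewards are all equal yields zero advantage for every sample.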
- [2026-03-11]: Code and model weights are coming soon. Stay tuned! 🚀
- [2026-03-04]: Paper released on arXiv.
- 🏆 State-of-the-Art Performance: RL3DEdit achieves a VIEScore of 5.48 (vs. 3.23 for the strongest baseline), demonstrating superior editing fidelity and semantic alignment.
- ⚡ High Efficiency: Single-pass inference in just 1.5 minutes — over 2× faster than traditional pipelines and over 20× faster than other FLUX-based baselines.
- 🧠 Novel RL Paradigm: First work to introduce reinforcement learning into 3D scene editing, using VGGT as a geometry-aware reward model.
Code is coming soon. This section will be updated once the code is released.
- Clone the repository:

      git clone https://github.com/AMAP-ML/RL3DEdit.git
      cd RL3DEdit

- Install dependencies:

      conda create -n rl3dedit python=3.10 -y
      conda activate rl3dedit
      pip install -r requirements.txt  # Coming soon

- Prepare Training Data:

  We collect 8 scenes from IN2N, BlendedMVS, and Mip-NeRF360 datasets, and construct 7–9 editing prompts per scene using a VLM, yielding 70 prompts in total.

- Run Training Script:

      # Training script will be released soon
      accelerate launch train_grpo.py \
        --config configs/rl3dedit.yaml \
        --lora_rank 32 \
        --num_views 9 \
        --group_size 16 \
        --sde_noise 0.8

Training was conducted for one epoch on an NVIDIA RTX Pro 6000 GPU and took ~42 hours.
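To give a sense of what `--lora_rank 32` implies, the sketch below counts trainable parameters for a single weight matrix under the standard LoRA formulation, in which a frozen `d_out × d_in` weight receives a low-rank update built from two matrices of shapes `d_out × rank` and `rank × d_in`. The 3072 × 3072 layer size and the helper `lora_param_counts` are illustrative assumptions, not values or code from this repository. Which layers RL3DEdit actually adapts is not stated here.

```python
def lora_param_counts(d_in, d_out, rank):
    """Return (full, lora) parameter counts for one linear layer.

    full: parameters in the frozen d_out x d_in weight matrix.
    lora: parameters in the two low-rank factors (d_out x rank, rank x d_in).
    """
    full = d_in * d_out
    lora = rank * (d_in + d_out)
    return full, lora


# Illustrative layer size (an assumption), with the rank from the command above.
full, lora = lora_param_counts(d_in=3072, d_out=3072, rank=32)
print(f"full weight: {full:,} params")
print(f"LoRA update: {lora:,} params ({full // lora}x fewer)")
```

Under these assumed dimensions, the rank-32 update trains 48 times fewer parameters than the full matrix.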
    # Inference script will be released soon
    python inference.py \
      --scene_path /path/to/your/scene \
      --instruction "your editing instruction" \
      --output_path /path/to/output

    # Evaluation script will be released soon
    python evaluate.py --config configs/eval.yaml

Our test data includes 100 cases: novel views (70), unseen instructions (16), and new scenes (14).
| Model | Backbone | Training Data | Download |
|---|---|---|---|
| RL3DEdit | FLUX-Kontext-dev | 70 prompts, 1319 samples | Coming Soon |
If you find our work useful in your research, please consider citing our paper:
    @article{wang2026geometry,
      title={Geometry-Guided Reinforcement Learning for Multi-view Consistent 3D Scene Editing},
      author={Wang, Jiyuan and Lin, Chunyu and Sun, Lei and Cao, Zhi and Yin, Yuyang and Nie, Lang and Yuan, Zhenlong and Chu, Xiangxiang and Wei, Yunchao and Liao, Kang and others},
      journal={arXiv preprint arXiv:2603.03143},
      year={2026}
    }

We thank the authors of FLUX-Kontext, VGGT, GRPO, and Flow-GRPO for their excellent work.
⭐ If you find this project useful, please give it a star! ⭐
