Geometry-Guided Reinforcement Learning for Multi-view Consistent 3D Scene Editing

arXiv Project Page HuggingFace License: MIT

Jiyuan Wang1,2,3   Chunyu Lin1,✉   Lei Sun2,✝   Zhi Cao1   Yuyang Yin1   Lang Nie4   Zhenlong Yuan2   Xiangxiang Chu2   Yunchao Wei1   Kang Liao3   Guosheng Lin3,✉

1BJTU    2AMap, Alibaba Group    3NTU    4CQUPT   
✉ Corresponding author   ✝ Project leader


We propose RL3DEdit, a novel RL-based single-pass framework for 3D scene editing. Our core insight is that while generating multi-view consistent 3D content is highly challenging, verifying 3D consistency is tractable — naturally positioning reinforcement learning as a feasible solution. We leverage the 3D foundation model VGGT as a geometry-aware reward model and employ GRPO to effectively anchor the 2D editor's prior onto the 3D consistency manifold.
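The verification insight above can be illustrated with a toy geometry-consistency reward. The sketch below is purely hypothetical and is not the paper's implementation: `consistency_reward` stands in for the VGGT-based reward model, and scores a set of edited views by how well their per-view 3D point predictions agree with the group consensus.

```python
import numpy as np

def consistency_reward(view_points: np.ndarray) -> float:
    """Toy multi-view consistency reward.

    `view_points` has shape (num_views, num_points, 3): per-view 3D point
    predictions for the same scene points, as a geometry foundation model
    such as VGGT might produce. Views that agree geometrically score
    higher. This is a hypothetical stand-in, not the actual reward model.
    """
    consensus = view_points.mean(axis=0)                       # consensus geometry
    err = np.linalg.norm(view_points - consensus, axis=-1)     # per-point deviation
    return float(np.exp(-err.mean()))                          # 1.0 = perfectly consistent

# Identical views are perfectly consistent:
identical = np.tile(np.random.rand(1, 100, 3), (9, 1, 1))
print(consistency_reward(identical))  # 1.0
```

The key property, mirroring the paper's argument, is that this check is cheap to compute even though *generating* views that satisfy it is hard, which is exactly the asymmetry that makes an RL reward practical here.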

📢 News

  • [2026-03-11]: Code and model weights are coming soon. Stay tuned! 🚀
  • [2026-03-04]: Paper released on arXiv.

💡 Highlights

  • 🏆 State-of-the-Art Performance: RL3DEdit achieves a VIEScore of 5.48 (vs. 3.23 for the strongest baseline), demonstrating superior editing fidelity and semantic alignment.
  • ⚡ High Efficiency: Single-pass inference in just 1.5 minutes, substantially faster than traditional pipelines and over 20× faster than other FLUX-based baselines.
  • 🧠 Novel RL Paradigm: First work to introduce reinforcement learning into 3D scene editing, using VGGT as a geometry-aware reward model.

🛠️ Setup

Code is coming soon. This section will be updated once the code is released.

  1. Clone the repository:
git clone https://github.com/AMAP-ML/RL3DEdit.git
cd RL3DEdit
  2. Install dependencies:
conda create -n rl3dedit python=3.10 -y
conda activate rl3dedit
pip install -r requirements.txt  # Coming soon

🔥 Training

  1. Prepare Training Data:

    We collect 8 scenes from IN2N, BlendedMVS, and Mip-NeRF360 datasets, and construct 7–9 editing prompts per scene using a VLM, yielding 70 prompts in total.

  2. Run Training Script:

# Training script will be released soon
accelerate launch train_grpo.py \
    --config configs/rl3dedit.yaml \
    --lora_rank 32 \
    --num_views 9 \
    --group_size 16 \
    --sde_noise 0.8

Training was conducted for one epoch on an NVIDIA RTX Pro 6000 GPU and took ~42 hours.
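As a point of reference for the `--group_size` flag above, GRPO scores each rollout against its own group rather than a learned value function: advantages are the group-normalized rewards. The sketch below shows only that generic normalization step, not the project's training code.

```python
import numpy as np

def grpo_advantages(rewards: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Group-relative advantages as in GRPO.

    Each sample in a group of rollouts (e.g. group_size=16 edited-view
    batches) is normalized by the group's own mean and std, so no critic
    is needed. Generic sketch, not the repository's implementation.
    """
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# One group of consistency rewards -> zero-mean, unit-scale advantages
group_rewards = np.array([0.2, 0.5, 0.9, 0.4])
print(grpo_advantages(group_rewards).round(3))
```

Rollouts with above-average consistency rewards get positive advantages and are reinforced; below-average ones are suppressed, which is how the 2D editor's prior gets anchored onto the 3D consistency manifold.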

🕹️ Inference

Editing a 3D Scene

# Inference script will be released soon
python inference.py \
    --scene_path /path/to/your/scene \
    --instruction "your editing instruction" \
    --output_path /path/to/output

Evaluation on Test Set

# Evaluation script will be released soon
python evaluate.py --config configs/eval.yaml

Our test data includes 100 cases: novel views (70), unseen instructions (16), and new scenes (14).

🤗 Model Zoo

| Model    | Backbone         | Training Data            | Download    |
|----------|------------------|--------------------------|-------------|
| RL3DEdit | FLUX-Kontext-dev | 70 prompts, 1319 samples | Coming Soon |

🎓 Citation

If you find our work useful in your research, please consider citing our paper:

@article{wang2026geometry,
  title={Geometry-Guided Reinforcement Learning for Multi-view Consistent 3D Scene Editing},
  author={Wang, Jiyuan and Lin, Chunyu and Sun, Lei and Cao, Zhi and Yin, Yuyang and Nie, Lang and Yuan, Zhenlong and Chu, Xiangxiang and Wei, Yunchao and Liao, Kang and others},
  journal={arXiv preprint arXiv:2603.03143},
  year={2026}
}

🙏 Acknowledgements

We thank the authors of FLUX-Kontext, VGGT, GRPO, and Flow-GRPO for their excellent work.


⭐ If you find this project useful, please give it a star! ⭐
