GAF: Gaussian Action Field as a 4D Representation for Dynamic World Modeling in Robotic Manipulation

GAF is a 4D representation that realizes the V-4D-A (Vision-to-4D-to-Action) paradigm: it infers future scene evolution from current visual observations to guide robotic manipulation. It supports scene reconstruction, future prediction, and action generation within a unified framework. The feed-forward pipeline requires only sparse-view RGB images and supports real-time execution.

📄 Paper

Paper (PDF): arXiv:2506.14135
arXiv: https://arxiv.org/abs/2506.14135
Status: Accepted to ICRA 2026

👥 Authors

Ying Chai*¹, Litao Deng*²,³, Ruizhi Shao¹, Jiajun Zhang¹, Kangchen Lv¹, Liangjun Xing¹, Xiang Li¹,†, Hongwen Zhang¹,†, Yebin Liu¹,†

¹ Tsinghua University
² Beijing Normal University
³ Shadow AI

* Equal contribution.
† Corresponding author.

🎯 Abstract

Accurate scene perception is critical for vision-based robotic manipulation. Existing approaches typically follow either a Vision-to-Action (V-A) paradigm, predicting actions directly from visual inputs, or a Vision-to-3D-to-Action (V-3D-A) paradigm, leveraging intermediate 3D representations. However, these methods often struggle with action inaccuracies due to the complexity and dynamic nature of manipulation scenes. In this paper, we adopt a V-4D-A framework that enables direct action reasoning from motion-aware 4D representations via a Gaussian Action Field (GAF). GAF extends 3D Gaussian Splatting (3DGS) by incorporating learnable motion attributes, allowing 4D modeling of dynamic scenes and manipulation actions. To learn time-varying scene geometry and action-aware robot motion, GAF provides three interrelated outputs: reconstruction of the current scene, prediction of future frames, and estimation of an init action via Gaussian motion. Furthermore, we employ an action-vision-aligned denoising framework, conditioned on a unified representation that combines the init action and the Gaussian perception, both generated by the GAF, to obtain more precise actions. Extensive experiments demonstrate significant improvements, with GAF achieving gains of +11.5385 dB PSNR, +0.3864 SSIM, and -0.5574 LPIPS in reconstruction quality, while raising the average success rate in robotic manipulation tasks by 7.3% over state-of-the-art methods.

🎥 Demo Video

See the project website for the full demo video.

🏗️ Architecture

GAF provides three interrelated outputs:

Current Gaussian: GAF reconstructs 3D Gaussian point clouds at the current timestep.
Future Gaussian: GAF predicts future 3D Gaussian point clouds using the learned motion attributes.
Init action: The init action is estimated through point cloud matching between the current and future manipulation-related Gaussians.
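
To make the "point cloud matching" step more concrete, here is a minimal sketch of one common way to recover a rigid motion from two matched point sets: the Kabsch (SVD-based) least-squares alignment. This is an illustration only, not the released GAF implementation. It assumes that each current Gaussian center has a known counterpart in the future point cloud (plausible if the future centers are obtained by displacing the current ones), and the function name `estimate_rigid_motion` and the synthetic data are hypothetical.

```python
import numpy as np


def estimate_rigid_motion(current_xyz, future_xyz):
    """Least-squares rigid transform (R, t) such that future ≈ R @ current + t.

    Assumes row i of `current_xyz` corresponds to row i of `future_xyz`
    (an assumption of this sketch, not a statement about GAF itself).
    """
    current_xyz = np.asarray(current_xyz, dtype=float)
    future_xyz = np.asarray(future_xyz, dtype=float)
    if current_xyz.shape != future_xyz.shape or current_xyz.shape[1] != 3:
        raise ValueError("expected two (N, 3) arrays of matched points")

    c_mean = current_xyz.mean(axis=0)
    f_mean = future_xyz.mean(axis=0)

    # 3x3 cross-covariance of the centered point sets.
    H = (current_xyz - c_mean).T @ (future_xyz - f_mean)
    U, _, Vt = np.linalg.svd(H)

    # Guard against a reflection so that R is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = f_mean - R @ c_mean
    return R, t


# Synthetic example: rotate 10 degrees about z and translate slightly.
rng = np.random.default_rng(0)
current = rng.normal(size=(50, 3))          # stand-in for current Gaussian centers
angle = np.deg2rad(10.0)
R_true = np.array([
    [np.cos(angle), -np.sin(angle), 0.0],
    [np.sin(angle),  np.cos(angle), 0.0],
    [0.0,            0.0,           1.0],
])
t_true = np.array([0.02, 0.0, -0.01])
future = current @ R_true.T + t_true        # stand-in for future Gaussian centers

R_est, t_est = estimate_rigid_motion(current, future)
print(np.allclose(R_est, R_true), np.allclose(t_est, t_true))
```

In a manipulation setting, a transform of this kind could be read as a relative end-effector motion, which would then serve as a coarse starting point for a refinement stage such as the denoising framework described in the abstract.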

Both the high-level module and adaptive corrector are implemented as causal transformers that leverage historical scanning sequences for more informed decision-making.

📊 Results

+11.5385 dB PSNR, +0.3864 SSIM, and -0.5574 LPIPS improvements in reconstruction quality.
+7.3% average success rate in robotic manipulation tasks over state-of-the-art (SOTA) methods.
Outperforms baseline methods in both reconstruction quality and success rate.
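
For readers unfamiliar with the reconstruction metrics, the sketch below shows how PSNR is conventionally computed from the mean squared error between a reference image and a rendered one (higher is better, and a gain in dB is the difference between two such scores). It is a generic illustration with a hypothetical helper name `psnr` and flat lists of pixel values in [0, 1]; it is not the evaluation code used for the numbers above. SSIM and LPIPS need dedicated implementations and are not shown.

```python
import math


def psnr(reference, rendered, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if len(reference) != len(rendered) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixel values.
    mse = sum((a - b) ** 2 for a, b in zip(reference, rendered)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Toy example: every pixel is off by 0.1, so MSE = 0.01.
reference = [0.0, 1.0, 0.5, 0.5]
rendered = [0.1, 0.9, 0.6, 0.4]
print(round(psnr(reference, rendered), 4))
```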

📦 Code

Code will be released after acceptance. Stay tuned!

📝 Citation

If you find this work useful in your research, please cite:

@misc{chai2025gafgaussianactionfield,
      title={GAF: Gaussian Action Field as a 4D Representation for Dynamic World Modeling in Robotic Manipulation}, 
      author={Ying Chai and Litao Deng and Ruizhi Shao and Jiajun Zhang and Kangchen Lv and Liangjun Xing and Xiang Li and Hongwen Zhang and Yebin Liu},
      year={2025},
      eprint={2506.14135},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2506.14135}, 
}

🌐 Project Website

Visit our project website: https://chaiying1.github.io/GAF.github.io/project_page/

This website is deployed and maintained by Ying Chai.

📄 License

This website is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

🙏 Acknowledgments

This website is built on a website template developed and open-sourced by Keunhong Park, whom we sincerely thank.

📧 Contact

For questions or inquiries, please contact:

Ying Chai: chaiy25@mails.tsinghua.edu.cn

Note: This work was supported by the National Natural Science Foundation of China (NSFC), Grant No. 62125107.
