ChaiYing1/GAF

GAF: Gaussian Action Field as a 4D Representation for Dynamic World Modeling in Robotic Manipulation

ICRA 2026

GAF is a 4D representation that formulates the V-4D-A (Vision-to-4D-to-Action) paradigm, which infers future scene evolution from current visual observations to guide robotic manipulation. It supports scene reconstruction, future prediction, and action generation within a unified framework. This feed-forward pipeline requires only sparse-view RGB images and supports real-time execution.

📄 Paper

👥 Authors

Ying Chai*1, Litao Deng*2,3, Ruizhi Shao1, Jiajun Zhang1, Kangchen Lv1, Liangjun Xing1, Xiang Li1,†, Hongwen Zhang1,†, Yebin Liu1,†

1 Tsinghua University
2 Beijing Normal University
3 Shadow AI

* Equal contributions.
† Corresponding authors.

🎯 Abstract

Accurate scene perception is critical for vision-based robotic manipulation. Existing approaches typically follow either a Vision-to-Action (V-A) paradigm, predicting actions directly from visual inputs, or a Vision-to-3D-to-Action (V-3D-A) paradigm, leveraging intermediate 3D representations. However, these methods often struggle with action inaccuracies due to the complexity and dynamic nature of manipulation scenes. In this paper, we adopt a V-4D-A framework that enables direct action reasoning from motion-aware 4D representations via a Gaussian Action Field (GAF). GAF extends 3D Gaussian Splatting (3DGS) by incorporating learnable motion attributes, allowing 4D modeling of dynamic scenes and manipulation actions. To learn time-varying scene geometry and action-aware robot motion, GAF provides three interrelated outputs: reconstruction of the current scene, prediction of future frames, and estimation of an init action via Gaussian motion. Furthermore, we employ an action-vision-aligned denoising framework, conditioned on a unified representation that combines the init action and the Gaussian perception, both generated by GAF, to obtain more precise actions. Extensive experiments demonstrate significant improvements: GAF achieves gains of +11.5385 dB PSNR, +0.3864 SSIM, and -0.5574 LPIPS in reconstruction quality, while raising the average success rate in robotic manipulation tasks by 7.3% over state-of-the-art methods.

🎥 Demo Video

See the project website for the full demo video.

🏗️ Architecture

GAF provides three interrelated outputs:

  • Current Gaussian: GAF reconstructs 3D Gaussian point clouds at the current timestep.
  • Future Gaussian: GAF predicts future 3D Gaussian point clouds from the motion attributes.
  • Init action: The init action is estimated through point cloud matching between the current and future manipulation-related Gaussians.
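
The point cloud matching step for the init action can be illustrated with a rigid alignment. The official code has not been released, so the following is only a minimal sketch under stated assumptions: it uses the standard Kabsch algorithm, assumes the manipulation-related Gaussian centers move rigidly, and assumes one-to-one correspondences between current and future centers (row i of both arrays is the same Gaussian). The function name `estimate_rigid_motion` and its arguments are hypothetical and not part of GAF.

```python
import numpy as np


def estimate_rigid_motion(current_pts, future_pts):
    """Least-squares rigid transform (R, t) such that future ~= R @ current + t.

    Hypothetical helper: assumes both inputs are (N, 3) arrays of Gaussian
    centers with known row-wise correspondences (Kabsch algorithm).
    """
    current_pts = np.asarray(current_pts, dtype=float)
    future_pts = np.asarray(future_pts, dtype=float)

    # Center both point sets on their centroids.
    c_cur = current_pts.mean(axis=0)
    c_fut = future_pts.mean(axis=0)

    # 3x3 cross-covariance between the centered sets.
    H = (current_pts - c_cur).T @ (future_pts - c_fut)
    U, _, Vt = np.linalg.svd(H)

    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = c_fut - R @ c_cur
    return R, t


if __name__ == "__main__":
    # Synthetic check: rotate 30 degrees about z and translate.
    rng = np.random.default_rng(0)
    pts = rng.normal(size=(100, 3))
    theta = np.pi / 6
    R_true = np.array([
        [np.cos(theta), -np.sin(theta), 0.0],
        [np.sin(theta), np.cos(theta), 0.0],
        [0.0, 0.0, 1.0],
    ])
    t_true = np.array([0.1, -0.2, 0.05])
    R, t = estimate_rigid_motion(pts, pts @ R_true.T + t_true)
    print(np.allclose(R, R_true), np.allclose(t, t_true))
```

In this sketch, the recovered rotation and translation play the role of a coarse end-effector motion. How GAF selects the manipulation-related Gaussians and converts their motion into an action is described in the paper, not here.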

Both the high-level module and adaptive corrector are implemented as causal transformers that leverage historical scanning sequences for more informed decision-making.

📊 Results

  • Reconstruction: +11.5385 dB PSNR, +0.3864 SSIM, and -0.5574 LPIPS over state-of-the-art methods.
  • Manipulation: +7.3% average success rate in robotic manipulation tasks over state-of-the-art methods.
  • Outperforms baseline methods in both reconstruction quality and success rate.
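
For readers unfamiliar with the reconstruction metric, PSNR is a logarithmic function of the mean squared error between a rendered image and its reference, so a gain in dB corresponds to a multiplicative reduction in error. The sketch below shows the standard definition only. The evaluation protocol used for the numbers above is not specified here, so the flattened pixel lists, the `[0, 1]` intensity range, and the function name `psnr` are assumptions for illustration.

```python
import math


def psnr(reference, rendered, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two flattened images.

    Assumes both inputs are equal-length sequences of pixel intensities
    in the range [0, max_value].
    """
    if len(reference) != len(rendered) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(reference, rendered)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    # A uniform error of 0.1 on a [0, 1] scale gives an MSE of 0.01.
    print(round(psnr([0.0, 0.0], [0.1, 0.1]), 6))
```

Under this definition, every 10 dB of PSNR gain corresponds to a tenfold reduction in mean squared error.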

📦 Code

Code will be released after acceptance. Stay tuned!

📝 Citation

If you find this work useful in your research, please cite:

@misc{chai2025gafgaussianactionfield,
      title={GAF: Gaussian Action Field as a 4D Representation for Dynamic World Modeling in Robotic Manipulation}, 
      author={Ying Chai and Litao Deng and Ruizhi Shao and Jiajun Zhang and Kangchen Lv and Liangjun Xing and Xiang Li and Hongwen Zhang and Yebin Liu},
      year={2025},
      eprint={2506.14135},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2506.14135}, 
}

🌐 Project Website

Visit our project website: https://chaiying1.github.io/GAF.github.io/project_page/

This website is deployed and maintained by Ying Chai.

📄 License

This website is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

🙏 Acknowledgments

This website is built on a project page template by Keunhong Park. We sincerely thank him for developing and open-sourcing it.

📧 Contact

For questions or inquiries, please contact:


Note: This work was supported by the National Natural Science Foundation of China (NSFC) under Grant No. 62125107.
