ClawGUI: A Unified Framework for Training, Evaluating, and Deploying GUI Agents

Python 3.12 · Apache License 2.0

A full-stack framework for GUI agents, covering online RL training, standardized evaluation, and deployment.

Demo videos:

  • clawgui-agent.mp4: ClawGUI-Agent controls a real phone via natural language
  • clawgui-rl.mp4: ClawGUI-RL trains a GUI agent with online reinforcement learning

💡 Overview

ClawGUI is a research framework for GUI agents, covering the complete lifecycle from online RL training and standardized evaluation to real-device deployment.

Building a capable GUI agent involves three tightly coupled problems that are rarely solved together: you need an environment to train the agent online, rigorous benchmarks to measure what it has learned, and a production system to deploy it on real devices. ClawGUI addresses all three.

| Module | Role |
|---|---|
| 🚀 ClawGUI-RL | Build: train GUI agents online with scalable RL, using parallel Docker environments, real Android devices, and GiGPO+PRM for fine-grained step-level rewards |
| 📊 ClawGUI-Eval | Evaluate: measure what the agent has learned across 6 benchmarks and 11+ models, with 95.8% faithful reproduction of official results |
| 🤖 ClawGUI-Agent | Deploy: use GUI agents in the real world by controlling mobile devices via natural language through 12+ chat platforms, with one-command evaluation built in |
| 🏆 ClawGUI-2B | End-to-end validation: trained entirely with ClawGUI-RL and GiGPO, achieving a 17.1 MobileWorld success rate (SR) vs. the 11.1 baseline |

🏗️ Architecture

ClawGUI System Architecture

🚀 Quick Start

git clone https://github.com/ZJU-REAL/ClawGUI.git
cd ClawGUI

Each module is independent and has its own environment. See the module's directory (clawgui-rl/, clawgui-eval/, or clawgui-agent/) for full installation and usage instructions.

🚀 ClawGUI-RL — Build

📁 clawgui-rl/ · 📖 Full Documentation

ClawGUI-RL trains GUI agents with online reinforcement learning. It runs dozens of Docker-based Android emulators in parallel or trains directly on physical devices — and replaces standard GRPO with GiGPO+PRM for fine-grained step-level rewards that drive stronger policy learning.

  • Parallel multi-environment — Dozens of Docker-based virtual Android environments simultaneously
  • Real-device training — Physical or cloud Android phones with the same API
  • GiGPO + PRM — Fine-grained step-level reward for better policy optimization than standard GRPO
  • Spare server rotation — Automatic failover keeps training running without interruption
  • Episode visualization — Record and replay any training trajectory
ClawGUI-RL Architecture
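
To illustrate what step-level credit assignment adds over a single episode-level signal, here is a minimal sketch of group-based advantage estimation. It is not ClawGUI-RL's implementation: the function names, the state keys, the weight `omega`, and the toy returns are all assumptions made for illustration, and the process reward model is not modelled at all. The episode-level term normalizes each trajectory's total return within its group (the GRPO-style signal). The step-level term additionally groups steps that start from the same state across trajectories and normalizes their returns within that smaller group.

```python
from collections import defaultdict
from statistics import mean, pstdev


def normalize(values, eps=1e-8):
    """Return (v - mean) / (std + eps) for each value in the group."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / (sigma + eps) for v in values]


def episode_advantages(episode_returns):
    """Episode-level signal: one advantage per trajectory in the group."""
    return normalize(episode_returns)


def step_advantages(trajectories):
    """Step-level signal: group steps sharing a state key, normalize per group.

    `trajectories` is a list of lists of (state_key, step_return) pairs.
    """
    groups = defaultdict(list)
    for t_idx, traj in enumerate(trajectories):
        for s_idx, (state_key, step_return) in enumerate(traj):
            groups[state_key].append((t_idx, s_idx, step_return))

    adv = [[0.0] * len(traj) for traj in trajectories]
    for members in groups.values():
        if len(members) < 2:
            continue  # a singleton group carries no relative information
        normed = normalize([r for _, _, r in members])
        for (t_idx, s_idx, _), a in zip(members, normed):
            adv[t_idx][s_idx] = a
    return adv


def combined_advantages(trajectories, episode_returns, omega=1.0):
    """Sum the episode-level and (weighted) step-level advantages per step."""
    ep = episode_advantages(episode_returns)
    st = step_advantages(trajectories)
    return [[ep[t] + omega * a for a in st[t]] for t in range(len(trajectories))]


# Toy rollouts (hypothetical): both start at "home", then diverge.
trajs = [
    [("home", 1.0), ("settings", 1.0)],  # successful episode
    [("home", 0.0), ("browser", 0.0)],   # failed episode
]
adv = combined_advantages(trajs, episode_returns=[1.0, 0.0])
```

In this toy case the two steps taken from the shared "home" state receive an extra positive or negative push, while steps from states visited only once fall back to the episode-level advantage alone.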

Get started with ClawGUI-RL

📊 ClawGUI-Eval — Evaluate

📁 clawgui-eval/ · 📖 Full Documentation · 🤗 Dataset · 🤖 ModelScope

ClawGUI-Eval gives GUI grounding research a reliable measurement baseline. Its three-stage Infer → Judge → Metric pipeline covers 6 benchmarks and 11+ models, with a 95.8% reproduction rate against official results — so numbers across papers are actually comparable.

  • 6 benchmarks — ScreenSpot-Pro, ScreenSpot-V2, UIVision, MMBench-GUI, OSWorld-G, AndroidControl
  • 11+ models — Qwen3-VL, Qwen2.5-VL, UI-TARS, MAI-UI, GUI-G2, UI-Venus, Gemini, Seed 1.8, and more
  • Dual backend — Local GPU (transformers) or remote API (OpenAI-compatible)
  • Multi-GPU & multi-thread — Parallel inference with automatic resume
  • ClawGUI-Agent integration — Pair with ClawGUI-Agent to run the full pipeline via natural language
ClawGUI-Eval Architecture
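
The three-stage Infer → Judge → Metric split can be sketched as three small functions. This is an illustrative sketch only, not ClawGUI-Eval's code: the `Sample` type, the function signatures, and the stand-in model are hypothetical, and the point-inside-bounding-box rule is a common convention for grounding benchmarks that is assumed here rather than taken from the project.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Sample:
    instruction: str
    bbox: tuple[float, float, float, float]  # ground truth (x1, y1, x2, y2)


Point = tuple[float, float]


def infer(samples: Iterable[Sample], model: Callable[[Sample], Point]) -> list[Point]:
    """Stage 1: ask the model for one click point per sample."""
    return [model(s) for s in samples]


def judge(samples: Iterable[Sample], predictions: Iterable[Point]) -> list[bool]:
    """Stage 2: a prediction counts as correct if it lies inside the box."""
    results = []
    for s, (x, y) in zip(samples, predictions):
        x1, y1, x2, y2 = s.bbox
        results.append(x1 <= x <= x2 and y1 <= y <= y2)
    return results


def metric(judgements: list[bool]) -> float:
    """Stage 3: aggregate per-sample judgements into accuracy."""
    return sum(judgements) / len(judgements) if judgements else 0.0


def fixed_point_model(sample: Sample) -> Point:
    """Stand-in for a real model: always clicks the same point."""
    return (50.0, 50.0)


samples = [
    Sample("tap the search icon", (40, 40, 60, 60)),
    Sample("open settings", (100, 100, 140, 120)),
]
predictions = infer(samples, fixed_point_model)
judgements = judge(samples, predictions)
accuracy = metric(judgements)
```

Keeping the stages separate means stored predictions can be judged again under a different rule without rerunning inference, which is usually the expensive step.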

Get started with ClawGUI-Eval

🤖 ClawGUI-Agent — Deploy

📁 clawgui-agent/ · 📖 Full Documentation · 中文

ClawGUI-Agent closes the loop from training to production. Built on OpenClaw and powered by nanobot, it lets you control Android, HarmonyOS, or iOS devices with natural language from 12+ chat platforms — and trigger the full ClawGUI-Eval benchmark pipeline with a single sentence, no scripts required.

  • Cross-platform — Android (ADB), HarmonyOS (HDC), iOS (XCTest)
  • Multi-model — AutoGLM, MAI-UI, GUI-Owl, Qwen-VL, UI-TARS via OpenAI-compatible API
  • One-command evaluation — Say "benchmark qwen3vl on screenspot-pro" and it handles env check → multi-GPU inference → judging → metrics → result comparison
  • Personalized memory — Automatically learns user preferences and injects context across tasks
  • Episode recording — Every task saved as structured episodes for replay and dataset building
  • Web UI — Gradio interface for device management, task execution, and memory inspection
ClawGUI-Agent
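
As a rough picture of what one-command evaluation has to extract from a sentence, the sketch below maps the example phrase onto a model name, a benchmark name, and an ordered list of stages. How ClawGUI-Agent actually interprets the request is not described in this README, so the regular expression, the `EvalRequest` type, and the stage identifiers are all hypothetical; only the stage order is taken from the bullet above.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Stage order follows the bullet above; the identifiers are illustrative.
PIPELINE_STAGES = ("env_check", "inference", "judging", "metrics", "comparison")

COMMAND_PATTERN = re.compile(
    r"^\s*benchmark\s+(?P<model>\S+)\s+on\s+(?P<benchmark>\S+)\s*$",
    re.IGNORECASE,
)


@dataclass(frozen=True)
class EvalRequest:
    model: str
    benchmark: str
    stages: tuple[str, ...] = PIPELINE_STAGES


def parse_command(text: str) -> Optional[EvalRequest]:
    """Return an EvalRequest for 'benchmark <model> on <benchmark>', else None."""
    match = COMMAND_PATTERN.match(text)
    if match is None:
        return None
    return EvalRequest(
        model=match["model"].lower(),
        benchmark=match["benchmark"].lower(),
    )


request = parse_command("benchmark qwen3vl on screenspot-pro")
```

A fixed pattern like this only handles one phrasing. A natural-language front end would need something more flexible, but the structured request it produces could look much the same.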

Get started with ClawGUI-Agent

🎯 Roadmap

  • ClawGUI-Agent — GUI agent framework for phone control and evaluation via natural language
  • ClawGUI-RL — Scalable mobile online RL training infrastructure with GiGPO + PRM
  • ClawGUI-Eval — Standardized GUI grounding evaluation suite with 6 benchmarks and 95%+ reproduction rate
  • ClawGUI-2B — 2B GUI agent trained with GiGPO, achieving 17.1 MobileWorld SR (vs. 11.1 baseline)
  • On-device ClawGUI-Agent — Deploy ClawGUI-Agent directly on real phones to avoid cloud-based privacy leakage
  • Desktop Online RL — Extend ClawGUI-RL to desktop environments for online reinforcement learning
  • Web Online RL — Extend ClawGUI-RL to web environments for online reinforcement learning
  • More Skills for ClawGUI-Agent — Add more pluggable skills to expand ClawGUI-Agent's capabilities
  • Hybrid CLI & GUI Mechanism — Explore hybrid interaction combining command-line and GUI operations
  • Real-time RL — Integrate real-time reinforcement learning based on the OPD algorithm for ClawGUI-RL and ClawGUI-Agent

🤝 Contributing

We welcome contributions of all kinds — new model support, new RL environments, bug fixes, and documentation improvements. See CONTRIBUTING.md for how to get started, module-specific guidelines, and PR requirements.

🙏 Acknowledgements

ClawGUI is built upon excellent open-source projects. We sincerely thank their contributors.

License

This project is licensed under the Apache License 2.0.
