I’m a third-year Computer Science student at the College of Computer Science and Technology, Xi’an Jiaotong University, expected to earn my B.S. in Engineering in fall 2027. My research focuses on Large Language Model (LLM) agents for domain-specific scenarios and on autonomous computer-use agents (CLI and GUI). Contact me at jiayuw794@gmail.com.
🔥 News
- 2026.01.05: 🎉🎉 I attended the semi-final of AI Agent 2025 (Link), held by the Chinese Association for Artificial Intelligence (CAAI) in Shenzhen, China!
- 2025.12.01: 🎉🎉 My teammates Zepeng and Weijiang and I won a Merit Award (¥1500) in the Harmony System Control Agent Competition (Link), held by Nanjing University and Huawei!
- 2025.11.30: 🎉🎉 The Earth-Insights WeChat Official Account has reached 500 subscribers!
- 2025.11.22: 🎉🎉 GeoPlan-bench is released!
- 2025.11.04: 🎉🎉 The Paper Deep Dive feature of the Earth-Insights WeChat Official Account is released!
- 2025.10.28: 🎉🎉 I attended the “Vibe Coding Hackathon” held by WaytoAGI (Link) and gave a talk on GUI agents and their future potential!
- 2025.10.21: 🎉🎉 Auto-Cursor is released!
- 2025.10.21: 🎉🎉 The Earth-Insights account has started cross-posting on RedNote!
- 2025.10.03: 🎉🎉 The Earth-Insights WeChat Official Account published its first Semi-Weekly Report!
- 2025.09.20: 🎉🎉 I attended Cursor Meetup Xi’an!
- 2025.09.15: 🎉🎉 EarthAgent passed the MVP test conducted by the AI Agent 2025 Committee!
- 2025.08.24: 🎉🎉 I participated in the “First National College Student Artificial Intelligence Security Competition” (Link), held at Beijing University of Posts and Telecommunications, and won first prize!
📝 Publications

Designing Domain-Specific Agents via Hierarchical Task Abstraction Mechanism
| Paper | Code | Project |
Kaiyu Li, Jiayu Wang, Zhi Wang, Hui Qiao, Weizhan Zhang, Deyu Meng, Xiangyong Cao
- We introduce a novel agent design framework centered on a Hierarchical Task Abstraction Mechanism (HTAM).
- We instantiate this framework as EarthAgent, a multi-agent system tailored for complex geospatial analysis. To evaluate such complex planning capabilities, we build GeoPlan-bench, a comprehensive benchmark of realistic, multi-step geospatial planning tasks. It is accompanied by a suite of carefully designed metrics to evaluate tool selection, path similarity, and logical completeness.
- Experiments show that EarthAgent substantially outperforms a range of established single- and multi-agent systems.

DescribeEarth: Describe Anything for Remote Sensing Images
| Paper | Code | Dataset | Benchmark |
Kaiyu Li, Zixuan Jiang, Xiangyong Cao, Jiayu Wang, Yuchen Xiao, Deyu Meng, Zhi Wang
- We propose Geo-DLC, a novel task of object-level fine-grained image captioning for remote sensing.
- We construct DE-Dataset, a large-scale dataset containing 25 categories and 261,806 annotated instances with detailed descriptions of object attributes, relationships, and contexts.
- Furthermore, we introduce DE-Benchmark, an LLM-assisted, question-answering-based evaluation suite designed to systematically measure model capabilities on the Geo-DLC task.
- We also present DescribeEarth, a Multi-modal Large Language Model (MLLM) architecture explicitly designed for Geo-DLC.
- Our DescribeEarth model consistently outperforms state-of-the-art general MLLMs on DE-Benchmark in factual accuracy, descriptive richness, and grammatical soundness. The gains are most pronounced in capturing intrinsic object features and surrounding environmental attributes, across simple, complex, and even out-of-distribution remote sensing scenarios.
🚀 Projects (selected)
🔥Auto-Cursor (link) (Oct 2025)
Auto-Cursor is, to the best of my knowledge, the first GUI-native orchestration layer that pilots the Cursor IDE like a human operator. By combining large language models, visual grounding, and deterministic automation, the project explores how agents can build software without being confined to command-line tooling.
Why Go Through the GUI?
- Human-parity reach: Command-line automation is capped by the APIs that tools expose. A GUI agent, however, can click, type, drag, and navigate any surface that a human can. This dramatically widens the solution space—if a person can operate it, an agent can learn to operate it too, opening the door to automating entire product lifecycles.
- Grounded perception is ready: Domain-specific MLLMs now recognize icons, layouts, and context with far higher reliability. The bottleneck has shifted from perception to orchestration. Auto-Cursor focuses on that orchestration layer—sequencing vision, language, and action—to unlock richer, end-to-end workflows.
- Standing on the shoulders of the ecosystem: GUI-first control leverages advances in agents, LLMs, GPU-accelerated rendering, and even display hardware. We treat the modern desktop as a programmable environment, turning existing tools into improvable building blocks instead of rewriting them.
Vision
- Build a resilient, self-improving system that can iterate on its own behaviors, learn from failures, and adapt to different project constraints.
- Provide tangible GUI agent scenarios that inspire new ideas for downstream industries—design, ops, education, assistive tech, and beyond.
- Stimulate thinking on AI safety and software design, showing how oversight, logging, and guardrails can coexist with highly capable automation.
🔥EarthAgent (link) (July 2025)
EarthAgent is a general-purpose AI agent for remote sensing that aims to make complex, high-barrier geospatial analysis more accessible and automated. Through simple natural-language conversation, users drive a fully automated workflow that integrates multimodal data acquisition, intelligent interpretation, and deep reasoning. Given text or image inputs, EarthAgent autonomously plans and executes tasks, cutting manual analysis that used to take days down to minutes. It has attracted 300+ likes on RedNote (link).
🔥Earth-Insights WeChat Official Account Agent (地球洞察微信公众号) (Oct 2025)
This is an automated system that fetches the latest papers in remote sensing and deep learning, then uses document-analysis agents and a set of document-analysis tools to analyze and summarize them without manual intervention. It currently consists of two modules: the Semi-Weekly Report and the Paper Deep Dive. As of December 16, the official account has published 60+ posts covering 240+ papers, accumulated over 4,000 reads, and attracted more than 500 followers, giving the community convenient access to the latest research.
