arXiv:2604.01687 · 2026

CoEvoSkills: Self-Evolving Agent Skills
via Co-Evolutionary Verification

A self-evolving framework that lets LLM agents autonomously construct complex, multi-file skill packages.

Hanrong Zhang, Shicheng Fan, Henry Peng Zou, Yankai Chen, Zhenting Wang,
Jiayu Zhou, Chengze Li, Wei-Chieh Huang, Yifei Yao, Kening Zheng,
Xue Liu, Xiaoxiao Li, Philip S. Yu
01 / Abstract

The Skill Gap in Agentic LLMs

Anthropic proposes the concept of skills for LLM agents to tackle multi-step professional tasks that simple tool invocations cannot address. A tool is a single, self-contained function, whereas a skill is a structured bundle of interdependent multi-file artifacts. Currently, skill generation is labor-intensive and suffers from human–machine cognitive misalignment, which degrades agent performance.

We propose CoEvoSkills, a self-evolving framework that enables agents to autonomously construct complex, multi-file skill packages. It couples a Skill Generator that iteratively refines skills with a Surrogate Verifier that co-evolves to provide informative, actionable feedback without access to ground-truth test content. On SkillsBench, CoEvoSkills achieves the highest pass rate against five baselines on both Claude Code and Codex, and its generated skills also generalize well to six additional LLMs.

5 baselines beaten · 2 agent backends · 6 transfer LLMs · 11 domains
02 / Motivation

What is a Skill?

A tool is a single function. A skill is an orchestrated package of instructions, scripts, and references for long-horizon professional tasks.

Tool vs. Skill
Figure 1. A tool is a single self-contained function; a skill is a structured, multi-file package with instructions, scripts, and assets.
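The paper builds on Anthropic's skills format; the exact layout is not shown here, but by that convention a skill is a directory whose entry point is a `SKILL.md` with supporting artifacts alongside it. An illustrative bundle (file names hypothetical):

```
pdf-report-skill/
├── SKILL.md              # entry point: name, description, usage instructions
├── scripts/
│   └── build_report.py   # executable helper the agent can invoke
└── references/
    └── style_guide.md    # supporting material loaded on demand
```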
03 / Method

The Co-Evolution Loop

Two components evolve together through iterative generate–verify–refine cycles.

🛠️
Skill Generator

Iteratively produces and refines structured multi-file skill bundles from the verifier's dense diagnostic feedback.

🧪
Surrogate Verifier

Information-isolated; independently evolves test assertions to provide actionable failure signals without ground-truth leakage.

🔒
Opaque Oracle

Returns only a pass/fail signal, triggering test escalation while preserving strict information isolation.

CoEvoSkills Framework
Figure 2. Overview of the CoEvoSkills co-evolutionary framework. The Skill Generator and Surrogate Verifier co-evolve via iterative refinement; a ground-truth oracle returns only an opaque pass/fail signal, triggering test escalation and preserving information isolation.
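The three components above can be sketched as a single loop. This is a toy illustration, not the paper's implementation: every name here (`co_evolve`, `generate`, `escalate`, the set-of-files "skill") is hypothetical, and the oracle is reduced to a boolean check to show the information isolation.

```python
# Toy sketch of the CoEvoSkills generate–verify–refine loop.
# Hypothetical API: the paper describes the components, not this code.
# A "skill" is reduced to a set of file names; the oracle is opaque
# (returns only pass/fail, never its test content).

def co_evolve(generate, surrogate_tests, oracle, escalate, max_iters=10):
    skill = generate(failures=None)                # initial draft
    history = []                                   # surrogate failures per iteration
    for _ in range(max_iters):
        failures = [name for name, test in surrogate_tests.items()
                    if not test(skill)]            # dense diagnostic feedback
        history.append(len(failures))
        if not failures:
            if oracle(skill):                      # opaque pass/fail only
                return skill, history
            surrogate_tests = escalate(surrogate_tests)  # oracle fail -> stricter suite
        skill = generate(failures=failures)        # refine from feedback
    return skill, history


def demo():
    target = {"SKILL.md", "run.py", "assets/ref.csv"}  # hidden ground truth
    files = set()

    def generate(failures):
        # Toy generator: add the first missing artifact named in the feedback.
        if failures:
            files.add(failures[0])
        elif not files:
            files.add("SKILL.md")
        return set(files)

    # Surrogate verifier starts with a coarse suite; escalation tightens it.
    surrogate = {f: (lambda s, f=f: f in s) for f in ("SKILL.md", "run.py")}

    def oracle(skill):
        return skill == target                     # opaque to the generator

    def escalate(tests):
        tests = dict(tests)
        tests["assets/ref.csv"] = lambda s: "assets/ref.csv" in s
        return tests

    return co_evolve(generate, surrogate, oracle, escalate)
```

In `demo()`, the surrogate suite alone cannot reveal the missing asset; only the oracle's opaque failure triggers test escalation, after which the generator's next refinement completes the bundle, mirroring the escalation path in Figure 2.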
04 / Highlights

Why It Matters

🧩
First of its kind

First framework to produce structured, executable, multi-file skill packages via self-evolution.

🚫
No GT supervision

Dense diagnostic feedback without test-content leakage during co-evolution.

🏆
SOTA on SkillsBench

Highest pass rate among five baselines on Claude Code and Codex.

🌐
Cross-model transfer

Generated skills transfer effectively to six additional LLMs without retraining.

05 / Results

Experiments on SkillsBench

Main Results

Main results on SkillsBench
Figure 3. Pass rate on SkillsBench. CoEvoSkills achieves the best performance against five baselines on both Claude Code and Codex.

Cross-Model Transferability

Cross-model transfer
Figure 4. Skills generated by CoEvoSkills transfer effectively to six additional LLMs without retraining.

Per-Domain Breakdown

Per-domain breakdown
Figure 5. Per-domain pass rate across the 11 SkillsBench domains.

Evolution Trajectory

Evolution trajectory
Figure 6. Pass rate improves monotonically as the Skill Generator and Surrogate Verifier co-evolve across iterations.
06 / Citation

Cite This Work

If you find CoEvoSkills useful, please consider citing:

@article{zhang2026coevoskills,
  title   = {CoEvoSkills: Self-Evolving Agent Skills via
             Co-Evolutionary Verification},
  author  = {Zhang, Hanrong and Fan, Shicheng and Zou, Henry Peng and
             Chen, Yankai and Wang, Zhenting and Zhou, Jiayu and
             Li, Chengze and Huang, Wei-Chieh and Yao, Yifei and
             Zheng, Kening and Liu, Xue and Li, Xiaoxiao and
             Yu, Philip S.},
  journal = {arXiv preprint arXiv:2604.01687},
  year    = {2026}
}