The PyTorch for Agentic AI. A-Evolve is an open-source infrastructure that evolves any agent, across any domain, using any evolution algorithm — with zero human intervention.
Quick Start | News | Benchmark Highlights | Architecture & Design | Contribution
You provide a Base Agent. A-Evolve returns a SOTA Agent. 3 lines of code. 0 hours of manual harness engineering. One infra, any domain, any evolution algorithm.
```python
import agent_evolve as ae

evolver = ae.Evolver(agent="./my_agent", benchmark="swe-verified")
results = evolver.run(cycles=10)
```

By applying our open-source reference evolution algorithms to a base Claude Opus-4.6 model with zero manual harness engineering, A-Evolve pushed agents into top-tier performance across four diverse benchmarks:
| MCP-Atlas | SWE-bench Verified | Terminal-Bench 2.0 | SkillsBench |
|---|---|---|---|
| 🥇 #1 Baseline → 79.4% (+3.4pp) | ~#5 Baseline → 76.8% (+2.6pp) | ~#7 Baseline → 76.5% (+13.0pp) | #2 Baseline → 34.9% (+15.2pp) |
All results achieved with a single Claude Opus-4.6 base model, evolved using A-Evolve's sample algorithms. 0 hours of human harness engineering. Data checked March 2026.
- 03/25 🚀 Open-sourced A-Evolve, the universal infrastructure for developing and testing evolution algorithms.
- 03/25 📊 Open-sourced 4 evolution algorithms developed with A-Evolve, achieving top-tier results (#1, ~#5, ~#7, #2) on MCP-Atlas, SWE-bench Verified, Terminal-Bench 2.0, and SkillsBench.
- 02/17 📄 Released the official implementation of Position: Agentic Evolution is the Path to Evolving LLMs (arXiv 2602.00359).
We are evolving fast! Support our research by leaving a ⭐.
A-Evolve mutates real files in the workspace. Here's a before/after from our MCP-Atlas evolution:
| Before (Seed Workspace) | After (Evolved — 79.4% on MCP-Atlas) |
|---|---|
| *(seed workspace snapshot)* | *(evolved workspace snapshot)* |
5 targeted skills outperformed 10 generic ones. Every mutation is git-tagged (evo-1, evo-2, …) for full reproducibility.
Install from source with all dependencies:
```shell
git clone https://github.com/A-EVO-Lab/a-evolve.git && cd a-evolve
pip install -e ".[all,dev]"
```

```python
import agent_evolve as ae

evolver = ae.Evolver(
    agent="swe-verified",      # built-in seed workspace (or path to yours)
    benchmark="swe-verified",  # built-in benchmark adapter
)
results = evolver.run(cycles=10)
print(f"Final score: {results.final_score:.3f}")
print(f"Converged: {results.converged}")
```

A-Evolve ships with built-in seed workspaces (`swe`, `mcp`, `terminal`, `skillbench`) and benchmark adapters (`swe-verified`, `mcp-atlas`, `terminal-bench`, `skill-bench`). Point `agent=` at any of them, or at your own workspace directory.
To make any agent evolvable, implement one method — solve():
```python
from agent_evolve.protocol.base_agent import BaseAgent
from agent_evolve.types import Task, Trajectory

class MyAgent(BaseAgent):
    def solve(self, task: Task) -> Trajectory:
        return Trajectory(task_id=task.id, output="result")
```

Then evolve it:

```python
evolver = ae.Evolver(agent=MyAgent("./my_workspace"), benchmark="mcp-atlas")
results = evolver.run(cycles=10)
```

Your agent's evolvable state (prompts, skills, memory) lives as a standard directory: the Agent Workspace. A-Evolve mutates these files; your agent reloads. See Architecture & Design for the full picture.
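To see why the workspace matters, here is a standalone sketch of an agent whose system prompt lives on disk. The class and method names are hypothetical (the real `BaseAgent`/`Task`/`Trajectory` types are stubbed out), but it shows the key property: an engine edit to `prompts/system.md` takes effect on the very next `solve()`.

```python
from pathlib import Path

class FileBackedAgent:
    """Toy workspace-backed agent: evolvable state is just files on disk."""

    def __init__(self, workspace: str):
        self.workspace = Path(workspace)

    def system_prompt(self) -> str:
        # Re-read on every call, so mutations need no explicit reload hook.
        return (self.workspace / "prompts" / "system.md").read_text()

    def solve(self, task_prompt: str) -> str:
        # A real agent would call an LLM here; we just echo the prompt pair.
        return f"{self.system_prompt()} | {task_prompt}"
```

Because nothing is cached in memory, the evolution engine never needs to call into the agent; rewriting the file is the whole interface.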
For benchmark-specific walkthroughs, see SWE-bench Demo Guide and MCP-Atlas Demo Guide.
A-Evolve's core insight: all evolvable agent state lives on the file system as a standard directory structure. This lets the evolution engine mutate any agent via LLM-driven file operations — without knowing the agent's internals.
```
my_agent/
├── manifest.yaml        # identity, entrypoint, evolvable layers
├── prompts/system.md    # system prompt
├── skills/              # SKILL.md files (dynamic skill library)
├── tools/               # tool configurations
└── memory/              # episodic + semantic memory (JSONL)
```
The evolution engine reads these files, analyzes performance logs, and writes mutations back. The agent reloads. That's the entire contract.
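The engine side of that contract fits in a few lines of plain Python. This is a hedged sketch: the function name is invented, and only the file layout shown in the tree above is assumed.

```python
from pathlib import Path

def add_skill(workspace: Path, name: str, body: str) -> Path:
    """Engine-side mutation sketch: write a new SKILL.md into the
    workspace's dynamic skill library. No agent internals required."""
    skill_dir = workspace / "skills" / name
    skill_dir.mkdir(parents=True, exist_ok=True)
    skill_file = skill_dir / "SKILL.md"
    skill_file.write_text(f"# {name}\n\n{body}\n")
    return skill_file
```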
Every cycle follows five phases:
```
┌─────────┐    ┌─────────┐    ┌─────────┐    ┌──────┐    ┌────────┐
│  Solve  │───▶│ Observe │───▶│ Evolve  │───▶│ Gate │───▶│ Reload │
└─────────┘    └─────────┘    └─────────┘    └──────┘    └────────┘
```
- Solve — Agent processes a batch of tasks (black-box execution).
- Observe — Collect trajectories + benchmark feedback into structured logs.
- Evolve — Evolution engine analyzes observations and mutates workspace files (prompts, skills, memory).
- Gate — Validate mutations on holdout tasks. Regressed mutations are rolled back via git.
- Reload — Agent reloads from the (possibly rolled-back) workspace.
The loop converges when EGL (Evolutionary Generality Loss) stabilizes or max_cycles is reached. Every accepted mutation is git-tagged (evo-1, evo-2, …), providing a full audit trail.
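The five phases can be caricatured as a pure-Python loop. All callables here are stubs standing in for real benchmark runs, and the real engine tags and rolls back via git; this sketch models rollback by simply returning the old workspace.

```python
def run_cycle(workspace: dict, cycle: int, solve, evaluate, mutate) -> dict:
    trajectories = solve(workspace)              # Solve: black-box execution
    observations = evaluate(trajectories)        # Observe: structured feedback
    candidate = mutate(workspace, observations)  # Evolve: mutate workspace state
    # Gate: keep the mutation only if it does not regress on validation.
    if evaluate(solve(candidate))["score"] >= observations["score"]:
        tagged = dict(candidate)
        tagged["tag"] = f"evo-{cycle}"           # accepted: git tag in the real engine
        return tagged                            # Reload: agent uses new state
    return workspace                             # rejected: roll back
```

An improving mutation is kept and tagged; a regressing one leaves the workspace untouched, which is the essence of the Gate phase.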
A-Evolve ships with ready-to-use benchmark adapters and seed workspaces:
| Adapter | Domain | Seed Workspace | Best Result |
|---|---|---|---|
| `swe-verified` | Real-world GitHub issues (Python repos) | `seed_workspaces/swe/` | 76.8% (~#5) |
| `mcp-atlas` | Tool-calling via MCP (16+ servers) | `seed_workspaces/mcp/` | 79.4% (🥇 #1) |
| `terminal-bench` | Terminal/CLI ops in Docker | `seed_workspaces/terminal/` | 76.5% (~#7) |
| `skill-bench` | Agentic skill discovery | `seed_workspaces/reasoning/` | 34.9% (#2) |
A-Evolve is a framework, not a standalone agent. Every axis is pluggable:
| Axis | Interface | You Provide | Built-in Examples |
|---|---|---|---|
| Agent (BYOA) | `BaseAgent.solve()` | Any agent architecture: ReAct, Plan-and-Solve, custom | `SweAgent`, `McpAgent` |
| Benchmark (BYOE) | `BenchmarkAdapter.get_tasks()` / `.evaluate()` | Any domain with task + evaluation signal | SWE-bench, MCP-Atlas, Terminal-Bench 2.0, SkillsBench |
| Algorithm (BYO-Algo) | `EvolutionEngine.step()` | Any evolution strategy | `AEvolveEngine` (LLM-driven mutation) |
| LLM Provider | `LLMProvider.complete()` | Any model API | Anthropic, OpenAI, AWS Bedrock |
A-Evolve ships with 4 reference evolution algorithms, each targeting different domains and strategies:
| Algorithm | Strategy | Best For | Docs |
|---|---|---|---|
| `adaptive_evolve` | Per-claim feedback analysis + meta-learning | MCP-Atlas (🥇 #1, 79.4%) | Guide |
| `adaptive_skill` | LLM-driven workspace mutation with bash tool access | Terminal-Bench 2.0 (~#7, 76.5%) | Guide |
| `skillforge` | LLM-driven workspace mutation with EGL gating | SkillsBench (#2, 34.9%) | Guide |
| `guided_synth` | Memory-first evolution + LLM-guided intervention synthesis | General-purpose; SWE-bench (~#5, 76.8%) | Guide |
Each algorithm lives in its own directory under algorithms/. Implement a single method:
```python
from agent_evolve.engine.base import EvolutionEngine
from agent_evolve.types import StepResult

class MyEvolutionEngine(EvolutionEngine):
    def step(self, workspace, observations, history, trial) -> StepResult:
        # Analyze observations, mutate workspace files, optionally run trial tasks
        ...
        return StepResult(accepted=True, score=new_score)
```

Then pass it to the Evolver:
```python
evolver = ae.Evolver(
    agent="swe-verified",
    benchmark="swe-verified",
    engine=MyEvolutionEngine(config),
)
```

The engine has full access to shared primitives: TrialRunner (on-demand validation), EvolutionHistory (observation + version queries), and VersionControl (git-based rollback). It is never forced to use them. Minimal contract, maximum freedom.
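As a concrete toy strategy, here is a standalone `step()` that appends the worst failure of the cycle to episodic memory. `StepResult` is stubbed locally so the snippet runs without `agent_evolve` installed, and the observation shape (`score` plus `error` keys) is an assumption for illustration.

```python
import json
from dataclasses import dataclass
from pathlib import Path

@dataclass
class StepResult:
    """Local stub of agent_evolve.types.StepResult."""
    accepted: bool
    score: float

def step(workspace: Path, observations: list[dict]) -> StepResult:
    # Find the weakest trajectory this cycle and record a lesson from it.
    worst = min(observations, key=lambda o: o["score"])
    memory = workspace / "memory" / "episodic.jsonl"
    memory.parent.mkdir(parents=True, exist_ok=True)
    with memory.open("a") as f:
        f.write(json.dumps({"lesson": worst["error"]}) + "\n")
    # Report the mean score so the gate can compare against the last cycle.
    mean = sum(o["score"] for o in observations) / len(observations)
    return StepResult(accepted=True, score=mean)
```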
A-Evolve is built for the research community. We welcome contributions across every axis of the framework.
If you work in LLM self-optimization, reinforcement learning, or agent architectures — implement the EvolutionEngine interface and your algorithm instantly gains access to:
- Diverse environments (SWE-bench, MCP-Atlas, Terminal-Bench 2.0, SkillsBench, and more).
- Standardized agent workspace representations.
- Rigorous evaluation, gating, and logging infrastructure.
Drop your algorithm into agent_evolve/algorithms/your_algo/ and open a PR.
Implement BenchmarkAdapter to plug any new evaluation domain into A-Evolve. The interface is two methods: get_tasks() and evaluate().
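A minimal toy adapter, sketched under the assumption that `evaluate()` returns a per-task score and that tasks carry an id, a prompt, and an expected answer (the real `BenchmarkAdapter` signatures may differ):

```python
from dataclasses import dataclass

@dataclass
class Task:
    """Illustrative task shape; the real Task type lives in agent_evolve."""
    id: str
    prompt: str
    expected: str

class ArithmeticAdapter:
    """Toy domain: exact-match grading of arithmetic answers."""

    def get_tasks(self) -> list[Task]:
        return [Task("t1", "2 + 2 = ?", "4"), Task("t2", "3 * 3 = ?", "9")]

    def evaluate(self, task: Task, output: str) -> float:
        # 1.0 for an exact match (ignoring surrounding whitespace), else 0.0.
        return 1.0 if output.strip() == task.expected else 0.0
```

Any domain that can enumerate tasks and score an agent's output fits this two-method shape.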
- ⭐ Star this repo to support our research — we are evolving fast.
- 🐛 Open an issue to report bugs or request features.
- 🔀 Submit a PR — new evolution algorithms, benchmark adapters, agent implementations, and documentation improvements are all welcome.
- 💬 Join our Discord to discuss research directions, share results, and collaborate.
If you use A-Evolve in your research, please cite our position paper:
@article{a-evolve2026,
title = {Position: Agentic Evolution is the Path to Evolving LLMs},
author = {A-EVO-Lab},
journal = {arXiv preprint arXiv:2602.00359},
year = {2026},
url = {https://arxiv.org/abs/2602.00359}
}

