Overview
Imbue has open-sourced the Darwinian Evolver, a lightweight framework that uses LLM-driven evolution to optimize code and prompts. It maintains a population of "organisms" (candidate solutions), selects parents by sampling weighted on fitness and novelty, applies LLM-powered mutations, and iterates. Imbue reports roughly 2-3x gains over base-model scores for weaker models (1.8x to 2.8x in the Performance Data table). Notably, it reached 95.1% on ARC-AGI-2 with Gemini 3.1 Pro and 34% with open-weights Kimi K2.5 (the highest open-weights score).
A Hermes Agent skill that teaches the agent how to install, configure, and run the Darwinian Evolver for user-defined optimization tasks would unlock evolutionary search as a powerful problem-solving tool. Users could evolve prompts, optimize code, discover algorithms, and tune agent configurations — all orchestrated by the agent.
Research sources: "LLM-based Evolution as a Universal Optimizer" and "Beating ARC-AGI-2 with Code Evolution"
Research Findings
How the Darwinian Evolver Works
The framework operates on a simple evolutionary loop:
- Selection: Sample parents from the population with a weight equal to sigmoid-scaled fitness × novelty bonus. The sigmoid midpoint is set dynamically (e.g., at the 75th percentile of current scores) so the selector stays in the "high-gradient range" as the population improves.
- Mutation: An LLM proposes targeted improvements based on specific failure cases. Advanced techniques include:
  - Batch mutations: Multiple failure cases provided simultaneously (like mini-batches in SGD)
  - Learning log: History of past mutations and their outcomes, provided as code diffs
  - Crossover mutations: Combining logic from 2-3 different parents (25% frequency)
  - Randomized mutation strength: Varying between incremental fixes and "outside the box" thinking
- Post-mutation verification: Quick check of whether the mutation fixes the specific failure it targeted. If not, the mutation is discarded immediately without a full evaluation. Imbue reports >10x cost/time improvement from this step alone.
- Evaluation: Score the organism using ground truth, performance metrics, or LLM-based quality heuristics.
- Integration: Add scored organisms back to the population atomically (no mid-iteration visibility).
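The selection step can be illustrated with a minimal sketch. It assumes fitness scores in [0, 1], a fixed sigmoid sharpness, and a novelty bonus that shrinks with the number of children an organism has already produced. The sharpness value, the novelty formula, and every function name here are illustrative assumptions, not taken from the Darwinian Evolver source.

```python
import math
import random


def percentile(values, q):
    """Linear-interpolated percentile of a non-empty list, q in [0, 100]."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def selection_weights(scores, child_counts, sharpness=10.0, midpoint_q=75.0):
    """Weight = sigmoid-scaled fitness x novelty bonus (illustrative formula)."""
    # The midpoint tracks the population, so it rises as scores improve.
    midpoint = percentile(scores, midpoint_q)
    weights = []
    for score, children in zip(scores, child_counts):
        fitness = 1.0 / (1.0 + math.exp(-sharpness * (score - midpoint)))
        novelty = 1.0 / (1.0 + children)  # assumed: fewer children means more novel
        weights.append(fitness * novelty)
    return weights


def sample_parents(population, scores, child_counts, k, rng=random):
    """Draw k parents (with replacement) in proportion to their weights."""
    weights = selection_weights(scores, child_counts)
    return rng.choices(population, weights=weights, k=k)
```

Because the midpoint is recomputed from the current scores, the steep part of the sigmoid stays near the top of the population, which keeps the sampler discriminating between good and slightly better organisms.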
Key Architecture (from source code)
The framework requires three user-defined components:
    # Organism, Evaluator, Mutator, and EvaluationResult are provided by the evolver.

    # 1. Organism: the thing being evolved
    class MyOrganism(Organism):
        prompt_template: str

    # 2. Evaluator: scores organisms and identifies failure cases
    class MyEvaluator(Evaluator[MyOrganism, EvaluationResult, MyFailureCase]):
        def evaluate(self, organism: MyOrganism) -> EvaluationResult:
            return EvaluationResult(score=0.85, trainable_failure_cases=[...])

    # 3. Mutator: LLM-powered agent that proposes fixes
    class MyMutator(Mutator[MyOrganism, MyFailureCase]):
        def mutate(self, organism, failure_cases, learning_log_entries):
            # improved_prompt stands for the text returned by the LLM call
            return [MyOrganism(prompt_template=improved_prompt)]
It also provides GitBasedOrganism for evolving code tracked in Git repos, and EvolveProblemLoop for managing the full lifecycle (snapshots, resumption, statistics).
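To show how the three components interact within one iteration, here is a toy sketch that uses plain-Python stand-ins instead of the real base classes. A deterministic function replaces the LLM mutator, and all names (`ToyOrganism`, `evaluate`, `mutate`, `run_iteration`) are hypothetical; the real framework's interfaces may differ.

```python
from dataclasses import dataclass

TARGETS = frozenset(range(5))  # toy "test cases" an organism should handle


@dataclass(frozen=True)
class ToyOrganism:
    handled: frozenset  # which targets this candidate solves


def evaluate(organism):
    """Return (score, failure_cases), mirroring an evaluator's two outputs."""
    failures = sorted(TARGETS - organism.handled)
    return 1.0 - len(failures) / len(TARGETS), failures


def mutate(organism, failure_case):
    """Stand-in for the LLM call: odd cases get fixed, even ones do not."""
    if failure_case % 2 == 1:
        return ToyOrganism(organism.handled | {failure_case})
    return ToyOrganism(organism.handled)  # a "mutation" that changes nothing


def run_iteration(population):
    """One iteration: mutate, verify cheaply, evaluate, then integrate."""
    children = []
    for parent, (_, failures) in population:
        for case in failures:
            child = mutate(parent, case)
            if case not in child.handled:  # post-mutation verification
                continue  # discard without paying for a full evaluation
            children.append((child, evaluate(child)))
    # Integration happens once, so new children are not visible mid-iteration.
    return population + children
```

Starting from `ToyOrganism(frozenset())`, one call to `run_iteration` keeps only the two children whose targeted failure was actually fixed; the three failed mutations never reach `evaluate`.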
Key Design Decisions
- Population diversity via novelty bonus prevents evolutionary dead ends
- Separate train/score datasets prevent overfitting to specific failure cases
- Concurrent execution with configurable parallelism for mutations and evaluations
- Pickle-based snapshots enable pause/resume of long evolution runs
- Provider-agnostic — supports Anthropic, OpenAI, and Google GenAI
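The pause/resume idea behind pickle-based snapshots can be sketched as follows. The snapshot contents (any picklable state object) and the helper names are assumptions for illustration; the evolver's actual snapshot format is not described here. Note that pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.

```python
import os
import pickle
import tempfile


def save_snapshot(state, path):
    """Write the snapshot atomically so a crash never leaves a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "wb") as handle:
        pickle.dump(state, handle)
    os.replace(tmp_path, path)  # atomic rename within the same directory


def load_snapshot(path, default):
    """Return the saved state, or `default` when no snapshot exists yet."""
    if not os.path.exists(path):
        return default
    with open(path, "rb") as handle:
        return pickle.load(handle)
```

A long run would call `save_snapshot` after each iteration and `load_snapshot` at startup, so an interrupted evolution resumes from the last completed iteration rather than from scratch.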
Performance Data
| Model | Base Score | Evolved Score | Improvement | Cost/Task |
|---|---|---|---|---|
| Kimi K2.5 (open-weights) | 12.1% | 34.0% | 2.8x | $2.67 |
| Gemini 3 Flash | 34.0% | 61.4% | 1.8x | $2.42 |
| Gemini 3.1 Pro | 88.1% | 95.1% | +7.0 pts (1.08x) | $8.71 |
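The Improvement column mixes ratios and percentage points, so it can help to compute both, plus the share of remaining errors removed, from the base and evolved scores. The sketch below uses only the figures from the table; the function name and the error-reduction metric are additions for illustration.

```python
RESULTS = {
    "Kimi K2.5": (12.1, 34.0),
    "Gemini 3 Flash": (34.0, 61.4),
    "Gemini 3.1 Pro": (88.1, 95.1),
}


def improvement(base, evolved):
    """Return (ratio, percentage-point gain, share of remaining errors removed)."""
    ratio = evolved / base
    points = evolved - base
    error_reduction = points / (100.0 - base)
    return ratio, points, error_reduction


for model, (base, evolved) in RESULTS.items():
    ratio, points, error_reduction = improvement(base, evolved)
    print(f"{model}: {ratio:.2f}x, +{points:.1f} pts, {error_reduction:.0%} of errors removed")
```

Measured this way, the Gemini 3.1 Pro row has the smallest ratio (about 1.08x) but removes the largest share of remaining errors (about 59%), because its base score leaves little headroom.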
Current State in Hermes Agent
Hermes Agent currently has no evolutionary optimization capability. Related existing components:
- batch_runner.py: Parallel batch processing with trajectory saving (could provide evaluation signals)
- agent/trajectory.py: Trajectory saving in ShareGPT format
- RL Training (Tinker-Atropos): RL-based training that scores agent rollouts via compute_reward()
- delegate_tool.py: Subagent spawning (could orchestrate parallel evaluations)
- mixture_of_agents: Multi-model collaboration (complementary but different approach)
No existing open issues cover evolutionary optimization, prompt evolution, or LLM-driven code search.
Implementation Plan
Skill vs. Tool Classification
This should be a skill (Skills Hub, not bundled) because:
- The Darwinian Evolver is an external CLI tool installable via uv pip install
- The agent orchestrates it via terminal commands and file operations
- No custom Python integration or API key management needed in the agent harness
- It is specialized (not broadly useful to most users) → Skills Hub, not bundled
⚠️ License Note: The Darwinian Evolver is AGPL v3 (copyleft). Hermes Agent is MIT. The skill must treat the evolver as an external tool invoked via CLI — NOT as a Python import/dependency. This is a "mere aggregation" which is permitted.
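The process boundary described in the license note can be sketched as a small wrapper that runs an external command and captures its output, rather than importing the evolver as a Python module. The command shown is a placeholder that only prints a string; the evolver's real command-line arguments are not specified in this proposal and would need to come from its own documentation.

```python
import subprocess
import sys


def run_external(command, cwd=None, timeout=None):
    """Run a separate process and return (exit code, stdout, stderr)."""
    completed = subprocess.run(
        command,  # list of arguments; no shell, so no quoting surprises
        cwd=cwd,
        capture_output=True,
        text=True,
        timeout=timeout,
        check=False,  # let the caller decide how to handle failures
    )
    return completed.returncode, completed.stdout, completed.stderr


# Placeholder command: the evolver's real CLI arguments are not given here.
code, out, err = run_external([sys.executable, "-c", "print('ok')"])
```

Keeping all interaction behind a wrapper like this means the skill exchanges only files and process output with the evolver, with no Python import of its code.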
What We'd Need
- A SKILL.md with procedures for:
  - Installing the evolver (uv pip install darwinian-evolver or git clone)
  - Defining custom problems (organism, evaluator, mutator templates)
  - Running evolutions and interpreting results
  - Common use cases with examples (prompt optimization, code generation, algorithm discovery)
- Template files for common problem types
- Helper scripts for parsing evolution results and visualizing lineage
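As one possible shape for a lineage helper script, the sketch below renders parent/child relationships as an indented text tree. It assumes results have already been parsed into records with `id`, `parent_id`, and `score` fields; that record layout is hypothetical and does not reflect the evolver's actual output format.

```python
from collections import defaultdict


def render_lineage(records):
    """Render (id, parent_id, score) records as an indented text tree."""
    children = defaultdict(list)
    for record in records:
        children[record["parent_id"]].append(record)

    lines = []

    def walk(parent_id, depth):
        for record in sorted(children[parent_id], key=lambda r: r["id"]):
            lines.append(f"{'  ' * depth}{record['id']} score={record['score']:.2f}")
            walk(record["id"], depth + 1)

    walk(None, 0)  # roots are records whose parent_id is None
    return "\n".join(lines)
```

A plain-text tree is easy for the agent to include in a progress report; a graphical view could be layered on top of the same parent map later.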
Phased Rollout
Phase 1: Basic skill with prompt optimization example
- Install instructions and verification
- Template for prompt optimization problems
- Run evolution via CLI, parse results
- Lineage visualization guidance
Phase 2: Code evolution and multi-problem support
- Template for code evolution problems (using GitBasedOrganism)
- Template for algorithm discovery
- Integration with Hermes Agent's trajectory data for evaluation
- Cost estimation guidance
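Cost estimation guidance could start from a simple per-task calculation like the sketch below. The per-task figure comes from the Performance Data table; the task count and the safety factor are illustrative assumptions, and real costs will vary with the problem and model.

```python
def estimate_cost(num_tasks, cost_per_task_usd, safety_factor=1.5):
    """Return (expected, budget) in USD for an evolution run."""
    if num_tasks < 0 or cost_per_task_usd < 0 or safety_factor < 1.0:
        raise ValueError("counts and costs must be non-negative, safety_factor >= 1")
    expected = num_tasks * cost_per_task_usd
    return expected, expected * safety_factor


# Hypothetical run: 50 tasks at the Gemini 3.1 Pro rate of $8.71 per task.
expected, budget = estimate_cost(50, 8.71)
print(f"expected ${expected:.2f}, budget ${budget:.2f}")
```

Reporting both an expected figure and a padded budget gives the user a chance to cancel before the run starts.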
Phase 3: Agent-assisted problem definition
- The agent helps users define custom organisms, evaluators, and mutators interactively
- Auto-generates problem.py files from natural language descriptions
- Monitors running evolutions and reports progress
Pros & Cons
Pros
- Unlocks a fundamentally new problem-solving paradigm (evolutionary search vs. one-shot generation)
- Reported 2-3x performance improvements (1.8x to 2.8x on ARC-AGI-2 for weaker base models)
- Open-source with active maintenance (last commit Feb 26, 2026)
- Complements existing Hermes capabilities (batch_runner, trajectories, RL training)
- Low integration cost as a skill — no codebase changes needed
Cons / Risks
- AGPL v3 license — Must remain an external tool, cannot be imported as a Python library
- Cost — Evolution runs consume significant API credits ($2-9/task for ARC-AGI-2, more for complex problems)
- Complexity — Defining good evaluators is the hard part; bad evaluation → bad evolution
- Niche use case — Most Hermes users won't need evolutionary optimization
- Dependency chain — Requires anthropic, openai, google-genai, numpy, pydantic, jinja2
Open Questions
- Should the skill ship with pre-built problem templates (e.g., "optimize this system prompt") or be more generic?
- Should we integrate with Hermes Agent's existing trajectory/evaluation infrastructure, or keep the skill self-contained?
- Is there demand for this among Hermes users, or is it too specialized?