Feature: Darwinian Evolver Skill — Evolutionary Code & Prompt Optimization

## Overview

Imbue has open-sourced the [Darwinian Evolver](https://github.com/imbue-ai/darwinian_evolver), a lightweight framework that uses LLM-driven evolution to optimize code and prompts. It maintains a population of "organisms" (candidate solutions), selects parents by sampling weighted on fitness and novelty, applies LLM-powered mutations, and iterates. On ARC-AGI-2, Imbue reports evolved scores **1.8x to 2.8x** the base score for weaker models (see Performance Data below), [95.1%](https://imbue.com/research/2026-02-27-arc-agi-2-evolution/) with Gemini 3.1 Pro, and [34% with open-weights Kimi K2.5](https://imbue.com/research/2026-02-27-arc-agi-2-evolution/) (the highest open-weights score).

A Hermes Agent **skill** that teaches the agent how to install, configure, and run the Darwinian Evolver for user-defined optimization tasks would unlock evolutionary search as a powerful problem-solving tool. Users could evolve prompts, optimize code, discover algorithms, and tune agent configurations — all orchestrated by the agent.

**Research source:** [LLM-based Evolution as a Universal Optimizer](https://imbue.com/research/2026-02-27-darwinian-evolver/) and [Beating ARC-AGI-2 with Code Evolution](https://imbue.com/research/2026-02-27-arc-agi-2-evolution/)

---

## Research Findings

### How the Darwinian Evolver Works

The framework operates on a simple evolutionary loop:

1. **Selection** — Sample parents from the population using sigmoid-scaled fitness × novelty bonus. The midpoint is set dynamically (e.g., 75th percentile) so the selector stays in the "high-gradient range" as the population improves.

2. **Mutation** — An LLM proposes targeted improvements based on specific failure cases. Advanced techniques include:
   - **Batch mutations** — Multiple failure cases provided simultaneously (like mini-batches in SGD)
   - **Learning log** — History of past mutations and their outcomes, provided as code diffs
   - **Crossover mutations** — Combining logic from 2-3 different parents (25% frequency)
   - **Randomized mutation strength** — Varying between incremental fixes and "outside the box" thinking

3. **Post-mutation verification** — Quick check: does the mutation fix the specific failure it targeted? If not, discard immediately without full evaluation. Imbue reports >10x cost/time improvement from this step alone.

4. **Evaluation** — Score the organism using ground truth, performance metrics, or LLM-based quality heuristics.

5. **Integration** — Add scored organisms back to the population atomically (no mid-iteration visibility).
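Steps 1 and 3 above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the framework's code: the function names, the `sharpness` parameter, and the way the novelty bonus is passed in as a ready-made multiplier are all hypothetical.

```python
import math
import random


def selection_weights(scores, novelty_bonuses, midpoint_percentile=0.75, sharpness=10.0):
    """Sigmoid-scaled fitness multiplied by a novelty bonus.

    The sigmoid midpoint tracks a percentile of the current scores, so the
    steep part of the curve follows the population as it improves.
    """
    ranked = sorted(scores)
    midpoint = ranked[min(len(ranked) - 1, int(midpoint_percentile * len(ranked)))]
    weights = []
    for score, novelty in zip(scores, novelty_bonuses):
        fitness = 1.0 / (1.0 + math.exp(-sharpness * (score - midpoint)))
        weights.append(fitness * novelty)
    return weights


def sample_parents(population, weights, k, rng=random):
    """Sample k parents with replacement, proportionally to their weights."""
    return rng.choices(population, weights=weights, k=k)


def verified_children(children, targeted_failure, fixes_failure):
    """Step 3: keep only children that fix the failure their mutation targeted."""
    return [child for child in children if fixes_failure(child, targeted_failure)]


scores = [0.2, 0.4, 0.6, 0.8]
novelty = [1.0, 1.0, 1.0, 1.0]  # assumed: novelty arrives as a multiplier per organism
weights = selection_weights(scores, novelty)
parents = sample_parents(["a", "b", "c", "d"], weights, k=2, rng=random.Random(0))
```

Because the midpoint moves with the population, an organism that was near the top early on loses selection weight once better organisms appear, without any manual rescaling.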

### Key Architecture (from source code)

The framework requires three user-defined components:

```python
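# Abridged sketch: the base classes (Organism, Evaluator, EvaluationResult,
# Mutator) come from the darwinian_evolver package; the exact import path
# is not shown here, so check the repository.

# 1. Organism: the thing being evolved
class MyOrganism(Organism):
    prompt_template: str

# 2. Evaluator: scores organisms and identifies failure cases
class MyEvaluator(Evaluator[MyOrganism, EvaluationResult, MyFailureCase]):
    def evaluate(self, organism: MyOrganism) -> EvaluationResult:
        # Placeholder values: a real evaluator computes the score and collects failures
        return EvaluationResult(score=0.85, trainable_failure_cases=[...])

# 3. Mutator: LLM-powered agent that proposes fixes
class MyMutator(Mutator[MyOrganism, MyFailureCase]):
    def mutate(self, organism, failure_cases, learning_log_entries):
        # improved_prompt stands for the LLM's rewrite of organism.prompt_template
        return [MyOrganism(prompt_template=improved_prompt)]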
```

It also provides `GitBasedOrganism` for evolving code tracked in Git repos, and `EvolveProblemLoop` for managing the full lifecycle (snapshots, resumption, statistics).
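To show how the three components interact, here is a self-contained toy that runs without the library. Everything in it is a stand-in and an assumption: the `Toy*` classes only mirror the shape described above, the mutator is a deterministic substitute for an LLM call, and the loop uses greedy best-parent selection rather than the weighted sampling the framework uses.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ToyOrganism:
    prompt_template: str


@dataclass
class ToyResult:
    score: float
    trainable_failure_cases: list = field(default_factory=list)


class ToyEvaluator:
    """Scores a prompt by the fraction of required keywords it mentions."""

    def __init__(self, required_keywords):
        self.required_keywords = list(required_keywords)

    def evaluate(self, organism):
        missing = [k for k in self.required_keywords if k not in organism.prompt_template]
        score = 1.0 - len(missing) / len(self.required_keywords)
        return ToyResult(score=score, trainable_failure_cases=missing)


class ToyMutator:
    """Deterministic stand-in for the LLM call: patch one failure case per mutation."""

    def mutate(self, organism, failure_cases, learning_log_entries):
        if not failure_cases:
            return []
        patched = f"{organism.prompt_template} Mention {failure_cases[0]}."
        return [ToyOrganism(prompt_template=patched)]


def evolve(initial, evaluator, mutator, iterations):
    """Greedy loop: mutate the best organism so far and keep every scored child."""
    population = [(initial, evaluator.evaluate(initial))]
    learning_log = []
    for _ in range(iterations):
        parent, parent_result = max(population, key=lambda pair: pair[1].score)
        children = mutator.mutate(parent, parent_result.trainable_failure_cases, learning_log)
        scored = [(child, evaluator.evaluate(child)) for child in children]
        learning_log.extend((parent, child, result.score) for child, result in scored)
        population.extend(scored)  # integrate only after the whole batch is scored
    return max(population, key=lambda pair: pair[1].score)


evaluator = ToyEvaluator(["units", "sources", "steps"])
best, best_result = evolve(ToyOrganism("Answer the question."), evaluator, ToyMutator(), iterations=3)
print(best_result.score, best.prompt_template)
```

The evaluator returns failure cases alongside the score, and the mutator consumes them. That coupling is what lets each mutation target a concrete failure instead of rewriting blindly.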

### Key Design Decisions

- **Population diversity** via novelty bonus prevents evolutionary dead ends
- **Separate train/score datasets** prevent overfitting to specific failure cases
- **Concurrent execution** with configurable parallelism for mutations and evaluations
- **Pickle-based snapshots** enable pause/resume of long evolution runs
- **Provider-agnostic** — supports Anthropic, OpenAI, and Google GenAI
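The pause/resume idea behind pickle-based snapshots can be sketched as follows. The state layout (`iteration`, `population`) and the helper names are assumptions for illustration; the framework's actual snapshot format is not shown here.

```python
import os
import pickle
import tempfile


def save_snapshot(path, state):
    """Write state to a temp file, then atomically rename it over the target."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "wb") as handle:
        pickle.dump(state, handle)
    os.replace(tmp_path, path)


def load_snapshot(path, default):
    """Return the saved state, or default when no snapshot exists yet."""
    if not os.path.exists(path):
        return default
    with open(path, "rb") as handle:
        return pickle.load(handle)


with tempfile.TemporaryDirectory() as workdir:
    snapshot_path = os.path.join(workdir, "snapshot.pkl")
    # First run: nothing on disk, so start from the default state.
    state = load_snapshot(snapshot_path, {"iteration": 0, "population": []})
    state["iteration"] += 1
    state["population"].append({"prompt": "v1", "score": 0.5})
    save_snapshot(snapshot_path, state)
    # A later run picks up where the previous one stopped.
    resumed = load_snapshot(snapshot_path, None)
```

Writing to a temporary file and renaming means an interrupted run never leaves a half-written snapshot behind. Note that unpickling executes arbitrary code, so snapshots should only be loaded from trusted sources.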

### Performance Data

| Model | Base Score | Evolved Score | Improvement | Cost/Task |
|:---|:---|:---|:---|:---|
| Kimi K2.5 (open-weights) | 12.1% | 34.0% | 2.8x | $2.67 |
| Gemini 3 Flash | 34.0% | 61.4% | 1.8x | $2.42 |
| Gemini 3.1 Pro | 88.1% | 95.1% | 1.08x (+7.0 pts) | $8.71 |
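
The improvement figures can be recomputed from the base and evolved scores, both as a relative factor and as an absolute gain in percentage points. The snippet below uses only the numbers from the table; the helper name is illustrative.

```python
rows = {
    "Kimi K2.5 (open-weights)": (12.1, 34.0),
    "Gemini 3 Flash": (34.0, 61.4),
    "Gemini 3.1 Pro": (88.1, 95.1),
}


def improvement(base, evolved):
    """Return (relative factor, absolute gain in percentage points)."""
    return round(evolved / base, 2), round(evolved - base, 1)


summary = {model: improvement(base, evolved) for model, (base, evolved) in rows.items()}
for model, (factor, points) in summary.items():
    print(f"{model}: {factor}x, +{points} points")
```

The relative factor shrinks as the base score rises because there is less headroom: a 7-point gain from 88.1% is only about 1.08x, while a 21.9-point gain from 12.1% is about 2.8x.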

---

## Current State in Hermes Agent

Hermes Agent currently has no evolutionary optimization capability. Related existing components:
- **batch_runner.py** — Parallel batch processing with trajectory saving (could provide evaluation signals)
- **agent/trajectory.py** — Trajectory saving in ShareGPT format
- **RL Training (Tinker-Atropos)** — RL-based training that scores agent rollouts via `compute_reward()`
- **delegate_tool.py** — Subagent spawning (could orchestrate parallel evaluations)
- **mixture_of_agents** — Multi-model collaboration (complementary but different approach)

No existing open issues cover evolutionary optimization, prompt evolution, or LLM-driven code search.

---

## Implementation Plan

### Skill vs. Tool Classification

This should be a **skill** (Skills Hub, not bundled) because:
- The Darwinian Evolver is an external CLI tool installable via `uv pip install`
- The agent orchestrates it via `terminal` commands and file operations
- No custom Python integration or API key management needed in the agent harness
- It is specialized (not broadly useful to most users), so it belongs in the Skills Hub rather than the bundled set

**⚠️ License Note:** The Darwinian Evolver is licensed under **AGPL v3** (copyleft), while Hermes Agent is **MIT**. The skill must therefore treat the evolver as an external tool invoked via its CLI, NOT as a Python import or dependency. Running it as a separate program in this way counts as "mere aggregation", which the AGPL permits.
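
The process boundary described in the license note could look like the sketch below: the caller launches a separate program and only reads its exit code and output. The wrapper name is hypothetical, and the command is a placeholder that runs a trivial Python one-liner; the evolver's real command line is not shown here and should be taken from its README.

```python
import subprocess
import sys


def run_external(command, cwd=None, timeout=None):
    """Run a CLI as a child process and return (exit code, stdout, stderr)."""
    completed = subprocess.run(
        command, cwd=cwd, capture_output=True, text=True, timeout=timeout
    )
    return completed.returncode, completed.stdout, completed.stderr


# Placeholder command; substitute the evolver's real command line from its README.
placeholder_command = [sys.executable, "-c", "print('evolution finished')"]
exit_code, stdout, stderr = run_external(placeholder_command, timeout=60)
print(exit_code, stdout.strip())
```

Nothing from the external tool is imported into the calling process, which is the property the skill needs to preserve.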

### What We'd Need

1. A `SKILL.md` with procedures for:
   - Installing the evolver (`uv pip install darwinian-evolver` or `git clone`)
   - Defining custom problems (organism, evaluator, mutator templates)
   - Running evolutions and interpreting results
   - Common use cases with examples (prompt optimization, code generation, algorithm discovery)
2. Template files for common problem types
3. Helper scripts for parsing evolution results and visualizing lineage
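
A lineage helper of the kind listed in item 3 might look like this sketch. The record format (`id`, `parent_id`, `score`) is an assumption made for illustration; the evolver's actual output format would need to be mapped onto it first.

```python
from collections import defaultdict


def render_lineage(records):
    """Return an indented text tree of organisms, ordered by id under each parent."""
    children = defaultdict(list)
    for record in records:
        children[record["parent_id"]].append(record)

    lines = []

    def walk(parent_id, depth):
        for record in sorted(children[parent_id], key=lambda r: r["id"]):
            lines.append(f"{'  ' * depth}{record['id']} (score={record['score']:.2f})")
            walk(record["id"], depth + 1)

    walk(None, 0)  # roots are the records with no parent
    return "\n".join(lines)


records = [
    {"id": "org-0", "parent_id": None, "score": 0.40},
    {"id": "org-1", "parent_id": "org-0", "score": 0.55},
    {"id": "org-2", "parent_id": "org-0", "score": 0.48},
    {"id": "org-3", "parent_id": "org-1", "score": 0.71},
]
tree = render_lineage(records)
print(tree)
```

A plain-text tree is enough for the agent to report which branch produced the best organism; crossover mutations with several parents would need a graph rather than a tree.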

### Phased Rollout

**Phase 1: Basic skill with prompt optimization example**
- Install instructions and verification
- Template for prompt optimization problems
- Run evolution via CLI, parse results
- Lineage visualization guidance

**Phase 2: Code evolution and multi-problem support**
- Template for code evolution problems (using `GitBasedOrganism`)
- Template for algorithm discovery
- Integration with Hermes Agent's trajectory data for evaluation
- Cost estimation guidance

**Phase 3: Agent-assisted problem definition**
- The agent helps users define custom organisms, evaluators, and mutators interactively
- Auto-generates problem.py files from natural language descriptions
- Monitors running evolutions and reports progress
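
For the cost estimation guidance in Phase 2, a first-order estimate is simply tasks times cost per task, padded with a margin. The helper and the 1.5x default margin are assumptions; the per-task figure in the example is the ARC-AGI-2 number for Kimi K2.5 from the Performance Data table, and other problems may cost more or less.

```python
def estimate_cost(num_tasks, cost_per_task_usd, safety_margin=1.5):
    """Return (expected, budgeted) cost in USD; budgeted applies a multiplicative margin."""
    expected = num_tasks * cost_per_task_usd
    return round(expected, 2), round(expected * safety_margin, 2)


expected, budgeted = estimate_cost(num_tasks=50, cost_per_task_usd=2.67)
print(f"expected ${expected:.2f}, budget ${budgeted:.2f}")
```

The agent could show an estimate like this to the user and ask for confirmation before starting a run.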

---

## Pros & Cons

### Pros
- Unlocks a fundamentally new problem-solving paradigm (evolutionary search vs. one-shot generation)
- Reported 1.8x to 2.8x score improvements on ARC-AGI-2 for weaker base models (see Performance Data)
- Open-source with active maintenance (last commit Feb 26, 2026)
- Complements existing Hermes capabilities (batch_runner, trajectories, RL training)
- Low integration cost as a skill — no codebase changes needed

### Cons / Risks
- **AGPL v3 license** — Must remain an external tool, cannot be imported as a Python library
- **Cost** — Evolution runs consume significant API credits ($2-9/task for ARC-AGI-2, more for complex problems)
- **Complexity** — Defining good evaluators is the hard part; bad evaluation → bad evolution
- **Niche use case** — Most Hermes users won't need evolutionary optimization
- **Dependency chain** — Requires anthropic, openai, google-genai, numpy, pydantic, jinja2

---

## Open Questions

- Should the skill ship with pre-built problem templates (e.g., "optimize this system prompt") or be more generic?
- Should we integrate with Hermes Agent's existing trajectory/evaluation infrastructure, or keep the skill self-contained?
- Is there demand for this among Hermes users, or is it too specialized?

---

## References

- [LLM-based Evolution as a Universal Optimizer](https://imbue.com/research/2026-02-27-darwinian-evolver/) — Imbue blog post
- [Beating ARC-AGI-2 with Code Evolution](https://imbue.com/research/2026-02-27-arc-agi-2-evolution/) — ARC-AGI-2 application
- [imbue-ai/darwinian_evolver](https://github.com/imbue-ai/darwinian_evolver/) — Source code (AGPL v3)
- [Darwin Gödel Machines](https://arxiv.org/abs/2505.22954) — Sakana AI paper that inspired the framework
- [AlphaEvolve](https://deepmind.google/discover/blog/alphaevolve-a-gemini-powered-coding-agent-for-designing-advanced-algorithms/) — Google DeepMind's similar approach
- [The New AI Darwinism](https://medium.com/@roberto.g.infante/the-new-ai-darwinism-how-evolutionary-llm-based-coding-systems-are-rewriting-themselves-27b3f971f9e3) — Comparative overview
