[Feature]: Adaptive delegation policy — learn which tasks pay off delegating

## Problem

The current delegation policy in Hermes is *static*: the orchestrator picks a profile (#4928), a model tier (#7929), or a chain (#7481) based on configured rules or the agent's in-context judgment. There is no feedback loop from delegation outcomes back into future delegation decisions: if `claude -p` just wasted 90s on a task that the local model would have solved in 30s, nothing stops the orchestrator from picking the same route next time.

Hermes already has the pieces a learning policy needs:

- **Hindsight** records execution traces and outcomes.
- **#4928** / **#7929** / **#5586** provide the *dispatch vocabulary* (profile, tier, async) a policy can output.
- **delegate_task** knows wall-time, token count, and success/failure per call.

What's missing is an adaptive layer that consumes outcomes and biases future dispatch.

## Proposal

An opt-in, profile-scoped **delegation policy** that maps `task features → recommended dispatch`:

```yaml
# config.yaml
delegation:
  policy:
    enabled: true
    store: hindsight                    # where outcomes persist
    features:
      - task_kind                       # typo-fix | cross-file-edit | unfamiliar-repo | …
      - repo_fingerprint
      - rough_token_estimate
      - parent_model_tier
    decision_space:
      - ["local", null]                 # keep it in-process
      - ["delegate", "fast-worker"]
      - ["delegate", "strong-worker"]
    update_rule: thompson_sampling      # or: ucb, epsilon_greedy, off
    cold_start:
      seed_from: ~/.hermes/delegation_seed.jsonl
```

At `delegate_task` call time the orchestrator asks the policy "given these features, which dispatch wins?"; at completion it reports `(features, dispatch, wall_time_ms, cost_tokens, succeeded)` back. The policy updates its posterior and the next similar task gets a better pick.
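To make the choose/report loop concrete, here is a minimal sketch of a Beta-Bernoulli Thompson sampler over the decision space. Everything named here (`DelegationPolicy`, `choose`, `report`, the feature-dict shape) is hypothetical and not an existing Hermes API. As a simplification, the sketch rewards only `succeeded`; how to fold `wall_time_ms` and `cost_tokens` into the reward is left open.

```python
import random
from collections import defaultdict


class DelegationPolicy:
    """Hypothetical sketch: Thompson sampling over a small discrete action set."""

    def __init__(self, decision_space, seed=None):
        # Dispatches are tuples such as ("local", None) or ("delegate", "fast-worker").
        self.decision_space = [tuple(d) for d in decision_space]
        self.rng = random.Random(seed)
        # (feature_key, dispatch) -> [alpha, beta], starting from a uniform Beta(1, 1).
        self.posterior = defaultdict(lambda: [1.0, 1.0])

    @staticmethod
    def _key(features):
        # Assumes feature values are hashable (strings, ints, buckets).
        return tuple(sorted(features.items()))

    def choose(self, features):
        """Draw one success-rate sample per dispatch and return the best draw."""
        key = self._key(features)

        def draw(dispatch):
            alpha, beta = self.posterior[(key, dispatch)]
            return self.rng.betavariate(alpha, beta)

        return max(self.decision_space, key=draw)

    def report(self, features, dispatch, wall_time_ms, cost_tokens, succeeded):
        """Update the posterior; wall_time_ms and cost_tokens are ignored here."""
        params = self.posterior[(self._key(features), tuple(dispatch))]
        if succeeded:
            params[0] += 1.0
        else:
            params[1] += 1.0


# Usage with made-up numbers: after repeated outcomes, draws favor the winner.
policy = DelegationPolicy([("local", None), ("delegate", "fast-worker")], seed=0)
features = {"task_kind": "typo-fix", "parent_model_tier": "strong"}
for _ in range(50):
    policy.report(features, ("local", None), 30_000, 1_000, True)
    policy.report(features, ("delegate", "fast-worker"), 90_000, 5_000, False)
picks = [policy.choose(features) for _ in range(100)]
```

Because each call samples from the posterior rather than taking the mean, under-explored dispatches still get tried occasionally, which is the property that makes Thompson sampling a reasonable default for `update_rule`.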

## Why it matters

- **Cost** — the capability gap between orchestrator and worker is modest on easy-to-medium tasks; uniformly delegating pays worker cost for no capability gain. An adaptive policy converges on delegating only where the capability gap actually pays off.
- **No-code personalization** — different users have different models / hardware / providers; a shared static config can't express "delegate to Claude Opus on hard-refactor, stay local for simple edits" for everyone at once. A learned policy adapts per user.
- **Grounds existing abstractions** — #4928 profiles and #7929 tiers give the *names* to choose between; this issue gives the *mechanism* for choosing wisely.

## Cold start

There's already a real seed dataset: an internal benchmark of 30 trials (3 delegation strategies × 10 tasks). The strategies were A: never-delegate, B: always-delegate, and C: rule-based SOUL.md. Results: A won 7/10, C won 2/10 (the two hardest), B won 1/10. That's 30 labeled `(task_features, dispatch, wall_time, succeeded)` rows ready to bootstrap a policy, plus a ready-made test bed for regression-testing the policy head.

Happy to contribute the benchmark data as a `delegation_seed.jsonl` if this is useful.
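As a rough illustration of how such a seed file could be consumed, the sketch below turns JSONL rows into per-`(features, dispatch)` success/failure counts that a policy could use as its starting posterior. The row schema (`features`, `dispatch`, `wall_time_ms`, `succeeded`) and the function name `load_seed_counts` are assumptions, and the sample rows are invented placeholders, not the benchmark results.

```python
import json
from collections import defaultdict


def load_seed_counts(lines):
    """Aggregate seed rows into {(feature_key, dispatch): [successes, failures]}.

    Hypothetical schema: each JSONL row carries "features" (a flat dict),
    "dispatch" (a two-element list), "wall_time_ms", and "succeeded".
    """
    counts = defaultdict(lambda: [0, 0])
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the seed file
        row = json.loads(line)
        feature_key = tuple(sorted(row["features"].items()))
        dispatch = tuple(row["dispatch"])  # JSON null becomes None
        counts[(feature_key, dispatch)][0 if row["succeeded"] else 1] += 1
    return dict(counts)


# Placeholder rows for illustration only; real rows would come from the file
# named in cold_start.seed_from.
SAMPLE_ROWS = [
    '{"features": {"task_kind": "typo-fix"}, "dispatch": ["local", null], '
    '"wall_time_ms": 30000, "succeeded": true}',
    '{"features": {"task_kind": "typo-fix"}, "dispatch": ["delegate", "fast-worker"], '
    '"wall_time_ms": 90000, "succeeded": false}',
    '{"features": {"task_kind": "typo-fix"}, "dispatch": ["local", null], '
    '"wall_time_ms": 28000, "succeeded": true}',
]

seed_counts = load_seed_counts(SAMPLE_ROWS)
```

Adding these counts to a Beta(1, 1) prior would give each seeded pair a warm start while leaving unseen pairs at the uninformed default.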

## Non-goals

- Not replacing the agent's in-context reasoning. The policy's output is a *prior*; the agent can still override it when it has task-specific knowledge the features don't capture.
- Not a training pipeline — a Thompson-sampling / UCB posterior over a small discrete action set is enough; no model training, no GPU.
- Not a new backend — persists via the existing Hindsight store so there's one source of truth.
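
To show how little machinery the "no training pipeline" point implies, here is a sketch of the selection step for the `ucb` option, using the standard UCB1 rule on success rate. The function name `ucb1_pick` and the `stats` shape are hypothetical, and the counts in the usage lines are made up.

```python
import math


def ucb1_pick(stats):
    """Pick a dispatch by UCB1.

    stats maps dispatch -> (successes, pulls) for one feature bucket.
    Untried dispatches are returned first so every arm gets at least one pull.
    """
    for dispatch, (_, pulls) in stats.items():
        if pulls == 0:
            return dispatch

    total_pulls = sum(pulls for _, pulls in stats.values())

    def score(dispatch):
        successes, pulls = stats[dispatch]
        mean = successes / pulls
        bonus = math.sqrt(2.0 * math.log(total_pulls) / pulls)  # exploration term
        return mean + bonus

    return max(stats, key=score)


# Usage with made-up counts.
stats = {
    ("local", None): (9, 10),
    ("delegate", "fast-worker"): (2, 10),
    ("delegate", "strong-worker"): (0, 0),
}
first_pick = ucb1_pick(stats)  # the untried arm is explored first
stats[("delegate", "strong-worker")] = (1, 10)
second_pick = ucb1_pick(stats)
```

The whole state is a handful of counters per feature bucket, which is why persisting it in the existing store is plausible without a separate backend.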

## Cross-refs

- #4928 named delegation profiles — provides the dispatch targets.
- #7929 delegation model tiers + list_models — provides tier-level dispatch targets.
- #7481 per-delegation fallback provider chain — fallback is orthogonal; policy picks the *primary*, chain handles failures.
- #5586 async_delegation — adaptive policy can include "async vs sync" in the decision space.

