[Feature]: Adaptive delegation policy — learn which tasks pay off delegating #9557

@easyvibecoding

Problem

The current delegation policy in Hermes is static: the orchestrator picks a profile (#4928), a model tier (#7929), or a chain (#7481) based on configured rules or the agent's in-context judgment. Delegation outcomes never feed back into future delegation decisions: if claude -p just spent 90s on a task that the local model would have solved in 30s, nothing stops the orchestrator from picking the same route next time.

Hermes already has the pieces a learning policy needs:

What's missing is an adaptive layer that consumes outcomes and biases future dispatch.

Proposal

An opt-in, profile-scoped delegation policy that maps task features → recommended dispatch:

# config.yaml
delegation:
  policy:
    enabled: true
    store: hindsight                    # where outcomes persist
    features:
      - task_kind                       # typo-fix | cross-file-edit | unfamiliar-repo | …
      - repo_fingerprint
      - rough_token_estimate
      - parent_model_tier
    decision_space:
      - ["local", null]                 # keep it in-process
      - ["delegate", "fast-worker"]
      - ["delegate", "strong-worker"]
    update_rule: thompson_sampling      # or: ucb, epsilon_greedy, off
    cold_start:
      seed_from: ~/.hermes/delegation_seed.jsonl

At delegate_task call time the orchestrator asks the policy "given these features, which dispatch wins?"; at completion it reports (features, dispatch, wall_time_ms, cost_tokens, succeeded) back. The policy updates its posterior and the next similar task gets a better pick.
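The ask/report loop above could be sketched roughly as follows. This is a minimal illustration, not an existing Hermes API: the class name, method names, and the choice to treat only `succeeded` as the reward are all assumptions. It keeps one Beta posterior per (feature key, dispatch) pair and uses Beta-Bernoulli Thompson sampling to pick a dispatch.

```python
import random
from collections import defaultdict

# All names here are hypothetical; none of them are existing Hermes APIs.
DECISION_SPACE = [
    ("local", None),
    ("delegate", "fast-worker"),
    ("delegate", "strong-worker"),
]


class ThompsonDelegationPolicy:
    """Beta-Bernoulli Thompson sampling over a small discrete dispatch set."""

    def __init__(self, decision_space, rng=None):
        self.decision_space = [tuple(d) for d in decision_space]
        self.rng = rng or random.Random()
        # One [alpha, beta] posterior per (feature key, dispatch),
        # starting from a uniform Beta(1, 1) prior.
        self.posterior = defaultdict(lambda: [1.0, 1.0])

    @staticmethod
    def feature_key(features):
        # Features must already be bucketed into hashable values,
        # e.g. rough_token_estimate -> "small" / "medium" / "large".
        return tuple(sorted(features.items()))

    def recommend(self, features):
        key = self.feature_key(features)

        def sampled_success_rate(dispatch):
            alpha, beta = self.posterior[(key, dispatch)]
            return self.rng.betavariate(alpha, beta)

        # Draw once per option and pick the option with the highest draw.
        return max(self.decision_space, key=sampled_success_rate)

    def report(self, features, dispatch, wall_time_ms, cost_tokens, succeeded):
        # Simplification: only `succeeded` updates the posterior here;
        # wall_time_ms and cost_tokens are accepted but ignored.
        params = self.posterior[(self.feature_key(features), tuple(dispatch))]
        params[0 if succeeded else 1] += 1.0
```

A real policy would likely fold `wall_time_ms` and `cost_tokens` into the reward, for example by counting a slow or expensive success as a partial failure. How to weight them is left open here.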

Why it matters

  • Cost: the capability gap between orchestrator and worker is modest on easy-to-medium tasks, so uniformly delegating pays worker cost for no capability gain. An adaptive policy converges on delegating only where the capability gap actually pays off.
  • No-code personalization: different users have different models / hardware / providers; a shared static config can't express "delegate to Claude Opus on hard-refactor, stay local for simple edits" for everyone at once. A learned policy adapts per user.
  • Grounds existing abstractions: the profiles from #4928 (named delegation capability profiles for subagents) and the tiers from #7929 (delegation model tiers + list_models tool, builds on #7586) give the names to choose between; this issue gives the mechanism for choosing wisely.

Cold start

There's already a real seed dataset: an internal benchmark we ran of 30 trials (3 delegation strategies × 10 tasks; strategies A: never-delegate, B: always-delegate, C: rule-based SOUL.md). Results: A won 7/10 tasks, C won 2/10 (the two hardest), and B won 1/10. That's 30 labeled (task_features, dispatch, wall_time, succeeded) rows ready to bootstrap a policy, plus a ready-made test bed for regression-testing the policy head.

Happy to contribute the benchmark data as a delegation_seed.jsonl if this is useful.
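For concreteness, here is one possible shape for a delegation_seed.jsonl row and a small loader. The field names (`task_features`, `dispatch`, `wall_time_ms`, `succeeded`) follow the tuple described above but are assumptions, and the two sample rows are made-up placeholders, not rows from the benchmark.

```python
import json
from dataclasses import dataclass


@dataclass
class SeedRow:
    # Hypothetical row schema; the real seed file may differ.
    task_features: dict
    dispatch: tuple
    wall_time_ms: int
    succeeded: bool


def parse_seed_lines(lines):
    """Parse JSONL text lines into SeedRow records, skipping blank lines."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        raw = json.loads(line)
        mode, worker = raw["dispatch"]  # e.g. ["delegate", "fast-worker"]
        rows.append(
            SeedRow(
                task_features=dict(raw["task_features"]),
                dispatch=(mode, worker),
                wall_time_ms=int(raw["wall_time_ms"]),
                succeeded=bool(raw["succeeded"]),
            )
        )
    return rows


# Placeholder rows for illustration only (not benchmark results).
SAMPLE = """
{"task_features": {"task_kind": "typo-fix"}, "dispatch": ["local", null], "wall_time_ms": 30000, "succeeded": true}
{"task_features": {"task_kind": "unfamiliar-repo"}, "dispatch": ["delegate", "strong-worker"], "wall_time_ms": 90000, "succeeded": true}
"""

rows = parse_seed_lines(SAMPLE.splitlines())
```

Each parsed row could then be replayed into the policy as if it were a live outcome report, which is one way cold_start.seed_from might be applied.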

Non-goals

  • Not replacing the agent's in-context reasoning: the policy's output is a prior, and the agent can still override it when it has task-specific knowledge the features don't capture.
  • Not a training pipeline: a Thompson-sampling / UCB posterior over a small discrete action set is enough; no model training, no GPU.
  • Not a new backend: persists via the existing Hindsight store so there's one source of truth.
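To illustrate how small the "no training pipeline" option is, here is a sketch of the UCB alternative with an agent override on top. It uses the standard UCB1 rule (mean success rate plus an exploration bonus). The function names, the `stats` layout, and the example counts are assumptions for illustration, not Hermes code or benchmark data.

```python
import math


def ucb1_pick(stats, decision_space):
    """Pick a dispatch by UCB1. `stats` maps dispatch -> (successes, trials)."""
    total_trials = sum(stats.get(d, (0, 0))[1] for d in decision_space)
    best, best_score = None, -math.inf
    for dispatch in decision_space:
        successes, trials = stats.get(dispatch, (0, 0))
        if trials == 0:
            return dispatch  # UCB1 tries every option once before comparing
        bonus = math.sqrt(2.0 * math.log(total_trials) / trials)
        score = successes / trials + bonus
        if score > best_score:
            best, best_score = dispatch, score
    return best


def choose_dispatch(stats, decision_space, agent_override=None):
    # The policy output is only a prior: an explicit agent choice wins.
    if agent_override is not None:
        return agent_override
    return ucb1_pick(stats, decision_space)


DECISION_SPACE = [
    ("local", None),
    ("delegate", "fast-worker"),
    ("delegate", "strong-worker"),
]

# Illustrative counts only.
example_stats = {
    ("local", None): (9, 10),
    ("delegate", "fast-worker"): (5, 10),
    ("delegate", "strong-worker"): (6, 10),
}
suggested = choose_dispatch(example_stats, DECISION_SPACE)
```

With equal trial counts the exploration bonus is the same for every option, so the suggestion here is simply the option with the highest observed success rate.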

Cross-refs

Metadata

    Labels

    P3 Low (cosmetic, nice to have) · tool/delegate (Subagent delegation) · type/feature (New feature or request)
