[Feature] Centralised agent-routing config — agent-routing.yaml in private repo, propagates to .claude/agents/*.md at SessionStart #351

@atlas-apex

User Story

As an operator running ApexYard who wants to configure the agent → model mapping in one file (rather than editing 22+ agent-frontmatter entries by hand), I want a centralised agent-routing.yaml that lives in my private portfolio repo (split-portfolio mode) or a gitignored fork-root file (single-fork mode), so that:

  1. Per-agent model assignments (Haiku for QA, Opus for SRE, local for Idris, etc.) live in ONE place I edit.
  2. The public fork's .claude/agents/*.md files stay shipped with FRAMEWORK DEFAULTS — my adopter-specific choices never leak to the public fork's commit history.
  3. Local-model routing (per the outcome of spike #348, "Local-model routing feasibility for ticket-manager, Data Analyst, QA Engineer via LiteLLM → Ollama") is wired the same way as remote-model routing: the routing config carries endpoint: plus auth env-var refs for adopters who run Ollama / LiteLLM / Bedrock / Vertex behind their own proxy.
  4. Switching an agent from Sonnet to Opus, or from Claude API to local Ollama, is a one-line YAML edit + a re-session — not a hunt-and-replace across 22 markdown files.

Acceptance Criteria

File + schema

  • Adopter-facing config at <private_repo>/agent-routing.yaml (split-portfolio) OR <fork_root>/agent-routing.yaml (gitignored in single-fork)

  • YAML schema:

    # agent-routing.yaml — per-agent model + endpoint overrides
    agents:
      <agent-name>:
        model: <model-spec>           # e.g. opus, sonnet, haiku, ollama/qwen2.5-coder:14b, bedrock/...
        endpoint: <url>               # optional, for local / proxy routing
        env:                          # optional, additional env vars set for invocations
          OPENAI_API_KEY: $OPENAI_KEY # supports env-var refs
          AWS_REGION: eu-west-2
        timeout_seconds: 60           # optional, default = framework default
  • Schema documented in the file's leading comment + in docs/multi-project.md "Daily workflow under split mode" section

  • Adopters with NO agent-routing.yaml get framework defaults (zero config, zero behaviour change)

  • An empty agent-routing.yaml (with just agents: {}) is also zero-config — equivalent to "no file"
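A minimal sketch of how the schema above could be normalised once the YAML has been parsed into a dict. It is written in Python for readability, although the ticket specifies a shell hook. The function names, the rejection of unknown keys, and the agent name qa-engineer are assumptions for illustration, not part of the ticket. It shows the two zero-config cases (no file, agents: {}) collapsing to the same empty result, and `$NAME` references in env: being expanded.

```python
import re
from typing import Any, Mapping

# Assumption: an env value is a reference only when it is exactly `$NAME`.
_ENV_REF = re.compile(r"^\$(\w+)$")
ALLOWED_KEYS = {"model", "endpoint", "env", "timeout_seconds"}


def normalise_routing(doc: Any) -> dict[str, dict[str, Any]]:
    """Return per-agent overrides; {} means "use framework defaults"."""
    if not doc:  # missing file (None) or an empty YAML document
        return {}
    agents = doc.get("agents") or {}  # covers `agents: {}` and a bare `agents:`
    result: dict[str, dict[str, Any]] = {}
    for name, entry in agents.items():
        entry = entry or {}
        unknown = set(entry) - ALLOWED_KEYS
        if unknown:  # hypothetical strictness: fail fast on typos such as `modle:`
            raise ValueError(f"{name}: unknown keys {sorted(unknown)}")
        result[name] = dict(entry)
    return result


def resolve_env(entry: Mapping[str, Any], environ: Mapping[str, str]) -> dict[str, str]:
    """Expand `$NAME` references in an entry's env block against `environ`."""
    resolved: dict[str, str] = {}
    for key, value in (entry.get("env") or {}).items():
        match = _ENV_REF.match(str(value))
        if match is None:
            resolved[key] = str(value)  # literal value, e.g. AWS_REGION
            continue
        ref = match.group(1)
        if ref not in environ:
            raise KeyError(f"{key} references unset variable ${ref}")
        resolved[key] = environ[ref]
    return resolved
```

Passing the environment in explicitly (rather than reading os.environ inside the function) keeps the expansion easy to test from a fixture.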

Sync mechanism

  • SessionStart hook apply-agent-routing.sh runs at session start:
    • Reads agent-routing.yaml from the resolved path (via _lib-portfolio-paths.sh if it gains an agent_routing key, or via convention)
    • For each entry in agents:, rewrites the matching .claude/agents/<name>.md frontmatter — replaces model: if present; adds it otherwise. Same for allowed-tools: if specified (advanced override; off by default).
    • Writes endpoint-related env vars (ANTHROPIC_BASE_URL, AWS_REGION, etc.) into a per-agent file, .claude/session/agent-env/<name>.env, for Claude Code to read when invoking that agent. This assumes Claude Code supports per-agent env; verify during implementation. Fallback: set ANTHROPIC_BASE_URL once for the whole session, which is acceptable only when every agent uses the same proxy.
  • Idempotent — re-running the hook doesn't compound changes
  • Silent on success; one-line banner per applied override on first run after agent-routing.yaml edit
  • Pre-commit guard (sibling hook): if .claude/agents/*.md has a model: line that doesn't match the FRAMEWORK DEFAULT (from a checksum or a sibling .claude/agents/<name>.md.default file), block the commit with "Your routing config is leaking into the agent file — revert this model: change and re-apply via agent-routing.yaml."
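The "replace model: if present; add it otherwise" rule and the idempotence requirement can be sketched as a pure text transform. This is a Python illustration only (the ticket calls for a shell hook), and it assumes agent files open with a `---` delimited frontmatter block. The function name set_frontmatter_model is hypothetical.

```python
def set_frontmatter_model(text: str, model: str) -> str:
    """Replace or add the `model:` line inside a leading `---` frontmatter block."""
    lines = text.split("\n")
    if not lines or lines[0].strip() != "---":
        raise ValueError("no frontmatter block at top of file")
    try:
        # Index of the closing `---` delimiter.
        end = next(i for i in range(1, len(lines)) if lines[i].strip() == "---")
    except StopIteration:
        raise ValueError("unterminated frontmatter block") from None

    new_line = f"model: {model}"
    for i in range(1, end):
        if lines[i].startswith("model:"):
            lines[i] = new_line  # replace in place so key order is preserved
            break
    else:
        lines.insert(end, new_line)  # no model: key yet, add before the closing ---
    return "\n".join(lines)
```

Because the function only ever sets the single model: line to the requested value, applying it twice with the same model yields the same text, which is the property the hook needs for re-runs. Text below the closing delimiter is never touched, so a "model:" mention in the agent's prompt body is safe.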

Drift prevention

  • Smoke test at .claude/hooks/tests/test_agent_routing_drift.sh — fixture: plant a routing-config edit, run the SessionStart hook, assert (a) the affected agent file's frontmatter changed, (b) other agent files unchanged, (c) the change reverts cleanly when the routing-config entry is removed
  • AgDR documents the design choice (sync-from-YAML vs frontmatter-only-with-CLI-tool vs per-user-overlay vs ENV-only) + the local-routing integration shape
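The three smoke-test assertions can be illustrated at the level of "which model should each agent file carry", independent of file I/O. This Python sketch assumes the hook knows each agent's framework default (for example from the sibling .md.default file mentioned under Sync mechanism). The agent names and default models are placeholders, not the #347 matrix.

```python
def effective_models(defaults: dict[str, str], overrides: dict[str, str]) -> dict[str, str]:
    """Model each agent file should carry after the hook runs."""
    return {name: overrides.get(name, default) for name, default in defaults.items()}


# Hypothetical fixture: three agents, all shipped with the same default.
defaults = {"qa-engineer": "sonnet", "sre": "sonnet", "data-analyst": "sonnet"}

before = effective_models(defaults, {})
after = effective_models(defaults, {"qa-engineer": "haiku"})   # plant a routing edit
reverted = effective_models(defaults, {})                      # remove the entry again

changed = {name for name in defaults if before[name] != after[name]}
```

Here `changed` covers assertion (a) and (b) together: it should contain only the agent named in the routing entry. Assertion (c) is `reverted == before`, which only holds if the hook restores the default rather than leaving the last override in place.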

Split-portfolio integration

  • _lib-portfolio-paths.sh gains portfolio_agent_routing resolver — returns <private_repo>/agent-routing.yaml in split-portfolio mode, <fork_root>/agent-routing.yaml in single-fork (gitignored), or empty if no file exists
  • .gitignore in single-fork mode includes agent-routing.yaml so adopter customisations don't leak to the public fork
  • /setup --split-portfolio (and the migration path) seeds a starter agent-routing.yaml template in the private repo with framework defaults documented as comments
  • docs/multi-project.md updated with the routing-config setup section + edit-workflow
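The resolver's contract (private repo in split-portfolio mode, fork root in single-fork mode, empty when no file exists) can be sketched as follows. This is a Python stand-in for the shell function in _lib-portfolio-paths.sh. The mode strings and the signature are assumptions, and None plays the role of the "empty" result.

```python
from pathlib import Path
from typing import Optional


def portfolio_agent_routing(
    mode: str, fork_root: Path, private_repo: Optional[Path] = None
) -> Optional[Path]:
    """Return the path of agent-routing.yaml for this mode, or None if absent."""
    if mode == "split-portfolio":
        if private_repo is None:
            raise ValueError("split-portfolio mode needs a private repo path")
        candidate = private_repo / "agent-routing.yaml"
    elif mode == "single-fork":
        candidate = fork_root / "agent-routing.yaml"  # gitignored in this mode
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    # "Empty if no file exists": callers treat None as framework defaults.
    return candidate if candidate.is_file() else None
```

Returning None instead of a non-existent path lets the sync hook exit early without probing the filesystem a second time.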

Public-fork hygiene

  • Public fork's .claude/agents/*.md files stay committed with the FRAMEWORK DEFAULTS from the matrix in #347 ([Feature] Promote all 19 role definitions to Claude Code sub-agents: per-role model + tool restriction + isolated context)
  • Adopter's local-only model: rewrites (post-hook) stay out of commits, either via a .gitattributes clean filter OR by writing to a sibling shadow file that the framework reads instead of the canonical agent file. (Decide-and-go in the AgDR. Trade-off: the clean filter keeps a single agent file but leaves the working tree looking modified; the shadow file is simpler to implement but requires pointing Claude Code at the shadow directory.)
  • Pre-push guard sweep — any .claude/agents/*.md with a model: line different from the committed default fails the push with a clear error pointing at agent-routing.yaml
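A sketch of the check shared by the pre-commit and pre-push guards: compare each agent file's model: line with the framework default and report the mismatches. It also honours the `# routing-config:override <reason>` bypass comment described under Risks. This is Python for illustration only. The function names, the in-memory files mapping, and the choice to ignore files with no model: line are assumptions.

```python
import re
from typing import Optional

# Bypass marker from the Risks section; a non-empty reason is required here.
BYPASS = re.compile(r"#\s*routing-config:override\s+\S")


def frontmatter_model(text: str) -> Optional[str]:
    """Return the model named in the leading frontmatter block, if any."""
    lines = text.split("\n")
    if not lines or lines[0].strip() != "---":
        return None
    for line in lines[1:]:
        if line.strip() == "---":
            break
        if line.startswith("model:"):
            # Drop a trailing `# comment` before comparing.
            return line[len("model:"):].split("#", 1)[0].strip()
    return None


def leaked_overrides(files: dict[str, str], defaults: dict[str, str]) -> list[str]:
    """Names of agent files whose model differs from the framework default."""
    leaked = []
    for name, text in sorted(files.items()):
        model = frontmatter_model(text)
        if model is None or model == defaults.get(name):
            continue
        if BYPASS.search(text):
            continue  # intentional one-off override, explicitly annotated
        leaked.append(name)
    return leaked
```

A guard script would fail the commit or push when the returned list is non-empty and print the error message pointing at agent-routing.yaml.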

Phased delivery

  • PR 1 — Schema + skeleton: agent-routing.yaml template + _lib-portfolio-paths.sh resolver + AgDR documenting the design + docs/multi-project.md section. NO sync hook yet — adopters can author the file, but it's a no-op until the hook ships in PR 2.
  • PR 2 — Sync hook: apply-agent-routing.sh SessionStart hook + drift-prevention smoke test + pre-commit guard
  • PR 3 — Setup integration: /setup --split-portfolio seeds the file with commented defaults; single-fork mode adds the gitignore entry + a one-time seed when --seed-agent-routing is passed
  • PR 4 — Local-routing entries (depends on the outcome of spike #348, "Local-model routing feasibility for ticket-manager, Data Analyst, QA Engineer via LiteLLM → Ollama"): if the spike confirms local routing works for any of the 3 candidates, add the local-routing example entries to the seeded template + document the LiteLLM + Ollama setup in docs/multi-project.md

PRs 1-3 can land independently of #348. PR 4 depends on the spike's verdict.

Design notes

Why one config file vs editing 22+ frontmatters

  • Adopter cognitive load: one source of truth, not "find every agent that runs Haiku" via grep
  • Public fork hygiene: routing decisions stay in private repo
  • Migration ergonomics: switching a deployment from Claude API to local Ollama is one section edit, not 22
  • Reviewability: a diff on agent-routing.yaml is small + self-describing

Why sync-at-SessionStart vs CLI tool

  • Adopters don't run a setup CLI on every session — they edit the file and re-enter. The hook makes this transparent.
  • Idempotence is easy to enforce — the hook is the only writer of model: lines post-edit.

Why this is its OWN ticket (not folded into #347)

#347 is about agent-file structure + per-role default models. This ticket is about ADOPTER CUSTOMISATION — a layer on top. Splitting keeps each concern reviewable + each PR-set shippable independently. #347's PRs land first (the agent files); this ticket's PRs land second (the customisation surface).

Cross-references

Risks

  • Claude Code may not support per-agent env vars — if ANTHROPIC_BASE_URL is session-scoped (not per-agent invocation), the local-routing config can't mix-and-match remote + local on the same session. Spike output will inform whether we need a global-route mode (all-agents-local) instead of per-agent mode.
  • Pre-commit guard false-positives — if an adopter intentionally edits a model: line in an agent file (e.g. for an immediate one-off override they don't want in the routing config), the guard blocks them. Mitigation: an explicit # routing-config:override <reason> comment in the agent file bypasses the guard. Same pattern as <!-- agdr: not-applicable -->.
  • YAML schema migration — once shipped, adopters depend on the shape. Schema changes need a deprecation flag + migration helper (per the existing apexyard.projects.yaml shape-change pattern).
  • Adopter forgets to re-session after editing. Mitigation: the hook prints a one-line banner at SessionStart (e.g. "5 agent overrides applied from agent-routing.yaml"), so a missing banner signals that the edit has not been picked up yet.
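The first risk above turns on whether a given routing config mixes endpoints within one session. A small Python sketch of that classification, under the assumption that agents absent from the config use the default Claude API endpoint. The function name, the mode labels, and the example proxy URL are all hypothetical.

```python
def endpoint_mode(all_agents: list[str], agents: dict[str, dict]) -> str:
    """Classify a routing config by how many distinct endpoints it needs."""
    # None stands for "no endpoint override", i.e. the default Claude API.
    endpoints = {agents.get(name, {}).get("endpoint") for name in all_agents}
    if endpoints <= {None}:
        return "default"    # nothing overridden, no env changes needed
    if len(endpoints) == 1:
        return "global"     # every agent shares one proxy; session-scoped env suffices
    return "per-agent"      # mixed endpoints; needs per-agent env support


all_agents = ["qa-engineer", "sre", "idris"]
proxy = "http://localhost:4000"  # placeholder proxy address

mixed = endpoint_mode(all_agents, {"idris": {"endpoint": proxy}})
uniform = endpoint_mode(all_agents, {name: {"endpoint": proxy} for name in all_agents})
```

If Claude Code turns out to scope ANTHROPIC_BASE_URL to the session, the hook could refuse configs that classify as "per-agent" and accept only "default" and "global".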

Out of scope

  • A web UI for editing the routing config — adopters edit YAML directly
  • Per-task / per-invocation model overrides — that's claude --model for one-off use, not routing-config scope
  • Auto-detection of local Ollama servers — adopter declares endpoint explicitly
  • Cost dashboards / per-agent usage tracking — separate concern, file if needed
  • Multi-region / multi-cloud routing rules (e.g. "use bedrock when in EU") — single-endpoint per agent in v1

Glossary

| Term | Definition |
| --- | --- |
| Agent-routing config | The agent-routing.yaml file: the adopter's per-agent model + endpoint mapping |
| Framework default | The model: line in .claude/agents/<name>.md as committed to the public fork (Claude API model from the #347 matrix) |
| Sync hook | apply-agent-routing.sh: the SessionStart hook that rewrites agent-file frontmatter from the routing config |
| LiteLLM | OSS proxy that translates Anthropic-shape requests to OpenAI / Ollama / Bedrock / Vertex backends |
| Endpoint override | The endpoint: field in a routing entry; sets ANTHROPIC_BASE_URL for that agent's invocations |
| Drift prevention | Pre-commit guard that blocks accidental commits of routing-config rewrites to the public fork |
| Shadow file (option) | Alternative to in-place rewrite: the sync hook writes to a sibling .claude/agents-effective/<name>.md and Claude Code is pointed at the shadow dir |
| Clean filter (option) | Alternative to in-place rewrite: a .gitattributes clean filter strips the rewritten model: line at git-add time, so the working tree is "dirty" but commits stay framework-default |

Refs #347 (agent promotion + matrix this consumes) / #348 (local-routing spike whose verdict feeds PR 4) / AgDR-0021 (split-portfolio v2 path resolution) / AgDR-0023 (custom-templates override-semantics — same pattern for adopter customisation)

Labels: P2 (Medium: plan-worthy, not urgent), enhancement (New feature or request)
