
feat(delegate): model tiers + list_models tool (builds on #7586) #7957

Open
malaiwah wants to merge 131 commits into NousResearch:main from malaiwah:feat/delegation-tiers-list-models

Conversation

@malaiwah

Contributor

Summary

Builds on @Labhund's per-task model parameter (#7586) with two additions:

  1. Named model tiers: small / medium / large, mapped to configured models
  2. list_models tool — agent discovers available models and tiers at runtime

What This Adds (on top of #7586)

Model Tiers

delegation:
  model_tiers:
    small: google/gemini-flash-2.0
    # medium: inherits parent model
    large: anthropic/claude-opus-4-6

The agent passes tier names instead of deployment-specific model IDs:

delegate_task(goal="List all .py files", model="small")        # → fast model
delegate_task(goal="Review this PR for security", model="large") # → strong model

Tier names resolve via _resolve_model_or_tier() before entering #7586's _resolve_model_override() pipeline. Unknown tiers fall back to the default model.
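
A minimal sketch of the tier lookup described above. The PR names `_resolve_model_or_tier()` but does not show its signature, so the function name, parameters, and the handling of empty input below are assumptions for illustration only.

```python
from typing import Mapping, Optional

# Tier names listed in the PR summary.
TIER_NAMES = ("small", "medium", "large")


def resolve_model_or_tier(
    requested: Optional[str],
    tiers: Mapping[str, str],
    default_model: str,
) -> str:
    """Map a tier name to its configured model (hypothetical helper).

    - A tier name with a configured model resolves to that model.
    - A tier name with no configured model falls back to default_model.
    - Anything else is treated as a direct model name and passed through.
    """
    if not requested or not requested.strip():
        return default_model
    name = requested.strip()
    key = name.lower()
    if key in TIER_NAMES:
        return tiers.get(key) or default_model
    return name


if __name__ == "__main__":
    tiers = {"small": "google/gemini-flash-2.0", "large": "anthropic/claude-opus-4-6"}
    for requested in ("small", "medium", "large", "qwen35-397b"):
        print(requested, "->", resolve_model_or_tier(requested, tiers, "parent-model"))
```

The resolved name would then be handed to #7586's `_resolve_model_override()` for credential resolution; that step is not reproduced here.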

list_models Tool

Returns available models with tier assignments, context lengths, and providers. Registered in the delegation toolset.

{
  "models": [
    {"name": "qwen35-397b", "provider": "litellm", "context_length": 524288, "is_default": true},
    {"name": "gemma4-nothink", "tier": "small", ...},
    {"name": "claude-sonnet-4-6", "tier": "large", ...}
  ],
  "tiers": {"small": "gemma4-nothink", "large": "claude-sonnet-4-6"}
}
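
One way the payload above could be assembled, shown as a sketch. The function name and the shape of the input records are assumptions; only the output keys (`models`, `tiers`, `tier`, `is_default`) come from the example response.

```python
from typing import Any, Dict, List, Mapping


def build_list_models_payload(
    models: List[Mapping[str, Any]],
    tiers: Mapping[str, str],
    default_model: str,
) -> Dict[str, Any]:
    """Annotate model records with tier labels (hypothetical helper)."""
    # Invert tier -> model so each model can look up its tier label.
    tier_by_model = {model: tier for tier, model in tiers.items()}
    entries = []
    for record in models:
        entry = dict(record)  # copy so the caller's records are not mutated
        name = entry["name"]
        if name in tier_by_model:
            entry["tier"] = tier_by_model[name]
        if name == default_model:
            entry["is_default"] = True
        entries.append(entry)
    return {"models": entries, "tiers": dict(tiers)}


if __name__ == "__main__":
    import json

    payload = build_list_models_payload(
        models=[
            {"name": "qwen35-397b", "provider": "litellm", "context_length": 524288},
            {"name": "gemma4-nothink"},
            {"name": "claude-sonnet-4-6"},
        ],
        tiers={"small": "gemma4-nothink", "large": "claude-sonnet-4-6"},
        default_model="qwen35-397b",
    )
    print(json.dumps(payload, indent=2))
```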

Why Tiers Matter

With #7586, the agent needs to know exact model names per deployment. Tiers abstract this: a skill or system prompt can say "use small for lookups" without knowing whether "small" is Gemini Flash, Gemma 4, or Haiku on this particular instance.

Credits

Foundation: @Labhund's per-task model parameter and _resolve_model_override() from #7586. This PR cherry-picks those commits and adds the tier + discovery layer on top.

Fixes #7929

Test plan

  • Tier resolution: model="small" resolves to configured tier model
  • Unknown tier falls back to default model
  • list_models returns models with tier labels
  • Direct model names still work (bypass tier resolution)
  • Per-task model in batch mode works with tiers
  • No regression when model_tiers is unconfigured

🤖 Generated with Claude Code

Hermes Agent (angelos) and others added 30 commits April 8, 2026 03:05
- Add DEFAULT_ALLOWED_TOOLSETS including 'mcp' to enable MCP tools for subagents
- Make BLOCKED_TOOLSET_NAMES configurable (was hardcoded)
- Subagents now inherit MCP access from parent when available
- Fixes subagent limitation where only terminal+process were available
- Allows subagents to use SearXNG and Crawl4AI MCP servers
- Reorder imports to top of file (E402)
- Add noqa comment for registry import (circular import requirement)
- Re-add missing constants: MAX_DEPTH, MAX_CONCURRENT_CHILDREN, DEFAULT_MAX_ITERATIONS
- All ruff checks now passing
- Remove 'memory' from BLOCKED_TOOLSET_NAMES in delegate_tool.py
- Add 'memory' to DEFAULT_ALLOWED_TOOLSETS for subagent access
- Add DEFAULT_SUBAGENT_MEMORY_MODE = 'read_only' configuration
- Modify memory_tool() to accept is_subagent and subagent_memory_mode params
- Enforce memory write blocking in read_only mode for subagents
- Support three modes: 'read_only', 'full', 'none'
- Add comprehensive tests for subagent memory access
- Maintains backward compatibility with existing memory tests

Benefits:
✅ Subagents can now query Honcho observations dialectically
✅ Subagents can read MEMORY.md and USER.md for context
✅ Subagents blocked from writing (prevents memory pollution)
✅ Parent agent remains sole writer of memory
✅ Enables orchestrator/coordinator pattern with long-lived subagents
✅ Configurable per-subagent or global default
- docker.py: remove --pids-limit (unavailable without cgroup delegation),
  add _cgroup_limits_available() probe for --cpus/--memory
- delegate_tool.py: add "browser" to DEFAULT_ALLOWED_TOOLSETS
- Dockerfile: build from source, add podman-remote shim, wait-for-honcho
- docker/wait-for-honcho.sh: poll Honcho API before starting gateway
…tation)

- delegate_tool.py: Set skip_memory=False and pass subagent_memory_mode to child agents
- delegate_tool.py: Add _is_subagent=True flag for memory tool access control
- run_agent.py: Pass is_subagent and subagent_memory_mode to memory_tool calls
- memory_tool.py: Enforce read-only mode for subagents (blocks add/replace/remove)
- memory_tool.py: Support three modes: 'read_only' (default), 'full', 'none'

Benefits:
✅ Subagents can read MEMORY.md and USER.md for context
✅ Subagents can query Honcho observations dialectically
✅ Subagents blocked from memory writes (prevents pollution)
✅ Parent agent remains sole writer of memory
✅ Configurable per-subagent via delegation config
✅ Enables orchestrator/coordinator pattern with long-lived subagents

Tests:
✅ test_memory_subagent_readonly.py (4/4 passed)
✅ test_mcp_subagent_access.py (4/4 passed)
✅ Existing memory tests (33/33 passed)
✅ Existing delegate tests (5/5 passed)
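
The three memory modes can be pictured as a small access check. This is a sketch under assumptions: the function name, the return convention, and the `"read"` action name are hypothetical; only the mode names and the blocked actions (add/replace/remove) come from the commit messages.

```python
from typing import Optional

# Write actions the commit message says are blocked in read_only mode.
WRITE_ACTIONS = {"add", "replace", "remove"}


def check_memory_access(
    action: str,
    is_subagent: bool = False,
    subagent_memory_mode: str = "read_only",
) -> Optional[str]:
    """Return None when the action is allowed, else an error message."""
    if not is_subagent:
        return None  # the parent agent keeps full access
    if subagent_memory_mode == "full":
        return None
    if subagent_memory_mode == "none":
        return "memory is disabled for subagents"
    if subagent_memory_mode == "read_only":
        if action in WRITE_ACTIONS:
            return f"memory action '{action}' is blocked for read-only subagents"
        return None
    return f"unknown subagent_memory_mode: {subagent_memory_mode!r}"


if __name__ == "__main__":
    print(check_memory_access("add", is_subagent=True))
    print(check_memory_access("read", is_subagent=True))
```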
- Added _cleanup_orphaned_containers() function to gateway/run.py
- Automatically removes exited/dead/created hermes-* containers on startup
- Prevents container accumulation from crashes, OOM kills, or manual stops
- Logs cleanup activity with INFO level for visibility
- Added comprehensive documentation in docs/CONTAINER_CLEANUP.md
- Includes manual cleanup commands and CLI design proposal

Benefits:
✅ No more manual container cleanup needed
✅ Recovers gracefully from crashes
✅ Reduces disk space usage from stale containers
✅ Improves system hygiene automatically
✅ Safe - only removes non-running containers

Manual cleanup (if needed before deploying):
  podman ps -a --filter 'name=^hermes-' --filter 'status=exited' -q | xargs -r podman rm -f
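
A sketch of what the startup cleanup might look like, based on the manual command above. The function names and the injectable `run` parameter are assumptions, and the real `_cleanup_orphaned_containers()` may differ. It assumes repeated `--filter status=...` flags are combined with OR, which is how Docker documents same-key filters.

```python
import subprocess
from typing import Callable, List, Sequence


def build_list_command(
    runtime: str = "podman",
    statuses: Sequence[str] = ("exited", "dead", "created"),
) -> List[str]:
    """List IDs of non-running hermes-* containers."""
    cmd = [runtime, "ps", "-a", "-q", "--filter", "name=^hermes-"]
    for status in statuses:
        cmd += ["--filter", f"status={status}"]
    return cmd


def cleanup_orphaned_containers(
    runtime: str = "podman",
    run: Callable = subprocess.run,
) -> List[str]:
    """Remove stale containers and return the IDs that were targeted."""
    listed = run(build_list_command(runtime), capture_output=True, text=True, check=False)
    ids = listed.stdout.split()
    if ids:
        # Only non-running containers were listed, so rm -f does not stop live ones.
        run([runtime, "rm", "-f", *ids], capture_output=True, text=True, check=False)
    return ids
```

Passing `run` as a parameter keeps the sketch testable without a container runtime installed.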
Remove GitHub Actions workflows (deploy-site, docker-publish, tests, nix,
supply-chain-audit, docs-site-checks) — these are for upstream's GitHub CI.

Add .gitea/workflows/build-push.yml: builds the container image and pushes
to Gitea's container registry on every push to main.
- run_agent.py: Add _shared_memory_store parameter to AIAgent.__init__()
- run_agent.py: Use shared memory store when provided (subagents)
- run_agent.py: Add _is_subagent flag for access control
- delegate_tool.py: Pass parent's _memory_store to subagents
- Subagents now share parent's memory store (read-only enforced)
- Memory writes still blocked for subagents via memory_tool.py enforcement

Benefits:
✅ Subagents can READ from MEMORY.md and USER.md
✅ Subagents can query Honcho observations
✅ Subagents blocked from memory writes (add/replace/remove)
✅ Parent remains sole writer of memory
✅ Shared store prevents duplicate memory initialization
✅ Enables orchestrator pattern with long-lived subagents

Technical:
- Subagents share parent's MemoryStore instance
- Read-only enforcement in memory_tool.py still active
- No duplicate memory loading for subagents
- Memory stays consistent across parent + subagents
delegate_tool.py passes subagent_memory_mode and _is_subagent to
AIAgent() but AIAgent.__init__ did not declare subagent_memory_mode,
causing an unexpected keyword argument crash on every delegate_task call.

Added the parameter to the signature and stored both self._is_subagent
and self.subagent_memory_mode as instance attributes (previously only
accessed via getattr with defaults, never stored).
…ra_body config

- terminal_tool.py: add docker_forward_env to container_config dict passed to
  _create_environment — it was read from config but never propagated, so
  docker exec calls were built with no -e flags and credentials were never
  forwarded into sandbox containers

- run_agent.py: add subagent_memory_mode param to AIAgent.__init__ — delegate_tool
  was passing it but AIAgent didn't accept it, crashing every delegate_task call;
  also store self._is_subagent and self._config_extra_body; add model.extra_body
  config support so extra_body fields (e.g. enable_thinking: false) are merged
  into every API call

- tools/environments/docker.py: log exact docker exec command at WARNING level
  (secrets masked) for debugging env forwarding

- Dockerfile: add logging to podman-remote shim — appends full command to
  /opt/data/logs/shim.log for introspection

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
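
To illustrate the env-forwarding fix, here is a sketch of how `-e` flags could be added to an exec command and masked before logging. The helper names and the `sh -c` wrapper are assumptions, not the code in `tools/environments/docker.py`.

```python
from typing import List, Mapping, Sequence


def build_exec_command(
    runtime: str,
    container: str,
    command: str,
    forward_env: Sequence[str],
    env: Mapping[str, str],
) -> List[str]:
    """Build an exec command that forwards the listed variables."""
    cmd = [runtime, "exec"]
    for name in forward_env:
        if name in env:  # skip names that are not set on the host
            cmd += ["-e", f"{name}={env[name]}"]
    cmd += [container, "sh", "-c", command]
    return cmd


def mask_secrets(cmd: Sequence[str]) -> List[str]:
    """Replace the value of every -e NAME=value pair for safe logging."""
    masked: List[str] = []
    previous = None
    for token in cmd:
        if previous == "-e" and "=" in token:
            name, _, _ = token.partition("=")
            masked.append(f"{name}=***")
        else:
            masked.append(token)
        previous = token
    return masked
```

Without the `forward_env` list reaching this step, the loop adds nothing and the command carries no `-e` flags, which matches the symptom the commit describes.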
…g to debug

- gateway/run.py: move _cleanup_orphaned_containers() before main() so it
  is defined before it is called; call it before asyncio.run() so cleanup
  happens at startup, not after the gateway exits
- gateway/run.py: replace hardcoded "podman" with find_docker() so the
  function respects the configured docker/shim executable
- docker.py: downgrade exec command log from WARNING to DEBUG (too noisy
  for normal operation; shim.log already captures all podman-remote calls)
- tools/: delete leftover delegate_tool.py.patch artifact

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Allows config.yaml to specify which Docker/Podman network sandbox
containers join via `terminal.docker_network`. When set, passes
`--network <name>` to docker run so containers can resolve hostnames
of other services on that network (e.g. hermes-litellm on hermes-net).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
When pricing/cost estimation calls fetch_endpoint_model_metadata without
an api_key (e.g. from insights._get_pricing), the function made
unauthenticated requests to the /models endpoint causing repeated 401
errors every 5 minutes (cache TTL). Now falls back to LITELLM_KEY env
var so requests to the proxy are authenticated.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
docker_network was added to DockerEnvironment (docker.py) and terminal_tool.py
but the config.yaml → TERMINAL_DOCKER_NETWORK env var bridge was missing in
both code paths:
- cli.py: used by `hermes chat`, `hermes model`, etc.
- gateway/run.py: used by `hermes gateway run`

Also add TERMINAL_DOCKER_NETWORK reading to _get_env_config() in
terminal_tool.py, and add docker_network to the container_config dict
that's built before calling _create_environment().

Without this, `docker_network: "hermes-net"` in config.yaml had no effect —
sandbox containers were always created on the default podman network and could
not resolve hermes-net hostnames like hermes-litellm.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
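
The chain from config.yaml to the `--network` flag can be sketched in two steps. The helper names are hypothetical, and the choice to leave an already-set environment variable untouched is an assumption the commit message does not confirm.

```python
from typing import List, Mapping, MutableMapping


def bridge_terminal_config(config: Mapping, env: MutableMapping[str, str]) -> None:
    """Copy terminal.docker_network from config into the environment."""
    terminal = config.get("terminal") or {}
    network = str(terminal.get("docker_network") or "").strip()
    if network:
        # Assumption: an explicit env var wins over config.yaml.
        env.setdefault("TERMINAL_DOCKER_NETWORK", network)


def network_args(env: Mapping[str, str]) -> List[str]:
    """Return the run flags for the configured network, if any."""
    name = (env.get("TERMINAL_DOCKER_NETWORK") or "").strip()
    return ["--network", name] if name else []


if __name__ == "__main__":
    env: dict = {}
    bridge_terminal_config({"terminal": {"docker_network": "hermes-net"}}, env)
    print(network_args(env))
```

If the bridge step is missing, `network_args` returns an empty list and the container lands on the default network, which is the failure described above.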
…ge config and invalid workdir bypass

- PR NousResearch#4350: Load config.yaml terminal block as fallback before hardcoded defaults
  - Fixes docker_image in config.yaml not being loaded
  - Adds cfg.get() fallbacks for all terminal config options

- PR NousResearch#4673: Don't clobber already-resolved absolute TERMINAL_CWD
  - Fixes invalid workdir bypassing terminal.cwd config
  - Skips config override when env var already has absolute path
- Install gnupg in system dependencies
- Enables GPG-signed emails from angelos-hermes@mailbox.org
- Supports git commit signing with GPG keys
- Rename Dockerfile to Containerfile (Podman convention)

Signed-off-by: Angelos <angelos-hermes@mailbox.org>
Previous run NousResearch#46 failed due to transient network issue.
This is a no-op commit to re-trigger the build pipeline.

Related: feat: Add gnupg package for GPG email signing
Previous commit had double-escaped backslashes (\\) which broke
the Dockerfile syntax. This fixes the RUN instructions to use
proper single backslashes for line continuation.

Fixes build failure in run NousResearch#47.
- GPG support moved to hermes-sandbox-image (where it belongs)
- Sandbox image is the correct location for agent tooling
- Reduces production container attack surface
- Follows separation of concerns:
  * hermes-agent: Production service container
  * hermes-sandbox: Ephemeral agent execution environment

Related: angelos/hermes-sandbox-image@4294f7f
angelos and others added 25 commits April 10, 2026 02:02
…rdrail

run_agent.py:3056 and test_agent_guardrails.py import
MAX_CONCURRENT_CHILDREN by name from delegate_tool. PR NousResearch#16 renamed it
to _DEFAULT_MAX_CONCURRENT_CHILDREN without adding the alias, breaking
every API call.
…coded 3

run_agent.py:_cap_delegate_task_calls was importing the hardcoded
MAX_CONCURRENT_CHILDREN (3) to pre-truncate delegate_task tool calls
at the agent-loop level. This ran BEFORE delegate_task() itself, so
the configurable delegation.max_concurrent_children in config.yaml
was dead on arrival — the guardrail silently dropped calls to 3
before the runtime check could read the config.

Now imports _get_max_concurrent_children() instead, which reads the
same config.yaml / env var / default chain. Both enforcement layers
respect the same configurable value.
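
A sketch of the config / env var / default chain. The environment variable name is hypothetical because the commit message does not give it; the default of 3 is the previously hardcoded value mentioned above.

```python
import os
from typing import Mapping, Optional

DEFAULT_MAX_CONCURRENT_CHILDREN = 3
# Hypothetical variable name, used only for this illustration.
ENV_VAR = "HERMES_MAX_CONCURRENT_CHILDREN"


def get_max_concurrent_children(
    config: Mapping,
    env: Optional[Mapping[str, str]] = None,
) -> int:
    """Read the limit from config, then the env var, then the default."""
    env = os.environ if env is None else env
    delegation = config.get("delegation") or {}
    for raw in (delegation.get("max_concurrent_children"), env.get(ENV_VAR)):
        try:
            value = int(raw)
        except (TypeError, ValueError):
            continue  # missing or malformed, try the next source
        if value > 0:
            return value
    return DEFAULT_MAX_CONCURRENT_CHILDREN
```

Calling one function from both the agent-loop guardrail and `delegate_task()` is what keeps the two enforcement layers in agreement.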
…_max_concurrent_children() everywhere

No backwards-compat alias needed — all three consumers updated:
- run_agent.py (already done in previous commit)
- tests/run_agent/test_agent_guardrails.py
- tests/tools/test_delegate.py

The only exported name is now _get_max_concurrent_children() which
reads from config.yaml / env var / default.
The fork's execute() method uses threading, time, shlex, and
is_interrupted but the upstream merge took upstream's import block
which doesn't have them (upstream's _run_bash doesn't need them).

Added:
- import threading  (line 903: Thread for output draining)
- import time       (line 905: monotonic deadline)
- import shlex      (line 789: shell quoting)
- from tools.interrupt import is_interrupted  (line 908: interrupt check)

Verified via AST analysis — no other missing imports.
Two merge fallout fixes:
1. delegate_tool.py uses tool_error() 6 times but never imported it
   from tools.registry — every error path crashed with NameError.
2. test_batch_capped_at_3 expected the old silent-truncation behavior;
   updated to test the new error-on-excess behavior.
…d lifetime

Two changes to how sandbox containers are spawned:

1. Add --init to docker run. This uses tini as PID 1, which
   automatically reaps zombie child processes. Previously PID 1 was
   sleep(1) which doesn't call wait() — every background process that
   exited became a zombie, and the process tool reported them as
   "running" because zombie PIDs still exist in the process table.
   Fixes NousResearch#6908 (upstream).

2. Replace 'sleep 2h' with 'sleep infinity'. The fixed 2-hour lifetime
   was arbitrary and sometimes too short for long agent sessions. The
   idle reaper (terminal.lifetime_seconds, default 300s, configurable
   via config.yaml) already handles cleanup based on last activity —
   there's no reason for the container itself to have a fixed death
   timer. With sleep infinity, the container lives until the idle
   reaper kills it or the task ends.

Both changes are one line each. No config changes needed — the
existing terminal.lifetime_seconds config controls idle timeout.
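
As a rough illustration, the two changes and the idle check they rely on might look like this. The function names and the other flags in the run command are assumptions; `--init`, `sleep infinity`, and the 300-second default come from the commit message.

```python
from typing import List


def build_sandbox_run_command(runtime: str, image: str, name: str) -> List[str]:
    """Start a detached sandbox whose PID 1 reaps zombies."""
    return [
        runtime, "run", "-d",
        "--init",             # run an init process as PID 1 so exited children are reaped
        "--name", name,
        image,
        "sleep", "infinity",  # no fixed death timer; the idle reaper decides
    ]


def is_idle_expired(last_activity: float, now: float, lifetime_seconds: float = 300.0) -> bool:
    """True when the sandbox has been idle longer than the configured lifetime."""
    return (now - last_activity) > lifetime_seconds
```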
The upstream merge (PR NousResearch#14) auto-resolved gateway/run.py's 1598-line
diff and silently dropped the entire self-nudge system (~55 lines in
gateway/run.py, ~37 in run_agent.py, ~5 in model_tools.py). This
broke notify_on_complete for background processes — the mechanism that
fires a hidden turn when a background process exits was gone.

Cherry-picked b8737bc ("feat(gateway): add one-shot self nudge tool")
on top of the merged state, resolving 3 conflicts in gateway/run.py
(media_message_callback + self_nudge_callback coexistence) and 1 in
tests/test_model_tools.py (kept both clarify + self_nudge tests).

The self-nudge system provides:
- _arm_self_nudge / _cancel_self_nudge / _fire_self_nudge in gateway
- self_nudge_callback on AIAgent for tool-initiated delayed turns
- self_nudge tool exposed on gateway platforms (not CLI)
- _pending_hidden_turns for injecting hidden messages on next turn
- Wired to notify_on_complete in terminal_tool.py via
  process_registry.pending_watchers
gateway/run.py uses uuid.uuid4() at lines 1139 (restart-resume) and
1263 (self-nudge) but uuid was never imported at module level — upstream
uses local 'import uuid as _uuid' inside functions, and our fork's
additions used bare uuid without adding the import. Self-nudge fired
but crashed with NameError: name 'uuid' is not defined.
The self-nudge path called _handle_message_with_agent directly without
sending a typing indicator first. The user saw the agent silently
process and respond with no 'typing...' feedback in Telegram.
_run_self_nudge_entry called _handle_message_with_agent but discarded
its return value. When streaming doesn't deliver (typical for self-nudge
turns since there's no user message to stream-edit), the agent's
response vanished — the model did the work but the user never saw the
result. Now captures the return and explicitly sends via adapter.send
if streaming didn't already deliver it.
Adds logger.info when a turn is routed to the cheap model, showing the
route label and first 80 chars of the user message. Added to both
gateway and CLI paths.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds an optional 'model' parameter to delegate_task so the agent can
route subagents to a smaller/faster model for simple tasks (e.g.
summarization, formatting, lookups) while keeping the primary model
for complex reasoning.

Works at both levels:
- Top-level 'model' param for single-task delegation
- Per-task 'model' field in batch tasks array

The per-call model overrides delegation.model from config, which in
turn overrides inheriting the parent's model. Per-task model takes
precedence over top-level model in batch mode.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds tiered model selection for subagent delegation:
- Agent can pass model='small'/'medium'/'large' or a direct model name
- Tiers configured via delegation.model_tiers in config.yaml
- New list_models tool returns available models with tier assignments

Use cases:
- Delegate file exploration to a small/fast model
- Escalate to a large model when stuck on complex reasoning
- Spin up a peer review subagent on a stronger model
- Mixed batch: simple tasks on small, complex on default

Precedence: per-task model > top-level model param > delegation.model
config > inherit parent model. Tier names resolve to configured model
names; unknown tiers fall back to default.

Not yet upstreamed — local fork only.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
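
The precedence rule stated above reduces to a first-non-empty lookup. A minimal sketch, with a hypothetical function name and parameters:

```python
from typing import Optional


def pick_model(
    task_model: Optional[str],
    top_level_model: Optional[str],
    delegation_model: Optional[str],
    parent_model: str,
) -> str:
    """Per-task > top-level param > delegation.model config > parent model."""
    for candidate in (task_model, top_level_model, delegation_model):
        if candidate and candidate.strip():
            return candidate.strip()
    return parent_model


if __name__ == "__main__":
    # Mixed batch: the first task overrides the top-level choice.
    for task_model in ("small", None):
        print(pick_model(task_model, "large", None, "parent-model"))
```

The chosen value may still be a tier name; tier resolution is a separate step.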
list_models was only handled in the sequential dispatch path.
The concurrent path (used when multiple tools are called in one turn)
fell through to the default handler, causing the tool call to fail.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
list_models was only in the 'delegation' toolset but composite toolsets
(hermes-cli, hermes-telegram, etc.) list tools directly without including
the delegation toolset. Added list_models alongside delegate_task in all
composite toolsets.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…oint

is_local_endpoint only matched IPs and localhost, missing Docker/Podman
DNS names like hermes-litellm. This caused stale stream timeouts (180s)
to fire on local LLM proxies instead of being auto-disabled.

Two fixes:
1. model.local_endpoints config: list of hostnames to treat as local
2. DNS resolution fallback: resolve hostname to IP, check if private

Not yet upstreamed — local fork only.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Hostnames without dots (e.g. hermes-litellm, ollama) are always on the
local network — Docker/Podman DNS, mDNS, or /etc/hosts. No need to
configure them explicitly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
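
Taken together, the local-endpoint checks could be sketched as follows. This is not the fork's `is_local_endpoint`; the signature, the injectable `resolve` parameter, and the order of checks are assumptions.

```python
import ipaddress
import socket
from typing import Callable, Iterable
from urllib.parse import urlparse


def is_local_endpoint(
    base_url: str,
    local_endpoints: Iterable[str] = (),
    resolve: Callable[[str], str] = socket.gethostbyname,
) -> bool:
    """Guess whether base_url points at a local or private-network service."""
    host = urlparse(base_url).hostname
    if not host:
        return False
    if host == "localhost" or host in set(local_endpoints):
        return True
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        pass  # not an IP literal, fall through to hostname checks
    else:
        return ip.is_private or ip.is_loopback
    if "." not in host:
        return True  # dotless names: container DNS, mDNS, or /etc/hosts
    try:
        resolved = ipaddress.ip_address(resolve(host))
    except (OSError, ValueError):
        return False  # resolution failed, so do not assume local
    return resolved.is_private or resolved.is_loopback
```

The resolver is injectable so the DNS fallback can be exercised without network access.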
Adds max_context_tokens guard to smart routing: if conversation context
exceeds the threshold, stay on the primary model instead of routing to
the cheap model. The cheap model is meant to be fast — sending it a
large context defeats the purpose.

Changes:
- choose_cheap_model_route accepts context_tokens parameter
- Gateway estimates context from cached agent's session_prompt_tokens
  or from history length (4 chars ≈ 1 token)
- CLI estimates from conversation_history
- Log line now includes context token count

Config: smart_model_routing.max_context_tokens (default: 0 = disabled)

Not yet upstreamed — local fork only.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When trim_context is enabled and context exceeds max_context_tokens,
trim conversation history from the head (keep most recent messages)
and still route to the cheap model — instead of falling back to the
primary model entirely.

For a simple "thanks!" in a 48K-token session, only the last ~32K
tokens of history are sent to the cheap model. The model can still
respond appropriately with recent context.

Config:
  smart_model_routing:
    max_context_tokens: 32000
    trim_context: true   # default: false (safe — skip route entirely)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
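
A sketch of the guard and the trim option, using the 4 characters per token estimate mentioned in the routing commits. The function names and the message shape (dicts with a `content` key) are assumptions.

```python
from typing import Dict, List, Tuple

Message = Dict[str, str]
CHARS_PER_TOKEN = 4  # rough estimate: 4 chars is about 1 token


def estimate_tokens(history: List[Message]) -> int:
    return sum(len(str(m.get("content", ""))) for m in history) // CHARS_PER_TOKEN


def trim_history(history: List[Message], max_context_tokens: int) -> List[Message]:
    """Keep the most recent messages that fit within the token budget."""
    budget_chars = max_context_tokens * CHARS_PER_TOKEN
    kept: List[Message] = []
    used = 0
    for message in reversed(history):  # walk from newest to oldest
        size = len(str(message.get("content", "")))
        if used + size > budget_chars:
            break
        kept.append(message)
        used += size
    kept.reverse()  # restore chronological order
    return kept


def plan_cheap_route(
    history: List[Message],
    max_context_tokens: int = 0,
    trim_context: bool = False,
) -> Tuple[bool, List[Message]]:
    """Return (use_cheap_model, history_to_send)."""
    if max_context_tokens <= 0:
        return True, history  # guard disabled
    if estimate_tokens(history) <= max_context_tokens:
        return True, history
    if trim_context:
        return True, trim_history(history, max_context_tokens)
    return False, history  # stay on the primary model
```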
Reassigning the `history` parameter inside the trim block made Python
treat it as a local variable, causing UnboundLocalError on earlier
reads. Use _trimmed_history + _effective_history instead.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
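
The failure mode can be reproduced in a few lines. A plain function parameter is always bound, so this sketch assumes `history` was read from an enclosing scope inside a nested function; the function names are hypothetical and the real code is not shown in this PR.

```python
from typing import Callable, List


def make_router_buggy(history: List[int]) -> Callable[[int], List[int]]:
    def route(limit: int) -> List[int]:
        count = len(history)  # read happens before the assignment below
        if count > limit:
            # This assignment makes `history` local to route(), so the read
            # above raises UnboundLocalError at call time.
            history = history[-limit:]
        return history

    return route


def make_router_fixed(history: List[int]) -> Callable[[int], List[int]]:
    def route(limit: int) -> List[int]:
        effective_history = history  # never rebind the outer name
        if len(history) > limit:
            trimmed_history = history[-limit:]
            effective_history = trimmed_history
        return effective_history

    return route


if __name__ == "__main__":
    try:
        make_router_buggy([1, 2, 3])(2)
    except UnboundLocalError as exc:
        print("buggy:", type(exc).__name__)
    print("fixed:", make_router_fixed([1, 2, 3])(2))
```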
…routing

Lets the parent agent (or the LLM via tool-calling) pick a model per
subagent invocation, using the same resolution pipeline as the /model
slash command: aliases, direct mappings, catalog search, and credential
resolution. Per-task model beats top-level model, which beats
delegation.model config, which falls back to inheriting the parent.

This unlocks cost/speed/capability routing for subagent-driven
development — e.g. dispatch a haiku for a trivial lookup, a sonnet for
a moderate refactor, and glm-4.7 for a bulk research task, all inside
a single delegate_task batch call.

Changes:
- tools/delegate_tool.py
  - New _resolve_model_override() helper that wraps switch_model() and
    returns a credential bundle compatible with _build_child_agent's
    override_* params. Strips --global to ensure per-task overrides
    never persist to config.yaml.
  - delegate_task() gains an optional model= kwarg, threaded through
    task normalization and the child-build loop so each subagent can
    resolve credentials independently.
  - DELEGATE_TASK_SCHEMA advertises the new model field at the top
    level and inside each task object, with descriptions the LLM can
    use to decide when to route to which model.
  - Registry handler forwards args['model'] to delegate_task().

- tests/tools/test_delegate.py
  - TestResolveModelOverride covers bare name, --provider flag, the
    --global strip-but-ignore guarantee, switch_model failures, and
    empty input.
  - TestDelegateTaskModelOverride covers top-level override, per-task
    > top-level > delegation config precedence, no-override falls
    through to delegation config, bad model names surface as JSON
    errors, and the full registry dispatch path.

All 82 delegate tests pass (67 existing + 10 new + 5 toolset scope).
The initial schema said only "supports optional --provider flag" without
showing a concrete example. When asked to route a subagent through a
different provider, the LLM reached for the intuitively natural
'provider:model' colon-prefix syntax (e.g. 'openrouter:stepfun/step-3.5-flash')
— but colons in hermes are reserved for OpenRouter variant suffixes
(:free, :extended, :thinking, :fast), so the colon-prefix form was passed
raw to the parent's provider and rejected as an Unknown Model.

Fix:
- Top-level and per-task 'model' field descriptions now show three
  concrete syntax forms: bare ID, short alias, and '--provider <slug>'
  with worked examples (stepfun/step-3.5-flash --provider openrouter,
  claude-opus-4-6 --provider anthropic, deepseek-chat --provider deepseek).
- Valid provider slugs are enumerated so the LLM doesn't have to guess.
- The colon-prefix anti-pattern is explicitly called out as DO NOT with
  an example, since LLMs gravitate toward it. This keeps delegate_task
  consistent with the existing /model slash command, which also uses
  --provider exclusively (see hermes_cli/model_switch.py:16-18).
- Main description MODEL SELECTION bullet updated with the same examples.
- New TestDelegateRequirements.test_schema_documents_provider_switch_syntax
  regression guard asserts the concrete --provider example and
  colon-prefix anti-pattern stay in the schema across future refactors.

Behaviour unchanged; this is a schema-description-only fix. All 83
delegate tests pass.
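
To make the syntax rules concrete, here is a sketch of a parser for the `model` field. It is not hermes's `switch_model()`: the function name is hypothetical, and rejecting the colon-prefix form up front is a design choice of this sketch (the commit describes it being passed through and rejected by the provider instead).

```python
import shlex
from typing import Optional, Tuple

# Colon suffixes the commit message lists as reserved for OpenRouter variants.
VARIANT_SUFFIXES = {"free", "extended", "thinking", "fast"}


def parse_model_spec(spec: str) -> Tuple[str, Optional[str]]:
    """Parse '<model> [--provider <slug>]' into (model, provider)."""
    # Drop --global so a per-task override can never persist to config.
    tokens = [t for t in shlex.split(spec) if t != "--global"]
    provider = None
    if "--provider" in tokens:
        index = tokens.index("--provider")
        if index + 1 >= len(tokens):
            raise ValueError("--provider requires a slug")
        provider = tokens[index + 1]
        del tokens[index:index + 2]
    if len(tokens) != 1:
        raise ValueError("expected exactly one model name")
    model = tokens[0]
    _, colon, suffix = model.rpartition(":")
    if colon and suffix not in VARIANT_SUFFIXES:
        raise ValueError("use '<model> --provider <slug>', not 'provider:model'")
    return model, provider


if __name__ == "__main__":
    print(parse_model_spec("stepfun/step-3.5-flash --provider openrouter"))
    print(parse_model_spec("stepfun/step-3.5-flash:free"))
```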
@alt-glitch added the type/feature, P3, comp/tools, and tool/delegate labels on Apr 28, 2026


Development

Successfully merging this pull request may close these issues.

feat: delegation model tiers + list_models tool (builds on #7586)

4 participants