feat(delegate): model tiers + list_models tool (builds on #7586) #7957
Open
malaiwah wants to merge 131 commits into
- Add DEFAULT_ALLOWED_TOOLSETS including 'mcp' to enable MCP tools for subagents
- Make BLOCKED_TOOLSET_NAMES configurable (was hardcoded)
- Subagents now inherit MCP access from parent when available
- Fixes subagent limitation where only terminal+process were available
- Allows subagents to use SearXNG and Crawl4AI MCP servers

- Reorder imports to top of file (E402)
- Add noqa comment for registry import (circular import requirement)
- Readd missing constants: MAX_DEPTH, MAX_CONCURRENT_CHILDREN, DEFAULT_MAX_ITERATIONS
- All ruff checks now passing

- Remove 'memory' from BLOCKED_TOOLSET_NAMES in delegate_tool.py
- Add 'memory' to DEFAULT_ALLOWED_TOOLSETS for subagent access
- Add DEFAULT_SUBAGENT_MEMORY_MODE = 'read_only' configuration
- Modify memory_tool() to accept is_subagent and subagent_memory_mode params
- Enforce memory write blocking in read_only mode for subagents
- Support three modes: 'read_only', 'full', 'none'
- Add comprehensive tests for subagent memory access
- Maintains backward compatibility with existing memory tests

Benefits:
✅ Subagents can now query Honcho observations dialectically
✅ Subagents can read MEMORY.md and USER.md for context
✅ Subagents blocked from writing (prevents memory pollution)
✅ Parent agent remains sole writer of memory
✅ Enables orchestrator/coordinator pattern with long-lived subagents
✅ Configurable per-subagent or global default

- docker.py: remove --pids-limit (unavailable without cgroup delegation), add _cgroup_limits_available() probe for --cpus/--memory
- delegate_tool.py: add "browser" to DEFAULT_ALLOWED_TOOLSETS
- Dockerfile: build from source, add podman-remote shim, wait-for-honcho
- docker/wait-for-honcho.sh: poll Honcho API before starting gateway

…tation)
- delegate_tool.py: Set skip_memory=False and pass subagent_memory_mode to child agents
- delegate_tool.py: Add _is_subagent=True flag for memory tool access control
- run_agent.py: Pass is_subagent and subagent_memory_mode to memory_tool calls
- memory_tool.py: Enforce read-only mode for subagents (blocks add/replace/remove)
- memory_tool.py: Support three modes: 'read_only' (default), 'full', 'none'

Benefits:
✅ Subagents can read MEMORY.md and USER.md for context
✅ Subagents can query Honcho observations dialectically
✅ Subagents blocked from memory writes (prevents pollution)
✅ Parent agent remains sole writer of memory
✅ Configurable per-subagent via delegation config
✅ Enables orchestrator/coordinator pattern with long-lived subagents

Tests:
✅ test_memory_subagent_readonly.py (4/4 passed)
✅ test_mcp_subagent_access.py (4/4 passed)
✅ Existing memory tests (33/33 passed)
✅ Existing delegate tests (5/5 passed)

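The three subagent memory modes described above could be enforced with a small guard that runs before any memory action. This is a minimal sketch, not the code in memory_tool.py: `check_memory_access` and `WRITE_ACTIONS` are hypothetical names, and only the mode names and the blocked actions (add/replace/remove) come from the commit messages.

```python
import json

# Hypothetical names for illustration; only the mode names and the
# blocked actions are taken from the commit message.
WRITE_ACTIONS = {"add", "replace", "remove"}
VALID_MODES = {"read_only", "full", "none"}


def check_memory_access(action, is_subagent=False, subagent_memory_mode="read_only"):
    """Return None if the call may proceed, otherwise a JSON error string."""
    if not is_subagent:
        return None  # the parent agent keeps full access
    if subagent_memory_mode not in VALID_MODES:
        return json.dumps({"error": f"unknown subagent_memory_mode: {subagent_memory_mode!r}"})
    if subagent_memory_mode == "none":
        return json.dumps({"error": "memory is disabled for this subagent"})
    if subagent_memory_mode == "read_only" and action in WRITE_ACTIONS:
        return json.dumps({"error": f"memory action {action!r} is blocked in read_only mode"})
    return None
```

Returning an error payload instead of raising keeps the guard usable from a tool handler that must always hand a string back to the model.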
- Added _cleanup_orphaned_containers() function to gateway/run.py
- Automatically removes exited/dead/created hermes-* containers on startup
- Prevents container accumulation from crashes, OOM kills, or manual stops
- Logs cleanup activity with INFO level for visibility
- Added comprehensive documentation in docs/CONTAINER_CLEANUP.md
- Includes manual cleanup commands and CLI design proposal

Benefits:
✅ No more manual container cleanup needed
✅ Recovers gracefully from crashes
✅ Reduces disk space usage from stale containers
✅ Improves system hygiene automatically
✅ Safe - only removes non-running containers

Manual cleanup (if needed before deploying):
  podman ps -a --filter 'name=^hermes-' --filter 'status=exited' -q | xargs -r podman rm -f

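The startup cleanup could look roughly like the sketch below. It is an illustration under assumptions, not the function in gateway/run.py: `list_stale_cmd` and `cleanup_orphaned_containers` are hypothetical helpers, the `run` parameter exists only so the logic can be exercised without a container runtime, and it relies on repeated `--filter status=...` flags being combined as alternatives.

```python
import subprocess

# Statuses named in the commit message: only non-running containers.
STALE_STATUSES = ("exited", "dead", "created")


def list_stale_cmd(executable="podman", prefix="hermes-"):
    """Build the `ps` command that lists IDs of stale containers."""
    cmd = [executable, "ps", "-a", "-q", "--filter", f"name=^{prefix}"]
    for status in STALE_STATUSES:
        cmd += ["--filter", f"status={status}"]
    return cmd


def cleanup_orphaned_containers(executable="podman", run=subprocess.run):
    """Remove stale containers and return the IDs that were targeted."""
    listed = run(list_stale_cmd(executable), capture_output=True, text=True)
    ids = listed.stdout.split()
    if ids:
        run([executable, "rm", "-f", *ids], capture_output=True, text=True)
    return ids
```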
Remove GitHub Actions workflows (deploy-site, docker-publish, tests, nix, supply-chain-audit, docs-site-checks) — these are for upstream's GitHub CI. Add .gitea/workflows/build-push.yml: builds the container image and pushes to Gitea's container registry on every push to main.
- run_agent.py: Add _shared_memory_store parameter to AIAgent.__init__()
- run_agent.py: Use shared memory store when provided (subagents)
- run_agent.py: Add _is_subagent flag for access control
- delegate_tool.py: Pass parent's _memory_store to subagents
- Subagents now share parent's memory store (read-only enforced)
- Memory writes still blocked for subagents via memory_tool.py enforcement

Benefits:
✅ Subagents can READ from MEMORY.md and USER.md
✅ Subagents can query Honcho observations
✅ Subagents blocked from memory writes (add/replace/remove)
✅ Parent remains sole writer of memory
✅ Shared store prevents duplicate memory initialization
✅ Enables orchestrator pattern with long-lived subagents

Technical:
- Subagents share parent's MemoryStore instance
- Read-only enforcement in memory_tool.py still active
- No duplicate memory loading for subagents
- Memory stays consistent across parent + subagents

delegate_tool.py passes subagent_memory_mode and _is_subagent to AIAgent() but AIAgent.__init__ did not declare subagent_memory_mode, causing an unexpected keyword argument crash on every delegate_task call. Added the parameter to the signature and stored both self._is_subagent and self.subagent_memory_mode as instance attributes (previously only accessed via getattr with defaults, never stored).
…ra_body config
- terminal_tool.py: add docker_forward_env to container_config dict passed to _create_environment. It was read from config but never propagated, so docker exec calls were built with no -e flags and credentials were never forwarded into sandbox containers
- run_agent.py: add subagent_memory_mode param to AIAgent.__init__ (delegate_tool was passing it but AIAgent didn't accept it, crashing every delegate_task call); also store self._is_subagent and self._config_extra_body; add model.extra_body config support so extra_body fields (e.g. enable_thinking: false) are merged into every API call
- tools/environments/docker.py: log exact docker exec command at WARNING level (secrets masked) for debugging env forwarding
- Dockerfile: add logging to podman-remote shim, which appends the full command to /opt/data/logs/shim.log for introspection

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

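To illustrate the env-forwarding fix, the sketch below builds a `docker exec` argument list with one `-e` flag per forwarded variable, plus a masked copy suitable for logging. `build_exec_cmd` and `mask_secrets` are hypothetical helpers, not the functions in tools/environments/docker.py, and wrapping the command in `sh -c` is an assumption.

```python
import os


def build_exec_cmd(container, command, forward_env=(), env=None, executable="docker"):
    """Build `docker exec` args, forwarding the named variables that are set."""
    env = os.environ if env is None else env
    cmd = [executable, "exec"]
    for name in forward_env:
        if name in env:  # silently skip variables that are not set
            cmd += ["-e", f"{name}={env[name]}"]
    cmd += [container, "sh", "-c", command]
    return cmd


def mask_secrets(cmd):
    """Return a copy of cmd with every `-e NAME=value` value replaced by ***."""
    masked, prev = [], None
    for token in cmd:
        if prev == "-e" and "=" in token:
            masked.append(token.split("=", 1)[0] + "=***")
        else:
            masked.append(token)
        prev = token
    return masked
```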
…g to debug
- gateway/run.py: move _cleanup_orphaned_containers() before main() so it is defined before it is called; call it before asyncio.run() so cleanup happens at startup, not after the gateway exits
- gateway/run.py: replace hardcoded "podman" with find_docker() so the function respects the configured docker/shim executable
- docker.py: downgrade exec command log from WARNING to DEBUG (too noisy for normal operation; shim.log already captures all podman-remote calls)
- tools/: delete leftover delegate_tool.py.patch artifact

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Allows config.yaml to specify which Docker/Podman network sandbox containers join via `terminal.docker_network`. When set, passes `--network <name>` to docker run so containers can resolve hostnames of other services on that network (e.g. hermes-litellm on hermes-net). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
When pricing/cost estimation calls fetch_endpoint_model_metadata without an api_key (e.g. from insights._get_pricing), the function made unauthenticated requests to the /models endpoint causing repeated 401 errors every 5 minutes (cache TTL). Now falls back to LITELLM_KEY env var so requests to the proxy are authenticated. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
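The fallback can be pictured as below. This is a sketch under assumptions: `resolve_metadata_api_key` and `auth_headers` are hypothetical helper names, and the Bearer header format is assumed rather than taken from the source; only the LITELLM_KEY variable comes from the commit message.

```python
import os


def resolve_metadata_api_key(api_key=None, env=None):
    """Prefer the explicit key; otherwise fall back to LITELLM_KEY."""
    env = os.environ if env is None else env
    if api_key:
        return api_key
    return env.get("LITELLM_KEY") or None


def auth_headers(api_key=None, env=None):
    """Build request headers; empty when no key is available at all."""
    key = resolve_metadata_api_key(api_key, env)
    return {"Authorization": f"Bearer {key}"} if key else {}
```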
docker_network was added to DockerEnvironment (docker.py) and terminal_tool.py but the config.yaml → TERMINAL_DOCKER_NETWORK env var bridge was missing in both code paths:
- cli.py: used by `hermes chat`, `hermes model`, etc.
- gateway/run.py: used by `hermes gateway run`

Also add TERMINAL_DOCKER_NETWORK reading to _get_env_config() in terminal_tool.py, and add docker_network to the container_config dict that's built before calling _create_environment().

Without this, `docker_network: "hermes-net"` in config.yaml had no effect: sandbox containers were always created on the default podman network and could not resolve hermes-net hostnames like hermes-litellm.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…ge config and invalid workdir bypass
- PR NousResearch#4350: Load config.yaml terminal block as fallback before hardcoded defaults
  - Fixes docker_image in config.yaml not being loaded
  - Adds cfg.get() fallbacks for all terminal config options
- PR NousResearch#4673: Don't clobber already-resolved absolute TERMINAL_CWD
  - Fixes invalid workdir bypassing terminal.cwd config
  - Skips config override when env var already has absolute path

- Install gnupg in system dependencies
- Enables GPG-signed emails from angelos-hermes@mailbox.org
- Supports git commit signing with GPG keys
- Rename Dockerfile to Containerfile (Podman convention)

Signed-off-by: Angelos <angelos-hermes@mailbox.org>

Previous run NousResearch#46 failed due to transient network issue. This is a no-op commit to re-trigger the build pipeline. Related: feat: Add gnupg package for GPG email signing
Previous commit had double-escaped backslashes (\\) which broke the Dockerfile syntax. This fixes the RUN instructions to use proper single backslashes for line continuation. Fixes build failure in run NousResearch#47.
- GPG support moved to hermes-sandbox-image (where it belongs)
- Sandbox image is the correct location for agent tooling
- Reduces production container attack surface
- Follows separation of concerns:
  * hermes-agent: Production service container
  * hermes-sandbox: Ephemeral agent execution environment

Related: angelos/hermes-sandbox-image@4294f7f

…rdrail run_agent.py:3056 and test_agent_guardrails.py import MAX_CONCURRENT_CHILDREN by name from delegate_tool. PR NousResearch#16 renamed it to _DEFAULT_MAX_CONCURRENT_CHILDREN without adding the alias, breaking every API call.
…coded 3 run_agent.py:_cap_delegate_task_calls was importing the hardcoded MAX_CONCURRENT_CHILDREN (3) to pre-truncate delegate_task tool calls at the agent-loop level. This ran BEFORE delegate_task() itself, so the configurable delegation.max_concurrent_children in config.yaml was dead on arrival — the guardrail silently dropped calls to 3 before the runtime check could read the config. Now imports _get_max_concurrent_children() instead, which reads the same config.yaml / env var / default chain. Both enforcement layers respect the same configurable value.
…_max_concurrent_children() everywhere

No backwards-compat alias needed; all three consumers updated:
- run_agent.py (already done in previous commit)
- tests/run_agent/test_agent_guardrails.py
- tests/tools/test_delegate.py

The only exported name is now _get_max_concurrent_children() which reads from config.yaml / env var / default.

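A lookup chain of this shape might be written as follows. This is only a sketch: the lookup order (config, then env var, then default) is inferred from the wording above, the environment variable name `DELEGATION_MAX_CONCURRENT_CHILDREN` is invented for illustration, and the default of 3 is the previously hardcoded value mentioned in the earlier commit.

```python
import os

DEFAULT_MAX_CONCURRENT_CHILDREN = 3  # the old hardcoded limit


def _coerce_positive_int(value):
    """Return value as a positive int, or None if it is missing or invalid."""
    try:
        number = int(value)
    except (TypeError, ValueError):
        return None
    return number if number > 0 else None


def get_max_concurrent_children(config=None, env=None):
    """Resolve the limit from config, then env var, then the default."""
    config = config or {}
    env = os.environ if env is None else env
    delegation = config.get("delegation") or {}
    from_config = _coerce_positive_int(delegation.get("max_concurrent_children"))
    if from_config is not None:
        return from_config
    # Hypothetical variable name, used here only for illustration.
    from_env = _coerce_positive_int(env.get("DELEGATION_MAX_CONCURRENT_CHILDREN"))
    if from_env is not None:
        return from_env
    return DEFAULT_MAX_CONCURRENT_CHILDREN
```

Because both the agent-loop guardrail and the runtime check would call the same function, they cannot disagree about the limit.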
The fork's execute() method uses threading, time, shlex, and is_interrupted but the upstream merge took upstream's import block which doesn't have them (upstream's _run_bash doesn't need them).

Added:
- import threading (line 903: Thread for output draining)
- import time (line 905: monotonic deadline)
- import shlex (line 789: shell quoting)
- from tools.interrupt import is_interrupted (line 908: interrupt check)

Verified via AST analysis: no other missing imports.

Two merge fallout fixes:
1. delegate_tool.py uses tool_error() 6 times but never imported it from tools.registry, so every error path crashed with NameError.
2. test_batch_capped_at_3 expected the old silent-truncation behavior; updated to test the new error-on-excess behavior.

…d lifetime

Two changes to how sandbox containers are spawned:
1. Add --init to docker run. This uses tini as PID 1, which automatically reaps zombie child processes. Previously PID 1 was sleep(1) which doesn't call wait(), so every background process that exited became a zombie, and the process tool reported them as "running" because zombie PIDs still exist in the process table. Fixes NousResearch#6908 (upstream).
2. Replace 'sleep 2h' with 'sleep infinity'. The fixed 2-hour lifetime was arbitrary and sometimes too short for long agent sessions. The idle reaper (terminal.lifetime_seconds, default 300s, configurable via config.yaml) already handles cleanup based on last activity, so there's no reason for the container itself to have a fixed death timer. With sleep infinity, the container lives until the idle reaper kills it or the task ends.

Both changes are one line each. No config changes needed: the existing terminal.lifetime_seconds config controls idle timeout.

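Put together with the optional network setting from the earlier docker_network commit, the resulting `docker run` invocation might be assembled like this. `build_sandbox_run_cmd` is a hypothetical helper and the detached (`-d`) mode is an assumption; `--init`, `--network`, and `sleep infinity` are the pieces named in the commit messages.

```python
def build_sandbox_run_cmd(image, name, executable="docker", network=None):
    """Build the argument list for starting a long-lived sandbox container."""
    # --init puts a minimal init process at PID 1 so exited children are reaped.
    cmd = [executable, "run", "-d", "--init", "--name", name]
    if network:
        # Join a named network so service hostnames on it resolve.
        cmd += ["--network", network]
    # No fixed lifetime: the idle reaper decides when the container dies.
    cmd += [image, "sleep", "infinity"]
    return cmd
```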
The upstream merge (PR NousResearch#14) auto-resolved gateway/run.py's 1598-line diff and silently dropped the entire self-nudge system (~55 lines in gateway/run.py, ~37 in run_agent.py, ~5 in model_tools.py). This broke notify_on_complete for background processes: the mechanism that fires a hidden turn when a background process exits was gone.

Cherry-picked b8737bc ("feat(gateway): add one-shot self nudge tool") on top of the merged state, resolving 3 conflicts in gateway/run.py (media_message_callback + self_nudge_callback coexistence) and 1 in tests/test_model_tools.py (kept both clarify + self_nudge tests).

The self-nudge system provides:
- _arm_self_nudge / _cancel_self_nudge / _fire_self_nudge in gateway
- self_nudge_callback on AIAgent for tool-initiated delayed turns
- self_nudge tool exposed on gateway platforms (not CLI)
- _pending_hidden_turns for injecting hidden messages on next turn
- Wired to notify_on_complete in terminal_tool.py via process_registry.pending_watchers

gateway/run.py uses uuid.uuid4() at lines 1139 (restart-resume) and 1263 (self-nudge) but uuid was never imported at module level — upstream uses local 'import uuid as _uuid' inside functions, and our fork's additions used bare uuid without adding the import. Self-nudge fired but crashed with NameError: name 'uuid' is not defined.
The self-nudge path called _handle_message_with_agent directly without sending a typing indicator first. The user saw the agent silently process and respond with no 'typing...' feedback in Telegram.
_run_self_nudge_entry called _handle_message_with_agent but discarded its return value. When streaming doesn't deliver (typical for self-nudge turns since there's no user message to stream-edit), the agent's response vanished — the model did the work but the user never saw the result. Now captures the return and explicitly sends via adapter.send if streaming didn't already deliver it.
Adds logger.info when a turn is routed to the cheap model, showing the route label and first 80 chars of the user message. Added to both gateway and CLI paths. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds an optional 'model' parameter to delegate_task so the agent can route subagents to a smaller/faster model for simple tasks (e.g. summarization, formatting, lookups) while keeping the primary model for complex reasoning.

Works at both levels:
- Top-level 'model' param for single-task delegation
- Per-task 'model' field in batch tasks array

The per-call model overrides delegation.model from config, which in turn overrides inheriting the parent's model. Per-task model takes precedence over top-level model in batch mode.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

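The precedence described above reduces to "first non-empty candidate wins". The sketch below shows that rule in isolation; `pick_model` is a hypothetical helper rather than the code in delegate_tool.py, and it deliberately ignores credential resolution.

```python
def pick_model(task, top_level_model=None, delegation_config=None, parent_model=None):
    """Pick the model for one subagent: per-task, then top-level,
    then delegation.model from config, then the parent's model."""
    delegation_config = delegation_config or {}
    candidates = (
        task.get("model"),
        top_level_model,
        delegation_config.get("model"),
        parent_model,
    )
    for candidate in candidates:
        if candidate:
            return candidate
    return None
```

In a batch call, each task dictionary is passed through the same function, so one task can name its own model while its siblings fall back to the top-level or configured value.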
Adds tiered model selection for subagent delegation:
- Agent can pass model='small'/'medium'/'large' or a direct model name
- Tiers configured via delegation.model_tiers in config.yaml
- New list_models tool returns available models with tier assignments

Use cases:
- Delegate file exploration to a small/fast model
- Escalate to a large model when stuck on complex reasoning
- Spin up a peer review subagent on a stronger model
- Mixed batch: simple tasks on small, complex on default

Precedence: per-task model > top-level model param > delegation.model config > inherit parent model. Tier names resolve to configured model names; unknown tiers fall back to default.

Not yet upstreamed, local fork only.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

list_models was only handled in the sequential dispatch path. The concurrent path (used when multiple tools are called in one turn) fell through to the default handler, causing the tool call to fail. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
list_models was only in the 'delegation' toolset but composite toolsets (hermes-cli, hermes-telegram, etc.) list tools directly without including the delegation toolset. Added list_models alongside delegate_task in all composite toolsets. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…oint

is_local_endpoint only matched IPs and localhost, missing Docker/Podman DNS names like hermes-litellm. This caused stale stream timeouts (180s) to fire on local LLM proxies instead of being auto-disabled.

Two fixes:
1. model.local_endpoints config: list of hostnames to treat as local
2. DNS resolution fallback: resolve hostname to IP, check if private

Not yet upstreamed, local fork only.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Hostnames without dots (e.g. hermes-litellm, ollama) are always on the local network — Docker/Podman DNS, mDNS, or /etc/hosts. No need to configure them explicitly. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
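Combining the configured list, the dotless-hostname rule, and the DNS fallback gives a check along these lines. It is a sketch, not the fork's is_local_endpoint: the `local_endpoints` and `resolve` parameters are assumptions made so the logic can be tested without real DNS, and the order of the checks is a guess.

```python
import ipaddress
import socket
from urllib.parse import urlparse


def is_local_endpoint(base_url, local_endpoints=(), resolve=socket.gethostbyname):
    """Best-effort guess at whether base_url points at a local/private host."""
    host = (urlparse(base_url).hostname or "").lower()
    if not host:
        return False
    if host == "localhost" or host in {h.lower() for h in local_endpoints}:
        return True
    try:
        ip = ipaddress.ip_address(host)  # literal IPv4/IPv6 address
        return ip.is_private or ip.is_loopback
    except ValueError:
        pass  # not an IP literal, treat it as a hostname
    if "." not in host:
        return True  # dotless names: container DNS, mDNS, or /etc/hosts
    try:
        ip = ipaddress.ip_address(resolve(host))  # DNS fallback
    except (OSError, ValueError):
        return False
    return ip.is_private or ip.is_loopback
```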
Adds max_context_tokens guard to smart routing: if conversation context exceeds the threshold, stay on the primary model instead of routing to the cheap model. The cheap model is meant to be fast; sending it a large context defeats the purpose.

Changes:
- choose_cheap_model_route accepts context_tokens parameter
- Gateway estimates context from cached agent's session_prompt_tokens or from history length (4 chars ≈ 1 token)
- CLI estimates from conversation_history
- Log line now includes context token count

Config: smart_model_routing.max_context_tokens (default: 0 = disabled)

Not yet upstreamed, local fork only.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

When trim_context is enabled and context exceeds max_context_tokens,
trim conversation history from the head (keep most recent messages)
and still route to the cheap model — instead of falling back to the
primary model entirely.
For a simple "thanks!" in a 48K-token session, only the last ~32K
tokens of history are sent to the cheap model. The model can still
respond appropriately with recent context.
Config:
  smart_model_routing:
    max_context_tokens: 32000
    trim_context: true  # default: false (safe, skip route entirely)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
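The interaction between max_context_tokens and trim_context can be sketched as below. The function names are hypothetical, the 4-characters-per-token estimate is the heuristic mentioned in the previous commit, and the sketch assumes the message has already qualified for the cheap route on other grounds.

```python
def _cost(message):
    """Rough token estimate for one message (4 chars is about 1 token)."""
    return len(str(message.get("content", ""))) // 4


def trim_to_budget(history, max_tokens):
    """Keep the most recent messages that fit; always keep the newest one."""
    kept, used = [], 0
    for message in reversed(history):
        cost = _cost(message)
        if kept and used + cost > max_tokens:
            break
        kept.append(message)
        used += cost
    kept.reverse()
    return kept


def plan_cheap_route(history, max_context_tokens=0, trim_context=False):
    """Return (use_cheap_model, history_to_send)."""
    context_tokens = sum(_cost(m) for m in history)
    if max_context_tokens <= 0 or context_tokens <= max_context_tokens:
        return True, history  # guard disabled, or context is small enough
    if trim_context:
        return True, trim_to_budget(history, max_context_tokens)
    return False, history  # safe default: stay on the primary model
```

Trimming builds a new list and leaves the caller's history untouched, so the primary model still sees the full conversation on later turns.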
Reassigning the `history` parameter inside the trim block made Python treat it as a local variable, causing UnboundLocalError on earlier reads. Use _trimmed_history + _effective_history instead. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
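This failure mode is easiest to reproduce when the read happens in a nested function that closes over `history`; that structure is an assumption here, since the commit does not show the surrounding code. The function names below are illustrative, while `_trimmed_history` and `_effective_history` follow the names used in the fix.

```python
def broken(history, limit):
    def run_turn():
        size = len(history)  # read comes first ...
        if size > limit:
            # ... but this assignment makes `history` local to run_turn,
            # so the read above fails before the branch is ever reached.
            history = history[-limit:]
        return history
    return run_turn()


def fixed(history, limit):
    def run_turn():
        # New names leave the outer `history` binding untouched.
        _trimmed_history = history[-limit:] if len(history) > limit else None
        _effective_history = history if _trimmed_history is None else _trimmed_history
        return _effective_history
    return run_turn()


def raises_unbound_local(fn, *args):
    """Small helper: report whether calling fn raises UnboundLocalError."""
    try:
        fn(*args)
    except UnboundLocalError:
        return True
    return False
```

Python decides at compile time that a name is local to a function if it is assigned anywhere in that function's body, which is why the error appears even when the trimming branch never runs.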
…routing
Lets the parent agent (or the LLM via tool-calling) pick a model per
subagent invocation, using the same resolution pipeline as the /model
slash command: aliases, direct mappings, catalog search, and credential
resolution. Per-task model beats top-level model, which beats
delegation.model config, which falls back to inheriting the parent.
This unlocks cost/speed/capability routing for subagent-driven
development — e.g. dispatch a haiku for a trivial lookup, a sonnet for
a moderate refactor, and glm-4.7 for a bulk research task, all inside
a single delegate_task batch call.
Changes:
- tools/delegate_tool.py
  - New _resolve_model_override() helper that wraps switch_model() and
    returns a credential bundle compatible with _build_child_agent's
    override_* params. Strips --global to ensure per-task overrides
    never persist to config.yaml.
  - delegate_task() gains an optional model= kwarg, threaded through
    task normalization and the child-build loop so each subagent can
    resolve credentials independently.
  - DELEGATE_TASK_SCHEMA advertises the new model field at the top
    level and inside each task object, with descriptions the LLM can
    use to decide when to route to which model.
  - Registry handler forwards args['model'] to delegate_task().
- tests/tools/test_delegate.py
  - TestResolveModelOverride covers bare name, --provider flag, the
    --global strip-but-ignore guarantee, switch_model failures, and
    empty input.
  - TestDelegateTaskModelOverride covers top-level override, per-task
    > top-level > delegation config precedence, no-override falls
    through to delegation config, bad model names surface as JSON
    errors, and the full registry dispatch path.
All 82 delegate tests pass (67 existing + 10 new + 5 toolset scope).
The initial schema said only "supports optional --provider flag" without showing a concrete example. When asked to route a subagent through a different provider, the LLM reached for the intuitively natural 'provider:model' colon-prefix syntax (e.g. 'openrouter:stepfun/step-3.5-flash'), but colons in hermes are reserved for OpenRouter variant suffixes (:free, :extended, :thinking, :fast), so the colon-prefix form was passed raw to the parent's provider and rejected as an Unknown Model.

Fix:
- Top-level and per-task 'model' field descriptions now show three concrete syntax forms: bare ID, short alias, and '--provider <slug>' with worked examples (stepfun/step-3.5-flash --provider openrouter, claude-opus-4-6 --provider anthropic, deepseek-chat --provider deepseek).
- Valid provider slugs are enumerated so the LLM doesn't have to guess.
- The colon-prefix anti-pattern is explicitly called out as DO NOT with an example, since LLMs gravitate toward it. This keeps delegate_task consistent with the existing /model slash command, which also uses --provider exclusively (see hermes_cli/model_switch.py:16-18).
- Main description MODEL SELECTION bullet updated with the same examples.
- New TestDelegateRequirements.test_schema_documents_provider_switch_syntax regression guard asserts the concrete --provider example and colon-prefix anti-pattern stay in the schema across future refactors.

Behaviour unchanged; this is a schema-description-only fix. All 83 delegate tests pass.
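The difference between the accepted `--provider` form and the rejected colon-prefix form can be shown with a toy parser. This is not the hermes implementation (the commit says real resolution goes through switch_model()); `parse_model_spec` and `is_rejected` are hypothetical, and the provider list contains only the three slugs used in the examples above.

```python
import shlex


def parse_model_spec(spec, known_providers=("openrouter", "anthropic", "deepseek")):
    """Split '<model> [--provider <slug>]' into (model, provider)."""
    tokens = shlex.split(spec)
    model, provider = None, None
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token == "--provider":
            if i + 1 >= len(tokens):
                raise ValueError("--provider requires a slug")
            provider = tokens[i + 1]
            i += 2
        elif token == "--global":
            i += 1  # stripped: per-task overrides never persist
        elif model is None:
            model = token
            i += 1
        else:
            raise ValueError(f"unexpected token: {token!r}")
    if not model:
        raise ValueError("empty model spec")
    prefix, sep, _ = model.partition(":")
    if sep and prefix in known_providers:
        # 'provider:model' is the anti-pattern; colons are for variant suffixes.
        raise ValueError("use '<model> --provider <slug>' instead of 'provider:model'")
    return model, provider


def is_rejected(spec):
    """Return True when parse_model_spec refuses the spec."""
    try:
        parse_model_spec(spec)
    except ValueError:
        return True
    return False
```

A variant suffix such as `:free` passes through untouched because the text before the colon is a model ID, not a provider slug.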
Summary

Builds on @Labhund's per-task model parameter (#7586) with two additions:
- Model tiers: `small`/`medium`/`large` mapped to configured models
- `list_models` tool: agent discovers available models and tiers at runtime

What This Adds (on top of #7586)

Model Tiers

The agent passes tier names instead of deployment-specific model IDs. Tier names resolve via `_resolve_model_or_tier()` before entering #7586's `_resolve_model_override()` pipeline. Unknown tiers fall back to the default model.

list_models Tool

Returns available models with tier assignments, context lengths, and providers. Registered in the `delegation` toolset.

    {
      "models": [
        {"name": "qwen35-397b", "provider": "litellm", "context_length": 524288, "is_default": true},
        {"name": "gemma4-nothink", "tier": "small", ...},
        {"name": "claude-sonnet-4-6", "tier": "large", ...}
      ],
      "tiers": {"small": "gemma4-nothink", "large": "claude-sonnet-4-6"}
    }

Why Tiers Matter

With #7586, the agent needs to know exact model names per deployment. Tiers abstract this: a skill or system prompt can say "use small for lookups" without knowing whether "small" is Gemini Flash, Gemma 4, or Haiku on this particular instance.
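As a rough picture of the tier layer, the sketch below resolves a tier name against a `delegation.model_tiers` mapping and builds a `list_models`-style payload. Both functions are hypothetical stand-ins rather than the PR's code, and "unknown tier" is read here as "a tier name with no configured model".

```python
TIER_NAMES = ("small", "medium", "large")


def resolve_model_or_tier(requested, model_tiers=None, default_model=None):
    """Map a tier name to its configured model; pass other names through."""
    model_tiers = model_tiers or {}
    if not requested:
        return default_model
    key = requested.strip().lower()
    if key in TIER_NAMES:
        # A tier without a configured model falls back to the default.
        return model_tiers.get(key) or default_model
    return requested  # treated as a direct model name


def build_list_models_payload(models, model_tiers=None, default_model=None):
    """Annotate model entries with their tier and default flag."""
    model_tiers = model_tiers or {}
    tier_of = {name: tier for tier, name in model_tiers.items()}
    entries = []
    for model in models:
        entry = dict(model)  # copy so the input is not mutated
        if model["name"] in tier_of:
            entry["tier"] = tier_of[model["name"]]
        if model["name"] == default_model:
            entry["is_default"] = True
        entries.append(entry)
    return {"models": entries, "tiers": dict(model_tiers)}
```

With this split, a prompt that says "use small for lookups" stays valid across deployments; only the `model_tiers` mapping changes.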

Credits

Foundation: @Labhund's per-task model parameter and `_resolve_model_override()` from #7586. This PR cherry-picks those commits and adds the tier + discovery layer on top.

Fixes #7929

Test plan
- `model="small"` resolves to configured tier model
- `list_models` returns models with tier labels
- `model_tiers` is unconfigured

🤖 Generated with Claude Code