
feat(tools): add delegation model pool with per-call model/provider override#3794

Closed
HenkDz wants to merge 19 commits into NousResearch:main from HenkDz:feat/delegate-model-provider

HenkDz commented Mar 29, 2026

Summary

Add intelligent model routing for subagent delegation via a configurable model pool and per-call overrides.

Problem: Users with access to multiple models (e.g., via OpenRouter or Z.AI) had no way to control which model a subagent uses. The subagent always inherited the parent model unless the user applied brittle manual overrides.

Solution:

  1. Delegation model pool (delegation.pool): Users define a list of models with strengths. The pool is injected into the delegate_task tool description, so the LLM reads model capabilities and picks the best fit automatically. Models outside the pool are rejected (cost control).

  2. Per-call model/provider override: The LLM can also explicitly set model and provider params on delegate_task for one-off overrides.

  3. Subagent spawn logging: Logs which model/provider each subagent runs on for visibility.
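The first solution item can be sketched roughly as follows. This is a minimal illustration and not the code in tools/delegate_tool.py: `build_pool_description` and `validate_model` are hypothetical names, and the entry shape simply mirrors the `delegation.pool` config format.

```python
from typing import Optional

# Hypothetical pool, shaped like delegation.pool config entries.
POOL = [
    {"model": "glm-5.1", "strengths": "coding, debugging, implementation"},
    {"model": "glm-5", "strengths": "research, analysis, reasoning, math"},
]


def build_pool_description(base: str, pool: list[dict]) -> str:
    """Append one line per pool model so the LLM can read its strengths."""
    if not pool:
        return base
    lines = [f"- {e['model']}: {e.get('strengths', 'general')}" for e in pool]
    return base + "\n\nAvailable models:\n" + "\n".join(lines)


def validate_model(model: Optional[str], pool: list[dict]) -> bool:
    """Allow anything when no pool is configured; otherwise require membership."""
    if model is None or not pool:
        return True
    return any(e["model"] == model for e in pool)
```

With an empty pool, `validate_model` accepts every model, which matches the "no pool configured = no restrictions" behaviour.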

Changes

| File | Change |
| --- | --- |
| tools/delegate_tool.py | Pool validation, dynamic schema injection, per-call model/provider params, spawn logging |
| hermes_cli/config.py | Add pool to DEFAULT_CONFIG delegation section |
| cli-config.yaml.example | Document pool configuration with examples |
| website/docs/user-guide/configuration.md | Add "Delegation Model Pool" docs section |

Config example

delegation:
  provider: zai
  model: glm-5-turbo           # fallback when no model picked
  pool:
    - model: glm-5.1
      strengths: coding, debugging, implementation
    - model: glm-5
      strengths: research, analysis, reasoning, math
    - model: glm-5-turbo
      strengths: general-purpose, writing, quick tasks
    # Cross-provider:
    # - model: deepseek-r1
    #   provider: openrouter
    #   strengths: math, formal proofs

Backward compatibility

  • No pool configured = no restrictions (all models allowed, same as before)
  • Per-call overrides are optional (default behavior unchanged)
  • No breaking changes to existing configs

Tests

  • 787 passed, 1 pre-existing failure (env-dependent test_child_inherits_runtime_credentials — reads live user config, fails locally but passes on CI)
  • All delegate credential resolution tests pass (18/18)
  • Syntax verified via ast.parse

HenkDz and others added 19 commits April 4, 2026 09:37
- tests/gateway/test_hooks.py: filter out (builtin) hooks from count
  assertions in TestDiscoverAndLoad. The boot-md builtin hook was
  added but tests still expected 0/1/2 hooks instead of 1/2/3.
- tests/test_plugins_cmd.py: test_none_falls_through_to_list was
  mocking cmd_list but the actual dispatch calls cmd_toggle (changed
  in PR NousResearch#3747 enable/disable feature).
…(salvage NousResearch#4400) (NousResearch#4419)

Adds two Camofox features:

1. Persistent browser sessions: new `browser.camofox.managed_persistence`
   config option. When enabled, Hermes sends a deterministic profile-scoped
   userId to Camofox so the server maps it to a persistent browser profile
   directory. Cookies, logins, and browser state survive across restarts.
   Default remains ephemeral (random userId per session).

2. VNC URL discovery: Camofox /health endpoint returns vncPort when running
   in headed mode. Hermes constructs the VNC URL and includes it in navigate
   responses so the agent can share it with users.

Also fixes camofox_vision bug where call_llm response object was passed
directly to json.dumps instead of extracting .choices[0].message.content.

Changes from original PR:
- Removed browser_evaluate tool (separate feature, needs own PR)
- Removed snapshot truncation limit change (unrelated)
- Config.yaml only for managed_persistence (no env var, no version bump)
- Rewrote tests to use config mock instead of env var
- Reverted package-lock.json churn

Co-authored-by: analista <psikonetik@gmail.com.com>
…ousResearch#4419) (NousResearch#4440)

PR NousResearch#4419 was based on pre-credential-pools main where _config_version was 10.
The squash merge downgraded it from 11 (set by NousResearch#2647) back to 10.
Also fixes the test assertion.
By default 'hermes gateway run' now prints WARNING+ to stderr so
connection errors and startup failures are visible in the terminal
without having to tail ~/.hermes/logs/gateway.log.

- gateway/run.py: start_gateway() accepts verbosity: Optional[int]=0.
  When not None, attaches a StreamHandler to stderr with level mapped
  from the count (0=WARNING, 1=INFO, 2+=DEBUG). Root logger level is
  also lowered when DEBUG is requested so records are not swallowed.

- hermes_cli/gateway.py: run_gateway() gains verbose: int and
  quiet: bool params. -q translates to verbosity=None (no stderr
  handler). Wired through gateway_command().

- hermes_cli/main.py: -v changed from store_true to action=count so
  -v/-vv/-vvv each increment the level. -q/--quiet added as a new flag.

Behaviour summary:
  hermes gateway run        -> WARNING+ on stderr (default)
  hermes gateway run -q     -> silent
  hermes gateway run -v     -> INFO+
  hermes gateway run -vv    -> DEBUG
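
The flag-to-level mapping in the behaviour summary can be sketched as below. The function names are hypothetical and the real `start_gateway()` and `run_gateway()` signatures may differ; only the count-to-level mapping and the `-q` handling follow the commit message.

```python
import argparse
import logging
import sys
from typing import Optional


def verbosity_to_level(verbosity: Optional[int]) -> Optional[int]:
    """Map a -v count to a logging level; None means no stderr handler."""
    if verbosity is None:
        return None
    if verbosity <= 0:
        return logging.WARNING
    if verbosity == 1:
        return logging.INFO
    return logging.DEBUG


def attach_stderr_handler(
    logger: logging.Logger, verbosity: Optional[int]
) -> Optional[logging.Handler]:
    """Attach a stderr handler at the mapped level, or nothing when quiet."""
    level = verbosity_to_level(verbosity)
    if level is None:
        return None
    handler = logging.StreamHandler(sys.stderr)
    handler.setLevel(level)
    logger.addHandler(handler)
    # Lower the logger itself for DEBUG so records are not filtered earlier.
    if level == logging.DEBUG:
        logger.setLevel(logging.DEBUG)
    return handler


def parse_verbosity(argv: list[str]) -> Optional[int]:
    """-v/-vv increment a counter; -q wins and disables stderr output."""
    parser = argparse.ArgumentParser()
    parser.add_argument("-v", "--verbose", action="count", default=0)
    parser.add_argument("-q", "--quiet", action="store_true")
    args = parser.parse_args(argv)
    return None if args.quiet else args.verbose
```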
…mock

- stderr handler now uses RedactingFormatter to match file handlers
- restart path uses verbose=0 (int) instead of verbose=False (bool)
- test mock updated with new run_gateway(verbose, quiet, replace) signature
The original PR excluded auth.json from _DEFAULT_EXPORT_EXCLUDE_ROOT and
filtered both auth.json and .env from named profile exports, but missed
adding .env to the default profile exclusion set. Default exports would
still leak .env containing API keys.

Added .env to _DEFAULT_EXPORT_EXCLUDE_ROOT, added test coverage, and
updated the existing test that incorrectly asserted .env presence.
…inuity

Allow callers to pass X-Hermes-Session-Id in request headers to continue
an existing conversation. When provided, history is loaded from SessionDB
instead of the request body, and the session_id is echoed in the response
header. Without the header, existing behavior is preserved (new uuid per
request).

This enables web UI clients to maintain thread continuity without modifying
any session state themselves — the same mechanism the gateway uses for IM
platforms (Telegram, Discord, etc.).
Reuse a single SessionDB across requests by caching on self._session_db
with lazy initialization. Avoids creating a new SQLite connection per
request when X-Hermes-Session-Id is used. Updated tests to set
adapter._session_db directly instead of patching the constructor.
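
The lazy-initialization pattern described here might look like the following sketch. `Adapter` and `_get_session_db` are hypothetical names, and a plain `sqlite3` connection stands in for SessionDB, whose real interface is not shown in this PR.

```python
import sqlite3
from typing import Optional


class Adapter:
    """Hypothetical adapter; only the caching pattern matters here."""

    def __init__(self, db_path: str = ":memory:") -> None:
        self._db_path = db_path
        self._session_db: Optional[sqlite3.Connection] = None

    def _get_session_db(self) -> sqlite3.Connection:
        # Open the connection on first use, then reuse it for later requests.
        if self._session_db is None:
            self._session_db = sqlite3.connect(self._db_path)
        return self._session_db
```

Because the cache lives on an instance attribute, a test can assign a fake to `_session_db` directly, with no need to patch a constructor.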
…M calls

Three exfiltration vectors closed:

1. Browser URL exfil — agent could embed secrets in URL params and
   navigate to attacker-controlled server. Now scans URLs for known
   API key patterns before navigating (browser_navigate, web_extract).

2. Browser snapshot leak — page displaying env vars or API keys would
   send secrets to auxiliary LLM via _extract_relevant_content before
   run_agent.py's redaction layer sees the result. Now redacts snapshot
   text before the auxiliary call.

3. Camofox annotation leak — accessibility tree text sent to vision
   LLM could contain secrets visible on screen. Now redacts annotation
   context before the vision call.

10 new tests covering URL blocking, snapshot redaction, and annotation
redaction for both browser and camofox backends.
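
A URL pre-navigation check of the kind described in vector 1 could be sketched like this. The pattern list is purely illustrative and the function names are hypothetical; the project's actual key patterns and redaction helpers are not shown in this PR.

```python
import re
from urllib.parse import unquote

# Illustrative key shapes only (an assumption, not the project's real list).
_KEY_PATTERN = re.compile(
    r"\b(?:sk-[A-Za-z0-9_-]{20,}|ghp_[A-Za-z0-9]{36}|AKIA[0-9A-Z]{16})"
)


def url_contains_secret(url: str) -> bool:
    """Return True if the percent-decoded URL appears to embed an API key."""
    return bool(_KEY_PATTERN.search(unquote(url)))


def redact(text: str) -> str:
    """Replace anything that looks like a key before text reaches another LLM."""
    return _KEY_PATTERN.sub("[REDACTED]", text)
```

Decoding the URL first is a design choice in this sketch: it keeps a key from slipping through when parts of it are percent-encoded.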
LLM responses from browser snapshot extraction and vision analysis
could echo back secrets that appeared on screen or in page content.
Input redaction alone is insufficient — the LLM may reproduce secrets
it read from screenshots (which cannot be text-redacted).

Now redact outputs from:
- _extract_relevant_content (auxiliary LLM response)
- browser_vision (vision LLM response)
- camofox_vision (vision LLM response)
The original test file had mock secrets corrupted by secret-redaction
tooling before commit — the test values (sk-ant...l012) didn't actually
trigger the PREFIX_RE regex, so 4 of 10 tests were asserting against
values that never appeared in the input.

- Replace truncated mock values with proper fake keys built via string
  concatenation (avoids tool redaction during file writes)
- Add _ensure_redaction_enabled autouse fixture to patch the module-level
  _REDACT_ENABLED constant, matching the pattern from test_redact.py
…hes on restart (NousResearch#4481)

* fix: force-close TCP sockets on client cleanup, detect and recover dead connections

When a provider drops connections mid-stream (e.g. OpenRouter outage),
httpx's graceful close leaves sockets in CLOSE-WAIT indefinitely. These
zombie connections accumulate and can prevent recovery without restarting.

Changes:
- _force_close_tcp_sockets: walks the httpx connection pool and issues
  socket.shutdown(SHUT_RDWR) + close() to force TCP RST on every socket
  when a client is closed, preventing CLOSE-WAIT accumulation
- _cleanup_dead_connections: probes the primary client's pool for dead
  sockets (recv MSG_PEEK), rebuilds the client if any are found
- Pre-turn health check at the start of each run_conversation call that
  auto-recovers with a user-facing status message
- Primary client rebuild after stale stream detection to purge pool
- User-facing messages on streaming connection failures:
  "Connection to provider dropped — Reconnecting (attempt 2/3)"
  "Connection failed after 3 attempts — try again in a moment"

Made-with: Cursor
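
The two socket-level techniques named in this commit (shutdown plus close, and a `MSG_PEEK` probe) can be illustrated on a bare socket. This is a sketch under assumptions: `force_close` and `is_dead` are hypothetical names, and walking the httpx connection pool to reach its sockets is deliberately left out because that relies on internals not shown here.

```python
import socket


def force_close(sock: socket.socket) -> None:
    """Shut down both directions, then close, tolerating already-dead sockets."""
    try:
        sock.shutdown(socket.SHUT_RDWR)
    except OSError:
        pass  # Never connected, or already shut down.
    finally:
        sock.close()


def is_dead(sock: socket.socket) -> bool:
    """Peek one byte without consuming it; an empty read means the peer closed.

    Note: this leaves the socket in non-blocking mode. A fuller version
    would restore the previous timeout afterwards.
    """
    try:
        sock.setblocking(False)
        return sock.recv(1, socket.MSG_PEEK) == b""
    except BlockingIOError:
        return False  # No data pending, but the connection is still open.
    except OSError:
        return True  # Reset, closed, or otherwise unusable.
```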

* fix: pool entry missing base_url for openrouter, clean error messages

- _resolve_runtime_from_pool_entry: add OPENROUTER_BASE_URL fallback
  when pool entry has no runtime_base_url (pool entries from auth.json
  credential_pool often omit base_url)
- Replace Rich console.print for auth errors with plain print() to
  prevent ANSI escape code mangling through prompt_toolkit's stdout patch
- Force-close TCP sockets on client cleanup to prevent CLOSE-WAIT
  accumulation after provider outages
- Pre-turn dead connection detection with auto-recovery and user message
- Primary client rebuild after stale stream detection
- User-facing status messages on streaming connection failures/retries

Made-with: Cursor

* fix(gateway): persist memory flush state to prevent redundant re-flushes on restart

The _session_expiry_watcher tracked flushed sessions in an in-memory set
(_pre_flushed_sessions) that was lost on gateway restart. Expired sessions
remained in sessions.json and were re-discovered every restart, causing
redundant AIAgent runs that burned API credits and blocked the event loop.

Fix: Add a memory_flushed boolean field to SessionEntry, persisted in
sessions.json. The watcher sets it after a successful flush. On restart,
the flag survives and the watcher skips already-flushed sessions.

- Add memory_flushed field to SessionEntry with to_dict/from_dict support
- Old sessions.json entries without the field default to False (backward compat)
- Remove the ephemeral _pre_flushed_sessions set from SessionStore
- Update tests: save/load roundtrip, legacy entry compat, auto-reset behavior
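
A reduced sketch of the persisted flag, assuming a dataclass-style entry. The real SessionEntry has more fields; `session_id` and `sessions_needing_flush` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class SessionEntry:
    """Reduced stand-in for the real SessionEntry."""

    session_id: str
    memory_flushed: bool = False

    def to_dict(self) -> dict:
        return {"session_id": self.session_id, "memory_flushed": self.memory_flushed}

    @classmethod
    def from_dict(cls, data: dict) -> "SessionEntry":
        # Legacy sessions.json entries lack the key, so default to False.
        return cls(
            session_id=data["session_id"],
            memory_flushed=bool(data.get("memory_flushed", False)),
        )


def sessions_needing_flush(entries: list[SessionEntry]) -> list[SessionEntry]:
    """After a restart, skip entries whose flush was already recorded."""
    return [e for e in entries if not e.memory_flushed]
```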
)

OpenAI's newer models (GPT-5, Codex) give stronger instruction-following
weight to the 'developer' role vs 'system'. Swap the role at the API
boundary in _build_api_kwargs() for the chat_completions path so internal
message representation stays consistent ('system' everywhere).

Applies regardless of provider — OpenRouter, Nous portal, direct, etc.
The codex_responses path (direct OpenAI) uses 'instructions' instead of
message roles, so it's unaffected.

DEVELOPER_ROLE_MODELS constant in prompt_builder.py defines the matching
model name substrings: ('gpt-5', 'codex').
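
The role swap at the API boundary might be sketched as follows. `apply_developer_role` is a hypothetical name, and the case-insensitive match is an assumption; only the constant and its two substrings come from the commit message.

```python
DEVELOPER_ROLE_MODELS = ("gpt-5", "codex")


def apply_developer_role(model: str, messages: list[dict]) -> list[dict]:
    """Return messages with 'system' swapped to 'developer' for matching models.

    The input list is never mutated, so the internal representation
    keeps using 'system' everywhere.
    """
    if not any(sub in model.lower() for sub in DEVELOPER_ROLE_MODELS):
        return messages
    return [
        {**m, "role": "developer"} if m.get("role") == "system" else m
        for m in messages
    ]
```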
@HenkDz HenkDz force-pushed the feat/delegate-model-provider branch from efd598d to 070d12c Compare April 4, 2026 08:38
@HenkDz HenkDz closed this Apr 5, 2026
@HenkDz HenkDz deleted the feat/delegate-model-provider branch April 5, 2026 08:35
HenkDz commented Apr 5, 2026

Closed in favor of #5229 — same feature (per-call model/provider override on delegate_task), clean rebase on latest main without the model pool concept.

HenkDz commented Apr 5, 2026

Superseded by #5229 — same feature (delegation model pool + per-call model/provider override), clean rebase on latest main.

HenkDz added a commit to HenkDz/hermes-agent that referenced this pull request Apr 5, 2026
…verride

Add intelligent model routing for subagent delegation via a configurable
model pool and per-call overrides.

- delegation.pool: users define models with strengths, injected into
  delegate_task schema so the LLM picks the best model per task
- model param: per-call or per-task (batch mode) model override
- provider param: per-call provider override with full credential resolution
- Pool validation: if a model is set but not in pool, falls back with warning
- Dynamic schema: _build_delegate_schema() reads pool from config at load time

Resolution order (highest to lowest):
  1. Per-call model param (top-level, applies to all tasks)
  2. Per-task model field (batch mode only)
  3. Pool validation — if model set but not in pool, falls back with warning
  4. delegation.model from config (fallback default)
  5. Parent agent's model (inherited)
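
One way to read the resolution order above is the sketch below. `resolve_model` and its parameter names are hypothetical, and the exact fallback target after a failed pool check is an assumption: here it falls through to `delegation.model` and then to the parent's model.

```python
import logging
from typing import Optional

logger = logging.getLogger(__name__)


def resolve_model(
    call_model: Optional[str],
    task_model: Optional[str],
    pool_models: list[str],
    config_model: Optional[str],
    parent_model: Optional[str],
) -> Optional[str]:
    """Pick the subagent model following the five-step order."""
    # Steps 1-2: the per-call param wins over the per-task field.
    requested = call_model or task_model
    # Step 3: with a pool configured, an unknown model falls back with a warning.
    if requested and pool_models and requested not in pool_models:
        logger.warning("Model %r is not in delegation.pool; falling back", requested)
        requested = None
    # Steps 4-5: config default, then the parent agent's model.
    return requested or config_model or parent_model
```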

Supersedes NousResearch#3794 (clean rebase on latest main).


6 participants