feat(tools): add delegation model pool with per-call model/provider override#3794
Closed
HenkDz wants to merge 19 commits into
Closed
feat(tools): add delegation model pool with per-call model/provider override#3794HenkDz wants to merge 19 commits into
HenkDz wants to merge 19 commits into
Conversation
929dc6a to
765007b
Compare
4c96fdd to
8b4c473
Compare
- tests/gateway/test_hooks.py: filter out (builtin) hooks from count assertions in TestDiscoverAndLoad. The boot-md builtin hook was added but tests still expected 0/1/2 hooks instead of 1/2/3. - tests/test_plugins_cmd.py: test_none_falls_through_to_list was mocking cmd_list but the actual dispatch calls cmd_toggle (changed in PR NousResearch#3747 enable/disable feature).
…(salvage NousResearch#4400) (NousResearch#4419) Adds two Camofox features: 1. Persistent browser sessions: new `browser.camofox.managed_persistence` config option. When enabled, Hermes sends a deterministic profile-scoped userId to Camofox so the server maps it to a persistent browser profile directory. Cookies, logins, and browser state survive across restarts. Default remains ephemeral (random userId per session). 2. VNC URL discovery: Camofox /health endpoint returns vncPort when running in headed mode. Hermes constructs the VNC URL and includes it in navigate responses so the agent can share it with users. Also fixes camofox_vision bug where call_llm response object was passed directly to json.dumps instead of extracting .choices[0].message.content. Changes from original PR: - Removed browser_evaluate tool (separate feature, needs own PR) - Removed snapshot truncation limit change (unrelated) - Config.yaml only for managed_persistence (no env var, no version bump) - Rewrote tests to use config mock instead of env var - Reverted package-lock.json churn Co-authored-by: analista <psikonetik@gmail.com.com>
…ousResearch#4419) (NousResearch#4440) PR NousResearch#4419 was based on pre-credential-pools main where _config_version was 10. The squash merge downgraded it from 11 (set by NousResearch#2647) back to 10. Also fixes the test assertion.
By default 'hermes gateway run' now prints WARNING+ to stderr so connection errors and startup failures are visible in the terminal without having to tail ~/.hermes/logs/gateway.log. - gateway/run.py: start_gateway() accepts verbosity: Optional[int]=0. When not None, attaches a StreamHandler to stderr with level mapped from the count (0=WARNING, 1=INFO, 2+=DEBUG). Root logger level is also lowered when DEBUG is requested so records are not swallowed. - hermes_cli/gateway.py: run_gateway() gains verbose: int and quiet: bool params. -q translates to verbosity=None (no stderr handler). Wired through gateway_command(). - hermes_cli/main.py: -v changed from store_true to action=count so -v/-vv/-vvv each increment the level. -q/--quiet added as a new flag. Behaviour summary: hermes gateway run -> WARNING+ on stderr (default) hermes gateway run -q -> silent hermes gateway run -v -> INFO+ hermes gateway run -vv -> DEBUG
…mock - stderr handler now uses RedactingFormatter to match file handlers - restart path uses verbose=0 (int) instead of verbose=False (bool) - test mock updated with new run_gateway(verbose, quiet, replace) signature
The original PR excluded auth.json from _DEFAULT_EXPORT_EXCLUDE_ROOT and filtered both auth.json and .env from named profile exports, but missed adding .env to the default profile exclusion set. Default exports would still leak .env containing API keys. Added .env to _DEFAULT_EXPORT_EXCLUDE_ROOT, added test coverage, and updated the existing test that incorrectly asserted .env presence.
…inuity Allow callers to pass X-Hermes-Session-Id in request headers to continue an existing conversation. When provided, history is loaded from SessionDB instead of the request body, and the session_id is echoed in the response header. Without the header, existing behavior is preserved (new uuid per request). This enables web UI clients to maintain thread continuity without modifying any session state themselves — the same mechanism the gateway uses for IM platforms (Telegram, Discord, etc.).
Reuse a single SessionDB across requests by caching on self._session_db with lazy initialization. Avoids creating a new SQLite connection per request when X-Hermes-Session-Id is used. Updated tests to set adapter._session_db directly instead of patching the constructor.
…M calls Three exfiltration vectors closed: 1. Browser URL exfil — agent could embed secrets in URL params and navigate to attacker-controlled server. Now scans URLs for known API key patterns before navigating (browser_navigate, web_extract). 2. Browser snapshot leak — page displaying env vars or API keys would send secrets to auxiliary LLM via _extract_relevant_content before run_agent.py's redaction layer sees the result. Now redacts snapshot text before the auxiliary call. 3. Camofox annotation leak — accessibility tree text sent to vision LLM could contain secrets visible on screen. Now redacts annotation context before the vision call. 10 new tests covering URL blocking, snapshot redaction, and annotation redaction for both browser and camofox backends.
LLM responses from browser snapshot extraction and vision analysis could echo back secrets that appeared on screen or in page content. Input redaction alone is insufficient — the LLM may reproduce secrets it read from screenshots (which cannot be text-redacted). Now redact outputs from: - _extract_relevant_content (auxiliary LLM response) - browser_vision (vision LLM response) - camofox_vision (vision LLM response)
The original test file had mock secrets corrupted by secret-redaction tooling before commit — the test values (sk-ant...l012) didn't actually trigger the PREFIX_RE regex, so 4 of 10 tests were asserting against values that never appeared in the input. - Replace truncated mock values with proper fake keys built via string concatenation (avoids tool redaction during file writes) - Add _ensure_redaction_enabled autouse fixture to patch the module-level _REDACT_ENABLED constant, matching the pattern from test_redact.py
…hes on restart (NousResearch#4481) * fix: force-close TCP sockets on client cleanup, detect and recover dead connections When a provider drops connections mid-stream (e.g. OpenRouter outage), httpx's graceful close leaves sockets in CLOSE-WAIT indefinitely. These zombie connections accumulate and can prevent recovery without restarting. Changes: - _force_close_tcp_sockets: walks the httpx connection pool and issues socket.shutdown(SHUT_RDWR) + close() to force TCP RST on every socket when a client is closed, preventing CLOSE-WAIT accumulation - _cleanup_dead_connections: probes the primary client's pool for dead sockets (recv MSG_PEEK), rebuilds the client if any are found - Pre-turn health check at the start of each run_conversation call that auto-recovers with a user-facing status message - Primary client rebuild after stale stream detection to purge pool - User-facing messages on streaming connection failures: "Connection to provider dropped — Reconnecting (attempt 2/3)" "Connection failed after 3 attempts — try again in a moment" Made-with: Cursor * fix: pool entry missing base_url for openrouter, clean error messages - _resolve_runtime_from_pool_entry: add OPENROUTER_BASE_URL fallback when pool entry has no runtime_base_url (pool entries from auth.json credential_pool often omit base_url) - Replace Rich console.print for auth errors with plain print() to prevent ANSI escape code mangling through prompt_toolkit's stdout patch - Force-close TCP sockets on client cleanup to prevent CLOSE-WAIT accumulation after provider outages - Pre-turn dead connection detection with auto-recovery and user message - Primary client rebuild after stale stream detection - User-facing status messages on streaming connection failures/retries Made-with: Cursor * fix(gateway): persist memory flush state to prevent redundant re-flushes on restart The _session_expiry_watcher tracked flushed sessions in an in-memory set (_pre_flushed_sessions) that was lost on gateway restart. Expired sessions remained in sessions.json and were re-discovered every restart, causing redundant AIAgent runs that burned API credits and blocked the event loop. Fix: Add a memory_flushed boolean field to SessionEntry, persisted in sessions.json. The watcher sets it after a successful flush. On restart, the flag survives and the watcher skips already-flushed sessions. - Add memory_flushed field to SessionEntry with to_dict/from_dict support - Old sessions.json entries without the field default to False (backward compat) - Remove the ephemeral _pre_flushed_sessions set from SessionStore - Update tests: save/load roundtrip, legacy entry compat, auto-reset behavior
) OpenAI's newer models (GPT-5, Codex) give stronger instruction-following weight to the 'developer' role vs 'system'. Swap the role at the API boundary in _build_api_kwargs() for the chat_completions path so internal message representation stays consistent ('system' everywhere). Applies regardless of provider — OpenRouter, Nous portal, direct, etc. The codex_responses path (direct OpenAI) uses 'instructions' instead of message roles, so it's unaffected. DEVELOPER_ROLE_MODELS constant in prompt_builder.py defines the matching model name substrings: ('gpt-5', 'codex').
efd598d to
070d12c
Compare
Contributor
Author
|
Closed in favor of #5229 — same feature (per-call model/provider override on delegate_task), clean rebase on latest main without the model pool concept. |
Contributor
Author
|
Superseded by #5229 — same feature (delegation model pool + per-call model/provider override), clean rebase on latest main. |
HenkDz
added a commit
to HenkDz/hermes-agent
that referenced
this pull request
Apr 5, 2026
…verride Add intelligent model routing for subagent delegation via a configurable model pool and per-call overrides. - delegation.pool: users define models with strengths, injected into delegate_task schema so the LLM picks the best model per task - model param: per-call or per-task (batch mode) model override - provider param: per-call provider override with full credential resolution - Pool validation: if a model is set but not in pool, falls back with warning - Dynamic schema: _build_delegate_schema() reads pool from config at load time Resolution order (highest to lowest): 1. Per-call model param (top-level, applies to all tasks) 2. Per-task model field (batch mode only) 3. Pool validation — if model set but not in pool, falls back with warning 4. delegation.model from config (fallback default) 5. Parent agent's model (inherited) Supersedes NousResearch#3794 (clean rebase on latest main).
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Add intelligent model routing for subagent delegation via a configurable model pool and per-call overrides.
Problem: Users with access to multiple models (e.g., via OpenRouter or Z.AI) had no way to control which model a subagent uses. The LLM always inherited the parent model, or required brittle manual overrides.
Solution:
Delegation model pool (
delegation.pool): Users define a list of models with strengths. The pool is injected into thedelegate_tasktool description, so the LLM reads model capabilities and picks the best fit automatically. Models outside the pool are rejected (cost control).Per-call model/provider override: The LLM can also explicitly set
modelandproviderparams ondelegate_taskfor one-off overrides.Subagent spawn logging: Logs which model/provider each subagent runs on for visibility.
Changes
tools/delegate_tool.pyhermes_cli/config.pypoolto DEFAULT_CONFIG delegation sectioncli-config.yaml.examplewebsite/docs/user-guide/configuration.mdConfig example
Backward compatibility
Tests
test_child_inherits_runtime_credentials— reads live user config, fails locally but passes on CI)ast.parse