doctor: surface active HERMES_TOOLS_SUBSET at boot (#75/#87 follow-up) #96
Merged
Operators who narrow the tool surface via HERMES_TOOLS_SUBSET can now confirm at ``hermes doctor`` time exactly which tools the filter parsed to. Catches two failure modes that previously required a separate ``hermes mcp list`` diff:

1. Operator typoed a tool name → still in the parsed list (no cross-check), but the diff against ``hermes mcp list`` is now trivial.
2. Operator forgot the ``mcp_<server>_<tool>`` prefix for MCP tools → no entry uses ``mcp_`` prefix but entries look structured → info reminder fires.

Silent when env var is unset/empty (silent-when-irrelevant pattern from the #88/#53/#54 doctor probes). When set, surfaces:

* check_ok with the count + a sample of names (first 6, then ``+N more`` suffix to keep the row readable);
* check_info reminder when zero entries use the mcp_ prefix but some look structured (the most common parse-correctly-but-filter-nothing failure mode).

Cross-check against the live MCP registry was considered + rejected for this PR: it would require spinning up ``create_mcp_server()`` at probe time. Operators can ``hermes mcp list`` separately if they want the full diff. Filing as opportunistic follow-up if demand shows up.

## Tests

- 8 new tests in tests/hermes_cli/test_doctor_tools_subset_probe.py: silent-when-unset / silent-when-empty / silent-when-whitespace / count-and-sample-shown / long-list-truncated / mcp-prefix-reminder / no-reminder-when-mcp-present / no-reminder-when-only-simple-bare-names.
- 30 total green across affected suites (probe + provider-env-probe + mcp_subset_filter). No regression.
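The probe behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the function names `parse_tools_subset` and `describe_tools_subset`, the message wording, and the comma-separated format (inferred from the `silo_query,confer_run` example in the test plan) are all assumptions.

```python
import os

SAMPLE_SIZE = 6  # per the PR text: show the first 6 names, then "+N more"


def parse_tools_subset(raw):
    """Split a comma-separated value into stripped, non-empty names.

    Assumption: the env var is comma-separated, as in the test-plan example.
    """
    if not raw:
        return []
    return [name.strip() for name in raw.split(",") if name.strip()]


def describe_tools_subset(raw):
    """Return (ok_message, info_message); both are None when the probe stays silent."""
    names = parse_tools_subset(raw)
    if not names:
        # Unset, empty, or whitespace-only: silent-when-irrelevant.
        return None, None

    sample = ", ".join(names[:SAMPLE_SIZE])
    extra = len(names) - SAMPLE_SIZE
    if extra > 0:
        sample += f" +{extra} more"
    ok = f"HERMES_TOOLS_SUBSET active: {len(names)} tool(s): {sample}"

    # Reminder only when nothing is mcp_-prefixed but something has an underscore.
    has_mcp = any(name.startswith("mcp_") for name in names)
    looks_structured = any("_" in name for name in names)
    info = None
    if not has_mcp and looks_structured:
        info = "No entry uses the mcp_<server>_<tool> prefix; MCP tools need it."
    return ok, info


if __name__ == "__main__":
    ok_msg, info_msg = describe_tools_subset(os.environ.get("HERMES_TOOLS_SUBSET"))
    for message in (ok_msg, info_msg):
        if message:
            print(message)
```

With `HERMES_TOOLS_SUBSET=silo_query,confer_run` this sketch yields both a count row and the prefix reminder, which matches the case listed in the PR's test plan.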
PowerCreek added a commit that referenced this pull request May 25, 2026
…vances #89 Direction A) (#98)

Operator-supplied intent override (Option A3 from #97). When ``HERMES_INTENT_OVERRIDE=code`` is set, the system prompt's ``stable`` layer narrows for tool-call-heavy traffic. This addresses #89's prompt-saturation symptom on mid-tier coding models.

## What narrows under code intent

| Block | Action | Why |
|---|---|---|
| SOUL.md | Skip | Largest single contributor; falls back to short DEFAULT_AGENT_IDENTITY floor |
| HERMES_AGENT_HELP_GUIDANCE | Skip | Off-topic for tool-call traffic |
| SKILLS_GUIDANCE | Skip | Per-tool block, off-topic for code |
| KANBAN_GUIDANCE | Skip | Worker-lifecycle, off-topic for code |
| SESSION_SEARCH_GUIDANCE | Skip | Off-topic for code |
| skills_prompt (the big one) | Skip | Biggest contributor when many skills loaded |
| MEMORY_GUIDANCE | **Keep** | Small + sometimes useful even for code |
| TOOL_USE_ENFORCEMENT_GUIDANCE | **Keep** | Critical for tool emission |
| Per-model operational guidance | **Keep** | Model-quality-specific |
| Env / platform hints | **Keep** | Execution-environment essentials |
| nous-subscription + computer-use + alibaba | **Keep** | Operational invariants |
| ``context`` + ``volatile`` layers | **Untouched** | Out of scope per #97 |

Other intents (``confer`` / ``planning`` / ``exploration`` / ``refinement`` / ``generic``) are recognized as valid but pass through without narrowing in v1 (keeps the door open for per-intent shape later).

## Intent vocabulary

Matches devagentic#240's ``intent_classifier`` 6-key enum exactly, so the same operator-side classifier that's wired into devagentic's R5 dispatch hook can also drive hermes-side prompt narrowing without a second vocabulary.

## Doctor probe

New ``_check_intent_override_env`` probe surfaces the active override at ``hermes doctor`` time: silent when unset, check_ok when valid (with a narrowing-active note for ``code``), check_warn with the full valid-keys list when typo'd. Mirrors the silent-when-irrelevant pattern from PR #95 / #96.

## Tests

- 22 new prompt-narrowing tests in ``tests/agent/test_system_prompt_intent_override.py``: resolver enum + normalization (5), per-section drops under code (7), pass-through for non-code intents (5), typo falls back (1), byte-count regression (1), default-still-includes counter-case (1), case-insensitive (1), runtime-vs-doctor-config sanity (1).
- 6 new doctor-probe tests in ``tests/hermes_cli/test_doctor_intent_override_probe.py``: silent-when-unset / silent-when-empty / code-ok-with-narrowing-note / non-code-valid-pass-through / typo-warn-with-valid-sample / case-insensitive.
- 258 total green across affected suites (system-prompt + prompt-builder + restore + doctor + provider-env + tools-subset). No regression in the existing prompt-shape pins.

## Composition note

Option A1 (port classifier) + A2 (devagentic GraphQL surface) are deferred per the #97 sequencing. A3 unblocks deployment-specific narrowing immediately; A1/A2 only matter when dynamic per-turn classification is needed on the hermes side. The classifier output on the devagentic side (NousResearch#240) drives R5 dispatch decisions there.
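The resolver and narrowing rules from this commit message might look roughly like the sketch below. It is only an illustration under stated assumptions: `resolve_intent_override` and `stable_blocks_for` are hypothetical names, whitespace stripping is assumed as part of "normalization", and the SOUL.md fallback to DEFAULT_AGENT_IDENTITY is not modeled.

```python
# The six intent keys named in the commit message.
VALID_INTENTS = ("code", "confer", "planning", "exploration", "refinement", "generic")

# Blocks the table above marks as "Skip" under code intent.
CODE_SKIPPED_BLOCKS = frozenset({
    "SOUL.md",
    "HERMES_AGENT_HELP_GUIDANCE",
    "SKILLS_GUIDANCE",
    "KANBAN_GUIDANCE",
    "SESSION_SEARCH_GUIDANCE",
    "skills_prompt",
})


def resolve_intent_override(raw):
    """Return a normalized intent key, or None when unset, empty, or a typo."""
    if raw is None:
        return None
    value = raw.strip().lower()  # case-insensitive; stripping is an assumption
    return value if value in VALID_INTENTS else None


def stable_blocks_for(intent, blocks):
    """Drop the skipped blocks under "code"; every other intent passes through."""
    if intent != "code":
        return list(blocks)
    return [block for block in blocks if block not in CODE_SKIPPED_BLOCKS]


if __name__ == "__main__":
    blocks = ["SOUL.md", "MEMORY_GUIDANCE", "skills_prompt", "TOOL_USE_ENFORCEMENT_GUIDANCE"]
    print(stable_blocks_for(resolve_intent_override("Code"), blocks))
```

A typo such as `cdoe` resolves to None here, so the prompt keeps its default shape, in line with the "typo falls back" test case mentioned above.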
PowerCreek added a commit that referenced this pull request May 27, 2026
#115) (#116)

Companion to devagentic#315 (initiative preamble). When operator sets ``HERMES_TOOL_USE_ENFORCEMENT=required``, the chat_completions transport injects ``tool_choice: "required"`` on every dispatch where tools are attached. This is the model-layer enforcement that closes the gap devagentic#315's soft-signal preamble leaves open.

## Behavior

- Unset / empty / unknown value → default behavior unchanged (no ``tool_choice`` injected by hermes)
- ``HERMES_TOOL_USE_ENFORCEMENT=required`` + tools attached → ``tool_choice: "required"`` set on the API kwargs
- Tools NOT attached → no injection (sending ``tool_choice=required`` with empty tools is a 400 on most providers)
- Caller-supplied ``tool_choice`` already on kwargs → no override (the dispatcher-tier signal wins; env is a session-tier default)

Per devagentic#203 §1.3, hermes owns model-call-shape decisions (per-call enforcement). Devagentic's models.json ``default_tool_choice`` is the dispatcher-tier default; this env is the session-tier override.

## Where it fires

Both build_kwargs paths in ``chat_completions.py``:

- Legacy fallback path (unregistered providers)
- Provider-profile path (known providers via providers/ registry)

Shared helper ``_maybe_inject_required_tool_choice(api_kwargs, tools)`` keeps the two sites in sync.

## Doctor probe

New ``_check_tool_use_enforcement_env`` surfaces the active setting: silent when unset, ``check_ok`` on ``required``, ``check_warn`` with valid-values hint on typos. Mirrors the silent-when-irrelevant pattern from #95 / #96 / persona-deferred.

## Tests

- 18 new tests in tests/agent/test_tool_use_enforcement.py: resolver returns None/required/case-insensitive/unknown (8 parametrized), injection happy path (1), no-inject-when-unset (1), no-inject-when-no-tools (1 covering both None and empty list), does-not-clobber-existing-tool_choice (1), no-inject-on-unknown (1), doctor silent-when-unset (1), doctor check_ok on required (1), doctor check_warn on unknown (1).
- 128 total green across affected suites (new + doctor + provider/intent/persona/tools-subset probes). No regression.

## Sequencing per #115 body

The issue says "Land after devagentic#315 Phase 1 has deployed + been observed. If the preamble alone closes the reliability gap to operator satisfaction, this issue may not need to ship."

This PR ships the env-knob in opt-in OFF-by-default mode, so:

- Operators can enable it the moment they observe NousResearch#315's preamble is insufficient (no further hermes-side dev cycle needed)
- Default behavior unchanged → zero risk to non-client-tier sessions
- Doctor probe surfaces the active state so operators can confirm enablement at boot

Saves the round-trip of waiting + then dev'ing once the signal arrives.
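The four injection rules in the Behavior list can be sketched as below. This is a stand-in, not the shared helper itself: the real ``_maybe_inject_required_tool_choice(api_kwargs, tools)`` takes two arguments, whereas this hypothetical `maybe_inject_required_tool_choice` also accepts the env value explicitly so the sketch runs without touching the environment. In-place mutation and whitespace stripping are assumptions as well.

```python
def resolve_tool_use_enforcement(raw):
    """Return "required" for a case-insensitive match, else None."""
    if not raw:
        return None
    value = raw.strip().lower()  # stripping whitespace is an assumption
    return value if value == "required" else None


def maybe_inject_required_tool_choice(api_kwargs, tools, env_value):
    """Set tool_choice="required" only when every condition from the PR holds.

    Hypothetical stand-in for the shared helper; mutates and returns api_kwargs.
    """
    if resolve_tool_use_enforcement(env_value) != "required":
        return api_kwargs  # unset / empty / unknown: default behavior unchanged
    if not tools:
        return api_kwargs  # None or empty list: injecting would risk a 400
    if "tool_choice" in api_kwargs:
        return api_kwargs  # caller-supplied value wins over the env default
    api_kwargs["tool_choice"] = "required"
    return api_kwargs


if __name__ == "__main__":
    tools = [{"type": "function", "function": {"name": "read_file"}}]
    print(maybe_inject_required_tool_choice({"model": "example"}, tools, "required"))
```

Checking the caller-supplied key last keeps the precedence described above: the dispatcher-tier signal is never overwritten by the session-tier env default.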
PowerCreek added a commit that referenced this pull request May 27, 2026
After v0.18.4's tool_call recovery (#124) landed, the next-level bug surfaced in sandbox field-test: model calls a tool name hermes' worker didn't register, the invalid_tool_call retry path fires, but its verbose-only print is invisible in default runs. Combined with model hallucination ("the file has been created..." narration on the NEXT turn), the mismatch becomes invisible: operators see model narration, not the underlying tool-name mismatch.

## Fix

Upgrade conversation_loop.py:3219's verbose-only print to:

1. ``logger.warning`` with the invented name + count + first 10 registered names + model + provider for cross-system log correlation
2. ``agent._emit_status`` surfacing the mismatch in the user-facing stream

Operator immediately sees:

- WHICH name the model invented
- HOW MANY tools the worker has registered
- WHICH tools (sample) ARE registered
- Across which retry of 3

No behavior change: existing invalid_tool_call retry semantics unchanged. Pure observability boost.

## Tests

- 3 new source-level tests in tests/agent/test_loud_invalid_tool_call.py: patch-landed, emit_status template includes name + count, warning includes model + provider for correlation.
- 20 total green across affected suites, no regression.

## Composition

Same observability family as the #95 / #96 doctor probes. Helps operators distinguish "hermes ate the tool_call" from "sandbox toolset doesn't expose what the model is calling".
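As a rough sketch of the two-channel report described in the Fix section, the snippet below builds one status string and sends it to both a logger and a status callback. The function names, message wording, and the `emit_status` callable parameter are hypothetical; the actual change lives in conversation_loop.py and uses ``agent._emit_status``.

```python
import logging

logger = logging.getLogger("hermes.sketch")  # hypothetical logger name

SAMPLE_SIZE = 10  # per the commit message: first 10 registered names


def format_invalid_tool_call(invented, registered, attempt, max_attempts=3):
    """Build the status line: invented name, registered count, sample, retry index."""
    sample = ", ".join(registered[:SAMPLE_SIZE])
    return (
        f"Model called unregistered tool '{invented}' "
        f"(attempt {attempt}/{max_attempts}); "
        f"{len(registered)} registered, e.g. {sample}"
    )


def report_invalid_tool_call(invented, registered, attempt, model, provider, emit_status):
    """Log a warning with model + provider, then surface the same text to the user."""
    status = format_invalid_tool_call(invented, registered, attempt)
    logger.warning("%s (model=%s provider=%s)", status, model, provider)
    emit_status(status)
    return status


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    report_invalid_tool_call(
        "write_fil", ["read_file", "write_file"], 1, "example-model", "example-provider", print
    )
```

Keeping the model and provider in the log line only, and not in the user-facing status, is a choice made for this sketch; the commit message says only that the warning carries them for log correlation.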
## Summary

Companion to PR #95 (HERMES_DEFAULT_PROVIDER probe). Operators who narrow the tool surface via ``HERMES_TOOLS_SUBSET`` (#75/#87) now see at ``hermes doctor`` time exactly which tools the filter parsed to. This catches two failure modes that previously required a separate ``hermes mcp list`` diff.

## Behavior

- ``check_ok`` with the count + a sample of names (first 6, ``+N more`` suffix to keep the row readable). Operator confirms the filter parsed as expected.
- ``check_info`` reminder when zero entries use the ``mcp_`` prefix but some entries look structured (have an underscore). Most common failure mode: operator forgot the ``mcp_<server>_<tool>`` prefix for MCP tools, so the filter silently matches nothing.

## Out of scope (for this PR)

Cross-checking the parsed names against the live MCP tool registry would catch typos directly, but requires spinning up ``create_mcp_server()`` at probe time. Operators who want the cross-check can run ``hermes mcp list`` separately. Filing as opportunistic follow-up if demand surfaces.

## Test plan

- ``HERMES_TOOLS_SUBSET=silo_query,confer_run hermes doctor`` shows row + reminder