doctor: surface active HERMES_TOOLS_SUBSET at boot (#75/#87 follow-up) by PowerCreek · Pull Request #96 · TechDevGroup/hermes-agent

PowerCreek · 2026-05-25T04:25:32Z

Summary

Companion to PR #95 (HERMES_DEFAULT_PROVIDER probe). Operators who narrow the tool surface via HERMES_TOOLS_SUBSET (#75/#87) can now see at hermes doctor time exactly which tool names the filter parsed. This catches two failure modes that previously required a separate diff against hermes mcp list.

Behavior

Silent when the env var is unset, empty, or whitespace-only (silent-when-irrelevant pattern).
check_ok with the entry count plus a sample of names (the first 6, followed by a +N more suffix to keep the row readable), so the operator can confirm the filter parsed as expected.
check_info reminder when no entry uses the mcp_ prefix but some entries look structured (contain an underscore). This targets the most common failure mode: the operator forgot the mcp_<server>_<tool> prefix for MCP tools, and the filter silently filters nothing.
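The probe behavior described above can be sketched in a few lines of Python. This is a minimal sketch, not the code from this PR: the function names `parse_tools_subset` and `probe_tools_subset`, the comma-separated format (inferred from the `silo_query,confer_run` example in the test plan), the `(level, message)` rows standing in for hermes' check_ok/check_info helpers, and the message wording are all assumptions.

```python
from typing import Optional

SAMPLE_SIZE = 6  # show the first 6 names, then a "+N more" suffix


def parse_tools_subset(raw: Optional[str]) -> list[str]:
    """Split a comma-separated HERMES_TOOLS_SUBSET value into trimmed, non-empty names."""
    if not raw or not raw.strip():
        return []
    return [name.strip() for name in raw.split(",") if name.strip()]


def probe_tools_subset(raw: Optional[str]) -> list[tuple[str, str]]:
    """Return (level, message) rows; an empty list means the probe stays silent."""
    names = parse_tools_subset(raw)
    if not names:
        return []  # silent-when-irrelevant: unset, empty, or whitespace-only

    sample = ", ".join(names[:SAMPLE_SIZE])
    extra = len(names) - SAMPLE_SIZE
    if extra > 0:
        sample += f", +{extra} more"
    rows = [("ok", f"HERMES_TOOLS_SUBSET active: {len(names)} tools ({sample})")]

    # Reminder heuristic: entries look structured (underscore) yet none is mcp_-prefixed.
    has_mcp_prefix = any(name.startswith("mcp_") for name in names)
    looks_structured = any("_" in name for name in names)
    if looks_structured and not has_mcp_prefix:
        rows.append((
            "info",
            "No entry uses the mcp_ prefix; MCP tools must be listed as mcp_<server>_<tool>",
        ))
    return rows
```

Calling `probe_tools_subset(os.environ.get("HERMES_TOOLS_SUBSET"))` with the test-plan value `silo_query,confer_run` yields an ok row plus the mcp_ reminder, since both entries contain an underscore and neither starts with `mcp_`; bare names such as `terminal,read` produce only the ok row.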

Out of scope (for this PR)

Cross-checking the parsed names against the live MCP tool registry would catch typos directly, but it requires spinning up create_mcp_server() at probe time. Operators who want the cross-check can run hermes mcp list separately. Filing as an opportunistic follow-up if demand surfaces.
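For operators doing the manual cross-check, the diff reduces to a set difference. The sketch below is hypothetical: `unknown_subset_entries` is not a hermes function, and `registered` is assumed to be a list of tool names the operator has collected (for example, transcribed from `hermes mcp list`, whose output format is not specified here). It uses the standard library's `difflib.get_close_matches` to suggest near misses, which can also surface a forgotten `mcp_<server>_` prefix.

```python
import difflib


def unknown_subset_entries(subset: list[str], registered: list[str]) -> dict[str, list[str]]:
    """Map each subset entry missing from the registry to its closest registered names."""
    known = set(registered)
    return {
        # get_close_matches returns up to n candidates with similarity >= cutoff, best first.
        name: difflib.get_close_matches(name, registered, n=3, cutoff=0.6)
        for name in subset
        if name not in known
    }
```

With `subset=["silo_query", "terminal"]` and `registered=["mcp_silo_query", "terminal"]`, only `silo_query` is reported, and its suggestion is the prefixed `mcp_silo_query`.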

Test plan

8 new tests pass (silent-when-unset/empty/whitespace, count+sample, long-list-truncated, mcp-prefix-reminder fires, no reminder when the prefix is present, no reminder on bare simple names)
30 total green across affected suites; no regression
After merge, HERMES_TOOLS_SUBSET=silo_query,confer_run hermes doctor shows the row plus the reminder

Operators who narrow the tool surface via HERMES_TOOLS_SUBSET can now confirm at ``hermes doctor`` time exactly which tools the filter parsed to. This catches two failure modes that previously required a separate ``hermes mcp list`` diff:

1. Operator typoed a tool name → still in the parsed list (no cross-check), but the diff against ``hermes mcp list`` is now trivial.
2. Operator forgot the ``mcp_<server>_<tool>`` prefix for MCP tools → no entry uses the ``mcp_`` prefix but entries look structured → info reminder fires.

Silent when the env var is unset/empty (silent-when-irrelevant pattern from the #88/#53/#54 doctor probes). When set, surfaces:

* check_ok with the count + a sample of names (first 6, then ``+N more`` suffix to keep the row readable);
* check_info reminder when zero entries use the mcp_ prefix but some look structured (the most common parse-correctly-but-filter-nothing failure mode).

Cross-check against the live MCP registry was considered and rejected for this PR: it would require spinning up ``create_mcp_server()`` at probe time. Operators can run ``hermes mcp list`` separately if they want the full diff. Filing as an opportunistic follow-up if demand shows up.

## Tests

- 8 new tests in tests/hermes_cli/test_doctor_tools_subset_probe.py: silent-when-unset / silent-when-empty / silent-when-whitespace / count-and-sample-shown / long-list-truncated / mcp-prefix-reminder / no-reminder-when-mcp-present / no-reminder-when-only-simple-bare-names.
- 30 total green across affected suites (probe + provider-env-probe + mcp_subset_filter). No regression.

…vances #89 Direction A) (#98)

Operator-supplied intent override (Option A3 from #97). When ``HERMES_INTENT_OVERRIDE=code`` is set, the system prompt's ``stable`` layer narrows for tool-call-heavy traffic. This addresses #89's prompt-saturation symptom on mid-tier coding models.

## What narrows under code intent

| Block | Action | Why |
|---|---|---|
| SOUL.md | Skip | Largest single contributor; falls back to short DEFAULT_AGENT_IDENTITY floor |
| HERMES_AGENT_HELP_GUIDANCE | Skip | Off-topic for tool-call traffic |
| SKILLS_GUIDANCE | Skip | Per-tool block, off-topic for code |
| KANBAN_GUIDANCE | Skip | Worker-lifecycle, off-topic for code |
| SESSION_SEARCH_GUIDANCE | Skip | Off-topic for code |
| skills_prompt (the big one) | Skip | Biggest contributor when many skills loaded |
| MEMORY_GUIDANCE | **Keep** | Small + sometimes useful even for code |
| TOOL_USE_ENFORCEMENT_GUIDANCE | **Keep** | Critical for tool emission |
| Per-model operational guidance | **Keep** | Model-quality-specific |
| Env / platform hints | **Keep** | Execution-environment essentials |
| nous-subscription + computer-use + alibaba | **Keep** | Operational invariants |
| ``context`` + ``volatile`` layers | **Untouched** | Out of scope per #97 |

Other intents (``confer`` / ``planning`` / ``exploration`` / ``refinement`` / ``generic``) are recognized as valid but pass through without narrowing in v1 (keeps the door open for per-intent shape later).

## Intent vocabulary

Matches devagentic#240's ``intent_classifier`` 6-key enum exactly, so the same operator-side classifier that's wired into devagentic's R5 dispatch hook can also drive hermes-side prompt narrowing without a second vocabulary.

## Doctor probe

New ``_check_intent_override_env`` probe surfaces the active override at ``hermes doctor`` time: silent when unset, check_ok when valid (with a narrowing-active note for ``code``), check_warn with the full valid-keys list when typo'd.
Mirrors the silent-when-irrelevant pattern from PR #95 / #96.

## Tests

- 22 new prompt-narrowing tests in ``tests/agent/test_system_prompt_intent_override.py``: resolver enum + normalization (5), per-section drops under code (7), pass-through for non-code intents (5), typo falls back (1), byte-count regression (1), default-still-includes counter-case (1), case-insensitive (1), runtime-vs-doctor-config sanity (1).
- 6 new doctor-probe tests in ``tests/hermes_cli/test_doctor_intent_override_probe.py``: silent-when-unset / silent-when-empty / code-ok-with-narrowing-note / non-code-valid-pass-through / typo-warn-with-valid-sample / case-insensitive.
- 258 total green across affected suites (system-prompt + prompt-builder + restore + doctor + provider-env + tools-subset). No regression in the existing prompt-shape pins.

## Composition note

Option A1 (port classifier) + A2 (devagentic GraphQL surface) are deferred per the #97 sequencing: A3 unblocks deployment-specific narrowing immediately, while A1/A2 only matter when dynamic per-turn classification is needed on the hermes side. The classifier output on the devagentic side (devagentic#240) drives R5 dispatch decisions there.
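The resolver, the code-intent narrowing, and the doctor classification described in this commit can be sketched together. Everything here is hypothetical: the function names, the representation of the stable layer as an ordered name-to-text dict, and the message strings are assumptions, and the SOUL.md fallback to DEFAULT_AGENT_IDENTITY is not modeled. Only the six intent keys and the skip list come from the commit message.

```python
from typing import Optional

# The 6-key intent vocabulary shared with devagentic's intent_classifier.
VALID_INTENTS = frozenset({"code", "confer", "planning", "exploration", "refinement", "generic"})

# Stable-layer blocks marked "Skip" under code intent in the commit's table.
CODE_INTENT_SKIPS = frozenset({
    "SOUL.md",
    "HERMES_AGENT_HELP_GUIDANCE",
    "SKILLS_GUIDANCE",
    "KANBAN_GUIDANCE",
    "SESSION_SEARCH_GUIDANCE",
    "skills_prompt",
})


def resolve_intent_override(raw: Optional[str]) -> Optional[str]:
    """Normalize HERMES_INTENT_OVERRIDE; empty or unknown values fall back to None."""
    if raw is None:
        return None
    intent = raw.strip().lower()  # case-insensitive
    return intent if intent in VALID_INTENTS else None


def narrow_stable_layer(blocks: dict[str, str], intent: Optional[str]) -> dict[str, str]:
    """Drop code-irrelevant blocks; every other intent passes through unchanged in v1."""
    if intent != "code":
        return dict(blocks)
    return {name: text for name, text in blocks.items() if name not in CODE_INTENT_SKIPS}


def check_intent_override_env(raw: Optional[str]) -> Optional[tuple[str, str]]:
    """Doctor-style row: None (silent), ("ok", ...) for valid keys, ("warn", ...) for typos."""
    if raw is None or not raw.strip():
        return None
    intent = resolve_intent_override(raw)
    if intent is None:
        valid = ", ".join(sorted(VALID_INTENTS))
        return ("warn", f"Unknown HERMES_INTENT_OVERRIDE {raw.strip()!r}; valid keys: {valid}")
    note = " (stable-layer narrowing active)" if intent == "code" else ""
    return ("ok", f"HERMES_INTENT_OVERRIDE={intent}{note}")
```

Keeping the resolver separate from the probe lets the runtime prompt builder and the doctor check share one normalization path, which is the kind of agreement the commit's runtime-vs-doctor-config sanity test pins.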

#115) (#116)

Companion to devagentic#315 (initiative preamble). When the operator sets ``HERMES_TOOL_USE_ENFORCEMENT=required``, the chat_completions transport injects ``tool_choice: "required"`` on every dispatch where tools are attached. This is the model-layer enforcement that closes the gap devagentic#315's soft-signal preamble leaves open.

## Behavior

- Unset / empty / unknown value → default behavior unchanged (no ``tool_choice`` injected by hermes)
- ``HERMES_TOOL_USE_ENFORCEMENT=required`` + tools attached → ``tool_choice: "required"`` set on the API kwargs
- Tools NOT attached → no injection (sending ``tool_choice=required`` with empty tools is a 400 on most providers)
- Caller-supplied ``tool_choice`` already on kwargs → no override (the dispatcher-tier signal wins; env is a session-tier default)

Per devagentic#203 §1.3, hermes owns model-call-shape decisions (per-call enforcement). Devagentic's models.json ``default_tool_choice`` is the dispatcher-tier default; this env is the session-tier override.

## Where it fires

Both build_kwargs paths in ``chat_completions.py``:

- Legacy fallback path (unregistered providers)
- Provider-profile path (known providers via providers/ registry)

Shared helper ``_maybe_inject_required_tool_choice(api_kwargs, tools)`` keeps the two sites in sync.

## Doctor probe

New ``_check_tool_use_enforcement_env`` surfaces the active setting: silent when unset, ``check_ok`` on ``required``, ``check_warn`` with valid-values hint on typos. Mirrors the silent-when-irrelevant pattern from #95 / #96 / persona-deferred.

## Tests

- 18 new tests in tests/agent/test_tool_use_enforcement.py: resolver returns None/required/case-insensitive/unknown (8 parametrized), injection happy path (1), no-inject-when-unset (1), no-inject-when-no-tools (1 covering both None and empty list), does-not-clobber-existing-tool_choice (1), no-inject-on-unknown (1), doctor silent-when-unset (1), doctor check_ok on required (1), doctor check_warn on unknown (1).
- 128 total green across affected suites (new + doctor + provider/intent/persona/tools-subset probes). No regression.

## Sequencing per #115 body

The issue says "Land after devagentic#315 Phase 1 has deployed + been observed. If the preamble alone closes the reliability gap to operator satisfaction, this issue may not need to ship." This PR ships the env-knob in opt-in OFF-by-default mode, so:

- Operators can enable it the moment they observe devagentic#315's preamble is insufficient (no further hermes-side dev cycle needed)
- Default behavior unchanged → zero risk to non-client-tier sessions
- Doctor probe surfaces the active state so operators can confirm enablement at boot

Saves the round-trip of waiting + then dev'ing once the signal arrives.
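The four behavior rules map onto a short guard sequence. This sketch is modeled on the described `_maybe_inject_required_tool_choice(api_kwargs, tools)` helper but is not its code: the public name, the extra `env` parameter (added so the rules can be exercised without touching `os.environ`), and `resolve_tool_use_enforcement` are assumptions.

```python
import os
from typing import Any, Mapping, Optional

ENV_VAR = "HERMES_TOOL_USE_ENFORCEMENT"


def resolve_tool_use_enforcement(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return "required" if the env var says so (case-insensitive); otherwise None."""
    source = os.environ if env is None else env
    value = (source.get(ENV_VAR) or "").strip().lower()
    return "required" if value == "required" else None  # unset, empty, or unknown -> None


def maybe_inject_required_tool_choice(
    api_kwargs: dict[str, Any],
    tools: Optional[list[Any]],
    env: Optional[Mapping[str, str]] = None,
) -> None:
    """Mutate api_kwargs in place, following the four behavior rules."""
    if not tools:
        # None or []: tool_choice="required" with no tools is a 400 on most providers.
        return
    if "tool_choice" in api_kwargs:
        # A caller-supplied (dispatcher-tier) value wins over the session-tier env default.
        return
    if resolve_tool_use_enforcement(env) == "required":
        api_kwargs["tool_choice"] = "required"
```

Ordering the tools check first means an empty tool list short-circuits before the env is consulted, so a misconfigured env can never produce the 400-inducing combination.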

After v0.18.4's tool_call recovery (#124) landed, the next-level bug surfaced in a sandbox field-test: the model calls a tool name hermes' worker didn't register, the invalid_tool_call retry path fires, but its verbose-only print is invisible in default runs. Combined with model hallucination ("the file has been created..." narration on the NEXT turn), the mismatch becomes invisible: operators see model narration, not the underlying tool-name mismatch.

## Fix

Upgrade conversation_loop.py:3219's verbose-only print to:

1. ``logger.warning`` with the invented name + count + first 10 registered names + model + provider for cross-system log correlation
2. ``agent._emit_status`` surfacing the mismatch in the user-facing stream

Operator immediately sees:

- WHICH name the model invented
- HOW MANY tools the worker has registered
- WHICH tools (sample) ARE registered
- Across which retry of 3

No behavior change: existing invalid_tool_call retry semantics are unchanged. Pure observability boost.

## Tests

- 3 new source-level tests in tests/agent/test_loud_invalid_tool_call.py: patch-landed, emit_status template includes name + count, warning includes model + provider for correlation.
- 20 total green across affected suites; no regression.

## Composition

Same observability family as the #95 / #96 doctor probes. Helps operators distinguish "hermes ate the tool_call" from "sandbox toolset doesn't expose what the model is calling".
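The two upgraded outputs can be sketched as one formatting function plus a reporter. This is a hypothetical sketch, not the patch at conversation_loop.py:3219: the function names, the message wording, the logger name, and the `emit_status` callable (a stand-in for `agent._emit_status`, whose signature is not given here) are assumptions. The fields follow the commit message: invented name, registered count, first 10 registered names, model, provider, and retry number out of 3.

```python
import logging
from typing import Callable

logger = logging.getLogger("hermes.invalid_tool_call_sketch")  # hypothetical logger name

REGISTERED_SAMPLE = 10  # first 10 registered names, per the commit message
MAX_RETRIES = 3  # retries are reported as "N of 3"


def describe_invalid_tool_call(
    invented: str, registered: list[str], model: str, provider: str, attempt: int
) -> tuple[str, str]:
    """Build (log_line, status_line) for a tool name the worker never registered."""
    sample = ", ".join(registered[:REGISTERED_SAMPLE])
    log_line = (
        f"invalid tool_call: invented={invented!r} registered={len(registered)} "
        f"sample=[{sample}] model={model} provider={provider} "
        f"retry={attempt}/{MAX_RETRIES}"
    )
    status_line = (
        f"Model called unregistered tool {invented!r} "
        f"({len(registered)} tools registered, retry {attempt}/{MAX_RETRIES})"
    )
    return log_line, status_line


def report_invalid_tool_call(
    invented: str,
    registered: list[str],
    model: str,
    provider: str,
    attempt: int,
    emit_status: Callable[[str], None] = print,
) -> None:
    """Log a WARNING for cross-system correlation and surface the mismatch to the user."""
    log_line, status_line = describe_invalid_tool_call(
        invented, registered, model, provider, attempt
    )
    logger.warning("%s", log_line)
    emit_status(status_line)
```

Splitting formatting from emission keeps the message templates testable as plain strings, which matches the spirit of the commit's "template includes name + count" test.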

PowerCreek merged commit 0a84b64 into main May 25, 2026

PowerCreek deleted the doctor-tools-subset-probe branch May 25, 2026 04:25

This was referenced May 25, 2026

Direction A scoping: per-intent system-prompt narrowing (#89 follow-up; composes with devagentic#237) #97

Closed

feat(system-prompt): HERMES_INTENT_OVERRIDE narrowing (closes #97, advances #89 Direction A) #98

Merged

PowerCreek mentioned this pull request May 27, 2026

diagnose: model calls write_file but tool not registered — narration covers up the mismatch #127

Closed

PowerCreek mentioned this pull request May 27, 2026

diagnose: tool_calls dispatched but side-effect doesn't fire — need entry/return WARNs in tool_executor #130

Closed

PowerCreek merged 1 commit into main from doctor-tools-subset-probe