doctor: validate HERMES_DEFAULT_PROVIDER + HERMES_INFERENCE_PROVIDER env vars #95
Merged
…env vars

Surfaces typos in either env var at ``hermes doctor`` time instead of letting the worker silently fall through to ``auto`` mid-session. Closes a follow-up gap from #70: the env var I shipped in PR #91 had no boot-time validation, so a typo like ``devagentic-locol`` would silently fail downstream in resolve_requested_provider's auto-detect fallthrough.

## Behavior

- **Silent** when neither env var is set. Most operators don't pin the provider via env, so no row appears on each run (silent-when-irrelevant pattern from the #88/#53 doctor probes).
- **check_ok** when the env var matches a known provider name; surfaces deployment pins so operators can confirm they're live.
- **check_warn** with a sample of valid names when the env var doesn't match. Diagnosed, not blocked: a custom provider name set outside the registry's view is still legal, just noisy.

Known-provider set is the union of:

- ``providers.list_providers()`` (plugin-registered, e.g. devagentic-local)
- ``hermes_cli.auth.PROVIDER_REGISTRY`` (built-in)
- standard aliases (``openrouter`` / ``custom`` / ``auto`` / ``anthropic`` / ``openai``)

Both env vars are checked independently, so a partial mismatch (one valid, one typo) surfaces precisely.

## Tests

- 8 new tests in tests/hermes_cli/test_doctor_provider_env_probe.py: silent-when-unset / silent-when-empty / known-name-ok / typo-warn-with-sample / both-env-vars-checked-independently / case-insensitive / plugin-registered-name-known / providers-import-failure-doesnt-crash.
- 96 total green across affected suites (probe + doctor + acp probe + resolver). No regression.

## Composition

- Pairs with PR #91 (HERMES_DEFAULT_PROVIDER env var); closes the validation gap I left in that ship.
- Follows the silent-when-irrelevant probe pattern from PR #53 (_check_acp_installation) and the diagnostic-loudness wave that established it.
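For readers who want the probe logic in one place, here is a minimal sketch of the behavior described above. It is not the code from this PR: `known_provider_names` and `check_provider_env` are hypothetical names, doctor's `check_ok` / `check_warn` helpers are modeled as `("ok" | "warn", message)` rows, and the built-in names and plugin loader are passed in as stand-ins for `hermes_cli.auth.PROVIDER_REGISTRY` and `providers.list_providers()`.

```python
from typing import Callable, Iterable, Mapping

# Hypothetical stand-in for doctor's check_ok / check_warn output: (level, message).
Row = tuple[str, str]

ENV_VARS = ("HERMES_DEFAULT_PROVIDER", "HERMES_INFERENCE_PROVIDER")
STANDARD_ALIASES = {"openrouter", "custom", "auto", "anthropic", "openai"}


def known_provider_names(
    builtin: Iterable[str],
    load_plugin_providers: Callable[[], Iterable[str]],
) -> set[str]:
    """Union of built-in, plugin-registered and alias names, lowercased."""
    names = {n.lower() for n in builtin} | STANDARD_ALIASES
    try:
        names |= {n.lower() for n in load_plugin_providers()}
    except Exception:
        # A broken plugin registry must not crash the doctor run.
        pass
    return names


def check_provider_env(env: Mapping[str, str], known: set[str], sample_size: int = 5) -> list[Row]:
    rows: list[Row] = []
    for var in ENV_VARS:  # each env var is checked independently
        value = env.get(var, "").strip()
        if not value:
            continue  # silent when unset or empty
        if value.lower() in known:  # case-insensitive match
            rows.append(("ok", f"{var}={value}"))
        else:
            sample = ", ".join(sorted(known)[:sample_size])
            rows.append(("warn", f"{var}={value!r} is not a known provider (e.g. {sample})"))
    return rows
```

In a real probe the mapping would presumably be `os.environ`; taking it as a parameter keeps the sketch easy to exercise without touching the process environment. Warning instead of failing matches the "diagnosed, not blocked" rule for custom provider names.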
This was referenced May 25, 2026
PowerCreek added a commit that referenced this pull request on May 25, 2026
…vances #89 Direction A) (#98)

Operator-supplied intent override (Option A3 from #97). When ``HERMES_INTENT_OVERRIDE=code`` is set, the system prompt's ``stable`` layer narrows for tool-call-heavy traffic. This addresses #89's prompt-saturation symptom on mid-tier coding models.

## What narrows under code intent

| Block | Action | Why |
|---|---|---|
| SOUL.md | Skip | Largest single contributor; falls back to the short DEFAULT_AGENT_IDENTITY floor |
| HERMES_AGENT_HELP_GUIDANCE | Skip | Off-topic for tool-call traffic |
| SKILLS_GUIDANCE | Skip | Per-tool block, off-topic for code |
| KANBAN_GUIDANCE | Skip | Worker-lifecycle, off-topic for code |
| SESSION_SEARCH_GUIDANCE | Skip | Off-topic for code |
| skills_prompt (the big one) | Skip | Biggest contributor when many skills are loaded |
| MEMORY_GUIDANCE | **Keep** | Small + sometimes useful even for code |
| TOOL_USE_ENFORCEMENT_GUIDANCE | **Keep** | Critical for tool emission |
| Per-model operational guidance | **Keep** | Model-quality-specific |
| Env / platform hints | **Keep** | Execution-environment essentials |
| nous-subscription + computer-use + alibaba | **Keep** | Operational invariants |
| ``context`` + ``volatile`` layers | **Untouched** | Out of scope per #97 |

Other intents (``confer`` / ``planning`` / ``exploration`` / ``refinement`` / ``generic``) are recognized as valid but pass through without narrowing in v1 (keeps the door open for per-intent shaping later).

## Intent vocabulary

Matches devagentic#240's ``intent_classifier`` 6-key enum exactly, so the same operator-side classifier that's wired into devagentic's R5 dispatch hook can also drive hermes-side prompt narrowing without a second vocabulary.

## Doctor probe

New ``_check_intent_override_env`` probe surfaces the active override at ``hermes doctor`` time: silent when unset, check_ok when valid (with a narrowing-active note for ``code``), check_warn with the full valid-keys list when typo'd. Mirrors the silent-when-irrelevant pattern from PR #95 / #96.

## Tests

- 22 new prompt-narrowing tests in ``tests/agent/test_system_prompt_intent_override.py``: resolver enum + normalization (5), per-section drops under code (7), pass-through for non-code intents (5), typo falls back (1), byte-count regression (1), default-still-includes counter-case (1), case-insensitive (1), runtime-vs-doctor-config sanity (1).
- 6 new doctor-probe tests in ``tests/hermes_cli/test_doctor_intent_override_probe.py``: silent-when-unset / silent-when-empty / code-ok-with-narrowing-note / non-code-valid-pass-through / typo-warn-with-valid-sample / case-insensitive.
- 258 total green across affected suites (system-prompt + prompt-builder + restore + doctor + provider-env + tools-subset). No regression in the existing prompt-shape pins.

## Composition note

Option A1 (port classifier) + A2 (devagentic GraphQL surface) are deferred per the #97 sequencing: A3 unblocks deployment-specific narrowing immediately; A1/A2 only matter when dynamic per-turn classification is needed on the hermes side. The classifier output on the devagentic side (NousResearch#240) drives R5 dispatch decisions there.
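The narrowing rule in the table above can be sketched roughly as follows. This is an illustration under assumptions, not the prompt builder from #98: the block keys (`soul_md`, `skills_prompt`, ...) and the functions `resolve_intent_override` / `build_stable_layer` are hypothetical, and the real builder assembles more than a flat dict of strings. It shows the three properties the PR describes: case-insensitive resolution, typos falling back to no override, and only ``code`` dropping blocks (with SOUL.md replaced by a short identity floor).

```python
from typing import Mapping, Optional

# Six-key intent vocabulary listed in the PR description.
VALID_INTENTS = frozenset({"code", "confer", "planning", "exploration", "refinement", "generic"})

# Hypothetical keys for the stable-layer blocks the PR skips under ``code``.
CODE_SKIPPED_BLOCKS = frozenset({
    "soul_md", "hermes_agent_help_guidance", "skills_guidance",
    "kanban_guidance", "session_search_guidance", "skills_prompt",
})


def resolve_intent_override(env: Mapping[str, str]) -> Optional[str]:
    """Normalize HERMES_INTENT_OVERRIDE; unset, empty or unknown values resolve to None."""
    value = env.get("HERMES_INTENT_OVERRIDE", "").strip().lower()
    return value if value in VALID_INTENTS else None


def build_stable_layer(blocks: Mapping[str, str], intent: Optional[str], identity_floor: str) -> str:
    """Join stable-layer blocks in order, narrowing only when intent == 'code'."""
    parts = []
    for name, text in blocks.items():
        if intent == "code" and name in CODE_SKIPPED_BLOCKS:
            if name == "soul_md":
                parts.append(identity_floor)  # short fallback in place of SOUL.md
            continue
        parts.append(text)
    return "\n\n".join(parts)
```

Because non-code intents fall through the same loop untouched, adding per-intent shaping later would mean adding another skip set rather than another code path.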
This was referenced May 26, 2026
PowerCreek added a commit that referenced this pull request on May 27, 2026
#115) (#116)

Companion to devagentic#315 (initiative preamble). When the operator sets ``HERMES_TOOL_USE_ENFORCEMENT=required``, the chat_completions transport injects ``tool_choice: "required"`` on every dispatch where tools are attached. This is the model-layer enforcement that closes the gap devagentic#315's soft-signal preamble leaves open.

## Behavior

- Unset / empty / unknown value → default behavior unchanged (no ``tool_choice`` injected by hermes)
- ``HERMES_TOOL_USE_ENFORCEMENT=required`` + tools attached → ``tool_choice: "required"`` set on the API kwargs
- Tools NOT attached → no injection (sending ``tool_choice=required`` with empty tools is a 400 on most providers)
- Caller-supplied ``tool_choice`` already on kwargs → no override (the dispatcher-tier signal wins; env is a session-tier default)

Per devagentic#203 §1.3, hermes owns model-call-shape decisions (per-call enforcement). Devagentic's models.json ``default_tool_choice`` is the dispatcher-tier default; this env var is the session-tier override.

## Where it fires

Both build_kwargs paths in ``chat_completions.py``:

- Legacy fallback path (unregistered providers)
- Provider-profile path (known providers via the providers/ registry)

Shared helper ``_maybe_inject_required_tool_choice(api_kwargs, tools)`` keeps the two sites in sync.

## Doctor probe

New ``_check_tool_use_enforcement_env`` surfaces the active setting: silent when unset, ``check_ok`` on ``required``, ``check_warn`` with a valid-values hint on typos. Mirrors the silent-when-irrelevant pattern from #95 / #96 / persona-deferred.

## Tests

- 18 new tests in tests/agent/test_tool_use_enforcement.py: resolver returns None/required/case-insensitive/unknown (8 parametrized), injection happy path (1), no-inject-when-unset (1), no-inject-when-no-tools (1 covering both None and empty list), does-not-clobber-existing-tool_choice (1), no-inject-on-unknown (1), doctor silent-when-unset (1), doctor check_ok on required (1), doctor check_warn on unknown (1).
- 128 total green across affected suites (new + doctor + provider/intent/persona/tools-subset probes). No regression.

## Sequencing per #115 body

The issue says "Land after devagentic#315 Phase 1 has deployed + been observed. If the preamble alone closes the reliability gap to operator satisfaction, this issue may not need to ship." This PR ships the env knob in opt-in, OFF-by-default mode, so:

- Operators can enable it the moment they observe NousResearch#315's preamble is insufficient (no further hermes-side dev cycle needed)
- Default behavior unchanged → zero risk to non-client-tier sessions
- Doctor probe surfaces the active state so operators can confirm enablement at boot

Saves the round-trip of waiting and then developing once the signal arrives.
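The four behavior rules above reduce to a few guard clauses. The sketch below is modeled on the shared helper named in the PR but is not its code: `maybe_inject_required_tool_choice` and `resolve_tool_use_enforcement` are hypothetical, and unlike the two-argument ``_maybe_inject_required_tool_choice(api_kwargs, tools)`` it takes the environment as an explicit third argument so it can be tested without mutating `os.environ`.

```python
from typing import Any, Mapping, Optional, Sequence


def resolve_tool_use_enforcement(env: Mapping[str, str]) -> Optional[str]:
    """Return 'required' for a case-insensitive match; None for unset, empty or unknown."""
    value = env.get("HERMES_TOOL_USE_ENFORCEMENT", "").strip().lower()
    return "required" if value == "required" else None


def maybe_inject_required_tool_choice(
    api_kwargs: dict[str, Any],
    tools: Optional[Sequence[Any]],
    env: Mapping[str, str],
) -> dict[str, Any]:
    if resolve_tool_use_enforcement(env) != "required":
        return api_kwargs  # default behavior: hermes injects nothing
    if not tools:
        return api_kwargs  # tool_choice=required with no tools is rejected by most providers
    # setdefault keeps any caller-supplied tool_choice (dispatcher tier wins).
    api_kwargs.setdefault("tool_choice", "required")
    return api_kwargs
```

Keeping the guards in one helper is what lets both build_kwargs paths stay in sync: each call site only has to pass its kwargs and tool list through.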
PowerCreek added a commit that referenced this pull request on May 27, 2026
After v0.18.4's tool_call recovery (#124) landed, the next-level bug surfaced in a sandbox field test: the model calls a tool name hermes' worker didn't register, the invalid_tool_call retry path fires, but its verbose-only print is invisible in default runs. Combined with model hallucination ("the file has been created..." narration on the NEXT turn), the mismatch becomes invisible: operators see model narration, not the underlying tool-name mismatch.

## Fix

Upgrade conversation_loop.py:3219's verbose-only print to:

1. ``logger.warning`` with the invented name + count + first 10 registered names + model + provider for cross-system log correlation
2. ``agent._emit_status`` surfacing the mismatch in the user-facing stream

The operator immediately sees:

- WHICH name the model invented
- HOW MANY tools the worker has registered
- WHICH tools (sample) ARE registered
- Which retry (of 3) it happened on

No behavior change: existing invalid_tool_call retry semantics are unchanged. Pure observability boost.

## Tests

- 3 new source-level tests in tests/agent/test_loud_invalid_tool_call.py: patch-landed, emit_status template includes name + count, warning includes model + provider for correlation.
- 20 total green across affected suites. No regression.

## Composition

Same observability family as the #95 / #96 doctor probes. Helps operators distinguish "hermes ate the tool_call" from "sandbox toolset doesn't expose what the model is calling".
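A rough sketch of the two-channel report described in the Fix section, assuming only the standard `logging` module. The function name `report_invalid_tool_call`, the logger name and both message templates are hypothetical; `emit_status` is a plain callable standing in for ``agent._emit_status``, whose real signature isn't shown in this PR.

```python
import logging
from typing import Callable, Sequence

logger = logging.getLogger("hermes.conversation_loop")  # hypothetical logger name


def report_invalid_tool_call(
    invented_name: str,
    registered: Sequence[str],
    model: str,
    provider: str,
    attempt: int,
    max_attempts: int,
    emit_status: Callable[[str], None],
) -> str:
    sample = ", ".join(list(registered)[:10])  # first 10 registered names only
    # Operator-facing log line: model + provider allow cross-system correlation.
    logger.warning(
        "invalid tool call %r (attempt %d/%d): %d tools registered [%s]; model=%s provider=%s",
        invented_name, attempt, max_attempts, len(registered), sample, model, provider,
    )
    # User-facing status line, so the mismatch is visible without verbose mode.
    message = (
        f"Model called unknown tool '{invented_name}' (retry {attempt}/{max_attempts}); "
        f"{len(registered)} tools registered, e.g. {sample}"
    )
    emit_status(message)
    return message
```

The retry loop itself is untouched in this shape: the helper only reports, which matches the "no behavior change" claim.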
Summary
Synergy / self-audit ship: closes a validation gap I left in PR #91 (the `HERMES_DEFAULT_PROVIDER` env var). A typo like `devagentic-locol` previously fell through to the `auto` floor silently; operators discovered the problem mid-session by debugging confusing downstream errors. This adds a boot-time check.

Behavior
- **Silent** when neither env var is set: most operators don't pin the provider via env, so no row appears on each run (silent-when-irrelevant pattern from the #88 / #53 / #54 doctor probes).
- `check_ok` when the env var matches a known provider name: surfaces deployment pins so operators can confirm they're live and correct.
- `check_warn` with a sample of valid names when the env var doesn't match. Diagnosed, not blocked: a custom provider name set outside the registry's view is still legal, just noisy.

Known-provider set is the union of:
- `providers.list_providers()`: plugin-registered (devagentic-local, etc.)
- `hermes_cli.auth.PROVIDER_REGISTRY`: built-in
- standard aliases (`openrouter` / `custom` / `auto` / `anthropic` / `openai`)

Both env vars are checked independently, so a partial mismatch (one valid, one typo) surfaces precisely.
Test plan
- `tests/hermes_cli/test_doctor_provider_env_probe.py`: silent-when-unset / silent-when-empty / known-name-ok / typo-warn-with-sample / both-checked-independently / case-insensitive / plugin-registered-name-known / providers-import-failure-doesnt-crash
- `HERMES_DEFAULT_PROVIDER=typo hermes doctor` shows the warn row

Composition
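- Pairs with PR #91 (`HERMES_DEFAULT_PROVIDER` env var): closes the validation gap I left in that ship.
- Follows the silent-when-irrelevant probe pattern from PR #53 (`_check_acp_installation`) and the diagnostic-loudness wave that established it.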