doctor: validate HERMES_DEFAULT_PROVIDER + HERMES_INFERENCE_PROVIDER env vars #95

Merged
PowerCreek merged 1 commit into main from doctor-provider-env-probe
May 25, 2026

@PowerCreek

Summary

Synergy / self-audit ship: closes a validation gap I left in PR #91 (the HERMES_DEFAULT_PROVIDER env var). A typo like devagentic-locol previously fell through to the auto floor silently, so operators discovered the problem mid-session by debugging confusing downstream errors. This PR adds a boot-time check to hermes doctor.

Behavior

Known-provider set is the union of:

  • providers.list_providers() — plugin-registered (devagentic-local, etc.)
  • hermes_cli.auth.PROVIDER_REGISTRY — built-in
  • Standard aliases (openrouter / custom / auto / anthropic / openai)

Both env vars are checked independently — a partial mismatch (one valid, one typo) surfaces precisely.

Test plan

  • 8 new tests in tests/hermes_cli/test_doctor_provider_env_probe.py: silent-when-unset / silent-when-empty / known-name-ok / typo-warn-with-sample / both-checked-independently / case-insensitive / plugin-registered-name-known / providers-import-failure-doesnt-crash
  • 96 total green across affected suites (probe + doctor + acp probe + resolver). No regression.
  • After merge: HERMES_DEFAULT_PROVIDER=typo hermes doctor shows the warn row

Composition

…env vars

Surfaces typos in either env var at ``hermes doctor`` time instead
of letting the worker silently fall through to ``auto`` mid-session.
Closes a follow-up gap from #70 — the env var I shipped in PR #91
had no boot-time validation, so a typo like ``devagentic-locol``
would silently fail downstream in resolve_requested_provider's
auto-detect fallthrough.

## Behavior

- **Silent** when neither env var is set. Most operators don't pin
  the provider via env, so the probe emits no row on those runs
  (silent-when-irrelevant pattern from the #88/#53 doctor probes).
- **check_ok** when the env var matches a known provider name —
  surfaces deployment pins so operators can confirm they're live.
- **check_warn** with a sample of valid names when the env var
  doesn't match. Diagnosed-not-blocked: a custom provider name set
  outside the registry's view is still legal, just noisy.

Known-provider set is the union of:
- ``providers.list_providers()`` (plugin-registered, e.g.
  devagentic-local)
- ``hermes_cli.auth.PROVIDER_REGISTRY`` (built-in)
- standard aliases (``openrouter`` / ``custom`` / ``auto`` /
  ``anthropic`` / ``openai``)

Both env vars are checked independently — a partial mismatch (one
valid, one typo) surfaces precisely.
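
A minimal sketch of the check described above, not the merged implementation. The function name `check_provider_env`, the `(level, message)` row shape, and passing the plugin and built-in names in as arguments are assumptions made for illustration; only the env var names, the alias list, and the silent / ok / warn behavior come from this description.

```python
import os

# Standard aliases listed in the description above.
STANDARD_ALIASES = {"openrouter", "custom", "auto", "anthropic", "openai"}
ENV_VARS = ("HERMES_DEFAULT_PROVIDER", "HERMES_INFERENCE_PROVIDER")


def check_provider_env(plugin_names, builtin_names, environ=None):
    """Return one (level, message) row per env var that is set.

    plugin_names and builtin_names stand in for providers.list_providers()
    and hermes_cli.auth.PROVIDER_REGISTRY; how the real probe reads those
    is not shown here (assumption).
    """
    environ = os.environ if environ is None else environ
    known = {name.lower() for name in plugin_names}
    known |= {name.lower() for name in builtin_names}
    known |= STANDARD_ALIASES

    rows = []
    for var in ENV_VARS:  # each env var is checked independently
        value = (environ.get(var) or "").strip()
        if not value:
            continue  # silent when unset or empty
        if value.lower() in known:  # case-insensitive match
            rows.append(("ok", f"{var}={value}"))
        else:
            sample = ", ".join(sorted(known)[:5])
            rows.append(
                ("warn", f"{var}={value!r} is not a known provider (e.g. {sample})")
            )
    return rows
```

With one valid and one mistyped value, the sketch yields one `ok` row and one `warn` row, which is the partial-mismatch case called out above.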

## Tests

- 8 new tests in tests/hermes_cli/test_doctor_provider_env_probe.py:
  silent-when-unset / silent-when-empty / known-name-ok /
  typo-warn-with-sample / both-env-vars-checked-independently /
  case-insensitive / plugin-registered-name-known /
  providers-import-failure-doesnt-crash.
- 96 total green across affected suites (probe + doctor + acp probe
  + resolver). No regression.

## Composition

- Pairs with PR #91 (HERMES_DEFAULT_PROVIDER env var) — closes the
  validation gap I left in that ship.
- Follows the silent-when-irrelevant probe pattern from PR #53
  (_check_acp_installation) and the diagnostic-loudness wave that
  established it.
@PowerCreek PowerCreek merged commit 9a9c25d into main May 25, 2026
@PowerCreek PowerCreek deleted the doctor-provider-env-probe branch May 25, 2026 03:52
PowerCreek added a commit that referenced this pull request May 25, 2026
…vances #89 Direction A) (#98)

Operator-supplied intent override (Option A3 from #97). When
``HERMES_INTENT_OVERRIDE=code`` is set, the system prompt's
``stable`` layer narrows for tool-call-heavy traffic — addresses
#89's prompt-saturation symptom on mid-tier coding models.

## What narrows under code intent

| Block | Action | Why |
|---|---|---|
| SOUL.md | Skip | Largest single contributor; falls back to short DEFAULT_AGENT_IDENTITY floor |
| HERMES_AGENT_HELP_GUIDANCE | Skip | Off-topic for tool-call traffic |
| SKILLS_GUIDANCE | Skip | Per-tool block, off-topic for code |
| KANBAN_GUIDANCE | Skip | Worker-lifecycle, off-topic for code |
| SESSION_SEARCH_GUIDANCE | Skip | Off-topic for code |
| skills_prompt (the big one) | Skip | Biggest contributor when many skills loaded |
| MEMORY_GUIDANCE | **Keep** | Small + sometimes useful even for code |
| TOOL_USE_ENFORCEMENT_GUIDANCE | **Keep** | Critical for tool emission |
| Per-model operational guidance | **Keep** | Model-quality-specific |
| Env / platform hints | **Keep** | Execution-environment essentials |
| nous-subscription + computer-use + alibaba | **Keep** | Operational invariants |
| ``context`` + ``volatile`` layers | **Untouched** | Out of scope per #97 |

Other intents (``confer`` / ``planning`` / ``exploration`` /
``refinement`` / ``generic``) are recognized as valid but pass
through without narrowing in v1, which keeps the door open for
per-intent shaping later.
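
The table and pass-through rule above can be sketched as follows. This is an illustration under stated assumptions, not the shipped code: `resolve_intent_override`, `narrow_stable_blocks`, and the `(name, text)` block representation are hypothetical names, and the SOUL.md fallback to the DEFAULT_AGENT_IDENTITY floor is not modeled.

```python
import os

# The six intent keys named in this commit message.
VALID_INTENTS = {"code", "confer", "planning", "exploration", "refinement", "generic"}

# Blocks the table above marks as "Skip" under code intent.
SKIPPED_UNDER_CODE = {
    "SOUL.md",
    "HERMES_AGENT_HELP_GUIDANCE",
    "SKILLS_GUIDANCE",
    "KANBAN_GUIDANCE",
    "SESSION_SEARCH_GUIDANCE",
    "skills_prompt",
}


def resolve_intent_override(environ=None):
    """Return the normalized intent, or None when unset, empty, or a typo."""
    environ = os.environ if environ is None else environ
    value = (environ.get("HERMES_INTENT_OVERRIDE") or "").strip().lower()
    return value if value in VALID_INTENTS else None


def narrow_stable_blocks(blocks, intent):
    """Drop the skipped blocks under code intent; otherwise pass through.

    blocks is an ordered list of (name, text) pairs (hypothetical shape).
    """
    if intent != "code":
        return list(blocks)
    return [(name, text) for name, text in blocks if name not in SKIPPED_UNDER_CODE]
```

Keeping the resolver separate from the filter means a mistyped value resolves to `None` and takes the same un-narrowed path as an unset variable.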

## Intent vocabulary

Matches devagentic#240's ``intent_classifier`` 6-key enum exactly,
so the same operator-side classifier that's wired into devagentic's
R5 dispatch hook can also drive hermes-side prompt narrowing
without a second vocabulary.

## Doctor probe

New ``_check_intent_override_env`` probe surfaces the active
override at ``hermes doctor`` time: silent when unset, check_ok
when valid (with a narrowing-active note for ``code``), check_warn
with the full valid-keys list on a typo. Mirrors the
silent-when-irrelevant pattern from PR #95 / #96.

## Tests

- 22 new prompt-narrowing tests in
  ``tests/agent/test_system_prompt_intent_override.py``: resolver
  enum + normalization (5), per-section drops under code (7),
  pass-through for non-code intents (5), typo falls back (1),
  byte-count regression (1), default-still-includes counter-case (1),
  case-insensitive (1), runtime-vs-doctor-config sanity (1).
- 6 new doctor-probe tests in
  ``tests/hermes_cli/test_doctor_intent_override_probe.py``:
  silent-when-unset / silent-when-empty / code-ok-with-narrowing-note /
  non-code-valid-pass-through / typo-warn-with-valid-sample /
  case-insensitive.
- 258 total green across affected suites (system-prompt +
  prompt-builder + restore + doctor + provider-env + tools-subset). No
  regression in the existing prompt-shape pins.

## Composition note

Option A1 (port classifier) + A2 (devagentic GraphQL surface) are
deferred per the #97 sequencing. A3 unblocks deployment-specific
narrowing immediately; A1/A2 only matter when dynamic per-turn
classification is needed on the hermes side. The classifier output
on the devagentic side (devagentic#240) drives R5 dispatch decisions there.
PowerCreek added a commit that referenced this pull request May 27, 2026
#115) (#116)

Companion to devagentic#315 (initiative preamble). When the operator
sets ``HERMES_TOOL_USE_ENFORCEMENT=required``, the chat_completions
transport injects ``tool_choice: "required"`` on every dispatch
where tools are attached. This is the model-layer enforcement that
closes the gap left open by devagentic#315's soft-signal preamble.

## Behavior

- Unset / empty / unknown value → default behavior unchanged (no
  ``tool_choice`` injected by hermes)
- ``HERMES_TOOL_USE_ENFORCEMENT=required`` + tools attached →
  ``tool_choice: "required"`` set on the API kwargs
- Tools NOT attached → no injection (sending ``tool_choice=required``
  with empty tools is a 400 on most providers)
- Caller-supplied ``tool_choice`` already on kwargs → no override
  (the dispatcher-tier signal wins; env is a session-tier default)

Per devagentic#203 §1.3 — hermes owns model-call-shape decisions
(per-call enforcement). Devagentic's models.json
``default_tool_choice`` is the dispatcher-tier default; this env is
the session-tier override.

## Where it fires

Both build_kwargs paths in ``chat_completions.py``:
- Legacy fallback path (unregistered providers)
- Provider-profile path (known providers via providers/ registry)

Shared helper ``_maybe_inject_required_tool_choice(api_kwargs,
tools)`` keeps the two sites in sync.

## Doctor probe

New ``_check_tool_use_enforcement_env`` surfaces the active setting
— silent when unset, ``check_ok`` on ``required``, ``check_warn``
with valid-values hint on typos. Mirrors the silent-when-irrelevant
pattern from #95 / #96 / persona-deferred.

## Tests

- 18 new tests in tests/agent/test_tool_use_enforcement.py:
  resolver returns None/required/case-insensitive/unknown (8
  parametrized), injection happy path (1), no-inject-when-unset (1),
  no-inject-when-no-tools (1 covering both None and empty list),
  does-not-clobber-existing-tool_choice (1), no-inject-on-unknown
  (1), doctor silent-when-unset (1), doctor check_ok on required
  (1), doctor check_warn on unknown (1).
- 128 total green across affected suites (new + doctor + provider/
  intent/persona/tools-subset probes). No regression.

## Sequencing per #115 body

The issue says "Land after devagentic#315 Phase 1 has deployed +
been observed. If the preamble alone closes the reliability gap to
operator satisfaction, this issue may not need to ship."

This PR ships the env knob opt-in and OFF by default, so:
- Operators can enable it the moment they observe that devagentic#315's
  preamble is insufficient (no further hermes-side dev cycle needed)
- Default behavior unchanged → zero risk to non-client-tier sessions
- Doctor probe surfaces the active state so operators can confirm
  enablement at boot

Saves the round-trip of waiting for the signal and only then starting
development.
PowerCreek added a commit that referenced this pull request May 27, 2026
After v0.18.4's tool_call recovery (#124) landed, the next bug
surfaced in a sandbox field test: the model calls a tool name that
hermes' worker didn't register, the invalid_tool_call retry path
fires, but its verbose-only print is invisible in default runs.
Combined with model hallucination ("the file has been created..."
narration on the NEXT turn), the mismatch goes unnoticed:
operators see model narration, not the underlying tool-name
mismatch.

## Fix

Upgrade conversation_loop.py:3219's verbose-only print to:

1. ``logger.warning`` with the invented name + count + first 10
   registered names + model + provider for cross-system log
   correlation
2. ``agent._emit_status`` surfacing the mismatch in the
   user-facing stream

The operator immediately sees:
- WHICH name the model invented
- HOW MANY tools the worker has registered
- WHICH tools (sample) ARE registered
- WHICH retry (of 3) hit the mismatch

No behavior change — existing invalid_tool_call retry semantics
unchanged. Pure observability boost.
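
As a sketch of the two-channel reporting described above: the function name `report_invalid_tool_call`, its parameters, and the message wording are all hypothetical, and `emit_status` is passed in as a plain callable standing in for ``agent._emit_status``, whose real signature is not shown in this message.

```python
import logging

logger = logging.getLogger(__name__)


def report_invalid_tool_call(invented_name, registered_names, model, provider,
                             attempt, max_attempts=3, emit_status=None):
    """Log the mismatch and surface it in the user-facing stream."""
    names = list(registered_names)
    sample = ", ".join(names[:10])  # first 10 registered names

    # Channel 1: warning log with model + provider for log correlation.
    logger.warning(
        "invalid tool call: model=%s provider=%s invented=%r "
        "registered=%d sample=[%s] attempt=%d/%d",
        model, provider, invented_name, len(names), sample, attempt, max_attempts,
    )

    # Channel 2: short status line for the operator.
    message = (
        f"Model called unknown tool {invented_name!r} "
        f"({len(names)} registered, e.g. {sample}); retry {attempt}/{max_attempts}"
    )
    if emit_status is not None:
        emit_status(message)
    return message
```

The retry logic itself is deliberately absent from the sketch, matching the note that retry semantics are unchanged.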

## Tests

- 3 new source-level tests in
  tests/agent/test_loud_invalid_tool_call.py: patch-landed,
  emit_status template includes name + count, warning includes
  model + provider for correlation.
- 20 total green across affected suites — no regression.

## Composition

Same observability family as the #95 / #96 doctor probes. Helps
operators distinguish "hermes ate the tool_call" from "sandbox
toolset doesn't expose what the model is calling".