diagnose: raw HTTP body capture before SDK parse (refs #138) by PowerCreek · Pull Request #138 · TechDevGroup/hermes-agent

PowerCreek · 2026-05-27T06:51:10Z

Monkey-patches SyncAPIClient._process_response_data and AsyncAPIClient._process_response_data (openai SDK 2.24.0) to capture the raw response dict BEFORE Pydantic validation strips fields. Logs per-choice summary + first 4KB of raw body via stderr-direct [hermes-diag] sdk RAW lines. Single install point covers all OpenAI() construction sites without modifying call signatures. Isolates server-bug vs SDK-bug for the codestral tool_calls drop. 39 tests pass.

v0.18.8 evidence (gist ebded1633ae3) confirmed the OpenAI SDK 2.24.0 drops tool_calls during parse: devagentic wire has tool_calls_count=2 finish_reason=tool_calls; hermes' post-parse view shows sdk_tool_calls=0 finish_reason=stop. model_dump() can't recover them either — the SDK has already filtered. To isolate **server-bug vs SDK-bug**, we need the raw HTTP body BEFORE the SDK parses it. ## Approach Monkey-patch ``SyncAPIClient._process_response_data`` and ``AsyncAPIClient._process_response_data`` (from ``openai._base_client``). These are the choke points where every parsed response funnels through. The patch: 1. Inspects the raw ``data`` dict (parsed from JSON, but PRE-Pydantic-validation — so dropped fields are still there) 2. For ChatCompletion responses specifically: - Logs per-choice summary: ``finish_reason``, ``content_len``, ``tool_calls_count`` directly from the raw dict - Dumps the first 4KB of the raw JSON body 3. Delegates to the original ``_process_response_data`` Idempotent. Fail-soft on import / attr errors. Installed at ``chat_completions.py`` module import — runs once per Python process; covers all 15+ OpenAI() construction sites in auxiliary_client.py + the main agent path without modifying any call signature. ## Pattern key Operator captures stderr from sandbox repro + greps for ``[hermes-diag] sdk RAW``. Compare with the existing 4 ``[hermes-diag]`` lines: - **Server-bug**: ``sdk RAW pre-parse: choice[0]: ... tool_calls_count=0`` → devagentic isn't actually sending the tool_calls on the wire (its server-side log is misleading). Bug is server-side. - **SDK-bug**: ``sdk RAW pre-parse: choice[0]: ... tool_calls_count=2`` followed by ``normalize_response entry: sdk_tool_calls=0`` → wire HAS tool_calls but SDK validation drops them at parse. Bug is SDK schema enforcement (likely a required field absent in Stainless-generated types). Once we know which, the fix path is clear: - Server-bug → devagentic-side - SDK-bug → hermes uses ``with_raw_response`` + bypass parse, or patches the SDK's response model schema, or downgrades openai to a permissive version ## Tests - 39 tests pass across affected suites (chat_completions / cascade_exhausted / finish_reason_tools / tool_call_type). - Module-import verification confirms the patch installs + ``_diag`` writes the install marker to stderr.

#141) v0.18.9 (PR #138) installed the OpenAI SDK monkey-patch at module import + emitted 4 ``[hermes-diag]`` stderr lines per dispatch + a 4KB raw-body dump. Field-test confirmed this floods interactive sandbox tmux panes. Operator filed #140: gate behind env, default OFF, opt-in for debug. ## Fix New ``HERMES_DIAG_RAW_CAPTURE`` env var (truthy values: ``1``/``true``/``yes``/``on``). When unset/falsy: - ``_diag()`` no-ops silently (was: writes to stderr unconditionally) - ``_install_sdk_raw_capture()`` is NOT called at module import (was: monkey-patches SyncAPIClient + AsyncAPIClient on every process) - ``conv-loop empty-check`` stderr write in conversation_loop.py is short-circuited via ``_diag_enabled()`` import When env is truthy: full v0.18.9 diagnostic behavior — 4 lines per response + SDK raw body capture. Operators opt-in for debug: ``HERMES_DIAG_RAW_CAPTURE=1 hermes ...``. ## Stderr routing confirmed User's secondary ask: confirm ``_diag()`` routes to stderr. Verified — uses ``sys.stderr.write()`` directly (not ``print()`` which defaults to stdout). The "stdout leak" appearance in the field is likely tmux pane interleaving stderr + stdout in the same window. Regression-pinned by ``test_diag_stderr_routing_distinct_from_stdout``. ## Tests - 10 new tests in tests/agent/test_diag_env_gate.py: * 4 ``_diag_enabled`` resolver: unset/empty silent; 5 truthy values parametrized; 5 falsy/unknown parametrized * 3 ``_diag`` behavior: silent-when-off; writes-stderr-when-on; distinct-stdout-vs-stderr (using direct sys.stderr capture) * 2 SDK install-gate source-level: gate present at install call-site; install marker only logged via _diag (which is itself gated) - 47 total green across affected suites — no regression on cascade-exhausted, tool_call_type, or finish_reason_tools. ## Composition v0.18.5 → v0.18.10 diagnostic-funnel cleanup. The diagnostics-as-defense-in-depth premise (#133 close note) stays intact, just no longer fires by default. Operators retain ``HERMES_DIAG_RAW_CAPTURE=1`` as the debug knob.

…143) (#145) T3 of the #143 thin-client refactor scope. When the active provider is ``devagentic-local``, augment ``disabled_toolsets`` with ``"clarify"`` before the ``get_tool_definitions`` call. ## Rationale Per devagentic#203 §1.3 + the #143 scope: devagentic-side intent classifier knows when clarification is actually needed and can surface it via an OpenAI-shaped assistant message. Hermes' modal TUI clarify-tool was a layered opinion fighting devagentic-side classification — both rotation's debug evidence + sandbox UX showed the dual-source as confusing (clarify modal popped even with --yolo, ignoring devagentic-side intent signals). This is the smallest of the T1-T3 sequence and the cleanest revert path — purely a tool-registry adjustment for one provider. ## Behavior | Setting | Before | After | |---|---|---| | provider=devagentic-local, no --enable-toolset | clarify enabled | clarify implicitly disabled | | provider=devagentic-local, --enable-toolset clarify | clarify enabled | clarify enabled (explicit override) | | provider=other | unchanged | unchanged | A boot-line print informs operators of the implicit disable + how to re-enable for legacy workflows. Composes naturally with the existing ``HERMES_TOOLS_SUBSET`` narrowing (#75/#87) — disable happens first, then subset narrows further if set. ## Tests - 4 source-level tests in ``tests/agent/test_t3_clarify_default_out.py``: patch-landed, explicit-enable-overrides-implicit-disable, re-enable hint visible in print message, strict-equality on provider name (no prefix/alias matching to avoid surprise on related providers). - 21 total green across affected suites (T3 + diag-env-gate). ## Composition Per #143 sequencing (T3 → T1 → T2-gated → T2-default-flip): - This PR: T3 (clarify default-out) - Next: T1 (HERMES_DEFER_PERSONA default-flip for devagentic-local) - Then: T2 (empty-content recovery removal, env-gated then default) - Later: T4-T6 (tool list / iteration cap / summary fallback) ## Preserved through the refactor - PR #119 (cascade_exhausted short-circuit) — hermes correctly deferring to devagentic; NOT recovery - PR #122/#125 (raw tool_calls fallback) — pre-recovery wire parsing; belt-and-suspenders against future streaming-chunker regressions - PR #131/#136/#138/#141 diagnostics — env-gated via HERMES_DIAG_RAW_CAPTURE; no-op when off

…MPTY_RECOVERY (refs #143) (#147) T2 of the #143 thin-client refactor scope. Devagentic-side cascade (NousResearch#324) + runaway detector (NousResearch#345-348) + exec-terminus (NousResearch#349-354) now cover the empty-content recovery layer with full intent / role / dispatch-trace context. Hermes-side recovery layered on top caused 3×4 dispatch stacking (#118) + mode confusion + invisible swallow points (#133 debug funnel). This patch default-flips: the four hermes-side recovery paths are short-circuited unless ``HERMES_LEGACY_EMPTY_RECOVERY`` is set to a truthy value. Legacy users opt-in to keep the pre-T2 behavior. ## Gated paths (all skipped when env unset) | Path | Source | Loc | |---|---|---| | ``_finish_wants_tools`` synthetic recovery | PR #108 / #99 | line ~3618 | | ``_post_tool_empty_retried`` nudge | pre-existing NousResearch#9400-class | line ~3749 | | ``_structural_empty`` synthetic recovery | PR #69 / #67 | line ~3880 | | 3-retry empty-content loop | pre-#67 | line ~3927 | When env unset (default), empty responses fall through to either the fallback-chain provider switch (if configured) or the clean ``(empty)`` terminal with ``_empty_terminal_sentinel=True``. ## Preserved through T2 (verified by test_t2_legacy_empty_recovery_gate.py) - **PR #119 cascade_exhausted short-circuit** — hermes correctly deferring to devagentic's sentinel; NOT recovery. Source-level test asserts the ``if _cascade_err:`` block is NOT prefixed by ``_legacy_recovery_on``. - **PR #122/#125 raw tool_calls fallback** — pre-recovery wire parsing in transports/chat_completions.py. Source-level test asserts the helper name doesn't appear in conversation_loop (lives elsewhere; untouched). - **PR #131/#136/#138/#141 diagnostics** — env-gated via ``HERMES_DIAG_RAW_CAPTURE`` (#140), independent of this env. ## Operator deploy Default behavior changes: empty responses surface cleanly (no synthetic re-prompt). To preserve pre-T2 behavior: ```bash export HERMES_LEGACY_EMPTY_RECOVERY=1 ``` The legacy escape hatch is intended as a temporary safety net while operators validate the thin-client architecture. Once the devagentic-side cascade is universally deployed + observed to cover all empty-content cases, the legacy gate can be removed in a follow-up (the env var stays as a no-op for backward compat). ## Tests - 23 new tests in ``tests/agent/test_t2_legacy_empty_recovery_gate.py``: resolver default-false / empty-false / 6 truthy / 6 falsy-or- unknown; source-level gate assertions for each of the four recovery branches; preserved-path assertions for cascade_exhausted + raw tool_calls fallback location; default-off + opt-in resolver round-trip. - 163 total green across affected suites (T2 + existing empty- terminal mirror tests + finish_reason_tools_recovery + cascade_exhausted + internal_marker_stripping + tool_call_type_default + tool_use_enforcement + T3 + T1 persona + doctor persona probe). ## Composition Per #143 sequencing: - T3 / PR #145 (clarify default-out) — merged ✓ - T1 / PR #146 (persona default-flip) — merged ✓ - **T2 / this PR (empty-content recovery removal, env-gated)** - Later: T4-T6 (tool list / iteration cap / summary fallback) The legacy escape hatch design choice (vs. full removal) lets operators roll back per-deployment if a previously-recovered edge case surfaces in the field. Diagnostic from #140 (HERMES_DIAG_RAW_CAPTURE) remains the observability tool to spot any uncovered case.

PowerCreek merged commit ea09c87 into main May 27, 2026

PowerCreek deleted the sdk-raw-response-capture-138 branch May 27, 2026 06:51

PowerCreek mentioned this pull request May 27, 2026

chore: bump pyproject.toml + uv.lock to 0.18.9 #139

Merged

PowerCreek added a commit that referenced this pull request May 27, 2026

chore: bump pyproject.toml + uv.lock to 0.18.9 (#138 release) (#139)

ad1a5b4

PowerCreek mentioned this pull request May 27, 2026

conversation_loop empty-content gate ignores tool_calls — recovery loop on codestral content_len=0 + tool_calls_count>=1 responses #133

Closed

PowerCreek mentioned this pull request May 28, 2026

refactor(T2): default-flip empty-content recovery via HERMES_LEGACY_EMPTY_RECOVERY (refs #143) #147

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

diagnose: raw HTTP body capture before SDK parse (refs #138)#138

diagnose: raw HTTP body capture before SDK parse (refs #138)#138
PowerCreek merged 1 commit into
mainfrom
sdk-raw-response-capture-138

PowerCreek commented May 27, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

PowerCreek commented May 27, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant