diagnose: raw HTTP body capture before SDK parse (refs #138)#138
Merged
Conversation
v0.18.8 evidence (gist ebded1633ae3) confirmed the OpenAI SDK
2.24.0 drops tool_calls during parse: devagentic wire has
tool_calls_count=2 finish_reason=tool_calls; hermes' post-parse
view shows sdk_tool_calls=0 finish_reason=stop. model_dump() can't
recover them either — the SDK has already filtered.
To isolate **server-bug vs SDK-bug**, we need the raw HTTP body
BEFORE the SDK parses it.
## Approach
Monkey-patch ``SyncAPIClient._process_response_data`` and
``AsyncAPIClient._process_response_data`` (from
``openai._base_client``). These are the choke points where every
parsed response funnels through. The patch:
1. Inspects the raw ``data`` dict (parsed from JSON, but
PRE-Pydantic-validation — so dropped fields are still there)
2. For ChatCompletion responses specifically:
- Logs per-choice summary: ``finish_reason``, ``content_len``,
``tool_calls_count`` directly from the raw dict
- Dumps the first 4KB of the raw JSON body
3. Delegates to the original ``_process_response_data``
Idempotent. Fail-soft on import / attr errors. Installed at
``chat_completions.py`` module import — runs once per Python
process; covers all 15+ OpenAI() construction sites in
auxiliary_client.py + the main agent path without modifying
any call signature.
## Pattern key
Operator captures stderr from sandbox repro + greps for
``[hermes-diag] sdk RAW``. Compare with the existing 4
``[hermes-diag]`` lines:
- **Server-bug**: ``sdk RAW pre-parse: choice[0]: ... tool_calls_count=0``
→ devagentic isn't actually sending the tool_calls on the wire
(its server-side log is misleading). Bug is server-side.
- **SDK-bug**: ``sdk RAW pre-parse: choice[0]: ... tool_calls_count=2``
followed by ``normalize_response entry: sdk_tool_calls=0`` → wire
HAS tool_calls but SDK validation drops them at parse. Bug is
SDK schema enforcement (likely a required field absent in
Stainless-generated types).
Once we know which, the fix path is clear:
- Server-bug → devagentic-side
- SDK-bug → hermes uses ``with_raw_response`` + bypass parse, or
patches the SDK's response model schema, or downgrades openai
to a permissive version
## Tests
- 39 tests pass across affected suites (chat_completions /
cascade_exhausted / finish_reason_tools / tool_call_type).
- Module-import verification confirms the patch installs +
``_diag`` writes the install marker to stderr.
PowerCreek
added a commit
that referenced
this pull request
May 27, 2026
PowerCreek
added a commit
that referenced
this pull request
May 27, 2026
#141) v0.18.9 (PR #138) installed the OpenAI SDK monkey-patch at module import + emitted 4 ``[hermes-diag]`` stderr lines per dispatch + a 4KB raw-body dump. Field-test confirmed this floods interactive sandbox tmux panes. Operator filed #140: gate behind env, default OFF, opt-in for debug. ## Fix New ``HERMES_DIAG_RAW_CAPTURE`` env var (truthy values: ``1``/``true``/``yes``/``on``). When unset/falsy: - ``_diag()`` no-ops silently (was: writes to stderr unconditionally) - ``_install_sdk_raw_capture()`` is NOT called at module import (was: monkey-patches SyncAPIClient + AsyncAPIClient on every process) - ``conv-loop empty-check`` stderr write in conversation_loop.py is short-circuited via ``_diag_enabled()`` import When env is truthy: full v0.18.9 diagnostic behavior — 4 lines per response + SDK raw body capture. Operators opt-in for debug: ``HERMES_DIAG_RAW_CAPTURE=1 hermes ...``. ## Stderr routing confirmed User's secondary ask: confirm ``_diag()`` routes to stderr. Verified — uses ``sys.stderr.write()`` directly (not ``print()`` which defaults to stdout). The "stdout leak" appearance in the field is likely tmux pane interleaving stderr + stdout in the same window. Regression-pinned by ``test_diag_stderr_routing_distinct_from_stdout``. ## Tests - 10 new tests in tests/agent/test_diag_env_gate.py: * 4 ``_diag_enabled`` resolver: unset/empty silent; 5 truthy values parametrized; 5 falsy/unknown parametrized * 3 ``_diag`` behavior: silent-when-off; writes-stderr-when-on; distinct-stdout-vs-stderr (using direct sys.stderr capture) * 2 SDK install-gate source-level: gate present at install call-site; install marker only logged via _diag (which is itself gated) - 47 total green across affected suites — no regression on cascade-exhausted, tool_call_type, or finish_reason_tools. ## Composition v0.18.5 → v0.18.10 diagnostic-funnel cleanup. The diagnostics-as-defense-in-depth premise (#133 close note) stays intact, just no longer fires by default. Operators retain ``HERMES_DIAG_RAW_CAPTURE=1`` as the debug knob.
PowerCreek
added a commit
that referenced
this pull request
May 28, 2026
…143) (#145) T3 of the #143 thin-client refactor scope. When the active provider is ``devagentic-local``, augment ``disabled_toolsets`` with ``"clarify"`` before the ``get_tool_definitions`` call. ## Rationale Per devagentic#203 §1.3 + the #143 scope: devagentic-side intent classifier knows when clarification is actually needed and can surface it via an OpenAI-shaped assistant message. Hermes' modal TUI clarify-tool was a layered opinion fighting devagentic-side classification — both rotation's debug evidence + sandbox UX showed the dual-source as confusing (clarify modal popped even with --yolo, ignoring devagentic-side intent signals). This is the smallest of the T1-T3 sequence and the cleanest revert path — purely a tool-registry adjustment for one provider. ## Behavior | Setting | Before | After | |---|---|---| | provider=devagentic-local, no --enable-toolset | clarify enabled | clarify implicitly disabled | | provider=devagentic-local, --enable-toolset clarify | clarify enabled | clarify enabled (explicit override) | | provider=other | unchanged | unchanged | A boot-line print informs operators of the implicit disable + how to re-enable for legacy workflows. Composes naturally with the existing ``HERMES_TOOLS_SUBSET`` narrowing (#75/#87) — disable happens first, then subset narrows further if set. ## Tests - 4 source-level tests in ``tests/agent/test_t3_clarify_default_out.py``: patch-landed, explicit-enable-overrides-implicit-disable, re-enable hint visible in print message, strict-equality on provider name (no prefix/alias matching to avoid surprise on related providers). - 21 total green across affected suites (T3 + diag-env-gate). ## Composition Per #143 sequencing (T3 → T1 → T2-gated → T2-default-flip): - This PR: T3 (clarify default-out) - Next: T1 (HERMES_DEFER_PERSONA default-flip for devagentic-local) - Then: T2 (empty-content recovery removal, env-gated then default) - Later: T4-T6 (tool list / iteration cap / summary fallback) ## Preserved through the refactor - PR #119 (cascade_exhausted short-circuit) — hermes correctly deferring to devagentic; NOT recovery - PR #122/#125 (raw tool_calls fallback) — pre-recovery wire parsing; belt-and-suspenders against future streaming-chunker regressions - PR #131/#136/#138/#141 diagnostics — env-gated via HERMES_DIAG_RAW_CAPTURE; no-op when off
PowerCreek
added a commit
that referenced
this pull request
May 28, 2026
…MPTY_RECOVERY (refs #143) (#147) T2 of the #143 thin-client refactor scope. Devagentic-side cascade (NousResearch#324) + runaway detector (NousResearch#345-348) + exec-terminus (NousResearch#349-354) now cover the empty-content recovery layer with full intent / role / dispatch-trace context. Hermes-side recovery layered on top caused 3×4 dispatch stacking (#118) + mode confusion + invisible swallow points (#133 debug funnel). This patch default-flips: the four hermes-side recovery paths are short-circuited unless ``HERMES_LEGACY_EMPTY_RECOVERY`` is set to a truthy value. Legacy users opt-in to keep the pre-T2 behavior. ## Gated paths (all skipped when env unset) | Path | Source | Loc | |---|---|---| | ``_finish_wants_tools`` synthetic recovery | PR #108 / #99 | line ~3618 | | ``_post_tool_empty_retried`` nudge | pre-existing NousResearch#9400-class | line ~3749 | | ``_structural_empty`` synthetic recovery | PR #69 / #67 | line ~3880 | | 3-retry empty-content loop | pre-#67 | line ~3927 | When env unset (default), empty responses fall through to either the fallback-chain provider switch (if configured) or the clean ``(empty)`` terminal with ``_empty_terminal_sentinel=True``. ## Preserved through T2 (verified by test_t2_legacy_empty_recovery_gate.py) - **PR #119 cascade_exhausted short-circuit** — hermes correctly deferring to devagentic's sentinel; NOT recovery. Source-level test asserts the ``if _cascade_err:`` block is NOT prefixed by ``_legacy_recovery_on``. - **PR #122/#125 raw tool_calls fallback** — pre-recovery wire parsing in transports/chat_completions.py. Source-level test asserts the helper name doesn't appear in conversation_loop (lives elsewhere; untouched). - **PR #131/#136/#138/#141 diagnostics** — env-gated via ``HERMES_DIAG_RAW_CAPTURE`` (#140), independent of this env. ## Operator deploy Default behavior changes: empty responses surface cleanly (no synthetic re-prompt). To preserve pre-T2 behavior: ```bash export HERMES_LEGACY_EMPTY_RECOVERY=1 ``` The legacy escape hatch is intended as a temporary safety net while operators validate the thin-client architecture. Once the devagentic-side cascade is universally deployed + observed to cover all empty-content cases, the legacy gate can be removed in a follow-up (the env var stays as a no-op for backward compat). ## Tests - 23 new tests in ``tests/agent/test_t2_legacy_empty_recovery_gate.py``: resolver default-false / empty-false / 6 truthy / 6 falsy-or- unknown; source-level gate assertions for each of the four recovery branches; preserved-path assertions for cascade_exhausted + raw tool_calls fallback location; default-off + opt-in resolver round-trip. - 163 total green across affected suites (T2 + existing empty- terminal mirror tests + finish_reason_tools_recovery + cascade_exhausted + internal_marker_stripping + tool_call_type_default + tool_use_enforcement + T3 + T1 persona + doctor persona probe). ## Composition Per #143 sequencing: - T3 / PR #145 (clarify default-out) — merged ✓ - T1 / PR #146 (persona default-flip) — merged ✓ - **T2 / this PR (empty-content recovery removal, env-gated)** - Later: T4-T6 (tool list / iteration cap / summary fallback) The legacy escape hatch design choice (vs. full removal) lets operators roll back per-deployment if a previously-recovered edge case surfaces in the field. Diagnostic from #140 (HERMES_DIAG_RAW_CAPTURE) remains the observability tool to spot any uncovered case.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Monkey-patches SyncAPIClient._process_response_data and AsyncAPIClient._process_response_data (openai SDK 2.24.0) to capture the raw response dict BEFORE Pydantic validation strips fields. Logs per-choice summary + first 4KB of raw body via stderr-direct [hermes-diag] sdk RAW lines. Single install point covers all OpenAI() construction sites without modifying call signatures. Isolates server-bug vs SDK-bug for the codestral tool_calls drop. 39 tests pass.