Skip to content

feat(provider): short-circuit retry on cascade_exhausted sentinel (closes #118)#119

Merged
PowerCreek merged 1 commit into
mainfrom
cascade-exhausted-short-circuit-118
May 27, 2026
Merged

feat(provider): short-circuit retry on cascade_exhausted sentinel (closes #118)#119
PowerCreek merged 1 commit into
mainfrom
cascade-exhausted-short-circuit-118

Conversation

@PowerCreek

Copy link
Copy Markdown

Closes #118. Companion to devagentic#324 (server-side recovery cascade). Hermes detects the devagentic_cascade_exhausted sentinel + skips its 3 retries (preventing 3×4=12 dispatch stacking per devagentic#321). Surfaces trace_id with dispatchTrace post-mortem hint. 12 new tests, 73 total green. Ships against documented envelope shape per #118 sequencing — useful the moment devagentic#324 Phase 1 emits its first response.

…oses #118)

Companion to devagentic#324 (server-side recovery cascade). When
devagentic's 4-step cascade surrenders, it returns HTTP 200 with an
error envelope of the shape:

  {"error": {"message": "...", "code": "devagentic_cascade_exhausted",
             "devagentic": {"trace_id": ..., "steps_attempted": [...],
                            "terminal_outcome": ...}}}

Without this fix, hermes' 3 retries would stack on top of devagentic's
4 cascade alternates for 3 × 4 = 12 dispatches per user request —
rate-budget blowout (per devagentic#321). With this fix, hermes
treats the sentinel as terminal + surfaces the trace_id so
post-mortem is a single ``dispatchTrace(trace_id)`` GraphQL call.

## Implementation

### Detection (chat_completions.py)

- New ``CASCADE_EXHAUSTED_CODE = "devagentic_cascade_exhausted"``
  constant + ``_extract_cascade_exhausted(response)`` helper
- Inspection order covers (a) direct ``response.error`` attr, (b)
  Pydantic v2 ``response.model_extra["error"]``, (c) dict-shape
  fallback — different OpenAI SDK versions park unknown top-level
  fields differently
- Fail-soft on attribute errors

### Normalization (chat_completions.py)

- ``ChatCompletionsTransport.normalize_response`` short-circuits
  BEFORE the choices lookup when the sentinel is detected
- Returns a ``NormalizedResponse`` with the envelope in
  ``provider_data["cascade_exhausted"]`` + the human-readable
  error message as content + ``finish_reason="stop"``

### Short-circuit (conversation_loop.py)

- In the else branch at line 3504 (no-tool-calls path), check
  ``assistant_message.provider_data["cascade_exhausted"]`` BEFORE
  the empty-retry path
- When present:
  * WARN log with trace_id + terminal_outcome
  * ``_emit_status`` ⛔ surface with trace_id
  * Build assistant message with ``dispatchTrace`` post-mortem hint
  * Mark ``_empty_terminal_sentinel`` so the loop exits cleanly
  * Set ``_turn_exit_reason = "devagentic_cascade_exhausted"``
  * ``break`` (no retry)

## Sequencing per #118

Issue allows shipping in parallel with devagentic#324: "hermes-
maintainer can stub against the documented envelope before NousResearch#324
ships to land in parallel." This PR ships against the documented
envelope shape — operator-side becomes useful the moment
devagentic#324 Phase 1 emits its first cascade_exhausted response.

## Tests

- 12 new tests in tests/agent/test_cascade_exhausted_sentinel.py:
  * 6 extraction-helper tests (None resp, no error field,
    non-dict error, wrong code, 3 parametrized sentinel-present
    via attr/model_extra/dict, fail-soft on raising attr)
  * 1 normalize_response integration (envelope round-trips into
    provider_data with finish_reason='stop' + content from message)
  * 1 source-level patch-landed check
  * 3 short-circuit guard mirror tests
  * 1 specific code-not-matching test (no false-positive on other
    error codes like rate_limited)
- 73 total green across affected suites — no regression.
@PowerCreek PowerCreek merged commit ea58fbb into main May 27, 2026
@PowerCreek PowerCreek deleted the cascade-exhausted-short-circuit-118 branch May 27, 2026 03:43
PowerCreek added a commit that referenced this pull request May 28, 2026
…143) (#145)

T3 of the #143 thin-client refactor scope. When the active provider
is ``devagentic-local``, augment ``disabled_toolsets`` with
``"clarify"`` before the ``get_tool_definitions`` call.

## Rationale

Per devagentic#203 §1.3 + the #143 scope: devagentic-side intent
classifier knows when clarification is actually needed and can
surface it via an OpenAI-shaped assistant message. Hermes' modal
TUI clarify-tool was a layered opinion fighting devagentic-side
classification — both rotation's debug evidence + sandbox UX
showed the dual-source as confusing (clarify modal popped even
with --yolo, ignoring devagentic-side intent signals).

This is the smallest of the T1-T3 sequence and the cleanest
revert path — purely a tool-registry adjustment for one provider.

## Behavior

| Setting | Before | After |
|---|---|---|
| provider=devagentic-local, no --enable-toolset | clarify enabled | clarify implicitly disabled |
| provider=devagentic-local, --enable-toolset clarify | clarify enabled | clarify enabled (explicit override) |
| provider=other | unchanged | unchanged |

A boot-line print informs operators of the implicit disable + how
to re-enable for legacy workflows. Composes naturally with the
existing ``HERMES_TOOLS_SUBSET`` narrowing (#75/#87) — disable
happens first, then subset narrows further if set.

## Tests

- 4 source-level tests in
  ``tests/agent/test_t3_clarify_default_out.py``: patch-landed,
  explicit-enable-overrides-implicit-disable, re-enable hint
  visible in print message, strict-equality on provider name
  (no prefix/alias matching to avoid surprise on related
  providers).
- 21 total green across affected suites (T3 + diag-env-gate).

## Composition

Per #143 sequencing (T3 → T1 → T2-gated → T2-default-flip):
- This PR: T3 (clarify default-out)
- Next: T1 (HERMES_DEFER_PERSONA default-flip for devagentic-local)
- Then: T2 (empty-content recovery removal, env-gated then default)
- Later: T4-T6 (tool list / iteration cap / summary fallback)

## Preserved through the refactor

- PR #119 (cascade_exhausted short-circuit) — hermes correctly
  deferring to devagentic; NOT recovery
- PR #122/#125 (raw tool_calls fallback) — pre-recovery wire
  parsing; belt-and-suspenders against future streaming-chunker
  regressions
- PR #131/#136/#138/#141 diagnostics — env-gated via
  HERMES_DIAG_RAW_CAPTURE; no-op when off
PowerCreek added a commit that referenced this pull request May 28, 2026
…MPTY_RECOVERY (refs #143) (#147)

T2 of the #143 thin-client refactor scope. Devagentic-side cascade
(NousResearch#324) + runaway detector (NousResearch#345-348) + exec-terminus (NousResearch#349-354)
now cover the empty-content recovery layer with full intent /
role / dispatch-trace context. Hermes-side recovery layered on top
caused 3×4 dispatch stacking (#118) + mode confusion + invisible
swallow points (#133 debug funnel).

This patch default-flips: the four hermes-side recovery paths are
short-circuited unless ``HERMES_LEGACY_EMPTY_RECOVERY`` is set to
a truthy value. Legacy users opt-in to keep the pre-T2 behavior.

## Gated paths (all skipped when env unset)

| Path | Source | Loc |
|---|---|---|
| ``_finish_wants_tools`` synthetic recovery | PR #108 / #99 | line ~3618 |
| ``_post_tool_empty_retried`` nudge | pre-existing NousResearch#9400-class | line ~3749 |
| ``_structural_empty`` synthetic recovery | PR #69 / #67 | line ~3880 |
| 3-retry empty-content loop | pre-#67 | line ~3927 |

When env unset (default), empty responses fall through to either
the fallback-chain provider switch (if configured) or the clean
``(empty)`` terminal with ``_empty_terminal_sentinel=True``.

## Preserved through T2 (verified by test_t2_legacy_empty_recovery_gate.py)

- **PR #119 cascade_exhausted short-circuit** — hermes correctly
  deferring to devagentic's sentinel; NOT recovery. Source-level
  test asserts the ``if _cascade_err:`` block is NOT prefixed by
  ``_legacy_recovery_on``.
- **PR #122/#125 raw tool_calls fallback** — pre-recovery wire
  parsing in transports/chat_completions.py. Source-level test
  asserts the helper name doesn't appear in conversation_loop
  (lives elsewhere; untouched).
- **PR #131/#136/#138/#141 diagnostics** — env-gated via
  ``HERMES_DIAG_RAW_CAPTURE`` (#140), independent of this env.

## Operator deploy

Default behavior changes: empty responses surface cleanly (no
synthetic re-prompt). To preserve pre-T2 behavior:

```bash
export HERMES_LEGACY_EMPTY_RECOVERY=1
```

The legacy escape hatch is intended as a temporary safety net
while operators validate the thin-client architecture. Once the
devagentic-side cascade is universally deployed + observed to
cover all empty-content cases, the legacy gate can be removed in
a follow-up (the env var stays as a no-op for backward compat).

## Tests

- 23 new tests in
  ``tests/agent/test_t2_legacy_empty_recovery_gate.py``:
  resolver default-false / empty-false / 6 truthy / 6 falsy-or-
  unknown; source-level gate assertions for each of the four
  recovery branches; preserved-path assertions for cascade_exhausted
  + raw tool_calls fallback location; default-off + opt-in
  resolver round-trip.
- 163 total green across affected suites (T2 + existing empty-
  terminal mirror tests + finish_reason_tools_recovery +
  cascade_exhausted + internal_marker_stripping +
  tool_call_type_default + tool_use_enforcement + T3 + T1 persona
  + doctor persona probe).

## Composition

Per #143 sequencing:
- T3 / PR #145 (clarify default-out) — merged ✓
- T1 / PR #146 (persona default-flip) — merged ✓
- **T2 / this PR (empty-content recovery removal, env-gated)**
- Later: T4-T6 (tool list / iteration cap / summary fallback)

The legacy escape hatch design choice (vs. full removal) lets
operators roll back per-deployment if a previously-recovered edge
case surfaces in the field. Diagnostic from #140
(HERMES_DIAG_RAW_CAPTURE) remains the observability tool to spot
any uncovered case.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

companion to devagentic#324: short-circuit retry on cascade_exhausted sentinel + surface trace_id

1 participant