Skip to content

fix(agent): emit guardrail-halt message to client before closing stream#31448

Merged
teknium1 merged 2 commits into
mainfrom
hermes/hermes-60de4c26
May 24, 2026
Merged

fix(agent): emit guardrail-halt message to client before closing stream#31448
teknium1 merged 2 commits into
mainfrom
hermes/hermes-60de4c26

Conversation

@teknium1

@teknium1 teknium1 commented May 24, 2026

Copy link
Copy Markdown
Contributor

Summary

Guardrail halts now deliver the synthesized halt message to SSE/TUI clients instead of closing the stream silently. Fixes #30770.

Root cause

In agent/conversation_loop.py the guardrail-halt branch (~L3431) synthesizes final_response from a static template (_toolguard_controlled_halt_response) — it was never produced by the model and never streamed. The display callback was already flushed with None at L3423 before tool execution, so the SSE writer (gateway/platforms/api_server.py:_write_sse_chat_completion) drains an empty queue and emits a finish chunk with zero content. Open WebUI and similar clients render a blank bubble — indistinguishable from a crash.

The model-generated text_response path doesn't have this gap because its deltas were already streamed during the API call.

Changes

  • agent/conversation_loop.py — after the halt response is generated, push it through _safe_print (CLI/TUI) and stream_delta_callback (SSE) before breaking. Trailing None is filtered by the SSE writer (_on_delta at api_server.py:L1153) so it doesn't prematurely close the stream. (+13 LOC, salvaged from PR fix: emit guardrail halt message to client before closing stream #30807 by @annguyenNous)
  • tests/run_agent/test_tool_call_guardrail_runtime.py — regression test asserting the halt message reaches stream_delta_callback. Verified to fail when the production fix is reverted.

Validation

Before After
SSE client (Open WebUI) empty stream, then finish chunk content delta with halt explanation, then finish chunk
Targeted suite 8 passing 9 passing (regression test added)

scripts/run_tests.sh tests/run_agent/test_tool_call_guardrail_runtime.py → 9/9 pass.

Follow-up

Filing a separate issue for the two sibling sites with the same shape — partial_stream_recovery (~L3556) and fallback_prior_turn_content (~L3582) — where final_response is synthesized from non-streamed text. Lower-impact than the guardrail case (those paths only trigger on partial/empty-response recovery) but the gap is structurally the same.

Closes #30770
Salvages #30807

Infographic

guardrail-halt-no-more-silent-streams

@github-actions

github-actions Bot commented May 24, 2026

Copy link
Copy Markdown
Contributor

🔎 Lint report: hermes/hermes-60de4c26 vs origin/main

ruff

Total: 0 on HEAD, 0 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 0 pre-existing issues carried over.

ty (type checker)

Total: 9079 on HEAD, 9076 on base (🆕 +3)

🆕 New issues (2):

Rule Count
unresolved-attribute 2
First entries
tests/run_agent/test_tool_call_guardrail_runtime.py:335: [unresolved-attribute] unresolved-attribute: Unresolved attribute `_disable_streaming` on type `AIAgent`
tests/run_agent/test_tool_call_guardrail_runtime.py:330: [unresolved-attribute] unresolved-attribute: Unresolved attribute `stream_delta_callback` on type `AIAgent`

✅ Fixed issues: none

Unchanged: 4833 pre-existing issues carried over.

Diagnostics are surfaced as warnings — this check never fails the build.

@alt-glitch alt-glitch added type/bug Something isn't working P1 High — major feature broken, no workaround comp/agent Core agent loop, run_agent.py, prompt builder comp/gateway Gateway runner, session dispatch, delivery labels May 24, 2026
vanthinh6886 and others added 2 commits May 24, 2026 06:54
When the tool loop guardrail fires (max_tool_failures, etc.), the
turn exits with guardrail_halt but no final assistant message was
emitted to the client. The SSE stream closed silently —
indistinguishable from a crash.

The stream_delta_callback(None) before tool execution is a display
flush, not a hard close. After generating the halt response, emit
it through both _safe_print (CLI) and stream_delta_callback (SSE)
so clients see the explanation.

Fixes #30770
Regression guard for #30770 — verifies the guardrail-halt branch in
agent/conversation_loop.py pushes the synthesized halt message through
stream_delta_callback before breaking out of the loop.  Without the
emit, chat-completions SSE writers drain an empty queue and clients
(Open WebUI, etc.) see a finish chunk with zero content delta —
indistinguishable from a crash.

Verified: the test fails when the production fix is reverted.
@teknium1 teknium1 force-pushed the hermes/hermes-60de4c26 branch from 5db53c4 to c44fafb Compare May 24, 2026 13:54
@teknium1 teknium1 merged commit 186bf25 into main May 24, 2026
26 checks passed
@teknium1 teknium1 deleted the hermes/hermes-60de4c26 branch May 24, 2026 14:38
teknium1 added a commit that referenced this pull request May 27, 2026
)

Background processes whose command contains `gh pr view --json
statusCheckRollup` or `gh pr checks | jq` now get a runtime hint in
the result pointing at the canonical green-ci-policy snippets. The
homebrew shape has caused at least seven silent CI-watcher failures
in the past two weeks (#31329, #31448, #31695, #31709, #31745,
#32264, #33131) — each one a different jq/awk/grep variation of the
same fundamental problem (stdout buffering, jq null-key edge cases,
conclusion-vs-status confusion, TTY-only banner grepping).

The skill that documents this anti-pattern is excellent, but a skill
only fires if the agent loads it. The tool surface fires on every
misuse. This is the embed-footguns-in-tool-surface pattern from
PR #31289 applied to a recurring failure mode that's outgrown
skill-only enforcement.

Detector is deliberately narrow — flags two specific shapes:

  1. Any command containing `statusCheckRollup` (the JSON-API path —
     conclusion vs status field semantics keep burning us).
  2. `gh pr view` / `gh pr checks` combined with `jq` (gh pr
     checks doesn't emit JSON, so any `| jq` here is confused intent;
     the canonical column-2 poller uses awk-on-tabs, not jq).

Does NOT flag the blessed column-2 awk-on-tabs poller (which uses
`awk -F"\t" "\==\"pending\""`) or the exit-code-driven
`gh pr checks $PR >/dev/null` snippet.

Hint composes with the existing background-without-notify_on_complete
hint — both can fire on the same call. Each is independently
actionable.

Tests:
- 4 new cases in tests/tools/test_notify_on_complete.py
- test_homebrew_ci_poller_via_statusCheckRollup_emits_hint (positive)
- test_homebrew_ci_poller_via_gh_pr_checks_piped_to_jq_emits_hint (positive)
- test_canonical_column2_awk_poller_does_not_emit_homebrew_hint (negative)
- test_canonical_gh_pr_checks_exit_code_loop_does_not_emit_hint (negative)
- test_non_ci_background_command_does_not_emit_homebrew_hint (negative)
- 30/30 passing (was 26)
mathias3 pushed a commit to mathias3/hermes-agent that referenced this pull request May 28, 2026
…sResearch#33142)

Background processes whose command contains `gh pr view --json
statusCheckRollup` or `gh pr checks | jq` now get a runtime hint in
the result pointing at the canonical green-ci-policy snippets. The
homebrew shape has caused at least seven silent CI-watcher failures
in the past two weeks (NousResearch#31329, NousResearch#31448, NousResearch#31695, NousResearch#31709, NousResearch#31745,
NousResearch#32264, NousResearch#33131) — each one a different jq/awk/grep variation of the
same fundamental problem (stdout buffering, jq null-key edge cases,
conclusion-vs-status confusion, TTY-only banner grepping).

The skill that documents this anti-pattern is excellent, but a skill
only fires if the agent loads it. The tool surface fires on every
misuse. This is the embed-footguns-in-tool-surface pattern from
PR NousResearch#31289 applied to a recurring failure mode that's outgrown
skill-only enforcement.

Detector is deliberately narrow — flags two specific shapes:

  1. Any command containing `statusCheckRollup` (the JSON-API path —
     conclusion vs status field semantics keep burning us).
  2. `gh pr view` / `gh pr checks` combined with `jq` (gh pr
     checks doesn't emit JSON, so any `| jq` here is confused intent;
     the canonical column-2 poller uses awk-on-tabs, not jq).

Does NOT flag the blessed column-2 awk-on-tabs poller (which uses
`awk -F"\t" "\==\"pending\""`) or the exit-code-driven
`gh pr checks $PR >/dev/null` snippet.

Hint composes with the existing background-without-notify_on_complete
hint — both can fire on the same call. Each is independently
actionable.

Tests:
- 4 new cases in tests/tools/test_notify_on_complete.py
- test_homebrew_ci_poller_via_statusCheckRollup_emits_hint (positive)
- test_homebrew_ci_poller_via_gh_pr_checks_piped_to_jq_emits_hint (positive)
- test_canonical_column2_awk_poller_does_not_emit_homebrew_hint (negative)
- test_canonical_gh_pr_checks_exit_code_loop_does_not_emit_hint (negative)
- test_non_ci_background_command_does_not_emit_homebrew_hint (negative)
- 30/30 passing (was 26)
Bryce-huang pushed a commit to wbkunlun/hermes-agent that referenced this pull request May 29, 2026
…sResearch#33142)

Background processes whose command contains `gh pr view --json
statusCheckRollup` or `gh pr checks | jq` now get a runtime hint in
the result pointing at the canonical green-ci-policy snippets. The
homebrew shape has caused at least seven silent CI-watcher failures
in the past two weeks (NousResearch#31329, NousResearch#31448, NousResearch#31695, NousResearch#31709, NousResearch#31745,
NousResearch#32264, NousResearch#33131) — each one a different jq/awk/grep variation of the
same fundamental problem (stdout buffering, jq null-key edge cases,
conclusion-vs-status confusion, TTY-only banner grepping).

The skill that documents this anti-pattern is excellent, but a skill
only fires if the agent loads it. The tool surface fires on every
misuse. This is the embed-footguns-in-tool-surface pattern from
PR NousResearch#31289 applied to a recurring failure mode that's outgrown
skill-only enforcement.

Detector is deliberately narrow — flags two specific shapes:

  1. Any command containing `statusCheckRollup` (the JSON-API path —
     conclusion vs status field semantics keep burning us).
  2. `gh pr view` / `gh pr checks` combined with `jq` (gh pr
     checks doesn't emit JSON, so any `| jq` here is confused intent;
     the canonical column-2 poller uses awk-on-tabs, not jq).

Does NOT flag the blessed column-2 awk-on-tabs poller (which uses
`awk -F"\t" "\==\"pending\""`) or the exit-code-driven
`gh pr checks $PR >/dev/null` snippet.

Hint composes with the existing background-without-notify_on_complete
hint — both can fire on the same call. Each is independently
actionable.

Tests:
- 4 new cases in tests/tools/test_notify_on_complete.py
- test_homebrew_ci_poller_via_statusCheckRollup_emits_hint (positive)
- test_homebrew_ci_poller_via_gh_pr_checks_piped_to_jq_emits_hint (positive)
- test_canonical_column2_awk_poller_does_not_emit_homebrew_hint (negative)
- test_canonical_gh_pr_checks_exit_code_loop_does_not_emit_hint (negative)
- test_non_ci_background_command_does_not_emit_homebrew_hint (negative)
- 30/30 passing (was 26)
#AI commit#
mosaiq-systems pushed a commit to mosaiq-systems/hermes-agent that referenced this pull request May 29, 2026
…sResearch#33142)

Background processes whose command contains `gh pr view --json
statusCheckRollup` or `gh pr checks | jq` now get a runtime hint in
the result pointing at the canonical green-ci-policy snippets. The
homebrew shape has caused at least seven silent CI-watcher failures
in the past two weeks (NousResearch#31329, NousResearch#31448, NousResearch#31695, NousResearch#31709, NousResearch#31745,
NousResearch#32264, NousResearch#33131) — each one a different jq/awk/grep variation of the
same fundamental problem (stdout buffering, jq null-key edge cases,
conclusion-vs-status confusion, TTY-only banner grepping).

The skill that documents this anti-pattern is excellent, but a skill
only fires if the agent loads it. The tool surface fires on every
misuse. This is the embed-footguns-in-tool-surface pattern from
PR NousResearch#31289 applied to a recurring failure mode that's outgrown
skill-only enforcement.

Detector is deliberately narrow — flags two specific shapes:

  1. Any command containing `statusCheckRollup` (the JSON-API path —
     conclusion vs status field semantics keep burning us).
  2. `gh pr view` / `gh pr checks` combined with `jq` (gh pr
     checks doesn't emit JSON, so any `| jq` here is confused intent;
     the canonical column-2 poller uses awk-on-tabs, not jq).

Does NOT flag the blessed column-2 awk-on-tabs poller (which uses
`awk -F"\t" "\==\"pending\""`) or the exit-code-driven
`gh pr checks $PR >/dev/null` snippet.

Hint composes with the existing background-without-notify_on_complete
hint — both can fire on the same call. Each is independently
actionable.

Tests:
- 4 new cases in tests/tools/test_notify_on_complete.py
- test_homebrew_ci_poller_via_statusCheckRollup_emits_hint (positive)
- test_homebrew_ci_poller_via_gh_pr_checks_piped_to_jq_emits_hint (positive)
- test_canonical_column2_awk_poller_does_not_emit_homebrew_hint (negative)
- test_canonical_gh_pr_checks_exit_code_loop_does_not_emit_hint (negative)
- test_non_ci_background_command_does_not_emit_homebrew_hint (negative)
- 30/30 passing (was 26)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/agent Core agent loop, run_agent.py, prompt builder comp/gateway Gateway runner, session dispatch, delivery P1 High — major feature broken, no workaround type/bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

guardrail_halt exits silently — no final assistant message delivered to client

3 participants