fix(agent): emit guardrail-halt message to client before closing stream#31448
Merged
Conversation
Contributor
🔎 Lint report:
|
| Rule | Count |
|---|---|
unresolved-attribute |
2 |
First entries
tests/run_agent/test_tool_call_guardrail_runtime.py:335: [unresolved-attribute] unresolved-attribute: Unresolved attribute `_disable_streaming` on type `AIAgent`
tests/run_agent/test_tool_call_guardrail_runtime.py:330: [unresolved-attribute] unresolved-attribute: Unresolved attribute `stream_delta_callback` on type `AIAgent`
✅ Fixed issues: none
Unchanged: 4833 pre-existing issues carried over.
Diagnostics are surfaced as warnings — this check never fails the build.
19 tasks
When the tool loop guardrail fires (max_tool_failures, etc.), the turn exits with guardrail_halt but no final assistant message was emitted to the client. The SSE stream closed silently — indistinguishable from a crash. The stream_delta_callback(None) before tool execution is a display flush, not a hard close. After generating the halt response, emit it through both _safe_print (CLI) and stream_delta_callback (SSE) so clients see the explanation. Fixes #30770
Regression guard for #30770 — verifies the guardrail-halt branch in agent/conversation_loop.py pushes the synthesized halt message through stream_delta_callback before breaking out of the loop. Without the emit, chat-completions SSE writers drain an empty queue and clients (Open WebUI, etc.) see a finish chunk with zero content delta — indistinguishable from a crash. Verified: the test fails when the production fix is reverted.
5db53c4 to
c44fafb
Compare
1 task
teknium1
added a commit
that referenced
this pull request
May 27, 2026
) Background processes whose command contains `gh pr view --json statusCheckRollup` or `gh pr checks | jq` now get a runtime hint in the result pointing at the canonical green-ci-policy snippets. The homebrew shape has caused at least seven silent CI-watcher failures in the past two weeks (#31329, #31448, #31695, #31709, #31745, #32264, #33131) — each one a different jq/awk/grep variation of the same fundamental problem (stdout buffering, jq null-key edge cases, conclusion-vs-status confusion, TTY-only banner grepping). The skill that documents this anti-pattern is excellent, but a skill only fires if the agent loads it. The tool surface fires on every misuse. This is the embed-footguns-in-tool-surface pattern from PR #31289 applied to a recurring failure mode that's outgrown skill-only enforcement. Detector is deliberately narrow — flags two specific shapes: 1. Any command containing `statusCheckRollup` (the JSON-API path — conclusion vs status field semantics keep burning us). 2. `gh pr view` / `gh pr checks` combined with `jq` (gh pr checks doesn't emit JSON, so any `| jq` here is confused intent; the canonical column-2 poller uses awk-on-tabs, not jq). Does NOT flag the blessed column-2 awk-on-tabs poller (which uses `awk -F"\t" "\==\"pending\""`) or the exit-code-driven `gh pr checks $PR >/dev/null` snippet. Hint composes with the existing background-without-notify_on_complete hint — both can fire on the same call. Each is independently actionable. Tests: - 4 new cases in tests/tools/test_notify_on_complete.py - test_homebrew_ci_poller_via_statusCheckRollup_emits_hint (positive) - test_homebrew_ci_poller_via_gh_pr_checks_piped_to_jq_emits_hint (positive) - test_canonical_column2_awk_poller_does_not_emit_homebrew_hint (negative) - test_canonical_gh_pr_checks_exit_code_loop_does_not_emit_hint (negative) - test_non_ci_background_command_does_not_emit_homebrew_hint (negative) - 30/30 passing (was 26)
mathias3
pushed a commit
to mathias3/hermes-agent
that referenced
this pull request
May 28, 2026
…sResearch#33142) Background processes whose command contains `gh pr view --json statusCheckRollup` or `gh pr checks | jq` now get a runtime hint in the result pointing at the canonical green-ci-policy snippets. The homebrew shape has caused at least seven silent CI-watcher failures in the past two weeks (NousResearch#31329, NousResearch#31448, NousResearch#31695, NousResearch#31709, NousResearch#31745, NousResearch#32264, NousResearch#33131) — each one a different jq/awk/grep variation of the same fundamental problem (stdout buffering, jq null-key edge cases, conclusion-vs-status confusion, TTY-only banner grepping). The skill that documents this anti-pattern is excellent, but a skill only fires if the agent loads it. The tool surface fires on every misuse. This is the embed-footguns-in-tool-surface pattern from PR NousResearch#31289 applied to a recurring failure mode that's outgrown skill-only enforcement. Detector is deliberately narrow — flags two specific shapes: 1. Any command containing `statusCheckRollup` (the JSON-API path — conclusion vs status field semantics keep burning us). 2. `gh pr view` / `gh pr checks` combined with `jq` (gh pr checks doesn't emit JSON, so any `| jq` here is confused intent; the canonical column-2 poller uses awk-on-tabs, not jq). Does NOT flag the blessed column-2 awk-on-tabs poller (which uses `awk -F"\t" "\==\"pending\""`) or the exit-code-driven `gh pr checks $PR >/dev/null` snippet. Hint composes with the existing background-without-notify_on_complete hint — both can fire on the same call. Each is independently actionable. Tests: - 4 new cases in tests/tools/test_notify_on_complete.py - test_homebrew_ci_poller_via_statusCheckRollup_emits_hint (positive) - test_homebrew_ci_poller_via_gh_pr_checks_piped_to_jq_emits_hint (positive) - test_canonical_column2_awk_poller_does_not_emit_homebrew_hint (negative) - test_canonical_gh_pr_checks_exit_code_loop_does_not_emit_hint (negative) - test_non_ci_background_command_does_not_emit_homebrew_hint (negative) - 30/30 passing (was 26)
Bryce-huang
pushed a commit
to wbkunlun/hermes-agent
that referenced
this pull request
May 29, 2026
…sResearch#33142) Background processes whose command contains `gh pr view --json statusCheckRollup` or `gh pr checks | jq` now get a runtime hint in the result pointing at the canonical green-ci-policy snippets. The homebrew shape has caused at least seven silent CI-watcher failures in the past two weeks (NousResearch#31329, NousResearch#31448, NousResearch#31695, NousResearch#31709, NousResearch#31745, NousResearch#32264, NousResearch#33131) — each one a different jq/awk/grep variation of the same fundamental problem (stdout buffering, jq null-key edge cases, conclusion-vs-status confusion, TTY-only banner grepping). The skill that documents this anti-pattern is excellent, but a skill only fires if the agent loads it. The tool surface fires on every misuse. This is the embed-footguns-in-tool-surface pattern from PR NousResearch#31289 applied to a recurring failure mode that's outgrown skill-only enforcement. Detector is deliberately narrow — flags two specific shapes: 1. Any command containing `statusCheckRollup` (the JSON-API path — conclusion vs status field semantics keep burning us). 2. `gh pr view` / `gh pr checks` combined with `jq` (gh pr checks doesn't emit JSON, so any `| jq` here is confused intent; the canonical column-2 poller uses awk-on-tabs, not jq). Does NOT flag the blessed column-2 awk-on-tabs poller (which uses `awk -F"\t" "\==\"pending\""`) or the exit-code-driven `gh pr checks $PR >/dev/null` snippet. Hint composes with the existing background-without-notify_on_complete hint — both can fire on the same call. Each is independently actionable. Tests: - 4 new cases in tests/tools/test_notify_on_complete.py - test_homebrew_ci_poller_via_statusCheckRollup_emits_hint (positive) - test_homebrew_ci_poller_via_gh_pr_checks_piped_to_jq_emits_hint (positive) - test_canonical_column2_awk_poller_does_not_emit_homebrew_hint (negative) - test_canonical_gh_pr_checks_exit_code_loop_does_not_emit_hint (negative) - test_non_ci_background_command_does_not_emit_homebrew_hint (negative) - 30/30 passing (was 26) #AI commit#
mosaiq-systems
pushed a commit
to mosaiq-systems/hermes-agent
that referenced
this pull request
May 29, 2026
…sResearch#33142) Background processes whose command contains `gh pr view --json statusCheckRollup` or `gh pr checks | jq` now get a runtime hint in the result pointing at the canonical green-ci-policy snippets. The homebrew shape has caused at least seven silent CI-watcher failures in the past two weeks (NousResearch#31329, NousResearch#31448, NousResearch#31695, NousResearch#31709, NousResearch#31745, NousResearch#32264, NousResearch#33131) — each one a different jq/awk/grep variation of the same fundamental problem (stdout buffering, jq null-key edge cases, conclusion-vs-status confusion, TTY-only banner grepping). The skill that documents this anti-pattern is excellent, but a skill only fires if the agent loads it. The tool surface fires on every misuse. This is the embed-footguns-in-tool-surface pattern from PR NousResearch#31289 applied to a recurring failure mode that's outgrown skill-only enforcement. Detector is deliberately narrow — flags two specific shapes: 1. Any command containing `statusCheckRollup` (the JSON-API path — conclusion vs status field semantics keep burning us). 2. `gh pr view` / `gh pr checks` combined with `jq` (gh pr checks doesn't emit JSON, so any `| jq` here is confused intent; the canonical column-2 poller uses awk-on-tabs, not jq). Does NOT flag the blessed column-2 awk-on-tabs poller (which uses `awk -F"\t" "\==\"pending\""`) or the exit-code-driven `gh pr checks $PR >/dev/null` snippet. Hint composes with the existing background-without-notify_on_complete hint — both can fire on the same call. Each is independently actionable. Tests: - 4 new cases in tests/tools/test_notify_on_complete.py - test_homebrew_ci_poller_via_statusCheckRollup_emits_hint (positive) - test_homebrew_ci_poller_via_gh_pr_checks_piped_to_jq_emits_hint (positive) - test_canonical_column2_awk_poller_does_not_emit_homebrew_hint (negative) - test_canonical_gh_pr_checks_exit_code_loop_does_not_emit_hint (negative) - test_non_ci_background_command_does_not_emit_homebrew_hint (negative) - 30/30 passing (was 26)
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Guardrail halts now deliver the synthesized halt message to SSE/TUI clients instead of closing the stream silently. Fixes #30770.
Root cause
In
agent/conversation_loop.pythe guardrail-halt branch (~L3431) synthesizesfinal_responsefrom a static template (_toolguard_controlled_halt_response) — it was never produced by the model and never streamed. The display callback was already flushed withNoneat L3423 before tool execution, so the SSE writer (gateway/platforms/api_server.py:_write_sse_chat_completion) drains an empty queue and emits a finish chunk with zero content. Open WebUI and similar clients render a blank bubble — indistinguishable from a crash.The model-generated
text_responsepath doesn't have this gap because its deltas were already streamed during the API call.Changes
agent/conversation_loop.py— after the halt response is generated, push it through_safe_print(CLI/TUI) andstream_delta_callback(SSE) before breaking. TrailingNoneis filtered by the SSE writer (_on_deltaat api_server.py:L1153) so it doesn't prematurely close the stream. (+13 LOC, salvaged from PR fix: emit guardrail halt message to client before closing stream #30807 by @annguyenNous)tests/run_agent/test_tool_call_guardrail_runtime.py— regression test asserting the halt message reachesstream_delta_callback. Verified to fail when the production fix is reverted.Validation
scripts/run_tests.sh tests/run_agent/test_tool_call_guardrail_runtime.py→ 9/9 pass.Follow-up
Filing a separate issue for the two sibling sites with the same shape —
partial_stream_recovery(~L3556) andfallback_prior_turn_content(~L3582) — wherefinal_responseis synthesized from non-streamed text. Lower-impact than the guardrail case (those paths only trigger on partial/empty-response recovery) but the gap is structurally the same.Closes #30770
Salvages #30807
Infographic