fix(agent): emit guardrail-halt message to client before closing stream by teknium1 · Pull Request #31448 · NousResearch/hermes-agent

teknium1 · 2026-05-24T11:34:49Z

Summary

Guardrail halts now deliver the synthesized halt message to SSE/TUI clients instead of closing the stream silently. Fixes #30770.

Root cause

In agent/conversation_loop.py the guardrail-halt branch (~L3431) synthesizes final_response from a static template (_toolguard_controlled_halt_response) — it was never produced by the model and never streamed. The display callback was already flushed with None at L3423 before tool execution, so the SSE writer (gateway/platforms/api_server.py:_write_sse_chat_completion) drains an empty queue and emits a finish chunk with zero content. Open WebUI and similar clients render a blank bubble — indistinguishable from a crash.

The model-generated text_response path doesn't have this gap because its deltas were already streamed during the API call.

Changes

agent/conversation_loop.py — after the halt response is generated, push it through _safe_print (CLI/TUI) and stream_delta_callback (SSE) before breaking. Trailing None is filtered by the SSE writer (_on_delta at api_server.py:L1153) so it doesn't prematurely close the stream. (+13 LOC, salvaged from PR fix: emit guardrail halt message to client before closing stream #30807 by @annguyenNous)
tests/run_agent/test_tool_call_guardrail_runtime.py — regression test asserting the halt message reaches stream_delta_callback. Verified to fail when the production fix is reverted.

Validation

	Before	After
SSE client (Open WebUI)	empty stream, then finish chunk	content delta with halt explanation, then finish chunk
Targeted suite	8 passing	9 passing (regression test added)

scripts/run_tests.sh tests/run_agent/test_tool_call_guardrail_runtime.py → 9/9 pass.

Follow-up

Filing a separate issue for the two sibling sites with the same shape — partial_stream_recovery (~L3556) and fallback_prior_turn_content (~L3582) — where final_response is synthesized from non-streamed text. Lower-impact than the guardrail case (those paths only trigger on partial/empty-response recovery) but the gap is structurally the same.

Closes #30770
Salvages #30807

Infographic

github-actions · 2026-05-24T11:37:14Z

🔎 Lint report: `hermes/hermes-60de4c26` vs `origin/main`

ruff

Total: 0 on HEAD, 0 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 0 pre-existing issues carried over.

ty (type checker)

Total: 9079 on HEAD, 9076 on base (🆕 +3)

🆕 New issues (2):

Rule	Count
`unresolved-attribute`	2

First entries

tests/run_agent/test_tool_call_guardrail_runtime.py:335: [unresolved-attribute] unresolved-attribute: Unresolved attribute `_disable_streaming` on type `AIAgent`
tests/run_agent/test_tool_call_guardrail_runtime.py:330: [unresolved-attribute] unresolved-attribute: Unresolved attribute `stream_delta_callback` on type `AIAgent`

✅ Fixed issues: none

Unchanged: 4833 pre-existing issues carried over.

Diagnostics are surfaced as warnings — this check never fails the build.

When the tool loop guardrail fires (max_tool_failures, etc.), the turn exits with guardrail_halt but no final assistant message was emitted to the client. The SSE stream closed silently — indistinguishable from a crash. The stream_delta_callback(None) before tool execution is a display flush, not a hard close. After generating the halt response, emit it through both _safe_print (CLI) and stream_delta_callback (SSE) so clients see the explanation. Fixes #30770

Regression guard for #30770 — verifies the guardrail-halt branch in agent/conversation_loop.py pushes the synthesized halt message through stream_delta_callback before breaking out of the loop. Without the emit, chat-completions SSE writers drain an empty queue and clients (Open WebUI, etc.) see a finish chunk with zero content delta — indistinguishable from a crash. Verified: the test fails when the production fix is reverted.

) Background processes whose command contains `gh pr view --json statusCheckRollup` or `gh pr checks | jq` now get a runtime hint in the result pointing at the canonical green-ci-policy snippets. The homebrew shape has caused at least seven silent CI-watcher failures in the past two weeks (#31329, #31448, #31695, #31709, #31745, #32264, #33131) — each one a different jq/awk/grep variation of the same fundamental problem (stdout buffering, jq null-key edge cases, conclusion-vs-status confusion, TTY-only banner grepping). The skill that documents this anti-pattern is excellent, but a skill only fires if the agent loads it. The tool surface fires on every misuse. This is the embed-footguns-in-tool-surface pattern from PR #31289 applied to a recurring failure mode that's outgrown skill-only enforcement. Detector is deliberately narrow — flags two specific shapes: 1. Any command containing `statusCheckRollup` (the JSON-API path — conclusion vs status field semantics keep burning us). 2. `gh pr view` / `gh pr checks` combined with `jq` (gh pr checks doesn't emit JSON, so any `| jq` here is confused intent; the canonical column-2 poller uses awk-on-tabs, not jq). Does NOT flag the blessed column-2 awk-on-tabs poller (which uses `awk -F"\t" "\==\"pending\""`) or the exit-code-driven `gh pr checks $PR >/dev/null` snippet. Hint composes with the existing background-without-notify_on_complete hint — both can fire on the same call. Each is independently actionable. Tests: - 4 new cases in tests/tools/test_notify_on_complete.py - test_homebrew_ci_poller_via_statusCheckRollup_emits_hint (positive) - test_homebrew_ci_poller_via_gh_pr_checks_piped_to_jq_emits_hint (positive) - test_canonical_column2_awk_poller_does_not_emit_homebrew_hint (negative) - test_canonical_gh_pr_checks_exit_code_loop_does_not_emit_hint (negative) - test_non_ci_background_command_does_not_emit_homebrew_hint (negative) - 30/30 passing (was 26)

…sResearch#33142) Background processes whose command contains `gh pr view --json statusCheckRollup` or `gh pr checks | jq` now get a runtime hint in the result pointing at the canonical green-ci-policy snippets. The homebrew shape has caused at least seven silent CI-watcher failures in the past two weeks (NousResearch#31329, NousResearch#31448, NousResearch#31695, NousResearch#31709, NousResearch#31745, NousResearch#32264, NousResearch#33131) — each one a different jq/awk/grep variation of the same fundamental problem (stdout buffering, jq null-key edge cases, conclusion-vs-status confusion, TTY-only banner grepping). The skill that documents this anti-pattern is excellent, but a skill only fires if the agent loads it. The tool surface fires on every misuse. This is the embed-footguns-in-tool-surface pattern from PR NousResearch#31289 applied to a recurring failure mode that's outgrown skill-only enforcement. Detector is deliberately narrow — flags two specific shapes: 1. Any command containing `statusCheckRollup` (the JSON-API path — conclusion vs status field semantics keep burning us). 2. `gh pr view` / `gh pr checks` combined with `jq` (gh pr checks doesn't emit JSON, so any `| jq` here is confused intent; the canonical column-2 poller uses awk-on-tabs, not jq). Does NOT flag the blessed column-2 awk-on-tabs poller (which uses `awk -F"\t" "\==\"pending\""`) or the exit-code-driven `gh pr checks $PR >/dev/null` snippet. Hint composes with the existing background-without-notify_on_complete hint — both can fire on the same call. Each is independently actionable. Tests: - 4 new cases in tests/tools/test_notify_on_complete.py - test_homebrew_ci_poller_via_statusCheckRollup_emits_hint (positive) - test_homebrew_ci_poller_via_gh_pr_checks_piped_to_jq_emits_hint (positive) - test_canonical_column2_awk_poller_does_not_emit_homebrew_hint (negative) - test_canonical_gh_pr_checks_exit_code_loop_does_not_emit_hint (negative) - test_non_ci_background_command_does_not_emit_homebrew_hint (negative) - 30/30 passing (was 26)

…sResearch#33142) Background processes whose command contains `gh pr view --json statusCheckRollup` or `gh pr checks | jq` now get a runtime hint in the result pointing at the canonical green-ci-policy snippets. The homebrew shape has caused at least seven silent CI-watcher failures in the past two weeks (NousResearch#31329, NousResearch#31448, NousResearch#31695, NousResearch#31709, NousResearch#31745, NousResearch#32264, NousResearch#33131) — each one a different jq/awk/grep variation of the same fundamental problem (stdout buffering, jq null-key edge cases, conclusion-vs-status confusion, TTY-only banner grepping). The skill that documents this anti-pattern is excellent, but a skill only fires if the agent loads it. The tool surface fires on every misuse. This is the embed-footguns-in-tool-surface pattern from PR NousResearch#31289 applied to a recurring failure mode that's outgrown skill-only enforcement. Detector is deliberately narrow — flags two specific shapes: 1. Any command containing `statusCheckRollup` (the JSON-API path — conclusion vs status field semantics keep burning us). 2. `gh pr view` / `gh pr checks` combined with `jq` (gh pr checks doesn't emit JSON, so any `| jq` here is confused intent; the canonical column-2 poller uses awk-on-tabs, not jq). Does NOT flag the blessed column-2 awk-on-tabs poller (which uses `awk -F"\t" "\==\"pending\""`) or the exit-code-driven `gh pr checks $PR >/dev/null` snippet. Hint composes with the existing background-without-notify_on_complete hint — both can fire on the same call. Each is independently actionable. Tests: - 4 new cases in tests/tools/test_notify_on_complete.py - test_homebrew_ci_poller_via_statusCheckRollup_emits_hint (positive) - test_homebrew_ci_poller_via_gh_pr_checks_piped_to_jq_emits_hint (positive) - test_canonical_column2_awk_poller_does_not_emit_homebrew_hint (negative) - test_canonical_gh_pr_checks_exit_code_loop_does_not_emit_hint (negative) - test_non_ci_background_command_does_not_emit_homebrew_hint (negative) - 30/30 passing (was 26) #AI commit#

…sResearch#33142) Background processes whose command contains `gh pr view --json statusCheckRollup` or `gh pr checks | jq` now get a runtime hint in the result pointing at the canonical green-ci-policy snippets. The homebrew shape has caused at least seven silent CI-watcher failures in the past two weeks (NousResearch#31329, NousResearch#31448, NousResearch#31695, NousResearch#31709, NousResearch#31745, NousResearch#32264, NousResearch#33131) — each one a different jq/awk/grep variation of the same fundamental problem (stdout buffering, jq null-key edge cases, conclusion-vs-status confusion, TTY-only banner grepping). The skill that documents this anti-pattern is excellent, but a skill only fires if the agent loads it. The tool surface fires on every misuse. This is the embed-footguns-in-tool-surface pattern from PR NousResearch#31289 applied to a recurring failure mode that's outgrown skill-only enforcement. Detector is deliberately narrow — flags two specific shapes: 1. Any command containing `statusCheckRollup` (the JSON-API path — conclusion vs status field semantics keep burning us). 2. `gh pr view` / `gh pr checks` combined with `jq` (gh pr checks doesn't emit JSON, so any `| jq` here is confused intent; the canonical column-2 poller uses awk-on-tabs, not jq). Does NOT flag the blessed column-2 awk-on-tabs poller (which uses `awk -F"\t" "\==\"pending\""`) or the exit-code-driven `gh pr checks $PR >/dev/null` snippet. Hint composes with the existing background-without-notify_on_complete hint — both can fire on the same call. Each is independently actionable. Tests: - 4 new cases in tests/tools/test_notify_on_complete.py - test_homebrew_ci_poller_via_statusCheckRollup_emits_hint (positive) - test_homebrew_ci_poller_via_gh_pr_checks_piped_to_jq_emits_hint (positive) - test_canonical_column2_awk_poller_does_not_emit_homebrew_hint (negative) - test_canonical_gh_pr_checks_exit_code_loop_does_not_emit_hint (negative) - test_non_ci_background_command_does_not_emit_homebrew_hint (negative) - 30/30 passing (was 26)

teknium1 mentioned this pull request May 24, 2026

guardrail-halt-style silent stream close at partial_stream_recovery + fallback_prior_turn_content sites #31449

Open

alt-glitch added type/bug Something isn't working P1 High — major feature broken, no workaround comp/agent Core agent loop, run_agent.py, prompt builder comp/gateway Gateway runner, session dispatch, delivery labels May 24, 2026

briandevans mentioned this pull request May 24, 2026

fix(agent): emit recovery-path final response through stream_delta_callback #31477

Open

19 tasks

vanthinh6886 and others added 2 commits May 24, 2026 06:54

teknium1 force-pushed the hermes/hermes-60de4c26 branch from 5db53c4 to c44fafb Compare May 24, 2026 13:54

teknium1 merged commit 186bf25 into main May 24, 2026
26 checks passed

teknium1 deleted the hermes/hermes-60de4c26 branch May 24, 2026 14:38

haran2001 mentioned this pull request May 24, 2026

fix(agent): emit recovered/fallback text to stream clients before close (#31449) #31539

Open

1 task

teknium1 mentioned this pull request May 27, 2026

tools(terminal): nudge homebrewed CI pollers at the tool surface #33142

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(agent): emit guardrail-halt message to client before closing stream#31448

fix(agent): emit guardrail-halt message to client before closing stream#31448
teknium1 merged 2 commits into
mainfrom
hermes/hermes-60de4c26

teknium1 commented May 24, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented May 24, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

teknium1 commented May 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Root cause

Changes

Validation

Follow-up

Infographic

Uh oh!

github-actions Bot commented May 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🔎 Lint report: hermes/hermes-60de4c26 vs origin/main

ruff

ty (type checker)

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

teknium1 commented May 24, 2026 •

edited

Loading

github-actions Bot commented May 24, 2026 •

edited

Loading

🔎 Lint report: `hermes/hermes-60de4c26` vs `origin/main`