tools(terminal): nudge homebrewed CI pollers at the tool surface#33142
Merged
Conversation
Background processes whose command contains `gh pr view --json statusCheckRollup` or `gh pr checks | jq` now get a runtime hint in the result pointing at the canonical green-ci-policy snippets. The homebrew shape has caused at least seven silent CI-watcher failures in the past two weeks (#31329, #31448, #31695, #31709, #31745, #32264, #33131) — each one a different jq/awk/grep variation of the same fundamental problem (stdout buffering, jq null-key edge cases, conclusion-vs-status confusion, TTY-only banner grepping). The skill that documents this anti-pattern is excellent, but a skill only fires if the agent loads it. The tool surface fires on every misuse. This is the embed-footguns-in-tool-surface pattern from PR #31289 applied to a recurring failure mode that's outgrown skill-only enforcement. Detector is deliberately narrow — flags two specific shapes: 1. Any command containing `statusCheckRollup` (the JSON-API path — conclusion vs status field semantics keep burning us). 2. `gh pr view` / `gh pr checks` combined with `jq` (gh pr checks doesn't emit JSON, so any `| jq` here is confused intent; the canonical column-2 poller uses awk-on-tabs, not jq). Does NOT flag the blessed column-2 awk-on-tabs poller (which uses `awk -F"\t" "\==\"pending\""`) or the exit-code-driven `gh pr checks $PR >/dev/null` snippet. Hint composes with the existing background-without-notify_on_complete hint — both can fire on the same call. Each is independently actionable. Tests: - 4 new cases in tests/tools/test_notify_on_complete.py - test_homebrew_ci_poller_via_statusCheckRollup_emits_hint (positive) - test_homebrew_ci_poller_via_gh_pr_checks_piped_to_jq_emits_hint (positive) - test_canonical_column2_awk_poller_does_not_emit_homebrew_hint (negative) - test_canonical_gh_pr_checks_exit_code_loop_does_not_emit_hint (negative) - test_non_ci_background_command_does_not_emit_homebrew_hint (negative) - 30/30 passing (was 26)
Contributor
🔎 Lint report:
|
| Rule | Count |
|---|---|
unsupported-operator |
1 |
First entries
tools/terminal_tool.py:1979: [unsupported-operator] unsupported-operator: Operator `+` is not supported between objects of type `(str & ~AlwaysFalsy) | (int & ~AlwaysFalsy)` and `Literal["\n\n"]`
✅ Fixed issues: none
Unchanged: 5012 pre-existing issues carried over.
Diagnostics are surfaced as warnings — this check never fails the build.
1 task
mathias3
pushed a commit
to mathias3/hermes-agent
that referenced
this pull request
May 28, 2026
…sResearch#33142) Background processes whose command contains `gh pr view --json statusCheckRollup` or `gh pr checks | jq` now get a runtime hint in the result pointing at the canonical green-ci-policy snippets. The homebrew shape has caused at least seven silent CI-watcher failures in the past two weeks (NousResearch#31329, NousResearch#31448, NousResearch#31695, NousResearch#31709, NousResearch#31745, NousResearch#32264, NousResearch#33131) — each one a different jq/awk/grep variation of the same fundamental problem (stdout buffering, jq null-key edge cases, conclusion-vs-status confusion, TTY-only banner grepping). The skill that documents this anti-pattern is excellent, but a skill only fires if the agent loads it. The tool surface fires on every misuse. This is the embed-footguns-in-tool-surface pattern from PR NousResearch#31289 applied to a recurring failure mode that's outgrown skill-only enforcement. Detector is deliberately narrow — flags two specific shapes: 1. Any command containing `statusCheckRollup` (the JSON-API path — conclusion vs status field semantics keep burning us). 2. `gh pr view` / `gh pr checks` combined with `jq` (gh pr checks doesn't emit JSON, so any `| jq` here is confused intent; the canonical column-2 poller uses awk-on-tabs, not jq). Does NOT flag the blessed column-2 awk-on-tabs poller (which uses `awk -F"\t" "\==\"pending\""`) or the exit-code-driven `gh pr checks $PR >/dev/null` snippet. Hint composes with the existing background-without-notify_on_complete hint — both can fire on the same call. Each is independently actionable. Tests: - 4 new cases in tests/tools/test_notify_on_complete.py - test_homebrew_ci_poller_via_statusCheckRollup_emits_hint (positive) - test_homebrew_ci_poller_via_gh_pr_checks_piped_to_jq_emits_hint (positive) - test_canonical_column2_awk_poller_does_not_emit_homebrew_hint (negative) - test_canonical_gh_pr_checks_exit_code_loop_does_not_emit_hint (negative) - test_non_ci_background_command_does_not_emit_homebrew_hint (negative) - 30/30 passing (was 26)
Bryce-huang
pushed a commit
to wbkunlun/hermes-agent
that referenced
this pull request
May 29, 2026
…sResearch#33142) Background processes whose command contains `gh pr view --json statusCheckRollup` or `gh pr checks | jq` now get a runtime hint in the result pointing at the canonical green-ci-policy snippets. The homebrew shape has caused at least seven silent CI-watcher failures in the past two weeks (NousResearch#31329, NousResearch#31448, NousResearch#31695, NousResearch#31709, NousResearch#31745, NousResearch#32264, NousResearch#33131) — each one a different jq/awk/grep variation of the same fundamental problem (stdout buffering, jq null-key edge cases, conclusion-vs-status confusion, TTY-only banner grepping). The skill that documents this anti-pattern is excellent, but a skill only fires if the agent loads it. The tool surface fires on every misuse. This is the embed-footguns-in-tool-surface pattern from PR NousResearch#31289 applied to a recurring failure mode that's outgrown skill-only enforcement. Detector is deliberately narrow — flags two specific shapes: 1. Any command containing `statusCheckRollup` (the JSON-API path — conclusion vs status field semantics keep burning us). 2. `gh pr view` / `gh pr checks` combined with `jq` (gh pr checks doesn't emit JSON, so any `| jq` here is confused intent; the canonical column-2 poller uses awk-on-tabs, not jq). Does NOT flag the blessed column-2 awk-on-tabs poller (which uses `awk -F"\t" "\==\"pending\""`) or the exit-code-driven `gh pr checks $PR >/dev/null` snippet. Hint composes with the existing background-without-notify_on_complete hint — both can fire on the same call. Each is independently actionable. Tests: - 4 new cases in tests/tools/test_notify_on_complete.py - test_homebrew_ci_poller_via_statusCheckRollup_emits_hint (positive) - test_homebrew_ci_poller_via_gh_pr_checks_piped_to_jq_emits_hint (positive) - test_canonical_column2_awk_poller_does_not_emit_homebrew_hint (negative) - test_canonical_gh_pr_checks_exit_code_loop_does_not_emit_hint (negative) - test_non_ci_background_command_does_not_emit_homebrew_hint (negative) - 30/30 passing (was 26) #AI commit#
mosaiq-systems
pushed a commit
to mosaiq-systems/hermes-agent
that referenced
this pull request
May 29, 2026
…sResearch#33142) Background processes whose command contains `gh pr view --json statusCheckRollup` or `gh pr checks | jq` now get a runtime hint in the result pointing at the canonical green-ci-policy snippets. The homebrew shape has caused at least seven silent CI-watcher failures in the past two weeks (NousResearch#31329, NousResearch#31448, NousResearch#31695, NousResearch#31709, NousResearch#31745, NousResearch#32264, NousResearch#33131) — each one a different jq/awk/grep variation of the same fundamental problem (stdout buffering, jq null-key edge cases, conclusion-vs-status confusion, TTY-only banner grepping). The skill that documents this anti-pattern is excellent, but a skill only fires if the agent loads it. The tool surface fires on every misuse. This is the embed-footguns-in-tool-surface pattern from PR NousResearch#31289 applied to a recurring failure mode that's outgrown skill-only enforcement. Detector is deliberately narrow — flags two specific shapes: 1. Any command containing `statusCheckRollup` (the JSON-API path — conclusion vs status field semantics keep burning us). 2. `gh pr view` / `gh pr checks` combined with `jq` (gh pr checks doesn't emit JSON, so any `| jq` here is confused intent; the canonical column-2 poller uses awk-on-tabs, not jq). Does NOT flag the blessed column-2 awk-on-tabs poller (which uses `awk -F"\t" "\==\"pending\""`) or the exit-code-driven `gh pr checks $PR >/dev/null` snippet. Hint composes with the existing background-without-notify_on_complete hint — both can fire on the same call. Each is independently actionable. Tests: - 4 new cases in tests/tools/test_notify_on_complete.py - test_homebrew_ci_poller_via_statusCheckRollup_emits_hint (positive) - test_homebrew_ci_poller_via_gh_pr_checks_piped_to_jq_emits_hint (positive) - test_canonical_column2_awk_poller_does_not_emit_homebrew_hint (negative) - test_canonical_gh_pr_checks_exit_code_loop_does_not_emit_hint (negative) - test_non_ci_background_command_does_not_emit_homebrew_hint (negative) - 30/30 passing (was 26)
gweeteve
pushed a commit
to gweeteve/hermes-agent
that referenced
this pull request
Jun 2, 2026
…sResearch#33142) Background processes whose command contains `gh pr view --json statusCheckRollup` or `gh pr checks | jq` now get a runtime hint in the result pointing at the canonical green-ci-policy snippets. The homebrew shape has caused at least seven silent CI-watcher failures in the past two weeks (NousResearch#31329, NousResearch#31448, NousResearch#31695, NousResearch#31709, NousResearch#31745, NousResearch#32264, NousResearch#33131) — each one a different jq/awk/grep variation of the same fundamental problem (stdout buffering, jq null-key edge cases, conclusion-vs-status confusion, TTY-only banner grepping). The skill that documents this anti-pattern is excellent, but a skill only fires if the agent loads it. The tool surface fires on every misuse. This is the embed-footguns-in-tool-surface pattern from PR NousResearch#31289 applied to a recurring failure mode that's outgrown skill-only enforcement. Detector is deliberately narrow — flags two specific shapes: 1. Any command containing `statusCheckRollup` (the JSON-API path — conclusion vs status field semantics keep burning us). 2. `gh pr view` / `gh pr checks` combined with `jq` (gh pr checks doesn't emit JSON, so any `| jq` here is confused intent; the canonical column-2 poller uses awk-on-tabs, not jq). Does NOT flag the blessed column-2 awk-on-tabs poller (which uses `awk -F"\t" "\==\"pending\""`) or the exit-code-driven `gh pr checks $PR >/dev/null` snippet. Hint composes with the existing background-without-notify_on_complete hint — both can fire on the same call. Each is independently actionable. Tests: - 4 new cases in tests/tools/test_notify_on_complete.py - test_homebrew_ci_poller_via_statusCheckRollup_emits_hint (positive) - test_homebrew_ci_poller_via_gh_pr_checks_piped_to_jq_emits_hint (positive) - test_canonical_column2_awk_poller_does_not_emit_homebrew_hint (negative) - test_canonical_gh_pr_checks_exit_code_loop_does_not_emit_hint (negative) - test_non_ci_background_command_does_not_emit_homebrew_hint (negative) - 30/30 passing (was 26)
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Background processes whose command contains
gh pr view --json statusCheckRolluporgh pr checks | jqnow get a runtime hint in the tool result pointing at the canonical green-ci-policy snippets. This is the embed-footguns-in-tool-surface pattern applied to a recurring silent-failure mode.Why
The homebrew CI-poller shape has caused at least seven silent CI-watcher failures in the past two weeks alone:
gh pr checksrc=1 fired notify_on_complete on first shard fail while 5 shards still queuedPENDINGin.conclusioncolumn while in-progress checks have empty conclusionfor ... done 2>&1loop produced zero captured output for 12+ minutesfrom_entrieschoked on nullconclusionkeys, loop never terminated; user surfaced "CI was already done" manuallyEach one was a different variation of the same fundamental problem. The skill that documents this anti-pattern is excellent, but a skill only fires if the agent loads it. The tool surface fires on every misuse.
What it does
Detector in
tools/terminal_tool.pyis deliberately narrow — flags exactly two shapes:statusCheckRollup(the JSON-API path — conclusion vs status field semantics keep burning us)gh pr view/gh pr checkscombined withjq(gh pr checksdoesn't emit JSON, so any| jqhere is confused intent; the canonical column-2 poller uses awk-on-tabs, not jq)Does NOT flag:
awk -F"\t" "$2==\"pending\"")gh pr checks $PR >/dev/nullsnippetThe hint composes with the existing background-without-notify_on_complete hint — both can fire on the same call. Each is independently actionable.
Hint text
When the anti-pattern fires:
Changes
tools/terminal_tool.py— +70 LOC: detector + hint emission alongside the existing notify_on_complete hinttests/tools/test_notify_on_complete.py— +4 tests:statusCheckRollupjq pipeline → hintgh pr checks | jq→ hintValidation
scripts/run_tests.sh tests/tools/test_notify_on_complete.pyCompanion skill update
The companion skill update at
~/.hermes/skills/github/hermes-agent-dev/references/green-ci-policy.md(not in this PR — lives in private skill store) adds the #33131 incident to the pitfall list and documents the new tool-level hint so future agents read both layers consistently.Infographic