Skip to content

tools(terminal): nudge homebrewed CI pollers at the tool surface#33142

Merged
teknium1 merged 1 commit into
mainfrom
ci-poller-tool-guard
May 27, 2026
Merged

tools(terminal): nudge homebrewed CI pollers at the tool surface#33142
teknium1 merged 1 commit into
mainfrom
ci-poller-tool-guard

Conversation

@teknium1

@teknium1 teknium1 commented May 27, 2026

Copy link
Copy Markdown
Contributor

Summary

Background processes whose command contains gh pr view --json statusCheckRollup or gh pr checks | jq now get a runtime hint in the tool result pointing at the canonical green-ci-policy snippets. This is the embed-footguns-in-tool-surface pattern applied to a recurring silent-failure mode.

Why

The homebrew CI-poller shape has caused at least seven silent CI-watcher failures in the past two weeks alone:

PR Failure mode
#31329 Grepped for TTY-only banner that never appears when stdout is piped
#31448 Heredoc/quoting bug swallowed all stdout, never fired notify_on_complete
#31695 Agent described a poller as an option, never actually launched it
#31709 gh pr checks rc=1 fired notify_on_complete on first shard fail while 5 shards still queued
#31745 Filtered PENDING in .conclusion column while in-progress checks have empty conclusion
#32264 Block-buffered for ... done 2>&1 loop produced zero captured output for 12+ minutes
#33131 jq from_entries choked on null conclusion keys, loop never terminated; user surfaced "CI was already done" manually

Each one was a different variation of the same fundamental problem. The skill that documents this anti-pattern is excellent, but a skill only fires if the agent loads it. The tool surface fires on every misuse.

What it does

Detector in tools/terminal_tool.py is deliberately narrow — flags exactly two shapes:

  1. Any command containing statusCheckRollup (the JSON-API path — conclusion vs status field semantics keep burning us)
  2. gh pr view / gh pr checks combined with jq (gh pr checks doesn't emit JSON, so any | jq here is confused intent; the canonical column-2 poller uses awk-on-tabs, not jq)

Does NOT flag:

  • The blessed column-2 awk-on-tabs poller (awk -F"\t" "$2==\"pending\"")
  • The exit-code-driven gh pr checks $PR >/dev/null snippet
  • Non-CI background commands that happen to use awk/jq for unrelated reasons

The hint composes with the existing background-without-notify_on_complete hint — both can fire on the same call. Each is independently actionable.

Hint text

When the anti-pattern fires:

This looks like a homebrewed CI poller built from gh pr view --json statusCheckRollup and/or gh pr checks | jq. That shape has burned us repeatedly in hermes-agent dev work (PRs #31329, #31448, #31695, #31709, #31745, #32264, #33131) — stdout buffering kills output capture, jq null-key edge cases silently exit the loop, conclusion-vs-status field confusion exits early with bogus all-green verdicts, TTY-only summary banners never appear when piped. Use the canonical snippets in the green-ci-policy skill instead: the exit-code-driven gh pr checks $PR >/dev/null (rc 0 = green, 8 = pending, else fail) for exit-on-first-fail behavior, or the column-2 awk-on-tabs poller for sharded matrices. Load skill_view(...) for the verbatim snippets. If you must roll a custom loop with rich structured output, write each tick to a known file (tee -a /tmp/ci.log) and rely on process(action='log') to read THAT file — do not rely on background-process stdout capture for line-buffered shell loops.

Changes

  • tools/terminal_tool.py — +70 LOC: detector + hint emission alongside the existing notify_on_complete hint
  • tests/tools/test_notify_on_complete.py — +4 tests:
    • Positive: statusCheckRollup jq pipeline → hint
    • Positive: gh pr checks | jq → hint
    • Negative: canonical column-2 awk-on-tabs poller → no hint
    • Negative: canonical exit-code-driven loop → no hint
    • Negative: non-CI command using awk → no hint

Validation

Result
scripts/run_tests.sh tests/tools/test_notify_on_complete.py 30/30 passing in 0.5s (was 26)
Manual E2E with the exact broken poller from #33131 Hint fires immediately, names the canonical alternatives, references the skill file
Manual E2E with the canonical column-2 awk poller No hint, no false positive

Companion skill update

The companion skill update at ~/.hermes/skills/github/hermes-agent-dev/references/green-ci-policy.md (not in this PR — lives in private skill store) adds the #33131 incident to the pitfall list and documents the new tool-level hint so future agents read both layers consistently.

Infographic

ci-poller-tool-guard

Background processes whose command contains `gh pr view --json
statusCheckRollup` or `gh pr checks | jq` now get a runtime hint in
the result pointing at the canonical green-ci-policy snippets. The
homebrew shape has caused at least seven silent CI-watcher failures
in the past two weeks (#31329, #31448, #31695, #31709, #31745,
#32264, #33131) — each one a different jq/awk/grep variation of the
same fundamental problem (stdout buffering, jq null-key edge cases,
conclusion-vs-status confusion, TTY-only banner grepping).

The skill that documents this anti-pattern is excellent, but a skill
only fires if the agent loads it. The tool surface fires on every
misuse. This is the embed-footguns-in-tool-surface pattern from
PR #31289 applied to a recurring failure mode that's outgrown
skill-only enforcement.

Detector is deliberately narrow — flags two specific shapes:

  1. Any command containing `statusCheckRollup` (the JSON-API path —
     conclusion vs status field semantics keep burning us).
  2. `gh pr view` / `gh pr checks` combined with `jq` (gh pr
     checks doesn't emit JSON, so any `| jq` here is confused intent;
     the canonical column-2 poller uses awk-on-tabs, not jq).

Does NOT flag the blessed column-2 awk-on-tabs poller (which uses
`awk -F"\t" "\==\"pending\""`) or the exit-code-driven
`gh pr checks $PR >/dev/null` snippet.

Hint composes with the existing background-without-notify_on_complete
hint — both can fire on the same call. Each is independently
actionable.

Tests:
- 4 new cases in tests/tools/test_notify_on_complete.py
- test_homebrew_ci_poller_via_statusCheckRollup_emits_hint (positive)
- test_homebrew_ci_poller_via_gh_pr_checks_piped_to_jq_emits_hint (positive)
- test_canonical_column2_awk_poller_does_not_emit_homebrew_hint (negative)
- test_canonical_gh_pr_checks_exit_code_loop_does_not_emit_hint (negative)
- test_non_ci_background_command_does_not_emit_homebrew_hint (negative)
- 30/30 passing (was 26)
@github-actions

Copy link
Copy Markdown
Contributor

🔎 Lint report: ci-poller-tool-guard vs origin/main

ruff

Total: 0 on HEAD, 0 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 0 pre-existing issues carried over.

ty (type checker)

Total: 9493 on HEAD, 9492 on base (🆕 +1)

🆕 New issues (1):

Rule Count
unsupported-operator 1
First entries
tools/terminal_tool.py:1979: [unsupported-operator] unsupported-operator: Operator `+` is not supported between objects of type `(str & ~AlwaysFalsy) | (int & ~AlwaysFalsy)` and `Literal["\n\n"]`

✅ Fixed issues: none

Unchanged: 5012 pre-existing issues carried over.

Diagnostics are surfaced as warnings — this check never fails the build.

@teknium1 teknium1 merged commit 187cf0f into main May 27, 2026
26 checks passed
@teknium1 teknium1 deleted the ci-poller-tool-guard branch May 27, 2026 09:22
@alt-glitch alt-glitch added type/refactor Code restructuring, no behavior change P3 Low — cosmetic, nice to have tool/terminal Terminal execution and process management labels May 27, 2026
mathias3 pushed a commit to mathias3/hermes-agent that referenced this pull request May 28, 2026
…sResearch#33142)

Background processes whose command contains `gh pr view --json
statusCheckRollup` or `gh pr checks | jq` now get a runtime hint in
the result pointing at the canonical green-ci-policy snippets. The
homebrew shape has caused at least seven silent CI-watcher failures
in the past two weeks (NousResearch#31329, NousResearch#31448, NousResearch#31695, NousResearch#31709, NousResearch#31745,
NousResearch#32264, NousResearch#33131) — each one a different jq/awk/grep variation of the
same fundamental problem (stdout buffering, jq null-key edge cases,
conclusion-vs-status confusion, TTY-only banner grepping).

The skill that documents this anti-pattern is excellent, but a skill
only fires if the agent loads it. The tool surface fires on every
misuse. This is the embed-footguns-in-tool-surface pattern from
PR NousResearch#31289 applied to a recurring failure mode that's outgrown
skill-only enforcement.

Detector is deliberately narrow — flags two specific shapes:

  1. Any command containing `statusCheckRollup` (the JSON-API path —
     conclusion vs status field semantics keep burning us).
  2. `gh pr view` / `gh pr checks` combined with `jq` (gh pr
     checks doesn't emit JSON, so any `| jq` here is confused intent;
     the canonical column-2 poller uses awk-on-tabs, not jq).

Does NOT flag the blessed column-2 awk-on-tabs poller (which uses
`awk -F"\t" "\==\"pending\""`) or the exit-code-driven
`gh pr checks $PR >/dev/null` snippet.

Hint composes with the existing background-without-notify_on_complete
hint — both can fire on the same call. Each is independently
actionable.

Tests:
- 4 new cases in tests/tools/test_notify_on_complete.py
- test_homebrew_ci_poller_via_statusCheckRollup_emits_hint (positive)
- test_homebrew_ci_poller_via_gh_pr_checks_piped_to_jq_emits_hint (positive)
- test_canonical_column2_awk_poller_does_not_emit_homebrew_hint (negative)
- test_canonical_gh_pr_checks_exit_code_loop_does_not_emit_hint (negative)
- test_non_ci_background_command_does_not_emit_homebrew_hint (negative)
- 30/30 passing (was 26)
Bryce-huang pushed a commit to wbkunlun/hermes-agent that referenced this pull request May 29, 2026
…sResearch#33142)

Background processes whose command contains `gh pr view --json
statusCheckRollup` or `gh pr checks | jq` now get a runtime hint in
the result pointing at the canonical green-ci-policy snippets. The
homebrew shape has caused at least seven silent CI-watcher failures
in the past two weeks (NousResearch#31329, NousResearch#31448, NousResearch#31695, NousResearch#31709, NousResearch#31745,
NousResearch#32264, NousResearch#33131) — each one a different jq/awk/grep variation of the
same fundamental problem (stdout buffering, jq null-key edge cases,
conclusion-vs-status confusion, TTY-only banner grepping).

The skill that documents this anti-pattern is excellent, but a skill
only fires if the agent loads it. The tool surface fires on every
misuse. This is the embed-footguns-in-tool-surface pattern from
PR NousResearch#31289 applied to a recurring failure mode that's outgrown
skill-only enforcement.

Detector is deliberately narrow — flags two specific shapes:

  1. Any command containing `statusCheckRollup` (the JSON-API path —
     conclusion vs status field semantics keep burning us).
  2. `gh pr view` / `gh pr checks` combined with `jq` (gh pr
     checks doesn't emit JSON, so any `| jq` here is confused intent;
     the canonical column-2 poller uses awk-on-tabs, not jq).

Does NOT flag the blessed column-2 awk-on-tabs poller (which uses
`awk -F"\t" "\==\"pending\""`) or the exit-code-driven
`gh pr checks $PR >/dev/null` snippet.

Hint composes with the existing background-without-notify_on_complete
hint — both can fire on the same call. Each is independently
actionable.

Tests:
- 4 new cases in tests/tools/test_notify_on_complete.py
- test_homebrew_ci_poller_via_statusCheckRollup_emits_hint (positive)
- test_homebrew_ci_poller_via_gh_pr_checks_piped_to_jq_emits_hint (positive)
- test_canonical_column2_awk_poller_does_not_emit_homebrew_hint (negative)
- test_canonical_gh_pr_checks_exit_code_loop_does_not_emit_hint (negative)
- test_non_ci_background_command_does_not_emit_homebrew_hint (negative)
- 30/30 passing (was 26)
#AI commit#
mosaiq-systems pushed a commit to mosaiq-systems/hermes-agent that referenced this pull request May 29, 2026
…sResearch#33142)

Background processes whose command contains `gh pr view --json
statusCheckRollup` or `gh pr checks | jq` now get a runtime hint in
the result pointing at the canonical green-ci-policy snippets. The
homebrew shape has caused at least seven silent CI-watcher failures
in the past two weeks (NousResearch#31329, NousResearch#31448, NousResearch#31695, NousResearch#31709, NousResearch#31745,
NousResearch#32264, NousResearch#33131) — each one a different jq/awk/grep variation of the
same fundamental problem (stdout buffering, jq null-key edge cases,
conclusion-vs-status confusion, TTY-only banner grepping).

The skill that documents this anti-pattern is excellent, but a skill
only fires if the agent loads it. The tool surface fires on every
misuse. This is the embed-footguns-in-tool-surface pattern from
PR NousResearch#31289 applied to a recurring failure mode that's outgrown
skill-only enforcement.

Detector is deliberately narrow — flags two specific shapes:

  1. Any command containing `statusCheckRollup` (the JSON-API path —
     conclusion vs status field semantics keep burning us).
  2. `gh pr view` / `gh pr checks` combined with `jq` (gh pr
     checks doesn't emit JSON, so any `| jq` here is confused intent;
     the canonical column-2 poller uses awk-on-tabs, not jq).

Does NOT flag the blessed column-2 awk-on-tabs poller (which uses
`awk -F"\t" "\==\"pending\""`) or the exit-code-driven
`gh pr checks $PR >/dev/null` snippet.

Hint composes with the existing background-without-notify_on_complete
hint — both can fire on the same call. Each is independently
actionable.

Tests:
- 4 new cases in tests/tools/test_notify_on_complete.py
- test_homebrew_ci_poller_via_statusCheckRollup_emits_hint (positive)
- test_homebrew_ci_poller_via_gh_pr_checks_piped_to_jq_emits_hint (positive)
- test_canonical_column2_awk_poller_does_not_emit_homebrew_hint (negative)
- test_canonical_gh_pr_checks_exit_code_loop_does_not_emit_hint (negative)
- test_non_ci_background_command_does_not_emit_homebrew_hint (negative)
- 30/30 passing (was 26)
gweeteve pushed a commit to gweeteve/hermes-agent that referenced this pull request Jun 2, 2026
…sResearch#33142)

Background processes whose command contains `gh pr view --json
statusCheckRollup` or `gh pr checks | jq` now get a runtime hint in
the result pointing at the canonical green-ci-policy snippets. The
homebrew shape has caused at least seven silent CI-watcher failures
in the past two weeks (NousResearch#31329, NousResearch#31448, NousResearch#31695, NousResearch#31709, NousResearch#31745,
NousResearch#32264, NousResearch#33131) — each one a different jq/awk/grep variation of the
same fundamental problem (stdout buffering, jq null-key edge cases,
conclusion-vs-status confusion, TTY-only banner grepping).

The skill that documents this anti-pattern is excellent, but a skill
only fires if the agent loads it. The tool surface fires on every
misuse. This is the embed-footguns-in-tool-surface pattern from
PR NousResearch#31289 applied to a recurring failure mode that's outgrown
skill-only enforcement.

Detector is deliberately narrow — flags two specific shapes:

  1. Any command containing `statusCheckRollup` (the JSON-API path —
     conclusion vs status field semantics keep burning us).
  2. `gh pr view` / `gh pr checks` combined with `jq` (gh pr
     checks doesn't emit JSON, so any `| jq` here is confused intent;
     the canonical column-2 poller uses awk-on-tabs, not jq).

Does NOT flag the blessed column-2 awk-on-tabs poller (which uses
`awk -F"\t" "\==\"pending\""`) or the exit-code-driven
`gh pr checks $PR >/dev/null` snippet.

Hint composes with the existing background-without-notify_on_complete
hint — both can fire on the same call. Each is independently
actionable.

Tests:
- 4 new cases in tests/tools/test_notify_on_complete.py
- test_homebrew_ci_poller_via_statusCheckRollup_emits_hint (positive)
- test_homebrew_ci_poller_via_gh_pr_checks_piped_to_jq_emits_hint (positive)
- test_canonical_column2_awk_poller_does_not_emit_homebrew_hint (negative)
- test_canonical_gh_pr_checks_exit_code_loop_does_not_emit_hint (negative)
- test_non_ci_background_command_does_not_emit_homebrew_hint (negative)
- 30/30 passing (was 26)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

P3 Low — cosmetic, nice to have tool/terminal Terminal execution and process management type/refactor Code restructuring, no behavior change

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants