tools(terminal): nudge homebrewed CI pollers at the tool surface by teknium1 · Pull Request #33142 · NousResearch/hermes-agent

teknium1 · 2026-05-27T09:13:55Z

Summary

Background processes whose command contains gh pr view --json statusCheckRollup or gh pr checks | jq now get a runtime hint in the tool result pointing at the canonical green-ci-policy snippets. This is the embed-footguns-in-tool-surface pattern applied to a recurring silent-failure mode.

Why

The homebrew CI-poller shape has caused at least seven silent CI-watcher failures in the past two weeks alone:

PR	Failure mode
#31329	Grepped for TTY-only banner that never appears when stdout is piped
#31448	Heredoc/quoting bug swallowed all stdout, never fired notify_on_complete
#31695	Agent described a poller as an option, never actually launched it
#31709	`gh pr checks` rc=1 fired notify_on_complete on first shard fail while 5 shards still queued
#31745	Filtered `PENDING` in `.conclusion` column while in-progress checks have empty conclusion
#32264	Block-buffered `for ... done 2>&1` loop produced zero captured output for 12+ minutes
#33131	jq `from_entries` choked on null `conclusion` keys, loop never terminated; user surfaced "CI was already done" manually

Each one was a different variation of the same fundamental problem. The skill that documents this anti-pattern is excellent, but a skill only fires if the agent loads it. The tool surface fires on every misuse.

What it does

Detector in tools/terminal_tool.py is deliberately narrow — flags exactly two shapes:

Any command containing statusCheckRollup (the JSON-API path — conclusion vs status field semantics keep burning us)
gh pr view / gh pr checks combined with jq (gh pr checks doesn't emit JSON, so any | jq here is confused intent; the canonical column-2 poller uses awk-on-tabs, not jq)

Does NOT flag:

The blessed column-2 awk-on-tabs poller (awk -F"\t" "$2==\"pending\"")
The exit-code-driven gh pr checks $PR >/dev/null snippet
Non-CI background commands that happen to use awk/jq for unrelated reasons

The hint composes with the existing background-without-notify_on_complete hint — both can fire on the same call. Each is independently actionable.

Hint text

When the anti-pattern fires:

This looks like a homebrewed CI poller built from gh pr view --json statusCheckRollup and/or gh pr checks | jq. That shape has burned us repeatedly in hermes-agent dev work (PRs #31329, #31448, #31695, #31709, #31745, #32264, #33131) — stdout buffering kills output capture, jq null-key edge cases silently exit the loop, conclusion-vs-status field confusion exits early with bogus all-green verdicts, TTY-only summary banners never appear when piped. Use the canonical snippets in the green-ci-policy skill instead: the exit-code-driven gh pr checks $PR >/dev/null (rc 0 = green, 8 = pending, else fail) for exit-on-first-fail behavior, or the column-2 awk-on-tabs poller for sharded matrices. Load skill_view(...) for the verbatim snippets. If you must roll a custom loop with rich structured output, write each tick to a known file (tee -a /tmp/ci.log) and rely on process(action='log') to read THAT file — do not rely on background-process stdout capture for line-buffered shell loops.

Changes

tools/terminal_tool.py — +70 LOC: detector + hint emission alongside the existing notify_on_complete hint
tests/tools/test_notify_on_complete.py — +4 tests:
- Positive: statusCheckRollup jq pipeline → hint
- Positive: gh pr checks | jq → hint
- Negative: canonical column-2 awk-on-tabs poller → no hint
- Negative: canonical exit-code-driven loop → no hint
- Negative: non-CI command using awk → no hint

Validation

	Result
`scripts/run_tests.sh tests/tools/test_notify_on_complete.py`	30/30 passing in 0.5s (was 26)
Manual E2E with the exact broken poller from #33131	Hint fires immediately, names the canonical alternatives, references the skill file
Manual E2E with the canonical column-2 awk poller	No hint, no false positive

Companion skill update

The companion skill update at ~/.hermes/skills/github/hermes-agent-dev/references/green-ci-policy.md (not in this PR — lives in private skill store) adds the #33131 incident to the pitfall list and documents the new tool-level hint so future agents read both layers consistently.

Infographic

Background processes whose command contains `gh pr view --json statusCheckRollup` or `gh pr checks | jq` now get a runtime hint in the result pointing at the canonical green-ci-policy snippets. The homebrew shape has caused at least seven silent CI-watcher failures in the past two weeks (#31329, #31448, #31695, #31709, #31745, #32264, #33131) — each one a different jq/awk/grep variation of the same fundamental problem (stdout buffering, jq null-key edge cases, conclusion-vs-status confusion, TTY-only banner grepping). The skill that documents this anti-pattern is excellent, but a skill only fires if the agent loads it. The tool surface fires on every misuse. This is the embed-footguns-in-tool-surface pattern from PR #31289 applied to a recurring failure mode that's outgrown skill-only enforcement. Detector is deliberately narrow — flags two specific shapes: 1. Any command containing `statusCheckRollup` (the JSON-API path — conclusion vs status field semantics keep burning us). 2. `gh pr view` / `gh pr checks` combined with `jq` (gh pr checks doesn't emit JSON, so any `| jq` here is confused intent; the canonical column-2 poller uses awk-on-tabs, not jq). Does NOT flag the blessed column-2 awk-on-tabs poller (which uses `awk -F"\t" "\==\"pending\""`) or the exit-code-driven `gh pr checks $PR >/dev/null` snippet. Hint composes with the existing background-without-notify_on_complete hint — both can fire on the same call. Each is independently actionable. Tests: - 4 new cases in tests/tools/test_notify_on_complete.py - test_homebrew_ci_poller_via_statusCheckRollup_emits_hint (positive) - test_homebrew_ci_poller_via_gh_pr_checks_piped_to_jq_emits_hint (positive) - test_canonical_column2_awk_poller_does_not_emit_homebrew_hint (negative) - test_canonical_gh_pr_checks_exit_code_loop_does_not_emit_hint (negative) - test_non_ci_background_command_does_not_emit_homebrew_hint (negative) - 30/30 passing (was 26)

github-actions · 2026-05-27T09:14:38Z

🔎 Lint report: `ci-poller-tool-guard` vs `origin/main`

ruff

Total: 0 on HEAD, 0 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 0 pre-existing issues carried over.

ty (type checker)

Total: 9493 on HEAD, 9492 on base (🆕 +1)

🆕 New issues (1):

Rule	Count
`unsupported-operator`	1

First entries

tools/terminal_tool.py:1979: [unsupported-operator] unsupported-operator: Operator `+` is not supported between objects of type `(str & ~AlwaysFalsy) | (int & ~AlwaysFalsy)` and `Literal["\n\n"]`

✅ Fixed issues: none

Unchanged: 5012 pre-existing issues carried over.

Diagnostics are surfaced as warnings — this check never fails the build.

…sResearch#33142) Background processes whose command contains `gh pr view --json statusCheckRollup` or `gh pr checks | jq` now get a runtime hint in the result pointing at the canonical green-ci-policy snippets. The homebrew shape has caused at least seven silent CI-watcher failures in the past two weeks (NousResearch#31329, NousResearch#31448, NousResearch#31695, NousResearch#31709, NousResearch#31745, NousResearch#32264, NousResearch#33131) — each one a different jq/awk/grep variation of the same fundamental problem (stdout buffering, jq null-key edge cases, conclusion-vs-status confusion, TTY-only banner grepping). The skill that documents this anti-pattern is excellent, but a skill only fires if the agent loads it. The tool surface fires on every misuse. This is the embed-footguns-in-tool-surface pattern from PR NousResearch#31289 applied to a recurring failure mode that's outgrown skill-only enforcement. Detector is deliberately narrow — flags two specific shapes: 1. Any command containing `statusCheckRollup` (the JSON-API path — conclusion vs status field semantics keep burning us). 2. `gh pr view` / `gh pr checks` combined with `jq` (gh pr checks doesn't emit JSON, so any `| jq` here is confused intent; the canonical column-2 poller uses awk-on-tabs, not jq). Does NOT flag the blessed column-2 awk-on-tabs poller (which uses `awk -F"\t" "\==\"pending\""`) or the exit-code-driven `gh pr checks $PR >/dev/null` snippet. Hint composes with the existing background-without-notify_on_complete hint — both can fire on the same call. Each is independently actionable. Tests: - 4 new cases in tests/tools/test_notify_on_complete.py - test_homebrew_ci_poller_via_statusCheckRollup_emits_hint (positive) - test_homebrew_ci_poller_via_gh_pr_checks_piped_to_jq_emits_hint (positive) - test_canonical_column2_awk_poller_does_not_emit_homebrew_hint (negative) - test_canonical_gh_pr_checks_exit_code_loop_does_not_emit_hint (negative) - test_non_ci_background_command_does_not_emit_homebrew_hint (negative) - 30/30 passing (was 26)

…sResearch#33142) Background processes whose command contains `gh pr view --json statusCheckRollup` or `gh pr checks | jq` now get a runtime hint in the result pointing at the canonical green-ci-policy snippets. The homebrew shape has caused at least seven silent CI-watcher failures in the past two weeks (NousResearch#31329, NousResearch#31448, NousResearch#31695, NousResearch#31709, NousResearch#31745, NousResearch#32264, NousResearch#33131) — each one a different jq/awk/grep variation of the same fundamental problem (stdout buffering, jq null-key edge cases, conclusion-vs-status confusion, TTY-only banner grepping). The skill that documents this anti-pattern is excellent, but a skill only fires if the agent loads it. The tool surface fires on every misuse. This is the embed-footguns-in-tool-surface pattern from PR NousResearch#31289 applied to a recurring failure mode that's outgrown skill-only enforcement. Detector is deliberately narrow — flags two specific shapes: 1. Any command containing `statusCheckRollup` (the JSON-API path — conclusion vs status field semantics keep burning us). 2. `gh pr view` / `gh pr checks` combined with `jq` (gh pr checks doesn't emit JSON, so any `| jq` here is confused intent; the canonical column-2 poller uses awk-on-tabs, not jq). Does NOT flag the blessed column-2 awk-on-tabs poller (which uses `awk -F"\t" "\==\"pending\""`) or the exit-code-driven `gh pr checks $PR >/dev/null` snippet. Hint composes with the existing background-without-notify_on_complete hint — both can fire on the same call. Each is independently actionable. Tests: - 4 new cases in tests/tools/test_notify_on_complete.py - test_homebrew_ci_poller_via_statusCheckRollup_emits_hint (positive) - test_homebrew_ci_poller_via_gh_pr_checks_piped_to_jq_emits_hint (positive) - test_canonical_column2_awk_poller_does_not_emit_homebrew_hint (negative) - test_canonical_gh_pr_checks_exit_code_loop_does_not_emit_hint (negative) - test_non_ci_background_command_does_not_emit_homebrew_hint (negative) - 30/30 passing (was 26) #AI commit#

…sResearch#33142) Background processes whose command contains `gh pr view --json statusCheckRollup` or `gh pr checks | jq` now get a runtime hint in the result pointing at the canonical green-ci-policy snippets. The homebrew shape has caused at least seven silent CI-watcher failures in the past two weeks (NousResearch#31329, NousResearch#31448, NousResearch#31695, NousResearch#31709, NousResearch#31745, NousResearch#32264, NousResearch#33131) — each one a different jq/awk/grep variation of the same fundamental problem (stdout buffering, jq null-key edge cases, conclusion-vs-status confusion, TTY-only banner grepping). The skill that documents this anti-pattern is excellent, but a skill only fires if the agent loads it. The tool surface fires on every misuse. This is the embed-footguns-in-tool-surface pattern from PR NousResearch#31289 applied to a recurring failure mode that's outgrown skill-only enforcement. Detector is deliberately narrow — flags two specific shapes: 1. Any command containing `statusCheckRollup` (the JSON-API path — conclusion vs status field semantics keep burning us). 2. `gh pr view` / `gh pr checks` combined with `jq` (gh pr checks doesn't emit JSON, so any `| jq` here is confused intent; the canonical column-2 poller uses awk-on-tabs, not jq). Does NOT flag the blessed column-2 awk-on-tabs poller (which uses `awk -F"\t" "\==\"pending\""`) or the exit-code-driven `gh pr checks $PR >/dev/null` snippet. Hint composes with the existing background-without-notify_on_complete hint — both can fire on the same call. Each is independently actionable. Tests: - 4 new cases in tests/tools/test_notify_on_complete.py - test_homebrew_ci_poller_via_statusCheckRollup_emits_hint (positive) - test_homebrew_ci_poller_via_gh_pr_checks_piped_to_jq_emits_hint (positive) - test_canonical_column2_awk_poller_does_not_emit_homebrew_hint (negative) - test_canonical_gh_pr_checks_exit_code_loop_does_not_emit_hint (negative) - test_non_ci_background_command_does_not_emit_homebrew_hint (negative) - 30/30 passing (was 26)

teknium1 merged commit 187cf0f into main May 27, 2026
26 checks passed

teknium1 deleted the ci-poller-tool-guard branch May 27, 2026 09:22

alt-glitch added type/refactor Code restructuring, no behavior change P3 Low — cosmetic, nice to have tool/terminal Terminal execution and process management labels May 27, 2026

BrewTestBot mentioned this pull request May 28, 2026

hermes-agent 2026.5.28 Homebrew/homebrew-core#285115

Merged

1 task

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

tools(terminal): nudge homebrewed CI pollers at the tool surface#33142

tools(terminal): nudge homebrewed CI pollers at the tool surface#33142
teknium1 merged 1 commit into
mainfrom
ci-poller-tool-guard

teknium1 commented May 27, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented May 27, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

teknium1 commented May 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Why

What it does

Hint text

Changes

Validation

Companion skill update

Infographic

Uh oh!

github-actions Bot commented May 27, 2026

🔎 Lint report: ci-poller-tool-guard vs origin/main

ruff

ty (type checker)

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

teknium1 commented May 27, 2026 •

edited

Loading

🔎 Lint report: `ci-poller-tool-guard` vs `origin/main`