fix(background-review): allow pinned skills to be improved by teknium1 · Pull Request #31329 · NousResearch/hermes-agent

teknium1 · 2026-05-24T05:36:48Z

Summary

Background review now patches pinned skills when a pitfall/missing step turns up. Before this fix, the reviewer prompt listed pinned skills under 'Protected skills (DO NOT edit these)' and told the agent to say 'Nothing to save.' if only protected skills needed updating — so pinning a skill silently disabled improvement on it, requiring an unpin/re-pin dance.

The underlying tool layer already gets this right: skill_manage's _pinned_guard fires only on action=delete; patch/edit/write_file go through on pinned skills. Curator archive/consolidation still skips pinned at the data layer (agent/curator.py L272). Pin's job is anti-deletion/anti-consolidation, not anti-content-update — this PR aligns the prompt with that contract.
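A minimal sketch of the contract described above, assuming a simplified skill record. The names `Skill`, `GUARDED_ACTIONS`, `pinned_guard`, and `curator_candidates` are illustrative stand-ins, not the actual `skill_manage` or `agent/curator.py` code:

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumption: only destructive actions are guarded; content updates
# (patch/edit/write_file) are deliberately absent from this set.
GUARDED_ACTIONS = frozenset({"delete"})


@dataclass
class Skill:
    name: str
    pinned: bool = False


def pinned_guard(skill: Skill, action: str) -> Optional[str]:
    """Return an error message when the action is blocked for a pinned skill, else None."""
    if skill.pinned and action in GUARDED_ACTIONS:
        return f"Skill '{skill.name}' is pinned; unpin it before '{action}'."
    return None


def curator_candidates(skills: List[Skill]) -> List[Skill]:
    """Data-layer filter: pinned skills are never offered for archive or consolidation."""
    return [s for s in skills if not s.pinned]
```

Under these assumptions, `pinned_guard(skill, "patch")` returns `None` for a pinned skill while `pinned_guard(skill, "delete")` returns an error string, which is the behavior the reviewer prompt is being aligned with.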

Reported by @BrennerSpear on X: 'I still want to be able to improve pinned skills. I get blocked from improving skills that are pinned - I want them to just not be touched by background curator jobs.'

Changes

agent/background_review.py: in both _SKILL_REVIEW_PROMPT and _COMBINED_REVIEW_PROMPT, drop pinned skills from the protected list and replace with explicit guidance that pinned skills CAN be improved, with rationale (pin = anti-deletion/archive/consolidation, not anti-content-update).
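As a rough illustration of the prompt change, the sketch below builds the skill-protection section of a reviewer prompt. The `SkillInfo` type, the `source` values, the function name, and the exact wording of the pinned-skills line are assumptions for illustration only; the real prompt text lives in `_SKILL_REVIEW_PROMPT` and `_COMBINED_REVIEW_PROMPT`.

```python
from typing import Iterable, NamedTuple

# Assumption: bundled and hub-installed skills remain protected.
PROTECTED_SOURCES = ("bundled", "hub")


class SkillInfo(NamedTuple):
    name: str
    source: str   # hypothetical values: "bundled", "hub", or "user"
    pinned: bool


def build_skill_sections(skills: Iterable[SkillInfo]) -> str:
    """Render the protected list without pinned skills, plus explicit pinned guidance."""
    skills = list(skills)
    protected = sorted(s.name for s in skills if s.source in PROTECTED_SOURCES)
    pinned = sorted(
        s.name for s in skills if s.pinned and s.source not in PROTECTED_SOURCES
    )
    lines = []
    if protected:
        lines.append("Protected skills (DO NOT edit these): " + ", ".join(protected))
    if pinned:
        lines.append(
            "Pinned skills (these CAN be improved; pinning only blocks "
            "deletion, archive, and consolidation): " + ", ".join(pinned)
        )
    return "\n".join(lines)
```

The point of the design is that pinned skills no longer share a list with skills the reviewer must not touch, so the reviewer has no reason to answer 'Nothing to save.' when a pinned skill needs a patch.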

Validation

| Scenario | Before | After |
|---|---|---|
| Reviewer behavior on pinned skill needing fix | refuses, says 'Nothing to save.' | patches the skill |
| `skill_manage(action=delete)` on pinned | blocked (unchanged) | blocked (unchanged) |
| Curator archive/consolidation of pinned | skips (unchanged) | skips (unchanged) |
| Targeted tests: `tests/run_agent/test_background_review*.py` + `test_review_prompt_class_first.py` | 37/37 ✓ | 37/37 ✓ |

Infographic

The post-turn background reviewer prompt listed pinned skills under 'Protected skills (DO NOT edit these)' alongside bundled and hub-installed skills, with the instruction to say 'Nothing to save.' if only protected skills needed updating. This meant the reviewer would refuse to patch a pinned skill even when the user explicitly wanted that skill improved. The underlying tool layer already gets this right: skill_manage's _pinned_guard only fires on delete; patch/edit/write_file go through on pinned skills. Curator archive/consolidation still skips pinned at the data layer (agent/curator.py), which is the correct place for that protection — pin's job is anti-deletion, not anti-improvement. Both _SKILL_REVIEW_PROMPT and _COMBINED_REVIEW_PROMPT now explicitly tell the reviewer that pinned skills can be patched, with rationale, so it doesn't bail out of an improvement just because the target is pinned.

github-actions · 2026-05-24T05:37:24Z

🔎 Lint report: `hermes/hermes-8e894e96` vs `origin/main`

ruff

Total: 0 on HEAD, 0 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 0 pre-existing issues carried over.

ty (type checker)

Total: 9047 on HEAD, 9047 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 4816 pre-existing issues carried over.

Diagnostics are surfaced as warnings — this check never fails the build.

Background processes whose command contains `gh pr view --json statusCheckRollup` or `gh pr checks | jq` now get a runtime hint in the result pointing at the canonical green-ci-policy snippets.

The homebrew shape has caused at least seven silent CI-watcher failures in the past two weeks (#31329, #31448, #31695, #31709, #31745, #32264, #33131), each one a different jq/awk/grep variation of the same fundamental problem (stdout buffering, jq null-key edge cases, conclusion-vs-status confusion, TTY-only banner grepping). The skill that documents this anti-pattern is excellent, but a skill only fires if the agent loads it. The tool surface fires on every misuse. This is the embed-footguns-in-tool-surface pattern from PR #31289 applied to a recurring failure mode that's outgrown skill-only enforcement.

Detector is deliberately narrow and flags two specific shapes:

1. Any command containing `statusCheckRollup` (the JSON-API path; conclusion vs status field semantics keep burning us).
2. `gh pr view` / `gh pr checks` combined with `jq` (gh pr checks doesn't emit JSON, so any `| jq` here is confused intent; the canonical column-2 poller uses awk-on-tabs, not jq).

Does NOT flag the blessed column-2 awk-on-tabs poller (which uses `awk -F"\t" "\==\"pending\""`) or the exit-code-driven `gh pr checks $PR >/dev/null` snippet.

Hint composes with the existing background-without-notify_on_complete hint: both can fire on the same call. Each is independently actionable.

Tests:

- 4 new cases in tests/tools/test_notify_on_complete.py
  - test_homebrew_ci_poller_via_statusCheckRollup_emits_hint (positive)
  - test_homebrew_ci_poller_via_gh_pr_checks_piped_to_jq_emits_hint (positive)
  - test_canonical_column2_awk_poller_does_not_emit_homebrew_hint (negative)
  - test_canonical_gh_pr_checks_exit_code_loop_does_not_emit_hint (negative)
  - test_non_ci_background_command_does_not_emit_homebrew_hint (negative)
- 30/30 passing (was 26)
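The two flagged shapes could be checked along the following lines. This is a sketch under stated assumptions, not the detector shipped in #33142: the function name and regular expressions are hypothetical, and the real implementation may match differently.

```python
import re

# Hypothetical patterns: `gh pr view` / `gh pr checks`, and a standalone `jq` token.
_GH_PR_CMD = re.compile(r"\bgh\s+pr\s+(?:view|checks)\b")
_JQ_TOKEN = re.compile(r"\bjq\b")


def looks_like_homebrew_ci_poller(command: str) -> bool:
    """Return True for the two narrow shapes described in the commit message."""
    # Shape 1: any command that touches the statusCheckRollup JSON field.
    if "statusCheckRollup" in command:
        return True
    # Shape 2: gh pr view / gh pr checks combined with jq.
    return bool(_GH_PR_CMD.search(command) and _JQ_TOKEN.search(command))
```

Because shape 2 requires both `gh pr view`/`gh pr checks` and `jq`, an awk-based poller or the exit-code-driven `gh pr checks $PR >/dev/null` loop passes through without a hint in this sketch.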


alt-glitch added the type/bug (Something isn't working), P2 (Medium: degraded but workaround exists), comp/agent (Core agent loop, run_agent.py, prompt builder), and tool/skills (Skills system: list, view, manage) labels May 24, 2026

teknium1 merged commit 2442a0c into main May 24, 2026
26 checks passed

teknium1 deleted the hermes/hermes-8e894e96 branch May 24, 2026 05:57

teknium1 mentioned this pull request May 27, 2026

tools(terminal): nudge homebrewed CI pollers at the tool surface #33142

Merged

teknium1 merged 1 commit into main from hermes/hermes-8e894e96
