ci(#526): gate the hook test suite in CI#527
Merged
Merged
Conversation
- bin/run-hook-tests.sh: glob-discovery runner (bash 3.2-portable), per-test timeout, fail-on-any, documented QUARANTINE skip-list. Reusable locally. - .github/workflows/tests.yml: runs the suite on PR + push to dev/main with jq installed and a git identity (test sandboxes git init/commit). - Turns ~65 advisory test_*.sh into an actual regression gate (only 2 ran in CI before). Decision + quarantine policy in AgDR-0067. First CI run is the Linux oracle for the real failing set (macOS /private/var symlink noise excluded); quarantine list + any in-scope fixes follow. Refs #526 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Linux CI oracle showed 59/65 pass. Of the 6 failures (all pre-existing on dev, not caused by the gate): - FIXED test_setup_reset.sh — pointed the template source at onboarding.example.yaml (my #517 untracked onboarding.yaml; this test still read the old path) - FIXED the /handover CLAUDE.md skill-table row (35→23 words, ≤25 budget) - QUARANTINED 5 (md-to-pdf network/chromium; handover-clone-prompt stale spec; agent-routing case-2 drift; harnessability 1/14 case; token-efficiency residual doc-hygiene drift) — each logged with a reason, tracked in #528 to fix + un-quarantine The other 6 macOS-only failures in local triage were /private/var symlink artifacts — they pass on Linux (confirmed by the first CI run). Refs #526 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
atlas-apex
commented
Jun 6, 2026
atlas-apex
left a comment
Collaborator
Author
There was a problem hiding this comment.
Code Review: PR #527 — APPROVED (posted as comment; cannot self-approve own PR)
Commit: 44604ffb0195944d47a57135af6b4f6922ec95d0
Summary
Turns the ~65-script mechanical-enforcement test corpus into an actual CI gate. Adds a bash-3.2-portable glob-discovery runner (bin/run-hook-tests.sh), a tests.yml workflow (PR + push to dev/main), AgDR-0067 (decision + quarantine-as-ratchet policy), and two in-scope fixes to land the gate green. The PR's own hook test suite check is GREEN at HEAD (PASS=60, FAIL=0, SKIP=5).
Checklist Results
- Architecture & Design: Pass
- Code Quality: Pass
- Testing: Pass (this PR is the testing infrastructure)
- Security: Pass (
permissions: contents: read, minimal scope) - Performance: Pass (per-test 120s cap; concurrency cancel-in-progress)
- PR Description & Glossary: Pass
- Summary Bullet Narrative: Pass (narrative, each with rationale)
- Technical Decisions (AgDR):Pass (AgDR-0067 present + linked; Refs #526)
- Adopter Handbooks: N/A (no migration/TS/domain code in a shell-only diff)
Verification performed (against PR HEAD, not local tree)
- CI green at HEAD —
hook test suitecheck = success (PASS=60/FAIL=0/SKIP=5). The gate fails-on-real-failures by design:set -uo pipefail(notset -e, so failures aggregate rather than abort the loop),rc=$?captured,exit 1whenfail>0. - Tracking issue #528 is real + well-formed — OPEN, lists all 5 quarantined tests with root causes and an AC that drains QUARANTINE to empty. A genuine ratchet, not a graveyard.
- Quarantine legitimacy spot-checked —
test_token_efficiency_wave1.shasserts a 200-char SKILL-description hard cap (matches cited "plan-initiative >200 chars" drift);test_handover_clone_prompt.shasserts pre-restructure clone-prompt strings ([y / n / later], "Offer the clone-first") which the SKILL has since moved. Both genuine pre-existing drift, not caused by this PR. All 5 SKIP-logged with cited reasons — visible, never silent. - test_setup_reset fix aligns to #517 — at PR HEAD
onboarding.yamlis NOT tracked (404 via contents API),onboarding.example.yamlexists (4640 bytes) with the\"Your Company Name\"placeholder, and/setup --resetdoescp onboarding.example.yaml onboarding.yaml(SKILL.md L80). Test now reads the correct source. - CLAUDE.md /handover trim accurate — 35→24 words; preserves the three substantive facts (5-dimension scoring, checklist doc-pick, Next-Steps ticket offer). No information loss.
Runner correctness (focus #1)
- bash 3.2 portable —
while IFS= read+< <(...)instead ofmapfile; POSIX param expansions. Good. - Empty-QUARANTINE guard under
set -u—is_quarantinedshort-circuits on${#QUARANTINE[@]} -gt 0before iterating, so draining the array in #528 won't trip an unbound-array expansion. Forward-looking + correct. - timeout fallback —
timeout→gtimeoutprobe, empty default, intentional unquoted word-split (SC2086 disabled). Correct. - exit codes — aggregates, prints failed list, exits 1 on any failure. Correct.
Issues Found
None blocking.
Suggestions (non-blocking)
- nit: the main run loop
for t in \"${TESTS[@]}\"lacks the empty-array guard thatis_quarantinedand--listhave. Underset -u+bash 3.2 a zero-test discovery would expand as unbound. Not reachable today (60+ tests) — but a one-line[ \"${#TESTS[@]}\" -gt 0 ] || { echo \"no tests discovered\"; exit 1; }would turn a silent-empty-pass into a loud failure, the safer direction for a gate.
Verdict
APPROVED (Rex first-pass). Approval marker written for the merge gate. Human approver does the second review before merge.
🤖 Reviewed by Rex (Code Reviewer Agent)
📌 Reviewed commit: 44604ffb0195944d47a57135af6b4f6922ec95d0
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
test_*.shsuites assert the mechanical-enforcement layer (merge gate, ticket-first + per-worktree tiers, onboarding/secrets/leak guards, validators, portfolio paths), but only 2 ran in CI; a hook regression shipped green. This adds a glob-discovery runner + atests.ymlworkflow that runs the whole suite on every PR and on push todev/main, failing the job on any failure.bin/run-hook-tests.sh— discovers everytest_*.sh/*.test.shunder.claude/**/tests/, runs each with a per-test timeout, prints PASS/FAIL/SKIP, exits non-zero on any failure. Bash-3.2-portable so it's reusable locally (bash bin/run-hook-tests.sh). A documentedQUARANTINEarray skips (and logs) tests that genuinely can't run headless — never a silent drop..github/workflows/tests.yml—ubuntu-latest, installsjq, sets a git identity (sandboxes dogit init/commit), runs the runner.Status / how I'm finalizing the quarantine
A local macOS run showed 52/65 passing; the 12 failures were a mix of macOS
/private/varsymlink artifacts (which pass on Linux) and genuinely stale tests. Rather than guess, this PR runs everything so the PR's owntestscheck is the Linux oracle. I'll then:So the
testscheck on this PR may be red on the first run by design — I'll drive it green before requesting merge.Testing
bash bin/run-hook-tests.sh --list→ discovers 65 tests (verified locally, bash 3.2).bash bin/run-hook-tests.shlocally → 52 pass / 12 fail (macOS); failures triaged via this PR's Linuxtestscheck.python3 yaml.safe_loadontests.yml+shellcheck -S error bin/run-hook-tests.sh→ clean.testscheck will be green.Refs #526
Glossary