ci(#526): gate the hook test suite in CI by atlas-apex · Pull Request #527 · me2resh/apexyard

atlas-apex · 2026-06-06T11:41:13Z

Summary

Turns the test corpus into an actual gate — ~65 test_*.sh suites assert the mechanical-enforcement layer (merge gate, ticket-first + per-worktree tiers, onboarding/secrets/leak guards, validators, portfolio paths), but only 2 ran in CI; a hook regression shipped green. This adds a glob-discovery runner + a tests.yml workflow that runs the whole suite on every PR and on push to dev/main, failing the job on any failure.
bin/run-hook-tests.sh — discovers every test_*.sh / *.test.sh under .claude/**/tests/, runs each with a per-test timeout, prints PASS/FAIL/SKIP, exits non-zero on any failure. Bash-3.2-portable so it's reusable locally (bash bin/run-hook-tests.sh). A documented QUARANTINE array skips (and logs) tests that genuinely can't run headless — never a silent drop.
.github/workflows/tests.yml — ubuntu-latest, installs jq, sets a git identity (sandboxes do git init/commit), runs the runner.
Decision + quarantine policy in AgDR-0067.

Status / how I'm finalizing the quarantine

A local macOS run showed 52/65 passing; the 12 failures were a mix of macOS /private/var symlink artifacts (which pass on Linux) and genuinely stale tests. Rather than guess, this PR runs everything so the PR's own tests check is the Linux oracle. I'll then:

leave the gate enforcing all tests that pass on Linux,
quarantine (with a cited reason + follow-up ticket) only the genuinely-stale/headless-incompatible ones,
fix any trivially-in-scope failures (e.g. tests that [Feature] Keep onboarding config out of git: example-file + gitignore + commit-time guard #517 outdated).

So the tests check on this PR may be red on the first run by design — I'll drive it green before requesting merge.

Testing

bash bin/run-hook-tests.sh --list → discovers 65 tests (verified locally, bash 3.2).
bash bin/run-hook-tests.sh locally → 52 pass / 12 fail (macOS); failures triaged via this PR's Linux tests check.
python3 yaml.safe_load on tests.yml + shellcheck -S error bin/run-hook-tests.sh → clean.
Pre-merge smoke: a deliberately-broken hook will be confirmed to make the job go red (AC), and the final tests check will be green.

Refs #526

Glossary

Term	Definition
Discovery runner	A script that finds test files by glob and runs them all, instead of a hardcoded list that drifts.
Quarantine	An explicit, logged skip-list for tests that can't run headless or are known-failing-and-tracked — visible, not silent.
Gated test	A test whose failure fails CI (blocks merge), vs an advisory test that only runs if someone remembers.
Linux oracle	Using the actual CI environment to determine the true pass/fail set, free of local macOS artifacts.

- bin/run-hook-tests.sh: glob-discovery runner (bash 3.2-portable), per-test timeout, fail-on-any, documented QUARANTINE skip-list. Reusable locally. - .github/workflows/tests.yml: runs the suite on PR + push to dev/main with jq installed and a git identity (test sandboxes git init/commit). - Turns ~65 advisory test_*.sh into an actual regression gate (only 2 ran in CI before). Decision + quarantine policy in AgDR-0067. First CI run is the Linux oracle for the real failing set (macOS /private/var symlink noise excluded); quarantine list + any in-scope fixes follow. Refs #526 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Linux CI oracle showed 59/65 pass. Of the 6 failures (all pre-existing on dev, not caused by the gate): - FIXED test_setup_reset.sh — pointed the template source at onboarding.example.yaml (my #517 untracked onboarding.yaml; this test still read the old path) - FIXED the /handover CLAUDE.md skill-table row (35→23 words, ≤25 budget) - QUARANTINED 5 (md-to-pdf network/chromium; handover-clone-prompt stale spec; agent-routing case-2 drift; harnessability 1/14 case; token-efficiency residual doc-hygiene drift) — each logged with a reason, tracked in #528 to fix + un-quarantine The other 6 macOS-only failures in local triage were /private/var symlink artifacts — they pass on Linux (confirmed by the first CI run). Refs #526 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

atlas-apex

Code Review: PR #527 — APPROVED (posted as comment; cannot self-approve own PR)

Commit: 44604ffb0195944d47a57135af6b4f6922ec95d0

Summary

Turns the ~65-script mechanical-enforcement test corpus into an actual CI gate. Adds a bash-3.2-portable glob-discovery runner (bin/run-hook-tests.sh), a tests.yml workflow (PR + push to dev/main), AgDR-0067 (decision + quarantine-as-ratchet policy), and two in-scope fixes to land the gate green. The PR's own hook test suite check is GREEN at HEAD (PASS=60, FAIL=0, SKIP=5).

Checklist Results

Architecture & Design: Pass
Code Quality: Pass
Testing: Pass (this PR is the testing infrastructure)
Security: Pass (permissions: contents: read, minimal scope)
Performance: Pass (per-test 120s cap; concurrency cancel-in-progress)
PR Description & Glossary: Pass
Summary Bullet Narrative: Pass (narrative, each with rationale)
Technical Decisions (AgDR):Pass (AgDR-0067 present + linked; Refs #526)
Adopter Handbooks: N/A (no migration/TS/domain code in a shell-only diff)

Verification performed (against PR HEAD, not local tree)

CI green at HEAD — hook test suite check = success (PASS=60/FAIL=0/SKIP=5). The gate fails-on-real-failures by design: set -uo pipefail (not set -e, so failures aggregate rather than abort the loop), rc=$? captured, exit 1 when fail>0.
Tracking issue #528 is real + well-formed — OPEN, lists all 5 quarantined tests with root causes and an AC that drains QUARANTINE to empty. A genuine ratchet, not a graveyard.
Quarantine legitimacy spot-checked — test_token_efficiency_wave1.sh asserts a 200-char SKILL-description hard cap (matches cited "plan-initiative >200 chars" drift); test_handover_clone_prompt.sh asserts pre-restructure clone-prompt strings ([y / n / later], "Offer the clone-first") which the SKILL has since moved. Both genuine pre-existing drift, not caused by this PR. All 5 SKIP-logged with cited reasons — visible, never silent.
test_setup_reset fix aligns to #517 — at PR HEAD onboarding.yaml is NOT tracked (404 via contents API), onboarding.example.yaml exists (4640 bytes) with the \"Your Company Name\" placeholder, and /setup --reset does cp onboarding.example.yaml onboarding.yaml (SKILL.md L80). Test now reads the correct source.
CLAUDE.md /handover trim accurate — 35→24 words; preserves the three substantive facts (5-dimension scoring, checklist doc-pick, Next-Steps ticket offer). No information loss.

Runner correctness (focus #1)

bash 3.2 portable — while IFS= read + < <(...) instead of mapfile; POSIX param expansions. Good.
Empty-QUARANTINE guard under set -u — is_quarantined short-circuits on ${#QUARANTINE[@]} -gt 0 before iterating, so draining the array in #528 won't trip an unbound-array expansion. Forward-looking + correct.
timeout fallback — timeout→gtimeout probe, empty default, intentional unquoted word-split (SC2086 disabled). Correct.
exit codes — aggregates, prints failed list, exits 1 on any failure. Correct.

Issues Found

None blocking.

Suggestions (non-blocking)

nit: the main run loop for t in \"${TESTS[@]}\" lacks the empty-array guard that is_quarantined and --list have. Under set -u+bash 3.2 a zero-test discovery would expand as unbound. Not reachable today (60+ tests) — but a one-line [ \"${#TESTS[@]}\" -gt 0 ] || { echo \"no tests discovered\"; exit 1; } would turn a silent-empty-pass into a loud failure, the safer direction for a gate.

Verdict

APPROVED (Rex first-pass). Approval marker written for the merge gate. Human approver does the second review before merge.

🤖 Reviewed by Rex (Code Reviewer Agent)
📌 Reviewed commit: 44604ffb0195944d47a57135af6b4f6922ec95d0

github-actions Bot added github-issue has-ticket labels Jun 6, 2026

atlas-apex commented Jun 6, 2026

View reviewed changes

atlas-apex merged commit 1027dbc into dev Jun 6, 2026
8 checks passed

atlas-apex deleted the ci/GH-526-gate-hook-tests branch June 6, 2026 11:59

atlas-apex mentioned this pull request Jun 6, 2026

[CI] Gate the hook test suite in CI #526

Closed

5 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ci(#526): gate the hook test suite in CI#527

ci(#526): gate the hook test suite in CI#527
atlas-apex merged 2 commits into
devfrom
ci/GH-526-gate-hook-tests

atlas-apex commented Jun 6, 2026

Uh oh!

atlas-apex left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

atlas-apex commented Jun 6, 2026

Summary

Status / how I'm finalizing the quarantine

Testing

Glossary

Uh oh!

atlas-apex left a comment

Choose a reason for hiding this comment

Code Review: PR #527 — APPROVED (posted as comment; cannot self-approve own PR)

Summary

Checklist Results

Verification performed (against PR HEAD, not local tree)

Runner correctness (focus #1)

Issues Found

Suggestions (non-blocking)

Verdict

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants