
ci(#526): gate the hook test suite in CI #527

Merged: atlas-apex merged 2 commits into dev from ci/GH-526-gate-hook-tests on Jun 6, 2026

@atlas-apex

Collaborator

Summary

  • Turns the test corpus into an actual gate — ~65 test_*.sh suites assert the mechanical-enforcement layer (merge gate, ticket-first + per-worktree tiers, onboarding/secrets/leak guards, validators, portfolio paths), but only 2 ran in CI; a hook regression shipped green. This adds a glob-discovery runner + a tests.yml workflow that runs the whole suite on every PR and on push to dev/main, failing the job on any failure.
  • bin/run-hook-tests.sh — discovers every test_*.sh / *.test.sh under .claude/**/tests/, runs each with a per-test timeout, prints PASS/FAIL/SKIP, exits non-zero on any failure. Bash-3.2-portable so it's reusable locally (bash bin/run-hook-tests.sh). A documented QUARANTINE array skips (and logs) tests that genuinely can't run headless — never a silent drop.
  • .github/workflows/tests.yml: runs on ubuntu-latest, installs jq, sets a git identity (the test sandboxes run git init/commit), then runs the runner.
  • Decision + quarantine policy in AgDR-0067.
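
The discovery step described in the bullets above can be sketched roughly as follows. This is an illustration under stated assumptions, not the contents of bin/run-hook-tests.sh: the function name discover_tests and the exact find predicates are hypothetical; only the two glob patterns, the .claude/**/tests/ location, and the bash 3.2 constraint come from the summary.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of glob discovery; not the real runner.
set -uo pipefail

# Print every test_*.sh / *.test.sh that lives in a tests/ directory
# somewhere under ROOT/.claude, one path per line, in a stable order.
discover_tests() {
  local root=${1:-.}
  [ -d "$root/.claude" ] || return 0
  find "$root/.claude" -type f \
    \( -name 'test_*.sh' -o -name '*.test.sh' \) \
    -path '*/tests/*' | LC_ALL=C sort
}

# Collect into an array without mapfile, which bash 3.2 does not have.
TESTS=()
while IFS= read -r t; do
  TESTS+=("$t")
done < <(discover_tests "${1:-.}")

echo "discovered ${#TESTS[@]} tests"
```

Because the list is rebuilt from the filesystem on every run, a newly added suite is picked up without editing the runner, which is the property a hardcoded list lacks.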

Status / how I'm finalizing the quarantine

A local macOS run showed 52/65 passing; the 12 failures were a mix of macOS /private/var symlink artifacts (which pass on Linux) and genuinely stale tests. Rather than guess, this PR runs everything so the PR's own tests check is the Linux oracle. I'll then:

So the tests check on this PR may be red on the first run by design — I'll drive it green before requesting merge.

Testing

  1. bash bin/run-hook-tests.sh --list → discovers 65 tests (verified locally, bash 3.2).
  2. bash bin/run-hook-tests.sh locally → 52 pass / 12 fail (macOS); failures triaged via this PR's Linux tests check.
  3. python3 yaml.safe_load on tests.yml + shellcheck -S error bin/run-hook-tests.sh → clean.
  4. Pre-merge smoke: a deliberately-broken hook will be confirmed to make the job go red (AC), and the final tests check will be green.

Refs #526


Glossary

| Term | Definition |
| --- | --- |
| Discovery runner | A script that finds test files by glob and runs them all, instead of a hardcoded list that drifts. |
| Quarantine | An explicit, logged skip-list for tests that can't run headless or are known-failing-and-tracked; visible, not silent. |
| Gated test | A test whose failure fails CI (blocks merge), vs an advisory test that only runs if someone remembers. |
| Linux oracle | Using the actual CI environment to determine the true pass/fail set, free of local macOS artifacts. |

- bin/run-hook-tests.sh: glob-discovery runner (bash 3.2-portable), per-test
  timeout, fail-on-any, documented QUARANTINE skip-list. Reusable locally.
- .github/workflows/tests.yml: runs the suite on PR + push to dev/main with jq
  installed and a git identity (test sandboxes git init/commit).
- Turns ~65 advisory test_*.sh into an actual regression gate (only 2 ran in CI
  before). Decision + quarantine policy in AgDR-0067.

First CI run is the Linux oracle for the real failing set (macOS /private/var
symlink noise excluded); quarantine list + any in-scope fixes follow.

Refs #526

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Linux CI oracle showed 59/65 pass. Of the 6 failures (all pre-existing on dev,
not caused by the gate):
- FIXED test_setup_reset.sh — pointed the template source at onboarding.example.yaml
  (my #517 untracked onboarding.yaml; this test still read the old path)
- FIXED the /handover CLAUDE.md skill-table row (35→23 words, ≤25 budget)
- QUARANTINED 5 (md-to-pdf network/chromium; handover-clone-prompt stale spec;
  agent-routing case-2 drift; harnessability 1/14 case; token-efficiency residual
  doc-hygiene drift) — each logged with a reason, tracked in #528 to fix + un-quarantine

The other 6 macOS-only failures in local triage were /private/var symlink
artifacts — they pass on Linux (confirmed by the first CI run).

Refs #526

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@atlas-apex atlas-apex left a comment

Collaborator Author


Code Review: PR #527 — APPROVED (posted as comment; cannot self-approve own PR)

Commit: 44604ffb0195944d47a57135af6b4f6922ec95d0

Summary

Turns the ~65-script mechanical-enforcement test corpus into an actual CI gate. Adds a bash-3.2-portable glob-discovery runner (bin/run-hook-tests.sh), a tests.yml workflow (PR + push to dev/main), AgDR-0067 (decision + quarantine-as-ratchet policy), and two in-scope fixes to land the gate green. The PR's own hook test suite check is GREEN at HEAD (PASS=60, FAIL=0, SKIP=5).

Checklist Results

  • Architecture & Design: Pass
  • Code Quality: Pass
  • Testing: Pass (this PR is the testing infrastructure)
  • Security: Pass (permissions: contents: read, minimal scope)
  • Performance: Pass (per-test 120s cap; concurrency cancel-in-progress)
  • PR Description & Glossary: Pass
  • Summary Bullet Narrative: Pass (narrative, each with rationale)
  • Technical Decisions (AgDR): Pass (AgDR-0067 present + linked; Refs #526)
  • Adopter Handbooks: N/A (no migration/TS/domain code in a shell-only diff)

Verification performed (against PR HEAD, not local tree)

  1. CI green at HEAD: hook test suite check = success (PASS=60/FAIL=0/SKIP=5). The gate fails on real failures by design: set -uo pipefail (not set -e, so failures aggregate rather than abort the loop), rc=$? captured, exit 1 when fail>0.
  2. Tracking issue #528 is real + well-formed — OPEN, lists all 5 quarantined tests with root causes and an AC that drains QUARANTINE to empty. A genuine ratchet, not a graveyard.
  3. Quarantine legitimacy spot-checked: test_token_efficiency_wave1.sh asserts a 200-char SKILL-description hard cap (matches the cited "plan-initiative >200 chars" drift); test_handover_clone_prompt.sh asserts pre-restructure clone-prompt strings ([y / n / later], "Offer the clone-first") which the SKILL has since moved. Both are genuine pre-existing drift, not caused by this PR. All 5 are SKIP-logged with cited reasons: visible, never silent.
  4. test_setup_reset fix aligns to #517: at PR HEAD onboarding.yaml is NOT tracked (404 via contents API), onboarding.example.yaml exists (4640 bytes) with the "Your Company Name" placeholder, and /setup --reset does cp onboarding.example.yaml onboarding.yaml (SKILL.md L80). The test now reads the correct source.
  5. CLAUDE.md /handover trim accurate — 35→24 words; preserves the three substantive facts (5-dimension scoring, checklist doc-pick, Next-Steps ticket offer). No information loss.
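
Item 1 depends on the runner aggregating failures instead of aborting on the first one. A minimal sketch of that pattern is below. The helper name run_suite and the output format are assumptions for illustration; the merged runner's internals are not shown on this page beyond set -uo pipefail, the rc=$? capture, and exit 1 when fail>0.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: aggregate failures rather than abort the loop.
# Deliberately no `set -e`, so one failing test does not end the run.
set -uo pipefail

# Run each argument as a test script, print PASS/FAIL per test plus a
# summary line, and return 1 if any test failed, 0 otherwise.
run_suite() {
  local pass=0 fail=0 rc t
  local failed=""
  for t in "$@"; do
    bash "$t" >/dev/null 2>&1
    rc=$?                       # capture before any other command runs
    if [ "$rc" -eq 0 ]; then
      pass=$((pass + 1))
      echo "PASS $t"
    else
      fail=$((fail + 1))
      failed="$failed $t"
      echo "FAIL $t (rc=$rc)"
    fi
  done
  echo "PASS=$pass FAIL=$fail"
  [ -z "$failed" ] || echo "failed:$failed"
  [ "$fail" -eq 0 ]             # non-zero return when anything failed
}
```

A caller would end with something like run_suite "${TESTS[@]}" || exit 1, so the CI job goes red only after every suite has had a chance to report.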

Runner correctness (focus #1)

  • bash 3.2 portable — while IFS= read + < <(...) instead of mapfile; POSIX param expansions. Good.
  • Empty-QUARANTINE guard under set -u: is_quarantined short-circuits on ${#QUARANTINE[@]} -gt 0 before iterating, so draining the array in #528 won't trip an unbound-array expansion. Forward-looking and correct.
  • timeout fallback: probes timeout, then gtimeout; empty default; intentional unquoted word-split (SC2086 disabled). Correct.
  • exit codes — aggregates, prints failed list, exits 1 on any failure. Correct.
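
The two portability guards in this list can be sketched as follows. The bodies are assumptions written to match the review's description, not a copy of the merged script: TIMEOUT_CMD and run_capped are hypothetical names, and the 120-second value is taken from the per-test cap noted in the checklist.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the quarantine guard and the timeout probe.
set -uo pipefail

QUARANTINE=()   # each real entry would carry a reason and a tracking issue

# Return 0 if the basename of $1 is listed in QUARANTINE.
# The length check runs first so an empty array is never expanded,
# which older bash versions reject as unbound under `set -u`.
is_quarantined() {
  local name q
  name=$(basename "$1")
  [ "${#QUARANTINE[@]}" -gt 0 ] || return 1
  for q in "${QUARANTINE[@]}"; do
    [ "$q" = "$name" ] && return 0
  done
  return 1
}

# Prefer GNU `timeout`; fall back to `gtimeout` (the name Homebrew's
# coreutils uses on macOS). Empty means the test runs uncapped.
TIMEOUT_CMD=""
if command -v timeout >/dev/null 2>&1; then
  TIMEOUT_CMD="timeout 120"
elif command -v gtimeout >/dev/null 2>&1; then
  TIMEOUT_CMD="gtimeout 120"
fi

# $TIMEOUT_CMD is intentionally unquoted: it splits into command plus
# duration, or disappears entirely when empty.
run_capped() {
  # shellcheck disable=SC2086
  $TIMEOUT_CMD bash "$1"
}
```

Keeping the timeout prefix in a plain string rather than an array is what makes the unquoted expansion necessary, and it sidesteps the same empty-array problem the quarantine guard works around.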

Issues Found

None blocking.

Suggestions (non-blocking)

  • nit: the main run loop for t in "${TESTS[@]}" lacks the empty-array guard that is_quarantined and --list have. Under set -u with bash 3.2, a zero-test discovery would expand as unbound. Not reachable today (60+ tests), but a one-line [ "${#TESTS[@]}" -gt 0 ] || { echo "no tests discovered"; exit 1; } would turn a silent-empty-pass into a loud failure, the safer direction for a gate.
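
For context, the guard suggested in the nit could sit in front of the run loop roughly like this. It is a sketch only: require_tests is a hypothetical helper, and the else branch prints a message where a real gate would exit 1, so the snippet can be run on its own.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the suggested guard; not part of the merged runner.
set -uo pipefail

# Fail loudly when discovery found nothing, so an empty run cannot be
# mistaken for a green one. Takes the discovered count as its argument.
require_tests() {
  if [ "$1" -gt 0 ]; then
    return 0
  fi
  echo "no tests discovered" >&2
  return 1
}

TESTS=()                        # stand-in for the discovery result
if require_tests "${#TESTS[@]}"; then
  for t in "${TESTS[@]}"; do    # only reached with a non-empty array
    echo "would run $t"
  done
else
  echo "gate would exit 1 here"
fi
```

Checking the count before the loop also keeps the "${TESTS[@]}" expansion from ever running on an empty array.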

Verdict

APPROVED (Rex first-pass). Approval marker written for the merge gate. Human approver does the second review before merge.


🤖 Reviewed by Rex (Code Reviewer Agent)
📌 Reviewed commit: 44604ffb0195944d47a57135af6b4f6922ec95d0

@atlas-apex atlas-apex merged commit 1027dbc into dev Jun 6, 2026
8 checks passed
@atlas-apex atlas-apex deleted the ci/GH-526-gate-hook-tests branch June 6, 2026 11:59