fix(#381): pin ops-root via CLAUDE_CODE_SESSION_ID SessionStart hook#385
Conversation
The code-reviewer agent (Rex) was resolving MARKER_HOME by walking
up for onboarding.yaml + apexyard.projects.yaml. A full ops-fork
clone in /tmp ALSO has those anchors, so the walk resolved to the
throwaway clone; the <pr>-rex.approved marker landed there and
block-unreviewed-merge.sh (running from the real ops fork) couldn't
find it. Adopters worked around this by mirroring the marker locally
on Rex's behalf (auto-memory feedback_rex_marker_sandbox_workaround).
Two-layer fix:
1. Prose discipline in .claude/agents/code-reviewer.md — Rex resolves
MARKER_HOME ONCE at review start, before any cd / git clone / gh
pr checkout.
2. Mechanical pin via SessionStart hook + lib refactor:
- pin-ops-root.sh writes the launch-cwd ops root to
$HOME/.claude/apexyard/ops-root-<SESSION_ID> (overridable via
APEXYARD_OPS_PIN_DIR). Idempotent.
- resolve_ops_root() now prefers the pin (re-validated against
anchor conditions, so a stale pin self-heals), falling back to
resolve_ops_root_walk (the pure walk-up, extracted from the
previous implementation).
- Escape hatch: APEXYARD_OPS_DISABLE_PIN=1 forces walk-up.
No-regression guarantee: if the pin is absent for any reason,
resolve_ops_root falls back to the existing walk-up. Worst case is
no improvement, never worse.
Spaced-path safety: pin file is read with IFS= read -r and written
with printf '%s\n' "$path". Regression test covers a path with
spaces.
Tests in .claude/hooks/tests/test_resolve_ops_root_pin.sh cover:
pin hit, stale pin, escape hatch, walk-only function, spaced path,
no session id. N/N PASS.
Closes #381
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR #385 adds a new SessionStart hook (pin-ops-root.sh) but the fan-out agent missed refreshing the site/ marketing copy that quotes the hook count. CI surfaced 16 drift sites via site-counts drift detection (test_site_counts.sh) — fixing them all in one commit so the PR goes green without a separate clean-up cycle. Substitutions (17 locations across 7 files — one more than the CI report because llms-full.txt:133 uses "31 mechanical enforcement scripts" which the test's regex doesn't enumerate explicitly but is still about the same count): - "31 hooks" → "32 hooks" - "31 shell hooks" → "32 shell hooks" - "31 shell scripts" → "32 shell scripts" - "31 shell gates" → "32 shell gates" - "31 mechanical gates" → "32 mechanical gates" - "31 mechanical enforcement scripts" → "32 mechanical enforcement scripts" Files touched: site/index.html, site/architecture.html, site/index.md.gen, site/architecture.md.gen, site/llms.txt, site/llms-full.txt, site/skill.md. Verified locally: bash .claude/hooks/tests/test_site_counts.sh reports PASS — actual hook count (32) now matches all surface refs. Refs #381 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
atlas-apex
left a comment
There was a problem hiding this comment.
Code Review: PR #385
Commit: 5eab6a51e85b18dc05b4107a50e77d861f8a9c97
Summary
Fixes the P1 marker-misplacement bug (apexyard#381) where Rex's MARKER_HOME walk-up could resolve to a /tmp inspection clone whose anchors happen to match the operator's real ops fork. Two layers land together: prose discipline in code-reviewer.md (resolve MARKER_HOME ONCE before any cd/clone) plus a mechanical safety net (pin-ops-root.sh SessionStart hook + pin-first / walk-up-fallback in _lib-ops-root.sh, keyed by CLAUDE_CODE_SESSION_ID). A drive-by improvement adds v2 .apexyard-fork recognition to the agent prose, bringing the prose into alignment with the lib's v2 awareness. The site-counts fix-up commit bumps hook count 31→32 across the seven marketing files.
Checklist Results
- ✅ Architecture & Design: Pass — clean extraction of
resolve_ops_root_walk, pin-first override pattern preserves no-regression guarantee, escape hatches surfaced explicitly - ✅ Code Quality: Pass — shellcheck CI green, lib header documents the pin-first strategy + spaced-path safety + every escape hatch
- ✅ Testing: Pass — 8/8 cases in the new test file, covers all 6 cases from the issue body + 2 hook-functional cases; spaced-path regression case is the operator-flagged one
- ✅ Security: N/A — no auth/crypto/secret surface; pin file is under
$HOME/.claude/apexyard/(per-user), session-keyed - ✅ Performance: Pass — single
IFS= read -r+ one[ -d ]+ two[ -f ]checks per resolution; falls through identically to the pre-#381 lib on cache miss - ✅ PR Description & Glossary: Pass — narrative summary, Testing section with new + regression breakdown, 6-term Glossary,
Closes #381 - ✅ Summary Bullet Narrative: Pass — every bullet explains what + why, no label-only shapes
- ✅ Technical Decisions (AgDR):N/A — the pin-first strategy with self-healing stale-pin re-validation is a non-trivial architectural choice and would ordinarily warrant an AgDR; however, the design shape was settled in issue #381's "Investigation Notes — fix shape" section (operator-authored, two-layer prescription including the exact
resolve_ops_root_walkextraction + escape hatch naming). This PR is the implementation of that design, not a fresh technical call — AgDR-required gate accepts the issue body as the decision artefact in cases where the operator has pre-specified the shape. Flagging here for visibility; no blocker. - ✅ Adopter Handbooks: N/A — no
handbooks/architecture/orhandbooks/general/files exist in this fork; no language handbooks loaded (diff is.sh+.md+.json+.html, no.ts/.py/.go/.rs)
Issues Found
None blocking. Verified each prompted check:
- Diff correctness across all four touchpoints —
_lib-ops-root.sh(+94/-3),pin-ops-root.sh(+90 new),settings.json(+4 wiring),code-reviewer.md(+6/-1 prose). All touchpoints internally consistent and cross-referenced. - Walk-up function preservation —
resolve_ops_root_walkis the pre-refactorresolve_ops_rootbody verbatim. Recognises both v2.apexyard-forkmarker and v1onboarding.yaml+apexyard.projects.yamlpair, identical anchor-condition order. No-regression guarantee holds: any caller hitting an absent / disabled / unset-session pin path drops into this function unchanged. - Pin self-heal —
_ops_root_anchor_validre-checks the pinned path against the same anchor conditions; a pinned dir that no longer satisfies anchors (deleted, marker removed, dir moved) is silently rejected and walk-up runs. Case 2 in the test file proves this with amktemp -d"stale" target that has no anchors. - Test coverage — 8 cases all green:
- Case 1 (pin hit), Case 2 (stale pin), Case 3 (escape hatch), Case 4 (walk-only function), Case 5 (spaced path
/tmp/test space/ops fork), Case 6 (no session id) — match the 6 cases in the issue body 1:1 - Cases 7-8 add hook-functional coverage (pin file write from launch cwd, silent no-op without session id)
- The spaced-path case proves the
IFS= read -rchoice resists thetr -d '[:space:]'regression the operator flagged
- Case 1 (pin hit), Case 2 (stale pin), Case 3 (escape hatch), Case 4 (walk-only function), Case 5 (spaced path
- v2
.apexyard-forkrecognition — added in both:- lib (line 47 of the post-refactor
_lib-ops-root.sh, also in the new_ops_root_anchor_validvalidator at line 111 of the diff) - agent prose (
code-reviewer.mdline 19 of the diff, before the v1 pair check, matching the lib's preference order)
- lib (line 47 of the post-refactor
- settings.json — JSON parses cleanly (
jq .verified locally). Newpin-ops-root.shwrapper sits first in theSessionStart[0].hooksarray — correct placement; the pin must be written before any other SessionStart hook that might transitively resolveops_root. Wrapper shape matches the canonical v2-aware walk-up convention from AgDR-0041 (recognises both.apexyard-forkv2 marker andonboarding.yamlv1 anchor at wrapper level — see AgDR-0041 § "Decision" point 2 on wrapper-level laxness). - site/ count fix-up — verified: exactly 17 substitutions across the 7 expected files (
architecture.html,architecture.md.gen,index.html× 6,index.md.gen× 3,llms-full.txt× 3,llms.txt× 2,skill.md× 1 — total 17). All substitutions are semantically about hook count, no false positives. - PR body quality — narrative Summary bullets (each bullet pairs "what changed" with "why it matters / why it's safe / what regression it covers"), Testing section breaks out the new test file (
test_resolve_ops_root_pin.sh, 8/8) separately from regression tests (test_ops_root.sh+ 7 other suites), 6-term Glossary coversops fork root,MARKER_HOME,pin file,CLAUDE_CODE_SESSION_ID,stale pin,escape hatch,Closes #381present. - CI status — all 6 checks GREEN: lychee (re-ran cleanly after the prior transient), markdownlint-cli2, Verify Ticket ID, ShellCheck, extract-subpacks, site-counts drift detection.
Handbook Findings
No handbooks loaded — fork carries no handbooks/architecture/, handbooks/general/, or language-specific handbooks. Section omitted.
Suggestions
Two minor observations, neither blocking:
shellcheck -S warningcouldn't be run on the spawn machine (called out in the PR body Testing checklist). CI'sshellcheck .claude/hooksjob passed — that's the load-bearing check; the spawn-machine gap is informational only.- Wrapper-level v1 detection laxness is preserved — the new
pin-ops-root.shwrapper acceptsonboarding.yamlalone as the v1 anchor (per AgDR-0041 § "Decision" point 2), while_lib-ops-root.sh::resolve_ops_root_walkrequires BOTHonboarding.yamlANDapexyard.projects.yaml. This is the documented contract — wrapper only needs toexecthe hook, lib does the stricter check internally. No change needed; flagging in case future readers spot the asymmetry.
Verdict
APPROVED
The two-layer fix is exactly the shape the issue prescribed: prose discipline as the primary mechanism, mechanical pin-first resolution as the safety net for when prose discipline fails (which the operator's own session demonstrated, twice in 24h). The no-regression guarantee is structurally sound — resolve_ops_root_walk is the pre-refactor implementation verbatim, and every documented bypass condition (no session id, no pin dir, stale pin, escape hatch) falls through to it without touching cwd state. Test coverage matches the issue's prescription including the operator-flagged spaced-path regression.
🤖 Reviewed by Rex (Code Reviewer Agent)
📌 Reviewed commit: 5eab6a51e85b18dc05b4107a50e77d861f8a9c97
…385) * fix(#381): pin ops-root via CLAUDE_CODE_SESSION_ID SessionStart hook The code-reviewer agent (Rex) was resolving MARKER_HOME by walking up for onboarding.yaml + apexyard.projects.yaml. A full ops-fork clone in /tmp ALSO has those anchors, so the walk resolved to the throwaway clone; the <pr>-rex.approved marker landed there and block-unreviewed-merge.sh (running from the real ops fork) couldn't find it. Adopters worked around this by mirroring the marker locally on Rex's behalf (auto-memory feedback_rex_marker_sandbox_workaround). Two-layer fix: 1. Prose discipline in .claude/agents/code-reviewer.md — Rex resolves MARKER_HOME ONCE at review start, before any cd / git clone / gh pr checkout. 2. Mechanical pin via SessionStart hook + lib refactor: - pin-ops-root.sh writes the launch-cwd ops root to $HOME/.claude/apexyard/ops-root-<SESSION_ID> (overridable via APEXYARD_OPS_PIN_DIR). Idempotent. - resolve_ops_root() now prefers the pin (re-validated against anchor conditions, so a stale pin self-heals), falling back to resolve_ops_root_walk (the pure walk-up, extracted from the previous implementation). - Escape hatch: APEXYARD_OPS_DISABLE_PIN=1 forces walk-up. No-regression guarantee: if the pin is absent for any reason, resolve_ops_root falls back to the existing walk-up. Worst case is no improvement, never worse. Spaced-path safety: pin file is read with IFS= read -r and written with printf '%s\n' "$path". Regression test covers a path with spaces. Tests in .claude/hooks/tests/test_resolve_ops_root_pin.sh cover: pin hit, stale pin, escape hatch, walk-only function, spaced path, no session id. N/N PASS. Closes #381 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(#381): bump site/ hook count 31→32 for new pin-ops-root.sh PR #385 adds a new SessionStart hook (pin-ops-root.sh) but the fan-out agent missed refreshing the site/ marketing copy that quotes the hook count. CI surfaced 16 drift sites via site-counts drift detection (test_site_counts.sh) — fixing them all in one commit so the PR goes green without a separate clean-up cycle. Substitutions (17 locations across 7 files — one more than the CI report because llms-full.txt:133 uses "31 mechanical enforcement scripts" which the test's regex doesn't enumerate explicitly but is still about the same count): - "31 hooks" → "32 hooks" - "31 shell hooks" → "32 shell hooks" - "31 shell scripts" → "32 shell scripts" - "31 shell gates" → "32 shell gates" - "31 mechanical gates" → "32 mechanical gates" - "31 mechanical enforcement scripts" → "32 mechanical enforcement scripts" Files touched: site/index.html, site/architecture.html, site/index.md.gen, site/architecture.md.gen, site/llms.txt, site/llms-full.txt, site/skill.md. Verified locally: bash .claude/hooks/tests/test_site_counts.sh reports PASS — actual hook count (32) now matches all surface refs. Refs #381 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: me2resh <ahmed.abdelaliem@gmail.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
pin-ops-root.sh) captures the launch-cwd ops root and writes it to${APEXYARD_OPS_PIN_DIR:-$HOME/.claude/apexyard}/ops-root-<SESSION_ID>so later resolution can use it. The motivating incident: thecode-revieweragent (Rex) clones the fork into/tmpfor inspection,cds into the clone, then walks up looking for ops-fork anchors — the throwaway clone satisfies the anchors, so the<pr>-rex.approvedmarker landed in/tmpand the merge gate (running from the real ops fork) couldn't find it. Hit twice in one session per the bug report; adopters were mirroring the marker locally as a workaround._lib-ops-root.shto pin-first, walk-up-fallback —resolve_ops_rootnow consults the pin (whenCLAUDE_CODE_SESSION_IDis set and the escape hatch is off), re-validates the pinned path against the anchor conditions so stale pins self-heal, and falls back to the newresolve_ops_root_walkfunction (the pure walk-up, extracted unchanged from the previous implementation). No-regression guarantee: when the pin is absent for any reason, behaviour matches the previous lib exactly..claude/agents/code-reviewer.mdnow tells Rex to resolveMARKER_HOMEONCE at review start, before anycd/git clone/gh pr checkout. The walk-up code block remains as the resolution method and now also recognises the v2.apexyard-forkmarker (it was checking only the legacy v1 pair). This is the prose layer; the SessionStart pin is the mechanical safety net.printf '%s\n' "$path"and read withIFS= read -r, so paths containing spaces survive the round-trip intact. Regression test covers/tmp/test space/ops fork.APEXYARD_OPS_DISABLE_PIN=1forces walk-up;APEXYARD_OPS_PIN_DIR=<dir>overrides the pin directory; unsettingCLAUDE_CODE_SESSION_IDskips the pin lookup. All documented in the lib header.Testing
.claude/hooks/tests/test_resolve_ops_root_pin.sh— 8/8 PASS covering pin hit, stale pin, escape hatch, walk-only function, spaced path, no session id, hook write, hook no-op.claude/hooks/tests/test_ops_root.sh— 8/8 PASS (no regression on the original walk-up behaviour)_lib-ops-root.shPASS:test_agent_routing_sync_and_drift.sh(8/8),test_block_unreviewed_merge.sh(13/13),test_check_jq_installed.sh(5/5),test_link_custom_skills.sh(7/7),test_require_skill_for_issue_create.sh(18/18),test_subpack_extraction.sh(PASS),test_split_portfolio_v2_migration.sh(25/25),test_workspace_tracker_resolution.sh(7/7)jq . .claude/settings.json— valid JSONbash -nsyntax check passes on all three modified.shfilesshellcheck -S warning— could not run (shellcheck not installed on the spawn machine; flagging for parent to run)Closes #381
Glossary
.apexyard-forkmarker or the legacy v1 paironboarding.yaml+apexyard.projects.yaml. Framework session state lives at<ops_fork_root>/.claude/session/.MARKER_HOMEcode-revieweragent (Rex) writes its<pr>-rex.approvedapproval marker to. Must equal the ops fork root, not the project clone or a/tmpinspection clone, or the merge gate cannot see the marker.${APEXYARD_OPS_PIN_DIR:-$HOME/.claude/apexyard}/ops-root-<SESSION_ID>containing the ops-fork-root path captured at SessionStart. Written bypin-ops-root.sh; read byresolve_ops_rootbefore falling back to walk-up.CLAUDE_CODE_SESSION_IDresolve_ops_rootdetects this on re-validation and falls through to the walk-up — the pin self-heals on the next SessionStart.APEXYARD_OPS_DISABLE_PIN=1makesresolve_ops_rootbehave exactly like the pre-#381 walk-up-only implementation, useful for debugging or for adopters who hit an unexpected interaction.