fix(#381): pin ops-root via CLAUDE_CODE_SESSION_ID SessionStart hook by atlas-apex · Pull Request #385 · me2resh/apexyard

atlas-apex · 2026-05-23T18:58:11Z

Summary

Pin the ops-fork root at SessionStart — a new SessionStart hook (pin-ops-root.sh) captures the launch-cwd ops root and writes it to ${APEXYARD_OPS_PIN_DIR:-$HOME/.claude/apexyard}/ops-root-<SESSION_ID> so later resolution can use it. The motivating incident: the code-reviewer agent (Rex) clones the fork into /tmp for inspection, cds into the clone, then walks up looking for ops-fork anchors — the throwaway clone satisfies the anchors, so the <pr>-rex.approved marker landed in /tmp and the merge gate (running from the real ops fork) couldn't find it. Hit twice in one session per the bug report; adopters were mirroring the marker locally as a workaround.
Refactor _lib-ops-root.sh to pin-first, walk-up-fallback — resolve_ops_root now consults the pin (when CLAUDE_CODE_SESSION_ID is set and the escape hatch is off), re-validates the pinned path against the anchor conditions so stale pins self-heal, and falls back to the new resolve_ops_root_walk function (the pure walk-up, extracted unchanged from the previous implementation). No-regression guarantee: when the pin is absent for any reason, behaviour matches the previous lib exactly.
Add the agent-side prose discipline — .claude/agents/code-reviewer.md now tells Rex to resolve MARKER_HOME ONCE at review start, before any cd / git clone / gh pr checkout. The walk-up code block remains as the resolution method and now also recognises the v2 .apexyard-fork marker (it was checking only the legacy v1 pair). This is the prose layer; the SessionStart pin is the mechanical safety net.
Spaced-path safe — the pin file is written with printf '%s\n' "$path" and read with IFS= read -r, so paths containing spaces survive the round-trip intact. Regression test covers /tmp/test space/ops fork.
Escape hatches surfaced explicitly — APEXYARD_OPS_DISABLE_PIN=1 forces walk-up; APEXYARD_OPS_PIN_DIR=<dir> overrides the pin directory; unsetting CLAUDE_CODE_SESSION_ID skips the pin lookup. All documented in the lib header.

Testing

New tests at .claude/hooks/tests/test_resolve_ops_root_pin.sh — 8/8 PASS covering pin hit, stale pin, escape hatch, walk-only function, spaced path, no session id, hook write, hook no-op
Existing .claude/hooks/tests/test_ops_root.sh — 8/8 PASS (no regression on the original walk-up behaviour)
All other tests touching _lib-ops-root.sh PASS: test_agent_routing_sync_and_drift.sh (8/8), test_block_unreviewed_merge.sh (13/13), test_check_jq_installed.sh (5/5), test_link_custom_skills.sh (7/7), test_require_skill_for_issue_create.sh (18/18), test_subpack_extraction.sh (PASS), test_split_portfolio_v2_migration.sh (25/25), test_workspace_tracker_resolution.sh (7/7)
jq . .claude/settings.json — valid JSON
bash -n syntax check passes on all three modified .sh files
shellcheck -S warning — could not run (shellcheck not installed on the spawn machine; flagging for parent to run)

Closes #381

Glossary

Term	Definition
ops fork root	The directory holding the operator's apexyard fork — identified by either the v2 `.apexyard-fork` marker or the legacy v1 pair `onboarding.yaml` + `apexyard.projects.yaml`. Framework session state lives at `<ops_fork_root>/.claude/session/`.
`MARKER_HOME`	The path the `code-reviewer` agent (Rex) writes its `<pr>-rex.approved` approval marker to. Must equal the ops fork root, not the project clone or a `/tmp` inspection clone, or the merge gate cannot see the marker.
pin file	A small text file at `${APEXYARD_OPS_PIN_DIR:-$HOME/.claude/apexyard}/ops-root-<SESSION_ID>` containing the ops-fork-root path captured at SessionStart. Written by `pin-ops-root.sh`; read by `resolve_ops_root` before falling back to walk-up.
`CLAUDE_CODE_SESSION_ID`	The per-session identifier Claude Code exports to hooks. Used here as the pin-file key so concurrent sessions on the same machine don't clobber each other's pins.
stale pin	A pin file whose path no longer satisfies the ops-fork anchor conditions (e.g. the dir was deleted or the marker file removed). `resolve_ops_root` detects this on re-validation and falls through to the walk-up — the pin self-heals on the next SessionStart.
escape hatch	An environment variable that disables the pin lookup. `APEXYARD_OPS_DISABLE_PIN=1` makes `resolve_ops_root` behave exactly like the pre-#381 walk-up-only implementation, useful for debugging or for adopters who hit an unexpected interaction.

The code-reviewer agent (Rex) was resolving MARKER_HOME by walking up for onboarding.yaml + apexyard.projects.yaml. A full ops-fork clone in /tmp ALSO has those anchors, so the walk resolved to the throwaway clone; the <pr>-rex.approved marker landed there and block-unreviewed-merge.sh (running from the real ops fork) couldn't find it. Adopters worked around this by mirroring the marker locally on Rex's behalf (auto-memory feedback_rex_marker_sandbox_workaround). Two-layer fix: 1. Prose discipline in .claude/agents/code-reviewer.md — Rex resolves MARKER_HOME ONCE at review start, before any cd / git clone / gh pr checkout. 2. Mechanical pin via SessionStart hook + lib refactor: - pin-ops-root.sh writes the launch-cwd ops root to $HOME/.claude/apexyard/ops-root-<SESSION_ID> (overridable via APEXYARD_OPS_PIN_DIR). Idempotent. - resolve_ops_root() now prefers the pin (re-validated against anchor conditions, so a stale pin self-heals), falling back to resolve_ops_root_walk (the pure walk-up, extracted from the previous implementation). - Escape hatch: APEXYARD_OPS_DISABLE_PIN=1 forces walk-up. No-regression guarantee: if the pin is absent for any reason, resolve_ops_root falls back to the existing walk-up. Worst case is no improvement, never worse. Spaced-path safety: pin file is read with IFS= read -r and written with printf '%s\n' "$path". Regression test covers a path with spaces. Tests in .claude/hooks/tests/test_resolve_ops_root_pin.sh cover: pin hit, stale pin, escape hatch, walk-only function, spaced path, no session id. N/N PASS. Closes #381 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

PR #385 adds a new SessionStart hook (pin-ops-root.sh) but the fan-out agent missed refreshing the site/ marketing copy that quotes the hook count. CI surfaced 16 drift sites via site-counts drift detection (test_site_counts.sh) — fixing them all in one commit so the PR goes green without a separate clean-up cycle. Substitutions (17 locations across 7 files — one more than the CI report because llms-full.txt:133 uses "31 mechanical enforcement scripts" which the test's regex doesn't enumerate explicitly but is still about the same count): - "31 hooks" → "32 hooks" - "31 shell hooks" → "32 shell hooks" - "31 shell scripts" → "32 shell scripts" - "31 shell gates" → "32 shell gates" - "31 mechanical gates" → "32 mechanical gates" - "31 mechanical enforcement scripts" → "32 mechanical enforcement scripts" Files touched: site/index.html, site/architecture.html, site/index.md.gen, site/architecture.md.gen, site/llms.txt, site/llms-full.txt, site/skill.md. Verified locally: bash .claude/hooks/tests/test_site_counts.sh reports PASS — actual hook count (32) now matches all surface refs. Refs #381 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

atlas-apex

Code Review: PR #385

Commit: 5eab6a51e85b18dc05b4107a50e77d861f8a9c97

Summary

Fixes the P1 marker-misplacement bug (apexyard#381) where Rex's MARKER_HOME walk-up could resolve to a /tmp inspection clone whose anchors happen to match the operator's real ops fork. Two layers land together: prose discipline in code-reviewer.md (resolve MARKER_HOME ONCE before any cd/clone) plus a mechanical safety net (pin-ops-root.sh SessionStart hook + pin-first / walk-up-fallback in _lib-ops-root.sh, keyed by CLAUDE_CODE_SESSION_ID). A drive-by improvement adds v2 .apexyard-fork recognition to the agent prose, bringing the prose into alignment with the lib's v2 awareness. The site-counts fix-up commit bumps hook count 31→32 across the seven marketing files.

Checklist Results

✅ Architecture & Design: Pass — clean extraction of resolve_ops_root_walk, pin-first override pattern preserves no-regression guarantee, escape hatches surfaced explicitly
✅ Code Quality: Pass — shellcheck CI green, lib header documents the pin-first strategy + spaced-path safety + every escape hatch
✅ Testing: Pass — 8/8 cases in the new test file, covers all 6 cases from the issue body + 2 hook-functional cases; spaced-path regression case is the operator-flagged one
✅ Security: N/A — no auth/crypto/secret surface; pin file is under $HOME/.claude/apexyard/ (per-user), session-keyed
✅ Performance: Pass — single IFS= read -r + one [ -d ] + two [ -f ] checks per resolution; falls through identically to the pre-#381 lib on cache miss
✅ PR Description & Glossary: Pass — narrative summary, Testing section with new + regression breakdown, 6-term Glossary, Closes #381
✅ Summary Bullet Narrative: Pass — every bullet explains what + why, no label-only shapes
✅ Technical Decisions (AgDR):N/A — the pin-first strategy with self-healing stale-pin re-validation is a non-trivial architectural choice and would ordinarily warrant an AgDR; however, the design shape was settled in issue #381's "Investigation Notes — fix shape" section (operator-authored, two-layer prescription including the exact resolve_ops_root_walk extraction + escape hatch naming). This PR is the implementation of that design, not a fresh technical call — AgDR-required gate accepts the issue body as the decision artefact in cases where the operator has pre-specified the shape. Flagging here for visibility; no blocker.
✅ Adopter Handbooks: N/A — no handbooks/architecture/ or handbooks/general/ files exist in this fork; no language handbooks loaded (diff is .sh + .md + .json + .html, no .ts/.py/.go/.rs)

Issues Found

None blocking. Verified each prompted check:

Diff correctness across all four touchpoints — _lib-ops-root.sh (+94/-3), pin-ops-root.sh (+90 new), settings.json (+4 wiring), code-reviewer.md (+6/-1 prose). All touchpoints internally consistent and cross-referenced.
Walk-up function preservation — resolve_ops_root_walk is the pre-refactor resolve_ops_root body verbatim. Recognises both v2 .apexyard-fork marker and v1 onboarding.yaml + apexyard.projects.yaml pair, identical anchor-condition order. No-regression guarantee holds: any caller hitting an absent / disabled / unset-session pin path drops into this function unchanged.
Pin self-heal — _ops_root_anchor_valid re-checks the pinned path against the same anchor conditions; a pinned dir that no longer satisfies anchors (deleted, marker removed, dir moved) is silently rejected and walk-up runs. Case 2 in the test file proves this with a mktemp -d "stale" target that has no anchors.
Test coverage — 8 cases all green:
- Case 1 (pin hit), Case 2 (stale pin), Case 3 (escape hatch), Case 4 (walk-only function), Case 5 (spaced path /tmp/test space/ops fork), Case 6 (no session id) — match the 6 cases in the issue body 1:1
- Cases 7-8 add hook-functional coverage (pin file write from launch cwd, silent no-op without session id)
- The spaced-path case proves the IFS= read -r choice resists the tr -d '[:space:]' regression the operator flagged
v2 .apexyard-fork recognition — added in both:
- lib (line 47 of the post-refactor _lib-ops-root.sh, also in the new _ops_root_anchor_valid validator at line 111 of the diff)
- agent prose (code-reviewer.md line 19 of the diff, before the v1 pair check, matching the lib's preference order)
settings.json — JSON parses cleanly (jq . verified locally). New pin-ops-root.sh wrapper sits first in the SessionStart[0].hooks array — correct placement; the pin must be written before any other SessionStart hook that might transitively resolve ops_root. Wrapper shape matches the canonical v2-aware walk-up convention from AgDR-0041 (recognises both .apexyard-fork v2 marker and onboarding.yaml v1 anchor at wrapper level — see AgDR-0041 § "Decision" point 2 on wrapper-level laxness).
site/ count fix-up — verified: exactly 17 substitutions across the 7 expected files (architecture.html, architecture.md.gen, index.html × 6, index.md.gen × 3, llms-full.txt × 3, llms.txt × 2, skill.md × 1 — total 17). All substitutions are semantically about hook count, no false positives.
PR body quality — narrative Summary bullets (each bullet pairs "what changed" with "why it matters / why it's safe / what regression it covers"), Testing section breaks out the new test file (test_resolve_ops_root_pin.sh, 8/8) separately from regression tests (test_ops_root.sh + 7 other suites), 6-term Glossary covers ops fork root, MARKER_HOME, pin file, CLAUDE_CODE_SESSION_ID, stale pin, escape hatch, Closes #381 present.
CI status — all 6 checks GREEN: lychee (re-ran cleanly after the prior transient), markdownlint-cli2, Verify Ticket ID, ShellCheck, extract-subpacks, site-counts drift detection.

Handbook Findings

No handbooks loaded — fork carries no handbooks/architecture/, handbooks/general/, or language-specific handbooks. Section omitted.

Suggestions

Two minor observations, neither blocking:

shellcheck -S warning couldn't be run on the spawn machine (called out in the PR body Testing checklist). CI's shellcheck .claude/hooks job passed — that's the load-bearing check; the spawn-machine gap is informational only.
Wrapper-level v1 detection laxness is preserved — the new pin-ops-root.sh wrapper accepts onboarding.yaml alone as the v1 anchor (per AgDR-0041 § "Decision" point 2), while _lib-ops-root.sh::resolve_ops_root_walk requires BOTH onboarding.yaml AND apexyard.projects.yaml. This is the documented contract — wrapper only needs to exec the hook, lib does the stricter check internally. No change needed; flagging in case future readers spot the asymmetry.

Verdict

APPROVED

The two-layer fix is exactly the shape the issue prescribed: prose discipline as the primary mechanism, mechanical pin-first resolution as the safety net for when prose discipline fails (which the operator's own session demonstrated, twice in 24h). The no-regression guarantee is structurally sound — resolve_ops_root_walk is the pre-refactor implementation verbatim, and every documented bypass condition (no session id, no pin dir, stale pin, escape hatch) falls through to it without touching cwd state. Test coverage matches the issue's prescription including the operator-flagged spaced-path regression.

🤖 Reviewed by Rex (Code Reviewer Agent)
📌 Reviewed commit: 5eab6a51e85b18dc05b4107a50e77d861f8a9c97

…385) * fix(#381): pin ops-root via CLAUDE_CODE_SESSION_ID SessionStart hook The code-reviewer agent (Rex) was resolving MARKER_HOME by walking up for onboarding.yaml + apexyard.projects.yaml. A full ops-fork clone in /tmp ALSO has those anchors, so the walk resolved to the throwaway clone; the <pr>-rex.approved marker landed there and block-unreviewed-merge.sh (running from the real ops fork) couldn't find it. Adopters worked around this by mirroring the marker locally on Rex's behalf (auto-memory feedback_rex_marker_sandbox_workaround). Two-layer fix: 1. Prose discipline in .claude/agents/code-reviewer.md — Rex resolves MARKER_HOME ONCE at review start, before any cd / git clone / gh pr checkout. 2. Mechanical pin via SessionStart hook + lib refactor: - pin-ops-root.sh writes the launch-cwd ops root to $HOME/.claude/apexyard/ops-root-<SESSION_ID> (overridable via APEXYARD_OPS_PIN_DIR). Idempotent. - resolve_ops_root() now prefers the pin (re-validated against anchor conditions, so a stale pin self-heals), falling back to resolve_ops_root_walk (the pure walk-up, extracted from the previous implementation). - Escape hatch: APEXYARD_OPS_DISABLE_PIN=1 forces walk-up. No-regression guarantee: if the pin is absent for any reason, resolve_ops_root falls back to the existing walk-up. Worst case is no improvement, never worse. Spaced-path safety: pin file is read with IFS= read -r and written with printf '%s\n' "$path". Regression test covers a path with spaces. Tests in .claude/hooks/tests/test_resolve_ops_root_pin.sh cover: pin hit, stale pin, escape hatch, walk-only function, spaced path, no session id. N/N PASS. Closes #381 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(#381): bump site/ hook count 31→32 for new pin-ops-root.sh PR #385 adds a new SessionStart hook (pin-ops-root.sh) but the fan-out agent missed refreshing the site/ marketing copy that quotes the hook count. CI surfaced 16 drift sites via site-counts drift detection (test_site_counts.sh) — fixing them all in one commit so the PR goes green without a separate clean-up cycle. Substitutions (17 locations across 7 files — one more than the CI report because llms-full.txt:133 uses "31 mechanical enforcement scripts" which the test's regex doesn't enumerate explicitly but is still about the same count): - "31 hooks" → "32 hooks" - "31 shell hooks" → "32 shell hooks" - "31 shell scripts" → "32 shell scripts" - "31 shell gates" → "32 shell gates" - "31 mechanical gates" → "32 mechanical gates" - "31 mechanical enforcement scripts" → "32 mechanical enforcement scripts" Files touched: site/index.html, site/architecture.html, site/index.md.gen, site/architecture.md.gen, site/llms.txt, site/llms-full.txt, site/skill.md. Verified locally: bash .claude/hooks/tests/test_site_counts.sh reports PASS — actual hook count (32) now matches all surface refs. Refs #381 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: me2resh <ahmed.abdelaliem@gmail.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions Bot added github-issue has-ticket labels May 23, 2026

atlas-apex commented May 23, 2026

View reviewed changes

atlas-apex merged commit 29bb5a4 into dev May 23, 2026
6 checks passed

atlas-apex deleted the fix/GH-381-pin-ops-root-session-id-impl branch May 23, 2026 20:38

atlas-apex mentioned this pull request May 23, 2026

[Bug] code-reviewer agent's approval marker can land in a /tmp inspection clone instead of the ops fork #381

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(#381): pin ops-root via CLAUDE_CODE_SESSION_ID SessionStart hook#385

fix(#381): pin ops-root via CLAUDE_CODE_SESSION_ID SessionStart hook#385
atlas-apex merged 2 commits into
devfrom
fix/GH-381-pin-ops-root-session-id-impl

atlas-apex commented May 23, 2026

Uh oh!

atlas-apex left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

atlas-apex commented May 23, 2026

Summary

Testing

Glossary

Uh oh!

atlas-apex left a comment

Choose a reason for hiding this comment

Code Review: PR #385

Summary

Checklist Results

Issues Found

Handbook Findings

Suggestions

Verdict

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants