
v1.25.1.0 fix: resolve build CLI path #6

anbangr merged 3 commits into main from codex/fix-build-cli-resolution on May 2, 2026

anbangr (Owner) commented May 1, 2026

Summary

Build CLI resolution

  • Added a host-aware _GSTACK_BUILD_CLI resolver to /build so spawned Claude/Codex shells can launch gstack-build even when interactive PATH setup is missing.
  • Registered the BUILD_CLI_CANDIDATES template resolver and generated Claude/Codex-safe candidate paths.
  • Replaced bare gstack-build launch/resume commands with the resolved executable.

Documentation and release metadata

  • Documented manual PATH setup separately from the /build resolver order in build/README.md and build/orchestrator/README.md.
  • Bumped VERSION and package.json to 1.25.1.0 and added the changelog entry.

Test Coverage

Resolver behavior is covered by regression tests in test/gen-skill-docs.test.ts:

  • Claude/source build skills contain _GSTACK_BUILD_CLI, command -v gstack-build, and resolved launch commands.
  • Codex generated build skill contains the same resolver behavior.
  • Tests reject regressions to bare gstack-build "$_PLAN_FILE" style launches.

Coverage gate: PASS for changed code paths. The changed resolver path has direct regression coverage and generated host-output freshness checks.

Pre-Landing Review

Pre-Landing Review: No issues found.

Design Review

No frontend files changed — design review skipped.

Eval Results

No app prompt/eval suite matched the ship eval patterns — evals skipped. Generated skill behavior was verified through gen-skill-docs regression coverage and full repo tests.

Scope Drift

Scope Check: CLEAN

Intent: Fix /build failures when the build CLI is not available on spawned-shell PATH, including Claude skill output.

Delivered: /build now resolves an executable through env, PATH, host-specific setup paths, and repo-local bin/gstack-build; docs and regression tests cover the behavior.
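The resolution order described above (explicit env override, then PATH, then host-specific setup paths, then the repo-local `bin/gstack-build`) can be sketched as a pure TypeScript function. This is a minimal illustration, not the shipped resolver, which is a shell snippet in the generated build skill. The function name, the `ResolveInput` shape, and the injected `isExecutable` predicate are hypothetical, and the env variable is assumed to be `GSTACK_BUILD_CLI` as documented in the READMEs.

```typescript
// Hypothetical sketch of the /build CLI resolution order; not the shipped shell resolver.
interface ResolveInput {
  env: Record<string, string | undefined>;
  pathDirs: string[];                    // PATH split into directories
  hostCandidates: string[];              // host-specific install paths (Claude/Codex), in priority order
  repoRoot: string;
  isExecutable: (path: string) => boolean; // injected so the order can be tested without a filesystem
}

function resolveBuildCli(input: ResolveInput): string | null {
  // 1. Explicit override (assumed variable name GSTACK_BUILD_CLI).
  const override = input.env["GSTACK_BUILD_CLI"];
  if (override && input.isExecutable(override)) return override;

  // 2. Whatever `command -v gstack-build` would find on PATH.
  for (const dir of input.pathDirs) {
    const candidate = `${dir}/gstack-build`;
    if (input.isExecutable(candidate)) return candidate;
  }

  // 3. Host-specific setup locations, first match wins.
  for (const candidate of input.hostCandidates) {
    if (input.isExecutable(candidate)) return candidate;
  }

  // 4. Repo-local fallback.
  const local = `${input.repoRoot}/bin/gstack-build`;
  return input.isExecutable(local) ? local : null;
}
```

Returning `null` instead of falling back to a bare `gstack-build` lets the caller fail with a clear message, which is the regression the tests guard against.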

Plan Completion

No plan file detected — skipped.

Verification Results

No dev-server verification applies to this backend/docs/test change.

TODOS

No TODO items completed in this PR.

Documentation

Updated build/README.md and build/orchestrator/README.md to explain GSTACK_BUILD_CLI, Claude/Codex install paths, and resolver order.

Test plan

  • bun test
  • bun run build
  • bun run skill:check
  • bun test test/gen-skill-docs.test.ts test/ship-version-sync.test.ts --timeout 60000

🤖 Generated with Claude Code


@anbangr anbangr merged commit 286ff93 into main May 2, 2026
@anbangr anbangr deleted the codex/fix-build-cli-resolution branch May 2, 2026 00:03
anbangr added a commit that referenced this pull request May 3, 2026
…typed sentinel

Three failure modes consolidated:

1. The trigger condition was substring matching against a hard-coded
   English message in phase-runner.ts. Any rephrasing — for clarity,
   localization, or just an unrelated edit — would silently disable
   BLOCKED.md production with no compile-time signal. Export
   CODEX_CONVERGENCE_FAILURE_REASON_PREFIX + isCodexConvergenceFailure()
   from phase-runner.ts; both producer and consumer reference the
   constant. A future rephrasing now requires touching the constant
   too, which the type system surfaces.

2. BLOCKED.md was overwritten on every convergence failure. Two
   concrete losses: (a) prior phase's findings clobbered when a
   second phase fails; (b) parallel-phases mode (already designed
   via parallel-planner.ts) would race-clobber across phases. Switch
   to per-phase filename: BLOCKED-phase-{N}.md. The previous
   convergence failure's report stays around for triage.

3. BLOCKED.md was not in .gitignore. A user running `git add .`
   would ship the file to the remote — including the embedded
   reviewer findings, which can contain LLM output and excerpts of
   prior diffs. Add ensureBlockedGitignored(repoRoot) that idempotently
   appends `BLOCKED*.md` to project .gitignore, recognizing common
   pre-existing equivalent patterns (BLOCKED.md, /BLOCKED*.md,
   BLOCKED-phase-*.md) so it doesn't double-write.
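A minimal sketch of the shared-sentinel idea and the per-phase filename. `CODEX_CONVERGENCE_FAILURE_REASON_PREFIX` and `isCodexConvergenceFailure` are the names from the commit, but the prefix string itself is invented here, and anchoring with `startsWith` is an assumption consistent with "rejects substring-buried false positives". `convergenceFailureReason` and `blockedReportName` are hypothetical helpers.

```typescript
// Hypothetical sentinel text; the real constant lives in phase-runner.ts.
const CODEX_CONVERGENCE_FAILURE_REASON_PREFIX = "codex-convergence-failure:";

// Producer side: the FAIL reason is always built from the constant.
function convergenceFailureReason(detail: string): string {
  return `${CODEX_CONVERGENCE_FAILURE_REASON_PREFIX} ${detail}`;
}

// Consumer side: match only at the start of the reason, so a message that
// merely mentions the sentinel somewhere in the middle does not trigger.
function isCodexConvergenceFailure(reason: string): boolean {
  return reason.startsWith(CODEX_CONVERGENCE_FAILURE_REASON_PREFIX);
}

// One report per phase, so a later failure cannot overwrite an earlier one.
function blockedReportName(phase: number): string {
  return `BLOCKED-phase-${phase}.md`;
}
```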

Bonus hardening: wrap the BLOCKED.md write in try/catch. A write
failure (path already exists as a directory or symlink, disk full,
permissions) must not mask the underlying phase failure that the FAIL
handler is reporting.

12 tests in blocked-md.test.ts pin sentinel matching (rejects
substring-buried false positives, rejects unrelated FAIL reasons)
and gitignore-helper behavior (idempotent across runs, recognizes
existing equivalent patterns, comment-line handling, trailing-newline
preservation when appending to a file without one).
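The gitignore behavior pinned by those tests can be sketched as a pure string transform; a helper like `ensureBlockedGitignored(repoRoot)` would read `.gitignore`, apply it, and write back only when the content changed. `withBlockedIgnored` is a hypothetical name, and the equivalence set simply mirrors the patterns listed in the commit message.

```typescript
const BLOCKED_PATTERN = "BLOCKED*.md";
// Entries treated as already covering the reports (per the commit message).
const EQUIVALENT_PATTERNS = new Set(["BLOCKED*.md", "/BLOCKED*.md", "BLOCKED.md", "BLOCKED-phase-*.md"]);

function withBlockedIgnored(gitignore: string): string {
  const covered = gitignore
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line !== "" && !line.startsWith("#")) // commented-out patterns never count
    .some((line) => EQUIVALENT_PATTERNS.has(line));
  if (covered) return gitignore; // idempotent: nothing to append

  // Preserve the file's last line: add a newline first if it lacks one.
  const sep = gitignore === "" || gitignore.endsWith("\n") ? "" : "\n";
  return `${gitignore}${sep}${BLOCKED_PATTERN}\n`;
}
```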

Caught by /review post-landing pass: maintainability M8, claude
adversarial A4/A10, codex adversarial #6.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
anbangr pushed a commit that referenced this pull request May 7, 2026
… rename (garrytan#1351)

* feat: gstack-gbrain-mcp-verify helper for remote MCP probe

Probes a remote gbrain MCP endpoint with bearer auth. POSTs initialize,
classifies failures into NETWORK / AUTH / MALFORMED with one-line
remediation hints, and runs a tools/list capability probe to detect
sources_add MCP support (forward-compat for when gbrain ships URL ingest).

Token consumed from GBRAIN_MCP_TOKEN env, never argv. The Accept header
must list both 'application/json' AND 'text/event-stream'; that gotcha
costs 10 minutes of debugging when missed (regression-tested).

Live-verified against wintermute (gbrain v0.27.1).
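A sketch of the probe request and the NETWORK / AUTH / MALFORMED classification, with no network I/O so it can be tested in isolation. Everything here is an assumption about shape, not the helper's actual code: the function names, treating any non-2xx other than 401/403 as MALFORMED, and the `protocolVersion` value (the caller would read the token from `GBRAIN_MCP_TOKEN` and pass it in).

```typescript
type ErrorClass = "NETWORK" | "AUTH" | "MALFORMED";

// Builds the JSON-RPC initialize request; the token comes from the caller, never argv.
function buildInitializeRequest(token: string): { headers: Record<string, string>; body: string } {
  return {
    headers: {
      "Content-Type": "application/json",
      // Both media types are required: the server may answer with plain JSON or with SSE.
      Accept: "application/json, text/event-stream",
      Authorization: `Bearer ${token}`,
    },
    body: JSON.stringify({
      jsonrpc: "2.0",
      id: 1,
      method: "initialize",
      params: {
        protocolVersion: "2025-03-26", // assumed value for illustration
        capabilities: {},
        clientInfo: { name: "mcp-verify-sketch", version: "0.0.0" },
      },
    }),
  };
}

// status === null means no HTTP response arrived at all (DNS, TLS, refused, timeout).
// Returns null when the probe succeeded.
function classifyProbe(status: number | null, contentType: string, text: string): ErrorClass | null {
  if (status === null) return "NETWORK";
  if (status === 401 || status === 403) return "AUTH";
  if (status < 200 || status >= 300) return "MALFORMED";
  // An SSE reply carries the JSON-RPC message in "data:" lines (a single event is assumed).
  const payload = contentType.includes("text/event-stream")
    ? text.split("\n").filter((l) => l.startsWith("data:")).map((l) => l.slice(5).trim()).join("\n")
    : text;
  try {
    const msg = JSON.parse(payload);
    const ok = msg !== null && typeof msg === "object" && msg.jsonrpc === "2.0" && "result" in msg;
    return ok ? null : "MALFORMED";
  } catch {
    return "MALFORMED";
  }
}
```

A caller would pass the request to `fetch(url, { method: "POST", ...req })` and map a rejected `fetch` to `status = null`.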

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: gstack-artifacts-init + gstack-artifacts-url helpers

artifacts-init replaces brain-init with provider choice (gh / glab /
manual), per-user gstack-artifacts-$USER repo, HTTPS-canonical storage in
~/.gstack-artifacts-remote.txt, and a "send this to your brain admin"
hookup printout. Always prints the command, never auto-executes — gbrain
v0.26.x has no admin-scope MCP probe (codex Finding #3).

artifacts-url centralizes HTTPS↔SSH/host/owner-repo conversion so callers
don't each string-mangle (codex Finding #10). The remote-conflict check in
artifacts-init compares at the canonical level so re-running with HTTPS
input doesn't trip on a stored SSH URL for the same logical repo.
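Comparing remotes "at the canonical level" can be sketched by parsing each URL into `(host, owner, repo)` and comparing those fields. The function names and the three accepted URL forms are assumptions for illustration; nested GitLab groups and other forms return `null`, leaving the caller to compare raw strings.

```typescript
interface RepoRef { host: string; owner: string; repo: string }

function parseRemote(url: string): RepoRef | null {
  const s = url.trim().replace(/\/+$/, "").replace(/\.git$/, "");
  const m =
    /^https:\/\/([^/]+)\/([^/]+)\/([^/]+)$/.exec(s) ??         // https://host/owner/repo
    /^[\w.-]+@([^:]+):([^/]+)\/([^/]+)$/.exec(s) ??            // git@host:owner/repo
    /^ssh:\/\/[\w.-]+@([^/]+)\/([^/]+)\/([^/]+)$/.exec(s);     // ssh://git@host/owner/repo
  return m ? { host: m[1].toLowerCase(), owner: m[2], repo: m[3] } : null;
}

const toHttps = (r: RepoRef): string => `https://${r.host}/${r.owner}/${r.repo}`;
const toSsh = (r: RepoRef): string => `git@${r.host}:${r.owner}/${r.repo}.git`;

// Same logical repo regardless of HTTPS vs SSH spelling.
function sameRepo(a: string, b: string): boolean {
  const ra = parseRemote(a);
  const rb = parseRemote(b);
  return !!ra && !!rb && ra.host === rb.host && ra.owner === rb.owner && ra.repo === rb.repo;
}
```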

The "URL form not supported" branch prints a two-line clone-then-path
form for gbrain v0.26.x; the supported branch is a one-liner with --url
ready for when gbrain ships URL ingest.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: extend gstack-gbrain-detect with mcp_mode + artifacts_remote

Adds two new fields to detect's JSON output:

- gbrain_mcp_mode: local-stdio | remote-http | none
  Resolved via 3-tier fallback (codex Finding D3): claude mcp get --json
  → claude mcp list text-grep → ~/.claude.json jq read. If Anthropic moves
  the file format, the first two tiers absorb it.

- gstack_artifacts_remote: HTTPS URL from ~/.gstack-artifacts-remote.txt
  Falls back to ~/.gstack-brain-remote.txt during the v1.27.0.0 migration
  window so detect doesn't return empty between upgrade and migration.
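Both fallbacks follow the same "first tier that answers wins" shape. In this sketch each tier is an injected function standing in for one probe (`claude mcp get --json`, the `claude mcp list` grep, the `~/.claude.json` read); a tier that throws or returns `null` defers to the next. The function names and the injected `read` helper are hypothetical.

```typescript
type McpMode = "local-stdio" | "remote-http" | "none";
type Tier = () => McpMode | null; // null = this tier could not decide

function resolveMcpMode(tiers: Tier[]): McpMode {
  for (const tier of tiers) {
    try {
      const mode = tier();
      if (mode !== null) return mode;
    } catch {
      // A crashing tier (missing CLI, changed output format) defers to the next one.
    }
  }
  return "none";
}

// Prefer the new remote file; fall back to the legacy one during the migration window.
function readArtifactsRemote(read: (path: string) => string | null, home: string): string {
  for (const p of [`${home}/.gstack-artifacts-remote.txt`, `${home}/.gstack-brain-remote.txt`]) {
    const value = read(p)?.trim();
    if (value) return value;
  }
  return "";
}
```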

Existing detect tests still pass (15/15). 19 new tests cover every
fallback tier independently, plus a schema regression for /sync-gbrain compat.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: setup-gbrain Path 4 (remote MCP) + artifacts rename

Path 4 lets users paste an HTTPS MCP URL + bearer token and registers it
as an HTTP-transport MCP without needing a local gbrain CLI install. The
flow:

- Step 2 gains a fourth option (Remote gbrain MCP)
- Step 4 adds Path 4 sub-flow: collect URL, secret-read bearer, verify
  via gstack-gbrain-mcp-verify (NETWORK / AUTH / MALFORMED classifier)
- Step 5 (local doctor), Step 7.5 (transcript ingest), Step 5a's stdio
  branch all skip on Path 4
- Step 5a adds an HTTP+bearer registration form: claude mcp add
  --transport http --header "Authorization: Bearer ..."
- Step 7 renamed "session memory sync" → "artifacts sync" and now calls
  gstack-artifacts-init (which always prints the brain-admin hookup
  command — no auto-execute, codex Finding #3)
- Step 8 CLAUDE.md block branches: remote-http includes URL + server
  version (never the token); local-stdio keeps engine + config-file
- Step 9 smoke test on Path 4 prints the curl-equivalent for
  post-restart verification (MCP tools aren't visible mid-session)
- Step 10 verdict block has separate templates per mode

Idempotency: re-running with gbrain_mcp_mode=remote-http already in
detect output skips Step 2 entirely and goes to verification.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor: rename gbrain_sync_mode → artifacts_sync_mode (v1.27.0.0 prep)

Hard rename, no dual-read alias (codex Finding D4). The on-disk migration
script (Phase C, separate commit) renames the config key in users'
~/.gstack/config.yaml and any CLAUDE.md blocks.

Touched call sites:
- bin/gstack-config defaults + validation + list/defaults output
- bin/gstack-gbrain-detect (gstack_brain_sync_mode field still emitted
  with the same name for downstream-tool compat; reads new key)
- bin/gstack-brain-sync, bin/gstack-brain-enqueue, bin/gstack-brain-uninstall
- bin/gstack-timeline-log (comment ref)
- scripts/resolvers/preamble/generate-brain-sync-block.ts: renames key,
  branches on gbrain_mcp_mode=remote-http to emit "ARTIFACTS_SYNC:
  remote-mode (managed by brain server <host>)" instead of the local
  mode/queue/last_push line (codex Finding #11)
- bin/gstack-brain-restore + bin/gstack-gbrain-source-wireup: read
  ~/.gstack-artifacts-remote.txt with ~/.gstack-brain-remote.txt fallback
  during the migration window
- bin/gstack-artifacts-init: tolerant of unrecognized URL forms (local
  paths, file://, self-hosted gitea) so test infrastructure and unusual
  remotes work without canonicalization
- test/brain-sync.test.ts: gstack-brain-init → gstack-artifacts-init
- test/skill-e2e-brain-privacy-gate.test.ts: artifacts_sync_mode keys
- test/gen-skill-docs.test.ts: budget 35K → 36.5K for the new MCP-mode
  probe in the preamble resolver
- health/SKILL.md.tmpl, sync-gbrain/SKILL.md.tmpl: comment + verdict line

Hard delete:
- bin/gstack-brain-init (replaced by bin/gstack-artifacts-init in v1.27.0.0)
- test/gstack-brain-init-gh-mock.test.ts (replaced by gstack-artifacts-init.test.ts)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: regenerate SKILL.md files after artifacts-sync rename

Mechanical regen via `bun run gen:skill-docs --host all`. All */SKILL.md
files reflect the renamed config key (gbrain_sync_mode →
artifacts_sync_mode), the renamed remote-helper file
(~/.gstack-artifacts-remote.txt with brain fallback), the renamed init
script (gstack-artifacts-init), and the new ARTIFACTS_SYNC: remote-mode
status line that fires when a remote-http MCP is registered.

Golden fixtures (test/fixtures/golden/*-ship-SKILL.md) refreshed to match
the regenerated default-ship output.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: v1.27.0.0 migration — gstack-brain → gstack-artifacts rename

Journaled, interruption-safe migration. Six steps, each writes to
~/.gstack/.migrations/v1.27.0.0.journal on success; re-entry resumes
from the next un-done step. On final success, journal is replaced by
~/.gstack/.migrations/v1.27.0.0.done.

Steps:
1. gh_repo_renamed       gh/glab repo rename gstack-brain-$USER →
                         gstack-artifacts-$USER (idempotent: detects
                         already-renamed and skips)
2. remote_txt_renamed    mv ~/.gstack-brain-remote.txt → artifacts file,
                         rewriting URL path to match the new repo name
3. config_key_renamed    sed -i in ~/.gstack/config.yaml flips
                         gbrain_sync_mode → artifacts_sync_mode
4. claude_md_block       sed flips "- Memory sync:" → "- Artifacts sync:"
                         in cwd CLAUDE.md and ~/.gstack/CLAUDE.md
5. sources_swapped       gbrain sources add NEW (verify) → remove OLD
                         (codex Finding #6: add-before-remove ordering,
                         no downtime window). On remote-MCP mode, prints
                         commands for the brain admin instead of executing.
6. done                  touchfile + delete journal
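The journal-and-resume behavior and the add-before-remove ordering can be sketched with in-memory stand-ins. `Journal`, `runMigration`, and `swapSources` are hypothetical names; the real migration appends to `~/.gstack/.migrations/v1.27.0.0.journal` and shells out to gh/glab/gbrain. The key property is that a step is journaled only after it succeeds, so a crash leaves it to be retried.

```typescript
type StepName =
  | "gh_repo_renamed" | "remote_txt_renamed" | "config_key_renamed"
  | "claude_md_block" | "sources_swapped" | "done";

interface Journal {
  read(): StepName[];           // steps recorded as done by earlier runs
  append(step: StepName): void; // called only after a step succeeds
}

function runMigration(steps: Array<[StepName, () => void]>, journal: Journal): StepName[] {
  const done = new Set(journal.read());
  const ran: StepName[] = [];
  for (const [name, action] of steps) {
    if (done.has(name)) continue; // finished in an earlier, interrupted run
    action();                     // a throw leaves the journal untouched, so this step is retried
    journal.append(name);
    ran.push(name);
  }
  return ran;
}

// Step 5 ordering: the old source is removed only after the new one verifies.
function swapSources(add: () => void, verify: () => boolean, removeOld: () => void): boolean {
  add();
  if (!verify()) return false; // old source stays registered
  removeOld();
  return true;
}
```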

User opt-out: any "n" or "skip-for-now" answer at the initial prompt
writes a marker file that prevents re-prompting; user can re-invoke
via /setup-gbrain --rerun-migration.

11 unit tests cover: nothing-to-migrate, GitHub happy path, idempotent
re-run, journal-resume mid-flight, remote-MCP print-only path,
add-before-remove ordering verification, add-fail → old source stays
registered, CLAUDE.md field rewrite.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test: regression suite + E2E for v1.27.0.0 rename

Three new regression tests guard the rename's blast radius (per codex
Findings #1, #8, #9, #12):

- test/no-stale-gstack-brain-refs.test.ts: greps bin/, scripts/, *.tmpl,
  test/ for forbidden identifiers (gstack-brain-init, gbrain_sync_mode);
  fails CI if any non-allowlisted file references them.
- test/post-rename-doc-regen.test.ts: confirms gen-skill-docs output has
  no stale references in any */SKILL.md (the cross-product blind spot).
- test/setup-gbrain-path4-structure.test.ts: structural lint over the
  Path 4 prose contract — STOP gates after verify failure, never-write-
  token rules, mode-aware CLAUDE.md block, bearer always via env-var.
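The core of a stale-reference guard like the first test is a plain substring scan with an allowlist. This sketch takes file contents as a map so it runs without touching disk; `findStaleRefs` is a hypothetical name and the allowlist entry is illustrative. Note that the still-emitted `gstack_brain_sync_mode` field does not contain the forbidden `gbrain_sync_mode` substring.

```typescript
const FORBIDDEN = ["gstack-brain-init", "gbrain_sync_mode"];

// Returns "path: identifier" for every non-allowlisted file that still mentions a forbidden name.
function findStaleRefs(files: Record<string, string>, allowlist: Set<string>): string[] {
  const hits: string[] = [];
  for (const [path, text] of Object.entries(files)) {
    if (allowlist.has(path)) continue; // e.g. a changelog that documents the rename
    for (const id of FORBIDDEN) {
      if (text.includes(id)) hits.push(`${path}: ${id}`);
    }
  }
  return hits.sort();
}
```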

Two new gate-tier E2E tests (deterministic stub HTTP server, fixed inputs):

- test/skill-e2e-setup-gbrain-remote.test.ts: Path 4 happy path. Stubs
  an HTTP MCP server, drives the skill via Agent SDK with a stubbed
  bearer, asserts claude.json gets the http MCP entry, CLAUDE.md gets
  the remote-http block, the secret token NEVER leaks to CLAUDE.md.
- test/skill-e2e-setup-gbrain-bad-token.test.ts: stub server returns 401;
  asserts the AUTH classifier hint surfaces, no MCP registration occurs,
  CLAUDE.md is unchanged. Regression guard for the "verify failed → STOP"
  rule.

touchfiles.ts: setup-gbrain-remote and setup-gbrain-bad-token added at
gate-tier so CI catches Path 4 regressions on every PR.

Plus a few comment refs flipped: bin/gstack-jsonl-merge, bin/gstack-timeline-log
(legacy gstack-brain-init mentions in headers).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* release: v1.27.0.0 — /setup-gbrain Path 4 + brain → artifacts rename

Bumps VERSION 1.26.4.0 → 1.27.0.0 (MINOR per CLAUDE.md scale-aware bump
guidance: ~1500 line net change including a new path in /setup-gbrain,
two new bin helpers, a journaled migration, 59 new tests, and a config
key rename across the codebase).

CHANGELOG entry covers: Path 4 (Remote MCP) end-to-end, the brain →
artifacts rename, the journaled migration, the verify-helper error
classifier, the artifacts-init multi-host provider choice. Includes
the canonical Garry-voice headline + numbers table + audience close
per the release-summary format.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test: demote setup-gbrain Path 4 E2E to periodic-tier

The Agent SDK E2E tests for Path 4 (skill-e2e-setup-gbrain-remote and
skill-e2e-setup-gbrain-bad-token) are inherently non-deterministic —
the model interprets "follow Path 4 only" prompts flexibly and can
skip Step 8 (CLAUDE.md write) or shortcut past the verify helper, which
makes the gate-tier assertions flaky.

The deterministic gate coverage for Path 4 is in
test/setup-gbrain-path4-structure.test.ts: a fast structural lint that
catches AUQ-pacing regressions and prose contract drift in <200ms with
zero token spend. That test is the right tool for catching the failure
mode the gate-tier was meant to guard against.

The Agent SDK E2E tests stay available on-demand for periodic-tier runs
(EVALS=1 EVALS_TIER=periodic bun test test/skill-e2e-setup-gbrain-*.test.ts).
Also tightened the verify-error assertion to the literal field shape
("error_class": "AUTH") instead of a substring match that false-matches
the parent claude session's "needs-auth" MCP discovery markers.
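The difference between the two assertions can be shown in a few lines. The loose version is an assumed reconstruction of the old check (a case-insensitive "auth" match); the tightened version matches only the helper's literal JSON field.

```typescript
// Assumed shape of the old check: any "auth" anywhere in the transcript matches.
function looseAuthMatch(output: string): boolean {
  return /auth/i.test(output);
}

// Tightened check: only the verify helper's literal field counts.
function authErrorClassMatch(output: string): boolean {
  return /"error_class"\s*:\s*"AUTH"/.test(output);
}
```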

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: sync package.json version to 1.27.0.0

VERSION was bumped to 1.27.0.0 in f6ec11e but package.json was not
updated in the same commit. The gen-skill-docs.test.ts assertion
"package.json version matches VERSION file" caught the drift.

This is the DRIFT_STALE_PKG case the /ship Step 12 idempotency check
is designed for; the fix is the documented sync-only repair (no
re-bump, package.json synced to existing VERSION).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
anbangr added a commit that referenced this pull request May 18, 2026
…tion discard

Features 3 + 4 from inbox/build-implementor-hygiene-hardening-20260517.md.
Both features extend GitSnapshot with workTreeHashes (mandatory) and
workTreeContents (opt-in), so they share that foundation in a single
commit. Test files are split by feature for clean bisection.

Feature 3 (D5) — worktree-content-hash hygiene delta:
  - GitSnapshot.workTreeHashes: sha256 of each dirty path's worktree bytes
    at capture time. Replaces round-1 D3's HEAD-blob comparison which
    misreported user's pre-existing dirty work as agent-modified.
  - captureGitSnapshot(cwd, {captureContents:true}) populates the bytes
    map for the discard helper (Feature 4).
  - contentHashDelta(before, after, cwd) returns ONLY real agent changes:
    idempotent rewrites of pre-existing dirty files (Foundry case) drop;
    idempotent rewrites of clean tracked files (HEAD blob match) drop;
    untracked rewrites with matching hashes drop; deletions, content
    changes, and missing hashes all count.
  - validatePostAgentHygiene rewired to use contentHashDelta.
  - 15 regression tests in hygiene-delta.test.ts, including T3.1.9
    (gpt-5.5 plan-review CRITICAL #1: pre-existing dirty unchanged → no
    fault) and T3.1.8 (null-head conservatism).
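A sketch of the content-hash delta rules over two snapshots. `SnapshotSketch`, `contentHashDeltaSketch`, and the injected `headBlobHash` are hypothetical stand-ins for `GitSnapshot.workTreeHashes` and the HEAD-blob lookup; `null` marks a deleted path. Counting a previously dirty path that is clean afterwards (the agent reverted user work) is this sketch's assumption.

```typescript
import { createHash } from "node:crypto";

function sha256(bytes: string | Uint8Array): string {
  return createHash("sha256").update(bytes).digest("hex");
}

interface SnapshotSketch {
  workTreeHashes: Map<string, string | null>; // dirty path -> sha256 of worktree bytes (null = deleted)
}

function contentHashDeltaSketch(
  before: SnapshotSketch,
  after: SnapshotSketch,
  headBlobHash: (path: string) => string | null, // sha256 of the HEAD version; null if not tracked
): string[] {
  const changed: string[] = [];
  after.workTreeHashes.forEach((afterHash, path) => {
    if (before.workTreeHashes.has(path)) {
      // Already dirty before the agent ran: only a byte-level change counts.
      if (before.workTreeHashes.get(path) !== afterHash) changed.push(path);
    } else if (afterHash === null || headBlobHash(path) !== afterHash) {
      // Clean before: a new deletion or any content other than HEAD's counts.
      changed.push(path);
    }
  });
  // Dirty before but clean now: the agent reverted pre-existing work.
  before.workTreeHashes.forEach((_hash, path) => {
    if (!after.workTreeHashes.has(path)) changed.push(path);
  });
  return changed.sort();
}
```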

Feature 4 (D2 + D6) — blind-execution detection + path-preserving discard:
  - detectBlindExecution(logPath): per-agent marker table covering
    Gemini (proven from T111646 + 2026-05-18 attempt 3) and speculative
    Kimi/Codex patterns (refined on first observed failure).
  - discardBlindExecutionChanges(cwd, before): replaces round-1 D1's
    `git reset --hard + git clean -fd` (which erased user's pre-existing
    dirty + untracked work, gpt-5.5 CRITICAL #2) with path-by-path
    restore from before.workTreeContents. Pre-existing dirty TRACKED
    AND UNTRACKED state survives the discard.
  - Guards: null-head, cwd-inside-$HOME/.gstack/build-worktrees/,
    workTreeContents required (fails closed otherwise).
  - Wire-in to applyMutableAgentHygiene BEFORE the timedOut/nonzero
    early-return so blind agents that crash also get cleaned up
    (gpt-5.5 IMPORTANT #6).
  - 15 regression tests in blind-execution-detect.test.ts, including
    T4.1.10 + T4.1.11 (CRITICAL #2 regressions: pre-existing tracked
    AND untracked dirty survive discard), T4.1.13 + T4.1.14 (call-site
    integration regressions: probe runs before recovery + on nonzero exit).
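A sketch of the path-preserving discard. Instead of `git reset --hard` plus `git clean -fd`, it returns agent-touched paths to their HEAD state and then writes back every captured pre-existing path. `WorktreeOps` and `discardBlindExecutionSketch` are hypothetical; the git operations are injected so the logic runs in memory. Only the null-HEAD and missing-`workTreeContents` guards are modeled here.

```typescript
interface WorktreeOps {
  listChanged(): string[];                 // tracked-dirty and untracked paths right now
  restoreClean(path: string): void;        // checkout the HEAD version, or delete if untracked
  write(path: string, bytes: string): void;
  remove(path: string): void;
}

interface BeforeSnapshotSketch {
  head: string | null;
  workTreeContents?: Map<string, string | null>; // path -> bytes before the agent ran (null = absent)
}

function discardBlindExecutionSketch(before: BeforeSnapshotSketch, ops: WorktreeOps): void {
  // Fail closed: never touch the worktree without a complete snapshot.
  if (before.head === null) throw new Error("refusing to discard: HEAD was null at capture time");
  const saved = before.workTreeContents;
  if (!saved) throw new Error("refusing to discard: workTreeContents was not captured");

  // 1. Paths that were clean before the agent ran go back to HEAD (or away, if untracked).
  for (const path of ops.listChanged()) {
    if (!saved.has(path)) ops.restoreClean(path);
  }
  // 2. Every pre-existing dirty or untracked path gets its captured bytes back.
  saved.forEach((bytes, path) => {
    if (bytes === null) ops.remove(path);
    else ops.write(path, bytes);
  });
}
```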

Updated coverage-matrix.test.ts to register the two new test files as
owners for cli.ts.

Closes the implementor-hygiene-hardening plan together with the
already-landed Feature 1 (Kimi timeout 49c8866) and Feature 2 (Gemini
staging path 67480ef).
anbangr added a commit that referenced this pull request May 20, 2026
* docs: design for build plan-review convergence loop

Spec for /build's planSynthesizer ↔ planReviewer loop: in-process round
loop, mid-loop user triage gate, plan-file-as-ledger cross-round memory,
set-aware adaptive cap. Triggering case was bundle-1 (5→3→2→manual r4,
~$5-10) where rigor caught real bugs but the operator was locked out
until round 3. New default brings user in at round 1, makes each round
cheaper via in-process loop, and adapts the cap to actual convergence.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(spec): pin adaptive-cap exit_reason mapping in convergence design

Self-review caught a small ambiguity: the decision table listed two bail-out triggers (re-raises-only and regression) but the exit_reason enum had three adaptive-cap values without an explicit mapping. Fixed: enum now has exactly adaptive_cap_re_raises_only and adaptive_cap_regression, and the decision table rows reference which exit_reason each triggers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(plan): implementation plan for build plan-review convergence

19-task TDD-structured plan for the convergence loop spec. Each task is
self-contained: failing test → minimal impl → passing test → commit.
Covers types (T1), annotation contract (T2), history JSONL (T3),
convergence aggregate (T4), adaptive cap (T5), TTY triage (T6), non-TTY
triage (T7), prompts (T8), main loop (T9), CLI wire-in (T10), disputed
counting (T11), three integration tests (T12-T14), E2E (T15), SKILL.md
shrink + version bump (T16), README (T17), CHANGELOG (T18), final
verification (T19).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(build/types): add convergence types for plan-review loop

Extends PlanReviewVerdict with optional triage_decisions, round_history_path,
convergence, interrupted_at_objection fields. Adds TriageDecision and
ConvergenceSnapshot interfaces and ROUND_HISTORY_FORMAT_VERSION constant.
All new fields are optional so existing call sites compile unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(build/types): camelCase convergence field names; drop NEW: prefix

Code-quality review flagged inconsistency: existing PlanReviewVerdict
fields use camelCase (reviewedBy, round) but the new convergence fields
landed as snake_case (triage_decisions, round_history_path, etc).
Converts all new TypeScript interface fields to camelCase to match the
file's established convention. JSONL wire formats in later tasks can
still use snake_case via JSON.stringify of manually-shaped objects --
TypeScript types do not need to mirror JSONL key shape.

Also drops the informal "NEW:" prefix from JSDoc comments and adds a
one-line doc for TriageDecision.decision.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(build/plan-reviewer): add round-annotation read/write contract

Adds parseRoundAnnotations, writeRoundAnnotation, parseRoundHistoryHeader,
updateRoundHistoryHeader exported from plan-reviewer.ts. These implement
the cross-round memory contract: each round's triage decisions and synth
resolutions are written into the plan file as HTML comment blocks above
the matching '### Phase N' heading, plus a top-of-plan history block.
The next round's reviewer reads these to know what's already been decided.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(build/plan-reviewer): ROUND N REVIEWER attaches to round N's entry

The original Task 2 commit had the parser attach ROUND N REVIEWER lines
to round N-1's entry. That choice was forced by a self-contradicting
test fixture (a ROUND 2 REVIEWER: line inside an annotation the test
required to have rounds.length === 1).

Both the design spec and Task 9's planned writer in runPlanReviewLoop
treat ROUND N REVIEWER as a round-N observation paired with round N's
USER decision (if any). The N-1 offset would have corrupted every
annotation round-trip in Tasks 9, 11, and the integration tests.

Fixes the parser to attach directly to round N. Updates the broken
test fixture to assert the realistic multi-round shape: round 1 carries
USER/RESOLUTION, round 2 carries only REVIEWER (because no round-2 USER
decision happened on this annotation -- the reviewer did not re-raise).
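The fixed attachment rule can be sketched with a hypothetical line grammar, `ROUND <n> <FIELD>: <text>`; the real annotation format is not spelled out in this commit. The point is only that the round number on each line selects the entry it lands on, with no N-1 offset.

```typescript
interface RoundEntry { round: number; user?: string; resolution?: string; reviewer?: string }

// Hypothetical grammar for the lines inside an annotation comment block.
const ROUND_LINE = /^ROUND (\d+) (USER|RESOLUTION|REVIEWER):\s*(.*)$/;

function parseRoundLines(block: string): RoundEntry[] {
  const byRound = new Map<number, RoundEntry>();
  for (const raw of block.split("\n")) {
    const m = ROUND_LINE.exec(raw.trim());
    if (!m) continue; // header lines such as "ROUND 1 CRITICAL [...]" are parsed elsewhere
    const round = Number(m[1]);
    // A ROUND N line always attaches to round N's entry.
    const entry = byRound.get(round) ?? { round };
    const field = m[2].toLowerCase() as "user" | "resolution" | "reviewer";
    entry[field] = m[3];
    byRound.set(round, entry);
  }
  return Array.from(byRound.values()).sort((a, b) => a.round - b.round);
}
```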

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(build/plan-reviewer): silent corruption + missing JSDoc on annotation writers

Three issues from code-quality review:

1. writeRoundAnnotation's two String.replace() call sites used string
   replacements, which interpret $&, $', $` and $N as substitution
   patterns. A plan annotation whose field text contains those tokens
   (plausible when reviewing regex / shell / template code) would have
   produced silently corrupted output. Both call sites now use function
   replacements which suppress the interpolation.

2. Misleading comment in parseRoundAnnotations claimed the header always
   names "round 1". An annotation first written in round 2+ opens with
   ROUND 2 CRITICAL [...], which the parser already handles correctly --
   the comment was the only wrong thing.

3. RoundHistoryEntry, parseRoundHistoryHeader, and updateRoundHistoryHeader
   gained JSDoc explaining purpose, the optional finalLine parameter
   (used on loop exit for the 'final: APPROVED after N rounds, ...' line),
   and the atomic-write invariant.

New regression test covers the $& interpolation foot-gun by round-tripping
fields containing every replacement-pattern token.
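The foot-gun is standard `String.prototype.replace` behavior and is easy to reproduce: with a string replacement, `` $` ``, `$&`, and `$'` expand to the text before, at, and after the match, while a function replacement's return value is inserted literally. The template and marker below are illustrative only.

```typescript
const template = "before <!-- X --> after";
const field = "$`|$&|$'"; // text that plausibly appears when reviewing regex or shell code

// String replacement: the special patterns are expanded, silently corrupting the output.
const corrupted = template.replace("<!-- X -->", field);

// Function replacement: the returned string is used verbatim.
const safe = template.replace("<!-- X -->", () => field);
```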

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(build/plan-review-loop): add round-history JSONL writer

New module plan-review-loop.ts will house the in-process round loop, triage
gate, and adaptive-cap. First commit establishes the per-build-state history
JSONL: append-only, corruption-tolerant reads, round-counter derivation.
appendHistoryEntry / readHistoryEntries / deriveRoundNumber pair with the
existing plan-reviewer.ts::readPlanReviewRound for cross-launch resume.

HistoryEntry uses camelCase field names (objectionCountRaw, noForwardProgress,
reRaises, newObjections) matching the Task 1 convention; the JSONL on disk
serializes the same camelCase, so jq queries can use the field names verbatim.

Also registers plan-review-loop.ts in coverage-matrix.test.ts so the
MODULE_TEST_OWNERS invariant stays green.
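A sketch of the append-only, corruption-tolerant JSONL pattern, operating on strings so it runs without a filesystem (the real writer would append to a file). The field subset and function names are hypothetical; the round counter is derived from the highest recorded round, which is one plausible reading of "round-counter derivation".

```typescript
interface HistoryEntrySketch { round: number; objectionCountRaw: number }

// Append-only: one JSON object per line.
function appendLine(existing: string, entry: HistoryEntrySketch): string {
  return `${existing}${JSON.stringify(entry)}\n`;
}

// Corruption-tolerant read: skip blank and unparseable lines instead of failing the build.
function readHistory(jsonl: string): HistoryEntrySketch[] {
  const out: HistoryEntrySketch[] = [];
  for (const line of jsonl.split("\n")) {
    if (!line.trim()) continue;
    try {
      const value = JSON.parse(line);
      if (typeof value?.round === "number") out.push(value);
    } catch {
      // A torn line from an interrupted append is ignored.
    }
  }
  return out;
}

// Next round = highest recorded round + 1 (1 when there is no history yet).
function deriveNextRound(entries: HistoryEntrySketch[]): number {
  return entries.reduce((max, e) => Math.max(max, e.round), 0) + 1;
}
```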

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(build/plan-review-loop): cross-build convergence aggregate writer

writeConvergenceAggregate appends one line per completed build to
~/.gstack/analytics/convergence.jsonl. Captures trajectory, exitReason,
total accept/reject/defer counts, wall time, and annotation parse
errors -- the tuning signal needed to validate MAX_ROUNDS=5 and the
adaptive cap rule over weeks of builds.

Best-effort write: aggregate analytics never block the build path.
ConvergenceAggregate uses camelCase fields matching the Task 1
convention; jq queries use the field names verbatim.

Adds the ExitReason union as a distinct exported type so Task 5's
adaptive-cap decision and Task 9's loop exit can return values
type-checked against it.

Also extends MODULE_TEST_OWNERS coverage entry for plan-review-loop.ts
to include the new test file.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(build/plan-review-loop): set-aware adaptive cap rule

computeConvergenceSnapshot compares round-k accepted objections against
prior-round annotations parsed from the plan file. Classifies each as
re-raise (prior accepted-and-resolved, same (location, severity)) or
new objection. Rejected-prior matches are deliberately neither -- they
are reviewer-prompt-fidelity signals, not synth-failure signals.
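The re-raise/new classification keyed on `(location, severity)` can be sketched as follows. The types and the function name are hypothetical, and treating deferred or accepted-but-unresolved prior matches as "neither" is an assumption; the commit only specifies the rejected-prior case.

```typescript
type Severity = "CRITICAL" | "IMPORTANT";
interface Objection { location: string; severity: Severity }
interface PriorAnnotation extends Objection {
  userDecision: "accept" | "reject" | "defer";
  resolved: boolean;
}

function classifyObjections(current: Objection[], prior: PriorAnnotation[]): { reRaises: number; newObjections: number } {
  const key = (o: Objection) => `${o.location}\u0000${o.severity}`;
  const priorByKey = new Map<string, PriorAnnotation>();
  for (const p of prior) priorByKey.set(key(p), p);

  let reRaises = 0;
  let newObjections = 0;
  for (const o of current) {
    const p = priorByKey.get(key(o));
    if (!p) newObjections++;                                       // never seen before
    else if (p.userDecision === "accept" && p.resolved) reRaises++; // synth claimed a fix, reviewer disagrees
    // Matches against rejected (or otherwise unsettled) priors count as neither.
  }
  return { reRaises, newObjections };
}
```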

shouldBailAdaptive implements the decision table from the design spec:
hard cap at MAX_ROUNDS triggers stalemate_gate, accepted-count regression
triggers adaptive_cap_regression, re-raises-with-no-new triggers
adaptive_cap_re_raises_only. Precedence is explicit and tested across
six scenarios covering round 1, mid-loop continue, both bail paths,
the hard cap, and adaptive-disabled.
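The decision table can be sketched as an ordered chain of checks, which makes the precedence explicit: hard cap first, then regression, then re-raises-only. The input shape is hypothetical, and "regression" is read here as the accepted-objection count rising relative to the prior round.

```typescript
type CapDecision = "continue" | "stalemate_gate" | "adaptive_cap_regression" | "adaptive_cap_re_raises_only";

interface CapInput {
  round: number;
  maxRounds: number;
  adaptiveEnabled: boolean;
  acceptedThisRound: number;
  priorRoundAccepted: number | null; // null on round 1
  reRaises: number;
  newObjections: number;
}

function shouldBailAdaptiveSketch(i: CapInput): CapDecision {
  if (i.round >= i.maxRounds) return "stalemate_gate";                       // hard cap always wins
  if (!i.adaptiveEnabled || i.priorRoundAccepted === null) return "continue"; // nothing to compare yet
  if (i.acceptedThisRound > i.priorRoundAccepted) return "adaptive_cap_regression";
  if (i.reRaises > 0 && i.newObjections === 0) return "adaptive_cap_re_raises_only";
  return "continue";
}
```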

camelCase fields throughout (priorRoundAccepted, reRaises, newObjections,
noForwardProgress) matching Task 1 convention. Exports ConvergenceSnapshotInput,
RoundConvergenceSnapshot, AdaptiveCapInput, AdaptiveCapDecision so Task 9's
runPlanReviewLoop can compose with them.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(build/plan-review-loop): TTY triage gate for per-objection decisions

runTriageGateTTY prompts user per CRITICAL objection with the 8-key menu
(a/r/d/v/A/R/s/q + Enter), captures optional rationale, surfaces re-raises
with prior rejection context, and supports fast-path A/R or early quit.

Stream-based input/output so tests can drive it without a real TTY,
following the existing build/orchestrator/__tests__/feature-review-prompt.test.ts
pattern: Readable.push(Buffer.from(...)) to avoid object-mode readline pitfalls.
Uses a line-queue/waiter pattern to decouple readline event emission from the
sequential ask() calls — avoids the ERR_USE_AFTER_CLOSE trap that occurs when
readline output ownership conflicts with the injected output stream.
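The line-queue/waiter pattern decouples readline's event emission from sequential prompts: `line` events either satisfy a pending waiter or are buffered, and each `ask()` awaits `next()` instead of calling `rl.question()`. `LineQueue` is a hypothetical name for this sketch.

```typescript
class LineQueue {
  private lines: string[] = [];
  private waiters: Array<(line: string | null) => void> = [];
  private closed = false;

  // Wire to readline's 'line' event: hand the line to a waiting ask() or buffer it.
  push(line: string): void {
    const waiter = this.waiters.shift();
    if (waiter) waiter(line);
    else this.lines.push(line);
  }

  // Wire to readline's 'close' event: release every pending ask() with null.
  close(): void {
    this.closed = true;
    for (const waiter of this.waiters.splice(0)) waiter(null);
  }

  // Sequential ask() calls await this; buffered lines are consumed in order.
  next(): Promise<string | null> {
    if (this.lines.length > 0) return Promise.resolve(this.lines.shift()!);
    if (this.closed) return Promise.resolve(null);
    return new Promise<string | null>((resolve) => {
      this.waiters.push(resolve);
    });
  }
}
```

Wiring it up would look like `rl.on("line", (l) => q.push(l))` and `rl.on("close", () => q.close())` on an interface from `node:readline`'s `createInterface`, with each prompt written to the injected output stream before awaiting `q.next()`.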

8 tests cover: per-objection accept with rationale capture, mixed
reject/defer/accept, [A]ccept-ALL and [R]eject-ALL fast paths, [s]top
remainder, [q]uit early, and re-raise framing with prior-rejection context.
Returns TriageGateResult with quitEarly / fastPathed flags so Task 9's loop
can route exit codes correctly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(build/plan-review-loop): non-TTY triage modes for CI / scripts / agents

runTriageGateNonTTY: synchronous, no readline. Three modes, picked via the
--plan-review-noninteractive CLI flag added in Task 10:

- auto-accept (default, extends the existing IMPORTANT-objection non-TTY
  behavior to CRITICAL): accept every objection, re-synth, continue
- fail-fast: exit code 3 on the first round with CRITICAL — strict CI gate
- auto-reject: reject every objection, annotate as rejected, proceed —
  escape hatch for known-noisy reviewer runs

Returns NonTTYTriageResult with decisions[] (one per input objection)
and shouldFailFast flag. camelCase fields throughout matching Task 1
convention.
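The three non-TTY modes reduce to a small pure function. Names are hypothetical, and returning an empty decision list in fail-fast mode (since the run stops) is this sketch's assumption.

```typescript
type NonTtyMode = "auto-accept" | "fail-fast" | "auto-reject";
interface TriageDecisionSketch { objectionId: string; decision: "accept" | "reject" }
interface NonTtyResultSketch { decisions: TriageDecisionSketch[]; shouldFailFast: boolean }

function runTriageNonTtySketch(mode: NonTtyMode, criticalIds: string[]): NonTtyResultSketch {
  if (mode === "fail-fast") {
    // Any CRITICAL objection stops the run; the caller maps this to exit code 3.
    return { decisions: [], shouldFailFast: criticalIds.length > 0 };
  }
  const decision = mode === "auto-accept" ? "accept" : "reject";
  return {
    decisions: criticalIds.map((objectionId) => ({ objectionId, decision })),
    shouldFailFast: false,
  };
}
```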

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(build/plan-reviewer): annotation-aware reviewer + synth prompts

Extends PLAN_REVIEW_PROMPT with a paragraph teaching the reviewer to read
prior-round annotations (USER:accept/reject, RESOLUTION:pending/disputed,
REVIEWER:re-raised) and not re-raise settled concerns. Promotes the const
to an export so the new snapshot test can verify it without dynamic
import gymnastics.

Adds new exported SYNTH_REVISION_PROMPT (formerly inline in
build/SKILL.md.tmpl Step 5.5 -- moved to plan-reviewer.ts so Task 10's
cli.ts can use it from a typed import) instructing the synthesizer to
honor user triage decisions, write RESOLUTION lines, and mark disputes
when the user accepted something the synth thinks is wrong.

Snapshot-tested so unintended drift surfaces in CI; non-snapshot
assertions pin the core annotation contract phrases.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(build/plan-review-loop): in-process round loop runPlanReviewLoop

Composes the reviewer call, triage gate (TTY or non-TTY), annotation
writes, convergence snapshot, adaptive-cap decision, history JSONL
append, top-of-plan history header update, and synth dispatch -- all
in-process so re-launch overhead between rounds is eliminated.

runStalemateGate handles the user-facing AskUser at adaptive-cap-bail
or MAX_ROUNDS exit. Uses the same line-queue/waiter readline pattern
as runTriageGateTTY (Task 6) to work around Bun's readline
ERR_USE_AFTER_CLOSE on multi-question sequences.

Single-readline-per-loop: when isTTY, runPlanReviewLoop stands up one
shared readline + ask function and threads it into runTriageGateTTY
and runStalemateGate via an optional askFn injection point. Per-call
readlines drain the entire buffered input stream on the first close,
starving later rounds. Existing standalone-call tests for the gates
keep working because askFn defaults to undefined and each gate opens
its own readline in that case.

Tests cover: APPROVE in round 1 (no synth, fastest path), the bundle-1
trajectory 5->3->2->0 with three synth invocations, and an adaptive bail
when re-raises stall at round 2.

disputed_resolutions per-round count is a placeholder zero in this
commit; Task 11 wires it to the real RESOLUTION: disputed annotation
detection after each synth call.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(build/cli): wire runPlanReviewLoop into startup path

Replaces the single-shot runPlanReview + reconcilePlanReview call with the
new in-process loop. Adds three CLI flags:
  --plan-review-max-rounds=N    (default 5; range 1..20)
  --plan-review-no-adaptive-cap (off; disables forward-progress bail)
  --plan-review-noninteractive=<auto-accept|fail-fast|auto-reject>
                                (default auto-accept)

Existing exit codes are preserved (0 approve, 1 runtime error, 2 test failure).
New codes:
  3 = stalemate (user picked [m] or non-TTY fail-fast hit CRITICAL)
  4 = user abort (user picked [q] at gate)
130 = SIGINT during triage (unchanged)
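The flag and exit-code contract above can be sketched as a small parser. This is a hypothetical illustration of the documented defaults and ranges; the real cli.ts argument handling and error messages may differ.

```typescript
// Hypothetical parser for the three plan-review flags; not the real cli.ts code.
type NonTTYMode = "auto-accept" | "fail-fast" | "auto-reject";

interface PlanReviewFlags {
  maxRounds: number;
  adaptiveCap: boolean;
  nonInteractive: NonTTYMode;
}

const NON_TTY_MODES: readonly NonTTYMode[] = ["auto-accept", "fail-fast", "auto-reject"];

function parsePlanReviewFlags(argv: string[]): PlanReviewFlags {
  // Documented defaults: 5 rounds, adaptive cap on, auto-accept.
  const flags: PlanReviewFlags = { maxRounds: 5, adaptiveCap: true, nonInteractive: "auto-accept" };
  for (const arg of argv) {
    if (arg.startsWith("--plan-review-max-rounds=")) {
      const n = Number(arg.slice("--plan-review-max-rounds=".length));
      if (!Number.isInteger(n) || n < 1 || n > 20) {
        throw new Error(`--plan-review-max-rounds must be an integer in 1..20, got ${arg}`);
      }
      flags.maxRounds = n;
    } else if (arg === "--plan-review-no-adaptive-cap") {
      flags.adaptiveCap = false;
    } else if (arg.startsWith("--plan-review-noninteractive=")) {
      const mode = arg.slice("--plan-review-noninteractive=".length);
      if (!(NON_TTY_MODES as readonly string[]).includes(mode)) {
        throw new Error(`unknown --plan-review-noninteractive mode: ${mode}`);
      }
      flags.nonInteractive = mode as NonTTYMode;
    }
  }
  return flags;
}

// The exit-code contract listed above, as named constants.
const EXIT = { APPROVE: 0, RUNTIME: 1, TEST_FAIL: 2, STALEMATE: 3, USER_ABORT: 4, SIGINT: 130 } as const;
```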

Legacy plan-review-report.json is still written from loopResult.finalVerdict
so SKILL.md.tmpl Step 5.5 stalemate handler keeps working without changes
until Task 16 shrinks it.

synthFn adapter dispatches a configured Claude role with the
SYNTH_REVISION_PROMPT from Task 8 against the now-annotated plan file.
Since the dedicated planSynthesizer role does not yet exist in
role-config.ts, the adapter falls back to planReviewer's role config and
the timeout falls back to BUILD_DEFAULTS.timeoutsMs.planReview. Both are
forward-compatible: once planSynthesizer lands, the `as any` accessors
pick it up automatically.

reconcilePlanReview and readPlanReviewRound are no longer imported in
cli.ts now that the loop owns reconciliation and round tracking; they
remain exported from plan-reviewer.ts for tests that exercise them.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(build/plan-review-loop): count disputed resolutions per round

Replaces the Task 9 placeholder disputedResolutions.push(0) after the
synth call with a real count: re-parse plan annotations, find this
round's entries with RESOLUTION starting "disputed", tally them.

The synth writes RESOLUTION: disputed -- <reason> when it disagrees
with a user-accepted objection. Each disputed entry surfaces in the
next round's triage so the user can re-decide. The aggregate count
in convergence.jsonl is the tuning signal for "how often does synth
disagree with user accepts" -- high counts suggest reviewer prompt
fidelity issues or user triage rationale weakness.
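A minimal sketch of the per-round tally, assuming parsed annotations expose each round's resolution string. The ObjectionAnnotation/RoundEntry shapes are hypothetical, and the regex already includes the case-insensitive flag that a later commit (N3) adds.

```typescript
// Hypothetical parsed-annotation shapes; the real parser's types may differ.
interface RoundEntry {
  round: number;
  userDecision?: "accept" | "reject";
  resolution?: string; // e.g. "applied", "pending", "disputed -- <reason>"
}

interface ObjectionAnnotation {
  location: string;
  rounds: RoundEntry[];
}

// Word-bounded and case-insensitive, so "Disputed" counts but "disputedly" does not.
const DISPUTED_RE = /^disputed\b/i;

function countDisputedForRound(annotations: ObjectionAnnotation[], round: number): number {
  let count = 0;
  for (const ann of annotations) {
    for (const entry of ann.rounds) {
      if (
        entry.round === round &&
        entry.resolution !== undefined &&
        DISPUTED_RE.test(entry.resolution.trim())
      ) {
        count++;
      }
    }
  }
  return count;
}
```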

Annotation parse errors during this re-parse increment
annotationParseErrors and log to console.warn; they do not crash the
loop. The placeholder pushes on adaptive-bail-continue and on APPROVE
paths stay 0 (no synth invocation means no resolutions to count).

Also changes the annotationParseErrors declaration from const to let: it
was previously a no-op placeholder but now receives increments inside
the error catch path.

New test covers the disputed-resolution path: round 1 has 2 accepted
CRITICAL, synth marks one disputed and one applied, aggregate's
disputedResolutions[0] is 1.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(build/integration): bundle-1 trajectory converges 5→3→2→0

Integration test using stub reviewer (replaying fixture data from the
real bundle-1 trajectory: 5 CRITICAL → 3 new → 2 new → APPROVE) and
a stub synth that replaces RESOLUTION:pending with RESOLUTION:applied.

Verifies the end-to-end story:
- 4 rounds, 3 synth invocations, exit APPROVE
- history JSONL has 4 lines (one per round)
- convergence aggregate has correct trajectory and totals:
  trajectoryRaw [5, 3, 2, 0], totalAccepted 10, finalVerdict APPROVE
- plan file accumulates ROUND 1/2/3 annotations + top-of-plan history block

Fixture data lives in test/fixtures/build-convergence/ for sharing
across Tasks 13, 14, 15 (the other integration + E2E tests).

Coverage-matrix updated: plan-review-loop.ts now lists the integration
test as an additional owner alongside the unit tests.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(build/integration): adaptive cap bails on re-raises-only

Integration test verifying the no-forward-progress bail path:

Round 1 raises 2 CRITICAL, user accepts both, synth marks RESOLUTION
non-pending (a real fix attempt). Round 2 raises the same 2 CRITICAL
with zero new objections — adaptive cap detects this as a true stall
(reRaises=2, newObjections=0). Bail-out gate fires. User picks
[m]anual mode → exit code 3, outcome user_manual, exit_reason
user_manual in convergence.jsonl.

Synth is invoked exactly once (round 1) — the bail-out fires before
the round-2 synth call, saving the wasted API spend the design was
built to prevent.
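The stall rule exercised here can be sketched as a predicate, assuming the convergence snapshot exposes re-raise and new-objection counts for the round. This is a simplification: the real shouldBailAdaptive may weigh additional signals (for example, the accepted-count trend).

```typescript
// Hypothetical snapshot shape; only the two counts named in the test are modeled.
interface ConvergenceSnapshot {
  reRaises: number;      // objections raised again after being accepted and resolved
  newObjections: number; // objections not seen in any earlier round
}

function shouldBailAdaptive(s: ConvergenceSnapshot, adaptiveCapEnabled = true): boolean {
  // --plan-review-no-adaptive-cap disables the forward-progress bail.
  if (!adaptiveCapEnabled) return false;
  // A true stall: the reviewer only repeats itself and nothing new surfaced.
  return s.reRaises > 0 && s.newObjections === 0;
}
```

Under this rule, a round that re-raises old concerns but also surfaces a new dimension keeps the loop going, which matches the "fewer accepted with new objections" branch tested later.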

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(build/integration): synth disputes a user-accepted objection

Integration test for the synth-dispute escape hatch:

Round 1 reviewer raises 1 CRITICAL ("use bcrypt"). User accepts. Synth
disagrees -- instead of applying the fix, it writes
RESOLUTION: disputed -- bcrypt conflicts with FIPS in this build.

The dispute is captured in two telemetry surfaces:
- disputedResolutions[0] === 1 in convergence.jsonl
- The plan annotation preserves the synth's reasoning verbatim so the
  next round's reviewer (or user, if it re-surfaces) sees WHY synth
  declined to apply the fix.

Round 2 reviewer reads the disputed annotation and approves (validating
that a dispute does not force a re-raise -- the reviewer can decide).
Loop exits clean with outcome "approved".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(e2e): real Codex respects round-annotation contract

Layer 4 E2E from the convergence design spec. Drives runPlanReviewLoop
with the real Codex planReviewer against the bundle-1 fixture plan
created in Task 12. Stub synth rewrites RESOLUTION: pending so the
round-2 reviewer call sees a non-pending resolution annotation.

Asserts that once round 1's objections are accepted and annotated as
RESOLUTION: applied, round 2's Codex call does not re-raise them
(reRaises[1] === 0) -- proving the reviewer prompt addition (Task 8)
actually changes real-Codex behavior, not just our parser tests.

Classified gate tier per CLAUDE.md; ~$0.50/run. Gated by EVALS=1;
free tests run without invoking Codex.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(build/skill): shrink Step 5.5 to handle exit codes only

The in-process loop in plan-review-loop.ts now resolves most rounds
inside the CLI. Step 5.5 shrinks to a four-branch dispatch on exit
codes 0/3/4/130, plus the existing exit-1/2 paths. The cross-round
annotation history lives in the plan file, so manual edits between
launches just need to preserve those annotations so the next round's
reviewer has context.

Bumps skill version 1.24.0 -> 1.25.0 (MINOR -- new capability).
NOT bumping top-level VERSION per fork rule in CLAUDE.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(build/orchestrator): document the plan-review convergence loop

Adds a section to the orchestrator README covering the loop architecture,
exit code contract, flags, telemetry file layout, triage gate key map,
and module boundaries. Cross-references the design spec.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(changelog): build skill 1.25.0 — in-process plan-review loop

User-facing entry for the convergence loop change. NOT bumping top-level
VERSION per CLAUDE.md fork rule -- only the build skill frontmatter
bumped (Task 16). Entry covers what changed for the user, projected
metrics from the bundle-1 case study, itemized add/changed/contributor
changes, with spec cross-reference.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(build/orchestrator): clarify exit code 3 ambiguity + fix dropped indent

Pre-landing review surfaced two README findings:

1. Exit code 3 has two meanings since this PR: lock contention at startup
   AND plan-review stalemate. The general exit-code table now reflects
   both. Exit codes 4 (user abort) and 130 (SIGINT) added to the table
   for completeness.

2. The troubleshooting bullet for --mark-phase-committed lost its
   2-space leading indent during the table reformatting pass. Restored
   so the bullet continuation renders correctly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(build/orchestrator): strengthen 8 test gaps from pre-landing review

Pre-landing review found 8 test-quality findings. Addressed here:

1. types-convergence.test.ts: trivial echo-assertions replaced with
   behavioral tests that exercise types via real function calls.

2. plan-annotation-round-trip.test.ts: new test for the fallback-prepend
   path in writeRoundAnnotation when the referenced Phase heading is
   absent from the plan text.

3. loop-converge-bundle-1.test.ts: added structural placement assertion
   to confirm annotations land in the expected position (the fixture
   intentionally exercises the fallback-prepend path for rounds 2-3
   which raise objections at non-existent phases).

4. plan-reviewer-triage-tty.test.ts: new test for the [v]iew prose key
   path, verifying the assessment text is shown before the re-prompt.

5. skill-e2e-build-convergence.test.ts: tier-gate pattern aligned with
   other E2E tests (EVALS=1 && EVALS_TIER==='gate', no 'all' fallback).
   Core contract assertion no longer wrapped in a conditional that
   would let the test pass vacuously if Codex approves round 1.

6. plan-reviewer-loop.test.ts: new test exercising the max-rounds
   stalemate path (reviewer always REVISE, non-TTY auto-accept,
   confirms rounds === maxRounds at exit).

7. adaptive-cap-set-aware.test.ts: new test for the
   'fewer accepted with new objections' branch, confirming the loop
   continues when accepted count decreases but new dimensions appeared.

8. History-entry expectations updated for tests where the production
   fix (IMPORTANT/SUGGESTION counts now recorded) changes the asserted
   shape. Most tests use CRITICAL-only fixtures and are unaffected.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(build/plan-review-loop): restore IMPORTANT/SUGGESTION handling + 9 cleanups

Pre-landing review surfaced one CRITICAL contract violation and 9
informational items. All addressed in this commit:

CRITICAL — IMPORTANT/SUGGESTION objections silently dropped (regression).
The early-return at the APPROVE / zero-CRITICAL branch treated REVISE-with-
non-CRITICAL-only as APPROVE without annotating IMPORTANT/SUGGESTION
objections in the plan file or recording them in the history. The pre-
loop reconcilePlanReview path handled them via inline annotate-and-
proceed; that path is no longer reached because cli.ts dropped the
reconcilePlanReview call when wiring runPlanReviewLoop. Restored by
filtering important/suggestion, calling writeRoundAnnotation for each,
and recording real counts in HistoryEntry.

Maintainability cleanups:
- LoopOutcome aliased to ExitReason (byte-identical union); the as-cast
  in writeAggregate removed.
- Dead sigint literal removed from ExitReason; SIGINT handling lives in
  cli.ts signal handler, not the loop return type.
- ROUND_HISTORY_FORMAT_VERSION removed from types.ts; was never imported.
- Always-false interrupted field removed from ConvergenceAggregate; no
  code path mutated it.
- Stale "Task 11 placeholder" comment on disputedResolutions replaced
  with accurate description of what the array contains.
- makeReadlineAsk helper extracted; the ERR_USE_AFTER_CLOSE workaround
  for Bun readline is now in one place instead of three duplicated
  blocks (triage TTY standalone, loop shared, stalemate standalone).

Performance cleanups:
- Two redundant readFileSync calls per round eliminated: the plan-text
  read for re-raise detection is reused for annotation-write, and the
  in-memory updatedPlan is passed to updateRoundHistoryHeader instead of
  re-reading from disk after writing.
- existsSync + statSync collapsed to statSync with try/catch.

Performance finding #4 (pre-parse annotations once before
writeRoundAnnotation loop) deferred — requires writeRoundAnnotation
signature change in plan-reviewer.ts; sub-ms cost at typical scale
(K=5-10 objections, 20-50KB plan).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(build/plan-reviewer): close 4 silent-data-loss annotation bugs

Red Team review verified 4 round-trip corruption modes in the annotation
parser/writer. Each fails silently with no warning.

1. '-->' in any user-supplied field closes the HTML comment early.
   Subsequent lines bleed into plan text; parser silently drops rounds[].
2. '"' in rationale stops the [^"]* regex group at the first inner
   quote. Rationale field silently dropped.
3. ']' in location stops the [^\]\n]+ regex group at the first inner
   bracket. ENTIRE annotation silently lost.
4. Non-canonical whitespace in existing in-file annotations makes
   writeRoundAnnotation's merge path a silent no-op. Round 2's
   decision silently lost.

Fix #1-3: HTML-entity encoding on serialize, decode on parse. Encode
ampersand first (so user input containing literal escape sequences
round-trips correctly); decode ampersand last. Encoded sequences:
']' -> '&#93;', '"' -> '&#34;', '-->' -> '--&gt;', '&' -> '&amp;'.
Applied to userRationale, issue, suggestion, location, resolution.
userDecision is a fixed enum (no encoding). An operator warning is logged
once per field, at the first character that needs encoding, so encoding
never happens silently without an audit trail.
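The encode/decode ordering described above can be sketched as follows, covering only the four sequences listed in this commit (later commits add more). The function names and the exact replacement chain are illustrative, not the real plan-reviewer.ts code.

```typescript
// Serialize: '&' must be encoded FIRST so literal entity text in user input
// (e.g. "&#93;") is escaped rather than later mistaken for an encoding.
function encodeAnnField(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/\]/g, "&#93;")
    .replace(/"/g, "&#34;")
    .replace(/-->/g, "--&gt;"); // keeps the HTML comment from closing early
}

// Parse: reverse each encoding, decoding '&amp;' LAST for the same reason.
function decodeAnnField(s: string): string {
  return s
    .replace(/--&gt;/g, "-->")
    .replace(/&#34;/g, '"')
    .replace(/&#93;/g, "]")
    .replace(/&amp;/g, "&");
}
```

Because "&amp;" is decoded last, an input that literally contained "&#93;" is stored as "&amp;#93;" and comes back as "&#93;", never as "]".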

Fix #4: writeRoundAnnotation merge path walks the regex over planText
directly to locate the actual byte range of each existing block
(handling non-canonical whitespace, CRLF, external-editor reformat).
A fresh local regex avoids the shared ANNOTATION_BLOCK_RE's lastIndex
being reset inside parseRoundAnnotations during the loop, which
otherwise caused infinite loops on the same span.

Twelve new tests pin the round-trip invariants:
- '-->' / '"' / ']' / '%' / '&' all round-trip correctly
- merge into non-canonical 8-space-indent block preserves round 2
- literal '&#93;' does not decode to ']' on parse

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(build/plan-review-loop, cli): close 8 Red Team findings

CRITICAL #1 (regression from round-1 fix): TTY mode now prompts per
IMPORTANT objection (y/skip/all) instead of unconditionally
auto-accepting. Restores the pre-loop reconcilePlanReview contract
that was lost when the loop went in-process. SUGGESTION objections
remain auto-annotated (no prompt). Non-TTY mode still auto-accepts
per existing CI contract.

CRITICAL #6: runPlanReviewLoop now calls deriveRoundNumber on
startup, reads prior history, and starts the round counter at the
next available number on resume. Trajectory arrays are hydrated
from history so shouldBailAdaptive has the right context. Resume
after exit-3 or exit-130 no longer duplicates round numbers in
history.jsonl or writes a second convergence aggregate record.
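A minimal sketch of resume numbering and trajectory hydration, assuming history.jsonl holds one JSON object per round with a numeric `round` field. The helper names here are hypothetical; the real deriveRoundNumber may read more state.

```typescript
// Hypothetical subset of a history line; the real HistoryEntry has more fields.
interface HistoryEntry {
  round: number;
  verdict: "APPROVE" | "REVISE" | "INTERRUPTED";
  objectionCountRaw: number;
}

function readHistory(jsonl: string): HistoryEntry[] {
  return jsonl
    .split(/\r?\n/)
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as HistoryEntry);
}

// Next round is one past the highest recorded round (1 on a fresh run),
// so a resume never reuses a round number.
function nextRoundNumber(history: HistoryEntry[]): number {
  return history.reduce((max, e) => Math.max(max, e.round), 0) + 1;
}

// Rebuild the raw trajectory so adaptive-cap checks see the earlier rounds.
function hydrateTrajectoryRaw(history: HistoryEntry[]): number[] {
  return [...history].sort((a, b) => a.round - b.round).map((e) => e.objectionCountRaw);
}
```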

INFO #7: objectionCountRaw set to critical.length only (matches JSDoc
and other paths). Verdict written as verdict.verdict, not hardcoded
"APPROVE", in the REVISE-with-zero-CRITICAL branch.

INFO #8: Early-exit paths (user quit, non-TTY fail-fast) now push
sentinel -1 to trajectoryAccepted, reRaisesArr, reRejectedArr,
disputedResolutions to maintain length parity with trajectoryRaw.
User-quit path also writes an INTERRUPTED history entry.

INFO #9: synthFn rejection wrapped in try/catch. On failure, the
loop writes an INTERRUPTED history entry, writes the aggregate with
exitReason: reviewer_unavailable, closes shared resources, and exits
with code 1.

INFO #10: Plan-file writes use atomic tmp + rename pattern.
SIGINT during a write can no longer leave a half-written plan that
the parser would silently skip on resume.

INFO #11: HOME fallback uses os.homedir() with os.tmpdir() as
secondary. Containers without HOME no longer pollute the project
worktree.

INFO #12: [s]top key in runTriageGateTTY no longer prompts for the
current-objection rationale before exiting. The current decision is
pushed with empty rationale and the loop moves on.

Tests added: resume-from-history, synth-failure-handled, TTY
IMPORTANT prompt, array-length parity on quit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(build/plan-review-loop): close adversarial review CRITICALs

Claude adversarial review found two real bugs the 3-specialist + Red
Team rounds missed. Both verified by reading the code.

C1 (CRITICAL): TTY infinite-loop on stdin close. The decision-loop
default case in runTriageGateTTY (and two siblings in the IMPORTANT
prompt + runStalemateGate) was `opts.output.write("Invalid input '${ans}'...")`
with no EOF handling. Once makeReadlineAsk's streamClosed sets the
internal flag, ask() resolves with "" forever, and the while-loop
spins at 100% CPU printing "Invalid input ''" until externally
killed. Triggered by routine Ctrl+D, piped stdin closing, or terminal
disconnect mid-triage. All three sites now treat empty input as a
quit/abort/skip-all signal.
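The fix can be sketched with a synchronous stand-in for ask(): once the stream is closed it returns "" forever, so "" must map to a terminal action instead of "invalid". The [v]iew, [s]top and [q]uit keys come from this PR; [a]ccept and [r]eject are hypothetical placeholders.

```typescript
// [a]/[r] are illustrative; [v], [s], [q] appear elsewhere in this PR.
type GateAction = "accept" | "reject" | "view" | "stop" | "quit" | "invalid";

function interpretTriageAnswer(raw: string): GateAction {
  const ans = raw.trim().toLowerCase();
  // "" is what ask() keeps returning after stdin closes (Ctrl+D, piped
  // input ending, terminal disconnect), so it must end the loop.
  if (ans === "") return "quit";
  switch (ans) {
    case "a":
      return "accept";
    case "r":
      return "reject";
    case "v":
      return "view";
    case "s":
      return "stop";
    case "q":
      return "quit";
    default:
      return "invalid";
  }
}

// Synchronous stand-in for the real async prompt loop.
function decisionLoop(ask: () => string, log: string[]): GateAction {
  for (;;) {
    const action = interpretTriageAnswer(ask());
    if (action !== "invalid") return action;
    log.push("Invalid input; try again.");
  }
}
```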

C2 (CRITICAL): Convergence aggregate array-length skew on
reviewer_unavailable exit. planFileSizeBytes was pushed
unconditionally at the top of every round iteration, but the
reviewer_unavailable branch returned without pushing to
trajectoryRaw / trajectoryAccepted / reRaisesArr / reRejectedArr /
disputedResolutions. The resulting aggregate row had one array of
length N+1 alongside others of length N; downstream consumers
indexing by round walked off the wrong array. Same bug class as the
Red Team's INFO #8 fix (sentinel pushes on early-exit), missed on
this branch. Fix: explicit sentinel pushes to the five parallel
arrays before writeAggregate.
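The commit fixes this with explicit per-branch pushes; a generic variant of the same invariant is sketched below, padding every shorter array with the -1 sentinel so index i means round i in all of them. The RoundArrays container is a hypothetical grouping of the arrays named above.

```typescript
// Hypothetical container for the parallel per-round arrays named in the commit.
interface RoundArrays {
  planFileSizeBytes: number[];
  trajectoryRaw: number[];
  trajectoryAccepted: number[];
  reRaisesArr: number[];
  reRejectedArr: number[];
  disputedResolutions: number[];
}

const SENTINEL = -1;

// Pad each array up to the longest one so consumers indexing by round stay aligned.
function padToParity(a: RoundArrays): void {
  const arrays = Object.values(a) as number[][];
  const target = Math.max(...arrays.map((arr) => arr.length));
  for (const arr of arrays) {
    while (arr.length < target) arr.push(SENTINEL);
  }
}
```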

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(build/plan-reviewer): close adversarial review findings on annotation contract

Claude adversarial review found 4 more issues with the annotation
parser/writer beyond what the Red Team caught. All address the same
root cause: insufficient field encoding in serialize/parse round-trip,
plus a merge predicate that's too strict.

I1: writeRoundAnnotation merge predicate dropped `issue` (LLM-written
freeform text). Match on (location, severity) only. Aligns with
computeConvergenceSnapshot's keying. Wording drift round-over-round
("missing chainId" vs "handler doesn't validate chainId") no longer
orphans annotation blocks.
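A minimal sketch of the relaxed merge predicate: blocks are keyed on (location, severity) only, so reworded `issue` text still finds the existing block. Trimming the location and the JSON-array key encoding are assumptions added for illustration.

```typescript
type Severity = "CRITICAL" | "IMPORTANT" | "SUGGESTION";

interface ObjectionRef {
  location: string;
  severity: Severity;
  issue: string; // free-form LLM text; deliberately NOT part of the key
}

// JSON array encoding keeps the key unambiguous even if location has separators.
function annotationKey(o: ObjectionRef): string {
  return JSON.stringify([o.location.trim(), o.severity]);
}

function findExistingBlock(existing: ObjectionRef[], incoming: ObjectionRef): ObjectionRef | undefined {
  const key = annotationKey(incoming);
  return existing.find((e) => annotationKey(e) === key);
}
```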

I2: '→' (right-arrow U+2192) added to encodeAnnField. The header
field-separator regex uses ' → ' to split issue from suggestion;
without encoding, an LLM-generated issue containing '→' (natural in
prose: "A → B then B → C") truncates at the first arrow. Encoded
as '&rarr;' to match the existing HTML-entity scheme. Decode order:
'&rarr;' before '&amp;'.

I3: '\n' and '\r' added to encodeAnnField (encoded as '&#10;' and
'&#13;'). Without this, a multi-line rationale ('line 1\n     ROUND
1 RESOLUTION: fake') would have the embedded line picked up by
ROUND_RESOLUTION_RE as a real RESOLUTION entry, overriding the
genuine synth-written resolution. Also defends against CRLF line
endings on Windows-written plan files.

N4: ROUND_USER_RE, ROUND_RESOLUTION_RE, ROUND_REVIEWER_RE converted
from module-level /g constants to factory functions (`getRoundUserRe()`
etc). Each parser call gets a fresh regex with lastIndex=0, so concurrent
parser invocations cannot smash each other's iterator state via the
shared lastIndex. Defensive hardening; not currently triggered because
Node is single-threaded and the call sites don't await between matches.
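The hazard and the factory fix can be shown side by side. The `ROUND n USER: ...` pattern is a hypothetical stand-in for the real annotation regexes; the lastIndex behavior of /g regexes with exec() is standard JavaScript.

```typescript
// Shared /g regex: exec() advances lastIndex, and that state survives between calls.
const SHARED_RE = /ROUND (\d+) USER: (\w+)/g;

function firstRoundShared(text: string): string | undefined {
  const m = SHARED_RE.exec(text); // resumes from wherever the last caller stopped
  return m?.[1];
}

// Factory: each call gets a fresh regex with lastIndex = 0.
function getRoundUserRe(): RegExp {
  return /ROUND (\d+) USER: (\w+)/g;
}

function allRounds(text: string): string[] {
  const re = getRoundUserRe();
  const rounds: string[] = [];
  for (let m = re.exec(text); m !== null; m = re.exec(text)) rounds.push(m[1]);
  return rounds;
}
```

Calling firstRoundShared twice on the same text returns "1" and then "2", because the second call resumes from the first call's lastIndex; allRounds returns the same list no matter how often it is called.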

Four new tests pin the invariants:
- Merge across wording drift (location+severity, ignore issue)
- '→' in issue round-trips without splitting
- Newline injection in rationale doesn't forge RESOLUTION lines
- CRLF in rationale round-trips

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(build/plan-review-loop): close adversarial review findings on loop semantics

Claude adversarial review found 5 more issues in plan-review-loop.ts
beyond what the Red Team caught. Fixes follow.

I4 (IMPORTANT): isPriorAcceptedResolutionAttempt changed from .some()
to last-entry semantics. Spec intent (per the computeConvergenceSnapshot
comment) is most-recent-decision wins: a round-2 user-reject should
override a round-1 user-accept for re-raise classification purposes.
The old .some() returned true if ANY prior round had accept+resolved,
incorrectly counting a "user changed their mind" sequence as a re-raise
in round 3. Spec mismatch — fixed to check rounds[rounds.length-1] only.
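A minimal sketch of the last-entry rule, assuming rounds[] is ordered by round and that any non-pending resolution counts as a fix attempt. Both assumptions are ours, and the RoundEntry shape is hypothetical.

```typescript
interface RoundEntry {
  round: number;
  userDecision?: "accept" | "reject";
  resolution?: string;
}

// Most recent decision wins: only the last recorded round is consulted.
function isPriorAcceptedResolutionAttempt(rounds: RoundEntry[]): boolean {
  const last = rounds[rounds.length - 1];
  if (last === undefined) return false;
  return (
    last.userDecision === "accept" &&
    last.resolution !== undefined &&
    !/^pending\b/i.test(last.resolution) // assumption: non-pending = fix attempted
  );
}
```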

N1 (INFO): INTERRUPTED verdict wired into the user-quit / SIGINT path.
The quitEarly branch in runPlanReviewLoop now writes a per-round
history entry with verdict: "INTERRUPTED" before writing the convergence
aggregate. The HistoryEntry verdict union previously declared
"INTERRUPTED" as a possible value but no code path produced it.

N3 (INFO): /^disputed\b/ regex made case-insensitive (i flag) so synth
output drift ("Disputed", "DISPUTED") is still counted as a disputed
resolution. Tuning signal robustness across model behavior changes.

N5 (INFO): priorRejectRationale Map now hydrates from prior plan
annotations on resume (startRound > 1). Without this, re-raise framing
on resumed runs showed empty strings for the user's earlier rejection
rationale even though the rationale was present in the plan file.

N6 (INFO): atomicWriteFile tmp filename suffix extended with
crypto.randomBytes(8) for cross-process collision safety. The
pid+ms approach was unique within a single Node process but
container-in-container scenarios could share both.
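A sketch of the tmp + rename pattern with the random suffix, assuming the temp file lives in the same directory as the target so the rename stays on one filesystem (where POSIX rename replaces the target atomically). It does not fsync, so it guards against torn writes on SIGINT, not against power loss; the name and suffix layout are illustrative.

```typescript
import { randomBytes } from "node:crypto";
import { renameSync, rmSync, writeFileSync } from "node:fs";
import { basename, dirname, join } from "node:path";

function atomicWriteFile(target: string, contents: string): void {
  // pid + ms alone can collide across containers; 8 random bytes make that negligible.
  const suffix = `${process.pid}.${Date.now()}.${randomBytes(8).toString("hex")}`;
  const tmp = join(dirname(target), `.${basename(target)}.${suffix}.tmp`);
  try {
    writeFileSync(tmp, contents);
    // Readers see either the old file or the new one, never a half-written plan.
    renameSync(tmp, target);
  } catch (err) {
    rmSync(tmp, { force: true }); // best-effort cleanup of the temp file
    throw err;
  }
}
```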

N7 (empty rationale to undefined on round-trip) is owned by
plan-reviewer.ts (parallel scope) — not addressed here.

Three new tests cover:
- last-round-wins re-raise classification (positive + negative)
- case-insensitive 'Disputed' counted as disputed resolution
- priorRejectRationale hydration on resume shows prior rationale

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(build/plan-review-loop, cli): close 3 Codex HIGH findings on loop semantics

H2: synthFn now propagates the SubAgentResult's exit-code as `ok`,
and runPlanReviewLoop checks the returned ok. A synth that hit a
timeout, model-not-found error, or non-zero exit no longer silently
masquerades as a successful round. Failure routes through the
synth_failure exit reason added in c5e7b3e.

H3: [c] Continue anyway at the adaptive bail gate now runs the synth
before looping. Previously the `continue` statement jumped straight
to the next round iteration, skipping synth dispatch entirely — the
next paid reviewer round then reviewed the same unresolved plan with
RESOLUTION: pending unchanged, virtually guaranteeing re-raises and
burning the round. Extracted the synth-call into a helper so the
normal-flow path and the continue path share one code path.

H4: Resume past maxRounds (startRound > maxRounds, e.g., user re-
launched after exiting manual mode at the cap) no longer falls
through the empty for-loop to a `lastVerdict!` non-null assertion
on a null value. Added a startRound > maxRounds guard that synthesizes
an APPROVE verdict ("Resume past maxRounds; review skipped.") and
returns with outcome: "approved", round: startRound - 1.

Three new tests pin the invariants:
- synthFn returning ok:false routes to synth_failure exit code 1
- [c] continue invokes synth before next reviewer round
- resume with startRound > maxRounds doesn't crash

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(build/cli, plan-review-loop): close 3 Codex structured review P1/P2 findings

Codex structured review (the formal --base main P0/P1/P2/P3 pass) found
three issues that bypass the plan-review gate. All closed here.

P1 #1: cli.ts:9935-9963 only handled exit codes 3 / 4 / 130 then fell
through to `state.planReview = loopResult.finalVerdict; saveState; ...`
and proceeded into implementation. exitCode 1 (synth_failure) silently
bypassed the gate. Now exitCode 1 persists state.planReview with
status: "synth_failure" and throws ExitError(1). The build stops; the
resume gate (see next fix) picks it up.

P1 #2: cli.ts:9814-9818 resume gate only re-ran the loop when
state.planReview was missing or status === "critical_exit_pending".
Status values "user_aborted" and "synth_failure" were treated as
"already reviewed" — the next resume saw a REVISE verdict and proceeded
to phases. Both pending statuses now route through the gate so resume
picks up the loop as the docs promise.
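The resume-gate rule can be sketched as a predicate over persisted state. The three pending statuses come from this commit; "approved" and the state shape are hypothetical placeholders.

```typescript
// "approved" is a hypothetical completed status; the other three come from the commit.
type PlanReviewStatus = "approved" | "critical_exit_pending" | "user_aborted" | "synth_failure";

interface BuildStateLike {
  planReview?: { status: PlanReviewStatus };
}

// Statuses meaning the review loop did NOT finish and must run again on resume.
const PENDING_STATUSES: ReadonlySet<PlanReviewStatus> = new Set<PlanReviewStatus>([
  "critical_exit_pending",
  "user_aborted",
  "synth_failure",
]);

function shouldRunPlanReviewGate(state: BuildStateLike): boolean {
  return state.planReview === undefined || PENDING_STATUSES.has(state.planReview.status);
}
```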

P2: plan-review-loop.ts:851 resume-past-maxRounds auto-approved with a
synthetic APPROVE verdict purely based on history length. A user who
typed `gstack-build resume` without actually editing the plan bypassed
the review gate. Now runs ONE more verification reviewer call on the
(potentially-edited) plan. APPROVE → proceed. REVISE with CRITICAL →
exit STALEMATE (3); the user must edit further or pass
--no-plan-review to explicitly override.

The existing H4 test ("returns approved with synthetic verdict") was
updated to assert the new behavior (one reviewer call, APPROVE result),
and a companion H4-P2 test covers the REVISE-from-verification path
that exits STALEMATE.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: post-ship documentation update for plan-review convergence (v1.40.5.0)

- CHANGELOG.md: remove stray merge-artifact line (> > > > > > > origin/main)
  and fix stray # in Design spec bullet point in For contributors section
- build/orchestrator/README.md: add plan-reviewer.ts, plan-review-loop.ts,
  drain-faults.ts, halt-event-helpers.ts, halt-events.ts to Architecture
  module map (all added on this branch, were missing from the module list)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
anbangr added a commit that referenced this pull request May 21, 2026
…build skill v1.27.0) (#72)

* test: add makeMockState() helper

Lets the drain-faults and orchestrator tests construct a BuildState
without spelling out every field. Used by the upcoming drain wrapper
tests that need a real BuildState (the existing partial-state casts
hit code paths that try to call saveState on incomplete shapes and
take ~15s waiting on filesystem ops).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(build/orchestrator): heartbeat writes per-run sidecar with progress snapshot

Adds a per-run heartbeat.json sidecar alongside the existing stdout
tick. The sidecar carries runId, pid, stateSlug, current phase,
state.lastUpdatedAt, and drainProcessedCount. Atomic tmp+rename writes
so the monitor never observes a partial JSON read.

The stdout JSON line still goes to process.stdout (existing behavior,
preserves the monitor's recentProcessActivity check). The sidecar is
the new channel the monitor reads in the next commit to detect
"process alive but state has not advanced" stalls.

If the sidecar write fails (ENOSPC, permission flip), heartbeat logs
once and keeps stdout ticking. The monitor falls back to its existing
decision tree on a missing sidecar, so a filesystem hiccup degrades to
pre-fix behavior, not a new silent failure mode.

Tests cover: atomic-rename leaves no .tmp file, payload embeds
runId/pid/stateSlug, undefined snapshot fields are omitted cleanly,
sidecar write failure logs once and keeps stdout alive,
removeHeartbeatSidecar is best-effort and idempotent.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(build/orchestrator): drainFaultsForBuildRun wrapper + AbortSignal kill path

Adds the production end-of-build auto-drain wrapper that threads a
BuildState through the queue consumer so each processed entry bumps
state.lastUpdatedAt AND an in-memory drainProcessedCount counter. The
monitor's stall arm reads both signals from the heartbeat sidecar and
only escalates when BOTH are frozen, so a healthy drain processing
entries one by one counts as progress; a drain spinning on the same
broken entry counts as stalled (closes codex finding #6).

The base drainFaultsFromHaltEventsQueue(opts) API stays unchanged.
Manual `gstack-build drain-faults --queue` keeps working without a
BuildState (codex finding #7).

DrainHaltEventsOptions gains:
  - signal?: AbortSignal — cooperative cancellation, checked before
    every entry; threads down to spawnInvestigatorCapture which
    SIGTERMs the in-flight investigator child with a 5s grace
    period before SIGKILL using killProcessAndGroup. Same kill
    semantics as the per-investigator wall-clock timeout (closes
    codex finding #8 — Promise.race alone left orphans).
  - onEntryProcessed?: () => void — fires per processed/short-circuited
    entry. NOT per skipped entry (filtered by severity or runId): a
    flood of irrelevant queue entries must not keep the stall arm
    quiet forever.

DrainHaltEventsResult gains aborted + deferred so the orchestrator
logs `auto-drain budget exceeded after Xm; N entries left for next
run` and telemetry sees the abort.
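The control flow of the new options and result fields can be sketched synchronously. The real loop awaits an investigator subprocess per entry; QueueEntry, shouldSkip and processEntry here are hypothetical stand-ins.

```typescript
// Simplified, synchronous sketch; the real drain awaits an investigator per entry.
interface QueueEntry {
  id: string;
  severity: "CRITICAL" | "IMPORTANT" | "SUGGESTION";
  runId: string;
}

interface DrainOptions {
  signal?: AbortSignal;
  onEntryProcessed?: () => void;
  shouldSkip: (e: QueueEntry) => boolean; // hypothetical severity/runId filter
  processEntry: (e: QueueEntry) => void; // hypothetical investigator call
}

interface DrainResult {
  processed: number;
  skipped: number;
  aborted: boolean;
  deferred: number; // entries left for the next run
}

function drainQueue(entries: QueueEntry[], opts: DrainOptions): DrainResult {
  const result: DrainResult = { processed: 0, skipped: 0, aborted: false, deferred: 0 };
  for (let i = 0; i < entries.length; i++) {
    // Cooperative cancellation, checked before every entry.
    if (opts.signal?.aborted) {
      result.aborted = true;
      result.deferred = entries.length - i;
      break;
    }
    const entry = entries[i];
    if (opts.shouldSkip(entry)) {
      result.skipped++; // skipped entries do NOT count as progress
      continue;
    }
    opts.processEntry(entry);
    result.processed++;
    opts.onEntryProcessed?.();
  }
  return result;
}
```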

Tests cover: 3-entry drain → 3 saveState + counter bumps; counter
monotonic; base API still works without state; mid-drain abort halts
loop and reports deferred; pre-aborted signal returns immediately with
deferred = full queue.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(build/orchestrator): monitor escalates when orchestrator state freezes

Adds the stall-detection arm that closes the 47-min auto-drain hang
class. When the orchestrator's process is alive AND the heartbeat
sidecar shows both progress signals (state.lastUpdatedAt +
drainProcessedCount) frozen beyond build_stall_threshold_ms (default
15 min), the monitor emits USER_ACTION_REQUIRED instead of staying at
RUN_RUNNING forever.

The check is gated on a trust check: the sidecar's runId AND pid must
match the run being evaluated. A stale sidecar from a crashed prior
run on the same stateSlug (PID reuse, cleanup-skip) gets null'd and
the monitor falls back to its existing decision tree (codex finding
#3).

The monitor is stateless between polls (evaluateMonitorOnce reads
fresh from disk), so the "frozen for X ms" computation can't live in
memory. Solved by writing a per-run tracker sidecar:
  <stateDir>/<stateSlug>.heartbeat-track.json
recording lastSeenStateLastUpdatedAt + lastSeenDrainProcessedCount +
lastChangedAt. Each poll reads the tracker, compares against the
fresh sidecar, and either bumps lastChangedAt (movement) or measures
how long the values have been frozen (stall) — codex finding #2.
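The per-poll evaluation can be sketched as a pure function that takes the fresh sidecar and the prior tracker and returns the tracker to persist plus a stall verdict. The field names follow the commit text; the function itself and its null-means-fall-back convention are assumptions.

```typescript
interface HeartbeatSidecar {
  runId: string;
  pid: number;
  stateLastUpdatedAt?: string;
  drainProcessedCount?: number;
}

interface StallTracker {
  lastSeenStateLastUpdatedAt?: string;
  lastSeenDrainProcessedCount?: number;
  lastChangedAt: number; // epoch ms
}

const DEFAULT_STALL_THRESHOLD_MS = 15 * 60 * 1000; // 900000

// Returns null when the sidecar is missing or untrusted: fall back to the old tree.
function evaluateStall(
  sidecar: HeartbeatSidecar | null,
  expected: { runId: string; pid: number },
  prior: StallTracker | null,
  nowMs: number,
  thresholdMs = DEFAULT_STALL_THRESHOLD_MS,
): { tracker: StallTracker; stalled: boolean } | null {
  // Trust gate: a stale sidecar from another run or a reused PID is ignored.
  if (!sidecar || sidecar.runId !== expected.runId || sidecar.pid !== expected.pid) return null;
  const seen = {
    lastSeenStateLastUpdatedAt: sidecar.stateLastUpdatedAt,
    lastSeenDrainProcessedCount: sidecar.drainProcessedCount,
  };
  // First poll (or monitor restart without a tracker): seed, don't judge yet.
  if (!prior) return { tracker: { ...seen, lastChangedAt: nowMs }, stalled: false };
  const moved =
    prior.lastSeenStateLastUpdatedAt !== seen.lastSeenStateLastUpdatedAt ||
    prior.lastSeenDrainProcessedCount !== seen.lastSeenDrainProcessedCount;
  if (moved) return { tracker: { ...seen, lastChangedAt: nowMs }, stalled: false };
  // Both signals frozen: stalled once the freeze lasts beyond the threshold.
  return { tracker: prior, stalled: nowMs - prior.lastChangedAt > thresholdMs };
}
```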

The monitor (the sidecar's reader) also writes the tracker atomically via tmp+rename.

New gstack-config knob: build_stall_threshold_ms (default 900000 =
15 min). monitor.ts spawns gstack-config once at startup to read it;
falls back to default on missing/unparseable. Pattern matches
fault_investigator_model in drain-faults.ts.

Decision tree extension lives directly above monitor.ts:844 with an
ASCII diagram showing the full tree (state.completed → state.failed →
stale → pidAlive → recentProcessActivity → NEW: heartbeatStateMoved).

Tests cover: stall arm escalates after threshold; doesn't escalate
when state moves; ignores cross-run sidecar (runId mismatch); ignores
sidecar with PID-reuse mismatch; missing sidecar falls through to
existing branch (no new silent failure); tracker survives monitor
restart (first poll seeds, second poll evaluates); default 15-min
threshold applies when no override.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(build/orchestrator): wire heartbeat sidecar + 30-min auto-drain budget

Connects the three new pieces (heartbeat sidecar, drain wrapper,
AbortSignal kill path) at the cli.ts main() level:

  - startHeartbeat now receives a getStateSnapshot callback reading
    state.lastUpdatedAt / currentPhaseIndex / drainProgress.count(),
    plus heartbeatFilePath pointing at the per-run path under
    <stateDir>/<slug>.heartbeat.json. pid + runId + stateSlug embedded
    for the monitor's trust gate.

  - A new drainProgress = createDrainProgressCounter() lives alongside
    `heartbeat` in main(). The heartbeat reads it; runAutoDrainIfEnabled
    passes it into drainFaultsForBuildRun so each processed entry bumps
    the counter visible to the next heartbeat tick.

  - runAutoDrainIfEnabled now enforces AbortSignal.timeout(30 * 60 * 1000)
    on the drain. On budget exhaustion, the orchestrator logs
    `auto-drain budget exceeded after Xm; N entries left for next run`,
    preserves the build's existing exitCode, and lets process.exit
    proceed normally. The signal threads down to in-flight investigator
    subprocesses which receive SIGTERM + 5s grace + SIGKILL.

  - finally block now calls removeHeartbeatSidecar(heartbeatPath) right
    after heartbeat.stop(). SIGKILL paths skip finally; the monitor's
    runId+pid trust gate handles those stragglers.

  - The base drainFaultsFromHaltEventsQueue path is preserved for when
    runAutoDrainIfEnabled is called without state (e.g. an early-exit
    code path); the wrapper is only invoked when state is present.

Behavior change to document in CHANGELOG: the pre-fix auto-drain worst
case was 200 min (max 20 entries × 10-min investigator timeout). The
post-fix budget caps it at ~3 timeout-class entries (30 min ÷ 10 min)
before abort. Remaining entries defer to the next run.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test: end-to-end integration of heartbeat sidecar → monitor stall arm

Spawns a real bun child that imports the production heartbeat.ts and
writes a sidecar with a FROZEN state snapshot for 3 seconds. The
monitor (in this same process) reads the on-disk sidecar across two
polls separated by 700ms — past a 500ms test-only threshold — and
escalates USER_ACTION_REQUIRED.

Unlike the unit tests in monitor.test.ts which feed synthetic
snapshots, this test exercises the heartbeat.ts → sidecar file →
monitor.ts read path with no mocks. If any wire in the stall-detection
chain breaks (sidecar path mismatch, runId+pid trust gate, tracker
semantics), this test catches it.

Runs in ~1s using the buildStallThresholdMs test hook. No paid LLM
calls and fully deterministic, so it can run in the CI gate tier.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: bump build skill to v1.27.0 + CHANGELOG (orchestrator stall detection)

Build skill 1.27.0:
  - heartbeat sidecar + monitor stall arm + drainFaultsForBuildRun
  - 30-min auto-drain wall-clock budget with AbortSignal kill path
  - build_stall_threshold_ms gstack-config knob

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(build/orchestrator): address 4 P1 ship-review findings

Pre-landing adversarial review (Claude subagent + Codex structured
review) caught three wiring bugs (C1, C2, C3) that left the
stall-detection feature broken in production despite passing the unit
tests, plus one design question (A3) that was re-examined and kept
with a clarifying comment. All four were addressed in place before
merge.

1. C1 — heartbeat dies BEFORE auto-drain runs (codex finding)
   cli.ts's `finally` block stopped the heartbeat and removed the
   sidecar BEFORE the post-finally `await runAutoDrainIfEnabled(...)`.
   Result: during auto-drain — the exact scenario this PR is fixing —
   no heartbeat ticks, no sidecar updates, monitor blind. Moved
   heartbeat.stop() + removeHeartbeatSidecar() to AFTER the await.
   Other finally work (lock release, registry update, logActivity)
   stays in finally so it runs on exceptions too.

2. C2 — build_stall_threshold_ms knob was never read (codex finding)
   runMonitorMode called evaluateMonitorOnce without gstackConfigBin,
   so readBuildStallThresholdMs short-circuited to the 15-min default
   and `gstack-config set build_stall_threshold_ms` never took effect
   despite being advertised in CHANGELOG. Added resolveGstackConfigBin
   helper that defaults to ~/.claude/skills/gstack/bin/gstack-config
   (same canonical path as resolveInvestigatorRole). Tests can still
   pass "" to force the no-binary path.

3. A3 — clarified shortCircuited-bumps-progress decision
   Claude adversarial review flagged a dedup-flood concern. Re-examined:
   shortCircuited entries DO call markInvestigated which moves the
   queue file to processed/, so they represent real forward progress.
   Skipped entries (severity/runId filter) do NOT move files and
   correctly stay out of the progress signal. Added a code comment
   pinning the rationale so future refactors don't conflate the two.

4. C3 — abort during in-flight investigator misreported (codex finding)
   When AbortSignal.timeout fired mid-investigator, spawnInvestigatorCapture
   returned null, which was unconditionally counted as result.failed.
   If the kill happened on the last entry, `aborted` stayed false and
   `deferred` stayed 0 — the "budget exceeded after Xm; N entries left
   for next run" warning was silently lost. Added an
   `if (opts.signal?.aborted) { abortedDuringLoop = true; break; }`
   check immediately after the spawn await.
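The C3 fix amounts to re-checking the signal after the awaited call, before its result is interpreted. Below is a synchronous TypeScript model of that loop. It assumes `investigate` returns null when killed, and that the in-flight entry counts as deferred because its queue file was never moved. `drainEntries` and its result shape are hypothetical. An `AbortController` stands in for `AbortSignal.timeout(30 * 60 * 1000)` so the abort can be triggered deterministically:

```typescript
type Outcome = "ok" | "failed";

interface DrainResult {
  processed: number;
  failed: number;
  deferred: number; // entries left in the queue for the next run
  aborted: boolean;
}

// `investigate` stands in for the awaited investigator subprocess;
// it returns null when the subprocess was killed.
function drainEntries(
  entries: readonly string[],
  investigate: (entry: string) => Outcome | null,
  signal: AbortSignal,
): DrainResult {
  const result: DrainResult = { processed: 0, failed: 0, deferred: 0, aborted: false };
  for (let i = 0; i < entries.length; i++) {
    if (signal.aborted) {
      result.aborted = true;
      result.deferred = entries.length - i;
      break;
    }
    const outcome = investigate(entries[i]);
    // The C3 check: an abort that landed during the call means the result
    // (often null) reflects the kill, so defer instead of counting a failure.
    if (signal.aborted) {
      result.aborted = true;
      result.deferred = entries.length - i;
      break;
    }
    if (outcome === "ok") result.processed++;
    else result.failed++;
  }
  return result;
}

// Usage: the budget expires while the last investigator is running.
const budget = new AbortController();
const summary = drainEntries(
  ["a", "b", "c"],
  (entry) => {
    if (entry === "c") {
      budget.abort();
      return null;
    }
    return "ok";
  },
  budget.signal,
);
console.log(summary);
```

Without the second `signal.aborted` check, the same run would report one failure, `aborted: false`, and nothing deferred. That is the lost-warning shape that C3 describes.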

All 57 affected-module tests still pass (heartbeat + monitor +
auto-drain + drain-halt-events-resolved-pairing + monitor-heartbeat-integration).

The C3 fix is documented in a code comment in auto-drain.test.ts
because the mockInvestigator path skips spawnInvestigatorCapture and
can't trigger the bug shape; a real-process test would need a binary
that hangs long enough for the abort to land mid-call.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
anbangr pushed a commit that referenced this pull request May 22, 2026
…s (PR garrytan#1169 follow-up) (garrytan#1592)

* fix(build-app): escape sed replacement metachars in Chromium rebrand

build-app.sh injects $APP_NAME directly into the replacement half of
sed's s/// when patching Chromium's localized InfoPlist.strings. If
$APP_NAME ever contains '/', '&', or '\', the command either breaks
or starts interpreting input as sed syntax. The trailing '|| true'
would then silently hide the failure and ship a DMG that still says
'Google Chrome for Testing' in the menu bar.

Escape replacement metachars before substitution. No change for the
default name 'GStack Browser'.
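Here is a TypeScript illustration of the escaping rule the shell fix applies before calling sed. The actual build-app.sh line is not shown here, so `escapeSedReplacement` is a hypothetical equivalent. It assumes '/' is the s/// delimiter:

```typescript
// Characters that are special in the replacement half of s/pattern/repl/
// with '/' as the delimiter: '\' (escape), '&' (whole match), '/' (delimiter).
// A single regex pass prefixes each one with a backslash, so the order in
// which they are escaped cannot matter. A literal newline would need
// separate handling and is not covered here.
function escapeSedReplacement(s: string): string {
  return s.replace(/[\\&\/]/g, "\\$&");
}

console.log(escapeSedReplacement("Foo/Bar&Baz")); // Foo\/Bar\&Baz
console.log(escapeSedReplacement("GStack Browser")); // GStack Browser
```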

* fix(build-app): bail out if 'mktemp -d' fails instead of cp-ing into '/'

The DMG creation step sets DMG_TMP from 'mktemp -d' with no error check.
If mktemp fails (tmpfs full, permissions, TMPDIR misconfigured), DMG_TMP
is empty and the very next line, 'cp -a "$APP_DIR" "$DMG_TMP/"',
expands to 'cp -a "<app>" "/"', which copies the bundle into the root of
the filesystem.

Refuse to continue unless mktemp produced a real directory. Defensive
second check catches the (rare) case where mktemp succeeds but returns
something that isn't a directory we can cp into.

* fix(telemetry-sync): drop predictable $$ tmp-file fallback

gstack-telemetry-sync tried 'mktemp /tmp/gstack-sync-XXXXXX' and on
failure fell back to '/tmp/gstack-sync-$$'. $$ is the PID — predictable
and reusable, so on shared hosts another user can pre-create or symlink
the path and either steal the response body or clobber an unrelated
file when curl writes through it.

Drop the fallback. If mktemp cannot produce a unique file we just skip
this sync cycle — the events stay on disk and the next run picks them
up. Also install an EXIT trap so the response file is cleaned up on
unexpected exit, not just on the happy path.

* fix(verify-rls): drop predictable $$-based tmp file fallback

Same shape as gstack-telemetry-sync: on mktemp failure the script fell
back to '/tmp/verify-rls-$$-$TOTAL', which is fully predictable from the
PID and a per-check counter. On a shared box another user can pre-create
or symlink the path and either capture the HTTP response body (which may
leak what the RLS tests revealed) or corrupt an unrelated file that curl
writes through.

Make mktemp strict. On failure return from the check function; the caller
tallies a FAIL and the run moves on.

* fix(security-classifier): close writer + delete tmp on download error

downloadFile() opens an fs.WriteStream to '<dest>.tmp.<pid>' and drives
it from a fetch body reader, but if reader.read() or writer.write()
throws mid-download the writer is never closed. That leaks an FD per
failed attempt and leaves the half-written tmp on disk. A later retry
can land in renameSync(tmp, dest) with a truncated TestSavantAI /
DeBERTa ONNX file — which then loads but produces garbage classifier
verdicts until the user manually nukes the models cache.

Wrap the download loop in try/catch. On failure, destroy() the writer
and unlink the tmp before rethrowing, so the next attempt starts from a
clean slate.

* fix(meta-commands): guard JSON.parse in pdf --from-file parser

parsePdfFromFile() runs JSON.parse on user-supplied file contents with
no try/catch. A malformed payload surfaces as an uncaught SyntaxError
from the 'pdf' command handler and the user sees an opaque stack trace
instead of "this file isn't valid JSON". Worse, the same call path is
used by make-pdf when header/footer HTML would overflow Windows'
CreateProcess argv cap, so a corrupt payload file there can take down
the make-pdf run.

Wrap JSON.parse. Re-throw with a message that names the offending file
and echoes the parser's own explanation. Also reject top-level non-
objects (null, array, primitive) since the rest of the function treats
json as an object — catching that here produces a clear error instead
of a TypeError further down.
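Here is a sketch of the guarded parse. It assumes the payload is read from a path and must be a plain JSON object. `readJsonObjectFile` and its error wording are hypothetical and are not the body of parsePdfFromFile:

```typescript
import { mkdtempSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Read a payload file that must contain a top-level JSON object.
function readJsonObjectFile(path: string): Record<string, unknown> {
  const text = readFileSync(path, "utf8");
  let parsed: unknown;
  try {
    parsed = JSON.parse(text);
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    throw new Error(`${path} is not valid JSON: ${reason}`);
  }
  // JSON.parse also accepts null, arrays and primitives; reject them here so
  // callers that treat the result as an object fail with a clear message.
  if (parsed === null || typeof parsed !== "object" || Array.isArray(parsed)) {
    const kind = parsed === null ? "null" : Array.isArray(parsed) ? "array" : typeof parsed;
    throw new Error(`${path} must contain a JSON object, got ${kind}`);
  }
  return parsed as Record<string, unknown>;
}

// Usage
const payloadDir = mkdtempSync(join(tmpdir(), "pdf-payload-"));
const payloadPath = join(payloadDir, "payload.json");
writeFileSync(payloadPath, '{"header":"<b>hi</b>"}');
console.log(readJsonObjectFile(payloadPath).header); // <b>hi</b>
```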

* fix(global-discover): stop dropping sessions when header >8KB

extractCwdFromJsonl() reads the first 8KB of each JSONL session file and
runs JSON.parse on every newline-split line. When a session record
happens to straddle the 8KB cap, the last line ends in a truncated JSON
fragment, JSON.parse throws, the catch block 'continue's silently, and
if that was the only line carrying 'cwd' the whole project gets dropped
from the discovery output without a warning.

Two independent hardening steps:
  1. Raise the read cap to 64KB. Session headers observed in Claude
     Code / Codex / Gemini transcripts fit comfortably; this just moves
     the cliff out of the normal range.
  2. Drop the final segment after splitting on '\n'. If the read hit
     the cap mid-line, that segment is guaranteed incomplete; if the
     file ended inside the buffer with a trailing newline, the split
     produces an empty final segment and dropping it is a no-op.

Together these make the parser robust regardless of how verbose the
leading records are.
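Here is a sketch of a capped JSONL head scan. It follows the same two ideas: a 64KB cap, and ignoring a possibly cut-off tail. As a deliberate variation, it drops the last segment only when the read actually filled the cap, so a small file without a trailing newline keeps its last line. `cwdFromJsonlHead` is a hypothetical stand-in for extractCwdFromJsonl:

```typescript
import { closeSync, mkdtempSync, openSync, readSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

const READ_CAP = 64 * 1024;

// Return the first string `cwd` found in the head of a JSONL file, or null.
function cwdFromJsonlHead(path: string, cap: number = READ_CAP): string | null {
  let fd: number;
  try {
    fd = openSync(path, "r");
  } catch {
    return null; // missing or unreadable file
  }
  try {
    const buf = Buffer.alloc(cap);
    const bytesRead = readSync(fd, buf, 0, cap, 0);
    const lines = buf.subarray(0, bytesRead).toString("utf8").split("\n");
    // Only a read that filled the cap can end inside a record; drop that tail.
    if (bytesRead === cap) lines.pop();
    for (const line of lines) {
      if (line.trim() === "") continue;
      try {
        const obj = JSON.parse(line);
        if (obj !== null && typeof obj.cwd === "string") return obj.cwd;
      } catch {
        // Malformed line: skip it and keep scanning.
      }
    }
    return null;
  } finally {
    closeSync(fd);
  }
}

// Usage: a ~12KB first record, past the old 8KB cap but well inside 64KB.
const scratch = mkdtempSync(join(tmpdir(), "jsonl-"));
const session = join(scratch, "session.jsonl");
writeFileSync(session, JSON.stringify({ cwd: "/work/app", pad: "x".repeat(12_000) }) + "\n");
console.log(cwdFromJsonlHead(session)); // /work/app
console.log(cwdFromJsonlHead(session, 8192)); // null
```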

* test: export downloadFile, parsePdfFromFile, extractCwdFromJsonl

These three internal helpers are now imported by regression tests
landing in the next commits (PR garrytan#1169 follow-up). Pattern matches the
existing normalizeRemoteUrl export in gstack-global-discover.ts which
test/global-discover.test.ts already imports side-effect-free.

No change to runtime behavior; gstack has no public package entrypoint
that would re-export these, so the in-repo surface is unchanged for
callers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(security-classifier): await writer close before unlinking tmp on error

The earlier downloadFile() error-path cleanup hit a race: Node's
createWriteStream lazily opens the FD and flushes buffered writes during
destroy(), so a naive `fs.unlinkSync(tmp)` immediately after `writer.destroy()`
hits ENOENT (file not yet on disk), then the writer's destroy finishes on the
next tick and creates the file fresh — leaving the half-written tmp behind
exactly as the original fix tried to prevent.

The new sequence awaits the writer's 'close' event before unlinking, so the FD
is fully torn down and no subsequent flush can re-create the path.
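The corrected ordering can be sketched as follows. The sketch assumes the body arrives as an async iterable of byte chunks (for example a fetch body) and that the tmp name follows the `<dest>.tmp.<pid>` pattern used earlier. `downloadToFile` is hypothetical and is not the classifier's actual downloadFile:

```typescript
import { createWriteStream, mkdtempSync, readdirSync, renameSync, rmSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Stream chunks into `<dest>.tmp.<pid>` and rename into place on success.
// On failure: destroy the writer, wait for its 'close' event, and only then
// remove the tmp file, so a late open/close cannot leave it behind.
async function downloadToFile(dest: string, chunks: AsyncIterable<Uint8Array>): Promise<void> {
  const tmp = `${dest}.tmp.${process.pid}`;
  const writer = createWriteStream(tmp);
  const failure: { error: Error | null } = { error: null };
  writer.on("error", (err) => {
    failure.error = err; // recorded instead of crashing the process
  });
  const closed = new Promise<void>((resolve) => writer.once("close", () => resolve()));
  try {
    for await (const chunk of chunks) {
      if (failure.error) throw failure.error;
      if (!writer.write(chunk)) {
        // Back-pressure: resume on 'drain', or on 'close' if the stream died.
        const drained = new Promise<void>((resolve) => writer.once("drain", () => resolve()));
        await Promise.race([drained, closed]);
      }
    }
    writer.end();
    await closed;
    if (failure.error) throw failure.error;
    renameSync(tmp, dest);
  } catch (err) {
    writer.destroy();
    await closed; // the FD is fully released before the unlink below
    rmSync(tmp, { force: true });
    throw err;
  }
}

// Usage: a body that fails mid-stream should leave the directory empty.
const downloads = mkdtempSync(join(tmpdir(), "models-"));
async function* brokenBody(): AsyncGenerator<Uint8Array> {
  yield new Uint8Array([1, 2, 3]);
  throw new Error("reader rejected");
}
downloadToFile(join(downloads, "model.onnx"), brokenBody()).catch((err: Error) => {
  console.log(err.message, readdirSync(downloads));
});
```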

Caught by browse/test/security-classifier-download-cleanup.test.ts in the
next commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(browse): regression tests for downloadFile cleanup + parsePdfFromFile guard

Covers PR garrytan#1169 bugs #6 and #7:

- security-classifier-download-cleanup.test.ts pins downloadFile error-path
  cleanup against three failure shapes: reader rejects mid-stream, non-2xx
  response, missing body. Asserts the dest file is not created and no
  <dest>.tmp.* siblings remain (glob-matched, not exact path — codex push:
  if the fix later switches to mkdtempSync, the assertion still holds).
  Includes a happy-path case so the cleanup isn't fighting a correct download.

- regression-pr1169-pdf-from-file-invalid-json.test.ts pins parsePdfFromFile
  to throw a helpful error for: invalid JSON, empty file, top-level array,
  top-level number, top-level string, top-level null, top-level boolean.
  Codex push: JSON.parse accepts primitives too, so Array.isArray + typeof
  guard must be tested separately from the JSON.parse try/catch.

Both files use mkdtempSync(process.cwd()/...) for fixture isolation since
SAFE_DIRECTORIES allows TEMP_DIR or cwd; cwd is universal across CI hosts.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(global-discover): regression for extractCwdFromJsonl 64KB cap

PR garrytan#1169 bug #8: the 8KB read cap landed mid-line on Claude Code session
headers, JSON.parse threw on the truncated tail, the catch silently
continued, and the project disappeared from /gstack discovery output.

Six new cases under describe("extractCwdFromJsonl 64KB cap"):

- happy path: small JSONL with obj.cwd returns it
- 12KB first line with obj.cwd: returns cwd (the bug case)
- 80KB single line overflowing 64KB: returns null without crashing
- complete line followed by partial second line: trailing-partial-drop
  must not poison the result; returns first line's cwd
- missing file: returns null (file read error swallowed)
- malformed first line + valid second line within cap: skips bad,
  returns second's cwd

Tests use the exported extractCwdFromJsonl (added in earlier export
commit) and live in a separate describe block from the existing
"4KB / 128KB buffer" tests, which exercise the unrelated scanCodex
meta.payload.cwd path at L338 — different function, different bug.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test: regression tests for shell-script bugs in PR garrytan#1169 (#2-#5)

Two new test files pinning the four shell-script invariants from the
external audit:

regression-pr1169-build-app-sed.test.ts — bugs #2 + #3
- Runtime isolation: extracts the sed-escape sequence from build-app.sh
  and runs it against hostile $APP_NAME values ("Foo/Bar&Baz", "Cool\App",
  "A/B\C&D"). Asserts the literal hostile name round-trips through a real
  `sed s///` invocation, locking the metachar safety end-to-end.
- Static check: the rebrand block must contain both the escape line AND
  the sed line referencing $APP_NAME_SED_ESCAPED; bare $APP_NAME
  interpolation directly into the s/// replacement is rejected.
- Static check: DMG_TMP=$(mktemp -d) is followed by an explicit `|| { ... exit }`
  failure handler AND a `[ -z "$DMG_TMP" ] || [ ! -d "$DMG_TMP" ]` validation
  AND the cp -a appears AFTER both guards.
- Runtime fake-bin: extracts the guard shape, runs with a fake mktemp that
  exits 1, asserts the script exits non-zero before the cp step is reached.

regression-pr1169-mktemp-fallbacks.test.ts — bugs #4 + #5
- Per codex pushback, the invariant is "no `mktemp ... || echo <path>`
  fallback shape" — not just "no $$ token." That's a stronger invariant
  that catches future swaps to $RANDOM or hardcoded paths.
- For each of bin/gstack-telemetry-sync and supabase/verify-rls.sh:
  - no echo-based fallback after mktemp
  - no $$ inside any /tmp path literal
  - mktemp failure path explicitly exits / returns non-zero
  - telemetry-sync also pins the `trap rm -f $RESP_FILE EXIT` cleanup
    so success paths don't leak the tmp on normal exit.
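Here is a sketch of how the "no `mktemp ... || echo <path>` fallback" invariant could be written as a line-based static check. The regexes and `findUnsafeTmpPatterns` are assumptions. The sample lines are illustrative rather than copied from the real scripts. A check this simple would also miss a fallback split across a line continuation:

```typescript
// Any `mktemp ... || echo ...` on one line, whatever the echoed path is
// ($$, $RANDOM, or a hardcoded name).
const MKTEMP_ECHO_FALLBACK = /mktemp\b[^\n]*\|\|\s*echo\b/;
// A /tmp path that embeds the PID ($$), which is predictable.
const PID_IN_TMP_PATH = /\/tmp\/[^\s"'`]*\$\$/;

function findUnsafeTmpPatterns(script: string): string[] {
  const problems: string[] = [];
  script.split("\n").forEach((line, i) => {
    if (MKTEMP_ECHO_FALLBACK.test(line)) problems.push(`line ${i + 1}: mktemp || echo fallback`);
    if (PID_IN_TMP_PATH.test(line)) problems.push(`line ${i + 1}: $$ in /tmp path`);
  });
  return problems;
}

const beforeFix = 'RESP_FILE=$(mktemp /tmp/gstack-sync-XXXXXX 2>/dev/null || echo "/tmp/gstack-sync-$$")';
const afterFix = "RESP_FILE=$(mktemp /tmp/gstack-sync-XXXXXX) || exit 0";
console.log(findUnsafeTmpPatterns(beforeFix).length); // 2
console.log(findUnsafeTmpPatterns(afterFix).length); // 0
```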

All seven new test files are gate-tier (deterministic, sub-second, no LLM,
no network). Runtime shell tests use fake-bin PATH stubs in temp dirs;
no $HOME mutation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v1.41.1.0)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: RagavRida <ragavrida@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
anbangr pushed a commit that referenced this pull request Jun 1, 2026
…always-loaded) (garrytan#1806)

* feat(test): transcript-section-logger + ship-action fingerprint (T10)

Pure-analysis module over a SkillTestResult/NDJSON transcript:
- extractSectionReads(): which sections/*.md a run opened (post-carve check)
- extractShipActions(): observable action fingerprint (merge/test/bump/
  changelog/commit/push/pr) that works on the MONOLITH too, so a baseline
  captured before the carve can detect a sectioned-ship regression
- baseline read/write + compareShipActions() for baseline-first dogfooding (T10)

Baseline-first answers the Codex outside-voice critique that a logger in the
same PR as the carve is post-failure telemetry without a pre-carve reference.

11 unit tests, all green. Paid monolith baseline capture runs separately.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(pipeline): section discovery + generation machinery (T9)

- discover-skills.ts: discoverSectionTemplates() scans <skill>/sections/*.md.tmpl
- gen-skill-docs.ts: extract resolvePlaceholders + applyHostRewrites + buildContext
  as shared helpers (processTemplate and the new processSectionTemplate both call
  them, so a sanitization/rewrite fix can't miss sections) [C1]
- processSectionTemplate: body-fragment generation (no frontmatter/catalog/voice),
  parent-skill TemplateContext (skillName pinned to parent, not 'sections', so
  appliesTo gating + tier behave identically), per-host output routing
- --host all now fails the build on ANY host failure, not just claude, so a stale
  external-host output can't slip the freshness gate [Codex outside-voice #9]

Inert until a skill is carved (no sections/ dirs exist yet). Refactor is
output-neutral: gen:skill-docs --dry-run --host all reports 0 STALE.

5 discovery unit tests + 389 gen-skill-docs tests green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(setup): install sections/ for cherry-pick targets (claude + kiro) (T9)

Two install targets cherry-pick SKILL.md and would leave a carved skill's
sections/ behind, 404ing a runtime 'Read sections/<name>.md':
- link_claude_skill_dirs: link the sections/ subdir via _link_or_copy (windows
  gets a fresh copy on every ./setup)
- kiro per-skill loop: sed-rewrite + copy each sections/* so paths resolve under
  ~/.kiro, not ~/.codex/~/.claude

codex/factory/opencode link the whole generated dir, so sections ride free.
Addresses Codex outside-voice #4/#6 (runtime pathing landmine). Inert until a
skill is carved. Static-tripwire test + windows-fallback invariant green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(ship): gstack-version-bump CLI — tested idempotency classify + write (T9)

Hybrid CLI extraction (CM1): the deterministic core of ship Step 12 becomes a
tested CLI instead of bash prose the agent re-derives each run.
- classify: FRESH/ALREADY_BUMPED/DRIFT_STALE_PKG/DRIFT_UNEXPECTED from VERSION
  vs origin/<base>:VERSION vs package.json.version (pure reader)
- write: validated dual-write to VERSION + package.json (FRESH bump)
- repair: DRIFT_STALE_PKG sync, no re-bump
Bump-LEVEL choice + queue collision stay agent judgment; slot pick stays
bin/gstack-next-version. This removes the re-bump-a-shipped-branch footgun from
skippable prose into code that can't be skipped or misread.
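The classify step is a pure function of three version strings. The sketch encodes one plausible reading of the four states. A VERSION equal to the base means the branch is not yet bumped. A bumped VERSION whose package.json still shows the base version is stale-pkg drift. These rules and `classifyBump` are assumptions, not the CLI's specification:

```typescript
type BumpState = "FRESH" | "ALREADY_BUMPED" | "DRIFT_STALE_PKG" | "DRIFT_UNEXPECTED";

interface VersionInputs {
  branchVersion: string; // VERSION on the branch
  baseVersion: string; // origin/<base>:VERSION
  pkgVersion: string; // package.json "version"
}

// Assumed rules (hypothetical, for illustration only).
function classifyBump({ branchVersion, baseVersion, pkgVersion }: VersionInputs): BumpState {
  const v = branchVersion.trim();
  const base = baseVersion.trim();
  const pkg = pkgVersion.trim();
  if (v === base) return pkg === v ? "FRESH" : "DRIFT_UNEXPECTED";
  if (pkg === v) return "ALREADY_BUMPED"; // bumped and both files agree
  if (pkg === base) return "DRIFT_STALE_PKG"; // VERSION bumped, package.json not synced
  return "DRIFT_UNEXPECTED";
}

console.log(classifyBump({ branchVersion: "1.54.0.0", baseVersion: "1.53.0.0", pkgVersion: "1.53.0.0" }));
// DRIFT_STALE_PKG
```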

15 tests (exhaustive state matrix + write/repair fs + real-git classify).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test(parity): sectioned-skill parity capability — guards the carve (T9)

Carved skills (skeleton + sections/*.md) need parity checks that see relocated
content, or moving a phrase into a section reads as 'lost':
- readSkillForParity(): union skeleton + all sections/*.md
- checkSkillParity sectioned mode: content checks against the union; minBytes/
  maxSizeRatio against union bytes (total behavior preserved); maxSkeletonBytes
  asserts the always-loaded skeleton actually shrank. Lowering minBytes to fit a
  small skeleton would otherwise make the size floor toothless [Codex #12].
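The union read can be sketched in a few lines. The sketch assumes a skill directory holding `SKILL.md` plus an optional `sections/` folder of generated `.md` files. `readSkillUnion` and its return shape are hypothetical stand-ins for readSkillForParity:

```typescript
import { existsSync, mkdirSync, mkdtempSync, readdirSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

interface SkillForParity {
  text: string; // skeleton + sections, for content checks
  skeletonBytes: number; // always-loaded size, for a maxSkeletonBytes check
  unionBytes: number; // total behavior, for minBytes / size-ratio checks
}

function readSkillUnion(skillDir: string): SkillForParity {
  const skeleton = readFileSync(join(skillDir, "SKILL.md"), "utf8");
  const sectionsDir = join(skillDir, "sections");
  const sections = existsSync(sectionsDir)
    ? readdirSync(sectionsDir)
        .filter((name) => name.endsWith(".md")) // skips *.md.tmpl sources
        .sort()
        .map((name) => readFileSync(join(sectionsDir, name), "utf8"))
    : [];
  const parts = [skeleton, ...sections];
  return {
    text: parts.join("\n"),
    skeletonBytes: Buffer.byteLength(skeleton),
    unionBytes: parts.reduce((sum, p) => sum + Buffer.byteLength(p), 0),
  };
}

// Usage: a phrase moved into a section still counts as present in the union.
const skillDir = mkdtempSync(join(tmpdir(), "skill-"));
mkdirSync(join(skillDir, "sections"));
writeFileSync(join(skillDir, "SKILL.md"), "# ship\nSee sections/changelog.md\n");
writeFileSync(join(skillDir, "sections", "changelog.md"), "Write the CHANGELOG entry.\n");
const parity = readSkillUnion(skillDir);
console.log(parity.text.includes("CHANGELOG entry"), parity.skeletonBytes < parity.unionBytes); // true true
```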

Built + tested BEFORE the carve so ship's invariant can flip to sectioned in the
same commit it lands. Monolith path byte-identical (verified: pre-existing
investigate 1.053 ratio drift fails the same with this change stashed).

7 sectioned-parity tests + existing parity tests green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* refactor(ship): carve into skeleton + on-demand sections (Claude) (T9)

ship/SKILL.md drops 167KB → 68.7KB (~59% less always-loaded content) by moving
8 prose-heavy steps into ship/sections/*.md, read on demand:
tests, test-coverage, plan-completion, review-army, greptile, adversarial,
changelog, pr-body. Step 12's version logic now calls the tested
gstack-version-bump CLI instead of inline bash.

Claude-first (S2): {{SECTION:id}} emits a STOP-Read pointer on Claude (skeleton +
generated section files) and INLINES the content on every other host, so external
hosts keep the full monolith — verified factory at 162KB with no sections dir.
{{SECTION_INDEX:ship}} renders the situation→section table from the PASSIVE
manifest (CM2 / v2_PLAN.md:663); required-reads live only in test fixtures.
Multi-pass resolve expands inlined sections' own resolvers.

Parity: ship invariant flipped to sectioned (union content checks + maxSkeletonBytes
asserts the shrink). Carve-fallout fixed across gen-skill-docs/skill-validation/
golden/plan-completion/garrytan#1539/size-budget tests via skeleton+sections union reads.
Free suite green except the pre-existing investigate parity drift.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test(ship): manifest-consistency + context-parity + requiredReads helper (T9)

Free deterministic guards for the carve:
- required-reads.ts + unit test: assertRequiredReads(run, requiredFiles) — the
  mechanical layer-5 check that the agent Read the sections its situation needs
  (required set comes from the fixture, not the passive manifest)
- section-manifest-consistency: 3-tier orphan classification (generated orphan +
  hand-edited generated file → FAIL; manifest orphan → WARN per v2_PLAN.md) and
  pins the PASSIVE-manifest contract (no applies_when/required_for)
- template-context-parity: generated sections have zero unresolved placeholders
  and gated resolvers (ADVERSARIAL_STEP/CONFIDENCE_CALIBRATION/CHANGELOG_WORKFLOW)
  rendered — proving sections resolve with the parent skillName, not 'sections'
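The required-reads check is a set difference between what the fixture requires and what the transcript shows was read. Below is a sketch. It assumes the run has already been reduced to a list of read file paths with POSIX separators. `checkRequiredReads` is hypothetical, and its signature differs from assertRequiredReads:

```typescript
// `readPaths`: files the agent opened with its Read tool (assumed input shape).
// `requiredFiles`: relative paths the fixture says this situation needs.
function checkRequiredReads(readPaths: readonly string[], requiredFiles: readonly string[]): void {
  const missing = requiredFiles.filter(
    (required) => !readPaths.some((p) => p === required || p.endsWith(`/${required}`)),
  );
  if (missing.length > 0) {
    throw new Error(`agent never read required section(s): ${missing.join(", ")}`);
  }
}

const reads = [
  "/home/dev/.claude/skills/gstack/ship/SKILL.md",
  "/home/dev/.claude/skills/gstack/ship/sections/review-army.md",
];
try {
  checkRequiredReads(reads, ["sections/review-army.md", "sections/changelog.md"]);
} catch (err) {
  console.log((err as Error).message); // agent never read required section(s): sections/changelog.md
}
```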

16 tests, all green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test(ship): section-loading E2E + idempotency CLI detection (T9)

- skill-e2e-ship-section-loading.test.ts (new, periodic): runs real /ship in plan
  mode against a fresh version-changing fixture and asserts the agent Read the
  required sections (review-army + changelog). Runs against the INSTALLED skill
  (~/.claude/skills/gstack/ship), not repo paths, so install-layout 404s surface
  [Codex outside-voice #5]. Layer-5 mechanical guard against silent section-skip.
- skill-e2e-ship-idempotency.test.ts: detection updated for the carve — Step 12
  now runs gstack-version-bump classify (JSON "state":"ALREADY_BUMPED") instead
  of the inline bash echo (STATE: ALREADY_BUMPED). Accept both; add a
  gstack-version-bump-write re-bump regression signal.
- touchfiles: register ship-section-loading (periodic) + extend idempotency deps
  with bin/gstack-version-bump + scripts/resolvers/sections.ts.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test(ship): union-read redaction wiring test for the carve (T9)

main's PR-body redaction-at-sink lives in sections/pr-body.md.tmpl after the
carve, not the skeleton template. Read skeleton + section templates union so the
redaction-wiring assertions follow the relocated content. 9/9 green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* v1.54.0.0 feat: carve /ship into skeleton + on-demand sections (-59% always-loaded)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
anbangr pushed a commit that referenced this pull request Jun 13, 2026
…n-consultation (garrytan#1907)

* test: canonical CARVE_GUARDS registry; derive parity + size-budget from it

Single source of truth for the carved-skill set + per-skill invariants
(EQ1). parity-harness.ts sectioned entries and skill-size-budget.ts
SECTIONS_EXTRACTED now derive from it instead of hand-maintained lists.
Closes a pre-existing drift: plan-devex-review was in SECTIONS_EXTRACTED
but had no sectioned parity invariant; now generated. carve-guards.ts is
a pure leaf data module (import type only) to avoid an import cycle.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test: shared carve-guard check fns with injectable root

discoverCarvedSkills/checkOrdering/checkCompleteness take a root param so
the negative tests can point the real guards at a fixture dir.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test: E2 data-driven carve static ordering guard (gate)

Per-PR backstop for every carved skill, one test() per skill, driven by
CARVE_GUARDS staticInvariants. Generalizes + retires the ceo-specific
ordering test.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test: E1 carve-guard completeness meta-guard (gate)

Asserts filesystem carved set == CARVE_GUARDS set both directions, so a
future carve without a registry entry fails CI.
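The completeness meta-guard is a two-way set comparison. The sketch uses hypothetical names and assumes both the carved set on disk and the registry keys are available as lists of skill names:

```typescript
// Both directions: carved-but-unregistered and registered-but-not-carved.
function diffCarvedSets(onDisk: Iterable<string>, registry: Iterable<string>): string[] {
  const disk = new Set(onDisk);
  const registered = new Set(registry);
  const errors: string[] = [];
  for (const skill of disk) {
    if (!registered.has(skill)) errors.push(`${skill}: carved on disk but missing from CARVE_GUARDS`);
  }
  for (const skill of registered) {
    if (!disk.has(skill)) errors.push(`${skill}: in CARVE_GUARDS but not carved on disk`);
  }
  return errors.sort();
}

console.log(diffCarvedSets(["ship", "cso"], ["ship", "cso"])); // []
console.log(diffCarvedSets(["ship", "cso", "new-skill"], ["ship", "cso"]));
```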

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test: ET1 guard-of-guards negative tests (gate)

Temp fixture broken 3 ways proves E1/E2 actually throw, via the injectable
root. Kills the silent-pass-guard failure class.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test: T2 data-driven behavioral section-loading guard (periodic)

One file iterating CARVE_GUARDS, one test() per skill with GSTACK_CARVE_SKILL
cost-scoping (D-CODEX A). External carves (ship, plan-ceo) keep bespoke
tests; testNames aligned to their touchfile keys. Registered in touchfiles.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs: defer E3 real-session carve canary to TODOS

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat: carve document-release into skeleton + on-demand section

Steps 2-9 (per-file audit, auto-updates, risky-change asks, CHANGELOG
voice polish, cross-doc consistency, TODOS cleanup, VERSION bump, commit +
PR body) move to sections/release-body.md, read on demand after the Step
1.5 coverage map. Skeleton 59,256 -> 45,797 B (-23%); union preserved.
Adds the CARVE_GUARDS entry (auto-extends parity + size-budget via EQ1).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat: carve design-consultation into skeleton + on-demand section

Phases 3-6 (complete proposal, drill-downs, design preview, writing
DESIGN.md) move to sections/proposal-and-preview.md, read on demand after
product context + research. Skeleton 80,719 -> 59,229 B (-27%); union
preserved. Adds the CARVE_GUARDS entry.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat: carve cso into skeleton + on-demand section (security-safe)

Scope-dependent audit Phases 2-11 move to sections/audit-phases.md. Mode
dispatch (## Arguments, ## Mode Resolution), always-run Phases 0/1, and the
Phase 12 false-positive-filtering exceptions stay ALWAYS-LOADED in the
skeleton. Skeleton 79,383 -> 65,117 B (-18%); union preserved.

Adds a cso CARVE_GUARDS entry with an earliest-use invariant (mustPrecedeStop):
mode dispatch must appear before any STOP-Read, so a directive that decides
which sections to read can't be stranded behind the STOP that reads them
(codex outside-voice #6). carve-guard-checks gains the mustPrecedeStop check.
parity moves cso monolith -> generated carved entry. cso-preserved.test.ts
strengthened: phrases checked against the union, plus an always-loaded
contract on the skeleton (dispatch + FP-filtering, codex #5).
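The earliest-use invariant comes down to comparing string offsets. The sketch treats a STOP-Read as any line starting with `STOP`. This is a hypothetical convention, and the real skeleton marker may differ. The sketch also assumes a missing directive should fail the guard:

```typescript
// Offsets of lines that start with "STOP" (hypothetical STOP-Read marker).
function stopOffsets(skeleton: string): number[] {
  return [...skeleton.matchAll(/^STOP\b/gm)].map((m) => m.index ?? 0);
}

// `directive` must appear before the FIRST STOP, so the text that decides
// which sections to read is loaded before any read happens.
function precedesFirstStop(skeleton: string, directive: string): boolean {
  const at = skeleton.indexOf(directive);
  if (at === -1) return false; // a missing directive fails the guard
  const stops = stopOffsets(skeleton);
  return stops.length === 0 || at < stops[0];
}

const dispatchFirst = "## Mode Resolution\n...\nSTOP. Read sections/audit-phases.md\n";
const dispatchStranded = "STOP. Read sections/audit-phases.md\n## Mode Resolution\n";
console.log(precedesFirstStop(dispatchFirst, "## Mode Resolution")); // true
console.log(precedesFirstStop(dispatchStranded, "## Mode Resolution")); // false
```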

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test: make redaction/taxonomy tests union-aware for cso + document-release carves

The cso carve moved Secrets Archaeology (prefixes, lib/redact-patterns.ts
pointer, git-history scan) into sections/audit-phases.md, and the
document-release carve moved the Step 9 PR-body redaction scan into
sections/release-body.md. Three content-presence tests asserted that content
in the skeleton SKILL.md/.md.tmpl; they now read the skeleton+sections union
(same fix as cso-preserved + parity).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v1.57.0.0)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix: address pre-landing review (codex) on the carve

- cso section: add a scope-gate header so '--owasp' (and other scoped modes)
  run only their selected phases, not every phase bundled in the section
  ('execute in full' no longer overrides Mode Resolution).
- carve-guard-checks: gateAfterStop now compares against the LAST STOP, not the
  first, so a gate stranded between two STOPs in a multi-STOP skeleton fails.
- TODOS: behavioral section-loading hermeticity (verifier matches global-install
  path, not the fixture) — pre-existing in auq-sdk-capture.ts, deferred.
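The gateAfterStop change can be illustrated by contrasting the two comparisons on a skeleton with two STOPs. The sketch uses a hypothetical convention: a STOP-Read is a line starting with `STOP`. It also assumes the check requires at least one STOP:

```typescript
function stopOffsets(skeleton: string): number[] {
  return [...skeleton.matchAll(/^STOP\b/gm)].map((m) => m.index ?? 0);
}

// Old comparison: the gate only has to follow the FIRST STOP.
function afterFirstStop(skeleton: string, gate: string): boolean {
  const at = skeleton.indexOf(gate);
  const stops = stopOffsets(skeleton);
  return at !== -1 && stops.length > 0 && at > stops[0];
}

// Fixed comparison: the gate has to follow the LAST STOP.
function afterLastStop(skeleton: string, gate: string): boolean {
  const at = skeleton.indexOf(gate);
  const stops = stopOffsets(skeleton);
  return at !== -1 && stops.length > 0 && at > stops[stops.length - 1];
}

const multiStop = "STOP. Read sections/a.md\nGATE: scope check\nSTOP. Read sections/b.md\n";
console.log(afterFirstStop(multiStop, "GATE:")); // true (stranded gate slips through)
console.log(afterLastStop(multiStop, "GATE:")); // false (stranded gate is caught)
```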

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>