v1.26.7.0 test: add build skill TDD gate by anbangr · Pull Request #12 · anbangr/gstack

anbangr · 2026-05-06T03:06:18Z

Summary

Build Skill Gate

Added a dedicated GitHub Actions workflow for /build changes that regenerates all host skill docs, checks for stale generated output, and runs the build-skill gate.
Expanded test:build-skill to run the full build orchestrator suite plus generated skill-doc contract tests.
Added a deterministic coverage matrix guard mapping every build orchestrator module and build-critical behavior to explicit test owners.
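
A minimal sketch of how a matrix guard of this kind might work. The `MODULE_TEST_OWNERS` name and the two module names appear in later commit messages; the map contents, test file names, and the `findUnownedModules` helper are assumptions for illustration only, not the repository's actual code.

```typescript
// Hypothetical ownership matrix: module file -> test files that own it.
// Entries are illustrative; the real matrix lives in coverage-matrix.test.ts.
type OwnershipMatrix = Record<string, string[]>;

const MODULE_TEST_OWNERS: OwnershipMatrix = {
  "plan-reviewer.ts": ["plan-reviewer.test.ts"],
  "plan-review-loop.ts": ["plan-review-loop.test.ts"],
};

// Return every module that has no entry, or an empty owner list, in the matrix.
function findUnownedModules(modules: string[], owners: OwnershipMatrix): string[] {
  return modules.filter((m) => (owners[m] ?? []).length === 0).sort();
}

// A guard test would enumerate the orchestrator modules and fail on any gap.
const discovered = ["plan-reviewer.ts", "plan-review-loop.ts", "new-module.ts"];
const unowned = findUnownedModules(discovered, MODULE_TEST_OWNERS);
console.log(unowned); // ["new-module.ts"]
```

Because the check is a pure comparison between a module list and a static map, it is deterministic: adding a module without registering a test owner fails the guard regardless of runtime coverage numbers.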

Build Skill Docs

Updated /build templates, generated docs, and READMEs to state the default TDD lifecycle: Test Specification, Verify Red, Implementation, Green tests, and Review/QA.
Preserved legacy two-checkbox compatibility while making the TDD shape the default for newly synthesized living plans.

Release

Bumped VERSION and package.json to 1.26.7.0 and updated CHANGELOG.md.

Test Coverage

No new application runtime code paths were added. The build tooling path is covered by the new deterministic matrix guard in build/orchestrator/__tests__/coverage-matrix.test.ts.

Tests: +1 build orchestrator test file. Coverage gate: PASS (100% for changed build-skill gate behavior).

Pre-Landing Review

No unresolved issues found. One workflow-trigger gap was auto-fixed during /review before shipping.

Design Review

No frontend files changed, so design review was skipped.

Eval Results

No prompt-related files changed, so evals were skipped.

Scope Drift

Scope Check: CLEAN. The branch is limited to the requested build-skill TDD gate, CI workflow, coverage matrix guard, docs, and release metadata.

Plan Completion

test:build-skill runs the full orchestrator suite plus generated docs tests.
Dedicated GitHub workflow enforces generated-doc freshness and the build-skill gate.
Coverage matrix guard maps orchestrator modules and build-critical behaviors to deterministic tests.
/build docs and templates describe the full TDD lifecycle.
Legacy two-checkbox plan compatibility remains documented.

Verification Results

Browser verification skipped: no dev server, UI surface, or route changed.

TODOS

No completed TODO items detected in this PR.

Documentation

Updated build/README.md, build/orchestrator/README.md, build/SKILL.md.tmpl, generated build/SKILL.md, and CHANGELOG.md. bun run gen:skill-docs --host all && git diff --exit-code passed.

Test plan

bun run test:build-skill — 805 pass, 0 fail, 6894 expect calls across 21 files.
bun run gen:skill-docs --host all && git diff --exit-code
git diff --check origin/main..HEAD

Add a dedicated build-skill CI gate, expand the focused test command to the full orchestrator suite, and document the TDD lifecycle contract.

Record the build skill TDD gate release. Co-Authored-By: OpenAI Codex <noreply@openai.com>

… rename (garrytan#1351) * feat: gstack-gbrain-mcp-verify helper for remote MCP probe Probes a remote gbrain MCP endpoint with bearer auth. POSTs initialize, classifies failures into NETWORK / AUTH / MALFORMED with one-line remediation hints, and runs a tools/list capability probe to detect sources_add MCP support (forward-compat for when gbrain ships URL ingest). Token consumed from GBRAIN_MCP_TOKEN env, never argv. Required to set both 'application/json' AND 'text/event-stream' in Accept; that gotcha costs 10 minutes of debugging when missed (regression-tested). Live-verified against wintermute (gbrain v0.27.1). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat: gstack-artifacts-init + gstack-artifacts-url helpers artifacts-init replaces brain-init with provider choice (gh / glab / manual), per-user gstack-artifacts-$USER repo, HTTPS-canonical storage in ~/.gstack-artifacts-remote.txt, and a "send this to your brain admin" hookup printout. Always prints the command, never auto-executes — gbrain v0.26.x has no admin-scope MCP probe (codex Finding #3). artifacts-url centralizes HTTPS↔SSH/host/owner-repo conversion so callers don't each string-mangle (codex Finding #10). The remote-conflict check in artifacts-init compares at the canonical level so re-running with HTTPS input doesn't trip on a stored SSH URL for the same logical repo. The "URL form not supported" branch prints a two-line clone-then-path form for gbrain v0.26.x; the supported branch is a one-liner with --url ready for when gbrain ships URL ingest. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat: extend gstack-gbrain-detect with mcp_mode + artifacts_remote Adds two new fields to detect's JSON output: - gbrain_mcp_mode: local-stdio | remote-http | none Resolved via 3-tier fallback (codex Finding D3): claude mcp get --json → claude mcp list text-grep → ~/.claude.json jq read. If Anthropic moves the file format, the first two tiers absorb it. 
- gstack_artifacts_remote: HTTPS URL from ~/.gstack-artifacts-remote.txt Falls back to ~/.gstack-brain-remote.txt during the v1.27.0.0 migration window so detect doesn't return empty between upgrade and migration. Existing detect tests still pass (15/15). New 19 tests cover every fallback tier independently, plus a schema regression for /sync-gbrain compat. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat: setup-gbrain Path 4 (remote MCP) + artifacts rename Path 4 lets users paste an HTTPS MCP URL + bearer token and registers it as an HTTP-transport MCP without needing a local gbrain CLI install. The flow: - Step 2 gains a fourth option (Remote gbrain MCP) - Step 4 adds Path 4 sub-flow: collect URL, secret-read bearer, verify via gstack-gbrain-mcp-verify (NETWORK / AUTH / MALFORMED classifier) - Step 5 (local doctor), Step 7.5 (transcript ingest), Step 5a's stdio branch all skip on Path 4 - Step 5a adds an HTTP+bearer registration form: claude mcp add --transport http --header "Authorization: Bearer ..." - Step 7 renamed "session memory sync" → "artifacts sync" and now calls gstack-artifacts-init (which always prints the brain-admin hookup command — no auto-execute, codex Finding #3) - Step 8 CLAUDE.md block branches: remote-http includes URL + server version (never the token); local-stdio keeps engine + config-file - Step 9 smoke test on Path 4 prints the curl-equivalent for post-restart verification (MCP tools aren't visible mid-session) - Step 10 verdict block has separate templates per mode Idempotency: re-running with gbrain_mcp_mode=remote-http already in detect output skips Step 2 entirely and goes to verification. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor: rename gbrain_sync_mode → artifacts_sync_mode (v1.27.0.0 prep) Hard rename, no dual-read alias (codex Finding D4). 
The on-disk migration script (Phase C, separate commit) renames the config key in users' ~/.gstack/config.yaml and any CLAUDE.md blocks.
Touched call sites:
- bin/gstack-config defaults + validation + list/defaults output
- bin/gstack-gbrain-detect (gstack_brain_sync_mode field still emitted with the same name for downstream-tool compat; reads new key)
- bin/gstack-brain-sync, bin/gstack-brain-enqueue, bin/gstack-brain-uninstall
- bin/gstack-timeline-log (comment ref)
- scripts/resolvers/preamble/generate-brain-sync-block.ts: renames key, branches on gbrain_mcp_mode=remote-http to emit "ARTIFACTS_SYNC: remote-mode (managed by brain server <host>)" instead of the local mode/queue/last_push line (codex Finding #11)
- bin/gstack-brain-restore + bin/gstack-gbrain-source-wireup: read ~/.gstack-artifacts-remote.txt with ~/.gstack-brain-remote.txt fallback during the migration window
- bin/gstack-artifacts-init: tolerant of unrecognized URL forms (local paths, file://, self-hosted gitea) so test infrastructure and unusual remotes work without canonicalization
- test/brain-sync.test.ts: gstack-brain-init → gstack-artifacts-init
- test/skill-e2e-brain-privacy-gate.test.ts: artifacts_sync_mode keys
- test/gen-skill-docs.test.ts: budget 35K → 36.5K for the new MCP-mode probe in the preamble resolver
- health/SKILL.md.tmpl, sync-gbrain/SKILL.md.tmpl: comment + verdict line
Hard delete:
- bin/gstack-brain-init (replaced by bin/gstack-artifacts-init in v1.27.0.0)
- test/gstack-brain-init-gh-mock.test.ts (replaced by gstack-artifacts-init.test.ts)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore: regenerate SKILL.md files after artifacts-sync rename
Mechanical regen via `bun run gen:skill-docs --host all`.
All */SKILL.md files reflect the renamed config key (gbrain_sync_mode → artifacts_sync_mode), the renamed remote-helper file (~/.gstack-artifacts-remote.txt with brain fallback), the renamed init script (gstack-artifacts-init), and the new ARTIFACTS_SYNC: remote-mode status line that fires when a remote-http MCP is registered. Golden fixtures (test/fixtures/golden/*-ship-SKILL.md) refreshed to match the regenerated default-ship output. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat: v1.27.0.0 migration — gstack-brain → gstack-artifacts rename Journaled, interruption-safe migration. Six steps, each writes to ~/.gstack/.migrations/v1.27.0.0.journal on success; re-entry resumes from the next un-done step. On final success, journal is replaced by ~/.gstack/.migrations/v1.27.0.0.done. Steps: 1. gh_repo_renamed gh/glab repo rename gstack-brain-$USER → gstack-artifacts-$USER (idempotent: detects already-renamed and skips) 2. remote_txt_renamed mv ~/.gstack-brain-remote.txt → artifacts file, rewriting URL path to match the new repo name 3. config_key_renamed sed -i in ~/.gstack/config.yaml flips gbrain_sync_mode → artifacts_sync_mode 4. claude_md_block sed flips "- Memory sync:" → "- Artifacts sync:" in cwd CLAUDE.md and ~/.gstack/CLAUDE.md 5. sources_swapped gbrain sources add NEW (verify) → remove OLD (codex Finding #6: add-before-remove ordering, no downtime window). On remote-MCP mode, prints commands for the brain admin instead of executing. 6. done touchfile + delete journal User opt-out: any "n" or "skip-for-now" answer at the initial prompt writes a marker file that prevents re-prompting; user can re-invoke via /setup-gbrain --rerun-migration. 11 unit tests cover: nothing-to-migrate, GitHub happy path, idempotent re-run, journal-resume mid-flight, remote-MCP print-only path, add-before-remove ordering verification, add-fail → old source stays registered, CLAUDE.md field rewrite. 
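
The journal-and-resume behavior described for this migration can be sketched roughly as follows. This is an illustration under assumptions: the step names come from the commit message, but the in-memory `Journal` class stands in for the on-disk journal file, and `runMigration` and the placeholder step bodies are hypothetical.

```typescript
// Step names are taken from the commit message; the step bodies are placeholders.
type Step = { name: string; run: () => void };

// In-memory stand-in for the on-disk journal file (an assumption for this sketch).
class Journal {
  private done: string[] = [];
  has(name: string): boolean {
    return this.done.indexOf(name) !== -1;
  }
  record(name: string): void {
    this.done.push(name);
  }
  entries(): string[] {
    return this.done.slice();
  }
}

// Run steps in order, skipping any already journaled. A step is journaled only
// after it succeeds, so an interrupted step runs again on re-entry.
function runMigration(steps: Step[], journal: Journal): string[] {
  const executed: string[] = [];
  for (const step of steps) {
    if (journal.has(step.name)) continue;
    step.run(); // throws on failure, leaving the journal untouched for this step
    journal.record(step.name);
    executed.push(step.name);
  }
  return executed;
}

// Simulate an interruption in the third step, then resume.
let failOnce = true;
const steps: Step[] = [
  { name: "gh_repo_renamed", run: () => {} },
  { name: "remote_txt_renamed", run: () => {} },
  {
    name: "config_key_renamed",
    run: () => {
      if (failOnce) {
        failOnce = false;
        throw new Error("interrupted");
      }
    },
  },
  { name: "claude_md_block", run: () => {} },
];
const journal = new Journal();
let interrupted = false;
try {
  runMigration(steps, journal);
} catch (e) {
  interrupted = true; // simulated crash mid-migration
}
const resumed = runMigration(steps, journal);
console.log(resumed); // ["config_key_renamed", "claude_md_block"]
```

Recording a step only after it succeeds means each step must itself be idempotent, which matches the commit's note that the repo rename detects the already-renamed case and skips.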
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test: regression suite + E2E for v1.27.0.0 rename Three new regression tests guard the rename's blast radius (per codex Findings #1, #8, #9, #12): - test/no-stale-gstack-brain-refs.test.ts: greps bin/, scripts/, *.tmpl, test/ for forbidden identifiers (gstack-brain-init, gbrain_sync_mode); fails CI if any non-allowlisted file references them. - test/post-rename-doc-regen.test.ts: confirms gen-skill-docs output has no stale references in any */SKILL.md (the cross-product blind spot). - test/setup-gbrain-path4-structure.test.ts: structural lint over the Path 4 prose contract — STOP gates after verify failure, never-write- token rules, mode-aware CLAUDE.md block, bearer always via env-var. Two new gate-tier E2E tests (deterministic stub HTTP server, fixed inputs): - test/skill-e2e-setup-gbrain-remote.test.ts: Path 4 happy path. Stubs an HTTP MCP server, drives the skill via Agent SDK with a stubbed bearer, asserts claude.json gets the http MCP entry, CLAUDE.md gets the remote-http block, the secret token NEVER leaks to CLAUDE.md. - test/skill-e2e-setup-gbrain-bad-token.test.ts: stub server returns 401; asserts the AUTH classifier hint surfaces, no MCP registration occurs, CLAUDE.md is unchanged. Regression guard for the "verify failed → STOP" rule. touchfiles.ts: setup-gbrain-remote and setup-gbrain-bad-token added at gate-tier so CI catches Path 4 regressions on every PR. Plus a few comment refs flipped: bin/gstack-jsonl-merge, bin/gstack-timeline-log (legacy gstack-brain-init mentions in headers). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * release: v1.27.0.0 — /setup-gbrain Path 4 + brain → artifacts rename Bumps VERSION 1.26.4.0 → 1.27.0.0 (MINOR per CLAUDE.md scale-aware bump guidance: ~1500 line net change including a new path in /setup-gbrain, two new bin helpers, a journaled migration, 59 new tests, and a config key rename across the codebase). 
CHANGELOG entry covers: Path 4 (Remote MCP) end-to-end, the brain → artifacts rename, the journaled migration, the verify-helper error classifier, the artifacts-init multi-host provider choice. Includes the canonical Garry-voice headline + numbers table + audience close per the release-summary format. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test: demote setup-gbrain Path 4 E2E to periodic-tier The Agent SDK E2E tests for Path 4 (skill-e2e-setup-gbrain-remote and skill-e2e-setup-gbrain-bad-token) are inherently non-deterministic — the model interprets "follow Path 4 only" prompts flexibly and can skip Step 8 (CLAUDE.md write) or shortcut past the verify helper, which makes the gate-tier assertions flaky. The deterministic gate coverage for Path 4 is in test/setup-gbrain-path4-structure.test.ts: a fast structural lint that catches AUQ-pacing regressions and prose contract drift in <200ms with zero token spend. That test is the right tool for catching the failure mode the gate-tier was meant to guard against. The Agent SDK E2E tests stay available on-demand for periodic-tier runs (EVALS=1 EVALS_TIER=periodic bun test test/skill-e2e-setup-gbrain-*.test.ts). Also tightened the verify-error assertion to the literal field shape ("error_class": "AUTH") instead of a substring match that false-matches the parent claude session's "needs-auth" MCP discovery markers. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore: sync package.json version to 1.27.0.0 VERSION was bumped to 1.27.0.0 in f6ec11e but package.json was not updated in the same commit. The gen-skill-docs.test.ts assertion "package.json version matches VERSION file" caught the drift. This is the DRIFT_STALE_PKG case the /ship Step 12 idempotency check is designed for; the fix is the documented sync-only repair (no re-bump, package.json synced to existing VERSION). 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
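
The canonical-level comparison that gstack-artifacts-url enables (so an HTTPS input does not conflict with a stored SSH URL for the same repo) might look roughly like this. It is a sketch only: `toCanonicalHttps` and `sameRepo` are hypothetical names, only the two most common remote forms are handled, and the actual helper may behave differently.

```typescript
// Convert common Git remote forms to a canonical HTTPS URL without ".git".
// Returns null for forms this sketch does not recognize (local paths, file://, ...).
function toCanonicalHttps(remote: string): string | null {
  const trimmed = remote.trim();
  // scp-like SSH form: git@host:owner/repo(.git)
  const ssh = /^git@([^:\/]+):([^\/]+)\/(.+?)(?:\.git)?\/?$/.exec(trimmed);
  if (ssh) return `https://${ssh[1]}/${ssh[2]}/${ssh[3]}`;
  // HTTPS form: https://host/owner/repo(.git)
  const https = /^https:\/\/([^\/]+)\/([^\/]+)\/(.+?)(?:\.git)?\/?$/.exec(trimmed);
  if (https) return `https://${https[1]}/${https[2]}/${https[3]}`;
  return null;
}

// Two remotes name the same logical repo when their canonical forms match.
function sameRepo(a: string, b: string): boolean {
  const canonicalA = toCanonicalHttps(a);
  return canonicalA !== null && canonicalA === toCanonicalHttps(b);
}

console.log(toCanonicalHttps("git@github.com:alice/gstack-artifacts-alice.git"));
// "https://github.com/alice/gstack-artifacts-alice"
console.log(sameRepo(
  "git@github.com:alice/gstack-artifacts-alice.git",
  "https://github.com/alice/gstack-artifacts-alice",
)); // true
```

Returning `null` for unrecognized forms leaves the caller free to accept such remotes without canonicalization, in line with the later commit that makes artifacts-init tolerant of local paths and file:// URLs.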

…for code) (garrytan#1500) * feat(gbrain): add lib/gbrain-local-status classifier with 5-state engine status + 60s cache Foundation for split-engine gbrain: shared classifier used by both bin/gstack-gbrain-detect (preamble probe) and bin/gstack-gbrain-sync.ts (orchestrator SKIP-when-not-ok). Single source of truth. Probes via `gbrain sources list --json` and classifies stderr against the same patterns lib/gbrain-sources.ts:66-67 already uses ("Cannot connect to database", "config.json"). Returns one of: ok, no-cli, missing-config, broken-config, broken-db. Defensive default: unrecognized failures classify as broken-config so the raw stderr can be surfaced upstream. Cache at ~/.gstack/.gbrain-local-status-cache.json keyed on {home, path_hash, gbrain_bin_path, gbrain_version, config_mtime, config_size} with 60s TTL. Cache invalidates on any invariant change. --no-cache option busts the cache for callers that just mutated state (/setup-gbrain, /sync-gbrain after init/migration). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor(gbrain): rewrite gstack-gbrain-detect bash→TS + add gbrain_local_status field Replaces the bash detect helper with a bun shebang script sharing the gbrain_local_status classifier from lib/gbrain-local-status.ts with the sync orchestrator. Single source of truth for engine-status classification between preamble-probe and orchestrator-skip paths. Filename stays gstack-gbrain-detect (no .ts extension) so existing skill preamble callers shell out unchanged. Shebang `#!/usr/bin/env -S bun run` resolves bun at runtime. Output is key/type backward-compatible with the bash version per plan codex #5: the 9 pre-existing keys (gbrain_on_path, gbrain_version, gbrain_config_exists, gbrain_engine, gbrain_doctor_ok, gbrain_mcp_mode, gstack_brain_sync_mode, gstack_brain_git, gstack_artifacts_remote) stay identical in name + type + value semantics. One new key added: gbrain_local_status (5-state string enum). 
Updates the existing schema regression at test/gstack-gbrain-detect-mcp-mode.test.ts to include the new key. Adds test/gbrain-detect-shape.test.ts asserting the regression contract for future changes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(gbrain): orchestrator SKIP when local engine not ok + remote-http transcripts via artifacts pipeline Two changes in the sync orchestrator, both per plan D11/D12: 1. bin/gstack-gbrain-sync.ts: runCodeImport + runMemoryIngest call localEngineStatus() (shared classifier from lib/gbrain-local-status.ts). When status is not 'ok', return a SKIP stage result with a clear reason instead of crashing with "source registration failed: gbrain not configured". Brain-sync stage runs regardless — it doesn't depend on local engine. dry-run preview path is gated above the check so it continues to show would-do steps even when the engine is broken. 2. bin/gstack-memory-ingest.ts: when gbrain MCP is registered as remote-http (Path 4), persist staged transcripts to ~/.gstack/transcripts/run-<pid>-<ts>/ instead of the ephemeral ~/.gstack/.staging-ingest-<pid>-<ts>/ tmp dir, and SKIP the local `gbrain import` call entirely. The artifacts pipeline (gstack-brain-sync push to git, brain admin pulls and indexes) handles routing to the remote brain. Local PGLite (when present via Step 4.5) stays code-only. State recording still happens — prepared pages get their mtime+sha256 stamped under remote-http mode so the next /sync-gbrain doesn't re-stage them. Cleanup is skipped intentionally so the persisted dir survives until gstack-brain-sync moves it. Adds test/gbrain-sync-skip.test.ts covering 5 SKIP scenarios (broken-db, broken-config, no-cli, missing-config, ok pass-through). All 25 sync-related unit tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(gbrain): v1.34.0.0 migration notice + transcripts allowlist for artifacts pipeline Per plan D5 + D11. 
Two pieces of the split-engine rollout: 1. gstack-upgrade/migrations/v1.34.0.0.sh — prints a one-time discoverability notice for existing Path 4 (remote-http MCP) users whose machine has no local engine yet. Tells them about /setup-gbrain Step 4.5 (the new local-PGLite opt-in). Silent for everyone else. User can suppress permanently via `gstack-config set local_code_index_offered true`. Touchfile at ~/.gstack/.migrations/v1.34.0.0.done makes it idempotent. 2. bin/gstack-artifacts-init — adds `transcripts/run-*/*.md` and `transcripts/run-*/**/*.md` to the managed allowlist so the gstack-memory-ingest persistent staging dir (used in remote-http mode per D11) gets pushed to the artifacts repo. Brain admin's pull job then indexes transcripts into the remote brain. Privacy class: behavioral (matches transcript content). Adds test/gstack-upgrade-migration-v1_34_0_0.test.ts with 5 cases: state match, no-MCP, local-config-present, opt-out, and idempotency. All 5 pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(gbrain): /setup-gbrain Step 1.5/4.5 + /sync-gbrain Step 1.5 templates Per plan D4, D10, D11, D12. Wires the skill prose to the new split-engine flow + classifier introduced in earlier commits. setup-gbrain/SKILL.md.tmpl: - Step 1: detect output description now includes the v1.34.0.0 gbrain_local_status field (5 values). - Step 1.5 (NEW): broken-db / broken-config remediation. AskUserQuestion with 4 options — Retry / Switch to PGLite / Switch brain mode / Quit (plan D4). Retry is recommended first since broken-db often = transient Postgres outage. PGLite is explicitly one-way + destructive (moves existing config to ~/.gbrain/config.json.gstack-bak-<ts>); rollback on init failure restores the .bak (plan D7). - Step 4d → Step 4.5 (NEW): in Path 4, after the verify step, offer local PGLite for code search. AskUserQuestion Yes/No (plan D10/D11). 
Yes path runs gstack-gbrain-install + `gbrain init --pglite --json` with the same rollback-safe sequence. No path skips Steps 3/4/5/7.5. - Step 10 verdict (Path 4): adds "Code search" row reflecting Step 4.5 choice. Updates "Transcripts" row to describe the new D11 routing (artifacts repo → remote brain). sync-gbrain/SKILL.md.tmpl: - Step 1 split-engine prose: corrects the prior misleading claim that "memory routes through whatever setup-gbrain configured, including remote-MCP" (codex finding #3). Memory stage shells out to local `gbrain import` in local-stdio mode; in remote-http mode it persists to ~/.gstack/transcripts/ for the artifacts pipeline. - Step 1.5 (NEW): local-engine pre-flight. STOP on no-cli, broken-config, broken-db. Soft skip (continue with code+memory SKIP) on missing-config + remote-http per plan D12. Surfaces actionable user remediation message instead of the orchestrator crashing two stages with ERR. Regenerated SKILL.md for all hosts (claude, kiro, opencode, slate, cursor, openclaw, hermes, gbrain). All 712 skill-validation + gen-skill-docs tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(gbrain): .bak-rollback contract for Step 1.5 / 4.5 init failure path Per plan D7 (rollback semantics) and codex #10 (rollback scope). The /setup-gbrain skill instructs the model to follow a specific shell sequence when running `gbrain init --pglite` against an existing config: 1. mv ~/.gbrain/config.json ~/.gbrain/config.json.gstack-bak-<ts> 2. gbrain init --pglite --json 3. on non-zero exit: mv .bak back; surface error This test verifies that contract using a fake `gbrain` binary that fails on init. Three cases: - FAILURE: gbrain init exits non-zero → broken config restored to original path, no leftover .bak. - SUCCESS: gbrain init exits 0 → new config in place, .bak survives for audit (user reviews + deletes manually). - SCOPE: any partial PGLite directory at ~/.gbrain/pglite/ is NOT auto-cleaned. 
We only promise to restore config.json; PGLite cleanup is the user's call (codex #10). If the skill template rewrites this sequence in a future change, this test should fail until the test's shell is updated too. That's the point — keep the test and the skill template aligned. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(gbrain): periodic E2E for /setup-gbrain Path 4 + Step 4.5 Yes flow End-to-end coverage of the new opt-in question via runAgentSdkTest. Stubs the MCP endpoint at /tools/list with a 200 response carrying a fake gbrain v0.32.3.0 serverInfo, and fakes the gbrain + claude CLIs so init writes a PGLite config and mcp add succeeds. Asserts the model: 1. invokes gstack-gbrain-install (Step 4.5 Yes branch) 2. invokes `gbrain init --pglite --json` 3. writes a working ~/.gbrain/config.json with engine=pglite 4. registers the remote MCP via `claude mcp add --transport http` 5. never leaks the bearer token to CLAUDE.md Classified as periodic-tier per plan D6 (codex #12 flagged AgentSDK flakiness; gate-tier coverage of the split-engine behavior lives in the deterministic unit tests at gbrain-local-status.test.ts and gbrain-sync-skip.test.ts). Touchfile fires the test when the skill template, install/verify/init helpers, the local-status classifier, or the agent-sdk-runner harness changes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(gbrain): bump migration to v1.35.0.0 after main merge main shipped v1.34.0.0 (factory-export submodule) + v1.34.1.0 (update-check hardening) while this branch was in flight. The migration file I named v1.34.0.0.sh now belongs at v1.35.0.0 — the next minor on top of main, matching the scale of split-engine work (new lib + orchestrator skip + template overhaul + transcripts routing). Renames the migration script and its test file; updates all internal version references in both files. Behavior unchanged. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * perf(gbrain): memoize gbrain resolution + use --fast doctor in detect Cuts detect's wall time substantially by sharing fork-exec results between the helper that walks the JSON output and the localEngineStatus classifier from lib/gbrain-local-status.ts. Before: detect made 2x `command -v gbrain` calls (one in detect's detectGbrain, one in the classifier's resolveGbrainBin) and 2x `gbrain --version` calls. With memoization keyed on PATH, both collapse to one fork each (~400ms saved per skill preamble). Also adds `--fast` to the `gbrain doctor --json` call in detect so a broken-db config (Garry's repro) doesn't burn a full 5s timeout on the doctor's DB-connection check. The classifier still probes the DB directly via `gbrain sources list --json` for engine reachability — that's `gbrain_local_status`, separate from the coarse `gbrain_doctor_ok` summary flag. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(gbrain): relax E2E assertions to smoke-test contract Per codex #12 (AgentSDK harness is non-deterministic): the E2E now asserts the model followed the split-engine path WITHOUT requiring a specific subcommand sequence. Three assertions: 1. AskUserQuestion was called (model reached interactive branches) 2. At least one of {gstack-gbrain-install, `gbrain init --pglite`, `claude mcp add`} fired (model followed the skill, not a no-op) 3. The fake bearer token never leaked to CLAUDE.md (security regression) Deterministic per-step coverage of the same flow lives in the gate-tier unit tests (gbrain-local-status, gbrain-sync-skip, init-rollback, upgrade-migration). The E2E exists to catch the "model can't follow the skill at all" regression class, not to pin the exact tool sequence. Test passes in 280s against the live Agent SDK. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(version): bump CLI smoke-test timeout to 15s (flaky at 5s under load) The gstack-next-version integration smoke test spawns a child process that does git operations + sibling-worktree probing. Wall time hovers 4-5s on M-series Macs; flakes at exactly 5001-5002ms when the test suite runs under load (bun's parallel scheduling). Bumping per-test timeout to 15s eliminates the flake without changing test logic. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v1.37.0.0) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
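
The status cache described in the first commit of this squash (keyed on a set of invariants, 60s TTL, bypassable by callers that just mutated state) could be modeled along these lines. The key field names and the five status values come from the commit message; the value types, the `CacheEntry` shape, `writtenAtMs`, and `readCachedStatus` are assumptions for this sketch.

```typescript
type LocalStatus = "ok" | "no-cli" | "missing-config" | "broken-config" | "broken-db";

// Key fields are named in the commit message; the value types are assumed.
interface CacheKey {
  home: string;
  path_hash: string;
  gbrain_bin_path: string;
  gbrain_version: string;
  config_mtime: number;
  config_size: number;
}

// Hypothetical cache entry shape.
interface CacheEntry {
  key: CacheKey;
  status: LocalStatus;
  writtenAtMs: number;
}

const TTL_MS = 60 * 1000;

// Every key field must match for the cached entry to be reusable.
function sameKey(a: CacheKey, b: CacheKey): boolean {
  const fields = Object.keys(a) as (keyof CacheKey)[];
  return fields.every((f) => a[f] === b[f]);
}

// Return the cached status only when the key matches and the entry is younger
// than the TTL. noCache models callers that bust the cache after mutating state.
function readCachedStatus(
  entry: CacheEntry | null,
  current: CacheKey,
  nowMs: number,
  noCache: boolean = false,
): LocalStatus | null {
  if (noCache || entry === null) return null;
  if (!sameKey(entry.key, current)) return null;
  if (nowMs - entry.writtenAtMs >= TTL_MS) return null;
  return entry.status;
}

const key: CacheKey = {
  home: "/home/alice",
  path_hash: "abc123",
  gbrain_bin_path: "/usr/local/bin/gbrain",
  gbrain_version: "0.32.3.0",
  config_mtime: 1700000000,
  config_size: 512,
};
const entry: CacheEntry = { key, status: "ok", writtenAtMs: 1000 };

console.log(readCachedStatus(entry, key, 1000 + 30 * 1000)); // "ok"
console.log(readCachedStatus(entry, { ...key, config_size: 513 }, 2000)); // null
```

A `null` result means the caller should re-probe the engine and rewrite the entry; a change to the config file's mtime or size invalidates the cache without waiting for the TTL.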

* docs: design for build plan-review convergence loop Spec for /build's planSynthesizer ↔ planReviewer loop: in-process round loop, mid-loop user triage gate, plan-file-as-ledger cross-round memory, set-aware adaptive cap. Triggering case was bundle-1 (5→3→2→manual r4, ~$5-10) where rigor caught real bugs but the operator was locked out until round 3. New default brings user in at round 1, makes each round cheaper via in-process loop, and adapts the cap to actual convergence. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(spec): pin adaptive-cap exit_reason mapping in convergence design Self-review caught a small ambiguity: the decision table listed two bail-out triggers (re-raises-only and regression) but the exit_reason enum had three adaptive-cap values without an explicit mapping. Fixed: enum now has exactly adaptive_cap_re_raises_only and adaptive_cap_regression, and the decision table rows reference which exit_reason each triggers. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(plan): implementation plan for build plan-review convergence 19-task TDD-structured plan for the convergence loop spec. Each task is self-contained: failing test → minimal impl → passing test → commit. Covers types (T1), annotation contract (T2), history JSONL (T3), convergence aggregate (T4), adaptive cap (T5), TTY triage (T6), non-TTY triage (T7), prompts (T8), main loop (T9), CLI wire-in (T10), disputed counting (T11), three integration tests (T12-T14), E2E (T15), SKILL.md shrink + version bump (T16), README (T17), CHANGELOG (T18), final verification (T19). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(build/types): add convergence types for plan-review loop Extends PlanReviewVerdict with optional triage_decisions, round_history_path, convergence, interrupted_at_objection fields. Adds TriageDecision and ConvergenceSnapshot interfaces and ROUND_HISTORY_FORMAT_VERSION constant. 
All new fields are optional so existing call sites compile unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor(build/types): camelCase convergence field names; drop NEW: prefix Code-quality review flagged inconsistency: existing PlanReviewVerdict fields use camelCase (reviewedBy, round) but the new convergence fields landed as snake_case (triage_decisions, round_history_path, etc). Converts all new TypeScript interface fields to camelCase to match the file's established convention. JSONL wire formats in later tasks can still use snake_case via JSON.stringify of manually-shaped objects -- TypeScript types do not need to mirror JSONL key shape. Also drops the informal "NEW:" prefix from JSDoc comments and adds a one-line doc for TriageDecision.decision. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(build/plan-reviewer): add round-annotation read/write contract Adds parseRoundAnnotations, writeRoundAnnotation, parseRoundHistoryHeader, updateRoundHistoryHeader exported from plan-reviewer.ts. These implement the cross-round memory contract: each round's triage decisions and synth resolutions are written into the plan file as HTML comment blocks above the matching '### Phase N' heading, plus a top-of-plan history block. The next round's reviewer reads these to know what's already been decided. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(build/plan-reviewer): ROUND N REVIEWER attaches to round N's entry The original Task 2 commit had the parser attach ROUND N REVIEWER lines to round N-1's entry. That choice was forced by a self-contradicting test fixture (a ROUND 2 REVIEWER: line inside an annotation the test required to have rounds.length === 1). Both the design spec and Task 9's planned writer in runPlanReviewLoop treat ROUND N REVIEWER as a round-N observation paired with round N's USER decision (if any). 
The N-1 offset would have corrupted every annotation round-trip in Tasks 9, 11, and the integration tests. Fixes the parser to attach directly to round N. Updates the broken test fixture to assert the realistic multi-round shape: round 1 carries USER/RESOLUTION, round 2 carries only REVIEWER (because no round-2 USER decision happened on this annotation -- the reviewer did not re-raise).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(build/plan-reviewer): silent corruption + missing JSDoc on annotation writers
Three issues from code-quality review:
1. writeRoundAnnotation's two String.replace() call sites used string replacements, which interpret $&, $', $` and $N as substitution patterns. A plan annotation whose field text contains those tokens (plausible when reviewing regex / shell / template code) would have produced silently corrupted output. Both call sites now use function replacements which suppress the interpolation.
2. Misleading comment in parseRoundAnnotations claimed the header always names "round 1". An annotation first written in round 2+ opens with ROUND 2 CRITICAL [...], which the parser already handles correctly -- the comment was the only wrong thing.
3. RoundHistoryEntry, parseRoundHistoryHeader, and updateRoundHistoryHeader gained JSDoc explaining purpose, the optional finalLine parameter (used on loop exit for the 'final: APPROVED after N rounds, ...' line), and the atomic-write invariant.
New regression test covers the $& interpolation foot-gun by round-tripping fields containing every replacement-pattern token.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(build/plan-review-loop): add round-history JSONL writer
New module plan-review-loop.ts will house the in-process round loop, triage gate, and adaptive-cap. First commit establishes the per-build-state history JSONL: append-only, corruption-tolerant reads, round-counter derivation.
appendHistoryEntry / readHistoryEntries / deriveRoundNumber pair with the existing plan-reviewer.ts::readPlanReviewRound for cross-launch resume. HistoryEntry uses camelCase field names (objectionCountRaw, noForwardProgress, reRaises, newObjections) matching the Task 1 convention; the JSONL on disk serializes the same camelCase, so jq queries can use the field names verbatim. Also registers plan-review-loop.ts in coverage-matrix.test.ts so the MODULE_TEST_OWNERS invariant stays green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(build/plan-review-loop): cross-build convergence aggregate writer writeConvergenceAggregate appends one line per completed build to ~/.gstack/analytics/convergence.jsonl. Captures trajectory, exitReason, total accept/reject/defer counts, wall time, and annotation parse errors -- the tuning signal needed to validate MAX_ROUNDS=5 and the adaptive cap rule over weeks of builds. Best-effort write: aggregate analytics never block the build path. ConvergenceAggregate uses camelCase fields matching the Task 1 convention; jq queries use the field names verbatim. Adds the ExitReason union as a distinct exported type so Task 5's adaptive-cap decision and Task 9's loop exit can return values type-checked against it. Also extends MODULE_TEST_OWNERS coverage entry for plan-review-loop.ts to include the new test file. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(build/plan-review-loop): set-aware adaptive cap rule computeConvergenceSnapshot compares round-k accepted objections against prior-round annotations parsed from the plan file. Classifies each as re-raise (prior accepted-and-resolved, same (location, severity)) or new objection. Rejected-prior matches are deliberately neither -- they are reviewer-prompt-fidelity signals, not synth-failure signals. 
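
The classification rule described above can be sketched as follows. This is an illustration under stated assumptions, not the repository's computeConvergenceSnapshot: `classifyAccepted`, `PriorAnnotation`, and its `resolved` flag are hypothetical names. Treating a match against a deferred or still-pending prior entry as "neither" is also an assumption; the commit only says so for rejected-prior matches.

```typescript
type Severity = "CRITICAL" | "IMPORTANT" | "SUGGESTION";

interface Objection {
  location: string;
  severity: Severity;
}

// Hypothetical, simplified view of a prior-round annotation.
interface PriorAnnotation extends Objection {
  userDecision: "accept" | "reject" | "defer";
  resolved: boolean; // true once RESOLUTION is no longer pending
}

interface Snapshot {
  reRaises: number;
  newObjections: number;
}

// Objections are keyed on (location, severity); issue wording is ignored.
const keyOf = (o: Objection): string => `${o.severity}\u0000${o.location}`;

function classifyAccepted(accepted: Objection[], prior: PriorAnnotation[]): Snapshot {
  const byKey = new Map<string, PriorAnnotation>();
  for (const p of prior) byKey.set(keyOf(p), p);

  let reRaises = 0;
  let newObjections = 0;
  for (const o of accepted) {
    const match = byKey.get(keyOf(o));
    if (match === undefined) {
      newObjections++; // never seen before
    } else if (match.userDecision === "accept" && match.resolved) {
      reRaises++; // accepted and resolved earlier, yet raised again
    }
    // Otherwise (e.g. previously rejected): counted as neither.
  }
  return { reRaises, newObjections };
}
```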
shouldBailAdaptive implements the decision table from the design spec: hard cap at MAX_ROUNDS triggers stalemate_gate, accepted-count regression triggers adaptive_cap_regression, re-raises-with-no-new triggers adaptive_cap_re_raises_only. Precedence is explicit and tested across six scenarios covering round 1, mid-loop continue, both bail paths, the hard cap, and adaptive-disabled. camelCase fields throughout (priorRoundAccepted, reRaises, newObjections, noForwardProgress) matching Task 1 convention. Exports ConvergenceSnapshotInput, RoundConvergenceSnapshot, AdaptiveCapInput, AdaptiveCapDecision so Task 9's runPlanReviewLoop can compose with them. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(build/plan-review-loop): TTY triage gate for per-objection decisions runTriageGateTTY prompts user per CRITICAL objection with the 8-key menu (a/r/d/v/A/R/s/q + Enter), captures optional rationale, surfaces re-raises with prior rejection context, and supports fast-path A/R or early quit. Stream-based input/output so tests can drive it without a real TTY, following the existing build/orchestrator/__tests__/feature-review-prompt.test.ts pattern: Readable.push(Buffer.from(...)) to avoid object-mode readline pitfalls. Uses a line-queue/waiter pattern to decouple readline event emission from the sequential ask() calls — avoids the ERR_USE_AFTER_CLOSE trap that occurs when readline output ownership conflicts with the injected output stream. 8 tests cover: per-objection accept with rationale capture, mixed reject/defer/accept, [A]ccept-ALL and [R]eject-ALL fast paths, [s]top remainder, [q]uit early, and re-raise framing with prior-rejection context. Returns TriageGateResult with quitEarly / fastPathed flags so Task 9's loop can route exit codes correctly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(build/plan-review-loop): non-TTY triage modes for CI / scripts / agents runTriageGateNonTTY: synchronous, no readline. 
Three modes, picked via the --plan-review-noninteractive CLI flag added in Task 10: - auto-accept (default, extends the existing IMPORTANT-objection non-TTY behavior to CRITICAL): accept every objection, re-synth, continue - fail-fast: exit code 3 on the first round with CRITICAL — strict CI gate - auto-reject: reject every objection, annotate as rejected, proceed — escape hatch for known-noisy reviewer runs Returns NonTTYTriageResult with decisions[] (one per input objection) and shouldFailFast flag. camelCase fields throughout matching Task 1 convention. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(build/plan-reviewer): annotation-aware reviewer + synth prompts Extends PLAN_REVIEW_PROMPT with a paragraph teaching the reviewer to read prior-round annotations (USER:accept/reject, RESOLUTION:pending/disputed, REVIEWER:re-raised) and not re-raise settled concerns. Promotes the const to an export so the new snapshot test can verify it without dynamic import gymnastics. Adds new exported SYNTH_REVISION_PROMPT (formerly inline in build/SKILL.md.tmpl Step 5.5 -- moved to plan-reviewer.ts so Task 10's cli.ts can use it from a typed import) instructing the synthesizer to honor user triage decisions, write RESOLUTION lines, and mark disputes when the user accepted something the synth thinks is wrong. Snapshot-tested so unintended drift surfaces in CI; non-snapshot assertions pin the core annotation contract phrases. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(build/plan-review-loop): in-process round loop runPlanReviewLoop Composes the reviewer call, triage gate (TTY or non-TTY), annotation writes, convergence snapshot, adaptive-cap decision, history JSONL append, top-of-plan history header update, and synth dispatch -- all in-process so re-launch overhead between rounds is eliminated. runStalemateGate handles the user-facing AskUser at adaptive-cap-bail or MAX_ROUNDS exit. 
Uses the same line-queue/waiter readline pattern as runTriageGateTTY (Task 6) to work around Bun's readline ERR_USE_AFTER_CLOSE on multi-question sequences. Single-readline-per-loop: when isTTY, runPlanReviewLoop stands up one shared readline + ask function and threads it into runTriageGateTTY and runStalemateGate via an optional askFn injection point. Per-call readlines drain the entire buffered input stream on the first close, starving later rounds. Existing standalone-call tests for the gates keep working because askFn defaults to undefined and each gate opens its own readline in that case. Tests cover: APPROVE round 1 (no synth, fastest path), bundle-1 trajectory 5->3->2->0 with three synth invocations, adaptive bail on re-raises stall at round 2. disputed_resolutions per-round count is a placeholder zero in this commit; Task 11 wires it to the real RESOLUTION: disputed annotation detection after each synth call. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(build/cli): wire runPlanReviewLoop into startup path Replaces the single-shot runPlanReview + reconcilePlanReview call with the new in-process loop. Adds three CLI flags: --plan-review-max-rounds=N (default 5; range 1..20) --plan-review-no-adaptive-cap (off; disables forward-progress bail) --plan-review-noninteractive=<auto-accept|fail-fast|auto-reject> (default auto-accept) Exit codes preserved on the existing side (0 approve, 1 runtime, 2 test fail); new contract: 3 = stalemate (user picked [m] or non-TTY fail-fast hit CRITICAL) 4 = user abort (user picked [q] at gate) 130 = SIGINT during triage (unchanged) Legacy plan-review-report.json is still written from loopResult.finalVerdict so SKILL.md.tmpl Step 5.5 stalemate handler keeps working without changes until Task 16 shrinks it. synthFn adapter dispatches a configured Claude role with the SYNTH_REVISION_PROMPT from Task 8 against the now-annotated plan file. 
Since the dedicated planSynthesizer role does not yet exist in role-config.ts, the adapter falls back to planReviewer's role config and the timeout falls back to BUILD_DEFAULTS.timeoutsMs.planReview. Both are forward-compatible: once planSynthesizer lands, the `as any` accessors pick it up automatically. reconcilePlanReview and readPlanReviewRound are no longer imported in cli.ts now that the loop owns reconciliation and round tracking; they remain exported from plan-reviewer.ts for tests that exercise them. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(build/plan-review-loop): count disputed resolutions per round Replaces the Task 9 placeholder disputedResolutions.push(0) after the synth call with a real count: re-parse plan annotations, find this round's entries with RESOLUTION starting "disputed", tally them. The synth writes RESOLUTION: disputed -- <reason> when it disagrees with a user-accepted objection. Each disputed entry surfaces in the next round's triage so the user can re-decide. The aggregate count in convergence.jsonl is the tuning signal for "how often does synth disagree with user accepts" -- high counts suggest reviewer prompt fidelity issues or user triage rationale weakness. Annotation parse errors during this re-parse increment annotationParseErrors and log to console.warn; they do not crash the loop. The placeholder pushes on adaptive-bail-continue and on APPROVE paths stay 0 (no synth invocation means no resolutions to count). Also fixes annotationParseErrors declaration from const to let, which was previously a no-op placeholder but now receives increments inside the error catch path. New test covers the disputed-resolution path: round 1 has 2 accepted CRITICAL, synth marks one disputed and one applied, aggregate's disputedResolutions[0] is 1. 
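
The per-round tally described above might look like the sketch below. The `Annotation` and `RoundEntry` shapes and the name `countDisputed` are assumptions for illustration. The case-insensitive match reflects a later commit in this log, which adds the `i` flag to the /^disputed\b/ check.

```typescript
// Hypothetical, reduced shapes for parsed plan annotations.
interface RoundEntry {
  round: number;
  resolution?: string; // text after "RESOLUTION:", if the synth wrote one
}

interface Annotation {
  rounds: RoundEntry[];
}

// Count this round's entries whose RESOLUTION starts with "disputed".
function countDisputed(annotations: Annotation[], round: number): number {
  let count = 0;
  for (const ann of annotations) {
    for (const entry of ann.rounds) {
      const text = (entry.resolution ?? "").trim();
      if (entry.round === round && /^disputed\b/i.test(text)) count++;
    }
  }
  return count;
}
```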
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(build/integration): bundle-1 trajectory converges 5→3→2→0 Integration test using stub reviewer (replaying fixture data from the real bundle-1 trajectory: 5 CRITICAL → 3 new → 2 new → APPROVE) and a stub synth that replaces RESOLUTION:pending with RESOLUTION:applied. Verifies the end-to-end story: - 4 rounds, 3 synth invocations, exit APPROVE - history JSONL has 4 lines (one per round) - convergence aggregate has correct trajectory and totals: trajectoryRaw [5, 3, 2, 0], totalAccepted 10, finalVerdict APPROVE - plan file accumulates ROUND 1/2/3 annotations + top-of-plan history block Fixture data lives in test/fixtures/build-convergence/ for sharing across Tasks 13, 14, 15 (the other integration + E2E tests). Coverage-matrix updated: plan-review-loop.ts now lists the integration test as an additional owner alongside the unit tests. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(build/integration): adaptive cap bails on re-raises-only Integration test verifying the no-forward-progress bail path: Round 1 raises 2 CRITICAL, user accepts both, synth marks RESOLUTION non-pending (a real fix attempt). Round 2 raises the same 2 CRITICAL with zero new objections — adaptive cap detects this as a true stall (reRaises=2, newObjections=0). Bail-out gate fires. User picks [m]anual mode → exit code 3, outcome user_manual, exit_reason user_manual in convergence.jsonl. Synth is invoked exactly once (round 1) — the bail-out fires before the round-2 synth call, saving the wasted API spend the design was built to prevent. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(build/integration): synth disputes a user-accepted objection Integration test for the synth-dispute escape hatch: Round 1 reviewer raises 1 CRITICAL ("use bcrypt"). User accepts. 
Synth disagrees -- instead of applying the fix, it writes RESOLUTION: disputed -- bcrypt conflicts with FIPS in this build. The dispute is captured in two telemetry surfaces: - disputedResolutions[0] === 1 in convergence.jsonl - The plan annotation preserves the synth's reasoning verbatim so the next round's reviewer (or user, if it re-surfaces) sees WHY synth declined to apply the fix. Round 2 reviewer reads the disputed annotation and approves (validating that a dispute does not force a re-raise -- the reviewer can decide). Loop exits clean with outcome "approved". Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(e2e): real Codex respects round-annotation contract Layer 4 E2E from the convergence design spec. Drives runPlanReviewLoop with the real Codex planReviewer against the bundle-1 fixture plan created in Task 12. Stub synth rewrites RESOLUTION: pending so the round-2 reviewer call sees a non-pending resolution annotation. Asserts that once round 1's objections are accepted and annotated as RESOLUTION: applied, round 2's Codex call does not re-raise them (reRaises[1] === 0) -- proving the reviewer prompt addition (Task 8) actually changes real-Codex behavior, not just our parser tests. Classified gate tier per CLAUDE.md; ~$0.50/run. Gated by EVALS=1; free tests run without invoking Codex. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(build/skill): shrink Step 5.5 to handle exit codes only The in-process loop in plan-review-loop.ts now resolves most rounds inside the CLI. Step 5.5 shrinks to a four-branch dispatch on exit codes 0/3/4/130, plus the existing exit-1/2 paths. The cross-round annotation history lives in the plan file, so manual edits between launches just need to preserve those annotations so the next round's reviewer has context. Bumps skill version 1.24.0 -> 1.25.0 (MINOR -- new capability). NOT bumping top-level VERSION per fork rule in CLAUDE.md. 
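
The exit-code contract described in this log (0 approve, 1 runtime, 2 test fail, 3 stalemate, 4 user abort, 130 SIGINT) can be pictured as a small dispatch table. This is an illustrative sketch only: Step 5.5 lives in a skill template, and `dispatchExitCode` and the `NextStep` labels are hypothetical names, not identifiers from the repository.

```typescript
type NextStep =
  | "proceed"
  | "runtime_error"
  | "test_failure"
  | "stalemate"
  | "user_abort"
  | "interrupted"
  | "unknown";

// Map an orchestrator exit code to the branch a caller would take.
function dispatchExitCode(code: number): NextStep {
  switch (code) {
    case 0:
      return "proceed"; // plan approved
    case 1:
      return "runtime_error"; // existing path
    case 2:
      return "test_failure"; // existing path
    case 3:
      return "stalemate"; // user chose manual mode, or fail-fast hit a CRITICAL
    case 4:
      return "user_abort"; // user quit at the gate
    case 130:
      return "interrupted"; // SIGINT during triage
    default:
      return "unknown";
  }
}
```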
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(build/orchestrator): document the plan-review convergence loop Adds a section to the orchestrator README covering the loop architecture, exit code contract, flags, telemetry file layout, triage gate key map, and module boundaries. Cross-references the design spec. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(changelog): build skill 1.25.0 — in-process plan-review loop User-facing entry for the convergence loop change. NOT bumping top-level VERSION per CLAUDE.md fork rule -- only the build skill frontmatter bumped (Task 16). Entry covers what changed for the user, projected metrics from the bundle-1 case study, itemized add/changed/contributor changes, with spec cross-reference. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(build/orchestrator): clarify exit code 3 ambiguity + fix dropped indent Pre-landing review surfaced two README findings: 1. Exit code 3 has two meanings since this PR: lock contention at startup AND plan-review stalemate. The general exit-code table now reflects both. Exit codes 4 (user abort) and 130 (SIGINT) added to the table for completeness. 2. The troubleshooting bullet for --mark-phase-committed lost its 2-space leading indent during the table reformatting pass. Restored so the bullet continuation renders correctly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(build/orchestrator): strengthen 8 test gaps from pre-landing review Pre-landing review found 8 test-quality findings. Addressed here: 1. types-convergence.test.ts: trivial echo-assertions replaced with behavioral tests that exercise types via real function calls. 2. plan-annotation-round-trip.test.ts: new test for the fallback-prepend path in writeRoundAnnotation when the referenced Phase heading is absent from the plan text. 3. 
loop-converge-bundle-1.test.ts: added structural placement assertion to confirm annotations land in the expected position (the fixture intentionally exercises the fallback-prepend path for rounds 2-3 which raise objections at non-existent phases). 4. plan-reviewer-triage-tty.test.ts: new test for the [v]iew prose key path, verifying the assessment text is shown before the re-prompt. 5. skill-e2e-build-convergence.test.ts: tier-gate pattern aligned with other E2E tests (EVALS=1 && EVALS_TIER==='gate', no 'all' fallback). Core contract assertion no longer wrapped in a conditional that would let the test pass vacuously if Codex approves round 1. 6. plan-reviewer-loop.test.ts: new test exercising the max-rounds stalemate path (reviewer always REVISE, non-TTY auto-accept, confirms rounds === maxRounds at exit). 7. adaptive-cap-set-aware.test.ts: new test for the 'fewer accepted with new objections' branch, confirming the loop continues when accepted count decreases but new dimensions appeared. 8. History-entry expectations updated for tests where the production fix (IMPORTANT/SUGGESTION counts now recorded) changes the asserted shape. Most tests use CRITICAL-only fixtures and are unaffected. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(build/plan-review-loop): restore IMPORTANT/SUGGESTION handling + 9 cleanups Pre-landing review surfaced one CRITICAL contract violation and 9 informational items. All addressed in this commit: CRITICAL — IMPORTANT/SUGGESTION objections silently dropped (regression). The early-return at the APPROVE / zero-CRITICAL branch treated REVISE-with- non-CRITICAL-only as APPROVE without annotating IMPORTANT/SUGGESTION objections in the plan file or recording them in the history. The pre- loop reconcilePlanReview path handled them via inline annotate-and- proceed; that path is no longer reached because cli.ts dropped the reconcilePlanReview call when wiring runPlanReviewLoop. 
Restored by filtering important/suggestion, calling writeRoundAnnotation for each, and recording real counts in HistoryEntry. Maintainability cleanups: - LoopOutcome aliased to ExitReason (byte-identical union); the as-cast in writeAggregate removed. - Dead sigint literal removed from ExitReason; SIGINT handling lives in cli.ts signal handler, not the loop return type. - ROUND_HISTORY_FORMAT_VERSION removed from types.ts; was never imported. - Always-false interrupted field removed from ConvergenceAggregate; no code path mutated it. - Stale "Task 11 placeholder" comment on disputedResolutions replaced with accurate description of what the array contains. - makeReadlineAsk helper extracted; the ERR_USE_AFTER_CLOSE workaround for Bun readline is now in one place instead of three duplicated blocks (triage TTY standalone, loop shared, stalemate standalone). Performance cleanups: - Two redundant readFileSync calls per round eliminated: the plan-text read for re-raise detection is reused for annotation-write, and the in-memory updatedPlan is passed to updateRoundHistoryHeader instead of re-reading from disk after writing. - existsSync + statSync collapsed to statSync with try/catch. Performance finding #4 (pre-parse annotations once before writeRoundAnnotation loop) deferred — requires writeRoundAnnotation signature change in plan-reviewer.ts; sub-ms cost at typical scale (K=5-10 objections, 20-50KB plan). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(build/plan-reviewer): close 4 silent-data-loss annotation bugs Red Team review verified 4 round-trip corruption modes in the annotation parser/writer. Each fails silently with no warning. 1. '-->' in any user-supplied field closes the HTML comment early. Subsequent lines bleed into plan text; parser silently drops rounds[]. 2. '"' in rationale stops the [^"]* regex group at the first inner quote. Rationale field silently dropped. 3. 
']' in location stops the [^\]\n]+ regex group at the first inner bracket. ENTIRE annotation silently lost. 4. Non-canonical whitespace in existing in-file annotations makes writeRoundAnnotation's merge path a silent no-op. Round 2's decision silently lost. Fix #1-3: HTML-entity encoding on serialize, decode on parse. Encode ampersand first (so user input containing literal escape sequences round-trips correctly); decode ampersand last. Encoded sequences: ']', '"', '-->', and '&', each written as its HTML-entity form. Applied to userRationale, issue, suggestion, location, resolution. userDecision is a fixed enum (no encoding). Operator warning logged on first character requiring encoding per field so silent encoding never lands without an audit trail. Fix #4: writeRoundAnnotation merge path walks the regex over planText directly to locate the actual byte range of each existing block (handling non-canonical whitespace, CRLF, external-editor reformat). A fresh local regex avoids the shared ANNOTATION_BLOCK_RE's lastIndex being reset inside parseRoundAnnotations during the loop, which otherwise caused infinite loops on the same span. Twelve new tests pin the round-trip invariants: - '-->' / '"' / ']' / '%' / '&' all round-trip correctly - merge into non-canonical 8-space-indent block preserves round 2 - a literal entity string for ']' in user input does not decode to ']' on parse Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(build/plan-review-loop, cli): close 8 Red Team findings CRITICAL #1 (regression from round-1 fix): TTY mode now prompts per IMPORTANT objection (y/skip/all) instead of unconditionally auto-accepting. Restores the pre-loop reconcilePlanReview contract that was lost when the loop went in-process. SUGGESTION objections remain auto-annotated (no prompt). Non-TTY mode still auto-accepts per existing CI contract. 
CRITICAL #6: runPlanReviewLoop now calls deriveRoundNumber on startup, reads prior history, and starts the round counter at the next available number on resume. Trajectory arrays are hydrated from history so shouldBailAdaptive has the right context. Resume after exit-3 or exit-130 no longer duplicates round numbers in history.jsonl or writes a second convergence aggregate record. INFO #7: objectionCountRaw set to critical.length only (matches JSDoc and other paths). Verdict written as verdict.verdict, not hardcoded "APPROVE", in the REVISE-with-zero-CRITICAL branch. INFO #8: Early-exit paths (user quit, non-TTY fail-fast) now push sentinel -1 to trajectoryAccepted, reRaisesArr, reRejectedArr, disputedResolutions to maintain length parity with trajectoryRaw. User-quit path also writes an INTERRUPTED history entry. INFO #9: synthFn rejection wrapped in try/catch. On failure, the loop writes an INTERRUPTED history entry, writes the aggregate with exitReason: reviewer_unavailable, closes shared resources, and exits with code 1. INFO #10: Plan-file writes use atomic tmp + rename pattern. SIGINT during a write can no longer leave a half-written plan that the parser would silently skip on resume. INFO #11: HOME fallback uses os.homedir() with os.tmpdir() as secondary. Containers without HOME no longer pollute the project worktree. INFO #12: [s]top key in runTriageGateTTY no longer prompts for the current-objection rationale before exiting. The current decision is pushed with empty rationale and the loop moves on. Tests added: resume-from-history, synth-failure-handled, TTY IMPORTANT prompt, array-length parity on quit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(build/plan-review-loop): close adversarial review CRITICALs Claude adversarial review found two real bugs the 3-specialist + Red Team rounds missed. Both verified by reading the code. C1 (CRITICAL): TTY infinite-loop on stdin close. 
The decision-loop default case in runTriageGateTTY (and two siblings in the IMPORTANT prompt + runStalemateGate) was `opts.output.write("Invalid input '${ans}'...")` with no EOF handling. Once makeReadlineAsk's streamClosed sets the internal flag, ask() resolves with "" forever, and the while-loop spins at 100% CPU printing "Invalid input ''" until externally killed. Triggered by routine Ctrl+D, piped stdin closing, or terminal disconnect mid-triage. All three sites now treat empty input as a quit/abort/skip-all signal. C2 (CRITICAL): Convergence aggregate array-length skew on reviewer_unavailable exit. planFileSizeBytes was pushed unconditionally at the top of every round iteration, but the reviewer_unavailable branch returned without pushing to trajectoryRaw / trajectoryAccepted / reRaisesArr / reRejectedArr / disputedResolutions. The resulting aggregate row had one array of length N+1 alongside others of length N; downstream consumers indexing by round walked off the wrong array. Same bug class as the Red Team's INFO #8 fix (sentinel pushes on early-exit), missed on this branch. Fix: explicit sentinel pushes to the five parallel arrays before writeAggregate. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(build/plan-reviewer): close adversarial review findings on annotation contract Claude adversarial review found 4 more issues with the annotation parser/writer beyond what the Red Team caught. All address the same root cause: insufficient field encoding in serialize/parse round-trip, plus a merge predicate that's too strict. I1: writeRoundAnnotation merge predicate dropped `issue` (LLM-written freeform text). Match on (location, severity) only. Aligns with computeConvergenceSnapshot's keying. Wording drift round-over-round ("missing chainId" vs "handler doesn't validate chainId") no longer orphans annotation blocks. I2: '→' (right-arrow U+2192) added to encodeAnnField. 
The header field-separator regex uses ' → ' to split issue from suggestion; without encoding, an LLM-generated issue containing '→' (natural in prose: "A → B then B → C") truncates at the first arrow. Encoded as an HTML entity to match the existing scheme. Decode order: the arrow entity before the ampersand entity. I3: '\n' and '\r' added to encodeAnnField (each encoded as an HTML character reference). 
Without this, a multi-line rationale ('line 1\n ROUND 1 RESOLUTION: fake') would have the embedded line picked up by ROUND_RESOLUTION_RE as a real RESOLUTION entry, overriding the genuine synth-written resolution. Also defends against CRLF line endings on Windows-written plan files. N4: ROUND_USER_RE, ROUND_RESOLUTION_RE, ROUND_REVIEWER_RE converted from module-level /g constants to factory functions (`getRoundUserRe()` etc). Each parser call gets a fresh regex with lastIndex=0, so concurrent parser invocations cannot smash each other's iterator state via the shared lastIndex. Defensive hardening; not currently triggered because Node is single-threaded and the call sites don't await between matches. Four new tests pin the invariants: - Merge across wording drift (location+severity, ignore issue) - '→' in issue round-trips without splitting - Newline injection in rationale doesn't forge RESOLUTION lines - CRLF in rationale round-trips Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(build/plan-review-loop): close adversarial review findings on loop semantics Claude adversarial review found 5 more issues in plan-review-loop.ts beyond what the Red Team caught. Fixes follow. I4 (IMPORTANT): isPriorAcceptedResolutionAttempt changed from .some() to last-entry semantics. Spec intent (per the computeConvergenceSnapshot comment) is most-recent-decision wins: a round-2 user-reject should override a round-1 user-accept for re-raise classification purposes. The old .some() returned true if ANY prior round had accept+resolved, incorrectly counting a "user changed their mind" sequence as a re-raise in round 3. Spec mismatch; fixed to check rounds[rounds.length-1] only. N1 (INFO): INTERRUPTED verdict wired into the user-quit / SIGINT path. The quitEarly branch in runPlanReviewLoop now writes a per-round history entry with verdict: "INTERRUPTED" before writing the convergence aggregate. 
The HistoryEntry verdict union previously declared "INTERRUPTED" as a possible value but no code path produced it. N3 (INFO): /^disputed\b/ regex made case-insensitive (i flag) so synth output drift ("Disputed", "DISPUTED") is still counted as a disputed resolution. Tuning signal robustness across model behavior changes. N5 (INFO): priorRejectRationale Map now hydrates from prior plan annotations on resume (startRound > 1). Without this, re-raise framing on resumed runs showed empty strings for the user's earlier rejection rationale even though the rationale was present in the plan file. N6 (INFO): atomicWriteFile tmp filename suffix extended with crypto.randomBytes(8) for cross-process collision safety. The pid+ms approach was unique within a single Node process but container-in-container scenarios could share both. N7 (empty rationale to undefined on round-trip) is owned by plan-reviewer.ts (parallel scope) — not addressed here. Three new tests cover: - last-round-wins re-raise classification (positive + negative) - case-insensitive 'Disputed' counted as disputed resolution - priorRejectRationale hydration on resume shows prior rationale Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(build/plan-review-loop, cli): close 3 Codex HIGH findings on loop semantics H2: synthFn now propagates the SubAgentResult's exit-code as `ok`, and runPlanReviewLoop checks the returned ok. A synth that hit a timeout, model-not-found error, or non-zero exit no longer silently masquerades as a successful round. Failure routes through the synth_failure exit reason added in c5e7b3e. H3: [c] Continue anyway at the adaptive bail gate now runs the synth before looping. Previously the `continue` statement jumped straight to the next round iteration, skipping synth dispatch entirely — the next paid reviewer round then reviewed the same unresolved plan with RESOLUTION: pending unchanged, virtually guaranteeing re-raises and burning the round. 
Extracted the synth-call into a helper so the normal-flow path and the continue path share one code path. H4: Resume past maxRounds (startRound > maxRounds, e.g., user re- launched after exiting manual mode at the cap) no longer falls through the empty for-loop to a `lastVerdict!` non-null assertion on a null value. Added a startRound > maxRounds guard that synthesizes an APPROVE verdict ("Resume past maxRounds; review skipped.") and returns with outcome: "approved", round: startRound - 1. Three new tests pin the invariants: - synthFn returning ok:false routes to synth_failure exit code 1 - [c] continue invokes synth before next reviewer round - resume with startRound > maxRounds doesn't crash Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(build/cli, plan-review-loop): close 3 Codex structured review P1/P2 findings Codex structured review (the formal --base main P0/P1/P2/P3 pass) found three issues that bypass the plan-review gate. All closed here. P1 #1: cli.ts:9935-9963 only handled exit codes 3 / 4 / 130 then fell through to `state.planReview = loopResult.finalVerdict; saveState; ...` and proceeded into implementation. exitCode 1 (synth_failure) silently bypassed the gate. Now exitCode 1 persists state.planReview with status: "synth_failure" and throws ExitError(1). The build stops; the resume gate (see next fix) picks it up. P1 #2: cli.ts:9814-9818 resume gate only re-ran the loop when state.planReview was missing or status === "critical_exit_pending". Status values "user_aborted" and "synth_failure" were treated as "already reviewed" — the next resume saw a REVISE verdict and proceeded to phases. Both pending statuses now route through the gate so resume picks up the loop as the docs promise. P2: plan-review-loop.ts:851 resume-past-maxRounds auto-approved with a synthetic APPROVE verdict purely based on history length. A user who typed `gstack-build resume` without actually editing the plan bypassed the review gate. 
Now runs ONE more verification reviewer call on the (potentially-edited) plan. APPROVE → proceed. REVISE with CRITICAL → exit STALEMATE (3); the user must edit further or pass --no-plan-review to explicitly override. The existing H4 test ("returns approved with synthetic verdict") was updated to assert the new behavior (one reviewer call, APPROVE result), and a companion H4-P2 test covers the REVISE-from-verification path that exits STALEMATE. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: post-ship documentation update for plan-review convergence (v1.40.5.0) - CHANGELOG.md: remove stray merge-artifact line (> > > > > > > origin/main) and fix stray # in Design spec bullet point in For contributors section - build/orchestrator/README.md: add plan-reviewer.ts, plan-review-loop.ts, drain-faults.ts, halt-event-helpers.ts, halt-events.ts to Architecture module map (all added on this branch, were missing from the module list) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…always-loaded) (garrytan#1806)

* feat(test): transcript-section-logger + ship-action fingerprint (T10)

Pure-analysis module over a SkillTestResult/NDJSON transcript:
- extractSectionReads(): which sections/*.md a run opened (post-carve check)
- extractShipActions(): observable action fingerprint (merge/test/bump/changelog/commit/push/pr) that works on the MONOLITH too, so a baseline captured before the carve can detect a sectioned-ship regression
- baseline read/write + compareShipActions() for baseline-first dogf(T10)

Baseline-first answers the Codex outside-voice critique that a logger in the same PR as the carve is post-failure telemetry without a pre-carve reference. 11 unit tests, all green. Paid monolith baseline capture runs separately.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(pipeline): section discovery + generation machinery (T9)

- discover-skills.ts: discoverSectionTemplates() scans <skill>/sections/*.md.tmpl
- gen-skill-docs.ts: extract resolvePlaceholders + applyHostRewrites + buildContext as shared helpers (processTemplate and the new processSectionTemplate both call them, so a sanitization/rewrite fix can't miss sections) [C1]
- processSectionTemplate: body-fragment generation (no frontmatter/catalog/voice), parent-skill TemplateContext (skillName pinned to parent, not 'sections', so appliesTo gating + tier behave identically), per-host output routing
- --host all now fails the build on ANY host failure, not just claude, so a stale external-host output can't slip the freshness gate [Codex outside-voice #9]

Inert until a skill is carved (no sections/ dirs exist yet). Refactor is output-neutral: gen:skill-docs --dry-run --host all reports 0 STALE. 5 discovery unit tests + 389 gen-skill-docs tests green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(setup): install sections/ for cherry-pick targets (claude + kiro) (T9)

Two install targets cherry-pick SKILL.md and would leave a carved skill's sections/ behind, 404ing a runtime 'Read sections/<name>.md':
- link_claude_skill_dirs: link the sections/ subdir via _link_or_copy (windows gets a fresh copy on every ./setup)
- kiro per-skill loop: sed-rewrite + copy each sections/* so paths resolve under ~/.kiro, not ~/.codex/~/.claude

codex/factory/opencode link the whole generated dir, so sections ride free. Addresses Codex outside-voice #4/#6 (runtime pathing landmine). Inert until a skill is carved. Static-tripwire test + windows-fallback invariant green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(ship): gstack-version-bump CLI — tested idempotency classify + write (T9)

Hybrid CLI extraction (CM1): the deterministic core of ship Step 12 becomes a tested CLI instead of bash prose the agent re-derives each run.
- classify: FRESH/ALREADY_BUMPED/DRIFT_STALE_PKG/DRIFT_UNEXPECTED from VERSION vs origin/<base>:VERSION vs package.json.version (pure reader)
- write: validated dual-write to VERSION + package.json (FRESH bump)
- repair: DRIFT_STALE_PKG sync, no re-bump

Bump-LEVEL choice + queue collision stay agent judgment; slot pick stays bin/gstack-next-version. This removes the re-bump-a-shipped-branch footgun from skippable prose into code that can't be skipped or misread. 15 tests (exhaustive state matrix + write/repair fs + real-git classify).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test(parity): sectioned-skill parity capability — guards the carve (T9)

Carved skills (skeleton + sections/*.md) need parity checks that see relocated content, or moving a phrase into a section reads as 'lost':
- readSkillForParity(): union skeleton + all sections/*.md
- checkSkillParity sectioned mode: content checks against the union; minBytes/maxSizeRatio against union bytes (total behavior preserved); maxSkeletonBytes asserts the always-loaded skeleton actually shrank. Lowering minBytes to fit a small skeleton would otherwise make the size floor toothless [Codex #12].

Built + tested BEFORE the carve so ship's invariant can flip to sectioned in the same commit it lands. Monolith path byte-identical (verified: pre-existing investigate 1.053 ratio drift fails the same with this change stashed). 7 sectioned-parity tests + existing parity tests green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* refactor(ship): carve into skeleton + on-demand sections (Claude) (T9)

ship/SKILL.md drops 167KB → 68.7KB (~59% of the always-loaded skill) by moving 8 prose-heavy steps into ship/sections/*.md, read on demand: tests, test-coverage, plan-completion, review-army, greptile, adversarial, changelog, pr-body. Step 12's version logic now calls the tested gstack-version-bump CLI instead of inline bash.

Claude-first (S2): {{SECTION:id}} emits a STOP-Read pointer on Claude (skeleton + generated section files) and INLINES the content on every other host, so external hosts keep the full monolith — verified factory at 162KB with no sections dir. {{SECTION_INDEX:ship}} renders the situation→section table from the PASSIVE manifest (CM2 / v2_PLAN.md:663); required-reads live only in test fixtures. Multi-pass resolve expands inlined sections' own resolvers.

Parity: ship invariant flipped to sectioned (union content checks + maxSkeletonBytes asserts the shrink). Carve-fallout fixed across gen-skill-docs/skill-validation/golden/plan-completion/garrytan#1539/size-budget tests via skeleton+sections union reads. Free suite green except the pre-existing investigate parity drift.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test(ship): manifest-consistency + context-parity + requiredReads helper (T9)

Free deterministic guards for the carve:
- required-reads.ts + unit test: assertRequiredReads(run, requiredFiles) — the mechanical layer-5 check that the agent Read the sections its situation needs (required set comes from the fixture, not the passive manifest)
- section-manifest-consistency: 3-tier orphan classification (generated orphan + hand-edited generated file → FAIL; manifest orphan → WARN per v2_PLAN.md) and pins the PASSIVE-manifest contract (no applies_when/required_for)
- template-context-parity: generated sections have zero unresolved placeholders and gated resolvers (ADVERSARIAL_STEP/CONFIDENCE_CALIBRATION/CHANGELOG_WORKFLOW) rendered — proving sections resolve with the parent skillName, not 'sections'

16 tests, all green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test(ship): section-loading E2E + idempotency CLI detection (T9)

- skill-e2e-ship-section-loading.test.ts (new, periodic): runs real /ship in plan mode against a fresh version-changing fixture and asserts the agent Read the required sections (review-army + changelog). Runs against the INSTALLED skill (~/.claude/skills/gstack/ship), not repo paths, so install-layout 404s surface [Codex outside-voice #5]. Layer-5 mechanical guard against silent section-skip.
- skill-e2e-ship-idempotency.test.ts: detection updated for the carve — Step 12 now runs gstack-version-bump classify (JSON "state":"ALREADY_BUMPED") instead of the inline bash echo (STATE: ALREADY_BUMPED). Accept both; add a gstack-version-bump-write re-bump regression signal.
- touchfiles: register ship-section-loading (periodic) + extend idempotency deps with bin/gstack-version-bump + scripts/resolvers/sections.ts.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test(ship): union-read redaction wiring test for the carve (T9)

main's PR-body redaction-at-sink lives in sections/pr-body.md.tmpl after the carve, not the skeleton template. Read skeleton + section templates union so the redaction-wiring assertions follow the relocated content. 9/9 green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* v1.54.0.0 feat: carve /ship into skeleton + on-demand sections (-59% always-loaded)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
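
The `classify` step of gstack-version-bump is described above only by its inputs and its four output states. A minimal sketch of how such a pure classifier might look is given below. The decision rules, the `VersionInputs` shape, and the function name `classifyBump` are assumptions made for illustration; the actual CLI may draw the state boundaries differently.

```typescript
// The four state names come from the commit message; everything else is assumed.
type BumpState =
  | "FRESH"
  | "ALREADY_BUMPED"
  | "DRIFT_STALE_PKG"
  | "DRIFT_UNEXPECTED";

// Hypothetical input shape: the three version strings the commit message names.
interface VersionInputs {
  version: string;     // contents of VERSION on the branch
  baseVersion: string; // contents of origin/<base>:VERSION
  pkgVersion: string;  // "version" field of package.json
}

// Pure function: no file or git access, so the whole state matrix is testable.
function classifyBump({ version, baseVersion, pkgVersion }: VersionInputs): BumpState {
  const branchBumped = version !== baseVersion;

  if (!branchBumped) {
    // Assumed rule: nothing bumped yet and both files agree means a fresh branch.
    return pkgVersion === version ? "FRESH" : "DRIFT_UNEXPECTED";
  }
  if (pkgVersion === version) {
    return "ALREADY_BUMPED"; // both files already carry the new version
  }
  if (pkgVersion === baseVersion) {
    return "DRIFT_STALE_PKG"; // VERSION bumped, package.json left behind
  }
  return "DRIFT_UNEXPECTED"; // package.json matches neither version
}
```

Under these assumed rules, a branch whose VERSION was bumped but whose package.json still carries the base version classifies as DRIFT_STALE_PKG, which is the case the `repair` subcommand is described as syncing without re-bumping.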

anbangr and others added 2 commits May 6, 2026 11:03

test: add build skill TDD gate

88370eb

Add a dedicated build-skill CI gate, expand the focused test command to the full orchestrator suite, and document the TDD lifecycle contract.

chore: release 1.26.7.0

9143090

Record the build skill TDD gate release. Co-Authored-By: OpenAI Codex <noreply@openai.com>

anbangr merged commit 73aea01 into main May 6, 2026

anbangr deleted the build-skill-tdd-coverage-gate branch May 6, 2026 03:16
