
fix(build): force Step 3 after monitor exit 0 (FINALIZATION_REQUIRED) #26

Merged: anbangr merged 10 commits into main from fix/step-transition-clean on May 11, 2026

anbangr (Owner) commented May 11, 2026

Summary

Fixes the build skill's M3.5 step-transition bug: when `gstack-build --skip-ship` exits 0, the agent stops instead of proceeding to Step 3 (Final Ship & Completion). Branches stay unshipped and plans stay unarchived.

Root cause: After `exit "$_MONITOR_EXIT"` in the M3.5 bash block, no compaction-resistant instruction directed the agent to Step 3. In long builds, once the context had been compacted, the agent inferred "done" from the exit 0.

Fix (4 changes + tests):

  • Change 1 — Compaction-resistant printf in M3.5 block: Fires on exit 0/13 and appears as a tool result in active context, surviving compaction.
  • Change 2 — MANDATORY prose after M3.5 block: Explicit "do NOT stop" instruction between the bash block and the `---` separator.
  • Change 3 — ALWAYS RUN callout at Step 3 header: Mandatory marker so agents resuming mid-skill can't miss it.
  • Change 4 — Exit code 13 (FINALIZATION_REQUIRED): Monitor exits 13 instead of 0 when features land at `origin_verified`. Structurally stronger than prose — agent can't infer "done" from a non-zero exit code. Analytics treats 13 as "success" (not failure).
  • Tests: New periodic E2E eval (`test/skill-e2e-build-step-transition.test.ts`) + integration tests updated with `--no-plan-review` on `--skip-ship` invocations (prevents LLM calls in non-API test runs).
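
The idea behind Changes 1 and 4 can be sketched as a small mapping from the monitor's exit code to the directive the agent must see next. This is a minimal illustration only: the real change is a bash `printf` in the M3.5 block, and the function name `finalizationDirective` and the message wording below are assumptions, not taken from the repository.

```typescript
// Hypothetical sketch: names and message wording are illustrative assumptions.
const ALL_RUNS_COMPLETE = 0; // monitor finished all runs
const FINALIZATION_REQUIRED = 13; // features landed at origin_verified, Step 3 still pending

// Returns the directive to print for exit codes that must lead to Step 3,
// or null for any other exit code (handling of those is out of scope here).
function finalizationDirective(exitCode: number): string | null {
  if (exitCode === ALL_RUNS_COMPLETE || exitCode === FINALIZATION_REQUIRED) {
    return (
      `MANDATORY: monitor exited ${exitCode}. Do NOT stop. ` +
      `Proceed to Step 3 (Final Ship & Completion).`
    );
  }
  return null;
}

const directive = finalizationDirective(FINALIZATION_REQUIRED);
if (directive !== null) {
  console.log(directive);
}
```

Printing the directive as command output, rather than relying on surrounding prose, is what the description means by a tool result that survives compaction.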

Test plan

  • All 12 integration tests pass (`bun test build/orchestrator/tests/integration.test.ts`)
  • `bun test` mandatory suite passes
  • Exit code 13 fires when `--skip-ship` leaves features at `origin_verified`
  • Exit code 13 logs as "success" in activity analytics
  • Skill template regenerated via `bun run gen:skill-docs`
  • New E2E eval wired in `test/helpers/touchfiles.ts` (periodic tier)

🤖 Generated with Claude Code

anbangr and others added 10 commits May 11, 2026 17:21
Add writing/experiment/research/manual phase support to the orchestrator
parser, plan-mutator, and plan-reviewer. Non-code phases skip TDD loops
and go directly through a content-review gate before being marked done.

Update review/ship SKILL.md templates to surface the content-review gate
and ensure plan-verification covers qa-only skill changes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…TION_REQUIRED, eval)

After gstack-build monitor exits 0 (ALL_RUNS_COMPLETE) or 13
(FINALIZATION_REQUIRED), agents were stopping without running Step 3:
Final Ship & Completion, leaving branches unmerged and plans unarchived.

Three-layer defense:
1. Compaction-resistant bash printf in M3.5 block — survives context compaction
   as a tool result visible in active context
2. MANDATORY prose block after M3.5 fence — explicit direction to Step 3
3. ALWAYS RUN callout at Step 3 header — second reminder at the destination

Also adds exit code 13 (FINALIZATION_REQUIRED) to the exit code table and
a periodic LLM-judge eval (skill-e2e-build-step-transition) that verifies
agents correctly proceed to Step 3 after monitor exit.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…addition

The review resolver now includes content-review in the dashboard skill list.
Update the gen-skill-docs test assertions to match the new ordering:
  plan-eng-review, review, content-review, plan-design-review

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
New capability (non-code phase kinds, step-transition guardrails) and
concurrent-build fix warrant a MINOR bump per fork versioning convention.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The skill-md.test.ts assertions hardcoded the version string to verify
TDD changes are present. After the v1.22.0 bump these checks failed.
Update to match the current version so the same guard continues to work.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
When --skip-ship leaves all features at origin_verified, gstack-build
exits with code 13 (FINALIZATION_REQUIRED). Two integration tests were
asserting exit 0 — a pre-existing failure on main. Update both assertions
to expect 13, with comments explaining the code meaning.

Also treat exit 13 as "success" in the activity log so skip-ship sessions
don't show up as failures in dashboards/retros.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
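
A minimal sketch of the analytics rule described above, assuming a simple two-way classification. The names `activityOutcome` and `SUCCESS_EXIT_CODES` are hypothetical, and treating every other non-zero code as a failure is an assumption; the commit only states that 13 is logged as "success".

```typescript
// Hypothetical sketch: names are illustrative, not the repository's API.
type Outcome = "success" | "failure";

// 0 = ALL_RUNS_COMPLETE, 13 = FINALIZATION_REQUIRED (both count as success).
const SUCCESS_EXIT_CODES = new Set<number>([0, 13]);

// Assumption: any exit code outside the set is recorded as a failure.
function activityOutcome(exitCode: number): Outcome {
  return SUCCESS_EXIT_CODES.has(exitCode) ? "success" : "failure";
}

console.log(activityOutcome(13)); // "success"
```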
When all features reach origin_verified but --skip-ship prevents the
ship step, exit with code 13 (FINALIZATION_REQUIRED) instead of 0.
Exit 0 would signal "done" to the orchestrating skill, but Step 3
(ship + archive) is still required — the user must explicitly ship.

This pairs with the integration test update (expect 13 not 0).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
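
The exit-code decision in this commit can be illustrated roughly as follows. The `Feature` shape and the function `exitCodeForRun` are hypothetical, and returning 0 in every other case is a simplification; the real monitor presumably has further exit codes that are not covered here.

```typescript
// Hypothetical sketch: the Feature shape and function name are assumptions.
interface Feature {
  name: string;
  status: string; // e.g. "origin_verified"
}

const FINALIZATION_REQUIRED = 13;

// Exit 13 when every feature is origin_verified but --skip-ship suppressed
// the ship step; otherwise fall back to 0 (simplified for illustration).
function exitCodeForRun(features: Feature[], skipShip: boolean): number {
  const allVerified =
    features.length > 0 &&
    features.every((f) => f.status === "origin_verified");
  return skipShip && allVerified ? FINALIZATION_REQUIRED : 0;
}

const features: Feature[] = [
  { name: "feature-a", status: "origin_verified" },
  { name: "feature-b", status: "origin_verified" },
];
console.log(exitCodeForRun(features, true)); // 13
```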
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Regenerate build/SKILL.md from template after step-transition and
  content-review gate changes
- Add --no-plan-review to integration test invocations that use
  --skip-ship and --skip-clean-check to prevent LLM calls in CI

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… regex

- Add buildKindInstructions(phase: Phase): string[] to cli.ts — exported
  function for kind-specific implementation prompts (writing/experiment/
  research/manual/code). Tested by cli.test.ts:3714-3760.
- Fix extractCoverageTarget regex to support decimal targets like ≥90.5%.
  Previous pattern (\d+) truncated at the decimal point.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
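
The regex fix can be illustrated with a small comparison. This is a sketch only: the two patterns and the helper `parseCoverageTarget` are assumptions written for illustration, not the actual `extractCoverageTarget` implementation.

```typescript
// Hypothetical sketch: patterns and helper are illustrative assumptions.
const OLD_PATTERN = /≥\s*(\d+)/; // stops at the decimal point
const NEW_PATTERN = /≥\s*(\d+(?:\.\d+)?)\s*%/; // optional fractional part

// Returns the numeric target captured by the pattern, or null if no match.
function parseCoverageTarget(text: string, pattern: RegExp): number | null {
  const match = pattern.exec(text);
  return match ? Number(match[1]) : null;
}

console.log(parseCoverageTarget("coverage ≥90.5%", OLD_PATTERN)); // 90
console.log(parseCoverageTarget("coverage ≥90.5%", NEW_PATTERN)); // 90.5
```

The optional non-capturing group `(?:\.\d+)?` keeps integer targets such as `≥80%` working while accepting decimals.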