Skip to content

codex + Apple Silicon hardening wave (v0.18.4.0)#1056

Merged
garrytan merged 5 commits into
mainfrom
garrytan/fix-codex-and-apple
Apr 18, 2026
Merged

codex + Apple Silicon hardening wave (v0.18.4.0)#1056
garrytan merged 5 commits into
mainfrom
garrytan/fix-codex-and-apple

Conversation

@garrytan

Copy link
Copy Markdown
Owner

Summary

v0.18.4.0: hardening wave for /codex + /autoplan + Apple Silicon first-run install. Lands 2 community PRs as-is + 4 fresh fixes + 4 expansion items. All 478 free-tier tests pass; paid E2E running.

What's in the box

Community cherry-picks (as-is per "own-the-fixes" memory rule):

Fresh fixes:

  • Timeout wrapper: every codex exec/codex review runs under a gtimeout → timeout → unwrapped fallback chain. On exit 124, auto-logs telemetry + operational learning + surfaces actionable error message. Guards against model-API stalls fix: prevent codex exec stdin deadlock with </dev/null redirect #972 doesn't cover.
  • Multi-signal auth gate: env-based ($CODEX_API_KEY, $OPENAI_API_KEY) OR file-based (${CODEX_HOME:-~/.codex}/auth.json). Rejects false negatives for CI/platform-engineer users that file-only rejected. Fails fast before expensive prompt construction.
  • Challenge mode stderr capture: 2>/dev/null2>$TMPERR + auth-error grep + JSON stream completeness check.
  • Plan-review counterbalance: /plan-ceo-review + /plan-eng-review no longer quietly bias toward minimal-diff. "Right-sized diff" framing with explicit permission to recommend rewrites when warranted.

Accepted expansions:

  • Codex CLI version warning: anchored regex (^|[^0-9.])0\.120\.(0|1|2)([^0-9.]|$) flags known-bad versions (0.120.0-0.120.2 — the range that shipped the stdin deadlock fix: prevent codex exec stdin deadlock with </dev/null redirect #972 fixes). No false positives on 0.120.10 / 0.120.20.
  • Failure telemetry: codex_timeout, codex_auth_failed, codex_cli_missing, codex_version_warning events land in ~/.gstack/analytics/skill-usage.jsonl behind existing opt-in. Payload schema verified to exclude prompt content / env values / auth tokens.
  • Auto-logged learnings: on timeout (exit 124), invokes gstack-learnings-log with type=operational. Future /investigate sessions surface prior hang patterns automatically.
  • Dual-voice E2E regression test (periodic tier): asserts both Claude subagent and Codex voices produce output in /autoplan Phase 1, or gracefully marks [codex-unavailable].

DX polish (from /plan-devex-review):

  • ./setup auto-installs coreutils on macOS (Darwin, arch-agnostic) so gtimeout is available without asking users to run brew install coreutils. Opt-out via GSTACK_SKIP_COREUTILS=1.
  • Step 9 verification exercises the full Playwright stack (./browse/dist/browse goto https://example.com), not just --version.
  • Timeout error messages name common causes + actionable next step.

Shared helper: bin/gstack-codex-probe

Single bash file with _gstack_codex_auth_probe, _gstack_codex_version_check, _gstack_codex_timeout_wrapper, _gstack_codex_log_event, _gstack_codex_log_hang. Namespace-prefixed (_gstack_codex_*), no shell-option side effects (unit test enforces). Templates source it via the existing ~/.claude/skills/gstack/bin/... pattern; host configs (hosts/codex.ts, hosts/factory.ts, etc.) rewrite the path via pathRewrites. When a second backend lands (Gemini CLI), this is the file to extend.

Test coverage

  • 25 deterministic unit tests in test/codex-hardening.test.ts:
    • 8 auth probe combinations (env precedence, whitespace, alternate $CODEX_HOME, corrupt file paths)
    • 10 version regex cases (0.120.10 false-positive guards, v-prefixed/multiline output)
    • 4 timeout wrapper + namespace hygiene (bash -n, gtimeout preference, set-option leak check)
    • 3 telemetry payload schema (confirms env/auth never leak into events)
  • 1 periodic-tier E2E in test/skill-e2e-autoplan-dual-voice.test.ts: gates the /autoplan dual-voice path.

All 478 free-tier tests pass. Paid bun run test:e2e running in background against this branch.

Review provenance

9 cross-model tensions surfaced by 2 Codex outside-voice rounds; all 9 user-resolved. CEO + DX + Eng reviews all CLEARED. Full plan + review report: ~/.claude/plans/system-instruction-you-are-working-temporal-star.md.

Test plan

  • bun test — 478/478 pass (free, <10s)
  • Probe smoke test: auth probe returns AUTH_OK, version check silent on 0.116.0, timeout wrapper passes through
  • Code signing verified: all 4 binaries codesign-valid on Apple Silicon dev box (codesign --verify --strict ...)
  • bun run test:e2e — running in background (94 tests, ~30-45 min, ~$4-8)
  • Manual smoke: /codex consult "say hi" completes in <30s without hang
  • Manual smoke: rm ~/.codex/auth.json && /codex consult "x" → fails fast with auth gate message
  • Manual smoke on fresh Apple Silicon Mac: git clone && ./setup && ./browse/dist/browse goto https://example.com works first try

🤖 Generated with Claude Code

voidborne-d and others added 5 commits April 18, 2026 11:38
On some Apple Silicon machines, Bun's --compile produces a corrupt or
linker-only code signature. macOS kills these binaries with SIGKILL
(exit 137, zsh: killed) before they execute a single instruction.

Add a post-build codesign step to setup that runs only on Darwin arm64:
1. Remove the corrupt/linker-only signature (required — a direct re-sign
   fails with 'invalid or unsupported format for signature')
2. Apply a fresh ad-hoc signature

The step is idempotent, costs <1s, and is what Bun's own docs recommend
for distributed standalone executables. All four compiled binaries are
covered: browse, find-browse, design, and gstack-global-discover.
Failure is a non-fatal warning so Intel/CI builds are unaffected.

Fixes #997
codex CLI 0.120.0+ blocks indefinitely when stdin is a non-TTY pipe
(Claude Code Bash tool, background bash, CI). The CLI sees a non-TTY
stdin and waits for EOF to append it as a <stdin> block, even when the
prompt is passed as a positional argument.

Fix: add < /dev/null to every codex exec and codex review invocation
in the source-of-truth files (scripts/resolvers/*.ts and *.md.tmpl).
Generated SKILL.md files will be produced by bun run gen:skill-docs
in a subsequent commit (Tension D: template+resolver only, generator
is authoritative, not cherry-picked artifacts).

Affected source files (16 total invocations):
- scripts/resolvers/review.ts (4)
- scripts/resolvers/design.ts (3)
- codex/SKILL.md.tmpl (5)
- autoplan/SKILL.md.tmpl (4)

Fixes #971

Co-Authored-By: loning <loning@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Hardens /codex and /autoplan against silent failures surfaced by the #972
stdin fix and #1003 Apple Silicon codesign. Six-layer defense:

1. **Multi-signal auth probe** (new Step 0.5 / Phase 0.5): env-based auth
   ($CODEX_API_KEY, $OPENAI_API_KEY) OR file-based auth
   (${CODEX_HOME:-~/.codex}/auth.json). Rejects false negatives that the
   old file-only check produced for CI / platform-engineer users.

2. **Timeout wrapper** around every codex exec / codex review invocation:
   gtimeout → timeout → unwrapped fallback chain. On exit 124, surfaces
   common causes + actionable next step. Guards against model-API stalls
   not covered by the #972 stdin fix.

3. **Stderr capture in Challenge mode** (codex/SKILL.md.tmpl:208):
   2>/dev/null → 2>$TMPERR. Post-invocation grep for auth/login/unauthorized
   surfaces errors that were previously dropped silently.

4. **Completeness check** in the Python JSON parser: tracks turn.completed
   events and warns on zero (possible mid-stream disconnect).

5. **Version warning** for known-bad Codex CLI (0.120.0-0.120.2, the range
   that introduced the stdin deadlock #972 fixes). Anchored regex
   `(^|[^0-9.])0\.120\.(0|1|2)([^0-9.]|$)` prevents 0.120.10 / 0.120.20
   false positives.

6. **Failure telemetry + operational learnings**: codex_timeout,
   codex_auth_failed, codex_cli_missing, codex_version_warning events
   land in ~/.gstack/analytics/skill-usage.jsonl behind the existing
   telemetry opt-in. On timeout (exit 124), auto-logs an operational
   learning via gstack-learnings-log so future /investigate sessions
   surface prior hang patterns automatically.

**Shared helper** (bin/gstack-codex-probe): consolidates all four pieces
(auth probe, version check, timeout wrapper, telemetry logger) into one
bash file that /codex and /autoplan source. Namespace-prefixed
(_gstack_codex_*) with a unit test that verifies sourcing does not leak
shell options into the caller. pathRewrites in host configs rewrite
~/.claude/skills/gstack → $GSTACK_ROOT for Codex, $GSTACK_BIN for
Factory/Cursor/etc.

**Apple Silicon coreutils auto-install** (setup:264): macOS lacks GNU
timeout by default; Homebrew's coreutils installs it as gtimeout to
avoid shadowing BSD utilities. ./setup now auto-installs coreutils on
Darwin (arch-agnostic — applies to Intel + Apple Silicon) when neither
gtimeout nor timeout is present. Opt-out via GSTACK_SKIP_COREUTILS=1
for CI, managed machines, or offline envs.

**25 deterministic unit tests** (test/codex-hardening.test.ts):
- 8 auth probe combinations (env precedence, whitespace, alternate
  $CODEX_HOME, corrupt file paths)
- 10 version regex cases including 0.120.10 false-positive guards
  and v-prefixed / multiline output
- 4 timeout wrapper + namespace hygiene (bash -n, gtimeout
  preference, set-option leak check)
- 3 telemetry payload schema checks (confirms env values + auth
  tokens never leak into emitted events)

**1 periodic-tier E2E** (test/skill-e2e-autoplan-dual-voice.test.ts):
gates the /autoplan dual-voice path — asserts both Claude subagent
and Codex voices produce output in Phase 1, OR that [codex-unavailable]
is logged when Codex is absent. ~\$1/run, not a CI gate.

Golden baseline + gen-skill-docs exclusion list updated for the new
codex path references and the 16 < /dev/null redirects from #972.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…fault)

/plan-ceo-review and /plan-eng-review listed "minimal diff" as an
engineering preference without counterbalancing language. Reviewers
picked up on that and rejected rewrites that should have been approved.

The preference is now framed as "right-sized diff" with explicit
permission to recommend a rewrite when the existing foundation is
broken. Implementation alternatives section in CEO review gets an
equal-weight clarification: don't default to minimal viable just
because it is smaller. Recommend whichever best serves the user's
goal; if the right answer is a rewrite, say so.

Three-line tone edit per template, no voice / ETHOS / YC / promotional
content change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Apple Silicon codesign fix (#1003 @voidborne-d)
- Codex stdin deadlock fix (#972 @loning)
- Codex timeout wrapper (gtimeout → timeout → unwrapped fallback)
- Multi-signal auth gate for /codex + /autoplan
- Codex version warning for known-bad CLI (0.120.0-0.120.2)
- Challenge mode stderr capture + completeness check
- Plan-review right-sized diff counterbalance
- Failure telemetry + auto-log timeout as operational learning
- 25 deterministic unit tests + dual-voice periodic E2E

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions

Copy link
Copy Markdown

E2E Evals: ✅ PASS

61/61 tests passed | $6.80 total cost | 12 parallel runners

Suite Result Status Cost
e2e-browse 7/7 $0.33
e2e-deploy 6/6 $1.02
e2e-design 3/3 $0.41
e2e-plan 7/7 $1.06
e2e-qa-workflow 3/3 $1.09
e2e-review 6/6 $1.91
e2e-workflow 4/4 $0.48
llm-judge 25/25 $0.5

12x ubicloud-standard-2 (Docker: pre-baked toolchain + deps) | wall clock ≈ slowest suite

@garrytan garrytan merged commit 9ec4ab7 into main Apr 18, 2026
18 checks passed
garrytan added a commit that referenced this pull request Apr 18, 2026
…ening

Resolves conflicts in VERSION (kept 0.19.0.0), package.json (kept 0.19.0.0),
and CHANGELOG.md (preserved 0.19.0.0 at top, inserted 0.18.4.0 below).

Main brought v0.18.4.0's codex + Apple Silicon hardening wave (PR #1056):
- Apple Silicon ad-hoc codesigning in ./setup (fixes SIGKILL on first run)
- /codex stdin deadlock fix (redirect from /dev/null)
- /codex + /autoplan preflight auth + version checks
- 10-minute timeout wrapper via gtimeout/timeout
- New bin/gstack-codex-probe consolidates auth/version/timeout logic
- test/codex-hardening.test.ts (25 unit tests, gate tier)
- test/setup-codesign.test.ts
- test/skill-e2e-autoplan-dual-voice.test.ts (periodic tier)

Auto-merged SKILL.md.tmpl updates across autoplan, codex, plan-ceo-review,
plan-eng-review, and the scripts/resolvers/{design,review}.ts modules.
None conflicted with v0.19's preamble refactor or new benchmark-models
skill — clean 3-way merge.

Regenerated all SKILL.md files. Ship golden fixtures refreshed for
claude/codex/factory hosts. 423 tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
garrytan added a commit that referenced this pull request Apr 18, 2026
Conflicts resolved:
- VERSION / package.json: keep 1.0.0.0 (our v1 milestone bump from
  commit 18fc95c, supersedes main's 0.18.4.0).
- CHANGELOG.md: preserved both entries in chronological order — v1.0.0.0
  (writing-style + throughput receipts) above v0.18.4.0 (codex + Apple
  Silicon hardening). No gaps in sequence: 1.0.0.0 → 0.19.0.0 → 0.18.4.0
  → 0.18.3.0 → ...

Main added v0.18.4.0 (#1056): codex + Apple Silicon hardening wave.

Regenerated all SKILL.md files after merge so they pick up both main's
template changes AND our v1 writing-style + question-tuning additions.

Full free test suite: 1288 pass, 0 fail, 113 skip across 37 files, 8555
expect() calls.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
RyotaKun pushed a commit to RyotaKun/gstack that referenced this pull request May 18, 2026
* fix: ad-hoc codesign compiled binaries on Apple Silicon after build

On some Apple Silicon machines, Bun's --compile produces a corrupt or
linker-only code signature. macOS kills these binaries with SIGKILL
(exit 137, zsh: killed) before they execute a single instruction.

Add a post-build codesign step to setup that runs only on Darwin arm64:
1. Remove the corrupt/linker-only signature (required — a direct re-sign
   fails with 'invalid or unsupported format for signature')
2. Apply a fresh ad-hoc signature

The step is idempotent, costs <1s, and is what Bun's own docs recommend
for distributed standalone executables. All four compiled binaries are
covered: browse, find-browse, design, and gstack-global-discover.
Failure is a non-fatal warning so Intel/CI builds are unaffected.

Fixes garrytan#997

* fix: prevent codex exec stdin deadlock with </dev/null redirect

codex CLI 0.120.0+ blocks indefinitely when stdin is a non-TTY pipe
(Claude Code Bash tool, background bash, CI). The CLI sees a non-TTY
stdin and waits for EOF to append it as a <stdin> block, even when the
prompt is passed as a positional argument.

Fix: add < /dev/null to every codex exec and codex review invocation
in the source-of-truth files (scripts/resolvers/*.ts and *.md.tmpl).
Generated SKILL.md files will be produced by bun run gen:skill-docs
in a subsequent commit (Tension D: template+resolver only, generator
is authoritative, not cherry-picked artifacts).

Affected source files (16 total invocations):
- scripts/resolvers/review.ts (4)
- scripts/resolvers/design.ts (3)
- codex/SKILL.md.tmpl (5)
- autoplan/SKILL.md.tmpl (4)

Fixes garrytan#971

Co-Authored-By: loning <loning@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: codex/autoplan hardening + Apple Silicon coreutils auto-install

Hardens /codex and /autoplan against silent failures surfaced by the garrytan#972
stdin fix and garrytan#1003 Apple Silicon codesign. Six-layer defense:

1. **Multi-signal auth probe** (new Step 0.5 / Phase 0.5): env-based auth
   ($CODEX_API_KEY, $OPENAI_API_KEY) OR file-based auth
   (${CODEX_HOME:-~/.codex}/auth.json). Rejects false negatives that the
   old file-only check produced for CI / platform-engineer users.

2. **Timeout wrapper** around every codex exec / codex review invocation:
   gtimeout → timeout → unwrapped fallback chain. On exit 124, surfaces
   common causes + actionable next step. Guards against model-API stalls
   not covered by the garrytan#972 stdin fix.

3. **Stderr capture in Challenge mode** (codex/SKILL.md.tmpl:208):
   2>/dev/null → 2>$TMPERR. Post-invocation grep for auth/login/unauthorized
   surfaces errors that were previously dropped silently.

4. **Completeness check** in the Python JSON parser: tracks turn.completed
   events and warns on zero (possible mid-stream disconnect).

5. **Version warning** for known-bad Codex CLI (0.120.0-0.120.2, the range
   that introduced the stdin deadlock garrytan#972 fixes). Anchored regex
   `(^|[^0-9.])0\.120\.(0|1|2)([^0-9.]|$)` prevents 0.120.10 / 0.120.20
   false positives.

6. **Failure telemetry + operational learnings**: codex_timeout,
   codex_auth_failed, codex_cli_missing, codex_version_warning events
   land in ~/.gstack/analytics/skill-usage.jsonl behind the existing
   telemetry opt-in. On timeout (exit 124), auto-logs an operational
   learning via gstack-learnings-log so future /investigate sessions
   surface prior hang patterns automatically.

**Shared helper** (bin/gstack-codex-probe): consolidates all four pieces
(auth probe, version check, timeout wrapper, telemetry logger) into one
bash file that /codex and /autoplan source. Namespace-prefixed
(_gstack_codex_*) with a unit test that verifies sourcing does not leak
shell options into the caller. pathRewrites in host configs rewrite
~/.claude/skills/gstack → $GSTACK_ROOT for Codex, $GSTACK_BIN for
Factory/Cursor/etc.

**Apple Silicon coreutils auto-install** (setup:264): macOS lacks GNU
timeout by default; Homebrew's coreutils installs it as gtimeout to
avoid shadowing BSD utilities. ./setup now auto-installs coreutils on
Darwin (arch-agnostic — applies to Intel + Apple Silicon) when neither
gtimeout nor timeout is present. Opt-out via GSTACK_SKIP_COREUTILS=1
for CI, managed machines, or offline envs.

**25 deterministic unit tests** (test/codex-hardening.test.ts):
- 8 auth probe combinations (env precedence, whitespace, alternate
  $CODEX_HOME, corrupt file paths)
- 10 version regex cases including 0.120.10 false-positive guards
  and v-prefixed / multiline output
- 4 timeout wrapper + namespace hygiene (bash -n, gtimeout
  preference, set-option leak check)
- 3 telemetry payload schema checks (confirms env values + auth
  tokens never leak into emitted events)

**1 periodic-tier E2E** (test/skill-e2e-autoplan-dual-voice.test.ts):
gates the /autoplan dual-voice path — asserts both Claude subagent
and Codex voices produce output in Phase 1, OR that [codex-unavailable]
is logged when Codex is absent. ~\$1/run, not a CI gate.

Golden baseline + gen-skill-docs exclusion list updated for the new
codex path references and the 16 < /dev/null redirects from garrytan#972.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: plan-review right-sized diff counterbalance (not minimal-diff default)

/plan-ceo-review and /plan-eng-review listed "minimal diff" as an
engineering preference without counterbalancing language. Reviewers
picked up on that and rejected rewrites that should have been approved.

The preference is now framed as "right-sized diff" with explicit
permission to recommend a rewrite when the existing foundation is
broken. Implementation alternatives section in CEO review gets an
equal-weight clarification: don't default to minimal viable just
because it is smaller. Recommend whichever best serves the user's
goal; if the right answer is a rewrite, say so.

Three-line tone edit per template, no voice / ETHOS / YC / promotional
content change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* release: v0.18.4.0 — codex + Apple Silicon hardening wave

- Apple Silicon codesign fix (garrytan#1003 @voidborne-d)
- Codex stdin deadlock fix (garrytan#972 @loning)
- Codex timeout wrapper (gtimeout → timeout → unwrapped fallback)
- Multi-signal auth gate for /codex + /autoplan
- Codex version warning for known-bad CLI (0.120.0-0.120.2)
- Challenge mode stderr capture + completeness check
- Plan-review right-sized diff counterbalance
- Failure telemetry + auto-log timeout as operational learning
- 25 deterministic unit tests + dual-voice periodic E2E

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: voidborne-d <voidborne-d@users.noreply.github.com>
Co-authored-by: loning <loning@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
mathiasmora2232 pushed a commit to mathiasmora2232/gstack that referenced this pull request May 30, 2026
* fix: ad-hoc codesign compiled binaries on Apple Silicon after build

On some Apple Silicon machines, Bun's --compile produces a corrupt or
linker-only code signature. macOS kills these binaries with SIGKILL
(exit 137, zsh: killed) before they execute a single instruction.

Add a post-build codesign step to setup that runs only on Darwin arm64:
1. Remove the corrupt/linker-only signature (required — a direct re-sign
   fails with 'invalid or unsupported format for signature')
2. Apply a fresh ad-hoc signature

The step is idempotent, costs <1s, and is what Bun's own docs recommend
for distributed standalone executables. All four compiled binaries are
covered: browse, find-browse, design, and gstack-global-discover.
Failure is a non-fatal warning so Intel/CI builds are unaffected.

Fixes garrytan#997

* fix: prevent codex exec stdin deadlock with </dev/null redirect

codex CLI 0.120.0+ blocks indefinitely when stdin is a non-TTY pipe
(Claude Code Bash tool, background bash, CI). The CLI sees a non-TTY
stdin and waits for EOF to append it as a <stdin> block, even when the
prompt is passed as a positional argument.

Fix: add < /dev/null to every codex exec and codex review invocation
in the source-of-truth files (scripts/resolvers/*.ts and *.md.tmpl).
Generated SKILL.md files will be produced by bun run gen:skill-docs
in a subsequent commit (Tension D: template+resolver only, generator
is authoritative, not cherry-picked artifacts).

Affected source files (16 total invocations):
- scripts/resolvers/review.ts (4)
- scripts/resolvers/design.ts (3)
- codex/SKILL.md.tmpl (5)
- autoplan/SKILL.md.tmpl (4)

Fixes garrytan#971

Co-Authored-By: loning <loning@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: codex/autoplan hardening + Apple Silicon coreutils auto-install

Hardens /codex and /autoplan against silent failures surfaced by the garrytan#972
stdin fix and garrytan#1003 Apple Silicon codesign. Six-layer defense:

1. **Multi-signal auth probe** (new Step 0.5 / Phase 0.5): env-based auth
   ($CODEX_API_KEY, $OPENAI_API_KEY) OR file-based auth
   (${CODEX_HOME:-~/.codex}/auth.json). Rejects false negatives that the
   old file-only check produced for CI / platform-engineer users.

2. **Timeout wrapper** around every codex exec / codex review invocation:
   gtimeout → timeout → unwrapped fallback chain. On exit 124, surfaces
   common causes + actionable next step. Guards against model-API stalls
   not covered by the garrytan#972 stdin fix.

3. **Stderr capture in Challenge mode** (codex/SKILL.md.tmpl:208):
   2>/dev/null → 2>$TMPERR. Post-invocation grep for auth/login/unauthorized
   surfaces errors that were previously dropped silently.

4. **Completeness check** in the Python JSON parser: tracks turn.completed
   events and warns on zero (possible mid-stream disconnect).

5. **Version warning** for known-bad Codex CLI (0.120.0-0.120.2, the range
   that introduced the stdin deadlock garrytan#972 fixes). Anchored regex
   `(^|[^0-9.])0\.120\.(0|1|2)([^0-9.]|$)` prevents 0.120.10 / 0.120.20
   false positives.

6. **Failure telemetry + operational learnings**: codex_timeout,
   codex_auth_failed, codex_cli_missing, codex_version_warning events
   land in ~/.gstack/analytics/skill-usage.jsonl behind the existing
   telemetry opt-in. On timeout (exit 124), auto-logs an operational
   learning via gstack-learnings-log so future /investigate sessions
   surface prior hang patterns automatically.

**Shared helper** (bin/gstack-codex-probe): consolidates all four pieces
(auth probe, version check, timeout wrapper, telemetry logger) into one
bash file that /codex and /autoplan source. Namespace-prefixed
(_gstack_codex_*) with a unit test that verifies sourcing does not leak
shell options into the caller. pathRewrites in host configs rewrite
~/.claude/skills/gstack → $GSTACK_ROOT for Codex, $GSTACK_BIN for
Factory/Cursor/etc.

**Apple Silicon coreutils auto-install** (setup:264): macOS lacks GNU
timeout by default; Homebrew's coreutils installs it as gtimeout to
avoid shadowing BSD utilities. ./setup now auto-installs coreutils on
Darwin (arch-agnostic — applies to Intel + Apple Silicon) when neither
gtimeout nor timeout is present. Opt-out via GSTACK_SKIP_COREUTILS=1
for CI, managed machines, or offline envs.

**25 deterministic unit tests** (test/codex-hardening.test.ts):
- 8 auth probe combinations (env precedence, whitespace, alternate
  $CODEX_HOME, corrupt file paths)
- 10 version regex cases including 0.120.10 false-positive guards
  and v-prefixed / multiline output
- 4 timeout wrapper + namespace hygiene (bash -n, gtimeout
  preference, set-option leak check)
- 3 telemetry payload schema checks (confirms env values + auth
  tokens never leak into emitted events)

**1 periodic-tier E2E** (test/skill-e2e-autoplan-dual-voice.test.ts):
gates the /autoplan dual-voice path — asserts both Claude subagent
and Codex voices produce output in Phase 1, OR that [codex-unavailable]
is logged when Codex is absent. ~\$1/run, not a CI gate.

Golden baseline + gen-skill-docs exclusion list updated for the new
codex path references and the 16 < /dev/null redirects from garrytan#972.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: plan-review right-sized diff counterbalance (not minimal-diff default)

/plan-ceo-review and /plan-eng-review listed "minimal diff" as an
engineering preference without counterbalancing language. Reviewers
picked up on that and rejected rewrites that should have been approved.

The preference is now framed as "right-sized diff" with explicit
permission to recommend a rewrite when the existing foundation is
broken. Implementation alternatives section in CEO review gets an
equal-weight clarification: don't default to minimal viable just
because it is smaller. Recommend whichever best serves the user's
goal; if the right answer is a rewrite, say so.

Three-line tone edit per template, no voice / ETHOS / YC / promotional
content change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* release: v0.18.4.0 — codex + Apple Silicon hardening wave

- Apple Silicon codesign fix (garrytan#1003 @voidborne-d)
- Codex stdin deadlock fix (garrytan#972 @loning)
- Codex timeout wrapper (gtimeout → timeout → unwrapped fallback)
- Multi-signal auth gate for /codex + /autoplan
- Codex version warning for known-bad CLI (0.120.0-0.120.2)
- Challenge mode stderr capture + completeness check
- Plan-review right-sized diff counterbalance
- Failure telemetry + auto-log timeout as operational learning
- 25 deterministic unit tests + dual-voice periodic E2E

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: voidborne-d <voidborne-d@users.noreply.github.com>
Co-authored-by: loning <loning@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant