Fix Codex native hook relay after restart #87272
Codex review: needs maintainer review before merge.
Reviewed May 27, 2026, 2:45 PM ET / 18:45 UTC.
Summary
PR surface: Source +112, Tests +259. Total +371 across 10 files.
Reproducibility: yes. Source inspection gives a high-confidence reproduction shape: current main rotates relay generations on registration and rejects cached hook commands with the old generation after resume. The linked reports provide concrete 2026.5.26 runtime symptoms, though I did not run a live Gateway restart in this read-only review.
Review metrics: 1 noteworthy metric.
Merge readiness
Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.
Rank-up moves:
Mantis proof suggestion
Risk before merge
Maintainer options:
Next step before merge
Security
Review details
Best possible solution: Land the focused fix after a maintainer accepts the bounded legacy grace path, preferably with redacted live restart proof if runtime confidence is needed beyond the regression tests.
Do we have a high-confidence way to reproduce the issue? Yes, source inspection gives a high-confidence reproduction shape: current main rotates relay generations on registration and rejects cached hook commands with the old generation after resume. The linked reports provide concrete 2026.5.26 runtime symptoms, though I did not run a live Gateway restart in this read-only review.
Is this the best way to solve the issue? Yes, the core repair direction is maintainable: persist the active generation for resumed threads, rotate on fresh-thread fallback, and cover old bindings with a bounded grace path. The grace-window tradeoff is the part that needs explicit maintainer acceptance.
AGENTS.md: found and applied where relevant.
Codex review notes: model gpt-5.5, reasoning high; reviewed against 9755241b56f0.
Label changes:
Label justifications:
Evidence reviewed
PR surface: Source +112, Tests +259. Total +371 across 10 files.
Security concerns:
Acceptance criteria:
What I checked:
Likely related people:
What the crustacean ranks mean
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.
ClawSweeper PR egg 🔥 Warming up: real-behavior proof passed; findings, security review, or rank-up moves are still in progress.
2ab2be0 to b337411
Verification before merge:
Behavior addressed: Codex native hook relay generations are now persisted only for real resumed app-server threads; fresh-thread fallback rotates generation so stale hook subprocesses from abandoned threads remain rejected.
Real environment tested: local OpenClaw source checkout on macOS, plus GitHub PR checks on head b337411.
Exact steps or command run after this patch:
Evidence after fix: Vitest passed 5 files / 182 tests; oxfmt passed; both tsgo probes passed; autoreview reported no accepted/actionable findings. PR checks passed for Real behavior proof, dependency-change-awareness, selected Security High jobs, Scan changed paths, Opengrep OSS, Socket, actionlint, label, no-tabs, and auto-response. CodeQL aggregate is neutral because selected shards were skipped.
Observed result after fix: resumed Codex app-server threads reuse the persisted relay generation; legacy pre-generation bindings get the bounded grace path; invalidated bindings and resume-failure fallbacks rotate generation and reject stale hook calls.
What was not tested: full release validation and live production Gateway restart were not rerun locally for this narrow PR.
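For readers who want the observed result as a decision table, here is a minimal TypeScript sketch of the three cases: reuse on resume, bounded grace for legacy bindings, rotation otherwise. Every name in it (`planRelay`, `acceptsHook`, `Binding`, `RelayPlan`, `LEGACY_GRACE_MS`) and the 30-second window are hypothetical illustrations, not identifiers or values taken from this patch.

```typescript
// Hypothetical sketch: names and the grace bound are assumptions, not the PR's code.
type StartKind = "resumed" | "fresh-fallback";

interface Binding {
  // Absent on legacy bindings written before generations were persisted.
  generation?: string;
}

interface RelayPlan {
  generation: string;
  // Until this timestamp (ms), a hook call with a non-matching generation is tolerated.
  legacyGraceUntil?: number;
}

const LEGACY_GRACE_MS = 30_000; // assumed bound for illustration only

function planRelay(
  kind: StartKind,
  binding: Binding | undefined,
  now: number,
  mint: () => string,
): RelayPlan {
  // Resumed thread with a persisted generation: reuse it unchanged.
  if (kind === "resumed" && binding?.generation) {
    return { generation: binding.generation };
  }
  // Resumed thread with a legacy binding: mint a generation, allow a short grace window.
  if (kind === "resumed" && binding) {
    return { generation: mint(), legacyGraceUntil: now + LEGACY_GRACE_MS };
  }
  // Fresh-thread fallback or no binding: rotate, with no grace for stale callers.
  return { generation: mint() };
}

function acceptsHook(plan: RelayPlan, hookGeneration: string, now: number): boolean {
  if (hookGeneration === plan.generation) return true;
  return plan.legacyGraceUntil !== undefined && now <= plan.legacyGraceUntil;
}
```

The point of the sketch is that only the legacy branch ever accepts a non-matching generation, and only until the window closes, which is the tradeoff the review asks a maintainer to accept explicitly.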
Summary
Fixes #87331.
Fixes the Codex native hook relay generation mismatch that can make the first native tool call after a Gateway restart fail with "Native hook relay unavailable".
The root cause is that OpenClaw rotated the relay generation on every registration, while resumed Codex app-server threads can briefly keep using the previous hook command. After a Gateway restart, the old generation was no longer available in process memory, so the resumed thread's first hook invocation was rejected as stale even though it targeted the same stable relay id.
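The mismatch can be reproduced in miniature with an in-memory registry. The sketch below is a simplified model written for this description: `RelaySketch`, its methods, and the string generation format are hypothetical and do not mirror the actual code in `native-hook-relay.ts`.

```typescript
// Illustrative model only: these names are hypothetical, not OpenClaw's API.
type Generation = string;

class RelaySketch {
  // In-memory only, so a new instance stands in for a restarted Gateway process.
  private readonly active = new Map<string, Generation>();
  private readonly processLabel: string;
  private counter = 0;

  constructor(processLabel: string) {
    this.processLabel = processLabel;
  }

  // Passing `persisted` reuses a stored generation; omitting it rotates to a new one.
  register(relayId: string, persisted?: Generation): Generation {
    const generation = persisted ?? `${this.processLabel}-${++this.counter}`;
    this.active.set(relayId, generation);
    return generation;
  }

  // A hook call is accepted only if it carries the currently active generation.
  accepts(relayId: string, generation: Generation): boolean {
    return this.active.get(relayId) === generation;
  }
}

// Before the restart, the thread's hook command is built with this generation.
const beforeRestart = new RelaySketch("boot1");
const cachedGeneration = beforeRestart.register("relay-A");

// Old behavior: the restarted process rotates on registration,
// so the resumed thread's cached hook command no longer matches.
const rotating = new RelaySketch("boot2");
rotating.register("relay-A");
const rejectedAfterRotate = !rotating.accepts("relay-A", cachedGeneration);

// Repaired behavior: the resumed thread re-registers with the persisted generation.
const resuming = new RelaySketch("boot2");
resuming.register("relay-A", cachedGeneration);
const acceptedAfterResume = resuming.accepts("relay-A", cachedGeneration);

console.log({ rejectedAfterRotate, acceptedAfterResume });
```

In this model the relay id stays stable across both processes; only the generation differs, which is why persisting it for resumed threads is enough to let the first hook call through.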
Changes
Validation
node scripts/run-vitest.mjs src/agents/harness/native-hook-relay.test.ts extensions/codex/src/app-server/run-attempt.native-hook-relay.test.ts extensions/codex/src/app-server/session-binding.test.ts extensions/codex/src/app-server/thread-lifecycle.binding.test.ts --run
./node_modules/.bin/oxfmt --check extensions/codex/src/app-server/attempt-startup.ts extensions/codex/src/app-server/native-hook-relay.ts extensions/codex/src/app-server/run-attempt-test-harness.ts extensions/codex/src/app-server/run-attempt.native-hook-relay.test.ts extensions/codex/src/app-server/run-attempt.ts extensions/codex/src/app-server/session-binding.test.ts extensions/codex/src/app-server/session-binding.ts extensions/codex/src/app-server/thread-lifecycle.ts src/agents/harness/native-hook-relay.test.ts src/agents/harness/native-hook-relay.ts
node scripts/run-tsgo.mjs -p test/tsconfig/tsconfig.core.test.json --incremental --tsBuildInfoFile .artifacts/tsgo-cache/core-test.tsbuildinfo
node scripts/run-tsgo.mjs -p test/tsconfig/tsconfig.extensions.test.json --incremental --tsBuildInfoFile .artifacts/tsgo-cache/extensions-test.tsbuildinfo