Fix Codex native hook relay after restart #87272

Merged
steipete merged 3 commits into main from codex/native-hook-restart-generation
May 27, 2026

@amknight amknight commented May 27, 2026

Member

Summary

Fixes #87331.

Fixes the Codex native hook relay generation mismatch that can make the first native tool call after a Gateway restart fail with "Native hook relay unavailable".

The root cause is that OpenClaw rotated the relay generation on every registration while resumed Codex app-server threads can briefly keep using the previous hook command. After a Gateway restart, the old generation was not available in process memory, so the resumed thread's first hook invocation was rejected as stale even though it targeted the same stable relay id.
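
To make the failure mode concrete, here is a minimal sketch of the pre-fix behavior described above. It is a simplified model, not OpenClaw's code: `InMemoryRelayRegistry`, `register`, `accepts`, and the counter-based `newGeneration` (a stand-in for a random generation id) are all hypothetical names chosen for illustration.

```typescript
// Hypothetical model of the pre-fix behavior; names are illustrative, not OpenClaw's API.
type RelayId = string;
type Generation = string;

let generationCounter = 0;
// Stand-in for a random generation id, kept deterministic for the example.
const newGeneration = (): Generation => `gen-${++generationCounter}`;

class InMemoryRelayRegistry {
  // Active generation per relay id; lost whenever the process restarts.
  private active = new Map<RelayId, Generation>();

  // Pre-fix behavior: every registration rotates the generation.
  register(relayId: RelayId): Generation {
    const generation = newGeneration();
    this.active.set(relayId, generation);
    return generation;
  }

  // A hook invocation is accepted only if it carries the active generation.
  accepts(relayId: RelayId, generation: Generation): boolean {
    return this.active.get(relayId) === generation;
  }
}

// Before the restart, the thread's hook command is built with the first generation.
const beforeRestart = new InMemoryRelayRegistry();
const cachedGeneration = beforeRestart.register("relay-1");

// After the restart, process memory is gone and re-registration rotates the generation.
const afterRestart = new InMemoryRelayRegistry();
const rotatedGeneration = afterRestart.register("relay-1");

// The resumed thread still sends the cached generation, so its first hook call is rejected
// even though the relay id is unchanged.
const firstHookAccepted = afterRestart.accepts("relay-1", cachedGeneration);
```

In this model the relay id stays stable across the restart, yet the cached generation no longer matches anything in memory, which mirrors the stale rejection reported in the summary.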

Changes

  • Persist the active Codex native hook relay generation in the Codex app-server thread binding.
  • Reuse that persisted generation when resuming an existing Codex thread after restart.
  • Add a bounded compatibility grace window for older bindings that do not yet have a persisted generation, so upgraded installs recover immediately.
  • Keep normal stale-generation rejection behavior outside that explicit grace window.
  • Add regression coverage for the direct relay bridge, resumed-thread generation reuse, and old binding compatibility.
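
As a rough illustration of the first three changes, the sketch below shows one way a thread binding could carry an optional generation and still parse older bindings that lack it. Only the field name `nativeHookRelayGeneration` comes from the review on this PR; the `ThreadBinding` shape, `serializeBinding`, and `parseBinding` are assumptions made for the example, not the actual session-binding code.

```typescript
// Hypothetical binding shape; only the field name nativeHookRelayGeneration
// is taken from the review text. Everything else is assumed.
interface ThreadBinding {
  threadId: string;
  // Absent on bindings written before this change (the "legacy" case).
  nativeHookRelayGeneration?: string;
}

function serializeBinding(binding: ThreadBinding): string {
  return JSON.stringify(binding);
}

// Parses a stored binding, tolerating legacy records without a generation.
// Returns undefined when the record is not a usable binding.
function parseBinding(raw: string): ThreadBinding | undefined {
  const value: unknown = JSON.parse(raw);
  if (typeof value !== "object" || value === null) return undefined;
  const record = value as Record<string, unknown>;
  const threadId = record["threadId"];
  if (typeof threadId !== "string") return undefined;
  const generation = record["nativeHookRelayGeneration"];
  // Ignore malformed values rather than trusting them as a generation.
  if (typeof generation === "string" && generation.length > 0) {
    return { threadId, nativeHookRelayGeneration: generation };
  }
  return { threadId };
}

// A binding written after the change round-trips with its generation.
const upgraded = parseBinding(
  serializeBinding({ threadId: "thread-1", nativeHookRelayGeneration: "gen-a" }),
);

// A legacy binding parses, but has no generation to reuse.
const legacy = parseBinding('{"threadId":"thread-2"}');
```

Keeping the field optional means an upgraded install can read its existing bindings unchanged; the missing generation is what would route such a binding to the compatibility grace path.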

Validation

  • node scripts/run-vitest.mjs src/agents/harness/native-hook-relay.test.ts extensions/codex/src/app-server/run-attempt.native-hook-relay.test.ts extensions/codex/src/app-server/session-binding.test.ts extensions/codex/src/app-server/thread-lifecycle.binding.test.ts --run
  • ./node_modules/.bin/oxfmt --check extensions/codex/src/app-server/attempt-startup.ts extensions/codex/src/app-server/native-hook-relay.ts extensions/codex/src/app-server/run-attempt-test-harness.ts extensions/codex/src/app-server/run-attempt.native-hook-relay.test.ts extensions/codex/src/app-server/run-attempt.ts extensions/codex/src/app-server/session-binding.test.ts extensions/codex/src/app-server/session-binding.ts extensions/codex/src/app-server/thread-lifecycle.ts src/agents/harness/native-hook-relay.test.ts src/agents/harness/native-hook-relay.ts
  • node scripts/run-tsgo.mjs -p test/tsconfig/tsconfig.core.test.json --incremental --tsBuildInfoFile .artifacts/tsgo-cache/core-test.tsbuildinfo
  • node scripts/run-tsgo.mjs -p test/tsconfig/tsconfig.extensions.test.json --incremental --tsBuildInfoFile .artifacts/tsgo-cache/extensions-test.tsbuildinfo

@openclaw-barnacle openclaw-barnacle Bot added size: M maintainer Maintainer-authored PR labels May 27, 2026

clawsweeper Bot commented May 27, 2026

Contributor

Codex review: needs maintainer review before merge. Reviewed May 27, 2026, 2:45 PM ET / 18:45 UTC.

Summary
The PR persists Codex native hook relay generations in app-server thread bindings, reuses them on resumed threads, adds a short legacy-binding mismatch grace window, and adds relay regression coverage.

PR surface: Source +112, Tests +259. Total +371 across 10 files.

Reproducibility: yes. Source inspection gives a high-confidence reproduction shape: current main rotates relay generations on registration and rejects cached hook commands with the old generation after resume. The linked reports provide concrete 2026.5.26 runtime symptoms, though I did not run a live Gateway restart in this read-only review.

Review metrics: 1 noteworthy metric.

  • Upgrade-sensitive relay state: 1 persisted binding field added, 1 legacy grace path added. The PR changes persisted Codex session state and temporarily relaxes relay generation fencing for upgraded installs, which maintainers should review before merge.

Merge readiness
Overall: 🐚 platinum hermit
Proof: 🌊 off-meta tidepool
Patch quality: 🐚 platinum hermit
Result: ready for maintainer review.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Rank-up moves:

  • Have a maintainer explicitly accept or revise the five-minute legacy generation-mismatch grace.
  • Add redacted live Gateway restart proof if maintainers want runtime evidence beyond targeted regression tests.

Mantis proof suggestion
A live Telegram direct Codex session is the reported user path, and a short transcript would materially prove the restart recovery for native tool calls. A maintainer can ask Mantis to capture proof by posting a new PR comment that starts with the OpenClaw Mantis account mention, followed by:

telegram live: verify a Telegram direct Codex session can run a native shell tool after a Gateway restart and resumed Codex app-server thread.

Risk before merge

  • The five-minute legacy-binding grace deliberately accepts a mismatched native hook relay generation, so maintainers need to accept that compatibility/security-boundary tradeoff before merge.
  • The provided proof is targeted tests, formatting, and type checks; I did not see a live Gateway restart plus resumed Codex app-server run artifact in the available context.

Maintainer options:

  1. Accept the bounded legacy grace (recommended)
    Maintainers can explicitly accept the five-minute mismatch grace as the upgrade compatibility path for old Codex app-server bindings and land the PR with the added regression coverage.
  2. Require live restart proof first
    Before landing, ask for a redacted live Gateway restart proof showing a resumed Codex session can run the first native tool call and still rejects stale generations outside the grace path.
  3. Narrow the security relaxation
    If the grace window is not acceptable, revise the patch to preserve restart recovery without accepting arbitrary mismatched generations during the bootstrap window.

Next step before merge
Manual review is appropriate because this protected maintainer PR fixes a P1 regression but needs a human decision on the bounded generation-mismatch grace path.

Security
Needs attention: The diff has no obvious supply-chain issue, but it intentionally relaxes native hook relay generation fencing for legacy bindings during a bounded grace window.

Review details

Best possible solution:

Land the focused fix after a maintainer accepts the bounded legacy grace path, preferably with redacted live restart proof if runtime confidence is needed beyond the regression tests.

Do we have a high-confidence way to reproduce the issue?

Yes, source inspection gives a high-confidence reproduction shape: current main rotates relay generations on registration and rejects cached hook commands with the old generation after resume. The linked reports provide concrete 2026.5.26 runtime symptoms, though I did not run a live Gateway restart in this read-only review.

Is this the best way to solve the issue?

Yes, the core repair direction is maintainable: persist the active generation for resumed threads, rotate on fresh-thread fallback, and cover old bindings with a bounded grace path. The grace-window tradeoff is the part that needs explicit maintainer acceptance.

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 9755241b56f0.

Label changes

Label changes:

  • add P1: The PR addresses a 2026.5.26 regression that can block native Codex shell/file tools across turns after a Gateway restart.
  • add merge-risk: 🚨 compatibility: The patch changes persisted Codex app-server binding state and upgrade behavior for old bindings without nativeHookRelayGeneration.
  • add merge-risk: 🚨 security-boundary: The patch intentionally permits mismatched relay generations during a bounded bootstrap grace window.
  • add status: ⏳ waiting on author: ClawSweeper has contributor-facing work open and is waiting for author action. Not applicable: This is a MEMBER/maintainer-labeled PR, so the external contributor real-behavior proof gate does not apply; the PR body lists targeted test, format, and typecheck commands but no live restart artifact.
  • remove status: 👀 ready for maintainer look: Current PR status label is status: ⏳ waiting on author.

Label justifications:

  • P1: The PR addresses a 2026.5.26 regression that can block native Codex shell/file tools across turns after a Gateway restart.
  • merge-risk: 🚨 compatibility: The patch changes persisted Codex app-server binding state and upgrade behavior for old bindings without nativeHookRelayGeneration.
  • merge-risk: 🚨 security-boundary: The patch intentionally permits mismatched relay generations during a bounded bootstrap grace window.
  • rating: 🐚 platinum hermit: Overall readiness is 🐚 platinum hermit; proof is 🌊 off-meta tidepool and patch quality is 🐚 platinum hermit.
  • status: ⏳ waiting on author: ClawSweeper has contributor-facing work open and is waiting for author action. Not applicable: This is a MEMBER/maintainer-labeled PR, so the external contributor real-behavior proof gate does not apply; the PR body lists targeted test, format, and typecheck commands but no live restart artifact.
Evidence reviewed

PR surface:

Source +112, Tests +259. Total +371 across 10 files.

View PR surface stats
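| Area | Files | Added | Removed | Net |
| --- | --- | --- | --- | --- |
| Source | 7 | 137 | 25 | +112 |
| Tests | 3 | 259 | 0 | +259 |
| Docs | 0 | 0 | 0 | 0 |
| Config | 0 | 0 | 0 | 0 |
| Generated | 0 | 0 | 0 | 0 |
| Other | 0 | 0 | 0 | 0 |
| Total | 10 | 396 | 25 | +371 |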

Security concerns:

  • [medium] Generation mismatch grace needs maintainer acceptance — src/agents/harness/native-hook-relay.ts:545
    During the grace window, a hook command with a mismatched generation is accepted for the same relay id; that is a deliberate compatibility tradeoff but it weakens the stale-registration fence while active.
    Confidence: 0.86
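
For readers weighing this tradeoff, the following sketch shows the general shape of a bounded grace check. It is not the code at the cited line: `Registration`, `legacyGraceUntilMs`, `checkHook`, and the verdict names are hypothetical. The five-minute bound is taken from this review.

```typescript
// Five-minute bound, as described in this review.
const LEGACY_GRACE_MS = 5 * 60 * 1000;

// Hypothetical registration record; field names are assumptions.
interface Registration {
  relayId: string;
  generation: string;
  // Set only for registrations created from a legacy binding
  // that had no persisted generation.
  legacyGraceUntilMs?: number;
}

type Verdict =
  | "accepted"
  | "accepted-legacy-grace"
  | "rejected-stale"
  | "rejected-unknown-relay";

function checkHook(
  reg: Registration,
  relayId: string,
  generation: string,
  nowMs: number,
): Verdict {
  // The relay id must always match; the grace never widens this check.
  if (reg.relayId !== relayId) return "rejected-unknown-relay";
  if (reg.generation === generation) return "accepted";
  // Mismatched generation: tolerated only inside the explicit grace window.
  if (reg.legacyGraceUntilMs !== undefined && nowMs < reg.legacyGraceUntilMs) {
    return "accepted-legacy-grace";
  }
  return "rejected-stale";
}

// A legacy registration created at t = 0 gets a grace deadline.
const legacyReg: Registration = {
  relayId: "relay-1",
  generation: "gen-new",
  legacyGraceUntilMs: 0 + LEGACY_GRACE_MS,
};

// A normal registration has no grace deadline at all.
const normalReg: Registration = { relayId: "relay-1", generation: "gen-new" };
```

The sketch makes the weakened fence visible: while the deadline is in the future, any generation is accepted for the matching relay id, which is the behavior maintainers are being asked to accept or narrow.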

Acceptance criteria:

  • node scripts/run-vitest.mjs src/agents/harness/native-hook-relay.test.ts extensions/codex/src/app-server/run-attempt.native-hook-relay.test.ts extensions/codex/src/app-server/session-binding.test.ts extensions/codex/src/app-server/thread-lifecycle.binding.test.ts --run
  • ./node_modules/.bin/oxfmt --check extensions/codex/src/app-server/attempt-startup.ts extensions/codex/src/app-server/native-hook-relay.ts extensions/codex/src/app-server/run-attempt-test-harness.ts extensions/codex/src/app-server/run-attempt.native-hook-relay.test.ts extensions/codex/src/app-server/run-attempt.ts extensions/codex/src/app-server/session-binding.test.ts extensions/codex/src/app-server/session-binding.ts extensions/codex/src/app-server/thread-lifecycle.ts src/agents/harness/native-hook-relay.test.ts src/agents/harness/native-hook-relay.ts
  • node scripts/run-tsgo.mjs -p test/tsconfig/tsconfig.core.test.json --incremental --tsBuildInfoFile .artifacts/tsgo-cache/core-test.tsbuildinfo
  • node scripts/run-tsgo.mjs -p test/tsconfig/tsconfig.extensions.test.json --incremental --tsBuildInfoFile .artifacts/tsgo-cache/extensions-test.tsbuildinfo

What I checked:

Likely related people:

  • vincentkoc: Blame and -S history on current main point the random relay generation and strict stale-generation rejection to the current imported native-hook-relay implementation around the latest release state. (role: introduced behavior in current relay implementation; confidence: medium; commits: 46f5905498dc, 10ad3aa16068; files: src/agents/harness/native-hook-relay.ts, extensions/codex/src/app-server/run-attempt.ts)
  • steipete: Recent current-main history split the Codex app-server startup seams, and the PR timeline/commits show follow-up changes guarding generation reuse and fresh-thread rotation. (role: recent app-server lifecycle contributor and PR follow-up owner; confidence: high; commits: a4c2e7f5cf1b, 668590b0e8f6, b337411d2ddd; files: extensions/codex/src/app-server/attempt-startup.ts, extensions/codex/src/app-server/run-attempt.ts, extensions/codex/src/app-server/thread-lifecycle.ts)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@clawsweeper clawsweeper Bot added rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. labels May 27, 2026

clawsweeper Bot commented May 27, 2026

Contributor

ClawSweeper PR egg

🔥 Warming up: real-behavior proof passed; findings, security review, or rank-up moves are still in progress.

Hatch command

Comment @clawsweeper hatch when this PR is hatchable.

Hatchability rules:

  • Merged PRs are hatchable.
  • Open PRs are hatchable when they are status: 👀 ready for maintainer look, status: 🚀 automerge armed, or labeled clawsweeper:automerge.
  • Closed unmerged PRs are hatchable only when one of those hatchable labels is still present in the durable record.
What is this egg doing here?
  • Eggs appear after the PR passes real-behavior proof. It is here for vibes, not verdicts: it does not change labels, ratings, merge decisions, or automation.
  • The shell reacts to review momentum: open follow-up work warms it up, re-review makes it wobble, and a clean final review lets it hatch.
  • Hatchability usually comes from sufficient real-behavior proof, no blocking P0/P1/P2 findings, no security attention needed, and clean correctness. A merged PR is already final, so merge makes the egg hatchable independently.
  • The hatch is seeded from this repository and PR number, so the same PR keeps the same creature; the reviewed head SHA can only change safe visual details.
  • Rarity is just collectible sparkle: 🥚 common, 🌱 uncommon, 💎 rare, ✨ glimmer, and 🌈 legendary.

@steipete steipete force-pushed the codex/native-hook-restart-generation branch from 2ab2be0 to b337411 Compare May 27, 2026 18:35
@openclaw-barnacle openclaw-barnacle Bot added agents Agent runtime and tooling extensions: codex labels May 27, 2026
@RomneyDa RomneyDa marked this pull request as ready for review May 27, 2026 18:37
@clawsweeper clawsweeper Bot added status: ⏳ waiting on author ClawSweeper has contributor-facing work open and is waiting for author action. P1 High-priority user-facing bug, regression, or broken workflow. merge-risk: 🚨 compatibility 🚨 May break existing users, config, migrations, defaults, or upgrade paths. merge-risk: 🚨 security-boundary 🚨 May affect sandboxing, authorization, credentials, or sensitive data. and removed status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. labels May 27, 2026
@steipete

Contributor

Verification before merge:

Behavior addressed: Codex native hook relay generations are now persisted only for real resumed app-server threads; fresh-thread fallback rotates generation so stale hook subprocesses from abandoned threads remain rejected.

Real environment tested: local OpenClaw source checkout on macOS, plus GitHub PR checks on head b337411.

Exact steps or command run after this patch:

  • node scripts/run-vitest.mjs src/agents/harness/native-hook-relay.test.ts extensions/codex/src/app-server/run-attempt.native-hook-relay.test.ts extensions/codex/src/app-server/session-binding.test.ts extensions/codex/src/app-server/thread-lifecycle.binding.test.ts --run
  • ./node_modules/.bin/oxfmt --check extensions/codex/src/app-server/attempt-startup.ts extensions/codex/src/app-server/native-hook-relay.ts extensions/codex/src/app-server/run-attempt-test-harness.ts extensions/codex/src/app-server/run-attempt.native-hook-relay.test.ts extensions/codex/src/app-server/run-attempt.ts extensions/codex/src/app-server/session-binding.test.ts extensions/codex/src/app-server/session-binding.ts extensions/codex/src/app-server/thread-lifecycle.ts src/agents/harness/native-hook-relay.test.ts src/agents/harness/native-hook-relay.ts
  • node scripts/run-tsgo.mjs -p test/tsconfig/tsconfig.core.test.json --incremental --tsBuildInfoFile .artifacts/tsgo-cache/core-test.tsbuildinfo
  • node scripts/run-tsgo.mjs -p test/tsconfig/tsconfig.extensions.test.json --incremental --tsBuildInfoFile .artifacts/tsgo-cache/extensions-test.tsbuildinfo
  • /Users/steipete/Projects/agent-scripts/skills/autoreview/scripts/autoreview --mode branch --base origin/main

Evidence after fix: Vitest passed 5 files / 182 tests; oxfmt passed; both tsgo probes passed; autoreview reported no accepted/actionable findings. PR checks passed for Real behavior proof, dependency-change-awareness, selected Security High jobs, Scan changed paths, Opengrep OSS, Socket, actionlint, label, no-tabs, and auto-response. CodeQL aggregate is neutral because selected shards were skipped.

Observed result after fix: resumed Codex app-server threads reuse the persisted relay generation; legacy pre-generation bindings get the bounded grace path; invalidated bindings and resume-failure fallbacks rotate generation and reject stale hook calls.
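
The three outcomes above can be summarized as a small decision function. This is a sketch under assumptions, not the code in thread-lifecycle.ts: `StartOutcome`, `GenerationPlan`, and `planGeneration` are hypothetical names, and issuing a fresh generation for a legacy binding is an assumption about how the grace path is entered.

```typescript
// Hypothetical description of how an attempt started.
type StartOutcome =
  | { kind: "resumed"; persistedGeneration?: string }
  // Covers invalidated bindings and resume-failure fallbacks.
  | { kind: "fresh-thread" };

interface GenerationPlan {
  generation: string;
  // True only for resumed legacy bindings with no persisted generation.
  legacyGrace: boolean;
}

function planGeneration(
  outcome: StartOutcome,
  rotate: () => string,
): GenerationPlan {
  if (outcome.kind === "resumed") {
    if (outcome.persistedGeneration !== undefined) {
      // Real resumed thread: reuse the generation its hook command already carries.
      return { generation: outcome.persistedGeneration, legacyGrace: false };
    }
    // Legacy binding: nothing to reuse, so (by assumption) issue a new
    // generation and allow the bounded grace path.
    return { generation: rotate(), legacyGrace: true };
  }
  // Fresh thread: rotate so hook subprocesses from abandoned threads stay rejected.
  return { generation: rotate(), legacyGrace: false };
}

// Deterministic stand-in for a random generation id.
let planCounter = 0;
const rotateForExample = (): string => `gen-${++planCounter}`;

const resumedPlan = planGeneration(
  { kind: "resumed", persistedGeneration: "gen-persisted" },
  rotateForExample,
);
const legacyPlan = planGeneration({ kind: "resumed" }, rotateForExample);
const freshPlan = planGeneration({ kind: "fresh-thread" }, rotateForExample);
```

Keeping the grace flag tied to the resumed-legacy branch is what keeps the relaxation narrow in this model: a fresh-thread fallback never inherits it.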

What was not tested: full release validation and live production Gateway restart were not rerun locally for this narrow PR.

Labels

  • agents: Agent runtime and tooling
  • extensions: codex
  • maintainer: Maintainer-authored PR
  • merge-risk: 🚨 compatibility: 🚨 May break existing users, config, migrations, defaults, or upgrade paths.
  • merge-risk: 🚨 security-boundary: 🚨 May affect sandboxing, authorization, credentials, or sensitive data.
  • P1: High-priority user-facing bug, regression, or broken workflow.
  • rating: 🐚 platinum hermit: Good normal PR readiness with ordinary maintainer review expected.
  • size: M
  • status: ⏳ waiting on author: ClawSweeper has contributor-facing work open and is waiting for author action.

Development

Successfully merging this pull request may close these issues.

5.26 regression: "Native hook relay unavailable" after relay re-register due to generation UUID staleness

2 participants