fix(codex): share native hook relay state#86340
Conversation
|
Thanks for the context here. I swept through the related work, and this is now duplicate or superseded. Keep open for merge review. The branch addresses a real Codex native hook relay bug with source-aligned shared state, stale-generation protection, focused coverage, and sufficient exact-head Crabbox proof; I found no blocking correctness or security finding. The remaining merge question is maintainer acceptance of the hidden relay compatibility behavior plus normal required CI. Canonical path: Close this PR as superseded by #73950. So I’m closing this here and keeping the remaining discussion on #73950. Review detailsBest possible solution: Close this PR as superseded by #73950. Do we have a high-confidence way to reproduce the issue? Yes at source level: current main still keeps relay state module-local and throws the reported relay-not-found error when the active id is missing from that module instance. I did not rerun the live current-main Telegram/Slack scenario, but the linked issue and comments provide repeated live logs. Is this the best way to solve the issue? Yes, the proposed direction is the narrow maintainable fix for the duplicate-module root cause: share relay-owned state once and require per-registration generation for stale stable ids. The only caveat is the deliberate fail-closed handling for generationless hidden commands across upgrades. Security review: Security review cleared: The security-sensitive native hook relay and approval path was reviewed; the diff tightens stale relay invocation and introduces no dependency, workflow, secret, package, or supply-chain change. AGENTS.md: found and applied where relevant. What I checked:
Likely related people:
Codex review notes: model gpt-5.5, reasoning high; reviewed against a5d5604198c3. |
|
ClawSweeper applied the proposed close for this PR.
|
|
@clawsweeper reopen For maintainer context after the automated duplicate/superseded closure: I don’t think #73950 fully supersedes this branch in its current state. #73950 is still the original shared-registry direction, but as of this check it is stale/conflicting and its own ClawSweeper review says it needs changes before merge: rebase, include the This PR’s branch was built from current
If maintainers prefer #73950 as the canonical PR, the actionable path is probably to carry this branch’s additional generation/stale-id and app-server/gateway/CLI coverage into #73950. Otherwise, #86340 is the current complete branch I’d recommend reviewing for the fix. |
Problem
Codex native hook relays can be registered in one module instance and invoked from another module instance in the same process. When those instances each keep their own in-memory relay maps, OpenClaw can install Codex hook commands but later fail to find the active relay for the same run. A separate race also lets a late hook command for an old run reuse the stable relay id after a replacement run has registered.
Fixes #73723 and replaces the abandoned approach in #73950.
Change and value
This PR moves native hook relay registry state onto a process-global symbol so duplicate module instances share the same relay, bridge, invocation, and permission-approval state. It also adds a per-registration generation token to newly generated hidden hook commands and validates that token through the direct bridge, gateway fallback, and Codex app-server approval bridge.
The result is that current hook commands still reach the active relay, while stale commands fail closed instead of invoking a replacement registration with the same stable relay id.
Who is affected, and who is not
hooks relayremains a hidden internal command.buildNativeHookRelayCommand()and the hidden CLI still tolerate omitted generations so older already-written commands fail through the normal unavailable path rather than crashing at option parsing.Why now
#73723 is still open, and #73950 has gone stale. Recent cleanup work fixed one pending-unregister race, but it did not make duplicate module instances share relay state or prevent late stale commands from crossing into replacement registrations.
Implementation
Symbol.for("openclaw.nativeHookRelay.state")object.generationto active registrations while keeping the exported handle field optional for SDK/source compatibility.--generationin newly generated hidden hook commands.nativeHook.invoke, and the Codex app-server approval bridge's in-process relay invocation.Verification
git diff --check origin/main...HEAD./node_modules/.bin/oxfmt --check --threads=1 extensions/codex/src/app-server/approval-bridge.ts extensions/codex/src/app-server/approval-bridge.test.ts extensions/codex/src/app-server/native-hook-relay.test.ts src/agents/harness/native-hook-relay.ts src/agents/harness/native-hook-relay.test.ts src/cli/hooks-cli.ts src/cli/native-hook-relay-cli.ts src/cli/native-hook-relay-cli.test.ts src/gateway/server-methods/native-hook-relay.ts src/gateway/server-methods/native-hook-relay.test.tsorigin/main: no actionable findings.cbx_9c0044dd81ae, slugbrisk-hermit, GCPus-east1-b, Spote2-standard-8, no local sync.openclaw/openclaw, fetchedKaspre/openclawbranchfix/native-hook-relay-shared-state, checked out exact head2a37ac2fc5945c7a3020dfa92c47625ea51dbd67.src/agents/harness/native-hook-relay.test.ts,src/cli/native-hook-relay-cli.test.ts,src/gateway/server-methods/native-hook-relay.test.ts,extensions/codex/src/app-server/native-hook-relay.test.ts,extensions/codex/src/app-server/approval-bridge.test.ts(118tests across4Vitest shards).pnpm check:changedpassed on the same exact head.Real behavior proof
Behavior or issue addressed: Codex native hook relay state remains available across duplicate module instances, and a stale relay command from an old stable relay-id registration cannot invoke the replacement registration.
Real environment tested: Crabbox GCP Spot VM (
us-east1-b,e2-standard-8) from a fresh GitHub clone, checking out pushed fork head2a37ac2fc5945c7a3020dfa92c47625ea51dbd67.Exact steps or command run after this patch: On the Crabbox VM, cloned
https://github.com/openclaw/openclaw.git, fetchedhttps://github.com/Kaspre/openclaw.git fix/native-hook-relay-shared-state, checked outFETCH_HEAD, installed Node24.15.0andpnpm@11.0.8, then ran anode --import tsx --input-type=module -e ...script that importedsrc/agents/harness/native-hook-relay.tstwice, registered a relay in one module instance, invoked it from the duplicate module instance, replaced the same stable relay id, attempted a stale-generation bridge invocation, and then invoked the replacement generation. The same VM then ran the focused Vitest files andpnpm check:changed.Evidence after fix: Terminal capture from lease
cbx_9c0044dd81ae:Observed result after fix: The duplicate module successfully invoked the current relay (
exitCode=0), the stale generation was rejected withnative hook relay bridge stale registration, the replacement generation invoked successfully (exitCode=0), focused changed-path tests passed, andpnpm check:changedexited0.What was not tested: No changed-path behavior gap remains. The full unrelated repository test suite is left to GitHub CI; this PR ran live relay proof, focused changed-path tests, and
check:changedon the exact pushed head.AI Assistance
I used Codex to prepare the patch and verification.