[Bug]: Native Hook Relay — Stale Relay After Gateway Restart/Drain #89325

@NexusLoom

Description

Bug type

Behavior bug (incorrect output/state without crash)

Beta release blocker

No

Summary

After any gateway restart or drain, the main Codex session loses all native shell tools (exec, read, etc.) and fails with "Native hook relay unavailable". Only another openclaw gateway restart recovers the session, and the failure recurs. Sub-sessions are unaffected because they create fresh relay descriptors.

Steps to reproduce

NOT_ENOUGH_INFO

Expected behavior

NOT_ENOUGH_INFO

Actual behavior

NOT_ENOUGH_INFO

OpenClaw version

2026.5.26

Operating system

Windows 10 WSL2

Install method

docker

Model

openai/gpt-5.5 codex runtime

Provider / routing chain

openclaw > openai through oauth

Additional provider/model setup details

No response

Logs, screenshots, and evidence

Impact and severity

Affected: me
Severity: annoying and blocks workflows
Frequency: previously every time after a gateway restart. The local workaround seems to have resolved it, but the patches will be lost on the next openclaw update.
Consequence: Agent has to spin up a fresh sub-agent for basic tasks.

Additional information

Verbatim output from agent:

Bug: Native Hook Relay — Stale Relay After Gateway Restart/Drain
OpenClaw 2026.5.26 · Node v24.15.0 · WSL2

Summary
After any gateway restart or drain, the main Codex session loses all native shell tools (exec, read, etc) with Native hook relay unavailable. Only a further openclaw gateway restart recovers it, and it recurs. Sub-sessions are unaffected because they create fresh relay descriptors.

Root cause — two related issues:

Issue A — Stale descriptor survival:
openclaw-clean-stale-native-relays only quarantines descriptors whose PID is dead. Because of PID reuse, a descriptor for the old gateway can survive pointing at the wrong process. The check needs to verify the live PID is actually an OpenClaw gateway, not any process.
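A minimal sketch of the stricter check described in Issue A, assuming a Linux or WSL2 host where /proc is available. The names `GATEWAY_MARKER`, `classifyRelayPid`, and `shouldQuarantine`, the descriptor's `pid` field, and the assumption that the gateway's command line contains both "openclaw" and "gateway" are hypothetical; the real cleanup tool may identify the gateway differently.

```javascript
const fs = require('node:fs');

// Hypothetical marker: assumes the gateway's command line mentions this text.
const GATEWAY_MARKER = 'openclaw';

// Read a process's argv from /proc (Linux/WSL2 only).
// Returns null when the PID is dead or cannot be read.
function readCmdline(pid) {
  try {
    const raw = fs.readFileSync(`/proc/${pid}/cmdline`, 'utf8');
    return raw.split('\0').filter(Boolean);
  } catch {
    return null;
  }
}

// Classify the PID stored in a relay descriptor:
//   'dead'          -> no such process
//   'wrong-process' -> PID is alive but was reused by something else
//   'gateway'       -> PID is alive and looks like an OpenClaw gateway
function classifyRelayPid(pid, read = readCmdline) {
  const argv = read(pid);
  if (argv === null) return 'dead';
  const looksLikeGateway =
    argv.some((arg) => arg.includes(GATEWAY_MARKER)) && argv.includes('gateway');
  return looksLikeGateway ? 'gateway' : 'wrong-process';
}

// Quarantine anything that is not a live gateway, not only dead PIDs.
function shouldQuarantine(descriptor, read = readCmdline) {
  return classifyRelayPid(descriptor.pid, read) !== 'gateway';
}
```

Matching on the command line is only a heuristic. Comparing the process start time against the descriptor's creation time, if the descriptor records one, would be a stronger guard against PID reuse.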

Issue B — Hard deny on relay unavailability:
When the relay bridge is unavailable (stale, port mismatch, restarting), the Codex approval path in hooks-cli-.js / vision-tools-.js throws Native hook relay unavailable and blocks the command as a hard deny rather than falling back to normal OpenClaw policy. A transient relay liveness issue causes permanent session denial for the life of the Codex run.
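The behaviour requested in Issue B could look roughly like the sketch below: retry the relay a few times, then defer to normal policy evaluation instead of denying outright. `RelayUnavailableError`, `approveCommand`, `askRelay`, and `evaluatePolicy` are placeholder names for illustration, not OpenClaw APIs, and the retry count and delay are arbitrary.

```javascript
// Hypothetical error type standing in for whatever the relay bridge throws.
class RelayUnavailableError extends Error {}

// askRelay and evaluatePolicy are injected placeholders:
//   askRelay(command)       -> Promise<decision>, may throw RelayUnavailableError
//   evaluatePolicy(command) -> Promise<decision> from the normal policy path
async function approveCommand(
  command,
  { askRelay, evaluatePolicy, retries = 2, delayMs = 250 }
) {
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return { decision: await askRelay(command), via: 'relay' };
    } catch (err) {
      // Only relay liveness problems are treated as transient.
      if (!(err instanceof RelayUnavailableError)) throw err;
      if (attempt < retries) {
        await new Promise((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  // Relay is still unavailable: fall back to normal policy, not a hard deny.
  return { decision: await evaluatePolicy(command), via: 'policy' };
}
```

The fallback still goes through policy evaluation, so a command that policy would reject stays rejected. Only the relay outage itself stops being treated as a denial.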

Suggested fixes:

  1. Relay cleanup: verify PID is alive and running an OpenClaw gateway process, not just any PID.
  2. Fallback behaviour: relay unavailability should fall back to normal policy eval, not hard-block.
  3. Session re-registration: allow existing Codex sessions to refresh their relay descriptor after gateway restart.
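Suggested fix 3 could be sketched as follows, assuming each gateway instance exposes some identifier that changes on restart. The `RelaySession` class, the `gatewayId` field, and the `register` callback are all hypothetical names for illustration; the issue does not say what a real relay descriptor contains.

```javascript
// Hypothetical session wrapper that refreshes its relay descriptor
// whenever the gateway instance it was registered against has changed.
class RelaySession {
  // register: async () => descriptor, e.g. { gatewayId, port }
  constructor(register) {
    this.register = register;
    this.descriptor = null;
  }

  // Return a descriptor valid for the currently running gateway,
  // re-registering if the cached one belongs to an older instance.
  async descriptorFor(currentGatewayId) {
    const stale =
      this.descriptor === null ||
      this.descriptor.gatewayId !== currentGatewayId;
    if (stale) {
      this.descriptor = await this.register();
    }
    return this.descriptor;
  }
}
```

With this shape, a long-lived main session would pick up a fresh descriptor on its first tool call after a restart, which is the behaviour sub-sessions already get by starting fresh.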

Workaround applied locally (dist patches, will be overwritten by updates):

• hooks-cli-Cy_Rqd6n.js — retry via gateway instead of immediately returning unavailable
• vision-tools-sfmDmVa9.js and @openclaw/codex/dist/vision-tools-DqpLmF5H.js — fall back to normal policy on relay failure
• openclaw-clean-stale-native-relays — also quarantines wrong-process PIDs

Post-patch: openclaw hooks check 7/7 ready.

Impact: Happened 4+ times in a single session (2026-06-02). Blocks all native tools; requires manual user intervention each time.

Metadata

Assignees

No one assigned

    Labels

    • P2: Normal backlog priority with limited blast radius.
    • bug: Something isn't working
    • bug:behavior: Incorrect behavior without a crash
    • clawsweeper:needs-live-repro: ClawSweeper needs live local, crabbox, or manual validation to confirm this issue.
    • clawsweeper:needs-maintainer-review: ClawSweeper marked this issue as needing maintainer review before automation.
    • clawsweeper:needs-product-decision: ClawSweeper marked this issue as needing a product or behavior decision.
    • clawsweeper:needs-security-review: ClawSweeper marked this issue as needing security-sensitive review.
    • clawsweeper:no-new-fix-pr: ClawSweeper does not recommend queueing a new automated fix PR for this issue.
    • impact:session-state: Session, memory, transcript, context, or agent state can drift or corrupt.
    • issue-rating: 🐚 platinum hermit: Good issue quality with a plausible reproduction path needing some confirmation.
