Skip to content

fix: route embedded tool calls through in-process dispatch (#40237)#42497

Open
frankbuild wants to merge 2 commits intoopenclaw:mainfrom
frankbuild:fix/ws-self-contention-local-dispatch
Open

fix: route embedded tool calls through in-process dispatch (#40237)#42497
frankbuild wants to merge 2 commits intoopenclaw:mainfrom
frankbuild:fix/ws-self-contention-local-dispatch

Conversation

@frankbuild
Copy link
Copy Markdown
Contributor

Summary

Fixes #40237 (also resolves #5703 and #6508 which were circular-duped shut).

When agent tools like cron.run, cron.list, cron.add are called from within an active LLM session, callGatewayTool opens a new WebSocket connection to the same gateway. Because the gateway's Node.js event loop is busy processing the current LLM turn, the second WS connection sits in queue and times out after 10s.

This PR routes embedded tool calls through the existing in-process dispatch path (dispatchGatewayMethod) when running inside the gateway process, bypassing WS entirely. External CLI tool calls continue to use WS unchanged.

Changes

  • src/gateway/local-dispatch.ts (new): lightweight registry that the gateway server populates at startup. Exposes tryLocalGatewayDispatch which returns a promise when in-process dispatch is available, or undefined when running outside the gateway (e.g. CLI).

  • src/gateway/server-plugins.ts: registers dispatchGatewayMethod as the local dispatcher when setFallbackGatewayContext is called during gateway startup.

  • src/agents/tools/gateway.ts: callGatewayTool now checks tryLocalGatewayDispatch first. If available (inside gateway), dispatches in-process with zero WS overhead. Otherwise falls through to the existing WS path.

How it works

CLI tool call (external):
  callGatewayTool → tryLocalGatewayDispatch returns undefined → WS path (unchanged)

Embedded tool call (inside gateway LLM session):
  callGatewayTool → tryLocalGatewayDispatch returns Promise → dispatchGatewayMethod
    → handleGatewayRequest (in-process, no WS) → immediate response

The dispatchGatewayMethod function already exists and is battle-tested — it's used by plugin subagent runtime for sessions, agent dispatch, etc. This PR simply makes it available to agent tools via the registry pattern.

Test plan

  • Unit tests for local-dispatch.ts (registry behavior, reset, error propagation)
  • Integration tests in gateway.test.ts verifying:
    • Embedded tool calls use in-process dispatch (no WS callGateway mock called)
    • External tool calls fall back to WS when no dispatcher registered
    • undefined params correctly normalized to {}
  • Existing server-plugins.test.ts tests pass (5/5)
  • Type check clean (pnpm tsgo)
  • Lint/format clean (pnpm check)

@openclaw-barnacle openclaw-barnacle Bot added gateway Gateway runtime agents Agent runtime and tooling size: S labels Mar 10, 2026
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Mar 10, 2026

Greptile Summary

This PR fixes a real WebSocket self-contention bug (#40237) by routing embedded agent tool calls through in-process dispatch instead of opening a new WS connection. The local-dispatch.ts registry is well-designed and well-tested, and the integration point via setFallbackGatewayContext is correct.

However, the interception in callGatewayTool (lines 147-152 of gateway.ts) is too broad. Two critical issues need to be addressed before merge:

  1. Unconditional opts bypass: The early return skips resolveGatewayOptions, silently ignoring opts.gatewayUrl, opts.gatewayToken, and opts.timeoutMs. This breaks the supported use case of explicitly routing to a remote gateway while inside the gateway process, and breaks bash-tools.exec-approval-followup.ts which passes timeoutMs: 60_000 expecting it to be respected.

  2. expectFinal semantics lost: The extra?.expectFinal parameter is not forwarded to the local dispatcher. dispatchGatewayMethod only captures the first respond call, losing the completion-waiting semantics. This breaks bash-tools.exec-approval-followup.ts which calls with expectFinal: true to wait for the agent run to complete before continuing.

Missing test coverage: No test asserts that explicit gatewayUrl overrides reach the WS path, which would catch the first issue.

Confidence Score: 2/5

  • Not safe to merge as-is. Two logic bugs silently break existing caller semantics and routing overrides.
  • The core design (local-dispatch.ts registry and setFallbackGatewayContext integration) is sound and well-tested. However, the callGatewayTool guard in gateway.ts has two critical flaws: (1) it unconditionally bypasses opts without checking for explicit gatewayUrl overrides, breaking the supported use case of routing to a remote gateway from inside the gateway, and explicitly-passed timeoutMs; (2) it drops the expectFinal parameter, changing completion-waiting semantics for existing callers like bash-tools.exec-approval-followup.ts. Both issues affect real code paths and would need to be fixed before merge.
  • src/agents/tools/gateway.ts — the local dispatch guard at lines 147-152 must check opts.gatewayUrl and forward extra?.expectFinal to preserve existing semantics.

Last reviewed commit: af6f728

Comment thread src/agents/tools/gateway.ts
Comment thread src/agents/tools/gateway.ts
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: af6f728b21

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/agents/tools/gateway.ts Outdated
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 23746d5191

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +154 to +158
if (!hasExplicitUrl && !needsExpectFinal) {
const localResult = tryLocalGatewayDispatch<T>(
method,
(params ?? {}) as Record<string, unknown>,
);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Preserve least-privilege scopes for local dispatch

By returning early through tryLocalGatewayDispatch, this path skips resolveLeastPrivilegeOperatorScopesForMethod and executes requests as dispatchGatewayMethod's synthetic admin client, which changes authorization-sensitive behavior for existing flows. A concrete regression is the node exec timeout fallback: bash-tools.exec-host-node.ts retries node.invoke with approved=true/approvalDecision='allow-once' when approval times out, and sanitizeSystemRunParamsForForwarding only allows that override for admin/approvals clients; before this commit callGatewayTool("node.invoke") used operator.write and was rejected, but local dispatch now grants admin and allows the run.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ff85b436a6

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +154 to +160
if (!hasExplicitUrl && !needsExpectFinal) {
const localResult = tryLocalGatewayDispatch<T>(
method,
(params ?? {}) as Record<string, unknown>,
);
if (localResult !== undefined) {
return await localResult;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Preserve timeoutMs behavior in local dispatch path

When local dispatch is available, this branch returns tryLocalGatewayDispatch directly and never applies opts.timeoutMs, so embedded tool calls can block forever instead of failing fast like the WS path. This is a regression for callers that rely on transport timeouts to bound execution (for example src/agents/bash-tools.exec-host-node.ts passes invokeTimeoutMs to callGatewayTool("node.invoke", ...) specifically to cap long/stuck runs); in an in-process gateway session, that timeout is now ignored and the turn can hang indefinitely if the handler never responds.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 43433d2b97

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +152 to +154
const hasExplicitUrl = trimToUndefined(opts.gatewayUrl) !== undefined;
const needsExpectFinal = extra?.expectFinal === true;
if (!hasExplicitUrl && !needsExpectFinal) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Respect explicit gatewayToken overrides in local dispatch

callGatewayTool now takes the in-process path whenever gatewayUrl is absent, but that decision ignores opts.gatewayToken. In embedded gateway runs (where a local dispatcher is registered), a call that explicitly supplies a token now skips resolveGatewayOptions/WS authentication and executes under ambient request credentials instead, so invalid or intentionally constrained tokens are silently ignored. This regresses auth override semantics for tools that expose gatewayToken and can make auth-sensitive calls succeed when they should fail.

Useful? React with 👍 / 👎.

@frankbuild
Copy link
Copy Markdown
Contributor Author

Hi @tyler6204 @joshavant — this PR addresses the WS self-contention bug (#40237, #5703) by routing embedded tool calls through in-process dispatch. Greptile flagged two concerns on the initial review (opts bypass and expectFinal semantics) — both have been addressed in subsequent commits. CI is green. Would appreciate a review when you get a chance 🙏

@frankbuild frankbuild force-pushed the fix/ws-self-contention-local-dispatch branch from 43433d2 to 44b0452 Compare March 14, 2026 09:42
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 44b045211e

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +155 to +158
const localResult = tryLocalGatewayDispatch<T>(
method,
(params ?? {}) as Record<string, unknown>,
);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Normalize params before in-process dispatch

The new local path forwards params directly to dispatchGatewayMethod (tryLocalGatewayDispatch(method, (params ?? {})...)), which skips the JSON serialization behavior of the WS path. This changes semantics for existing callers that pass optional fields as undefined (for example exec.approval.request payloads built in src/agents/tools/nodes-tool.ts), because validators like validateExecApprovalRequestParams accept omitted fields but reject fields explicitly set to undefined. As a result, embedded gateway tool calls can now fail with INVALID_REQUEST even though the same call succeeds over WebSocket.

Useful? React with 👍 / 👎.

@openclaw-barnacle
Copy link
Copy Markdown

This pull request has been automatically marked as stale due to inactivity.
Please add updates or it will be closed.

@openclaw-barnacle openclaw-barnacle Bot added the stale Marked as stale due to inactivity label Apr 27, 2026
@clawsweeper
Copy link
Copy Markdown
Contributor

clawsweeper Bot commented Apr 27, 2026

Codex review: found issues before merge.

Summary
The PR adds an in-process gateway dispatch registry, registers it from gateway fallback setup, and makes embedded agent gateway tool calls prefer local dispatch for non-final calls.

Reproducibility: yes. for the PR regressions by source inspection: current startup uses setFallbackGatewayContextResolver, and the local branch would bypass the existing scope/auth/timeout/serialization path. I did not live-run the original cron timeout scenario.

Next step before merge
A maintainer should decide whether to repair this stale auth-sensitive branch or replace it with a rebased implementation before automation proceeds.

Security
Needs attention: The diff introduces an in-process gateway control-plane route that does not preserve the existing authorization, credential, and timeout contract.

Review findings

  • [P1] Register local dispatch on the resolver startup path — src/gateway/server-plugins.ts:44-47
  • [P1] Pass effective scopes into local dispatch — src/agents/tools/gateway.ts:152-158
  • [P1] Honor explicit auth and timeout options locally — src/agents/tools/gateway.ts:152-158
Review details

Best possible solution:

Rebase and rework the local dispatch adapter so it registers on the current resolver startup path and preserves effective scopes, explicit credential intent, bounded timeouts, JSON-frame normalization, and final-response behavior with focused regression tests.

Do we have a high-confidence way to reproduce the issue?

Yes for the PR regressions by source inspection: current startup uses setFallbackGatewayContextResolver, and the local branch would bypass the existing scope/auth/timeout/serialization path. I did not live-run the original cron timeout scenario.

Is this the best way to solve the issue?

No. In-process dispatch is a reasonable direction for the linked bug, but this implementation returns before the established gateway transport and authorization contract is applied and registers on a stale startup hook.

Full review comments:

  • [P1] Register local dispatch on the resolver startup path — src/gateway/server-plugins.ts:44-47
    Current main starts the gateway by calling setFallbackGatewayContextResolver, not setFallbackGatewayContext. Registering the dispatcher only from setFallbackGatewayContext means the production startup path will not enable the new local dispatch path after rebase, so the central fix will not run.
    Confidence: 0.91
  • [P1] Pass effective scopes into local dispatch — src/agents/tools/gateway.ts:152-158
    The local branch returns before extra.scopes or resolveLeastPrivilegeOperatorScopesForMethod(method) are applied. dispatchGatewayMethod then uses an ambient client or a synthetic write-scoped client, so embedded calls can run with different authorization than the WebSocket path or fail for admin methods such as cron.add and cron.run.
    Confidence: 0.9
  • [P1] Honor explicit auth and timeout options locally — src/agents/tools/gateway.ts:152-158
    The guard skips local dispatch for gatewayUrl and expectFinal, but calls with gatewayToken or timeoutMs still take the local path. That ignores explicit credential intent and removes the bounded timeout behavior that callers rely on for stuck gateway methods.
    Confidence: 0.88
  • [P2] Normalize params before in-process dispatch — src/agents/tools/gateway.ts:154-155
    The local path forwards params directly instead of the JSON request frame used by the WebSocket path. Optional properties set to undefined can therefore reach validators as present invalid values even though the same call over WebSocket would omit them.
    Confidence: 0.83

Overall correctness: patch is incorrect
Overall confidence: 0.93

Security concerns:

  • [high] Local dispatch bypasses caller authorization intent — src/agents/tools/gateway.ts:152
    The early local return can execute a gateway method before resolving explicit tokens, explicit scope overrides, or least-privilege scopes, so embedded calls may run under ambient or synthetic credentials rather than the caller's intended authorization contract.
    Confidence: 0.88
  • [medium] Local dispatch skips bounded request timeout — src/agents/tools/gateway.ts:152
    Because timeoutMs is not applied on the local path, a stuck in-process handler can block an active agent turn instead of failing under the timeout behavior used by the WebSocket request path.
    Confidence: 0.84

Acceptance criteria:

  • pnpm test src/agents/tools/gateway.test.ts src/gateway/server-plugins.test.ts src/gateway/client.watchdog.test.ts src/gateway/server.node-invoke-approval-bypass.test.ts
  • pnpm test src/gateway/gateway.test.ts
  • pnpm check:changed

What I checked:

Likely related people:

  • vincentkoc: Current-main blame for the central gateway tool and server dispatch paths points to commit 257c5a5 by Vincent Koc, including callGatewayTool and fallback dispatch context code. (role: recent maintainer; confidence: medium; commits: 257c5a517fa1; files: src/agents/tools/gateway.ts, src/gateway/server-plugins.ts, src/gateway/server.impl.ts)
  • joshavant: The PR author explicitly routed the gateway self-contention fix to this maintainer in the PR discussion; I did not find stronger local code-history evidence in the inspected checkout. (role: requested reviewer; confidence: low; files: src/agents/tools/gateway.ts, src/gateway/server-plugins.ts)
  • tyler6204: The PR author explicitly routed the gateway self-contention fix to this maintainer in the PR discussion; I did not find stronger local code-history evidence in the inspected checkout. (role: requested reviewer; confidence: low; files: src/agents/tools/gateway.ts, src/gateway/server-plugins.ts)

Remaining risk / open question:

  • The branch is stale against current main's fallback-context resolver startup path.
  • The diff changes gateway authorization, credential, and timeout behavior in an embedded control-plane path.
  • The original WS self-contention scenario was not live-reproduced in this read-only pass.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 392897304cc5.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agents Agent runtime and tooling gateway Gateway runtime size: M

Projects

None yet

1 participant