fix: route embedded tool calls through in-process dispatch (#40237)#42497
fix: route embedded tool calls through in-process dispatch (#40237)#42497frankbuild wants to merge 2 commits intoopenclaw:mainfrom
Conversation
Greptile SummaryThis PR fixes a real WebSocket self-contention bug (#40237) by routing embedded agent tool calls through in-process dispatch instead of opening a new WS connection. The However, the interception in
Missing test coverage: No test asserts that explicit Confidence Score: 2/5
Last reviewed commit: af6f728 |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: af6f728b21
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 23746d5191
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| if (!hasExplicitUrl && !needsExpectFinal) { | ||
| const localResult = tryLocalGatewayDispatch<T>( | ||
| method, | ||
| (params ?? {}) as Record<string, unknown>, | ||
| ); |
There was a problem hiding this comment.
Preserve least-privilege scopes for local dispatch
By returning early through tryLocalGatewayDispatch, this path skips resolveLeastPrivilegeOperatorScopesForMethod and executes requests as dispatchGatewayMethod's synthetic admin client, which changes authorization-sensitive behavior for existing flows. A concrete regression is the node exec timeout fallback: bash-tools.exec-host-node.ts retries node.invoke with approved=true/approvalDecision='allow-once' when approval times out, and sanitizeSystemRunParamsForForwarding only allows that override for admin/approvals clients; before this commit callGatewayTool("node.invoke") used operator.write and was rejected, but local dispatch now grants admin and allows the run.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ff85b436a6
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| if (!hasExplicitUrl && !needsExpectFinal) { | ||
| const localResult = tryLocalGatewayDispatch<T>( | ||
| method, | ||
| (params ?? {}) as Record<string, unknown>, | ||
| ); | ||
| if (localResult !== undefined) { | ||
| return await localResult; |
There was a problem hiding this comment.
Preserve timeoutMs behavior in local dispatch path
When local dispatch is available, this branch returns tryLocalGatewayDispatch directly and never applies opts.timeoutMs, so embedded tool calls can block forever instead of failing fast like the WS path. This is a regression for callers that rely on transport timeouts to bound execution (for example src/agents/bash-tools.exec-host-node.ts passes invokeTimeoutMs to callGatewayTool("node.invoke", ...) specifically to cap long/stuck runs); in an in-process gateway session, that timeout is now ignored and the turn can hang indefinitely if the handler never responds.
Useful? React with 👍 / 👎.
ff85b43 to
43433d2
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 43433d2b97
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| const hasExplicitUrl = trimToUndefined(opts.gatewayUrl) !== undefined; | ||
| const needsExpectFinal = extra?.expectFinal === true; | ||
| if (!hasExplicitUrl && !needsExpectFinal) { |
There was a problem hiding this comment.
Respect explicit gatewayToken overrides in local dispatch
callGatewayTool now takes the in-process path whenever gatewayUrl is absent, but that decision ignores opts.gatewayToken. In embedded gateway runs (where a local dispatcher is registered), a call that explicitly supplies a token now skips resolveGatewayOptions/WS authentication and executes under ambient request credentials instead, so invalid or intentionally constrained tokens are silently ignored. This regresses auth override semantics for tools that expose gatewayToken and can make auth-sensitive calls succeed when they should fail.
Useful? React with 👍 / 👎.
|
Hi @tyler6204 @joshavant — this PR addresses the WS self-contention bug (#40237, #5703) by routing embedded tool calls through in-process dispatch. Greptile flagged two concerns on the initial review (opts bypass and expectFinal semantics) — both have been addressed in subsequent commits. CI is green. Would appreciate a review when you get a chance 🙏 |
43433d2 to
44b0452
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 44b045211e
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| const localResult = tryLocalGatewayDispatch<T>( | ||
| method, | ||
| (params ?? {}) as Record<string, unknown>, | ||
| ); |
There was a problem hiding this comment.
Normalize params before in-process dispatch
The new local path forwards params directly to dispatchGatewayMethod (tryLocalGatewayDispatch(method, (params ?? {})...)), which skips the JSON serialization behavior of the WS path. This changes semantics for existing callers that pass optional fields as undefined (for example exec.approval.request payloads built in src/agents/tools/nodes-tool.ts), because validators like validateExecApprovalRequestParams accept omitted fields but reject fields explicitly set to undefined. As a result, embedded gateway tool calls can now fail with INVALID_REQUEST even though the same call succeeds over WebSocket.
Useful? React with 👍 / 👎.
|
This pull request has been automatically marked as stale due to inactivity. |
|
Codex review: found issues before merge. Summary Reproducibility: yes. for the PR regressions by source inspection: current startup uses setFallbackGatewayContextResolver, and the local branch would bypass the existing scope/auth/timeout/serialization path. I did not live-run the original cron timeout scenario. Next step before merge Security Review findings
Review detailsBest possible solution: Rebase and rework the local dispatch adapter so it registers on the current resolver startup path and preserves effective scopes, explicit credential intent, bounded timeouts, JSON-frame normalization, and final-response behavior with focused regression tests. Do we have a high-confidence way to reproduce the issue? Yes for the PR regressions by source inspection: current startup uses setFallbackGatewayContextResolver, and the local branch would bypass the existing scope/auth/timeout/serialization path. I did not live-run the original cron timeout scenario. Is this the best way to solve the issue? No. In-process dispatch is a reasonable direction for the linked bug, but this implementation returns before the established gateway transport and authorization contract is applied and registers on a stale startup hook. Full review comments:
Overall correctness: patch is incorrect Security concerns:
Acceptance criteria:
What I checked:
Likely related people:
Remaining risk / open question:
Codex review notes: model gpt-5.5, reasoning high; reviewed against 392897304cc5. |
Summary
Fixes #40237 (also resolves #5703 and #6508 which were circular-duped shut).
When agent tools like
cron.run,cron.list,cron.addare called from within an active LLM session,callGatewayToolopens a new WebSocket connection to the same gateway. Because the gateway's Node.js event loop is busy processing the current LLM turn, the second WS connection sits in queue and times out after 10s.This PR routes embedded tool calls through the existing in-process dispatch path (
dispatchGatewayMethod) when running inside the gateway process, bypassing WS entirely. External CLI tool calls continue to use WS unchanged.Changes
src/gateway/local-dispatch.ts(new): lightweight registry that the gateway server populates at startup. ExposestryLocalGatewayDispatchwhich returns a promise when in-process dispatch is available, orundefinedwhen running outside the gateway (e.g. CLI).src/gateway/server-plugins.ts: registersdispatchGatewayMethodas the local dispatcher whensetFallbackGatewayContextis called during gateway startup.src/agents/tools/gateway.ts:callGatewayToolnow checkstryLocalGatewayDispatchfirst. If available (inside gateway), dispatches in-process with zero WS overhead. Otherwise falls through to the existing WS path.How it works
The
dispatchGatewayMethodfunction already exists and is battle-tested — it's used by plugin subagent runtime for sessions, agent dispatch, etc. This PR simply makes it available to agent tools via the registry pattern.Test plan
local-dispatch.ts(registry behavior, reset, error propagation)gateway.test.tsverifying:callGatewaymock called)undefinedparams correctly normalized to{}server-plugins.test.tstests pass (5/5)pnpm tsgo)pnpm check)