Agents: run bundle MCP tools in embedded Pi #48611
Conversation
Follow-up fixes pushed after review:
Re-verified:
Second follow-up pushed after another review pass:
Re-verified:
Force-pushed 1c2cae4 to 751beb2
Greptile Summary

This PR implements the missing runtime half of bundle MCP support: enabled bundle `mcpServers` now expose runnable tools during embedded Pi agent turns.

Key findings from this review:

The disposal patterns in both …

Confidence Score: 2/5
Last reviewed commit: 751beb2
```ts
try {
  await client.connect(transport);
  const listedTools = await listAllTools(client);
```
No timeout on server connect / tool listing
client.connect(transport) and listAllTools(client) are awaited without any timeout. The @modelcontextprotocol/sdk StdioClientTransport does not impose a startup or capability-negotiation deadline on its own. If a bundle MCP server process hangs at startup (e.g., waits for stdin, never writes its capabilities message, or enters an infinite loop), createBundleMcpToolRuntime will never resolve, blocking every embedded Pi run and compaction that calls it.
The catch block on line 207 only fires if the promise rejects, not if it stalls. A stalling process will keep the Pi agent suspended indefinitely.
Consider wrapping both calls with a Promise.race against a short deadline (e.g., 10 s) and treating the timeout as a startup failure that triggers the existing logWarn + disposeSession path:
```ts
const CONNECT_TIMEOUT_MS = 10_000;

async function withTimeout<T>(promise: Promise<T>, ms: number, label: string): Promise<T> {
  // `| undefined` keeps strict TS happy: the assignment happens inside the
  // Promise executor, which the compiler cannot see runs synchronously.
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms}ms: ${label}`)), ms);
  });
  try {
    return await Promise.race([promise, timeout]);
  } finally {
    clearTimeout(timer);
  }
}

// in the server loop:
await withTimeout(client.connect(transport), CONNECT_TIMEOUT_MS, `connect ${serverName}`);
const listedTools = await withTimeout(listAllTools(client), CONNECT_TIMEOUT_MS, `listTools ${serverName}`);
```
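The deadline pattern can be exercised standalone. The sketch below is illustrative (not from the PR's codebase) and shows that a stalled promise trips the deadline instead of hanging:

```ts
// Illustrative standalone check of the Promise.race deadline pattern.
async function withTimeout<T>(promise: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms}ms: ${label}`)), ms);
  });
  try {
    return await Promise.race([promise, timeout]);
  } finally {
    clearTimeout(timer); // avoid keeping the event loop alive after the race settles
  }
}

async function demo(): Promise<string[]> {
  const results: string[] = [];
  // A fast promise wins the race and the timer is cleaned up.
  results.push(await withTimeout(Promise.resolve("ok"), 1_000, "fast"));
  // A promise that never settles is converted into a rejection.
  try {
    await withTimeout(new Promise<string>(() => {}), 50, "stalled-server");
  } catch (err) {
    results.push((err as Error).message);
  }
  return results;
}

demo().then((r) => console.log(JSON.stringify(r)));
```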
```ts
execute: async (_toolCallId, input) => {
  const result = (await client.callTool({
    name: tool.name,
    arguments: isRecord(input) ? input : {},
  })) as CallToolResult;
  return toAgentToolResult({
    serverName,
    toolName: tool.name,
    result,
  });
},
```
No timeout on tool call execution
client.callTool(...) is awaited with no deadline. If a bundle MCP server becomes unresponsive after startup (e.g., the subprocess hangs mid-execution), the Pi agent will stall indefinitely waiting for the tool result. This is separate from the startup timeout on connect/listTools.
The MCP SDK does not add a per-call timeout by default. A timeout here is especially important because tool calls happen inside an active agent turn, and a hung tool call will suspend the entire conversation loop.
Consider applying the same withTimeout helper described in the connect/listTools comment above, using a longer deadline appropriate for tool execution (e.g., the same timeout that guards the overall Pi attempt, or a configurable per-tool limit).
```ts
const session: BundleMcpSession = {
  serverName,
  client,
  transport,
  detachStderr: attachStderrLogging(serverName, transport),
};
```
**`attachStderrLogging` called before the transport process is spawned**

`attachStderrLogging` reads `transport.stderr` at construction time (line 168), but the `StdioClientTransport` only spawns the child process — and therefore only populates `transport.stderr` — inside its `start()` method, which is invoked by `client.connect(transport)` on line 172.

At line 168 `transport.stderr` is `undefined`, so `attachStderrLogging` hits the early-return guard (`if (!stderr || ...)`) and returns `undefined` for every server. As a result `session.detachStderr` is always `undefined` and no stderr output from bundle MCP servers is ever logged.

Move the `attachStderrLogging` call to after `client.connect(transport)` and assign the result back to `session.detachStderr`:
```ts
await client.connect(transport);
session.detachStderr = attachStderrLogging(serverName, transport);
const listedTools = await listAllTools(client);
```
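The ordering bug can be reproduced with a minimal stand-in transport (all names below are illustrative): a stream read before `start()` simply does not exist yet.

```ts
// Minimal stand-in for a stdio transport whose stderr stream only exists
// after start() spawns the child process.
class FakeTransport {
  stderr: string[] | undefined;
  start(): void {
    this.stderr = []; // populated only once the process is spawned
  }
}

// Mirrors the PR's early-return guard: no stream, no listener, no detach fn.
function attachStderrLogging(t: FakeTransport): (() => void) | undefined {
  if (!t.stderr) return undefined;
  return () => {}; // detach callback for an attached listener
}

const t = new FakeTransport();
const before = attachStderrLogging(t); // at construction time: undefined
t.start();                             // client.connect(transport) would call this internally
const after = attachStderrLogging(t);  // after connect: a real detach fn
console.log(before === undefined, typeof after === "function");
```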
```ts
function toStringArray(value: unknown): string[] | undefined {
  if (!Array.isArray(value)) {
    return undefined;
  }
  const entries = value.filter((entry): entry is string => typeof entry === "string");
  return entries.length > 0 ? entries : [];
}
```
**Non-string args silently dropped; inconsistent with `toStringRecord`**

`toStringArray` filters out any non-`string` element — including numbers — without a warning. For example an MCP config with `args: ["--port", 8080]` would silently become `["--port"]`, causing the server to launch without the port argument.

By contrast, `toStringRecord` (which normalises the `env` map) explicitly converts numbers and booleans to strings:

```ts
if (typeof entry === "number" || typeof entry === "boolean") {
  return [key, String(entry)] as const;
}
```

Either apply the same coercion to `toStringArray`, or emit a warning when non-string elements are encountered so the misconfiguration is visible in logs:
```ts
function toStringArray(value: unknown): string[] | undefined {
  if (!Array.isArray(value)) return undefined;
  const entries = value.map((entry) => {
    if (typeof entry === "string") return entry;
    if (typeof entry === "number" || typeof entry === "boolean") return String(entry);
    return null; // unrepresentable — dropped with a warning below
  });
  const result = entries.filter((e): e is string => e !== null);
  if (result.length !== entries.length) {
    // assumes the module's existing logWarn helper
    logWarn(`toStringArray: dropped ${entries.length - result.length} non-string element(s)`);
  }
  return result;
}
```
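A standalone comparison of the two behaviours (function names here are illustrative, not the PR's actual helpers):

```ts
// Current behaviour: non-string elements are silently filtered out.
function toStringArrayStrict(value: unknown): string[] | undefined {
  if (!Array.isArray(value)) return undefined;
  return value.filter((entry): entry is string => typeof entry === "string");
}

// Suggested behaviour: coerce numbers and booleans, matching toStringRecord.
function toStringArrayCoercing(value: unknown): string[] | undefined {
  if (!Array.isArray(value)) return undefined;
  const mapped = value.map((entry) => {
    if (typeof entry === "string") return entry;
    if (typeof entry === "number" || typeof entry === "boolean") return String(entry);
    return null;
  });
  return mapped.filter((e): e is string => e !== null);
}

const args: unknown = ["--port", 8080];
console.log(JSON.stringify(toStringArrayStrict(args)));   // drops the port number
console.log(JSON.stringify(toStringArrayCoercing(args))); // keeps it as a string
```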
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 751beb26ab
```ts
const runtimeSupport = inspectBundleMcpRuntimeSupport({
  pluginId: record.id,
  rootDir: record.rootDir,
  bundleFormat: record.bundleFormat,
});
```
**Guard bundle MCP runtime inspection from parse failures**

This inspection runs during plugin loading for every enabled bundle that advertises `mcpServers`, but `inspectBundleMcpRuntimeSupport()` reads/parses bundle MCP config and can throw on malformed JSON (for example a broken `.mcp.json`), which aborts `loadOpenClawPlugins()` instead of emitting a warning. In that scenario, a single bad bundle MCP file can take down plugin loading/startup rather than being skipped as a diagnostic, so this call site should catch inspection errors and downgrade them to plugin diagnostics.
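One way to downgrade the failure, sketched with illustrative names (`safeInspect` and `Diagnostic` are not from the codebase):

```ts
// Sketch: catch a throwing inspection and record a diagnostic instead of
// letting the error abort plugin loading.
type Diagnostic = { pluginId: string; message: string };

function safeInspect<T>(
  pluginId: string,
  inspect: () => T,
  diagnostics: Diagnostic[],
): T | undefined {
  try {
    return inspect();
  } catch (err) {
    diagnostics.push({
      pluginId,
      message: `bundle MCP inspection failed: ${(err as Error).message}`,
    });
    return undefined; // loading continues; this bundle just loses MCP support
  }
}

const diags: Diagnostic[] = [];
// A malformed .mcp.json makes JSON.parse throw; the error becomes a diagnostic.
const support = safeInspect("bundle-probe", () => JSON.parse("{broken"), diags);
console.log(support === undefined, diags.length);
```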
```ts
const bundleMcpRuntime = toolsEnabled
  ? await createBundleMcpToolRuntime({
      workspaceDir: effectiveWorkspace,
      cfg: params.config,
```
**Dispose bundle MCP runtime when lock acquisition fails**

The bundle MCP runtime is created before `acquireSessionWriteLock`, but cleanup is only performed in the inner `finally` that executes after the lock is acquired; if lock acquisition times out or throws (contention/stale-lock scenarios), these spawned MCP subprocesses are never disposed. That leaks child processes/resources on failed attempts, so disposal needs to be guaranteed in an outer `finally`, or runtime creation should be moved after lock acquisition.
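The suggested fix is the usual acquire/try/finally shape; this sketch uses illustrative names and a simulated lock-timeout failure:

```ts
// Sketch: create the runtime, then guarantee disposal in an outer finally so
// a lock-acquisition failure cannot leak the spawned subprocesses.
interface DisposableRuntime {
  dispose(): void;
}

function withRuntime<T>(runtime: DisposableRuntime, run: () => T): T {
  try {
    return run(); // acquireSessionWriteLock + the locked work would happen here
  } finally {
    runtime.dispose(); // runs on success and on every failure path
  }
}

let disposed = false;
const runtime: DisposableRuntime = { dispose: () => { disposed = true; } };
try {
  withRuntime(runtime, () => {
    throw new Error("session lock timeout"); // simulated pre-lock failure
  });
} catch {
  // error propagates to the caller, but disposal already ran
}
console.log(disposed);
```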
```ts
const bundleMcpRuntime = toolsEnabled
  ? await createBundleMcpToolRuntime({
      workspaceDir: effectiveWorkspace,
      cfg: params.config,
```
**Close compaction MCP runtime on session-lock errors**

Compaction has the same lifecycle gap: the MCP runtime is started before `acquireSessionWriteLock`, while `bundleMcpRuntime.dispose()` only runs in the inner `finally` after lock acquisition. When lock acquisition fails (e.g., timeout under contention), the runtime is left running and leaks subprocesses, so cleanup must also be guaranteed for pre-lock failure paths.
* Agents: run bundle MCP tools in embedded Pi
* Plugins: fix bundle MCP path resolution
* Plugins: warn on unsupported bundle MCP transports
* Commands: add embedded Pi MCP management
* Config: move MCP management to top-level config
Summary
Change Type (select all)
Scope (select all touched areas)
Linked Issue/PR
User-visible / Behavior Changes
Enabled bundle `mcpServers` now expose runnable tools during embedded Pi agent turns.

Security Impact (required)
(Yes/No) Yes
(Yes/No) No
(Yes/No) No
(Yes/No) Yes
(Yes/No) No
If Yes, explain risk + mitigation:

Repro + Verification
Environment
`plugins.entries.bundle-probe.enabled=true`

Steps
`.mcp.json` pointing at a local stdio MCP server script.

Expected
Actual
Evidence
Attach at least one:
Human Verification (required)
What you personally verified (not just CI), and how:
- `pnpm test -- src/agents/pi-bundle-mcp-tools.test.ts src/agents/pi-embedded-runner.bundle-mcp.e2e.test.ts src/plugins/bundle-mcp.test.ts src/agents/pi-project-settings.bundle.test.ts src/agents/pi-project-settings.test.ts`
- `pnpm build`
- `pnpm docs:check-i18n-glossary`
- `pnpm docs:check-links`
- `createBundleMcpToolRuntime()` against a real stdio MCP server script returning `FROM-BUNDLE`

Review Conversations
Compatibility / Migration
(Yes/No) Yes
(Yes/No) No
(Yes/No) No

Failure Recovery (if this breaks)
`053580884d`
`plugins.entries`

Risks and Mitigations
`createBundleMcpToolRuntime()` bridge, and both isolated + e2e tests cover the runtime path.