🔒 Aisle Security Analysis
We found 8 potential security issues in this PR:
1. 🟠 Approval requests not bound to current thread/turn in Codex app-server approval bridge
Description
Vulnerable code:

```ts
function matchesCurrentTurn(requestParams: JsonObject | undefined, threadId: string, turnId: string): boolean {
  if (!requestParams) {
    return true;
  }
  const requestThreadId = readString(requestParams, "threadId") ?? readString(requestParams, "conversationId");
  const requestTurnId = readString(requestParams, "turnId");
  if (requestThreadId && requestThreadId !== threadId) return false;
  if (requestTurnId && requestTurnId !== turnId) return false;
  return true;
}
```

Recommendation
Fail closed and require turn binding for approval requests. Suggested approach:
Example fix:

```ts
function matchesCurrentTurn(
  requestParams: JsonObject | undefined,
  threadId: string,
  turnId: string,
): boolean {
  if (!requestParams) return false;
  const requestThreadId =
    readString(requestParams, "threadId") ?? readString(requestParams, "conversationId");
  const requestTurnId = readString(requestParams, "turnId");
  if (!requestThreadId || !requestTurnId) return false;
  return requestThreadId === threadId && requestTurnId === turnId;
}
```

This ensures approvals cannot be processed unless they are explicitly associated with the active thread/turn.
2. 🟠 Codex app-server runs with approvals disabled by default (approvalPolicy="never")
Description
Vulnerable code (defaulting to no approvals):

```ts
function resolveAppServerApprovalPolicy(): "never" | "on-request" | "on-failure" | "untrusted" {
  const raw = process.env.OPENCLAW_CODEX_APP_SERVER_APPROVAL_POLICY?.trim();
  if (raw === "on-request" || raw === "on-failure" || raw === "untrusted") {
    return raw;
  }
  return "never";
}
```

And it is used here:

```ts
turn = await client.request("turn/start", {
  ...,
  approvalPolicy: resolveAppServerApprovalPolicy(),
});
```

Recommendation
Make the default approval policy safe-by-default (typically `on-request`). Suggested change:

```ts
function resolveAppServerApprovalPolicy(): "never" | "on-request" | "on-failure" | "untrusted" {
  const raw = process.env.OPENCLAW_CODEX_APP_SERVER_APPROVAL_POLICY?.trim();
  if (raw === "never" || raw === "on-request" || raw === "on-failure" || raw === "untrusted") {
    return raw;
  }
  // Safe default
  return "on-request";
}
```

Additionally:
3. 🟠 Over-broad permission granting via unvalidated `item/permissions/requestApproval` payload (Codex app-server approval bridge)
Description
Vulnerable code:

```ts
if (method === "item/permissions/requestApproval") {
  if (outcome === "approved-session" || outcome === "approved-once") {
    return {
      permissions: requestedPermissions(requestParams),
      scope: outcome === "approved-session" ? "session" : "turn",
    };
  }
  return { permissions: {}, scope: "turn" };
}

function requestedPermissions(requestParams: JsonObject | undefined): JsonObject {
  const permissions = isJsonObject(requestParams?.permissions) ? requestParams.permissions : {};
  const granted: JsonObject = {};
  if (isJsonObject(permissions.network)) {
    granted.network = permissions.network;
  }
  if (isJsonObject(permissions.fileSystem)) {
    granted.fileSystem = permissions.fileSystem;
  }
  return granted;
}
```

Additionally, the approval prompt context built in
Recommendation
Treat the app-server permission request as untrusted and only grant permissions that pass a strict schema + policy.
Example (sketch):

```ts
import { z } from "zod";

const PermissionRequestSchema = z.object({
  permissions: z.object({
    network: z
      .object({ allowedHosts: z.array(z.string()).max(100) })
      .strict()
      .optional(),
    fileSystem: z
      .object({ allowedPaths: z.array(z.string()).max(100), mode: z.enum(["read", "write", "readwrite"]) })
      .strict()
      .optional(),
  }).strict(),
}).strict();

function requestedPermissions(requestParams: JsonObject | undefined): JsonObject {
  const parsed = PermissionRequestSchema.safeParse(requestParams);
  if (!parsed.success) return {};
  return parsed.data.permissions;
}
```

If permissive behavior is required for compatibility, at minimum log/deny requests that contain unknown keys or wildcards, and show the full permission JSON in the approval description.
4. 🟠 Untrusted tool-provided media URLs can trigger local file read in webchat audio embedding
Description
Impact:
Vulnerable code (propagation):

```ts
const mediaUrls = Array.from(
  new Set(params.toolMediaUrls?.map((url) => url.trim()).filter(Boolean) ?? []),
);
...
const mergedMediaUrls = Array.from(new Set([...(payload.mediaUrls ?? []), ...mediaUrls]));
...
mediaUrls: mergedMediaUrls.length ? mergedMediaUrls : undefined,
mediaUrl: payload.mediaUrl ?? mergedMediaUrls[0],
```

Downstream sink (for context, not in this diff):
Recommendation
Treat tool-returned media URLs as untrusted input and enforce a strict allowlist before merging them into outbound payloads. Suggested fixes (choose one or combine):
```ts
function isSafeOutboundMediaUrl(raw: string): boolean {
  const u = raw.trim();
  if (!u) return false;
  if (u.startsWith("file:")) return false;
  if (u.startsWith("/")) return false; // blocks absolute paths
  if (/^[a-zA-Z]:\\/.test(u)) return false; // blocks Windows paths
  try {
    const parsed = new URL(u);
    return parsed.protocol === "https:"; // or allow "http:" only if you have SSRF protections
  } catch {
    return false;
  }
}

const mediaUrls = Array.from(new Set(
  (params.toolMediaUrls ?? []).filter(isSafeOutboundMediaUrl)
));
```
5. 🟠 Codex harness fallback can send prompts to chatgpt.com via PI provider transport
Description
When running with
Key points:
Vulnerable code (fallback to PI on harness failure):

```ts
try {
  return await harness.runAttempt(params);
} catch (error) {
  if (policy.runtime !== "auto" || policy.fallback === "none") {
    throw error;
  }
  log.warn(`${harness.label} failed; falling back to embedded PI backend`, { error });
  return createPiAgentHarness().runAttempt(params);
}
```

This can result in unintended external network calls and potential disclosure of user prompts, tool outputs, or other sensitive session data to chatgpt.com.
Recommendation
Prevent Codex runs from ever using the generic PI provider transport unless explicitly opted-in. Options:
```ts
// before falling back
if (params.provider.trim().toLowerCase() === "codex") {
  throw error; // or require fallback:"none" for codex
}
```
Also consider setting the default embedded harness policy for Codex agents to
6. 🟡 Unbounded JSON-RPC stdio parsing and unbounded pending requests can cause resource exhaustion
Description
There are no limits or backpressure safeguards:
Vulnerable code:

```ts
this.lines = createInterface({ input: child.stdout });
this.lines.on("line", (line) => this.handleLine(line));
...
parsed = JSON.parse(trimmed);
...
this.pending.set(id, { ... });
```

Because the app-server binary and args can be overridden via environment variables
Recommendation
Add explicit resource limits and timeouts at the transport/client layer:
Example (sketch):

```ts
request<T>(method: string, params?: JsonValue, opts: { timeoutMs?: number } = {}): Promise<T> {
  const id = this.nextId++;
  return new Promise<T>((resolve, reject) => {
    const timeout = opts.timeoutMs
      ? setTimeout(() => {
          this.pending.delete(id);
          reject(new Error(`${method} timed out`));
        }, opts.timeoutMs)
      : undefined;
    this.pending.set(id, {
      method,
      resolve: (v) => { if (timeout) clearTimeout(timeout); resolve(v as T); },
      reject: (e) => { if (timeout) clearTimeout(timeout); reject(e); },
    });
    this.writeMessage({ id, method, params });
  });
}
```
These changes ensure a misbehaving app-server cannot force unbounded memory/CPU growth in the host process. 7. 🟡 Sensitive data exposure via verbatim logging of Codex app-server stderr
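The request-timeout sketch above bounds pending-request growth; a companion guard can cap the size of each stdio line before it reaches `JSON.parse`. A minimal sketch, where `MAX_LINE_BYTES` and the `parseBoundedLine` helper are illustrative names, not part of the PR:

```typescript
// Sketch: cap the size of a single stdio line before JSON.parse.
// MAX_LINE_BYTES and the helper name are assumptions, not from the PR.
const MAX_LINE_BYTES = 1024 * 1024; // 1 MiB per JSON-RPC message

function parseBoundedLine(line: string): unknown | undefined {
  // Reject oversized lines instead of buffering and parsing them.
  if (Buffer.byteLength(line, "utf8") > MAX_LINE_BYTES) {
    return undefined; // caller logs and drops the message
  }
  const trimmed = line.trim();
  if (!trimmed) return undefined;
  try {
    return JSON.parse(trimmed);
  } catch {
    return undefined; // malformed JSON from the child process
  }
}
```

Dropping oversized or malformed lines (rather than throwing) keeps one bad message from taking down the whole transport loop.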
Description
The Codex app-server child process
Vulnerable code:

```ts
child.stderr.on("data", (chunk: Buffer | string) => {
  const text = chunk.toString("utf8").trim();
  if (text) {
    embeddedAgentLog.debug(`codex app-server stderr: ${text}`);
  }
});
```

Recommendation
Avoid logging external process stderr verbatim, or ensure it is sanitized and bounded. Options:
Example (bounded + redaction):

```ts
import { redact } from "../../../logging/redact.js";

child.stderr.on("data", (chunk) => {
  const raw = chunk.toString("utf8");
  const bounded = raw.slice(0, 2000); // prevent huge log lines
  const safe = redact(bounded).trim();
  if (safe) {
    embeddedAgentLog.debug("codex app-server stderr (redacted): " + safe);
  }
});
```

Also consider logging to
8. 🟡 Unbounded transcript file read/parsing enables memory/CPU DoS during transcript mirroring
Description
Vulnerable code:

```ts
raw = await fs.readFile(sessionFile, "utf8");
for (const line of raw.split(/\r?\n/)) {
  ...
  const parsed = JSON.parse(line)
  ...
}
```
Example (streaming):

```ts
import fs from "node:fs";
import readline from "node:readline";

async function readTranscriptIdempotencyKeys(sessionFile: string): Promise<Set<string>> {
  const keys = new Set<string>();
  try {
    const stream = fs.createReadStream(sessionFile, { encoding: "utf8" });
    const rl = readline.createInterface({ input: stream, crlfDelay: Infinity });
    for await (const line of rl) {
      if (!line.trim()) continue;
      try {
        const parsed = JSON.parse(line) as { message?: { idempotencyKey?: unknown } };
        if (typeof parsed.message?.idempotencyKey === "string") keys.add(parsed.message.idempotencyKey);
      } catch {
        // ignore malformed lines
      }
    }
  } catch (e: any) {
    if (e?.code !== "ENOENT") throw e;
  }
  return keys;
}
```

Analyzed PR: #64298 at commit
Last updated on: 2026-04-10T20:04:06Z
PR Summary
High Risk
Overview
Implements the Codex app-server bridge and safety plumbing. The extension adds a JSON-RPC client with version gating, dynamic tool bridging (including media artifact telemetry), approval bridging via gateway approval tools (fail-closed), per-session thread binding persistence, transcript mirroring with idempotency/locking, and compaction completion waiting. SDK/docs/test plumbing updates. Exposes a new
Reviewed by Cursor Bugbot for commit 9da2afb. Bugbot is set up for automated code reviews on this repo.
Greptile Summary
This PR refactors the agent harness into a plugin-owned extension by introducing a new
The implementation follows established patterns for other pluggable registries (compaction providers, memory embedding providers) and correctly handles cache snapshots, non-activating loads, and error rollback.
Confidence Score: 5/5
Safe to merge; the harness registry, selection, plugin loader integration, and Codex extension are all well-structured and consistent with existing patterns. The only finding is a P2 style observation: clearAgentHarnesses and restoreRegisteredAgentHarnesses do not call dispose() on evicted entries. No current harness implements dispose(), so there is no practical impact today. All other aspects — registry lifecycle, cache save/restore, non-activating load rollback, activating reload clear, agentHarnessIds tracking, harness selection priority, and Pi fallback — are correctly implemented and tested.
src/agents/harness/registry.ts — dispose() hook is declared on AgentHarness but never invoked during clear/restore cycles.
Prompt To Fix All With AI
This is a comment left during a code review.
Path: src/agents/harness/registry.ts
Line: 55-63
Comment:
**`dispose()` not called on cleared harnesses**
Both `clearAgentHarnesses` and `restoreRegisteredAgentHarnesses` silently drop existing harness entries without invoking their optional `dispose()` hook. Any future harness that holds open connections, timers, or other resources would leak them on plugin reload. The equivalent compaction-provider and memory-embedding-provider registries share this gap, but since the `AgentHarness` type explicitly declares `dispose?()` it is more likely to be implemented.
`restoreRegisteredAgentHarnesses` has the same pattern and would benefit from calling `dispose()` on evicted entries before the `map.clear()`. Note that snapshot-load rollback paths must not dispose harnesses that are about to be restored, so the fix requires distinguishing evicted-permanently entries from temporarily-swapped ones.
How can I resolve this? If you propose a fix, please make it concise.
Reviews (1): Last reviewed commit: "test: expand Codex harness Docker live s..."
```ts
export function restoreRegisteredAgentHarnesses(entries: RegisteredAgentHarness[]): void {
  const map = getAgentHarnessRegistryState().harnesses;
  map.clear();
  for (const entry of entries) {
    map.set(entry.harness.id, entry);
  }
}
```
dispose() not called on cleared harnesses
Both clearAgentHarnesses and restoreRegisteredAgentHarnesses silently drop existing harness entries without invoking their optional dispose() hook. Any future harness that holds open connections, timers, or other resources would leak them on plugin reload. The equivalent compaction-provider and memory-embedding-provider registries share this gap, but since the AgentHarness type explicitly declares dispose?() it is more likely to be implemented.
restoreRegisteredAgentHarnesses has the same pattern and would benefit from calling dispose() on evicted entries before the map.clear(). Note that snapshot-load rollback paths must not dispose harnesses that are about to be restored, so the fix requires distinguishing evicted-permanently entries from temporarily-swapped ones.
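A minimal sketch of the distinction the comment asks for (dispose permanently evicted entries, leave snapshot-restored ones alone). The `AgentHarness`/`RegisteredAgentHarness` shapes below are simplified stand-ins for the real registry types:

```typescript
// Simplified stand-ins for the registry types; the real ones carry more fields.
interface AgentHarness {
  id: string;
  dispose?: () => void;
}
interface RegisteredAgentHarness {
  harness: AgentHarness;
}

function restoreRegisteredAgentHarnesses(
  map: Map<string, RegisteredAgentHarness>,
  entries: RegisteredAgentHarness[],
): void {
  const incoming = new Set(entries.map((e) => e.harness.id));
  for (const [id, entry] of map) {
    // Only dispose entries that are evicted for good, not ones about to be
    // restored from the snapshot (rollback must not tear those down).
    if (!incoming.has(id)) {
      entry.harness.dispose?.();
    }
  }
  map.clear();
  for (const entry of entries) {
    map.set(entry.harness.id, entry);
  }
}
```

The same eviction check would apply in `clearAgentHarnesses`, where `incoming` is simply empty.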
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6bad1277f8
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
```ts
replayMetadata: {
  hadPotentialSideEffects: toolTelemetry.didSendViaMessagingTool,
  replaySafe: !toolTelemetry.didSendViaMessagingTool,
```
Derive replay safety from all mutating tool side effects
replayMetadata is currently marked unsafe only when a messaging tool sends, but this harness can also mutate state via command/file-change tool items and cron adds. Because incomplete-turn handling and retry decisions rely on hadPotentialSideEffects, a Codex run that executes bash/apply_patch/cron without messaging can be mislabeled replaySafe, which can trigger unsafe retries or omit side-effect warnings after partial/failed turns.
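A minimal sketch of the suggested aggregation, OR-ing every mutating tool category into the replay decision. The telemetry field names beyond `didSendViaMessagingTool` are assumptions for illustration:

```typescript
// Illustrative telemetry shape; only didSendViaMessagingTool appears in the PR,
// the other flags are assumed names for the mutating tool categories mentioned.
interface ToolTelemetry {
  didSendViaMessagingTool: boolean;
  didRunCommand: boolean;
  didApplyFileChange: boolean;
  didAddCronJob: boolean;
}

function buildReplayMetadata(t: ToolTelemetry) {
  // Any mutating tool makes the turn unsafe to replay, not just messaging sends.
  const hadPotentialSideEffects =
    t.didSendViaMessagingTool || t.didRunCommand || t.didApplyFileChange || t.didAddCronJob;
  return { hadPotentialSideEffects, replaySafe: !hadPotentialSideEffects };
}
```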
```ts
const normalizedHarness = {
  ...harness,
  id,
  pluginId: harness.pluginId ?? record.id,
};
```
Reserve the built-in `pi` harness id during registration
The new registration path accepts any non-empty harness id and immediately stores it, so a plugin can register id: "pi". Since harness selection checks registered harnesses before appending the built-in PI harness, forcing OPENCLAW_AGENT_RUNTIME=pi can resolve to plugin code instead of the built-in backend, breaking the expected kill-switch/fallback behavior for PI mode.
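One way to implement the reservation is a guard at registration time. A sketch under assumptions: the `RESERVED_HARNESS_IDS` set, helper name, and error text are illustrative, not from the PR:

```typescript
// Ids reserved for built-in backends; "pi" is the one called out in the review.
const RESERVED_HARNESS_IDS = new Set(["pi"]);

function assertRegistrableHarnessId(id: string): void {
  const normalized = id.trim().toLowerCase();
  if (!normalized) {
    throw new Error("harness id must be non-empty");
  }
  if (RESERVED_HARNESS_IDS.has(normalized)) {
    throw new Error(`harness id "${id}" is reserved for the built-in backend`);
  }
}
```

Rejecting the id at registration keeps `OPENCLAW_AGENT_RUNTIME=pi` resolving to the built-in backend regardless of what plugins load.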
Force-pushed 9540884 to 5d4789e
💡 Codex Review
Reviewed commit: 5d4789e690
```ts
.map((harness) => ({
  harness,
  support: harness.supports({
    provider: params.provider,
```
Isolate failing harness support checks
Wrap harness.supports(...) in a per-harness try/catch so one buggy plugin harness cannot abort selection for the whole run. In auto mode this loop evaluates every registered harness, so if any supports implementation throws (for example due to missing plugin config), selection throws before PI fallback is chosen and the agent run fails entirely instead of degrading to another harness.
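The per-harness try/catch can be sketched as below; the `Harness` type and `selectSupportedHarnesses` helper are simplified stand-ins for the real selection code:

```typescript
// Simplified stand-in for the real harness type.
interface Harness {
  id: string;
  supports: (params: { provider: string }) => boolean;
}

function selectSupportedHarnesses(harnesses: Harness[], provider: string): Harness[] {
  return harnesses.filter((harness) => {
    try {
      return harness.supports({ provider });
    } catch {
      // A buggy supports() implementation excludes only that harness,
      // leaving the rest of selection (and PI fallback) intact.
      return false;
    }
  });
}
```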
```ts
if (
  !isMessagingTool(params.toolName) ||
  !isMessagingToolSendAction(params.toolName, params.args)
) {
  return;
```
Record messaging side effects only on successful sends
collectToolTelemetry marks messaging sends as successful even when the tool call threw, because the messaging branch ignores isError. When a message/sessions_send call fails, this still sets didSendViaMessagingTool and target/text metadata, which can suppress normal assistant output and mislabel replay safety as if a send occurred. Gate this branch on successful execution (the same way cron/media telemetry is gated).
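Gating the messaging branch on success can be sketched as follows; the telemetry shape and parameter names are illustrative, not the PR's actual types:

```typescript
// Illustrative telemetry shape for the messaging branch.
interface MessagingTelemetry {
  didSendViaMessagingTool: boolean;
  lastTarget?: string;
}

function recordMessagingSend(
  telemetry: MessagingTelemetry,
  params: { isError: boolean; target: string },
): void {
  if (params.isError) {
    return; // a failed send must not count as a completed side effect
  }
  telemetry.didSendViaMessagingTool = true;
  telemetry.lastTarget = params.target;
}
```

This mirrors how the review says cron/media telemetry is already gated on successful execution.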
💡 Codex Review
Reviewed commit: c35e17a549
```ts
} catch (error) {
  if (runtime !== "auto") {
    throw error;
  }
  log.warn(`${harness.label} failed; falling back to embedded PI backend`, { error });
  return createPiAgentHarness().runAttempt(params);
```
Gate PI fallback on replay-safe harness failures
In auto mode this catch block reruns the same turn through PI after any plugin harness exception, but the harness may have already executed mutating tools, approvals, or channel sends before throwing. Replaying immediately through PI can duplicate side effects on the same user input (for example duplicate messages or command executions). The fallback should only happen when the failing harness can prove the turn is replay-safe; otherwise the original error should be surfaced.
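The gate the review describes reduces to a small predicate. A sketch under assumptions: the `AttemptFailure` shape and `shouldFallBackToPi` name are hypothetical, standing in for however the harness reports replay safety:

```typescript
// Hypothetical failure metadata; assumes the failing harness can report
// whether any mutating tool ran before it threw.
interface AttemptFailure {
  replaySafe: boolean;
}

function shouldFallBackToPi(runtime: string, failure: AttemptFailure): boolean {
  // Fallback replays the whole turn through PI, so it is only acceptable
  // when the failing harness provably executed no side effects.
  return runtime === "auto" && failure.replaySafe;
}
```

When the predicate is false, the original error should surface instead of a silent replay.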
```ts
await projector.handleNotification(notification);
if (
  notification.method === "turn/completed" &&
  isTurnNotification(notification.params, turnId)
) {
  completed = true;
  resolveCompletion?.();
```
Resolve completion even if notification projection fails
Completion is resolved only after projector.handleNotification(...) returns, so if projection throws (for example from a callback invoked inside the projector), a turn/completed notification is effectively dropped and the run waits until timeout/abort even though the turn actually finished. Because the client swallows handler rejections, this turns real completions into spurious timeouts; completion detection should run even when projection fails.
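Decoupling completion detection from projection can be sketched with a swallow-and-continue handler; the function and parameter names below are simplified stand-ins, and the swallowing matches the review's note that the client already swallows handler rejections:

```typescript
// Simplified stand-ins for the projector and completion callback.
async function handleTurnNotification(
  notification: { method: string },
  projector: { handleNotification: (n: { method: string }) => Promise<void> },
  onCompleted: () => void,
): Promise<void> {
  try {
    await projector.handleNotification(notification);
  } catch {
    // The client swallows handler rejections; a real implementation would log.
  }
  // Completion detection must not depend on projection succeeding, so a
  // turn/completed event still resolves the waiter after a projector bug.
  if (notification.method === "turn/completed") {
    onCompleted();
  }
}
```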
💡 Codex Review
Reviewed commit: cb3f54b909
```ts
turn = await client.request<CodexTurnStartResponse>("turn/start", {
  threadId: thread.threadId,
  input: buildUserInput(params),
  cwd: effectiveWorkspace,
  approvalPolicy: resolveAppServerApprovalPolicy(),
```
Cover turn/start with the attempt timeout
timeoutMs does not protect the turn/start RPC: the call at this site is awaited directly, while the run timeout and abort-interrupt wiring are only installed afterward. If the app-server hangs before returning the turn id (for example, server-side deadlock during turn creation), this attempt can block indefinitely and ignore the configured attempt timeout/cancel path, leaving the session stuck until the process exits.
```ts
const threadId = readString(params, "threadId");
const turnId = readString(params, "turnId");
return (!threadId || threadId === this.threadId) && (!turnId || turnId === this.turnId);
```
Scope notifications to explicit turn identity
This matcher treats notifications with no threadId/turnId as belonging to the current run. Because runs share a singleton Codex client (getSharedCodexAppServerClient), a global error notification without ids can be accepted by every active projector and set promptError on unrelated turns, causing spurious cross-session failures under concurrent runs.
💡 Codex Review
Reviewed commit: 92e8c7908d
```ts
close(): void {
  this.closed = true;
  this.lines.close();
  if (!this.child.killed) {
    this.child.kill?.();
```
Reject pending RPCs when closing the Codex app-server client
close() marks the client as closed before process exit/error handling runs, but pending requests are only rejected in closeWithError(), which early-returns when closed is already true. If close() is called while requests are in flight (for example during startup timeout/reset paths), those Promises can remain unresolved indefinitely, leaking waiters and leaving callers hung instead of receiving a failure.
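One fix shape is to drain and reject the pending map inside `close()` itself, before marking the client closed tears down exit handling. A sketch with a simplified client class (`RpcClientShutdown` and the `PendingRequest` shape are illustrative, not the PR's types):

```typescript
// Illustrative pending-request entry; the real client stores resolve too.
interface PendingRequest {
  method: string;
  reject: (err: Error) => void;
}

class RpcClientShutdown {
  private closed = false;
  readonly pending = new Map<number, PendingRequest>();

  close(): void {
    if (this.closed) return;
    this.closed = true;
    // Reject waiters first so callers get a failure instead of a hung Promise,
    // since the later exit handler early-returns once closed is set.
    for (const [id, req] of this.pending) {
      req.reject(new Error(`${req.method} aborted: client closed`));
      this.pending.delete(id);
    }
    // ...then tear down the line reader and child process as before.
  }
}
```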
```ts
} catch (error) {
  clearSharedCodexAppServerClient();
  params.abortSignal?.removeEventListener("abort", abortFromUpstream);
```
Keep startup failures from tearing down unrelated active runs
This catch block unconditionally clears the shared Codex app-server client for the whole process. Because the client is a singleton used by concurrent attempts, a startup failure in one run will close the transport used by other in-progress runs, causing unrelated sessions to fail or time out even though their own turn is healthy.
```ts
if (!requestParams) {
  return true;
}
```
Require explicit turn identity for approval request routing
This matcher treats approval requests with missing ids as belonging to the current run. With multiple request handlers attached to the shared client, an approval request lacking threadId/turnId can be consumed by the wrong run and resolved against the wrong session context, which is especially risky for command/file approval decisions.
```ts
stream: "codex_app_server.guardian",
data: { method: notification.method },
});
break;
```
Guardian review count double-increments per review cycle
Medium Severity
guardianReviewCount is incremented for both item/autoApprovalReview/started and item/autoApprovalReview/completed notifications. A single guardian review cycle produces both events, so the count reflects events rather than distinct reviews. This inflated count means didSendDeterministicApprovalPrompt at line 179 evaluates to false instead of undefined even when intended to represent "no reviews happened."
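Counting one review per cycle means incrementing on only one of the paired events. A minimal sketch, using the notification method names quoted in the finding (the `countGuardianReviews` helper itself is illustrative):

```typescript
// Count distinct guardian review cycles from a stream of notification methods.
function countGuardianReviews(methods: string[]): number {
  let guardianReviewCount = 0;
  for (const method of methods) {
    // started/completed arrive as a pair per cycle; counting both doubles
    // the total, so only the completed event increments.
    if (method === "item/autoApprovalReview/completed") {
      guardianReviewCount += 1;
    }
  }
  return guardianReviewCount;
}
```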
Reviewed by Cursor Bugbot for commit 0cd597daeb31136843bb27fd5022d824ab9a2b0b.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
There are 4 total unresolved issues (including 3 from previous reviews).
Reviewed by Cursor Bugbot for commit 9da2afb.
```ts
if (!this.child.killed) {
  this.child.kill?.();
}
}
```
Client close() leaves pending requests permanently unresolved
High Severity
The close() method sets this.closed = true before killing the child process, but never rejects pending requests. When the child's exit event fires and triggers closeWithError, the guard if (this.closed) { return; } causes it to bail out without rejecting any pending promises. Any in-flight requests (e.g., turn/start, model/list) will hang forever. This can be triggered when clearSharedCodexAppServerClient() is called from run-attempt.ts on startup failure while requests are still pending.
Landed via GitHub rebase merge. Gates:
Source head: 9da2afb. Thanks @steipete!


Summary
Tests
Notes