
Refactor agent harness into Codex extension#64298

Merged
steipete merged 20 commits into main from app-server on Apr 10, 2026

Conversation

@steipete
Contributor

Summary

  • add the plugin-owned agent harness registry and Codex extension harness
  • move Codex app-server plumbing out of core-specific surfaces
  • bridge Codex app-server tools, approvals, compaction, media artifacts, and Docker/live coverage

Tests

  • pnpm check
  • pnpm test extensions/codex/app-server/compact.test.ts
  • pnpm test src/plugins/loader.test.ts -t "clears plugin agent harnesses during activating reloads"
  • pnpm test src/agents/harness/registry.test.ts
  • pnpm plugin-sdk:api:check
  • OPENCLAW_LIVE_CODEX_HARNESS=1 OPENCLAW_LIVE_CODEX_HARNESS_DEBUG=1 OPENCLAW_LIVE_CODEX_HARNESS_MODEL=codex/gpt-5.4 pnpm test:live src/gateway/gateway-codex-harness.live.test.ts
  • OPENCLAW_LIVE_CODEX_HARNESS_DEBUG=1 OPENCLAW_LIVE_CODEX_HARNESS_MODEL=codex/gpt-5.4 bash scripts/test-live-codex-harness-docker.sh

Notes

  • Docker live smoke covers text continuity, image attachment, and cron MCP/tool call via the Codex harness.

@aisle-research-bot

aisle-research-bot Bot commented Apr 10, 2026

🔒 Aisle Security Analysis

We found 8 potential security issue(s) in this PR:

# Severity Title
1 🟠 High Approval requests not bound to current thread/turn in Codex app-server approval bridge
2 🟠 High Codex app-server runs with approvals disabled by default (approvalPolicy="never")
3 🟠 High Over-broad permission granting via unvalidated item/permissions/requestApproval payload (Codex app-server approval bridge)
4 🟠 High Untrusted tool-provided media URLs can trigger local file read in webchat audio embedding
5 🟠 High Codex harness fallback can send prompts to chatgpt.com via PI provider transport
6 🟡 Medium Unbounded JSON-RPC stdio parsing and unbounded pending requests can cause resource exhaustion
7 🟡 Medium Sensitive data exposure via verbatim logging of Codex app-server stderr
8 🟡 Medium Unbounded transcript file read/parsing enables memory/CPU DoS during transcript mirroring
1. 🟠 Approval requests not bound to current thread/turn in Codex app-server approval bridge
Property Value
Severity High
CWE CWE-285
Location extensions/codex/src/app-server/approval-bridge.ts:169-187

Description

handleCodexAppServerApprovalRequest() attempts to ensure an approval request is for the current Codex thread/turn via matchesCurrentTurn(), but the matching logic is fail-open when identifiers are missing.

  • If requestParams is missing or not a JSON object, matchesCurrentTurn() returns true.
  • If threadId/turnId are absent from requestParams, the function also returns true.
  • As a result, an out-of-order/stale approval request (or a malicious/compromised app-server) can submit an approval request without IDs and have it treated as belonging to the current turn, potentially tricking an operator into approving an unrelated command/file/permission action.

Vulnerable code:

function matchesCurrentTurn(requestParams: JsonObject | undefined, threadId: string, turnId: string): boolean {
  if (!requestParams) {
    return true;
  }
  const requestThreadId = readString(requestParams, "threadId") ?? readString(requestParams, "conversationId");
  const requestTurnId = readString(requestParams, "turnId");
  if (requestThreadId && requestThreadId !== threadId) return false;
  if (requestTurnId && requestTurnId !== turnId) return false;
  return true;
}

Recommendation

Fail closed and require turn binding for approval requests.

Suggested approach:

  • Require requestParams to be a JSON object.
  • Require both threadId (or conversationId) and turnId to be present and match the active turn.
  • If the protocol legitimately omits these fields for some methods, consider:
    • rejecting those methods entirely, or
    • requiring an alternative unforgeable binding (e.g., a per-turn nonce established when turn/start returns).

Example fix:

function matchesCurrentTurn(
  requestParams: JsonObject | undefined,
  threadId: string,
  turnId: string,
): boolean {
  if (!requestParams) return false;

  const requestThreadId =
    readString(requestParams, "threadId") ?? readString(requestParams, "conversationId");
  const requestTurnId = readString(requestParams, "turnId");

  if (!requestThreadId || !requestTurnId) return false;
  return requestThreadId === threadId && requestTurnId === turnId;
}

This ensures approvals cannot be processed unless they are explicitly associated with the active thread/turn.

2. 🟠 Codex app-server runs with approvals disabled by default (approvalPolicy="never")
Property Value
Severity High
CWE CWE-862
Location extensions/codex/src/app-server/run-attempt.ts:521-185

Description

runCodexAppServerAttempt() starts Codex app-server threads/turns with an approval policy resolved from environment, but defaults to "never" when unset.

  • The approvalPolicy value is sent to Codex app-server in both thread/start and turn/start.
  • When set to "never", Codex app-server may proceed with native command execution / file change / permission requests without ever issuing */requestApproval RPCs, meaning OpenClaw’s handleCodexAppServerApprovalRequest() path is bypassed.
  • This creates a risky insecure-by-default posture: simply enabling/auto-selecting the Codex app-server harness can result in actions occurring without user approval prompts.

Vulnerable code (defaulting to no approvals):

function resolveAppServerApprovalPolicy(): "never" | "on-request" | "on-failure" | "untrusted" {
  const raw = process.env.OPENCLAW_CODEX_APP_SERVER_APPROVAL_POLICY?.trim();
  if (raw === "on-request" || raw === "on-failure" || raw === "untrusted") {
    return raw;
  }
  return "never";
}

And it is used here:

turn = await client.request("turn/start", {
  ...,
  approvalPolicy: resolveAppServerApprovalPolicy(),
});

Recommendation

Make the default approval policy safe-by-default (typically "on-request"), and require an explicit opt-out to disable approvals.

Suggested change:

function resolveAppServerApprovalPolicy(): "never" | "on-request" | "on-failure" | "untrusted" {
  const raw = process.env.OPENCLAW_CODEX_APP_SERVER_APPROVAL_POLICY?.trim();
  if (raw === "never" || raw === "on-request" || raw === "on-failure" || raw === "untrusted") {
    return raw;
  }
  // Safe default
  return "on-request";
}

Additionally:

  • Consider refusing to start the Codex app-server harness when approvals are disabled, unless a clearly named “I understand the risk” configuration flag is set.
  • Add logging/telemetry when approvals are disabled, to make the security posture explicit.
3. 🟠 Over-broad permission granting via unvalidated `item/permissions/requestApproval` payload (Codex app-server approval bridge)
Property Value
Severity High
CWE CWE-269
Location extensions/codex/src/app-server/approval-bridge.ts:155-160

Description

buildApprovalResponse() grants permissions for item/permissions/requestApproval by directly reflecting requestParams.permissions.network and requestParams.permissions.fileSystem back to the app-server without any schema validation or allowlisting.

  • Input: requestParams is app-server-provided JSON (untrusted boundary)
  • Logic flaw: requestedPermissions() only checks that network / fileSystem are JSON objects, then copies them verbatim
  • Result: if the user approves the request (potentially based on an approval prompt that does not enumerate requested permissions), the app-server can obtain arbitrary network / filesystem scopes encoded in those objects.

Vulnerable code:

if (method === "item/permissions/requestApproval") {
  if (outcome === "approved-session" || outcome === "approved-once") {
    return {
      permissions: requestedPermissions(requestParams),
      scope: outcome === "approved-session" ? "session" : "turn",
    };
  }
  return { permissions: {}, scope: "turn" };
}

function requestedPermissions(requestParams: JsonObject | undefined): JsonObject {
  const permissions = isJsonObject(requestParams?.permissions) ? requestParams.permissions : {};
  const granted: JsonObject = {};
  if (isJsonObject(permissions.network)) {
    granted.network = permissions.network;
  }
  if (isJsonObject(permissions.fileSystem)) {
    granted.fileSystem = permissions.fileSystem;
  }
  return granted;
}

Additionally, the approval prompt context built in buildApprovalContext() does not include the requested permission details (it only uses command, reason, or method name), which increases the risk of a user approving overly-broad permissions without understanding the scope.

Recommendation

Treat the app-server permission request as untrusted and only grant permissions that pass a strict schema + policy.

  1. Validate and normalize the requested permission objects (deep validation, types, bounds):

    • Define explicit shapes for network (e.g., allowlisted hosts, ports, protocols) and fileSystem (e.g., explicit paths, read/write flags).
    • Reject unknown keys and excessively broad wildcards.
  2. Enforce an allowlist / policy at the bridge (or gateway) level so even an approved request cannot exceed configured maximums.

  3. Render full requested scopes in the approval UI prompt (include normalized permissions in description), so approvals are informed.

Example (sketch):

import { z } from "zod";

const PermissionRequestSchema = z.object({
  permissions: z.object({
    network: z
      .object({ allowedHosts: z.array(z.string()).max(100) })
      .strict()
      .optional(),
    fileSystem: z
      .object({ allowedPaths: z.array(z.string()).max(100), mode: z.enum(["read","write","readwrite"]) })
      .strict()
      .optional(),
  }).strict(),
}).strict();

function requestedPermissions(requestParams: JsonObject | undefined): JsonObject {
  const parsed = PermissionRequestSchema.safeParse(requestParams);
  if (!parsed.success) return {};
  return parsed.data.permissions;
}

If permissive behavior is required for compatibility, at minimum log/deny requests that contain unknown keys or wildcards, and show the full permission JSON in the approval description.

4. 🟠 Untrusted tool-provided media URLs can trigger local file read in webchat audio embedding
Property Value
Severity High
CWE CWE-73
Location src/agents/pi-embedded-runner/run/tool-media-payloads.ts:7-31

Description

mergeAttemptToolMediaPayloads() merges attempt.toolMediaUrls (collected from tool results) into outbound reply payloads without validation. These payload mediaUrls are later consumed by the webchat path (buildWebchatAudioContentBlocksFromReplyPayloads() in src/gateway/server-methods/chat-webchat-media.ts), which will treat absolute paths / file: URLs as local files and fs.readFileSync() them when they look like audio.

Impact:

  • A malicious/compromised tool can return mediaUrl(s) like file:/.../secret.mp3 or /.../secret.mp3.
  • The runner will propagate these into outbound payloads.
  • Webchat will read and base64-embed the referenced local file into the assistant message content, disclosing server-local files to the client.

Vulnerable code (propagation):

const mediaUrls = Array.from(
  new Set(params.toolMediaUrls?.map((url) => url.trim()).filter(Boolean) ?? []),
);
...
const mergedMediaUrls = Array.from(new Set([...(payload.mediaUrls ?? []), ...mediaUrls]));
...
mediaUrls: mergedMediaUrls.length ? mergedMediaUrls : undefined,
mediaUrl: payload.mediaUrl ?? mergedMediaUrls[0],

Downstream sink (for context, not in this diff): src/gateway/server-methods/chat-webchat-media.ts reads absolute paths / file: URLs and calls fs.readFileSync() for audio.

Recommendation

Treat tool-returned media URLs as untrusted input and enforce a strict allowlist before merging them into outbound payloads.

Suggested fixes (choose one or combine):

  1. Allow only safe URL schemes (e.g. https: and optionally data:) and explicitly reject file: and absolute/relative paths:
function isSafeOutboundMediaUrl(raw: string): boolean {
  const u = raw.trim();
  if (!u) return false;
  if (u.startsWith("file:")) return false;
  if (u.startsWith("/")) return false; // blocks absolute paths
  if (/^[a-zA-Z]:\\/.test(u)) return false; // blocks Windows paths
  try {
    const parsed = new URL(u);
    return parsed.protocol === "https:"; // or allow "http:" only if you have SSRF protections
  } catch {
    return false;
  }
}

const mediaUrls = Array.from(new Set(
  (params.toolMediaUrls ?? []).filter(isSafeOutboundMediaUrl)
));
  2. If local media is required, gate it behind an explicit capability/allowlist (e.g. only allow files within a dedicated media directory) and validate path traversal with path.resolve() (see the sketch after this list).

  3. Add a second line of defense at the sink (buildWebchatAudioContentBlocksFromReplyPayloads) to require an allowlisted directory or a signed token rather than arbitrary paths/file: URLs.
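
If local media has to stay supported (option 2), a minimal containment check could look like the sketch below; MEDIA_DIR and the function name are illustrative assumptions, not code from this PR:

import path from "node:path";
import { fileURLToPath } from "node:url";

// Hypothetical allowlisted root for tool-produced media artifacts.
const MEDIA_DIR = path.resolve("/var/lib/openclaw/media");

function isAllowedLocalMediaPath(raw: string): boolean {
  // Normalize file: URLs and ".." segments before comparing against the root.
  const candidate = raw.startsWith("file:") ? fileURLToPath(raw) : raw;
  const resolved = path.resolve(candidate);
  return resolved === MEDIA_DIR || resolved.startsWith(MEDIA_DIR + path.sep);
}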

5. 🟠 Codex harness fallback can send prompts to chatgpt.com via PI provider transport
Property Value
Severity High
CWE CWE-201
Location src/agents/harness/selection.ts:129-137

Description

When running with OPENCLAW_AGENT_RUNTIME=auto (default) and PI fallback enabled, failures in the Codex agent harness can fall back to the embedded PI backend. Because the Codex provider advertises an external baseUrl (https://chatgpt.com/backend-api) and uses the openai-codex-responses transport, this fallback path can cause the PI transport to make outbound HTTPS requests to chatgpt.com and include the full prompt/messages in the request.

Key points:

  • The Codex provider config sets baseUrl to an external service (chatgpt.com/backend-api).
  • runAgentHarnessAttemptWithFallback() falls back to PI on any harness error when runtime is auto.
  • PI's OpenAI Responses transport uses model.baseUrl for the OpenAI client base URL, so it will send the request body (prompt/messages) to that host.
  • The provider also defines resolveSyntheticAuth() returning a non-secret placeholder token; if this (or some other resolved token) is used in the fallback request, it may create confusing/unsafe authentication behavior, but more importantly the request still contains sensitive prompt content.

Vulnerable code (fallback to PI on harness failure):

try {
  return await harness.runAttempt(params);
} catch (error) {
  if (policy.runtime !== "auto" || policy.fallback === "none") {
    throw error;
  }
  log.warn(`${harness.label} failed; falling back to embedded PI backend`, { error });
  return createPiAgentHarness().runAttempt(params);
}

This can result in unintended external network calls and potential disclosure of user prompts, tool outputs, or other sensitive session data to chatgpt.com.

Recommendation

Prevent Codex runs from ever using the generic PI provider transport unless explicitly opted-in.

Options:

  1. Disable PI fallback when provider is Codex (recommended default):
// before falling back
if (params.provider.trim().toLowerCase() === "codex") {
  throw error; // or require fallback:"none" for codex
}
  2. Introduce a provider-level guard in the PI embedded runner to refuse using openai-codex-responses (or a baseUrl containing chatgpt.com) unless an explicit allowlist flag is enabled (a sketch follows at the end of this section).

  3. Change the Codex provider catalog to not expose an external baseUrl/transport when it is intended to be harness-only (e.g., mark it as harness-only, or set a dummy baseUrl and ensure the transport cannot run).

Also consider setting the default embedded harness policy for Codex agents to fallback: "none" so harness errors fail closed instead of falling back to an outbound network path.
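
A sketch of option 2, assuming the embedded runner can inspect the resolved model before building the transport; the field names and the opt-in env var are illustrative, not part of this PR:

// Hypothetical shape of the model config the PI transport is about to use.
interface ResolvedModelConfig {
  provider: string;
  transport?: string;
  baseUrl?: string;
}

function assertPiTransportAllowed(model: ResolvedModelConfig): void {
  const harnessOnly =
    model.transport === "openai-codex-responses" ||
    (model.baseUrl ?? "").includes("chatgpt.com");
  const optedIn = process.env.OPENCLAW_ALLOW_CODEX_PI_TRANSPORT === "1"; // assumed flag name
  if (harnessOnly && !optedIn) {
    throw new Error(
      `refusing to send provider "${model.provider}" prompts through the generic PI transport`,
    );
  }
}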

6. 🟡 Unbounded JSON-RPC stdio parsing and unbounded pending requests can cause resource exhaustion
Property Value
Severity Medium
CWE CWE-400
Location extensions/codex/src/app-server/client.ts:64-146

Description

CodexAppServerClient spawns an app-server and processes newline-delimited JSON-RPC messages from stdout using readline.

There are no limits or backpressure safeguards:

  • Unbounded message size / line length: readline.createInterface({ input: child.stdout }) will buffer until a newline is seen. A malicious or buggy app-server can emit an extremely long line (or never emit a newline), causing large memory usage and high CPU during line.trim() and JSON.parse().
  • Unbounded in-flight requests: request() inserts an entry into pending but there is no per-request timeout and no cap on pending size. If the app-server stops responding (or responses are withheld), the pending map can grow without bound (e.g., repeated turn/steer calls), leading to memory growth.

Vulnerable code:

this.lines = createInterface({ input: child.stdout });
this.lines.on("line", (line) => this.handleLine(line));
...
parsed = JSON.parse(trimmed);
...
this.pending.set(id, { ... });

Because the app-server binary and args can be overridden via environment variables (OPENCLAW_CODEX_APP_SERVER_BIN, OPENCLAW_CODEX_APP_SERVER_ARGS), this is reachable in deployments where environment/config is not fully trusted, and it is also a robustness issue even when trusted (buggy server).

Recommendation

Add explicit resource limits and timeouts at the transport/client layer:

  1. Enforce a maximum message size (and optionally a maximum JSON nesting depth):
  • Replace readline with a small framing parser that buffers data up to MAX_MESSAGE_BYTES and rejects/terminates the child if exceeded.
  • Alternatively, wrap child.stdout with a Transform that enforces a maximum line length (a sketch appears at the end of this section).
  2. Add per-request timeouts and/or a maximum pending cap:
  • Allow request(method, params, { timeoutMs }) and reject (and delete) the pending entry on timeout.
  • Consider a global cap such as MAX_PENDING_REQUESTS; if exceeded, reject new requests or close the client.

Example (sketch):

request<T>(method: string, params?: JsonValue, opts: { timeoutMs?: number } = {}): Promise<T> {
  const id = this.nextId++;
  return new Promise<T>((resolve, reject) => {
    const timeout = opts.timeoutMs
      ? setTimeout(() => {
          this.pending.delete(id);
          reject(new Error(`${method} timed out`));
        }, opts.timeoutMs)
      : undefined;

    this.pending.set(id, {
      method,
      resolve: (v) => { if (timeout) clearTimeout(timeout); resolve(v as T); },
      reject: (e) => { if (timeout) clearTimeout(timeout); reject(e); },
    });

    this.writeMessage({ id, method, params });
  });
}
  3. Consider rate-limiting / truncating stderr logging to avoid log flooding.

These changes ensure a misbehaving app-server cannot force unbounded memory/CPU growth in the host process.
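
As a sketch of the first option, a small Transform-based splitter can replace readline and enforce a per-message cap; the 1 MiB limit is an arbitrary illustration:

import { Transform } from "node:stream";

const MAX_MESSAGE_BYTES = 1024 * 1024; // illustrative cap

// Emits one trimmed line per data event and errors out (tearing down the pipe)
// when the unterminated remainder exceeds the cap.
function createBoundedLineSplitter(): Transform {
  let buffer = "";
  return new Transform({
    readableObjectMode: true,
    transform(chunk, _encoding, callback) {
      buffer += chunk.toString("utf8");
      let index: number;
      while ((index = buffer.indexOf("\n")) !== -1) {
        const line = buffer.slice(0, index).trim();
        buffer = buffer.slice(index + 1);
        if (line) this.push(line);
      }
      if (buffer.length > MAX_MESSAGE_BYTES) {
        callback(new Error("codex app-server emitted an oversized message"));
        return;
      }
      callback();
    },
  });
}

Piping child.stdout through this splitter and feeding each emitted line to handleLine() would keep the existing parsing path while bounding memory.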

7. 🟡 Sensitive data exposure via verbatim logging of Codex app-server stderr
Property Value
Severity Medium
CWE CWE-532
Location extensions/codex/src/app-server/client.ts:78-83

Description

The Codex app-server child process stderr stream is logged verbatim at debug level. stderr output from external processes commonly contains sensitive information (API tokens, auth headers, file paths, user prompts, command lines, stack traces).

  • Input/source: chunk data from child.stderr (external process output)
  • Sink: embeddedAgentLog.debug(...) without redaction or length limiting
  • If debug logging is enabled (e.g., via config/env), these values may be written to console and/or persisted to log files, causing information disclosure.

Vulnerable code:

child.stderr.on("data", (chunk: Buffer | string) => {
  const text = chunk.toString("utf8").trim();
  if (text) {
    embeddedAgentLog.debug(`codex app-server stderr: ${text}`);
  }
});

Recommendation

Avoid logging external process stderr verbatim, or ensure it is sanitized and bounded.

Options:

  • Log only a short, redacted summary (and/or only on non-zero exit).
  • Apply a secret redaction function before logging.
  • Add a configuration flag to enable raw stderr logging only for local debugging.

Example (bounded + redaction):

import { redact } from "../../../logging/redact.js";

child.stderr.on("data", (chunk) => {
  const raw = chunk.toString("utf8");
  const bounded = raw.slice(0, 2000); // prevent huge log lines
  const safe = redact(bounded).trim();
  if (safe) {
    embeddedAgentLog.debug("codex app-server stderr (redacted): " + safe);
  }
});

Also consider logging to trace and/or gating with an explicit env var like OPENCLAW_LOG_CODEX_STDERR=1.

8. 🟡 Unbounded transcript file read/parsing enables memory/CPU DoS during transcript mirroring
Property Value
Severity Medium
CWE CWE-400
Location extensions/codex/src/app-server/transcript-mirror.ts:65-80

Description

mirrorCodexAppServerTranscript() calls readTranscriptIdempotencyKeys() which reads the entire session transcript JSONL file into memory and parses it line-by-line on every mirror attempt.

  • Input/control: sessionFile is provided via EmbeddedRunAttemptParams.sessionFile (passed through from run orchestration), and its contents are whatever is on disk.
  • Issue: fs.readFile(sessionFile, "utf8") loads the whole file, then raw.split(/\r?\n/) duplicates memory and iterates JSON.parse for each line.
  • Impact: A very large transcript (or a transcript intentionally bloated with many lines / large lines) can cause high memory usage and significant CPU time each time transcript mirroring runs, potentially leading to process slowdown or out-of-memory termination.

Vulnerable code:

raw = await fs.readFile(sessionFile, "utf8");
for (const line of raw.split(/\r?\n/)) {
  ...
  const parsed = JSON.parse(line)
  ...
}

Recommendation

Avoid loading and splitting the entire transcript file on every run.

Options:

  1. Stream the JSONL file line-by-line using fs.createReadStream + readline to keep memory bounded.
  2. Add a maximum file size / maximum lines processed for idempotency scanning (e.g., only scan the last N MB/lines).
  3. Persist idempotency keys separately (sidecar index file) so you don't need to re-scan the full transcript.

Example (streaming):

import fs from "node:fs";
import readline from "node:readline";

async function readTranscriptIdempotencyKeys(sessionFile: string): Promise<Set<string>> {
  const keys = new Set<string>();
  try {
    const stream = fs.createReadStream(sessionFile, { encoding: "utf8" });
    const rl = readline.createInterface({ input: stream, crlfDelay: Infinity });
    for await (const line of rl) {
      if (!line.trim()) continue;
      try {
        const parsed = JSON.parse(line) as { message?: { idempotencyKey?: unknown } };
        if (typeof parsed.message?.idempotencyKey === "string") keys.add(parsed.message.idempotencyKey);
      } catch {
        // ignore malformed lines
      }
    }
  } catch (e: any) {
    if (e?.code !== "ENOENT") throw e;
  }
  return keys;
}

Analyzed PR: #64298 at commit 9da2afb

Last updated on: 2026-04-10T20:04:06Z

@cursor

cursor Bot commented Apr 10, 2026

PR Summary

High Risk
Adds a new low-level agent execution path (Codex app-server harness) including tool routing, approvals, session/thread binding, and compaction, which can affect core agent behavior and safety controls. Also introduces new provider/model discovery and protocol/version gating that may fail in production if the external Codex CLI/app-server differs from assumptions.

Overview
Adds a bundled Codex runtime as a plugin-owned agent harness + provider. New extensions/codex registers api.registerAgentHarness(...) and api.registerProvider(...) so codex/gpt-* model refs use a Codex-managed app-server for auth/session threads, model discovery, tool calls, and native compaction, while openai/gpt-* remains on the existing OpenAI path.

Implements the Codex app-server bridge and safety plumbing. The extension adds a JSON-RPC client with version gating, dynamic tool bridging (including media artifact telemetry), approval bridging via gateway approval tools (fail-closed), per-session thread binding persistence, transcript mirroring with idempotency/locking, and compaction completion waiting.

SDK/docs/test plumbing updates. Exposes a new openclaw/plugin-sdk/agent-harness export, updates docs to describe agent harness plugins and runEmbeddedAgent(...) naming, adds live/docker smoke coverage and a protocol-drift check script, and includes small config/labeler and SSRF-policy compatibility tweaks.

Reviewed by Cursor Bugbot for commit 9da2afb. Bugbot is set up for automated code reviews on this repo. Configure here.

@openclaw-barnacle Bot added labels on Apr 10, 2026: docs (Improvements or additions to documentation), gateway (Gateway runtime), scripts (Repository scripts), docker (Docker and sandbox tooling), agents (Agent runtime and tooling), size: XL, maintainer (Maintainer-authored PR)
Comment thread extensions/codex/src/app-server/run-attempt.ts
@greptile-apps
Contributor

greptile-apps Bot commented Apr 10, 2026

Greptile Summary

This PR refactors the agent harness into a plugin-owned extension by introducing a new AgentHarness registry (src/agents/harness/), wiring it into the plugin loader lifecycle (save/restore/clear on activating and snapshot reloads), and moving all Codex app-server plumbing into a bundled extensions/codex package. A new openclaw/plugin-sdk/agent-harness subpath exposes the required types and helpers for the Codex extension without direct src/** imports.

The implementation follows established patterns for other pluggable registries (compaction providers, memory embedding providers) and correctly handles cache snapshots, non-activating loads, and error rollback.

Confidence Score: 5/5

Safe to merge; the harness registry, selection, plugin loader integration, and Codex extension are all well-structured and consistent with existing patterns.

The only finding is a P2 style observation: clearAgentHarnesses and restoreRegisteredAgentHarnesses do not call dispose() on evicted entries. No current harness implements dispose(), so there is no practical impact today. All other aspects — registry lifecycle, cache save/restore, non-activating load rollback, activating reload clear, agentHarnessIds tracking, harness selection priority, and Pi fallback — are correctly implemented and tested.

src/agents/harness/registry.ts — dispose() hook is declared on AgentHarness but never invoked during clear/restore cycles.


Reviews (1): Last reviewed commit: "test: expand Codex harness Docker live s..."

Comment on lines +55 to +63

export function restoreRegisteredAgentHarnesses(entries: RegisteredAgentHarness[]): void {
  const map = getAgentHarnessRegistryState().harnesses;
  map.clear();
  for (const entry of entries) {
    map.set(entry.harness.id, entry);
  }
}


P2 dispose() not called on cleared harnesses

Both clearAgentHarnesses and restoreRegisteredAgentHarnesses silently drop existing harness entries without invoking their optional dispose() hook. Any future harness that holds open connections, timers, or other resources would leak them on plugin reload. The equivalent compaction-provider and memory-embedding-provider registries share this gap, but since the AgentHarness type explicitly declares dispose?() it is more likely to be implemented.

restoreRegisteredAgentHarnesses has the same pattern and would benefit from calling dispose() on evicted entries before the map.clear(). Note that snapshot-load rollback paths must not dispose harnesses that are about to be restored, so the fix requires distinguishing evicted-permanently entries from temporarily-swapped ones.
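
A concise fix, sketched against the registry shapes visible in this diff (the error handling around dispose() is an assumption about the desired behavior), would dispose only entries that are not about to be restored:

export function restoreRegisteredAgentHarnesses(entries: RegisteredAgentHarness[]): void {
  const map = getAgentHarnessRegistryState().harnesses;
  const kept = new Set(entries.map((entry) => entry.harness.id));
  for (const [id, existing] of map) {
    // Dispose entries evicted for good; skip ones about to be swapped back in.
    if (!kept.has(id)) {
      try {
        existing.harness.dispose?.();
      } catch {
        // a failing dispose() should not abort the restore
      }
    }
  }
  map.clear();
  for (const entry of entries) {
    map.set(entry.harness.id, entry);
  }
}

clearAgentHarnesses could apply the same loop before its map.clear().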



@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6bad1277f8

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +163 to +165
replayMetadata: {
hadPotentialSideEffects: toolTelemetry.didSendViaMessagingTool,
replaySafe: !toolTelemetry.didSendViaMessagingTool,

P1 Badge Derive replay safety from all mutating tool side effects

replayMetadata is currently marked unsafe only when a messaging tool sends, but this harness can also mutate state via command/file-change tool items and cron adds. Because incomplete-turn handling and retry decisions rely on hadPotentialSideEffects, a Codex run that executes bash/apply_patch/cron without messaging can be mislabeled replaySafe, which can trigger unsafe retries or omit side-effect warnings after partial/failed turns.
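
A hedged sketch of widening the derivation; only didSendViaMessagingTool exists in this diff, and the other flags are hypothetical names for telemetry the harness would need to collect:

interface CodexToolTelemetry {
  didSendViaMessagingTool: boolean;
  didRunCommandTool?: boolean;   // hypothetical: bash / command items
  didApplyFileChange?: boolean;  // hypothetical: apply_patch / file-change items
  didAddCronJob?: boolean;       // hypothetical: cron adds
}

function computeHadPotentialSideEffects(t: CodexToolTelemetry): boolean {
  return Boolean(
    t.didSendViaMessagingTool || t.didRunCommandTool || t.didApplyFileChange || t.didAddCronJob,
  );
}

// replayMetadata would then be derived from the aggregate flag:
//   hadPotentialSideEffects: computeHadPotentialSideEffects(toolTelemetry),
//   replaySafe: !computeHadPotentialSideEffects(toolTelemetry),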


Comment thread src/plugins/registry.ts
Comment on lines +548 to +552
const normalizedHarness = {
  ...harness,
  id,
  pluginId: harness.pluginId ?? record.id,
};

P2 Badge Reserve the built-in pi harness id during registration

The new registration path accepts any non-empty harness id and immediately stores it, so a plugin can register id: "pi". Since harness selection checks registered harnesses before appending the built-in PI harness, forcing OPENCLAW_AGENT_RUNTIME=pi can resolve to plugin code instead of the built-in backend, breaking the expected kill-switch/fallback behavior for PI mode.
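
A minimal guard at registration time, sketched with an illustrative reserved-id set; the registration path shown above could call it before storing the entry:

const RESERVED_HARNESS_IDS = new Set(["pi"]); // built-in embedded backend

function assertHarnessIdAvailable(id: string, pluginId: string): void {
  if (RESERVED_HARNESS_IDS.has(id.trim().toLowerCase())) {
    throw new Error(
      `plugin "${pluginId}" cannot register reserved agent harness id "${id}"`,
    );
  }
}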


@steipete force-pushed the app-server branch 3 times, most recently from 9540884 to 5d4789e on April 10, 2026 at 12:57
Comment thread extensions/codex/src/app-server/client.ts
Comment thread extensions/codex/src/app-server/protocol.ts

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5d4789e690


Comment on lines +45 to +48
.map((harness) => ({
harness,
support: harness.supports({
provider: params.provider,

P1 Badge Isolate failing harness support checks

Wrap harness.supports(...) in a per-harness try/catch so one buggy plugin harness cannot abort selection for the whole run. In auto mode this loop evaluates every registered harness, so if any supports implementation throws (for example due to missing plugin config), selection throws before PI fallback is chosen and the agent run fails entirely instead of degrading to another harness.
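
A sketch of the per-harness guard; the SupportResult shape here is an assumption about what supports() returns:

interface HarnessSupportInput {
  provider: string;
  model?: string;
}
interface HarnessSupportResult {
  supported: boolean; // assumed minimal shape
}
interface SelectableHarness {
  id: string;
  supports(input: HarnessSupportInput): HarnessSupportResult;
}

function safeSupports(harness: SelectableHarness, input: HarnessSupportInput): HarnessSupportResult {
  try {
    return harness.supports(input);
  } catch (error) {
    // One buggy plugin harness should not abort selection for the whole run.
    console.warn(`agent harness ${harness.id} supports() threw; treating as unsupported`, error);
    return { supported: false };
  }
}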


Comment on lines +125 to +129
if (
!isMessagingTool(params.toolName) ||
!isMessagingToolSendAction(params.toolName, params.args)
) {
return;

P1 Badge Record messaging side effects only on successful sends

collectToolTelemetry marks messaging sends as successful even when the tool call threw, because the messaging branch ignores isError. When a message/sessions_send call fails, this still sets didSendViaMessagingTool and target/text metadata, which can suppress normal assistant output and mislabel replay safety as if a send occurred. Gate this branch on successful execution (the same way cron/media telemetry is gated).
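
A sketch of the gate; isMessagingTool and isMessagingToolSendAction are the helpers from the quoted snippet, and the isError flag on the params is an assumption about what the collector receives:

// Hypothetical extraction of the guard used inside collectToolTelemetry().
function shouldRecordMessagingSend(params: {
  toolName: string;
  args: unknown;
  isError?: boolean; // assumed to carry the tool call outcome, like the cron/media branches
}): boolean {
  if (params.isError) return false; // failed sends must not count as side effects
  return isMessagingTool(params.toolName) && isMessagingToolSendAction(params.toolName, params.args);
}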


Comment thread extensions/codex/src/app-server/client.ts
Comment thread extensions/codex/app-server/run-attempt.ts Outdated
Comment thread extensions/codex/src/app-server/dynamic-tools.ts
Comment thread extensions/codex/src/app-server/compact.ts

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c35e17a549


Comment on lines +80 to +85
} catch (error) {
if (runtime !== "auto") {
throw error;
}
log.warn(`${harness.label} failed; falling back to embedded PI backend`, { error });
return createPiAgentHarness().runAttempt(params);

P1 Badge Gate PI fallback on replay-safe harness failures

In auto mode this catch block reruns the same turn through PI after any plugin harness exception, but the harness may have already executed mutating tools, approvals, or channel sends before throwing. Replaying immediately through PI can duplicate side effects on the same user input (for example duplicate messages or command executions). The fallback should only happen when the failing harness can prove the turn is replay-safe; otherwise the original error should be surfaced.
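
One hedged way to express the gate; the error shape carrying replay metadata is hypothetical and would need to be threaded out of the harness attempt:

// Hypothetical error contract for failed harness attempts.
interface HarnessAttemptError extends Error {
  replayMetadata?: { replaySafe?: boolean };
}

function canFallBackToPi(error: unknown, runtime: string): boolean {
  if (runtime !== "auto") return false;
  const replaySafe = (error as HarnessAttemptError)?.replayMetadata?.replaySafe === true;
  return replaySafe; // only rerun the turn when the harness proves nothing mutated
}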


Comment on lines +140 to +146
await projector.handleNotification(notification);
if (
notification.method === "turn/completed" &&
isTurnNotification(notification.params, turnId)
) {
completed = true;
resolveCompletion?.();

P2 Badge Resolve completion even if notification projection fails

Completion is resolved only after projector.handleNotification(...) returns, so if projection throws (for example from a callback invoked inside the projector), a turn/completed notification is effectively dropped and the run waits until timeout/abort even though the turn actually finished. Because the client swallows handler rejections, this turns real completions into spurious timeouts; completion detection should run even when projection fails.
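
A minimal restructuring, keeping the names from the snippet above, so completion detection runs even when projection throws:

try {
  await projector.handleNotification(notification);
} catch (error) {
  // Projection problems are logged but must not swallow turn lifecycle events.
  log.warn("codex app-server notification projection failed", { error });
} finally {
  if (
    notification.method === "turn/completed" &&
    isTurnNotification(notification.params, turnId)
  ) {
    completed = true;
    resolveCompletion?.();
  }
}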



@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cb3f54b909


Comment on lines +177 to +181
turn = await client.request<CodexTurnStartResponse>("turn/start", {
threadId: thread.threadId,
input: buildUserInput(params),
cwd: effectiveWorkspace,
approvalPolicy: resolveAppServerApprovalPolicy(),

P1 Badge Cover turn/start with the attempt timeout

timeoutMs does not protect the turn/start RPC: the call at this site is awaited directly, while the run timeout and abort-interrupt wiring are only installed afterward. If the app-server hangs before returning the turn id (for example, server-side deadlock during turn creation), this attempt can block indefinitely and ignore the configured attempt timeout/cancel path, leaving the session stuck until the process exits.
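
A sketch of covering the RPC with the attempt budget; withTimeout is a hypothetical helper, not something this PR defines:

async function withTimeout<T>(promise: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms}ms`)), ms);
  });
  try {
    return await Promise.race([promise, timeout]);
  } finally {
    if (timer) clearTimeout(timer);
  }
}

// turn = await withTimeout(client.request("turn/start", { ... }), timeoutMs, "turn/start");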


Comment on lines +451 to +453
const threadId = readString(params, "threadId");
const turnId = readString(params, "turnId");
return (!threadId || threadId === this.threadId) && (!turnId || turnId === this.turnId);

P2 Badge Scope notifications to explicit turn identity

This matcher treats notifications with no threadId/turnId as belonging to the current run. Because runs share a singleton Codex client (getSharedCodexAppServerClient), a global error notification without ids can be accepted by every active projector and set promptError on unrelated turns, causing spurious cross-session failures under concurrent runs.


Comment thread extensions/codex/src/app-server/event-projector.ts

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 92e8c7908d


Comment on lines +156 to +160
close(): void {
this.closed = true;
this.lines.close();
if (!this.child.killed) {
this.child.kill?.();

P1 Badge Reject pending RPCs when closing the Codex app-server client

close() marks the client as closed before process exit/error handling runs, but pending requests are only rejected in closeWithError(), which early-returns when closed is already true. If close() is called while requests are in flight (for example during startup timeout/reset paths), those Promises can remain unresolved indefinitely, leaking waiters and leaving callers hung instead of receiving a failure.
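
A sketch that mirrors closeWithError()'s cleanup inside close(); the pending-entry shape follows the request() example earlier in this thread:

close(): void {
  this.closed = true;
  this.lines.close();
  // Fail in-flight RPCs so callers do not hang when the client is torn down deliberately.
  const error = new Error("codex app-server client closed");
  for (const [id, entry] of this.pending) {
    this.pending.delete(id);
    entry.reject(error);
  }
  if (!this.child.killed) {
    this.child.kill?.();
  }
}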


Comment on lines +119 to +121
} catch (error) {
clearSharedCodexAppServerClient();
params.abortSignal?.removeEventListener("abort", abortFromUpstream);

P2 Badge Keep startup failures from tearing down unrelated active runs

This catch block unconditionally clears the shared Codex app-server client for the whole process. Because the client is a singleton used by concurrent attempts, a startup failure in one run will close the transport used by other in-progress runs, causing unrelated sessions to fail or time out even though their own turn is healthy.
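
One hedged approach is to track how many attempts share the client and only reset it when nobody else is mid-turn; the counter and function names here are illustrative:

// Hypothetical ref-count guarding the shared client teardown.
let sharedClientUsers = 0;

function acquireSharedClient(): void {
  sharedClientUsers += 1;
}

function releaseSharedClient(options: { failed: boolean }): void {
  sharedClientUsers = Math.max(0, sharedClientUsers - 1);
  if (options.failed && sharedClientUsers === 0) {
    clearSharedCodexAppServerClient(); // reset only when no other run depends on it
  }
}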


Comment on lines +174 to +176
if (!requestParams) {
return true;
}

P2 Badge Require explicit turn identity for approval request routing

This matcher treats approval requests with missing ids as belonging to the current run. With multiple request handlers attached to the shared client, an approval request lacking threadId/turnId can be consumed by the wrong run and resolved against the wrong session context, which is especially risky for command/file approval decisions.


stream: "codex_app_server.guardian",
data: { method: notification.method },
});
break;

Guardian review count double-increments per review cycle

Medium Severity

guardianReviewCount is incremented for both item/autoApprovalReview/started and item/autoApprovalReview/completed notifications. A single guardian review cycle produces both events, so the count reflects events rather than distinct reviews. This inflated count means didSendDeterministicApprovalPrompt at line 179 evaluates to false instead of undefined even when intended to represent "no reviews happened."
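
A minimal fix, assuming the notification handling is a switch on notification.method as the quoted break suggests, counts only one of the two lifecycle events:

switch (notification.method) {
  case "item/autoApprovalReview/started":
    // Count once per review cycle; the matching "completed" event is intentionally ignored.
    guardianReviewCount += 1;
    break;
  case "item/autoApprovalReview/completed":
    break;
}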


Reviewed by Cursor Bugbot for commit 0cd597daeb31136843bb27fd5022d824ab9a2b0b. Configure here.


@cursor cursor Bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 4 total unresolved issues (including 3 from previous reviews).


❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 9da2afb. Configure here.

if (!this.child.killed) {
this.child.kill?.();
}
}

Client close() leaves pending requests permanently unresolved

High Severity

The close() method sets this.closed = true before killing the child process, but never rejects pending requests. When the child's exit event fires and triggers closeWithError, the guard if (this.closed) { return; } causes it to bail out without rejecting any pending promises. Any in-flight requests (e.g., turn/start, model/list) will hang forever. This can be triggered when clearSharedCodexAppServerClient() is called from run-attempt.ts on startup failure while requests are still pending.

Additional Locations (1)

Reviewed by Cursor Bugbot for commit 9da2afb. Configure here.

@steipete steipete merged commit 6783bef into main Apr 10, 2026
36 of 37 checks passed
@steipete steipete deleted the app-server branch April 10, 2026 20:22
@steipete
Contributor Author

Landed via GitHub rebase merge.

Gates:

  • pnpm run lint:tmp:no-raw-channel-fetch
  • pnpm test src/plugins/install.test.ts -t "blocks package installs when a broad vendored tree contains a deeply nested blocked manifest|blocks install when resolved dependencies introduce a denied package|safe-name symlink targets a file under a blocked package directory|symlink targets an allowed scoped package path"
  • pnpm test src/plugin-sdk/browser-subpaths.test.ts
  • pnpm check
  • PR CI green on 9da2afbc64e93670dd8a4315263f26258e66df75 after rerunning an install-smoke Docker Hub IPv6 network flake

Source head: 9da2afb
Landed head: 6783bef

Thanks @steipete!

