
Refactor agent harness into Codex extension#64298

Merged
steipete merged 20 commits into main from app-server on Apr 10, 2026

Conversation

@steipete
Contributor

Summary

  • add the plugin-owned agent harness registry and Codex extension harness
  • move Codex app-server plumbing out of core-specific surfaces
  • bridge Codex app-server tools, approvals, compaction, media artifacts, and Docker/live coverage

Tests

  • pnpm check
  • pnpm test extensions/codex/app-server/compact.test.ts
  • pnpm test src/plugins/loader.test.ts -t "clears plugin agent harnesses during activating reloads"
  • pnpm test src/agents/harness/registry.test.ts
  • pnpm plugin-sdk:api:check
  • OPENCLAW_LIVE_CODEX_HARNESS=1 OPENCLAW_LIVE_CODEX_HARNESS_DEBUG=1 OPENCLAW_LIVE_CODEX_HARNESS_MODEL=codex/gpt-5.4 pnpm test:live src/gateway/gateway-codex-harness.live.test.ts
  • OPENCLAW_LIVE_CODEX_HARNESS_DEBUG=1 OPENCLAW_LIVE_CODEX_HARNESS_MODEL=codex/gpt-5.4 bash scripts/test-live-codex-harness-docker.sh

Notes

  • Docker live smoke covers text continuity, image attachment, and cron MCP/tool call via the Codex harness.

@aisle-research-bot

aisle-research-bot Bot commented Apr 10, 2026

🔒 Aisle Security Analysis

We found 8 potential security issue(s) in this PR:

# Severity Title
1 🟠 High Approval requests not bound to current thread/turn in Codex app-server approval bridge
2 🟠 High Codex app-server runs with approvals disabled by default (approvalPolicy="never")
3 🟠 High Over-broad permission granting via unvalidated item/permissions/requestApproval payload (Codex app-server approval bridge)
4 🟠 High Untrusted tool-provided media URLs can trigger local file read in webchat audio embedding
5 🟠 High Codex harness fallback can send prompts to chatgpt.com via PI provider transport
6 🟡 Medium Unbounded JSON-RPC stdio parsing and unbounded pending requests can cause resource exhaustion
7 🟡 Medium Sensitive data exposure via verbatim logging of Codex app-server stderr
8 🟡 Medium Unbounded transcript file read/parsing enables memory/CPU DoS during transcript mirroring
1. 🟠 Approval requests not bound to current thread/turn in Codex app-server approval bridge
Property Value
Severity High
CWE CWE-285
Location extensions/codex/src/app-server/approval-bridge.ts:169-187

Description

handleCodexAppServerApprovalRequest() attempts to ensure an approval request is for the current Codex thread/turn via matchesCurrentTurn(), but the matching logic is fail-open when identifiers are missing.

  • If requestParams is missing or not a JSON object, matchesCurrentTurn() returns true.
  • If threadId/turnId are absent from requestParams, the function also returns true.
  • As a result, an out-of-order/stale approval request (or a malicious/compromised app-server) can submit an approval request without IDs and have it treated as belonging to the current turn, potentially tricking an operator into approving an unrelated command/file/permission action.

Vulnerable code:

function matchesCurrentTurn(requestParams: JsonObject | undefined, threadId: string, turnId: string): boolean {
  if (!requestParams) {
    return true;
  }
  const requestThreadId = readString(requestParams, "threadId") ?? readString(requestParams, "conversationId");
  const requestTurnId = readString(requestParams, "turnId");
  if (requestThreadId && requestThreadId !== threadId) return false;
  if (requestTurnId && requestTurnId !== turnId) return false;
  return true;
}

Recommendation

Fail closed and require turn binding for approval requests.

Suggested approach:

  • Require requestParams to be a JSON object.
  • Require both threadId (or conversationId) and turnId to be present and match the active turn.
  • If the protocol legitimately omits these fields for some methods, consider:
    • rejecting those methods entirely, or
    • requiring an alternative unforgeable binding (e.g., a per-turn nonce established when turn/start returns).

Example fix:

function matchesCurrentTurn(
  requestParams: JsonObject | undefined,
  threadId: string,
  turnId: string,
): boolean {
  if (!requestParams) return false;

  const requestThreadId =
    readString(requestParams, "threadId") ?? readString(requestParams, "conversationId");
  const requestTurnId = readString(requestParams, "turnId");

  if (!requestThreadId || !requestTurnId) return false;
  return requestThreadId === threadId && requestTurnId === turnId;
}

This ensures approvals cannot be processed unless they are explicitly associated with the active thread/turn.

2. 🟠 Codex app-server runs with approvals disabled by default (approvalPolicy="never")
Property Value
Severity High
CWE CWE-862
Location extensions/codex/src/app-server/run-attempt.ts:521-185

Description

runCodexAppServerAttempt() starts Codex app-server threads/turns with an approval policy resolved from environment, but defaults to "never" when unset.

  • The approvalPolicy value is sent to Codex app-server in both thread/start and turn/start.
  • When set to "never", Codex app-server may proceed with native command execution / file change / permission requests without ever issuing */requestApproval RPCs, meaning OpenClaw’s handleCodexAppServerApprovalRequest() path is bypassed.
  • This creates a risky insecure-by-default posture: simply enabling/auto-selecting the Codex app-server harness can result in actions occurring without user approval prompts.

Vulnerable code (defaulting to no approvals):

function resolveAppServerApprovalPolicy(): "never" | "on-request" | "on-failure" | "untrusted" {
  const raw = process.env.OPENCLAW_CODEX_APP_SERVER_APPROVAL_POLICY?.trim();
  if (raw === "on-request" || raw === "on-failure" || raw === "untrusted") {
    return raw;
  }
  return "never";
}

And it is used here:

turn = await client.request("turn/start", {
  ...,
  approvalPolicy: resolveAppServerApprovalPolicy(),
});

Recommendation

Make the default approval policy safe-by-default (typically "on-request"), and require an explicit opt-out to disable approvals.

Suggested change:

function resolveAppServerApprovalPolicy(): "never" | "on-request" | "on-failure" | "untrusted" {
  const raw = process.env.OPENCLAW_CODEX_APP_SERVER_APPROVAL_POLICY?.trim();
  if (raw === "never" || raw === "on-request" || raw === "on-failure" || raw === "untrusted") {
    return raw;
  }
  // Safe default
  return "on-request";
}

Additionally:

  • Consider refusing to start the Codex app-server harness when approvals are disabled, unless a clearly named “I understand the risk” configuration flag is set.
  • Add logging/telemetry when approvals are disabled, to make the security posture explicit.
3. 🟠 Over-broad permission granting via unvalidated `item/permissions/requestApproval` payload (Codex app-server approval bridge)
Property Value
Severity High
CWE CWE-269
Location extensions/codex/src/app-server/approval-bridge.ts:155-160

Description

buildApprovalResponse() grants permissions for item/permissions/requestApproval by directly reflecting requestParams.permissions.network and requestParams.permissions.fileSystem back to the app-server without any schema validation or allowlisting.

  • Input: requestParams is app-server-provided JSON (untrusted boundary)
  • Logic flaw: requestedPermissions() only checks that network / fileSystem are JSON objects, then copies them verbatim
  • Result: if the user approves the request (potentially based on an approval prompt that does not enumerate requested permissions), the app-server can obtain arbitrary network / filesystem scopes encoded in those objects.

Vulnerable code:

if (method === "item/permissions/requestApproval") {
  if (outcome === "approved-session" || outcome === "approved-once") {
    return {
      permissions: requestedPermissions(requestParams),
      scope: outcome === "approved-session" ? "session" : "turn",
    };
  }
  return { permissions: {}, scope: "turn" };
}

function requestedPermissions(requestParams: JsonObject | undefined): JsonObject {
  const permissions = isJsonObject(requestParams?.permissions) ? requestParams.permissions : {};
  const granted: JsonObject = {};
  if (isJsonObject(permissions.network)) {
    granted.network = permissions.network;
  }
  if (isJsonObject(permissions.fileSystem)) {
    granted.fileSystem = permissions.fileSystem;
  }
  return granted;
}

Additionally, the approval prompt context built in buildApprovalContext() does not include the requested permission details (it only uses command, reason, or method name), which increases the risk of a user approving overly-broad permissions without understanding the scope.

Recommendation

Treat the app-server permission request as untrusted and only grant permissions that pass a strict schema + policy.

  1. Validate and normalize the requested permission objects (deep validation, types, bounds):

    • Define explicit shapes for network (e.g., allowlisted hosts, ports, protocols) and fileSystem (e.g., explicit paths, read/write flags).
    • Reject unknown keys and excessively broad wildcards.
  2. Enforce an allowlist / policy at the bridge (or gateway) level so even an approved request cannot exceed configured maximums.

  3. Render full requested scopes in the approval UI prompt (include normalized permissions in description), so approvals are informed.

Example (sketch):

import { z } from "zod";

const PermissionRequestSchema = z.object({
  permissions: z.object({
    network: z
      .object({ allowedHosts: z.array(z.string()).max(100) })
      .strict()
      .optional(),
    fileSystem: z
      .object({ allowedPaths: z.array(z.string()).max(100), mode: z.enum(["read","write","readwrite"]) })
      .strict()
      .optional(),
  }).strict(),
}).strict();

function requestedPermissions(requestParams: JsonObject | undefined): JsonObject {
  const parsed = PermissionRequestSchema.safeParse(requestParams);
  if (!parsed.success) return {};
  return parsed.data.permissions;
}

If permissive behavior is required for compatibility, at minimum log/deny requests that contain unknown keys or wildcards, and show the full permission JSON in the approval description.

4. 🟠 Untrusted tool-provided media URLs can trigger local file read in webchat audio embedding
Property Value
Severity High
CWE CWE-73
Location src/agents/pi-embedded-runner/run/tool-media-payloads.ts:7-31

Description

mergeAttemptToolMediaPayloads() merges attempt.toolMediaUrls (collected from tool results) into outbound reply payloads without validation. These payload mediaUrls are later consumed by the webchat path (buildWebchatAudioContentBlocksFromReplyPayloads() in src/gateway/server-methods/chat-webchat-media.ts), which will treat absolute paths / file: URLs as local files and fs.readFileSync() them when they look like audio.

Impact:

  • A malicious/compromised tool can return mediaUrl(s) like file:/.../secret.mp3 or /.../secret.mp3.
  • The runner will propagate these into outbound payloads.
  • Webchat will read and base64-embed the referenced local file into the assistant message content, disclosing server-local files to the client.

Vulnerable code (propagation):

const mediaUrls = Array.from(
  new Set(params.toolMediaUrls?.map((url) => url.trim()).filter(Boolean) ?? []),
);
...
const mergedMediaUrls = Array.from(new Set([...(payload.mediaUrls ?? []), ...mediaUrls]));
...
mediaUrls: mergedMediaUrls.length ? mergedMediaUrls : undefined,
mediaUrl: payload.mediaUrl ?? mergedMediaUrls[0],

Downstream sink (for context, not in this diff): src/gateway/server-methods/chat-webchat-media.ts reads absolute paths / file: URLs and calls fs.readFileSync() for audio.

Recommendation

Treat tool-returned media URLs as untrusted input and enforce a strict allowlist before merging them into outbound payloads.

Suggested fixes (choose one or combine):

  1. Allow only safe URL schemes (e.g. https: and optionally data:) and explicitly reject file: and absolute/relative paths:
function isSafeOutboundMediaUrl(raw: string): boolean {
  const u = raw.trim();
  if (!u) return false;
  if (u.startsWith("file:")) return false;
  if (u.startsWith("/")) return false; // blocks absolute paths
  if (/^[a-zA-Z]:\\/.test(u)) return false; // blocks Windows paths
  try {
    const parsed = new URL(u);
    return parsed.protocol === "https:"; // or allow "http:" only if you have SSRF protections
  } catch {
    return false;
  }
}

const mediaUrls = Array.from(new Set(
  (params.toolMediaUrls ?? []).filter(isSafeOutboundMediaUrl)
));
  2. If local media is required, gate it behind an explicit capability/allowlist (e.g. only allow files within a dedicated media directory) and validate path traversal with path.resolve() (see the sketch after this list).

  3. Add a second line of defense at the sink (buildWebchatAudioContentBlocksFromReplyPayloads) to require an allowlisted directory or a signed token rather than arbitrary paths/file: URLs.
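
If local media has to stay supported (option 2), a minimal containment check could look like the sketch below; MEDIA_DIR and the function name are illustrative assumptions, not code from this PR:

import path from "node:path";
import { fileURLToPath } from "node:url";

// Hypothetical allowlisted root for tool-produced media artifacts.
const MEDIA_DIR = path.resolve("/var/lib/openclaw/media");

function isAllowedLocalMediaPath(raw: string): boolean {
  // Normalize file: URLs and ".." segments before comparing against the root.
  const candidate = raw.startsWith("file:") ? fileURLToPath(raw) : raw;
  const resolved = path.resolve(candidate);
  return resolved === MEDIA_DIR || resolved.startsWith(MEDIA_DIR + path.sep);
}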

5. 🟠 Codex harness fallback can send prompts to chatgpt.com via PI provider transport
Property Value
Severity High
CWE CWE-201
Location src/agents/harness/selection.ts:129-137

Description

When running with OPENCLAW_AGENT_RUNTIME=auto (default) and PI fallback enabled, failures in the Codex agent harness can fall back to the embedded PI backend. Because the Codex provider advertises an external baseUrl (https://chatgpt.com/backend-api) and uses the openai-codex-responses transport, this fallback path can cause the PI transport to make outbound HTTPS requests to chatgpt.com and include the full prompt/messages in the request.

Key points:

  • The Codex provider config sets baseUrl to an external service (chatgpt.com/backend-api).
  • runAgentHarnessAttemptWithFallback() falls back to PI on any harness error when runtime is auto.
  • PI's OpenAI Responses transport uses model.baseUrl for the OpenAI client base URL, so it will send the request body (prompt/messages) to that host.
  • The provider also defines resolveSyntheticAuth() returning a non-secret placeholder token; if this (or some other resolved token) is used in the fallback request, it may create confusing/unsafe authentication behavior, but more importantly the request still contains sensitive prompt content.

Vulnerable code (fallback to PI on harness failure):

try {
  return await harness.runAttempt(params);
} catch (error) {
  if (policy.runtime !== "auto" || policy.fallback === "none") {
    throw error;
  }
  log.warn(`${harness.label} failed; falling back to embedded PI backend`, { error });
  return createPiAgentHarness().runAttempt(params);
}

This can result in unintended external network calls and potential disclosure of user prompts, tool outputs, or other sensitive session data to chatgpt.com.

Recommendation

Prevent Codex runs from ever using the generic PI provider transport unless explicitly opted-in.

Options:

  1. Disable PI fallback when provider is Codex (recommended default):
// before falling back
if (params.provider.trim().toLowerCase() === "codex") {
  throw error; // or require fallback:"none" for codex
}
  2. Introduce a provider-level guard in the PI embedded runner to refuse using openai-codex-responses (or a baseUrl containing chatgpt.com) unless an explicit allowlist flag is enabled (a sketch follows at the end of this section).

  3. Change the Codex provider catalog to not expose an external baseUrl/transport when it is intended to be harness-only (e.g., mark it as harness-only, or set a dummy baseUrl and ensure the transport cannot run).

Also consider setting the default embedded harness policy for Codex agents to fallback: "none" so harness errors fail closed instead of falling back to an outbound network path.
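
A sketch of option 2, assuming the embedded runner can inspect the resolved model before building the transport; the field names and the opt-in env var are illustrative, not part of this PR:

// Hypothetical shape of the model config the PI transport is about to use.
interface ResolvedModelConfig {
  provider: string;
  transport?: string;
  baseUrl?: string;
}

function assertPiTransportAllowed(model: ResolvedModelConfig): void {
  const harnessOnly =
    model.transport === "openai-codex-responses" ||
    (model.baseUrl ?? "").includes("chatgpt.com");
  const optedIn = process.env.OPENCLAW_ALLOW_CODEX_PI_TRANSPORT === "1"; // assumed flag name
  if (harnessOnly && !optedIn) {
    throw new Error(
      `refusing to send provider "${model.provider}" prompts through the generic PI transport`,
    );
  }
}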

6. 🟡 Unbounded JSON-RPC stdio parsing and unbounded pending requests can cause resource exhaustion
Property Value
Severity Medium
CWE CWE-400
Location extensions/codex/src/app-server/client.ts:64-146

Description

CodexAppServerClient spawns an app-server and processes newline-delimited JSON-RPC messages from stdout using readline.

There are no limits or backpressure safeguards:

  • Unbounded message size / line length: readline.createInterface({ input: child.stdout }) will buffer until a newline is seen. A malicious or buggy app-server can emit an extremely long line (or never emit a newline), causing large memory usage and high CPU during line.trim() and JSON.parse().
  • Unbounded in-flight requests: request() inserts an entry into pending but there is no per-request timeout and no cap on pending size. If the app-server stops responding (or responses are withheld), the pending map can grow without bound (e.g., repeated turn/steer calls), leading to memory growth.

Vulnerable code:

this.lines = createInterface({ input: child.stdout });
this.lines.on("line", (line) => this.handleLine(line));
...
parsed = JSON.parse(trimmed);
...
this.pending.set(id, { ... });

Because the app-server binary and args can be overridden via environment variables (OPENCLAW_CODEX_APP_SERVER_BIN, OPENCLAW_CODEX_APP_SERVER_ARGS), this is reachable in deployments where environment/config is not fully trusted, and it is also a robustness issue even when trusted (buggy server).

Recommendation

Add explicit resource limits and timeouts at the transport/client layer:

  1. Enforce a maximum message size (and optionally a maximum JSON nesting depth):
  • Replace readline with a small framing parser that buffers data up to MAX_MESSAGE_BYTES and rejects/terminates the child if exceeded.
  • Alternatively, wrap child.stdout with a Transform that enforces a maximum line length (a sketch appears at the end of this section).
  2. Add per-request timeouts and/or a maximum pending cap:
  • Allow request(method, params, { timeoutMs }) and reject (and delete) the pending entry on timeout.
  • Consider a global cap such as MAX_PENDING_REQUESTS; if exceeded, reject new requests or close the client.

Example (sketch):

request<T>(method: string, params?: JsonValue, opts: { timeoutMs?: number } = {}): Promise<T> {
  const id = this.nextId++;
  return new Promise<T>((resolve, reject) => {
    const timeout = opts.timeoutMs
      ? setTimeout(() => {
          this.pending.delete(id);
          reject(new Error(`${method} timed out`));
        }, opts.timeoutMs)
      : undefined;

    this.pending.set(id, {
      method,
      resolve: (v) => { if (timeout) clearTimeout(timeout); resolve(v as T); },
      reject: (e) => { if (timeout) clearTimeout(timeout); reject(e); },
    });

    this.writeMessage({ id, method, params });
  });
}
  3. Consider rate-limiting / truncating stderr logging to avoid log flooding.

These changes ensure a misbehaving app-server cannot force unbounded memory/CPU growth in the host process.
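
As a sketch of the first option, a small Transform-based splitter can replace readline and enforce a per-message cap; the 1 MiB limit is an arbitrary illustration:

import { Transform } from "node:stream";

const MAX_MESSAGE_BYTES = 1024 * 1024; // illustrative cap

// Emits one trimmed line per data event and errors out (tearing down the pipe)
// when the unterminated remainder exceeds the cap.
function createBoundedLineSplitter(): Transform {
  let buffer = "";
  return new Transform({
    readableObjectMode: true,
    transform(chunk, _encoding, callback) {
      buffer += chunk.toString("utf8");
      let index: number;
      while ((index = buffer.indexOf("\n")) !== -1) {
        const line = buffer.slice(0, index).trim();
        buffer = buffer.slice(index + 1);
        if (line) this.push(line);
      }
      if (buffer.length > MAX_MESSAGE_BYTES) {
        callback(new Error("codex app-server emitted an oversized message"));
        return;
      }
      callback();
    },
  });
}

Piping child.stdout through this splitter and feeding each emitted line to handleLine() would keep the existing parsing path while bounding memory.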

7. 🟡 Sensitive data exposure via verbatim logging of Codex app-server stderr
Property Value
Severity Medium
CWE CWE-532
Location extensions/codex/src/app-server/client.ts:78-83

Description

The Codex app-server child process stderr stream is logged verbatim at debug level. stderr output from external processes commonly contains sensitive information (API tokens, auth headers, file paths, user prompts, command lines, stack traces).

  • Input/source: chunk data from child.stderr (external process output)
  • Sink: embeddedAgentLog.debug(...) without redaction or length limiting
  • If debug logging is enabled (e.g., via config/env), these values may be written to console and/or persisted to log files, causing information disclosure.

Vulnerable code:

child.stderr.on("data", (chunk: Buffer | string) => {
  const text = chunk.toString("utf8").trim();
  if (text) {
    embeddedAgentLog.debug(`codex app-server stderr: ${text}`);
  }
});

Recommendation

Avoid logging external process stderr verbatim, or ensure it is sanitized and bounded.

Options:

  • Log only a short, redacted summary (and/or only on non-zero exit).
  • Apply a secret redaction function before logging.
  • Add a configuration flag to enable raw stderr logging only for local debugging.

Example (bounded + redaction):

import { redact } from "../../../logging/redact.js";

child.stderr.on("data", (chunk) => {
  const raw = chunk.toString("utf8");
  const bounded = raw.slice(0, 2000); // prevent huge log lines
  const safe = redact(bounded).trim();
  if (safe) {
    embeddedAgentLog.debug("codex app-server stderr (redacted): " + safe);
  }
});

Also consider logging to trace and/or gating with an explicit env var like OPENCLAW_LOG_CODEX_STDERR=1.

8. 🟡 Unbounded transcript file read/parsing enables memory/CPU DoS during transcript mirroring
Property Value
Severity Medium
CWE CWE-400
Location extensions/codex/src/app-server/transcript-mirror.ts:65-80

Description

mirrorCodexAppServerTranscript() calls readTranscriptIdempotencyKeys() which reads the entire session transcript JSONL file into memory and parses it line-by-line on every mirror attempt.

  • Input/control: sessionFile is provided via EmbeddedRunAttemptParams.sessionFile (passed through from run orchestration), and its contents are whatever is on disk.
  • Issue: fs.readFile(sessionFile, "utf8") loads the whole file, then raw.split(/\r?\n/) duplicates memory and iterates JSON.parse for each line.
  • Impact: A very large transcript (or a transcript intentionally bloated with many lines / large lines) can cause high memory usage and significant CPU time each time transcript mirroring runs, potentially leading to process slowdown or out-of-memory termination.

Vulnerable code:

raw = await fs.readFile(sessionFile, "utf8");
for (const line of raw.split(/\r?\n/)) {
  ...
  const parsed = JSON.parse(line)
  ...
}

Recommendation

Avoid loading and splitting the entire transcript file on every run.

Options:

  1. Stream the JSONL file line-by-line using fs.createReadStream + readline to keep memory bounded.
  2. Add a maximum file size / maximum lines processed for idempotency scanning (e.g., only scan the last N MB/lines).
  3. Persist idempotency keys separately (sidecar index file) so you don't need to re-scan the full transcript.

Example (streaming):

import fs from "node:fs";
import readline from "node:readline";

async function readTranscriptIdempotencyKeys(sessionFile: string): Promise<Set<string>> {
  const keys = new Set<string>();
  try {
    const stream = fs.createReadStream(sessionFile, { encoding: "utf8" });
    const rl = readline.createInterface({ input: stream, crlfDelay: Infinity });
    for await (const line of rl) {
      if (!line.trim()) continue;
      try {
        const parsed = JSON.parse(line) as { message?: { idempotencyKey?: unknown } };
        if (typeof parsed.message?.idempotencyKey === "string") keys.add(parsed.message.idempotencyKey);
      } catch {
        // ignore malformed lines
      }
    }
  } catch (e: any) {
    if (e?.code !== "ENOENT") throw e;
  }
  return keys;
}

Analyzed PR: #64298 at commit 9da2afb

Last updated on: 2026-04-10T20:04:06Z

@cursor

cursor Bot commented Apr 10, 2026

PR Summary

High Risk
Adds a new low-level agent execution path (Codex app-server harness) including tool routing, approvals, session/thread binding, and compaction, which can affect core agent behavior and safety controls. Also introduces new provider/model discovery and protocol/version gating that may fail in production if the external Codex CLI/app-server differs from assumptions.

Overview
Adds a bundled Codex runtime as a plugin-owned agent harness + provider. New extensions/codex registers api.registerAgentHarness(...) and api.registerProvider(...) so codex/gpt-* model refs use a Codex-managed app-server for auth/session threads, model discovery, tool calls, and native compaction, while openai/gpt-* remains on the existing OpenAI path.

Implements the Codex app-server bridge and safety plumbing. The extension adds a JSON-RPC client with version gating, dynamic tool bridging (including media artifact telemetry), approval bridging via gateway approval tools (fail-closed), per-session thread binding persistence, transcript mirroring with idempotency/locking, and compaction completion waiting.

SDK/docs/test plumbing updates. Exposes a new openclaw/plugin-sdk/agent-harness export, updates docs to describe agent harness plugins and runEmbeddedAgent(...) naming, adds live/docker smoke coverage and a protocol-drift check script, and includes small config/labeler and SSRF-policy compatibility tweaks.

Reviewed by Cursor Bugbot for commit 9da2afb. Bugbot is set up for automated code reviews on this repo. Configure here.

@openclaw-barnacle Bot added labels on Apr 10, 2026: docs (Improvements or additions to documentation), gateway (Gateway runtime), scripts (Repository scripts), docker (Docker and sandbox tooling), agents (Agent runtime and tooling), size: XL, maintainer (Maintainer-authored PR)
Comment thread extensions/codex/src/app-server/run-attempt.ts
@greptile-apps
Contributor

greptile-apps Bot commented Apr 10, 2026

Greptile Summary

This PR refactors the agent harness into a plugin-owned extension by introducing a new AgentHarness registry (src/agents/harness/), wiring it into the plugin loader lifecycle (save/restore/clear on activating and snapshot reloads), and moving all Codex app-server plumbing into a bundled extensions/codex package. A new openclaw/plugin-sdk/agent-harness subpath exposes the required types and helpers for the Codex extension without direct src/** imports.

The implementation follows established patterns for other pluggable registries (compaction providers, memory embedding providers) and correctly handles cache snapshots, non-activating loads, and error rollback.

Confidence Score: 5/5

Safe to merge; the harness registry, selection, plugin loader integration, and Codex extension are all well-structured and consistent with existing patterns.

The only finding is a P2 style observation: clearAgentHarnesses and restoreRegisteredAgentHarnesses do not call dispose() on evicted entries. No current harness implements dispose(), so there is no practical impact today. All other aspects — registry lifecycle, cache save/restore, non-activating load rollback, activating reload clear, agentHarnessIds tracking, harness selection priority, and Pi fallback — are correctly implemented and tested.

src/agents/harness/registry.ts — dispose() hook is declared on AgentHarness but never invoked during clear/restore cycles.


Reviews (1): Last reviewed commit: "test: expand Codex harness Docker live s..."

Comment on lines +55 to +63

export function restoreRegisteredAgentHarnesses(entries: RegisteredAgentHarness[]): void {
  const map = getAgentHarnessRegistryState().harnesses;
  map.clear();
  for (const entry of entries) {
    map.set(entry.harness.id, entry);
  }
}


P2 dispose() not called on cleared harnesses

Both clearAgentHarnesses and restoreRegisteredAgentHarnesses silently drop existing harness entries without invoking their optional dispose() hook. Any future harness that holds open connections, timers, or other resources would leak them on plugin reload. The equivalent compaction-provider and memory-embedding-provider registries share this gap, but since the AgentHarness type explicitly declares dispose?() it is more likely to be implemented.

restoreRegisteredAgentHarnesses has the same pattern and would benefit from calling dispose() on evicted entries before the map.clear(). Note that snapshot-load rollback paths must not dispose harnesses that are about to be restored, so the fix requires distinguishing evicted-permanently entries from temporarily-swapped ones.
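
A concise fix, sketched against the registry shapes visible in this diff (the error handling around dispose() is an assumption about the desired behavior), would dispose only entries that are not about to be restored:

export function restoreRegisteredAgentHarnesses(entries: RegisteredAgentHarness[]): void {
  const map = getAgentHarnessRegistryState().harnesses;
  const kept = new Set(entries.map((entry) => entry.harness.id));
  for (const [id, existing] of map) {
    // Dispose entries evicted for good; skip ones about to be swapped back in.
    if (!kept.has(id)) {
      try {
        existing.harness.dispose?.();
      } catch {
        // a failing dispose() should not abort the restore
      }
    }
  }
  map.clear();
  for (const entry of entries) {
    map.set(entry.harness.id, entry);
  }
}

clearAgentHarnesses could apply the same loop before its map.clear().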



@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6bad1277f8

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +163 to +165
replayMetadata: {
hadPotentialSideEffects: toolTelemetry.didSendViaMessagingTool,
replaySafe: !toolTelemetry.didSendViaMessagingTool,

P1 Badge Derive replay safety from all mutating tool side effects

replayMetadata is currently marked unsafe only when a messaging tool sends, but this harness can also mutate state via command/file-change tool items and cron adds. Because incomplete-turn handling and retry decisions rely on hadPotentialSideEffects, a Codex run that executes bash/apply_patch/cron without messaging can be mislabeled replaySafe, which can trigger unsafe retries or omit side-effect warnings after partial/failed turns.
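
A hedged sketch of widening the derivation; only didSendViaMessagingTool exists in this diff, and the other flags are hypothetical names for telemetry the harness would need to collect:

interface CodexToolTelemetry {
  didSendViaMessagingTool: boolean;
  didRunCommandTool?: boolean;   // hypothetical: bash / command items
  didApplyFileChange?: boolean;  // hypothetical: apply_patch / file-change items
  didAddCronJob?: boolean;       // hypothetical: cron adds
}

function computeHadPotentialSideEffects(t: CodexToolTelemetry): boolean {
  return Boolean(
    t.didSendViaMessagingTool || t.didRunCommandTool || t.didApplyFileChange || t.didAddCronJob,
  );
}

// replayMetadata would then be derived from the aggregate flag:
//   hadPotentialSideEffects: computeHadPotentialSideEffects(toolTelemetry),
//   replaySafe: !computeHadPotentialSideEffects(toolTelemetry),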


Comment thread src/plugins/registry.ts
Comment on lines +548 to +552
const normalizedHarness = {
  ...harness,
  id,
  pluginId: harness.pluginId ?? record.id,
};

P2 Badge Reserve the built-in pi harness id during registration

The new registration path accepts any non-empty harness id and immediately stores it, so a plugin can register id: "pi". Since harness selection checks registered harnesses before appending the built-in PI harness, forcing OPENCLAW_AGENT_RUNTIME=pi can resolve to plugin code instead of the built-in backend, breaking the expected kill-switch/fallback behavior for PI mode.
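
A minimal guard at registration time, sketched with an illustrative reserved-id set; the registration path shown above could call it before storing the entry:

const RESERVED_HARNESS_IDS = new Set(["pi"]); // built-in embedded backend

function assertHarnessIdAvailable(id: string, pluginId: string): void {
  if (RESERVED_HARNESS_IDS.has(id.trim().toLowerCase())) {
    throw new Error(
      `plugin "${pluginId}" cannot register reserved agent harness id "${id}"`,
    );
  }
}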


@steipete force-pushed the app-server branch 3 times, most recently from 9540884 to 5d4789e on April 10, 2026 at 12:57
Comment thread extensions/codex/src/app-server/client.ts
Comment thread extensions/codex/src/app-server/protocol.ts

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5d4789e690


Comment on lines +45 to +48
.map((harness) => ({
harness,
support: harness.supports({
provider: params.provider,

P1 Badge Isolate failing harness support checks

Wrap harness.supports(...) in a per-harness try/catch so one buggy plugin harness cannot abort selection for the whole run. In auto mode this loop evaluates every registered harness, so if any supports implementation throws (for example due to missing plugin config), selection throws before PI fallback is chosen and the agent run fails entirely instead of degrading to another harness.
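
A sketch of the per-harness guard; the SupportResult shape here is an assumption about what supports() returns:

interface HarnessSupportInput {
  provider: string;
  model?: string;
}
interface HarnessSupportResult {
  supported: boolean; // assumed minimal shape
}
interface SelectableHarness {
  id: string;
  supports(input: HarnessSupportInput): HarnessSupportResult;
}

function safeSupports(harness: SelectableHarness, input: HarnessSupportInput): HarnessSupportResult {
  try {
    return harness.supports(input);
  } catch (error) {
    // One buggy plugin harness should not abort selection for the whole run.
    console.warn(`agent harness ${harness.id} supports() threw; treating as unsupported`, error);
    return { supported: false };
  }
}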


Comment on lines +125 to +129
if (
!isMessagingTool(params.toolName) ||
!isMessagingToolSendAction(params.toolName, params.args)
) {
return;

P1 Badge Record messaging side effects only on successful sends

collectToolTelemetry marks messaging sends as successful even when the tool call threw, because the messaging branch ignores isError. When a message/sessions_send call fails, this still sets didSendViaMessagingTool and target/text metadata, which can suppress normal assistant output and mislabel replay safety as if a send occurred. Gate this branch on successful execution (the same way cron/media telemetry is gated).
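
A sketch of the gate; isMessagingTool and isMessagingToolSendAction are the helpers from the quoted snippet, and the isError flag on the params is an assumption about what the collector receives:

// Hypothetical extraction of the guard used inside collectToolTelemetry().
function shouldRecordMessagingSend(params: {
  toolName: string;
  args: unknown;
  isError?: boolean; // assumed to carry the tool call outcome, like the cron/media branches
}): boolean {
  if (params.isError) return false; // failed sends must not count as side effects
  return isMessagingTool(params.toolName) && isMessagingToolSendAction(params.toolName, params.args);
}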


Comment thread extensions/codex/src/app-server/client.ts
Comment thread extensions/codex/app-server/run-attempt.ts Outdated
Comment thread extensions/codex/src/app-server/dynamic-tools.ts
Comment thread extensions/codex/src/app-server/compact.ts

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c35e17a549


Comment on lines +80 to +85
} catch (error) {
if (runtime !== "auto") {
throw error;
}
log.warn(`${harness.label} failed; falling back to embedded PI backend`, { error });
return createPiAgentHarness().runAttempt(params);

P1 Badge Gate PI fallback on replay-safe harness failures

In auto mode this catch block reruns the same turn through PI after any plugin harness exception, but the harness may have already executed mutating tools, approvals, or channel sends before throwing. Replaying immediately through PI can duplicate side effects on the same user input (for example duplicate messages or command executions). The fallback should only happen when the failing harness can prove the turn is replay-safe; otherwise the original error should be surfaced.
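
One hedged way to express the gate; the error shape carrying replay metadata is hypothetical and would need to be threaded out of the harness attempt:

// Hypothetical error contract for failed harness attempts.
interface HarnessAttemptError extends Error {
  replayMetadata?: { replaySafe?: boolean };
}

function canFallBackToPi(error: unknown, runtime: string): boolean {
  if (runtime !== "auto") return false;
  const replaySafe = (error as HarnessAttemptError)?.replayMetadata?.replaySafe === true;
  return replaySafe; // only rerun the turn when the harness proves nothing mutated
}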


Comment on lines +140 to +146
await projector.handleNotification(notification);
if (
notification.method === "turn/completed" &&
isTurnNotification(notification.params, turnId)
) {
completed = true;
resolveCompletion?.();

P2 Badge Resolve completion even if notification projection fails

Completion is resolved only after projector.handleNotification(...) returns, so if projection throws (for example from a callback invoked inside the projector), a turn/completed notification is effectively dropped and the run waits until timeout/abort even though the turn actually finished. Because the client swallows handler rejections, this turns real completions into spurious timeouts; completion detection should run even when projection fails.
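
A minimal restructuring, keeping the names from the snippet above, so completion detection runs even when projection throws:

try {
  await projector.handleNotification(notification);
} catch (error) {
  // Projection problems are logged but must not swallow turn lifecycle events.
  log.warn("codex app-server notification projection failed", { error });
} finally {
  if (
    notification.method === "turn/completed" &&
    isTurnNotification(notification.params, turnId)
  ) {
    completed = true;
    resolveCompletion?.();
  }
}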



@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cb3f54b909


Comment on lines +177 to +181
turn = await client.request<CodexTurnStartResponse>("turn/start", {
threadId: thread.threadId,
input: buildUserInput(params),
cwd: effectiveWorkspace,
approvalPolicy: resolveAppServerApprovalPolicy(),

P1 Badge Cover turn/start with the attempt timeout

timeoutMs does not protect the turn/start RPC: the call at this site is awaited directly, while the run timeout and abort-interrupt wiring are only installed afterward. If the app-server hangs before returning the turn id (for example, server-side deadlock during turn creation), this attempt can block indefinitely and ignore the configured attempt timeout/cancel path, leaving the session stuck until the process exits.
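
A sketch of covering the RPC with the attempt budget; withTimeout is a hypothetical helper, not something this PR defines:

async function withTimeout<T>(promise: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms}ms`)), ms);
  });
  try {
    return await Promise.race([promise, timeout]);
  } finally {
    if (timer) clearTimeout(timer);
  }
}

// turn = await withTimeout(client.request("turn/start", { ... }), timeoutMs, "turn/start");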


Comment on lines +451 to +453
const threadId = readString(params, "threadId");
const turnId = readString(params, "turnId");
return (!threadId || threadId === this.threadId) && (!turnId || turnId === this.turnId);

P2 Badge Scope notifications to explicit turn identity

This matcher treats notifications with no threadId/turnId as belonging to the current run. Because runs share a singleton Codex client (getSharedCodexAppServerClient), a global error notification without ids can be accepted by every active projector and set promptError on unrelated turns, causing spurious cross-session failures under concurrent runs.


Comment thread extensions/codex/src/app-server/event-projector.ts

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 92e8c7908d


Comment on lines +156 to +160
close(): void {
this.closed = true;
this.lines.close();
if (!this.child.killed) {
this.child.kill?.();

P1 Badge Reject pending RPCs when closing the Codex app-server client

close() marks the client as closed before process exit/error handling runs, but pending requests are only rejected in closeWithError(), which early-returns when closed is already true. If close() is called while requests are in flight (for example during startup timeout/reset paths), those Promises can remain unresolved indefinitely, leaking waiters and leaving callers hung instead of receiving a failure.
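
A sketch that mirrors closeWithError()'s cleanup inside close(); the pending-entry shape follows the request() example earlier in this thread:

close(): void {
  this.closed = true;
  this.lines.close();
  // Fail in-flight RPCs so callers do not hang when the client is torn down deliberately.
  const error = new Error("codex app-server client closed");
  for (const [id, entry] of this.pending) {
    this.pending.delete(id);
    entry.reject(error);
  }
  if (!this.child.killed) {
    this.child.kill?.();
  }
}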


Comment on lines +119 to +121
} catch (error) {
clearSharedCodexAppServerClient();
params.abortSignal?.removeEventListener("abort", abortFromUpstream);

P2 Badge Keep startup failures from tearing down unrelated active runs

This catch block unconditionally clears the shared Codex app-server client for the whole process. Because the client is a singleton used by concurrent attempts, a startup failure in one run will close the transport used by other in-progress runs, causing unrelated sessions to fail or time out even though their own turn is healthy.
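
One hedged approach is to track how many attempts share the client and only reset it when nobody else is mid-turn; the counter and function names here are illustrative:

// Hypothetical ref-count guarding the shared client teardown.
let sharedClientUsers = 0;

function acquireSharedClient(): void {
  sharedClientUsers += 1;
}

function releaseSharedClient(options: { failed: boolean }): void {
  sharedClientUsers = Math.max(0, sharedClientUsers - 1);
  if (options.failed && sharedClientUsers === 0) {
    clearSharedCodexAppServerClient(); // reset only when no other run depends on it
  }
}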


Comment on lines +174 to +176
if (!requestParams) {
return true;
}

P2 Badge Require explicit turn identity for approval request routing

This matcher treats approval requests with missing ids as belonging to the current run. With multiple request handlers attached to the shared client, an approval request lacking threadId/turnId can be consumed by the wrong run and resolved against the wrong session context, which is especially risky for command/file approval decisions.


stream: "codex_app_server.guardian",
data: { method: notification.method },
});
break;

Guardian review count double-increments per review cycle

Medium Severity

guardianReviewCount is incremented for both item/autoApprovalReview/started and item/autoApprovalReview/completed notifications. A single guardian review cycle produces both events, so the count reflects events rather than distinct reviews. This inflated count means didSendDeterministicApprovalPrompt at line 179 evaluates to false instead of undefined even when intended to represent "no reviews happened."
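
A minimal fix, assuming the notification handling is a switch on notification.method as the quoted break suggests, counts only one of the two lifecycle events:

switch (notification.method) {
  case "item/autoApprovalReview/started":
    // Count once per review cycle; the matching "completed" event is intentionally ignored.
    guardianReviewCount += 1;
    break;
  case "item/autoApprovalReview/completed":
    break;
}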


Reviewed by Cursor Bugbot for commit 0cd597daeb31136843bb27fd5022d824ab9a2b0b. Configure here.


@cursor cursor Bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 4 total unresolved issues (including 3 from previous reviews).


❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 9da2afb. Configure here.

if (!this.child.killed) {
this.child.kill?.();
}
}

Client close() leaves pending requests permanently unresolved

High Severity

The close() method sets this.closed = true before killing the child process, but never rejects pending requests. When the child's exit event fires and triggers closeWithError, the guard if (this.closed) { return; } causes it to bail out without rejecting any pending promises. Any in-flight requests (e.g., turn/start, model/list) will hang forever. This can be triggered when clearSharedCodexAppServerClient() is called from run-attempt.ts on startup failure while requests are still pending.

Additional Locations (1)

Reviewed by Cursor Bugbot for commit 9da2afb. Configure here.

@steipete steipete merged commit 6783bef into main Apr 10, 2026
36 of 37 checks passed
@steipete steipete deleted the app-server branch April 10, 2026 20:22
@steipete
Contributor Author

Landed via GitHub rebase merge.

Gates:

  • pnpm run lint:tmp:no-raw-channel-fetch
  • pnpm test src/plugins/install.test.ts -t "blocks package installs when a broad vendored tree contains a deeply nested blocked manifest|blocks install when resolved dependencies introduce a denied package|safe-name symlink targets a file under a blocked package directory|symlink targets an allowed scoped package path"
  • pnpm test src/plugin-sdk/browser-subpaths.test.ts
  • pnpm check
  • PR CI green on 9da2afbc64e93670dd8a4315263f26258e66df75 after rerunning an install-smoke Docker Hub IPv6 network flake

Source head: 9da2afb
Landed head: 6783bef

Thanks @steipete!

