feat(codex): run context-engine lifecycle in app-server harness#70809
Conversation
🔒 Aisle Security AnalysisWe found 5 potential security issue(s) in this PR:
1. 🟠 Context-engine `systemPromptAddition` is appended to Codex developer instructions (prompt-injection privilege escalation)
DescriptionThe Codex harness projects
This enables instruction-confusion / prompt-injection style privilege escalation: an attacker who can influence context-engine outputs (directly via untrusted engine selection, or indirectly via a trusted engine that incorporates user content) can inject directives that override safety policies, tool-use constraints, or operational guidance. Vulnerable code: const projection = projectContextEngineAssemblyForCodex({
assembledMessages: assembled.messages,
originalHistoryMessages: historyMessages,
prompt: params.prompt,
systemPromptAddition: assembled.systemPromptAddition,
});
...
developerInstructions = joinPresentSections(
baseDeveloperInstructions,
projection.developerInstructionAddition,
);And the projection promotes it: ...(params.systemPromptAddition?.trim()
? { developerInstructionAddition: params.systemPromptAddition.trim() }
: {}),RecommendationEstablish and enforce a trust boundary for Options (pick one consistent policy):
Example mitigation (allowlist by engine id): const TRUSTED_SYSTEM_PROMPT_ENGINES = new Set(["legacy", "built_in_memory"]);
const trustedAddition =
activeContextEngine && TRUSTED_SYSTEM_PROMPT_ENGINES.has(activeContextEngine.info.id)
? assembled.systemPromptAddition
: undefined;
const projection = projectContextEngineAssemblyForCodex({
assembledMessages: assembled.messages,
originalHistoryMessages: historyMessages,
prompt: params.prompt,
systemPromptAddition: trustedAddition,
});If you must support third-party engines, consider adding a separate field like 2. 🟠 Tool result text blocks can leak sensitive tool output into Codex prompt context
Description
However, elsewhere in the codebase tool results are represented as As a result:
Vulnerable code (redaction only happens per-part, not per-message role): if (typeof message.content === "string") {
return truncateText(message.content.trim(), MAX_TEXT_PART_CHARS);
}
...
return message.content
.map((part: unknown) => renderMessagePart(part))RecommendationRedact tool payloads based on message role as well as per-part type. For example, treat any function renderMessageBody(message: AgentMessage): string {
// Omit all tool results by role to prevent schema-dependent leaks
if (message.role === "toolResult" || message.role === "tool") {
return "[tool result omitted]";
}
if (!hasMessageContent(message)) return "";
if (typeof message.content === "string") {
return truncateText(message.content.trim(), MAX_TEXT_PART_CHARS);
}
if (!Array.isArray(message.content)) return "[non-text content omitted]";
return message.content
.map((part) => renderMessagePart(part))
.filter((value): value is string => value.length > 0)
.join("\n")
.trim();
}Additionally:
3. 🟡 Potential DoS via unbounded context rendering before truncation in Codex context projection
Description
Vulnerable code path: const renderedContext = renderMessagesForCodexContext(contextMessages);
...
truncateText(renderedContext, MAX_RENDERED_CONTEXT_CHARS)Even though each text part is truncated to RecommendationApply global size limits during rendering to avoid building large intermediate strings. Options:
Example (incremental render with early cutoff): function renderMessagesForCodexContextBounded(messages: AgentMessage[], maxChars: number): string {
const chunks: string[] = [];
let used = 0;
for (const message of messages) {
const text = renderMessageBody(message);
if (!text) continue;
const chunk = `[${message.role}]\n${text}`;
const separator = chunks.length ? "\n\n" : "";
const remaining = maxChars - used;
if (remaining <= 0) break;
const toAdd = (separator + chunk).slice(0, remaining);
chunks.push(toAdd);
used += toAdd.length;
if (used >= maxChars) break;
}
const rendered = chunks.join("");
return rendered.length >= maxChars ? `${rendered}\n[truncated]` : rendered;
}Additionally, consider rejecting/ignoring context-engine assemblies whose total message count or total text size exceed a safe threshold, and ensure 4. 🟡 Sensitive data exposure via unredacted error object logging in context-engine assembly failure path
DescriptionThe Codex app-server attempt runner logs a raw error object when context-engine assembly fails:
If a thrown error includes sensitive data in its message or properties (e.g., API keys, Authorization headers, prompt/transcript fragments, tool payloads), this path may persist it in logs without redaction. Vulnerable code: } catch (assembleErr) {
embeddedAgentLog.warn("context engine assemble failed; using Codex baseline prompt", {
error: assembleErr,
});
}RecommendationAvoid logging raw error objects on paths that may include user/tool data. Log a redacted, message-only representation using Example fix: } catch (assembleErr) {
embeddedAgentLog.warn("context engine assemble failed; using Codex baseline prompt", {
error: formatErrorMessage(assembleErr),
});
}If stack traces are required for debugging, ensure they are redacted before logging (e.g., 5. 🟡 Sensitive data exposure via unredacted String(error) logging in harness context-engine lifecycle
DescriptionThe new harness context-engine lifecycle helpers log errors using
This can leak credentials or private prompt/transcript content into logs when a context engine implementation throws errors that embed those values. Vulnerable code: } catch (bootstrapErr) {
params.warn(`context engine bootstrap failed: ${String(bootstrapErr)}`);
}
...
} catch (afterTurnErr) {
params.warn(`context engine afterTurn failed: ${String(afterTurnErr)}`);
}
...
} catch (ingestErr) {
params.warn(`context engine ingest failed: ${String(ingestErr)}`);
}RecommendationUse the shared Example fix: import { formatErrorMessage } from "../../infra/errors.js";
// ...
} catch (afterTurnErr) {
postTurnFinalizationSucceeded = false;
params.warn(`context engine afterTurn failed: ${formatErrorMessage(afterTurnErr)}`);
}Apply the same change to Analyzed PR: #70809 at commit Last updated on: 2026-04-24T03:57:00Z |
Greptile SummaryThis PR wires the OpenClaw context-engine lifecycle (bootstrap, assemble, finalize, maintenance) into the bundled non-ACP Codex app-server harness, bringing it to feature parity with the embedded PI path. It extracts shared helpers into a harness-neutral
Confidence Score: 4/5Safe to merge after addressing the maintenance error-handling gap in compact.ts; all other paths are well-implemented. One P1 defect: a maintenance failure in the engine-owns-compaction path prevents native Codex compaction from running, leaving the Codex thread without compaction in an edge case. Everything else — lifecycle ordering, projection logic, bootstrap gating, post-turn finalization, and re-export chain — looks correct and consistent with the PI embedded path. extensions/codex/src/app-server/compact.ts — maintenance error handling before compactCodexNativeThread Prompt To Fix All With AIThis is a comment left during a code review.
Path: extensions/codex/src/app-server/compact.ts
Line: 38-58
Comment:
**Maintenance failure silently skips native Codex compaction**
If `runHarnessContextEngineMaintenance` throws, the `await compactCodexNativeThread(...)` call on line 58 is never reached. The design intent (and the plan doc) is that native Codex compaction is "best-effort" and should run regardless of whether context-engine maintenance succeeded — skipping it on a maintenance failure can let the internal Codex thread grow unbounded. Wrapping maintenance in a try-catch here would preserve the best-effort semantic.
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: extensions/codex/src/app-server/compact.ts
Line: 59-72
Comment:
**Native compaction diagnostics lost when primary result is absent**
When `primary.result` is `undefined` (e.g. compaction ran but returned no result object, or `primary.compacted` is false), `nativeResult` is discarded and the caller has no visibility into whether native Codex compaction ran. Consider including the native result in the return even when `primary.result` is absent.
How can I resolve this? If you propose a fix, please make it concise.Reviews (1): Last reviewed commit: "feat: port codex harness context engine" | Re-trigger Greptile |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6b9903355d
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 53ddfba8d9
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| agentDir, | ||
| contextTokenBudget, | ||
| }); | ||
| const harnessResult = await maybeCompactAgentHarnessSession({ |
There was a problem hiding this comment.
Dispose resolved context engine on harness compaction errors
compactEmbeddedPiSession now resolves a context engine before delegating to maybeCompactAgentHarnessSession, but this call is not protected by a try/finally. If harness selection or harness compact throws (for example, a Codex harness failure before it returns an EmbeddedPiCompactResult), execution exits before any contextEngine.dispose?.() call, leaking engine-owned resources across repeated compactions. Please ensure the resolved engine is disposed on both success and throw paths.
Useful? React with 👍 / 👎.
| prompt: string; | ||
| systemPromptAddition?: string; | ||
| }): CodexContextProjection { | ||
| const prompt = params.prompt.trim(); |
There was a problem hiding this comment.
Preserve user prompt bytes when projecting assembled context
projectContextEngineAssemblyForCodex trims params.prompt before both deduplication and final prompt construction, which mutates user input whenever the context-engine path is active. This can change whitespace-sensitive prompts (e.g., indented code, deliberate trailing spaces, or newline-only continuation prompts) and can also trigger false duplicate removal in dropDuplicateTrailingPrompt. Keep the original prompt text for output and only normalize separately when needed for comparisons.
Useful? React with 👍 / 👎.
cc510f6 to
8dc77e7
Compare
8dc77e7 to
dd38014
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8dc77e77d6
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| runMaintenance: runHarnessContextEngineMaintenance, | ||
| sessionManager, | ||
| warn: (message) => embeddedAgentLog.warn(message), |
There was a problem hiding this comment.
Reopen session manager before turn maintenance
runCodexAppServerAttempt passes the sessionManager instance that was opened before the turn into finalizeHarnessContextEngineTurn, but transcript mirroring just before this step writes through a different SessionManager instance. If a context-engine maintenance pass uses runtimeContext.rewriteTranscriptEntries, it can rewrite from that stale pre-turn branch and move the active leaf away from the newly mirrored turn messages. Reopen the session file (or avoid passing sessionManager here) after mirroring so maintenance operates on fresh transcript state.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: da51769870
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| @@ -123,7 +137,10 @@ export async function runCodexAppServerAttempt( | |||
| runId: params.runId, | |||
| }, | |||
| }); | |||
| const historyMessages = readMirroredSessionHistoryMessages(params.sessionFile); | |||
| const hadSessionFile = await fileExists(params.sessionFile); | |||
| const sessionManager = SessionManager.open(params.sessionFile); | |||
There was a problem hiding this comment.
Guard SessionManager opening before reading history
runCodexAppServerAttempt now opens SessionManager unconditionally before any context-engine checks. If sessionFile is malformed or otherwise unreadable, SessionManager.open(...) can throw here and abort the entire Codex turn, whereas the previous flow tolerated history-read failures and continued with empty history. This regresses resilience for corrupted session transcripts and can block users from sending any further turns until manual file repair.
Useful? React with 👍 / 👎.
| return enqueueCommandInLane(sessionLane, () => | ||
| enqueueGlobal(async () => { |
There was a problem hiding this comment.
Dispose context engine when queue enqueue fails
The context engine is resolved before lane enqueue, but disposal is only handled in the harness-result branch or inside the queued callback. If enqueueCommandInLane(...) rejects before executing the task (for example while gateway draining rejects new work), this function exits without running the inner finally, leaking the resolved context engine instance. Repeated compaction attempts during restart/drain windows can accumulate leaked engine resources.
Useful? React with 👍 / 👎.
ef3b9b3 to
3e2f9f3
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3e2f9f34fd
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| const finalMessages = | ||
| readMirroredSessionHistoryMessages(params.sessionFile) ?? | ||
| historyMessages.concat(result.messagesSnapshot); |
There was a problem hiding this comment.
Fall back to in-memory turn messages when mirror write fails
mirrorTranscriptBestEffort intentionally swallows transcript write errors, but finalization immediately prefers readMirroredSessionHistoryMessages(...) and only falls back to historyMessages.concat(result.messagesSnapshot) when the read itself fails. If the file is still readable after a failed mirror append, finalizeHarnessContextEngineTurn receives pre-turn history only, so afterTurn/ingestBatch can miss the current user+assistant turn and silently desynchronize context-engine memory from the actual run.
Useful? React with 👍 / 👎.
| const { model: ceModel } = await resolveModelAsync( | ||
| ceProvider, | ||
| ceModelId, | ||
| agentDir, | ||
| params.config, |
There was a problem hiding this comment.
Defer model resolution until after harness compaction short-circuit
This resolveModelAsync(...) call now runs before maybeCompactAgentHarnessSession(...), so compaction can fail on model-resolution errors before a non-PI harness gets to handle compaction. In cases like Codex harness/native thread compaction, an unknown or stale model id becomes a hard blocker even though the harness path itself does not require this lookup, which is a regression from the previous short-circuit ordering.
Useful? React with 👍 / 👎.
…@jalehman) Co-authored-by: Josh Lehman <josh@martian.engineering>
…claw#70809) Port the Codex app-server harness onto the context-engine lifecycle, add Codex context projection and compaction integration, and cover bootstrap/history/compaction fallback behavior. Thanks @jalehman.
… (thanks @jalehman) Co-authored-by: Josh Lehman <josh@martian.engineering>
…claw#70809) Port the Codex app-server harness onto the context-engine lifecycle, add Codex context projection and compaction integration, and cover bootstrap/history/compaction fallback behavior. Thanks @jalehman.
… (thanks @jalehman) Co-authored-by: Josh Lehman <josh@martian.engineering>
…claw#70809) Port the Codex app-server harness onto the context-engine lifecycle, add Codex context projection and compaction integration, and cover bootstrap/history/compaction fallback behavior. Thanks @jalehman.
… (thanks @jalehman) Co-authored-by: Josh Lehman <josh@martian.engineering>
…claw#70809) Port the Codex app-server harness onto the context-engine lifecycle, add Codex context projection and compaction integration, and cover bootstrap/history/compaction fallback behavior. Thanks @jalehman.
… (thanks @jalehman) Co-authored-by: Josh Lehman <josh@martian.engineering>
…claw#70809) Port the Codex app-server harness onto the context-engine lifecycle, add Codex context projection and compaction integration, and cover bootstrap/history/compaction fallback behavior. Thanks @jalehman.
… (thanks @jalehman) Co-authored-by: Josh Lehman <josh@martian.engineering>
…claw#70809) Port the Codex app-server harness onto the context-engine lifecycle, add Codex context projection and compaction integration, and cover bootstrap/history/compaction fallback behavior. Thanks @jalehman.
… (thanks @jalehman) Co-authored-by: Josh Lehman <josh@martian.engineering>
…claw#70809) Port the Codex app-server harness onto the context-engine lifecycle, add Codex context projection and compaction integration, and cover bootstrap/history/compaction fallback behavior. Thanks @jalehman.
… (thanks @jalehman) Co-authored-by: Josh Lehman <josh@martian.engineering>
Summary
contextEngineonEmbeddedRunAttemptParams, but did not actually run the context-engine lifecycle around Codex turns or compactions.Change Type (select all)
Scope (select all touched areas)
Linked Issue/PR
Root Cause (if applicable)
N/A — this is feature parity / lifecycle plumbing for the bundled Codex harness.
Regression Test Plan (if applicable)
extensions/codex/src/app-server/context-engine-projection.test.tsextensions/codex/src/app-server/run-attempt.context-engine.test.tsextensions/codex/src/app-server/compact.test.tssrc/agents/pi-embedded-runner/run/attempt.spawn-workspace.context-engine.test.tssrc/agents/pi-embedded-runner/compact.hooks.test.tsUser-visible / Behavior Changes
Codex app-server harness sessions can now honor configured context-engine plugins for prompt assembly, post-turn ingestion/maintenance, and engine-owned compaction. Users running context-engine plugins should see Codex behave more like the embedded PI path.
Diagram (if applicable)
Security Impact (required)
No)No)No)No)Yes)Yes, explain risk + mitigation: Codex harness now passes mirrored session history and runtime context through the already-configured context-engine plugin lifecycle, matching the existing embedded PI behavior. This is gated by the configured context-engine slot and excludes the legacy engine viaisActiveHarnessContextEngine.Repro + Verification
Environment
Steps
Expected
info.ownsCompaction === true.Actual
Evidence
Validation run locally:
pnpm test -- --run extensions/codex/src/app-server/context-engine-projection.test.ts extensions/codex/src/app-server/run-attempt.context-engine.test.ts extensions/codex/src/app-server/compact.test.ts src/agents/pi-embedded-runner/run/attempt.spawn-workspace.context-engine.test.tspnpm test -- --run src/agents/pi-embedded-runner/compact.hooks.test.ts src/agents/harness/selection.test.tspnpm -s tsgo:test:extensionspnpm -s tsgo:test:srcpnpm -s lint:corepnpm -s lint:extensionspnpm -s lint:plugins:no-extension-src-importspnpm -s lint:extensions:no-src-outside-plugin-sdkpnpm -s lint:extensions:no-relative-outside-packagepnpm -s lint:docsHuman Verification (required)
What I personally verified (not just CI), and how:
main; ran focused Codex context-engine tests, PI compaction/harness tests, TypeScript checks, extension/core lints, plugin boundary lints, and docs lint.afterTurnfallback toingestBatch, skipped maintenance on failed prompts, and PI compaction hook behavior after helper extraction.Review Conversations
Compatibility / Migration
Yes)No)No)Risks and Mitigations
ownsCompactionmakes context-engine compaction primary; native Codex compaction remains best-effort and is recorded separately in result details.