Context
RemoteClaw's middleware architecture uses CLI subprocesses to interact with AI agents. The abstract base class CLIRuntimeBase (in src/middleware/cli-runtime-base.ts) handles subprocess spawning, NDJSON parsing, watchdog timers, abort signal propagation, and stdin prompt delivery. Concrete runtimes extend it and implement three abstract methods.
This issue implements the Codex CLI runtime — targeting OpenAI's codex CLI from openai/codex.
Architecture
    AgentRuntime (interface, src/middleware/types.ts)
    └── CLIRuntimeBase (abstract, src/middleware/cli-runtime-base.ts)
        ├── ClaudeCliRuntime (src/middleware/runtimes/claude.ts) ✅ done
        ├── GeminiCliRuntime (src/middleware/runtimes/gemini.ts) ✅ done
        └── CodexCliRuntime ← THIS ISSUE
CLIRuntimeBase requires subclasses to implement:
    /** Construct CLI-specific command-line arguments. */
    protected abstract buildArgs(params: AgentExecuteParams): string[];

    /** Parse a single NDJSON line into an AgentEvent (or null to skip). */
    protected abstract extractEvent(line: string): AgentEvent | null;

    /** Construct provider-specific environment variables. */
    protected abstract buildEnv(params: AgentExecuteParams): Record<string, string>;
Additionally, subclasses may override:
- get supportsStdinPrompt(): boolean (whether the CLI accepts prompts via stdin; default: true)
- execute() (to wrap the base execution with per-call setup/teardown)
Dependencies
- src/middleware/types.ts: AgentRuntime, AgentExecuteParams, AgentEvent, AgentRunResult, etc.
- src/middleware/cli-runtime-base.ts: CLIRuntimeBase abstract class
- src/middleware/runtimes/claude.ts / gemini.ts: reference implementations (same pattern)
All exist on main.
Specification
File: src/middleware/runtimes/codex.ts
Create CodexCliRuntime extending CLIRuntimeBase.
Constructor
    constructor() {
      super("codex"); // CLI binary name
    }
get supportsStdinPrompt(): boolean
Override to return false. The Codex CLI accepts prompts only as positional arguments, not via stdin.
buildArgs(params: AgentExecuteParams): string[]
Build the Codex CLI argument list. Codex uses the exec subcommand with different syntax for new vs resumed sessions.
New session:
| Arg | Value | When |
|---|---|---|
| exec | (subcommand) | Always |
| --json | (none) | Always (NDJSON streaming output) |
| --color | never | Always (prevents ANSI escape codes in output) |
| (positional) | params.prompt | Always (new session) |
Result: ["exec", "--json", "--color", "never", params.prompt]
Session resume:
| Arg | Value | When |
|---|---|---|
| exec | (subcommand) | Always |
| resume | (resume sub-subcommand) | When params.sessionId is provided |
| (positional) | params.sessionId | Session ID to resume |
| --json | (none) | Always |
| --color | never | Always |
Result: ["exec", "resume", params.sessionId, "--json", "--color", "never"]
Important: Prompt is excluded on resume. This is a documented Codex CLI limitation — codex exec resume <id> does not accept a new prompt. The agent continues from where it left off. The prompt from params.prompt is ignored when params.sessionId is provided.
MCP config is not handled via CLI args — see the execute() override section below.
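The two argument layouts above can be sketched as a small pure function. This is a minimal illustration, not the final method: ArgParams and buildCodexArgs are hypothetical stand-ins for AgentExecuteParams and the buildArgs() override, and only the two fields used here are modelled.

```typescript
// Hypothetical minimal shape; the real AgentExecuteParams lives in src/middleware/types.ts.
interface ArgParams {
  prompt: string;
  sessionId?: string;
}

// Flags shared by new and resumed sessions.
const COMMON_FLAGS: readonly string[] = ["--json", "--color", "never"];

function buildCodexArgs(params: ArgParams): string[] {
  if (params.sessionId) {
    // Resume: the session ID is the positional argument and the prompt is dropped.
    return ["exec", "resume", params.sessionId, ...COMMON_FLAGS];
  }
  // New session: the prompt is the trailing positional argument.
  return ["exec", ...COMMON_FLAGS, params.prompt];
}
```

Keeping the shared flags in one constant makes it harder for the two branches to drift apart, which is what the "always present" test cases check.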
execute() override — state reset + MCP config + done enrichment
Same pattern as GeminiCliRuntime: override execute() to manage per-execution state and MCP config file lifecycle.
    async *execute(params: AgentExecuteParams): AsyncIterable<AgentEvent> {
      // Clear per-execution state (session ID, accumulated text, usage).
      this.resetState();

      // Only manage a config file when MCP servers were actually requested.
      const mcpConfigManager =
        params.mcpServers && Object.keys(params.mcpServers).length > 0
          ? new CodexMcpConfigManager(params.workingDirectory, params.mcpServers)
          : null;

      try {
        await mcpConfigManager?.setup();
        for await (const event of super.execute(params)) {
          if (event.type === "done") {
            this.enrichDoneEvent(event);
          }
          yield event;
        }
      } finally {
        // Runs on normal completion, on error, and when the consumer stops iterating early.
        await mcpConfigManager?.teardown();
      }
    }
MCP config file management:
The Codex CLI reads MCP config from ~/.codex/config.toml (global) or a project-local equivalent. The config uses TOML format.
The CodexMcpConfigManager follows the same merge-restore pattern as GeminiMcpConfigManager:
- Setup: Check for existing config file, save copy if it exists, merge mcp_servers section, write back
- Teardown: Restore original or delete created file
Codex TOML MCP format:
    [mcp_servers.server_name]
    type = "stdio"
    command = ["node", "server.js"]

    [mcp_servers.server_name.env]
    KEY = "VALUE"
Note the differences from our McpServerConfig type:
- command is an array: [config.command, ...(config.args ?? [])]
- type = "stdio" is required (always "stdio" for RemoteClaw's MCP servers)
TOML generation: Since the MCP config TOML structure is simple and predictable, generate it manually without a TOML library. The format is straightforward string concatenation of [mcp_servers.<name>] sections. If a TOML library is preferred, check smol-toml (small, zero-dependency).
Implementation note: Check during implementation whether the Codex CLI supports a --config flag pointing to a custom config file. If it does, use a temp directory approach (cleaner). The merge-restore pattern on ~/.codex/config.toml is the fallback.
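As a rough sketch of the manual TOML generation described above, the function below renders the [mcp_servers.<name>] sections by string concatenation. McpServerConfigSketch, tomlString, tomlKey and renderMcpServersToml are hypothetical names; the field set (command, args, env) is assumed from the notes above. String values are escaped via JSON.stringify, on the assumption that its output is also a valid TOML basic string once DEL (U+007F) is escaped separately.

```typescript
// Hypothetical stand-in for McpServerConfig (the real type is in src/middleware/types.ts).
interface McpServerConfigSketch {
  command: string;
  args?: string[];
  env?: Record<string, string>;
}

// Quote a value as a TOML basic string. JSON escaping covers quotes, backslashes
// and control characters; DEL is not escaped by JSON, so it is handled here.
function tomlString(value: string): string {
  return JSON.stringify(value).replace(/\u007f/g, "\\u007F");
}

// Bare TOML keys may only contain A-Z, a-z, 0-9, "_" and "-"; quote anything else.
function tomlKey(key: string): string {
  return /^[A-Za-z0-9_-]+$/.test(key) ? key : tomlString(key);
}

function renderMcpServersToml(servers: Record<string, McpServerConfigSketch>): string {
  const lines: string[] = [];
  for (const [name, config] of Object.entries(servers)) {
    // command is an array: the executable followed by its arguments.
    const command = [config.command, ...(config.args ?? [])].map(tomlString).join(", ");
    lines.push(`[mcp_servers.${tomlKey(name)}]`);
    lines.push('type = "stdio"');
    lines.push(`command = [${command}]`);
    const envEntries = Object.entries(config.env ?? {});
    if (envEntries.length > 0) {
      lines.push(`[mcp_servers.${tomlKey(name)}.env]`);
      for (const [key, value] of envEntries) {
        lines.push(`${tomlKey(key)} = ${tomlString(value)}`);
      }
    }
    lines.push(""); // blank line between servers, trailing newline at end of output
  }
  return lines.join("\n");
}
```

Merging this output into an existing config file is a separate concern; the sketch only covers rendering the section text that the config manager would write.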
extractEvent(line: string): AgentEvent | null
Parse a single NDJSON line from Codex's --json output into an AgentEvent.
Codex --json format (verified from openai/codex SDK source: sdk/typescript/src/events.ts, items.ts and official docs):
Each NDJSON line is bare JSON (no envelope) with a type discriminator. There are 8 event types. Items within events have their own item.type discriminator with 8 sub-types.
Event types:
| Event Type | Description |
|---|---|
| thread.started | New thread created, contains thread_id |
| turn.started | Agent turn begins |
| item.started | Item lifecycle start, contains full item data |
| item.updated | Item state update (progressive text for agent_message) |
| item.completed | Item lifecycle end, contains final item data |
| turn.completed | Agent turn ends, contains usage |
| turn.failed | Agent turn failed |
| error | Stream-level error |
Item types (discriminated by item.type):
| Item Type | Description | Relevant For |
|---|---|---|
| agent_message | Text output from agent | AgentTextEvent |
| command_execution | Shell command execution | AgentToolUseEvent / AgentToolResultEvent |
| mcp_tool_call | MCP tool invocation | AgentToolUseEvent / AgentToolResultEvent |
| file_change | File modification | Skip (or AgentToolUseEvent if useful) |
| reasoning | Reasoning/thinking content | Skip |
| web_search | Web search invocation | Skip (or AgentToolUseEvent if useful) |
| error | Error item | AgentErrorEvent |
| todo_list | Task/todo tracking | Skip |
Event mapping:
| Codex Event | Condition | Maps To | Notes |
|---|---|---|---|
| thread.started | (any) | Skip | Extract thread_id as currentSessionId |
| turn.started | (any) | Skip | Turn lifecycle boundary |
| item.started | item.type === "command_execution" | AgentToolUseEvent | { toolName: "command_execution", toolId, input: { command } } |
| item.started | item.type === "mcp_tool_call" | AgentToolUseEvent | { toolName: item.name, toolId, input: item.arguments } |
| item.started | other | Skip | |
| item.updated | item.type === "agent_message" | AgentTextEvent | Delta computation (see below) |
| item.updated | other | Skip | Intermediate state |
| item.completed | item.type === "agent_message" | AgentTextEvent | Emit final delta if any |
| item.completed | item.type === "command_execution" | AgentToolResultEvent | { toolId, output: item.output, isError: item.exit_code !== 0 } |
| item.completed | item.type === "mcp_tool_call" | AgentToolResultEvent | { toolId, output: item.output, isError: !!item.error } |
| item.completed | item.type === "error" | AgentErrorEvent | { message: item.message } |
| item.completed | other | Skip | |
| turn.completed | (any) | Store for enrichment | Extract usage field |
| turn.failed | (any) | AgentErrorEvent | { message, code: "turn_failed" } |
| error | (any) | AgentErrorEvent | { message } |
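The tool and error rows of the mapping table could be dispatched roughly as follows. This is a sketch under several assumptions: the SketchEvent shapes and their type strings ("tool_use", "tool_result", "error") are placeholders for the real AgentEvent union, the item field names (command, output, exit_code, name, arguments, error, message) are the unverified ones used in the table, and the location of the turn.failed message (error.message versus message) is a guess. Text events (agent_message) and usage capture are left out here.

```typescript
// Placeholder event shapes; the real AgentEvent union is in src/middleware/types.ts.
type SketchEvent =
  | { type: "tool_use"; toolName: string; toolId: string; input: unknown }
  | { type: "tool_result"; toolId: string; output: unknown; isError: boolean }
  | { type: "error"; message: string; code?: string };

interface CodexItem { id?: string; type: string; [key: string]: unknown }
interface CodexLine {
  type: string;
  thread_id?: string;
  item?: CodexItem;
  message?: string;
  error?: { message?: string };
}

class CodexEventMapperSketch {
  currentSessionId: string | undefined;
  private currentToolId: string | undefined;
  private counter = 0;

  extract(line: string): SketchEvent | null {
    let parsed: unknown;
    try { parsed = JSON.parse(line); } catch { return null; } // non-JSON line: skip
    if (typeof parsed !== "object" || parsed === null) return null;
    const event = parsed as CodexLine;
    const item = event.item;
    switch (event.type) {
      case "thread.started":
        this.currentSessionId = event.thread_id; // captured, nothing emitted
        return null;
      case "item.started":
        if (item?.type === "command_execution") {
          return { type: "tool_use", toolName: "command_execution", toolId: this.startId(item), input: { command: item.command } };
        }
        if (item?.type === "mcp_tool_call") {
          return { type: "tool_use", toolName: String(item.name), toolId: this.startId(item), input: item.arguments };
        }
        return null;
      case "item.completed":
        if (item?.type === "command_execution") {
          return { type: "tool_result", toolId: this.endId(item), output: item.output, isError: item.exit_code !== 0 };
        }
        if (item?.type === "mcp_tool_call") {
          return { type: "tool_result", toolId: this.endId(item), output: item.output, isError: !!item.error };
        }
        if (item?.type === "error") return { type: "error", message: String(item.message) };
        return null;
      case "turn.failed":
        return { type: "error", message: event.error?.message ?? event.message ?? "turn failed", code: "turn_failed" };
      case "error":
        return { type: "error", message: event.message ?? "unknown error" };
      default:
        return null; // turn.started, turn.completed, item.updated, unknown types
    }
  }

  // Prefer the item's own ID; otherwise generate one and remember it for the matching completion.
  private startId(item: CodexItem): string {
    this.currentToolId = item.id ?? `codex-item-${++this.counter}`;
    return this.currentToolId;
  }

  private endId(item: CodexItem): string {
    return item.id ?? this.currentToolId ?? `codex-item-${++this.counter}`;
  }
}
```

Returning null for anything unrecognised keeps the mapper tolerant of event types added in later Codex releases.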
Text streaming — delta computation:
The item.updated event for agent_message contains the full accumulated text so far (not just the new delta). To emit incremental AgentTextEvents:
    // Instance state:
    private lastEmittedTextLength = 0;

    // In item.updated handler for agent_message:
    const fullText = /* extract text from item content */;
    const delta = fullText.substring(this.lastEmittedTextLength);
    this.lastEmittedTextLength = fullText.length;
    if (delta) {
      this.accumulatedText += delta;
      return { type: "text", text: delta };
    }
    return null;
Reset lastEmittedTextLength to 0 on each new item.started for agent_message.
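To make the delta logic testable in isolation, it can be pulled into a small helper. TextDeltaTracker below is a hypothetical name, not something that exists in the codebase; it assumes the caller has already extracted the full message text from the item.

```typescript
// Hypothetical helper wrapping the delta computation described above.
class TextDeltaTracker {
  accumulatedText = "";
  private lastEmittedTextLength = 0;

  // Call on item.started for agent_message: a new message starts from offset 0.
  startItem(): void {
    this.lastEmittedTextLength = 0;
  }

  // Call on item.updated / item.completed with the full text so far.
  // Returns only the part that has not been emitted yet, or null if nothing is new.
  update(fullText: string): { type: "text"; text: string } | null {
    const delta = fullText.substring(this.lastEmittedTextLength);
    this.lastEmittedTextLength = fullText.length;
    if (!delta) return null;
    this.accumulatedText += delta;
    return { type: "text", text: delta };
  }
}

// Usage: two updates for one message, then a second message.
const tracker = new TextDeltaTracker();
tracker.startItem();
const first = tracker.update("Hel");    // delta "Hel"
const second = tracker.update("Hello"); // delta "lo"
const repeat = tracker.update("Hello"); // nothing new, so null
tracker.startItem();
const third = tracker.update("Bye");    // delta "Bye"
```

Because item.completed carries the final text, passing it through the same update() call emits any remaining tail without duplicating what item.updated already produced.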
Tool ID generation: Codex items have SDK-native IDs. Extract from item.id field. If not present, generate with codex-item-${counter}.
Stateful fields (instance-level, reset per execute() call):
- currentSessionId: string | undefined (from thread.started event's thread_id)
- accumulatedText: string (concatenated text from message deltas for AgentRunResult.text)
- lastEmittedTextLength: number (for delta computation within a message item)
- lastUsage: AgentUsage | undefined (from turn.completed event's usage field)
- currentToolId: string | undefined (tracks the active tool item ID for correlating item.started → item.completed)
Usage extraction (from turn.completed event's usage field):
    // turn.completed.usage structure:
    {
      input_tokens: number;
      cached_input_tokens: number;
      output_tokens: number;
    }
Map to AgentUsage:
- inputTokens ← usage.input_tokens
- outputTokens ← usage.output_tokens
- cacheReadTokens ← usage.cached_input_tokens (when > 0)
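The mapping is small enough to express as one function. In this sketch, AgentUsageSketch is an assumed subset of the real AgentUsage type, and mapCodexUsage is a hypothetical helper name.

```typescript
// Shape of turn.completed.usage as described above.
interface CodexUsage {
  input_tokens: number;
  cached_input_tokens: number;
  output_tokens: number;
}

// Assumed subset of AgentUsage (the real type is in src/middleware/types.ts).
interface AgentUsageSketch {
  inputTokens: number;
  outputTokens: number;
  cacheReadTokens?: number;
}

function mapCodexUsage(usage: CodexUsage): AgentUsageSketch {
  return {
    inputTokens: usage.input_tokens,
    outputTokens: usage.output_tokens,
    // Only set cacheReadTokens when Codex reported cached input.
    ...(usage.cached_input_tokens > 0 ? { cacheReadTokens: usage.cached_input_tokens } : {}),
  };
}
```

Omitting the key entirely (rather than setting it to 0 or undefined) keeps the result identical to what a runtime without cache reporting would produce.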
buildEnv(params: AgentExecuteParams): Record<string, string>
Return environment variable overrides for the Codex subprocess.
Cross-contamination prevention: Codex strips ANTHROPIC_API_KEY from its subprocess environment to prevent accidental cross-provider auth leakage. Implement this in buildEnv():
    protected buildEnv(_params: AgentExecuteParams): Record<string, string> {
      return {
        ANTHROPIC_API_KEY: "", // Prevent cross-provider leakage
      };
    }
Auth credentials (OPENAI_API_KEY) are passed through params.env by the caller, not hardcoded in the runtime.
Done event enrichment
Same pattern as Claude/Gemini: intercept the done event and enrich AgentRunResult with accumulated state.
Result metadata mapping (from accumulated state → AgentRunResult):
- text ← accumulated from all agent_message delta events
- sessionId ← from thread.started event's thread_id
- usage ← from turn.completed event's usage (last one, if multiple turns)
Note: Codex does not report cost, API duration, or stop reason in its NDJSON output.
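A possible shape for the enrichment step is sketched below. The layout of the done event (a result object carrying text, sessionId and usage) is an assumption for illustration, as are all the *Sketch type names; the Claude and Gemini runtimes remain the authoritative reference for the real structure.

```typescript
// Assumed shapes; the real AgentRunResult and done event are in src/middleware/types.ts.
interface UsageSketch { inputTokens: number; outputTokens: number; cacheReadTokens?: number }
interface RunResultSketch { text: string; sessionId?: string; usage?: UsageSketch }
interface DoneEventSketch { type: "done"; result: RunResultSketch }
interface RuntimeStateSketch {
  accumulatedText: string;
  currentSessionId?: string;
  lastUsage?: UsageSketch;
}

// Copy accumulated per-execution state onto the done event, in place.
function enrichDoneEvent(event: DoneEventSketch, state: RuntimeStateSketch): void {
  event.result.text = state.accumulatedText;
  if (state.currentSessionId !== undefined) {
    event.result.sessionId = state.currentSessionId;
  }
  // Missing usage (no turn.completed seen) leaves the field unset.
  if (state.lastUsage !== undefined) {
    event.result.usage = state.lastUsage;
  }
}
```

Since lastUsage is overwritten on every turn.completed, the value copied here is the last turn's usage when a run spans several turns.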
File: src/middleware/runtimes/codex.test.ts
Unit tests following the same testable-subclass pattern.
- Argument construction (6+ test cases):
  - New session: ["exec", "--json", "--color", "never", "<prompt>"]
  - Session resume: ["exec", "resume", "<session-id>", "--json", "--color", "never"] (no prompt)
  - No session: no resume sub-subcommand
  - Verify prompt is excluded on resume
  - Verify --color never always present
  - Verify exec always first arg
- Event extraction (12+ test cases):
  - thread.started → skip (but thread_id captured as session ID)
  - turn.started → skip
  - item.started + command_execution → AgentToolUseEvent
  - item.started + mcp_tool_call → AgentToolUseEvent with tool name and arguments
  - item.started + agent_message → skip
  - item.updated + agent_message → AgentTextEvent with delta from accumulated text
  - item.updated + agent_message (multiple) → correct incremental deltas
  - item.completed + command_execution → AgentToolResultEvent with exit_code check
  - item.completed + mcp_tool_call → AgentToolResultEvent
  - item.completed + error → AgentErrorEvent
  - turn.completed → stores usage, returns null
  - turn.failed → AgentErrorEvent
  - error → AgentErrorEvent
  - Unknown event type → skip
- Environment construction (3+ test cases):
  - Strips ANTHROPIC_API_KEY (cross-contamination prevention)
  - Does not inject OPENAI_API_KEY (caller responsibility)
- MCP config file management (4+ test cases):
  - TOML file created when mcpServers has entries
  - Correct TOML structure ([mcp_servers.<name>] sections)
  - command array correctly formed from McpServerConfig.command + args
  - Cleanup on teardown
- supportsStdinPrompt (1 test case):
  - Returns false
- Done event enrichment (3+ test cases):
  - Enriches with accumulated text, session ID from thread_id, usage
  - Handles missing usage gracefully
  - Handles multiple turns (uses last turn's usage)
Reference
- src/middleware/runtimes/claude.ts / gemini.ts: reference implementations
- src/middleware/runtimes/claude.test.ts / gemini.test.ts: reference test files
- openai/codex SDK source:
  - sdk/typescript/src/events.ts: event type definitions
  - sdk/typescript/src/items.ts: item type definitions
- Official Codex non-interactive docs: codex exec --json format
- Historical: --json was previously --experimental-json; agent_message was previously assistant_message
- The app-server protocol (codex app-server) uses a richer format with slash-delimited events; it is NOT relevant for CLI --json output
- Known limitation: Prompt is excluded on session resume (codex exec resume <id> does not accept a new prompt)
- Empirical verification needed: Exact item field names (e.g., item.output vs item.result, item.command vs item.input) and item.id availability require capture of actual codex exec --json output during implementation
Acceptance Criteria
- src/middleware/runtimes/codex.ts exists and exports CodexCliRuntime
- Extends CLIRuntimeBase and implements all three abstract methods
- supportsStdinPrompt returns false
- buildArgs() produces exec --json --color never <prompt> for new sessions
- buildArgs() produces exec resume <id> --json --color never for resumed sessions (no prompt)
- extractEvent() correctly maps all 8 Codex event types to AgentEvent types
- extractEvent() handles the two-level event model: events (thread/turn) + items (8 sub-types)
- Text deltas are computed from item.updated events
- Tool lifecycle: item.started → AgentToolUseEvent, item.completed → AgentToolResultEvent
- Session ID is extracted from thread.started event's thread_id
- Usage is extracted from turn.completed event's usage field
- buildEnv() strips ANTHROPIC_API_KEY (cross-contamination prevention)
- MCP config TOML file is created when mcpServers has entries
- pnpm build passes
- pnpm test passes