
feat(codex-cli): prefer output-last-message artifacts#2449

Open
BingqingLyu wants to merge 1 commit into main from
fork-pr-66081-feat-codex-output-last-message

Conversation


@BingqingLyu BingqingLyu commented Apr 28, 2026

Summary

  • Problem: direct codex-cli runs still finalize the user-visible reply from stdout even though Codex can write a dedicated final-message artifact
  • Why it matters: stdout is useful for transport and telemetry, but it is a weaker source of truth for the final assistant answer than Codex's explicit --output-last-message output
  • What changed: codex-cli runs now allocate a temp --output-last-message file, prefer that artifact when it contains text, and keep stdout parsing as the fallback path
  • What did NOT change (scope boundary): other CLI backends still use their existing stdout/json/jsonl parsing paths, and this PR does not change failure classification or streaming behavior
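
The precedence rule described above can be sketched as follows. This is an illustrative model, not the actual OpenClaw implementation; `selectFinalReply` and its parameters are hypothetical names.

```typescript
// Hypothetical sketch of the selection rule: a non-empty
// --output-last-message artifact wins; otherwise the reply parsed
// from stdout remains the fallback.
function selectFinalReply(
  artifactText: string | null,
  stdoutReply: string,
): string {
  const artifact = artifactText?.trim();
  if (artifact) return artifact; // artifact contains text: prefer it
  return stdoutReply; // missing or blank artifact: stdout fallback
}
```

The trim guard matters: an artifact file that exists but holds only whitespace should not suppress the stdout-derived reply.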

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor required for the fix
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

Root Cause (if applicable)

  • Root cause: the direct codex-cli runner only treated stdout parsing as the final reply source even though Codex already exposes a more stable final-message artifact contract
  • Missing detection / guardrail: there was no runner-level test that forced stdout and the final-message artifact to disagree and verified that OpenClaw finalized from the artifact
  • Contributing context (if known): Codex CLI already supports --output-last-message, but OpenClaw was not wiring that contract into the codex-cli backend path

Regression Test Plan (if applicable)

  • Coverage level that should have caught this:
    • Unit test
    • Seam / integration test
    • End-to-end test
    • Existing coverage already sufficient
  • Target test or file: src/agents/cli-runner.spawn.test.ts, src/agents/cli-output.test.ts
  • Scenario the test should lock in: when Codex writes both stdout JSONL and an --output-last-message file, OpenClaw should prefer the artifact when it is non-empty and fall back to parsed stdout when the artifact is empty
  • Why this is the smallest reliable guardrail: it exercises the exact runner boundary where args are built, the artifact is read, stdout is still parsed, and the final reply is selected
  • Existing test that already covers this (if any): none covered the artifact-vs-stdout precedence or Codex message-content JSONL fallback
  • If no new test is added, why not:

User-visible / Behavior Changes

  • codex-cli replies now finalize from Codex's explicit final-message artifact when available, which makes delegated Codex runs less dependent on stdout formatting details.

Diagram (if applicable)

Before:
[codex exec --json] -> [stdout parsed] -> [reply text selected from stdout]

After:
[codex exec --json --output-last-message <file>]
  -> [stdout parsed for session/usage/fallback]
  -> [artifact read]
  -> [artifact text wins when present]
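
The "After" wiring can be sketched like this. Only `--output-last-message` is Codex's own flag; the temp-dir layout and variable names are illustrative assumptions, not OpenClaw's actual code.

```typescript
import { mkdtempSync, rmSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Allocate a per-run temp location and ask Codex to write its final
// message there, alongside the existing --json stdout stream.
const runDir = mkdtempSync(join(tmpdir(), "codex-run-"));
const lastMessagePath = join(runDir, "last-message.txt");
const args = ["exec", "--json", "--output-last-message", lastMessagePath];
```

Because the path is freshly created per run, concurrent codex-cli executions cannot clobber each other's artifacts.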

Security Impact (required)

  • New permissions/capabilities? (No)
  • Secrets/tokens handling changed? (No)
  • New/changed network calls? (No)
  • Command/tool execution surface changed? (Yes)
  • Data access scope changed? (No)
  • If any Yes, explain risk + mitigation:
    • This only adds Codex's own --output-last-message flag plus a temp file under OpenClaw's preferred temp dir. The file is per-run, read back locally, and deleted in cleanup even when parsing falls back to stdout.
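
The cleanup guarantee can be sketched as a try/finally wrapper; `withLastMessageFile` is a hypothetical helper used here only to illustrate the pattern.

```typescript
import { mkdtempSync, rmSync, existsSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Illustrative sketch: the per-run artifact is removed in a finally
// block, so it is cleaned up whether the run finalizes from the
// artifact, falls back to stdout, or throws.
function withLastMessageFile<T>(run: (artifactPath: string) => T): T {
  const dir = mkdtempSync(join(tmpdir(), "codex-run-"));
  const artifactPath = join(dir, "last-message.txt");
  try {
    return run(artifactPath);
  } finally {
    rmSync(dir, { recursive: true, force: true });
  }
}
```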

Repro + Verification

Environment

  • OS: macOS
  • Runtime/container: local pnpm/vitest
  • Model/provider: codex-cli/gpt-5.4
  • Integration/channel (if any): direct CLI runner
  • Relevant config (redacted): default codex-cli backend with --json

Steps

  1. Run a direct codex-cli execution through the CLI runner.
  2. Have Codex emit stdout JSONL and write a final-message artifact.
  3. Compare the final reply selection when the artifact is populated vs empty.

Expected

  • OpenClaw should prefer the artifact text when it exists and keep stdout parsing as the fallback.

Actual

  • Before this change, the runner always finalized from stdout because it never requested or read Codex's final-message artifact.

Evidence

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)

Human Verification (required)

  • Verified scenarios: targeted vitest run for src/agents/cli-output.test.ts, src/agents/cli-runner.spawn.test.ts, and src/agents/cli-runner.reliability.test.ts
  • Edge cases checked: non-empty artifact overriding stdout; empty artifact falling back to stdout; Codex JSONL message content still parsing to plain assistant text
  • What you did not verify: full repo-wide pnpm check / pnpm test lanes and a live end-to-end Codex CLI run against the actual binary in this branch

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

  • Backward compatible? (Yes)
  • Config/env changes? (No)
  • Migration needed? (No)
  • If yes, exact upgrade steps:

Risks and Mitigations

  • Risk: temp artifact cleanup could leak files if the runner exits early
    • Mitigation: cleanup runs in the outer finally, and the unit test asserts the artifact path is removed after success
  • Risk: Codex stdout fallback could regress if the artifact is empty and stdout uses the normal message-content JSONL shape
    • Mitigation: parseCliJsonl now extracts text from Codex message content records, and the fallback path is covered by a dedicated unit test
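
The stdout fallback mentioned above can be sketched as follows. The JSONL record shape (`msg.message` / `content`) is an assumption for illustration, not Codex's documented schema, and `extractTextFromJsonl` is a hypothetical stand-in for the parseCliJsonl behavior.

```typescript
// Hedged sketch: scan stdout JSONL from the end and return the first
// record that carries assistant message text. Non-JSON and
// non-message lines (usage, session events) are skipped.
function extractTextFromJsonl(stdout: string): string | null {
  for (const line of stdout.split("\n").reverse()) {
    const trimmed = line.trim();
    if (!trimmed) continue;
    try {
      const record = JSON.parse(trimmed);
      const content = record?.msg?.message ?? record?.content;
      if (typeof content === "string" && content.trim()) {
        return content.trim();
      }
    } catch {
      continue; // ignore lines that are not valid JSON
    }
  }
  return null;
}
```

Returning `null` rather than an empty string keeps the caller's precedence check simple: a null fallback means no reply could be recovered from stdout at all.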

AI-assisted: Yes (Codex). Testing: targeted vitest coverage listed above. I attempted codex review --base origin/main, but the local Codex CLI is currently rate-limited.



Development

Successfully merging this pull request may close these issues.

Enhancement: finalize codex-cli replies from --output-last-message artifact instead of relying on stdout
