
feat(codex-cli): prefer output-last-message artifacts#2449

Open
BingqingLyu wants to merge 1 commit into main from
fork-pr-66081-feat-codex-output-last-message

Conversation


@BingqingLyu BingqingLyu commented Apr 28, 2026

Summary

  • Problem: direct codex-cli runs still finalize the user-visible reply from stdout even though Codex can write a dedicated final-message artifact
  • Why it matters: stdout is useful for transport and telemetry, but it is a weaker source of truth for the final assistant answer than Codex's explicit --output-last-message output
  • What changed: codex-cli runs now allocate a temp --output-last-message file, prefer that artifact when it contains text, and keep stdout parsing as the fallback path
  • What did NOT change (scope boundary): other CLI backends still use their existing stdout/json/jsonl parsing paths, and this PR does not change failure classification or streaming behavior
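
The precedence rule described above can be sketched as follows. This is an illustrative model, not the actual OpenClaw implementation; `selectFinalReply` and its parameters are hypothetical names.

```typescript
// Hypothetical sketch of the selection rule: a non-empty
// --output-last-message artifact wins; otherwise the reply parsed
// from stdout remains the fallback.
function selectFinalReply(
  artifactText: string | null,
  stdoutReply: string,
): string {
  const artifact = artifactText?.trim();
  if (artifact) return artifact; // artifact contains text: prefer it
  return stdoutReply; // missing or blank artifact: stdout fallback
}
```

The trim guard matters: an artifact file that exists but holds only whitespace should not suppress the stdout-derived reply.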

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor required for the fix
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

Root Cause (if applicable)

  • Root cause: the direct codex-cli runner only treated stdout parsing as the final reply source even though Codex already exposes a more stable final-message artifact contract
  • Missing detection / guardrail: there was no runner-level test that forced stdout and the final-message artifact to disagree and verified that OpenClaw finalized from the artifact
  • Contributing context (if known): Codex CLI already supports --output-last-message, but OpenClaw was not wiring that contract into the codex-cli backend path

Regression Test Plan (if applicable)

  • Coverage level that should have caught this:
    • Unit test
    • Seam / integration test
    • End-to-end test
    • Existing coverage already sufficient
  • Target test or file: src/agents/cli-runner.spawn.test.ts, src/agents/cli-output.test.ts
  • Scenario the test should lock in: when Codex writes both stdout JSONL and an --output-last-message file, OpenClaw should prefer the artifact when it is non-empty and fall back to parsed stdout when the artifact is empty
  • Why this is the smallest reliable guardrail: it exercises the exact runner boundary where args are built, the artifact is read, stdout is still parsed, and the final reply is selected
  • Existing test that already covers this (if any): none covered the artifact-vs-stdout precedence or Codex message-content JSONL fallback
  • If no new test is added, why not:

User-visible / Behavior Changes

  • codex-cli replies now finalize from Codex's explicit final-message artifact when available, which makes delegated Codex runs less dependent on stdout formatting details.

Diagram (if applicable)

Before:
[codex exec --json] -> [stdout parsed] -> [reply text selected from stdout]

After:
[codex exec --json --output-last-message <file>]
  -> [stdout parsed for session/usage/fallback]
  -> [artifact read]
  -> [artifact text wins when present]
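
The "After" wiring can be sketched like this. Only `--output-last-message` is Codex's own flag; the temp-dir layout and variable names are illustrative assumptions, not OpenClaw's actual code.

```typescript
import { mkdtempSync, rmSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Allocate a per-run temp location and ask Codex to write its final
// message there, alongside the existing --json stdout stream.
const runDir = mkdtempSync(join(tmpdir(), "codex-run-"));
const lastMessagePath = join(runDir, "last-message.txt");
const args = ["exec", "--json", "--output-last-message", lastMessagePath];
```

Because the path is freshly created per run, concurrent codex-cli executions cannot clobber each other's artifacts.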

Security Impact (required)

  • New permissions/capabilities? (No)
  • Secrets/tokens handling changed? (No)
  • New/changed network calls? (No)
  • Command/tool execution surface changed? (Yes)
  • Data access scope changed? (No)
  • If any Yes, explain risk + mitigation:
    • This only adds Codex's own --output-last-message flag plus a temp file under OpenClaw's preferred temp dir. The file is per-run, read back locally, and deleted in cleanup even when parsing falls back to stdout.
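
The cleanup guarantee can be sketched as a try/finally wrapper; `withLastMessageFile` is a hypothetical helper used here only to illustrate the pattern.

```typescript
import { mkdtempSync, rmSync, existsSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Illustrative sketch: the per-run artifact is removed in a finally
// block, so it is cleaned up whether the run finalizes from the
// artifact, falls back to stdout, or throws.
function withLastMessageFile<T>(run: (artifactPath: string) => T): T {
  const dir = mkdtempSync(join(tmpdir(), "codex-run-"));
  const artifactPath = join(dir, "last-message.txt");
  try {
    return run(artifactPath);
  } finally {
    rmSync(dir, { recursive: true, force: true });
  }
}
```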

Repro + Verification

Environment

  • OS: macOS
  • Runtime/container: local pnpm/vitest
  • Model/provider: codex-cli/gpt-5.4
  • Integration/channel (if any): direct CLI runner
  • Relevant config (redacted): default codex-cli backend with --json

Steps

  1. Run a direct codex-cli execution through the CLI runner.
  2. Have Codex emit stdout JSONL and write a final-message artifact.
  3. Compare the final reply selection when the artifact is populated vs empty.

Expected

  • OpenClaw should prefer the artifact text when it exists and keep stdout parsing as the fallback.

Actual

  • Before this change, the runner always finalized from stdout because it never requested or read Codex's final-message artifact.

Evidence

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)

Human Verification (required)

  • Verified scenarios: targeted vitest run for src/agents/cli-output.test.ts, src/agents/cli-runner.spawn.test.ts, and src/agents/cli-runner.reliability.test.ts
  • Edge cases checked: non-empty artifact overriding stdout; empty artifact falling back to stdout; Codex JSONL message content still parsing to plain assistant text
  • What you did not verify: full repo-wide pnpm check / pnpm test lanes and a live end-to-end Codex CLI run against the actual binary in this branch

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

  • Backward compatible? (Yes)
  • Config/env changes? (No)
  • Migration needed? (No)
  • If yes, exact upgrade steps:

Risks and Mitigations

  • Risk: temp artifact cleanup could leak files if the runner exits early
    • Mitigation: cleanup runs in the outer finally, and the unit test asserts the artifact path is removed after success
  • Risk: Codex stdout fallback could regress if the artifact is empty and stdout uses the normal message-content JSONL shape
    • Mitigation: parseCliJsonl now extracts text from Codex message content records, and the fallback path is covered by a dedicated unit test
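
The stdout fallback mentioned above can be sketched as follows. The JSONL record shape (`msg.message` / `content`) is an assumption for illustration, not Codex's documented schema, and `extractTextFromJsonl` is a hypothetical stand-in for the parseCliJsonl behavior.

```typescript
// Hedged sketch: scan stdout JSONL from the end and return the first
// record that carries assistant message text. Non-JSON and
// non-message lines (usage, session events) are skipped.
function extractTextFromJsonl(stdout: string): string | null {
  for (const line of stdout.split("\n").reverse()) {
    const trimmed = line.trim();
    if (!trimmed) continue;
    try {
      const record = JSON.parse(trimmed);
      const content = record?.msg?.message ?? record?.content;
      if (typeof content === "string" && content.trim()) {
        return content.trim();
      }
    } catch {
      continue; // ignore lines that are not valid JSON
    }
  }
  return null;
}
```

Returning `null` rather than an empty string keeps the caller's precedence check simple: a null fallback means no reply could be recovered from stdout at all.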

AI-assisted: Yes (Codex). Testing: targeted vitest coverage listed above. I attempted codex review --base origin/main, but the local Codex CLI is currently rate-limited.



Development

Successfully merging this pull request may close these issues.

Enhancement: finalize codex-cli replies from --output-last-message artifact instead of relying on stdout
