fix(codex): synthesize failed tool.result for orphan tool.call (#86808) #87228

Open
Sanjays2402 wants to merge 1 commit into openclaw:main from Sanjays2402:fix/86808-orphan-tool-call

@Sanjays2402

Contributor

Closes #86808.

Problem

When a Codex app-server turn ended after persisting a tool.call but before the matching tool.result (denied auto-approval, interrupted sandbox, runtime crash), the projector mirrored the call into the OpenClaw transcript and trajectory but never emitted a terminal result. This broke the downstream invariant that every persisted tool.call has exactly one terminal tool.result: the session resumed with an orphan tool call, and the next prompt was rejected with SYSTEM_RUN_DENIED.
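To make the invariant concrete, here is a minimal sketch of a checker that reports tool calls without exactly one terminal result. The `TrajectoryEvent` shape and the `findInvariantViolations` helper are hypothetical names used only for illustration; the real OpenClaw trajectory schema may differ.

```typescript
// Hypothetical event shape; the real OpenClaw trajectory schema may differ.
type TrajectoryEvent =
  | { type: "tool.call"; toolCallId: string; toolName: string }
  | { type: "tool.result"; toolCallId: string; status: "completed" | "failed" };

// Returns the ids of tool calls that do not have exactly one terminal result.
function findInvariantViolations(events: TrajectoryEvent[]): string[] {
  // Count how many results were recorded for each call id.
  const resultCounts = new Map<string, number>();
  for (const event of events) {
    if (event.type === "tool.result") {
      const seen = resultCounts.get(event.toolCallId) ?? 0;
      resultCounts.set(event.toolCallId, seen + 1);
    }
  }
  // A call violates the invariant when its result count is anything but 1.
  return events
    .filter((e) => e.type === "tool.call" && (resultCounts.get(e.toolCallId) ?? 0) !== 1)
    .map((e) => e.toolCallId);
}

// "call-2" was persisted but never received a result, so it is reported.
const violations = findInvariantViolations([
  { type: "tool.call", toolCallId: "call-1", toolName: "bash" },
  { type: "tool.call", toolCallId: "call-2", toolName: "bash" },
  { type: "tool.result", toolCallId: "call-1", status: "completed" },
]);
console.log(violations); // ["call-2"]
```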

Fix

CodexAppServerEventProjector now:

  • Maps each transcript tool-call id to its tool name (toolTranscriptNamesById).
  • Tracks which call ids were recorded into the trajectory (toolTrajectoryCallIds / toolTrajectoryResultIds).
  • On buildResult, synthesizes a status: "failed", reason: "missing_tool_result" trajectory event and a mirrored toolResult message for every call id without a terminal result.
  • Bubbles a synthetic SYSTEM_RUN_DENIED promptError so the attempt is classified as failed instead of silently swallowed.
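The steps above can be sketched roughly as follows. This is a simplified illustration, not the code in this PR: only the three tracking field names and the `buildResult` name come from the description, while the class name, the `recordToolCall` / `recordToolResult` methods, the message and event shapes, and the exact error text are assumptions.

```typescript
// Simplified, hypothetical shapes; the real projector types are not shown here.
interface ToolResultMessage {
  role: "toolResult";
  toolCallId: string;
  toolName: string;
  isError: boolean;
  content: string;
}

interface TrajectoryToolResult {
  type: "tool.result";
  toolCallId: string;
  status: "completed" | "failed";
  result: { status: "completed" | "failed"; reason?: string };
}

interface ProjectedResult {
  messages: ToolResultMessage[];
  trajectory: TrajectoryToolResult[];
  promptError: string | null;
}

class OrphanAwareProjector {
  // Field names follow the PR description; everything else is an assumption.
  private readonly toolTranscriptNamesById = new Map<string, string>();
  private readonly toolTrajectoryCallIds = new Set<string>();
  private readonly toolTrajectoryResultIds = new Set<string>();
  private readonly messages: ToolResultMessage[] = [];
  private readonly trajectory: TrajectoryToolResult[] = [];
  private promptError: string | null = null;

  recordToolCall(toolCallId: string, toolName: string): void {
    this.toolTranscriptNamesById.set(toolCallId, toolName);
    this.toolTrajectoryCallIds.add(toolCallId);
  }

  recordToolResult(toolCallId: string, content: string): void {
    const toolName = this.toolTranscriptNamesById.get(toolCallId) ?? "unknown";
    this.toolTrajectoryResultIds.add(toolCallId);
    this.messages.push({ role: "toolResult", toolCallId, toolName, isError: false, content });
    this.trajectory.push({
      type: "tool.result",
      toolCallId,
      status: "completed",
      result: { status: "completed" },
    });
  }

  buildResult(): ProjectedResult {
    for (const toolCallId of this.toolTrajectoryCallIds) {
      if (this.toolTrajectoryResultIds.has(toolCallId)) continue; // already terminal
      const toolName = this.toolTranscriptNamesById.get(toolCallId) ?? "unknown";
      // Assumed error wording; only the SYSTEM_RUN_DENIED code comes from the PR.
      const errorText = `SYSTEM_RUN_DENIED: tool call ${toolCallId} ended without a result`;
      this.messages.push({ role: "toolResult", toolCallId, toolName, isError: true, content: errorText });
      this.trajectory.push({
        type: "tool.result",
        toolCallId,
        status: "failed",
        result: { status: "failed", reason: "missing_tool_result" },
      });
      // Mark the id as terminal so a repeated buildResult does not synthesize twice.
      this.toolTrajectoryResultIds.add(toolCallId);
      if (this.promptError === null) this.promptError = errorText;
    }
    return {
      messages: [...this.messages],
      trajectory: [...this.trajectory],
      promptError: this.promptError,
    };
  }
}
```

In this sketch the loop body never runs when every recorded call already has a result, so `promptError` stays `null` for a healthy turn.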

Regression test

extensions/codex/src/app-server/event-projector.test.ts, test "fails closed and synthesizes a result when a native tool call never completes":

  • emits item/started for a commandExecution (no completion event),
  • finishes with turn/completed and an agentMessage,
  • asserts promptError, the synthesized toolResult message shape (toolCallId, toolName=bash, isError=true, error text), and the synthesized trajectory tool.result event.

Verified the test fails on main (expected 'null' to contain 'SYSTEM_RUN_DENIED') and passes with the fix.

node scripts/run-vitest.mjs run --config test/vitest/vitest.extension-codex.config.ts \
  extensions/codex/src/app-server/event-projector.test.ts
# 70 passed

Notes

  • No behaviour change when every tool.call already has a result (synthesize loop is a no-op).
  • The synthetic trajectory event uses status: "failed", result: { status: "failed", reason: "missing_tool_result" }, mirroring how the runtime would describe a denied tool today.
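Both notes can be illustrated with a small pure function. This is a sketch under assumptions: `synthesizeMissingResults`, the `SyntheticToolResult` interface, and the `type` and `toolCallId` field names are hypothetical; only the `status` and `result` values are taken from the note above.

```typescript
// Hypothetical event type modelled on the shape quoted in the note above.
interface SyntheticToolResult {
  type: "tool.result";
  toolCallId: string;
  status: "failed";
  result: { status: "failed"; reason: "missing_tool_result" };
}

// Builds one failed result per call id that has no recorded terminal result.
function synthesizeMissingResults(
  callIds: Iterable<string>,
  resultIds: ReadonlySet<string>,
): SyntheticToolResult[] {
  const synthesized: SyntheticToolResult[] = [];
  for (const toolCallId of callIds) {
    if (resultIds.has(toolCallId)) continue; // call already has a result
    synthesized.push({
      type: "tool.result",
      toolCallId,
      status: "failed",
      result: { status: "failed", reason: "missing_tool_result" },
    });
  }
  return synthesized;
}

// Every call has a result: nothing is synthesized.
const completeTurn = synthesizeMissingResults(["call-1"], new Set(["call-1"]));
console.log(completeTurn.length); // 0

// "call-2" has no result: exactly one failed event is synthesized for it.
const interruptedTurn = synthesizeMissingResults(["call-1", "call-2"], new Set(["call-1"]));
console.log(interruptedTurn.map((e) => e.toolCallId)); // ["call-2"]
```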

… no terminal result (openclaw#86808)

When the Codex app-server runtime drops a turn after persisting a
`tool.call` (denied auto-approval, interrupted sandbox, crashed
runtime), no matching `tool.result` was emitted into the mirrored
transcript or the trajectory recorder. The downstream invariant 'every
persisted tool.call has exactly one terminal tool.result' broke, the
OpenClaw session resumed with an orphan tool call, and the next prompt
could be rejected with SYSTEM_RUN_DENIED.

Track tool transcript call ids by name and trajectory call ids
explicitly, and on `buildResult` synthesize a terminal `status: failed,
reason: missing_tool_result` entry plus a matching mirrored toolResult
message for every orphan id. Bubble the synthetic error into
`promptError` so the attempt is reported as failed instead of silently
swallowed.

Adds a regression test reproducing the orphan tool.call path from the
issue report (commandExecution with no completion) and asserts the
mirrored toolResult, synthesized trajectory event, and propagated
promptError.

openclaw-barnacle Bot added labels on May 27, 2026:

  • extensions: codex
  • size: S
  • triage: needs-real-behavior-proof (Candidate: external PR needs after-fix proof from a real setup.)
@clawsweeper

clawsweeper Bot commented May 27, 2026

Contributor

Codex review: needs real behavior proof before merge. Reviewed May 29, 2026, 1:12 AM ET / 05:12 UTC.

Summary
Review failed before ClawSweeper could summarize the requested change.

PR surface: Source +50, Tests +75. Total +125 across 2 files.

Reproducibility: unclear. The review failed before ClawSweeper could establish a reproduction path.

Review metrics: none identified.

Merge readiness
Overall: 🌊 off-meta tidepool
Proof: 🌊 off-meta tidepool
Patch quality: 🌊 off-meta tidepool
Result: rating does not apply to this item.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Risk before merge

  • [P1] No close action taken because the review did not complete.

Maintainer options:

  1. Decide the mitigation before merge
    Retry the Codex review after fixing the execution failure.
  2. Pause or close
    Do not merge this PR until maintainers decide whether the risk is worth taking.

Next step before merge

  • [P1] Review did not complete, so no work-lane recommendation was made.

Review details

Best possible solution:

Retry the Codex review after fixing the execution failure.

Do we have a high-confidence way to reproduce the issue?

Unclear. The review failed before ClawSweeper could establish a reproduction path.

Is this the best way to solve the issue?

Unclear. Retry the review first so ClawSweeper can evaluate the actual issue and fix direction.

AGENTS.md: unclear because the file could not be read completely.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 1188aa3b81ef.

Label changes

Label changes:

  • remove P2: Current review triage priority is none.
  • remove merge-risk: 🚨 compatibility: Current PR review selected no merge-risk labels.
  • remove merge-risk: 🚨 session-state: Current PR review selected no merge-risk labels.

Label justifications:

  • rating: 🌊 off-meta tidepool: Overall readiness is 🌊 off-meta tidepool; proof is 🌊 off-meta tidepool and patch quality is 🌊 off-meta tidepool.

Evidence reviewed

PR surface:

Source +50, Tests +75. Total +125 across 2 files.

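| Area      | Files | Added | Removed | Net  |
|-----------|-------|-------|---------|------|
| Source    | 1     | 50    | 0       | +50  |
| Tests     | 1     | 75    | 0       | +75  |
| Docs      | 0     | 0     | 0       | 0    |
| Config    | 0     | 0     | 0       | 0    |
| Generated | 0     | 0     | 0       | 0    |
| Other     | 0     | 0     | 0       | 0    |
| Total     | 2     | 125   | 0       | +125 |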

What I checked:

  • failure reason: codex execution failed.
  • codex failure detail: Codex review failed for this PR with exit 1.

Likely related people:

  • unknown: Codex failed before it could trace repository history. (role: review did not complete; confidence: low)

What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

clawsweeper Bot added labels on May 27, 2026:

  • rating: 🧂 unranked krab (Not merge-ready due to missing proof or serious correctness/safety concerns.)
  • status: 📣 needs proof (The PR needs real behavior proof before ClawSweeper can clear the contributor ask.)
  • P2 (Normal backlog priority with limited blast radius.)
  • merge-risk: 🚨 compatibility (🚨 May break existing users, config, migrations, defaults, or upgrade paths.)
  • merge-risk: 🚨 session-state (🚨 May lose, corrupt, stale, or mis-associate session, agent, or context state.)

@clawsweeper

clawsweeper Bot commented May 27, 2026

Contributor

ClawSweeper PR egg

🎁 Pass real behavior proof to wake the egg and unlock a hatchable treat.

Where did the egg go?
  • The egg game starts only after the PR passes the real-behavior proof check.
  • Before that, no creature or rarity is rolled. The treat waits for real proof.
  • This is still just collectible flavor: proof affects review readiness, not creature quality.

clawsweeper Bot changed labels on May 29, 2026:

  • added rating: 🌊 off-meta tidepool (PR readiness rating does not apply to this item.)
  • removed rating: 🧂 unranked krab (Not merge-ready due to missing proof or serious correctness/safety concerns.)
  • removed status: 📣 needs proof (The PR needs real behavior proof before ClawSweeper can clear the contributor ask.)

Labels

  • extensions: codex
  • merge-risk: 🚨 compatibility (🚨 May break existing users, config, migrations, defaults, or upgrade paths.)
  • merge-risk: 🚨 session-state (🚨 May lose, corrupt, stale, or mis-associate session, agent, or context state.)
  • P2 (Normal backlog priority with limited blast radius.)
  • rating: 🌊 off-meta tidepool (PR readiness rating does not apply to this item.)
  • size: S
  • triage: needs-real-behavior-proof (Candidate: external PR needs after-fix proof from a real setup.)

Development

Successfully merging this pull request may close these issues.

[Bug]: OpenClaw can persist tool.call without a matching tool.result when a Codex turn is denied, interrupted, or terminated

2 participants