Correction TLDR
Status: harness/mock-provider artifact, not a proven user-facing Codex app-server bug.
The original issue overclaimed this as a P1 Codex runtime problem. A higher-confidence audit shows the mock provider emits a provider-level read function call from prompt text even when the Codex app-server lane does not declare read as an OpenClaw dynamic tool. Codex intentionally owns workspace tools such as read/write/edit/exec/apply_patch natively rather than exposing them through the OpenClaw dynamic-tool bridge.
What actually breaks: the QA parity harness was comparing a malformed mock-provider plan against the Codex app-server lane. This is not enough evidence that real users lose approval-followthrough reads.
Product impact if OpenClaw moved fully to Codex today: P4 until live/native proof says otherwise. The remaining risk is live proof coverage, not a demonstrated production approval-read regression.
Latest Beta.5 Evidence
OpenClaw baseline: v2026.5.10-beta.5
PR: #80323
PR head: 3336dec6419c9cc9a87dc7cfa6f48118ca2d838e
Remote proof run: https://github.com/electricsheephq/openclaw-local-test/actions/runs/25719383976
Confidence tracker: #80936
The corrected first-hour-20 mock gate now has zero hard failures:
{
"first-hour-20-direct": { "total": 18, "passed": 15, "skipped": 3, "failed": 0 }
}
approval-turn-tool-followthrough is one of the 3 report-only rows:
mock-openai still models approval followthrough as a Pi-style read call; Codex-native approval/read behavior requires native/live proof
Correct Fix
- Gate mock
read planning on declared/available tools, or model Codex-native read through the real Codex app-server native tool protocol.
- Keep mock provider-plan diagnostics separate from runtime transcript/tool-call evidence.
- Reopen/escalate as a product bug only if a live/native Codex run shows approved reads fail outside this mock contract.
Superseded Original Report
The earlier reproduction and observed drift were useful for finding the harness flaw, but should not be read as proof of a Codex product/runtime bug.
Links
Correction TLDR
Status: harness/mock-provider artifact, not a proven user-facing Codex app-server bug.
The original issue overclaimed this as a P1 Codex runtime problem. A higher-confidence audit shows the mock provider emits a provider-level
readfunction call from prompt text even when the Codex app-server lane does not declarereadas an OpenClaw dynamic tool. Codex intentionally owns workspace tools such asread/write/edit/exec/apply_patchnatively rather than exposing them through the OpenClaw dynamic-tool bridge.What actually breaks: the QA parity harness was comparing a malformed mock-provider plan against the Codex app-server lane. This is not enough evidence that real users lose approval-followthrough reads.
Product impact if OpenClaw moved fully to Codex today: P4 until live/native proof says otherwise. The remaining risk is live proof coverage, not a demonstrated production approval-read regression.
Latest Beta.5 Evidence
The corrected
first-hour-20mock gate now has zero hard failures:{ "first-hour-20-direct": { "total": 18, "passed": 15, "skipped": 3, "failed": 0 } }approval-turn-tool-followthroughis one of the 3 report-only rows:Correct Fix
readplanning on declared/available tools, or model Codex-native read through the real Codex app-server native tool protocol.Superseded Original Report
The earlier reproduction and observed drift were useful for finding the harness flaw, but should not be read as proof of a Codex product/runtime bug.
Links