TLDR
The hard-gate failure is resolved by the beta.5 correction in PR #80323.
The original issue body reported first-hour-20 at 18 total / 6 passed / 12 failed. That was the pre-correction mock/harness state. The current beta.5 proof shows:
{
  "total": 18,
  "passed": 15,
  "skipped": 3,
  "failed": 0
}
Product impact if OpenClaw moved fully to Codex today: P4 for this issue as filed. The 3 skipped rows are native/live proof gaps, not Codex product bugs proven by the mock lane.
QA impact: P0 resolved for the maintainer mock gate. The gate now has zero hard failures and explicit report-only rows.
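The gate rule above (zero hard failures, with skipped rows reported but not failing the gate) can be sketched as a check over the summary counts. This is a minimal, hypothetical sketch: `gate_passes` is an illustrative name, and it assumes the four counters appear as top-level keys, as in the excerpt above. The real qa-suite-summary.json schema and the harness's own gate logic may differ.

```python
import json

# Counts from the beta.5 proof. Assumption: the artifact exposes these four
# keys at the top level; the real qa-suite-summary.json may nest them.
SUMMARY_JSON = '{"total": 18, "passed": 15, "skipped": 3, "failed": 0}'


def gate_passes(summary: dict) -> bool:
    """Hypothetical hard-gate check: fail only on hard failures.

    Skipped (report-only) rows are counted toward the total but do not
    fail the gate. Raises ValueError if the counts are inconsistent.
    """
    total = summary["total"]
    accounted = summary["passed"] + summary["skipped"] + summary["failed"]
    if accounted != total:
        raise ValueError(f"counts do not add up: {accounted} != {total}")
    return summary["failed"] == 0


summary = json.loads(SUMMARY_JSON)
print(gate_passes(summary))  # True

# Pre-correction state from the old issue body: 18 total / 6 pass / 12 fail.
old = {"total": 18, "passed": 6, "skipped": 0, "failed": 12}
print(gate_passes(old))  # False
```

The consistency check is there so that a summary with rows silently missing from all three buckets is rejected rather than read as a pass.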
Latest Evidence
OpenClaw baseline: v2026.5.10-beta.5
PR: #80323
PR head: 3336dec6419c9cc9a87dc7cfa6f48118ca2d838e
Remote proof run: https://github.com/electricsheephq/openclaw-local-test/actions/runs/25719383976
Confidence tracker: #80936
Artifact: first-hour-20-direct/qa-suite-summary.json
The remote workflow step "Run first-hour-20 direct lane" completed successfully.
Report-Only Rows
These rows are intentionally not hard-failed in mock mode:
Instruction followthrough repo contract: mock-openai cannot exercise Codex-native read/write tools.
Approval turn tool followthrough: mock-openai still models approval followthrough as a Pi-style read call; Codex-native approval/read behavior needs native/live proof.
Compaction retry after mutating tool: mock-openai cannot create files through Codex-native read/write; compaction replay safety remains a native/live proof lane.
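One way to picture how these rows become the 3 skipped entries, rather than hard failures, is a per-row classifier keyed on mock mode. The row names come from the list above. Everything else (`RowResult`, `classify`, `summarize`, and the rule that report-only rows are skipped regardless of outcome in mock mode) is a hypothetical illustration, not the harness's actual code.

```python
from collections import Counter
from dataclasses import dataclass

# Row names from the report; the classification rule is hypothetical.
REPORT_ONLY_IN_MOCK = frozenset({
    "Instruction followthrough repo contract",
    "Approval turn tool followthrough",
    "Compaction retry after mutating tool",
})


@dataclass
class RowResult:
    name: str
    ok: bool  # whether the row's assertions held when it ran


def classify(row: RowResult, mock_mode: bool) -> str:
    """Return 'passed', 'skipped', or 'failed' for one row.

    In mock mode, report-only rows are skipped regardless of outcome,
    because mock-openai cannot exercise the Codex-native paths they target.
    """
    if mock_mode and row.name in REPORT_ONLY_IN_MOCK:
        return "skipped"
    return "passed" if row.ok else "failed"


def summarize(rows: list[RowResult], mock_mode: bool) -> dict:
    """Aggregate per-row outcomes into the summary counters."""
    counts = Counter(classify(r, mock_mode) for r in rows)
    return {
        "total": len(rows),
        "passed": counts["passed"],
        "skipped": counts["skipped"],
        "failed": counts["failed"],
    }


# 15 ordinary passing rows plus the 3 report-only rows.
rows = [RowResult(f"row-{i}", True) for i in range(15)]
rows += [RowResult(name, False) for name in sorted(REPORT_ONLY_IN_MOCK)]
print(summarize(rows, mock_mode=True))
print(summarize(rows, mock_mode=False))
```

With `mock_mode=True` this yields 18 total / 15 passed / 3 skipped / 0 failed; outside mock mode the same three rows would count as hard failures, which is why they still need native/live proof.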
Current Verdict
This issue should stay closed as the stale failing-gate report. Follow remaining proof work in:
soak-100 proof.