TLDR
Status: QA harness/mock-provider issue, not a proven broad Codex runtime tool dropout.
The original report overclaimed that Codex drops planned tool calls for most default tool fixtures. The corrected architecture is:
- Codex-native workspace tools (read, write, edit, apply_patch, exec, process, update_plan) are intentionally owned by Codex native app-server behavior, not duplicated as OpenClaw dynamic tools.
- OpenClaw-owned integration tools remain dynamic openclaw bridge tools and are valid parity rows.
- Mock provider providerPlanToolCalls are fixture intent, not actual runtime transcript tool calls.
Product impact if OpenClaw moved fully to Codex today: P4 as originally filed. This issue does not prove a Codex runner product bug.
QA impact: P1/P0 depending on lane. False product failures would undermine the parity gate, so PR #80323 corrects the harness and report language.
Latest Beta.5 Proof
Validated on PR #80323 at OpenClaw v2026.5.10-beta.5:
PR head: 3336dec6419c9cc9a87dc7cfa6f48118ca2d838e
Remote proof run: https://github.com/electricsheephq/openclaw-local-test/actions/runs/25719383976
Confidence tracker: #80936
Artifact-backed results:
{
"tool-defaults-direct": { "total": 20, "passed": 20, "skipped": 0, "failed": 0 },
"openclaw-dynamic-tools-direct": { "total": 8, "passed": 8, "skipped": 0, "failed": 0 },
"tool-defaults-searchable": { "total": 20, "passed": 15, "skipped": 5, "failed": 0 },
"confidence-report": { "pass": true, "zeroUnknowns": true }
}
The 5 skipped rows in the searchable lane are explicitly report-only because the mock provider cannot yet faithfully model Codex searchable/deferred OpenClaw dynamic tool declarations. That is a QA/mock limitation, not a claim of a product bug.
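The counts above can be sanity-checked mechanically. The following is a minimal sketch, assuming each lane object carries the total, passed, skipped, and failed fields shown in the results. The function name lane_verdict and the verdict labels are hypothetical and are not taken from the harness.

```python
import json

# Lane results copied from the artifact-backed summary above.
RESULTS = json.loads("""
{
  "tool-defaults-direct": { "total": 20, "passed": 20, "skipped": 0, "failed": 0 },
  "openclaw-dynamic-tools-direct": { "total": 8, "passed": 8, "skipped": 0, "failed": 0 },
  "tool-defaults-searchable": { "total": 20, "passed": 15, "skipped": 5, "failed": 0 }
}
""")


def lane_verdict(counts: dict) -> str:
    """Classify a lane without letting skipped rows count as passes.

    Hypothetical helper: the verdict strings are illustrative only.
    """
    accounted = counts["passed"] + counts["skipped"] + counts["failed"]
    if accounted != counts["total"]:
        return "inconsistent"  # some rows are unaccounted for
    if counts["failed"] > 0:
        return "fail"
    if counts["skipped"] > 0:
        return "pass-with-skips"  # green, but not full coverage
    return "pass"


verdicts = {lane: lane_verdict(counts) for lane, counts in RESULTS.items()}
for lane, verdict in verdicts.items():
    print(f"{lane}: {verdict}")
```

Keeping a distinct verdict for lanes with skips is one way to make sure the searchable lane is never read as equivalent to the fully passing direct lanes.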
What Actually Was Wrong
The old tool-defaults interpretation mixed three layers:
- Codex-native workspace behavior: must be tested by user-visible native outcomes, not by expecting duplicate OpenClaw dynamic tools.
- OpenClaw dynamic integration tools: must be tested through actual dynamic bridge calls/results.
- Provider mock plans: must be displayed as fixture intent only, never as actual runtime tool calls.
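One way to keep the third layer from leaking into the other two is to report planned and observed calls in separate fields. The sketch below assumes a fixture exposes providerPlanToolCalls as a list of tool names and that runtime calls are recovered from the transcript. The ToolCallReport type, the build_report function, and the tool name hypothetical_bridge_tool are illustrations, not the harness's actual data model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToolCallReport:
    """Hypothetical report row that keeps intent and observation apart."""

    planned: List[str] = field(default_factory=list)   # fixture intent only
    observed: List[str] = field(default_factory=list)  # runtime transcript calls

    def planned_not_observed(self) -> List[str]:
        # Informational: a planned call that never ran is not, by itself,
        # evidence of a runtime tool dropout.
        return [name for name in self.planned if name not in self.observed]


def build_report(provider_plan_tool_calls, transcript_tool_calls) -> ToolCallReport:
    """Copy both inputs so the report never aliases the fixture data."""
    return ToolCallReport(
        planned=list(provider_plan_tool_calls),
        observed=list(transcript_tool_calls),
    )


# "read" is Codex-native, so it is not expected to appear as a dynamic call.
report = build_report(
    ["read", "hypothetical_bridge_tool"],
    ["hypothetical_bridge_tool"],
)
print("fixture intent:", report.planned)
print("runtime transcript:", report.observed)
print("planned but not observed:", report.planned_not_observed())
```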
Correct Fix In PR #80323
- Adds capability/bucket metadata for runtime tool rows.
- Suppresses false hard failures for Codex-native workspace tools when the only difference is missing duplicate dynamic exposure.
- Keeps OpenClaw dynamic integration tools hard-gated in --codex-tool-loading direct mode.
- Keeps searchable/deferred loading as a report lane until the mock provider can model deferred discovery.
- Shows counts.skipped in qa-suite-summary.json so report-only rows are not mistaken for passes.
- Separates provider-plan tool calls from runtime transcript tool calls.
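The gating rules in this list can be summarized as a small decision function. This is a sketch under stated assumptions: the bucket names, the mode strings, and the outcome labels are hypothetical stand-ins for whatever metadata the PR adds. Only the native tool names and the direct versus searchable distinction come from the text above. For simplicity it also assumes any tool outside the native set is an OpenClaw dynamic tool.

```python
# Tool names listed in this issue as owned by Codex native app-server behavior.
CODEX_NATIVE_TOOLS = frozenset(
    {"read", "write", "edit", "apply_patch", "exec", "process", "update_plan"}
)


def bucket_for(tool_name: str) -> str:
    """Assign a hypothetical ownership bucket to a runtime tool row."""
    if tool_name in CODEX_NATIVE_TOOLS:
        return "codex-native"
    return "openclaw-dynamic"  # simplifying assumption, see lead-in


def gate(tool_name: str, loading_mode: str, missing_dynamic_call: bool) -> str:
    """Decide how a row is reported when the dynamic call is or is not present."""
    if not missing_dynamic_call:
        return "pass"
    if bucket_for(tool_name) == "codex-native":
        # Missing duplicate dynamic exposure is expected for native tools.
        return "suppressed"
    if loading_mode == "direct":
        # Dynamic integration tools stay hard-gated in direct mode.
        return "hard-fail"
    # Searchable/deferred loading is a report lane, surfaced as skipped.
    return "report-only"


for row in [
    ("read", "direct", True),
    ("hypothetical_bridge_tool", "direct", True),
    ("hypothetical_bridge_tool", "searchable", True),
]:
    print(row, "->", gate(*row))
```

Checking the bucket before the loading mode means a native tool is never hard-failed for lacking a dynamic duplicate, regardless of lane.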
Remaining Work
Links