QA tool-defaults suite conflates Codex-native tools with OpenClaw dynamic tool parity

# TLDR

**Status: QA harness/mock-provider issue, not a proven broad Codex runtime tool dropout.**

The original report overclaimed that Codex drops planned tool calls for most default tool fixtures. The corrected architecture is:

- Codex-native workspace tools (`read`, `write`, `edit`, `apply_patch`, `exec`, `process`, `update_plan`) are intentionally owned by the native Codex app-server and are not duplicated as OpenClaw dynamic tools.
- OpenClaw-owned integration tools remain dynamic `openclaw` bridge tools and are valid parity rows.
- The mock provider's `providerPlanToolCalls` record fixture intent, not tool calls that actually appear in the runtime transcript.
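The following minimal TypeScript sketch shows how a harness could tag each runtime tool row with one of the two ownership buckets. The native tool list comes from this issue. The bucket labels, `classifyTool`, `expectsDynamicBridgeCall`, and the `example_integration_tool` name are hypothetical. The rule that every non-native tool is an OpenClaw dynamic tool is an assumption, not the actual logic in PR #80323.

```typescript
// Hypothetical bucket labels; the real harness may name these differently.
type ToolBucket = "codex-native" | "openclaw-dynamic";

// Codex-native workspace tools, as listed in this issue.
const CODEX_NATIVE_TOOLS: ReadonlySet<string> = new Set([
  "read",
  "write",
  "edit",
  "apply_patch",
  "exec",
  "process",
  "update_plan",
]);

// ASSUMPTION: any tool outside the native list is an OpenClaw-owned
// integration tool exposed through the dynamic `openclaw` bridge.
function classifyTool(name: string): ToolBucket {
  return CODEX_NATIVE_TOOLS.has(name) ? "codex-native" : "openclaw-dynamic";
}

// Only OpenClaw dynamic tools are expected to show up as dynamic bridge calls;
// native tools are judged by their workspace outcome instead.
function expectsDynamicBridgeCall(name: string): boolean {
  return classifyTool(name) === "openclaw-dynamic";
}

// `example_integration_tool` is a hypothetical integration tool name.
for (const tool of ["read", "apply_patch", "example_integration_tool"]) {
  console.log(`${tool}: ${classifyTool(tool)}`);
}
```

Keeping the native list explicit makes a missing dynamic `read` or `exec` an expected absence rather than a parity failure.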

**Product impact if OpenClaw moved fully to Codex today: P4 as originally filed.** This issue does not prove a Codex runner product bug.

**QA impact: P1/P0 depending on lane.** False product failures would undermine the parity gate, so PR #80323 corrects the harness and report language.

# Latest Beta.5 Proof

Validated on PR #80323 at OpenClaw `v2026.5.10-beta.5`:

```text
PR head: 3336dec6419c9cc9a87dc7cfa6f48118ca2d838e
Remote proof run: https://github.com/electricsheephq/openclaw-local-test/actions/runs/25719383976
Confidence tracker: #80936
```

Artifact-backed results:

```json
{
  "tool-defaults-direct": { "total": 20, "passed": 20, "skipped": 0, "failed": 0 },
  "openclaw-dynamic-tools-direct": { "total": 8, "passed": 8, "skipped": 0, "failed": 0 },
  "tool-defaults-searchable": { "total": 20, "passed": 15, "skipped": 5, "failed": 0 },
  "confidence-report": { "pass": true, "zeroUnknowns": true }
}
```

The 5 searchable skips are explicitly report-only because the mock provider does not yet faithfully model how Codex declares OpenClaw dynamic tools in searchable/deferred mode. That is a QA/mock limitation, not a product bug claim.
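Each lane's counts should reconcile as total = passed + skipped + failed. For the searchable lane that is 15 + 5 + 0 = 20. Skipped rows must stay visible instead of being folded into passes. The following TypeScript sketch checks that invariant against the artifact numbers above. The `LaneCounts` shape and `summarizeLane` helper are hypothetical and are not the harness's actual summary code.

```typescript
// Hypothetical shape mirroring one lane entry in the JSON results above.
interface LaneCounts {
  total: number;
  passed: number;
  skipped: number;
  failed: number;
}

// Verifies every row is accounted for, then reports skipped rows separately
// so report-only rows are never read as passes.
function summarizeLane(name: string, c: LaneCounts): string {
  if (c.passed + c.skipped + c.failed !== c.total) {
    throw new Error(`${name}: passed + skipped + failed != total`);
  }
  return `${name}: ${c.passed}/${c.total} passed, ${c.skipped} report-only, ${c.failed} failed`;
}

// Values copied from the beta.5 artifact-backed results.
const lanes: Record<string, LaneCounts> = {
  "tool-defaults-direct": { total: 20, passed: 20, skipped: 0, failed: 0 },
  "openclaw-dynamic-tools-direct": { total: 8, passed: 8, skipped: 0, failed: 0 },
  "tool-defaults-searchable": { total: 20, passed: 15, skipped: 5, failed: 0 },
};

for (const [name, counts] of Object.entries(lanes)) {
  console.log(summarizeLane(name, counts));
}
```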

# What Actually Was Wrong

The old `tool-defaults` interpretation mixed three layers:

1. **Codex-native workspace behavior**: must be tested by user-visible native outcomes, not by expecting duplicate OpenClaw dynamic tools.
2. **OpenClaw dynamic integration tools**: must be tested through actual dynamic bridge calls/results.
3. **Provider mock plans**: must be displayed as fixture intent only, never as actual runtime tool calls.
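Layers 2 and 3 are easy to blur because both involve tool names. The following TypeScript sketch keeps them apart: fixture intent is reported but never used as evidence, and runtime evidence comes only from transcript calls that carry a result. Every type and field name here (`FixtureRow`, `transcriptToolCalls`, `hasResult`, `reportRow`) is hypothetical. Only `providerPlanToolCalls` appears in this issue, and its real shape is not described here.

```typescript
// Hypothetical shapes; the real harness types are not described in this issue.
interface PlannedCall {
  name: string;
}

interface TranscriptCall {
  name: string;
  hasResult: boolean; // ASSUMPTION: the transcript records whether a result came back
}

interface FixtureRow {
  tool: string;
  providerPlanToolCalls: PlannedCall[]; // mock provider plan: fixture intent only
  transcriptToolCalls: TranscriptCall[]; // what actually ran at runtime
}

interface RowReport {
  tool: string;
  plannedByFixture: boolean; // displayed as intent, never as evidence
  observedAtRuntime: boolean; // the only evidence for a dynamic tool row
}

function reportRow(row: FixtureRow): RowReport {
  return {
    tool: row.tool,
    plannedByFixture: row.providerPlanToolCalls.some((c) => c.name === row.tool),
    observedAtRuntime: row.transcriptToolCalls.some(
      (c) => c.name === row.tool && c.hasResult,
    ),
  };
}

// A planned call with no matching transcript call is a gap between intent and
// evidence, not proof that the runtime "dropped" the call.
const example = reportRow({
  tool: "example_integration_tool", // hypothetical tool name
  providerPlanToolCalls: [{ name: "example_integration_tool" }],
  transcriptToolCalls: [],
});
console.log(example);
```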

# Correct Fix In PR #80323

- Adds capability/bucket metadata for runtime tool rows.
- Suppresses false hard failures for Codex-native workspace tools when the only difference is that they are not also exposed as duplicate dynamic tools.
- Keeps OpenClaw dynamic integration tools hard-gated in `--codex-tool-loading direct` mode.
- Keeps searchable/deferred loading as a report lane until the mock provider can model deferred discovery.
- Shows `counts.skipped` in `qa-suite-summary.json` so report-only rows are not mistaken for passes.
- Separates provider-plan tool calls from runtime transcript tool calls.
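Taken together, the gating rules above amount to a small decision table per row and loading mode. The following TypeScript sketch illustrates that table. It is not the PR's implementation: the types, field names, `evaluateRow`, and `tally` are hypothetical. It also assumes that, in searchable mode, every OpenClaw dynamic row is report-only, which this issue does not spell out row by row.

```typescript
// Hypothetical labels mirroring the buckets and loading modes named in this issue.
type Bucket = "codex-native" | "openclaw-dynamic";
type LoadingMode = "direct" | "searchable";
type Outcome = "pass" | "fail" | "skip";

interface RuntimeRow {
  bucket: Bucket;
  nativeOutcomeOk: boolean; // native rows: did the user-visible workspace outcome occur?
  dynamicCallObserved: boolean; // dynamic rows: actual bridge call with a result?
}

function evaluateRow(row: RuntimeRow, mode: LoadingMode): Outcome {
  if (row.bucket === "codex-native") {
    // Missing duplicate dynamic exposure is not a failure; judge the native outcome.
    return row.nativeOutcomeOk ? "pass" : "fail";
  }
  if (mode === "searchable") {
    // ASSUMPTION: report-only until the mock provider models deferred discovery.
    return "skip";
  }
  // Direct mode: OpenClaw dynamic integration tools stay hard-gated.
  return row.dynamicCallObserved ? "pass" : "fail";
}

// Tallies outcomes with `skipped` kept separate so report-only rows are not passes.
function tally(outcomes: Outcome[]) {
  const counts = { total: outcomes.length, passed: 0, skipped: 0, failed: 0 };
  for (const o of outcomes) {
    if (o === "pass") counts.passed++;
    else if (o === "skip") counts.skipped++;
    else counts.failed++;
  }
  return counts;
}

const rows: RuntimeRow[] = [
  { bucket: "codex-native", nativeOutcomeOk: true, dynamicCallObserved: false },
  { bucket: "openclaw-dynamic", nativeOutcomeOk: false, dynamicCallObserved: true },
];
console.log(tally(rows.map((r) => evaluateRow(r, "direct"))));
console.log(tally(rows.map((r) => evaluateRow(r, "searchable"))));
```

The native row passes in both modes even though it never appears as a dynamic tool. The dynamic row passes in direct mode and is counted as skipped, not passed, in searchable mode.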

# Remaining Work

- Keep this issue open until #80323 lands or until maintainers decide where to track searchable/deferred mock-provider fidelity.
- Do not file product bugs from these mock-only rows unless native/live Codex behavior reproduces the failure independently.

# Links

- Parent RFC/tracker: #80171
- PR: #80323
- Confidence proof: #80936
- Live/Testbox proof tracker: #80397

