QA tool-defaults suite conflates Codex-native tools with OpenClaw dynamic tool parity #80319

@100yenadmin

TLDR

Status: This is a QA harness/mock-provider issue, not proof of a broad Codex runtime tool dropout.

The original report overclaimed that Codex drops planned tool calls for most default tool fixtures. The corrected architecture is:

  • Codex-native workspace tools (read, write, edit, apply_patch, exec, process, update_plan) are intentionally owned by Codex native app-server behavior, not duplicated as OpenClaw dynamic tools.
  • OpenClaw-owned integration tools remain dynamic openclaw bridge tools and are valid parity rows.
  • The mock provider's providerPlanToolCalls record fixture intent, not actual runtime transcript tool calls.
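
The ownership split above can be sketched as a small classifier. This is only an illustration: the tool names come from the list in this issue, but `ToolBucket`, `classify_tool`, and the bucket labels are hypothetical and are not taken from the harness in PR #80323.

```python
from enum import Enum

# Tool names listed in this issue as Codex-native workspace tools.
CODEX_NATIVE_TOOLS = frozenset(
    {"read", "write", "edit", "apply_patch", "exec", "process", "update_plan"}
)


class ToolBucket(Enum):
    """Hypothetical bucket labels; the real harness metadata may differ."""

    CODEX_NATIVE = "codex-native"
    OPENCLAW_DYNAMIC = "openclaw-dynamic"


def classify_tool(name: str) -> ToolBucket:
    """Return the layer that owns a tool, based only on its name."""
    if name in CODEX_NATIVE_TOOLS:
        return ToolBucket.CODEX_NATIVE
    # Assumption for this sketch: every other tool is an OpenClaw bridge tool.
    return ToolBucket.OPENCLAW_DYNAMIC
```

Only rows in the `OPENCLAW_DYNAMIC` bucket would then be compared against dynamic bridge calls; Codex-native rows would be judged by their user-visible outcome instead.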

Product impact if OpenClaw moved fully to Codex today: P4 as originally filed. This issue does not prove a Codex runner product bug.

QA impact: P1/P0 depending on lane. False product failures would undermine the parity gate, so PR #80323 corrects the harness and report language.

Latest Beta.5 Proof

Validated on PR #80323 at OpenClaw v2026.5.10-beta.5:

PR head: 3336dec6419c9cc9a87dc7cfa6f48118ca2d838e
Remote proof run: https://github.com/electricsheephq/openclaw-local-test/actions/runs/25719383976
Confidence tracker: #80936

Artifact-backed results:

{
  "tool-defaults-direct": { "total": 20, "passed": 20, "skipped": 0, "failed": 0 },
  "openclaw-dynamic-tools-direct": { "total": 8, "passed": 8, "skipped": 0, "failed": 0 },
  "tool-defaults-searchable": { "total": 20, "passed": 15, "skipped": 5, "failed": 0 },
  "confidence-report": { "pass": true, "zeroUnknowns": true }
}

The 5 searchable skips are explicitly report-only because the mock provider does not yet accurately model Codex searchable/deferred OpenClaw dynamic tool declarations. That is a QA/mock limitation, not a claim of a product bug.
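
As a rough consistency check on the counts above, a reader could verify that each lane's total equals passed + skipped + failed, and that a lane with skipped rows is reported differently from a clean pass. The helper names and status labels below are hypothetical; only the numbers are copied from the artifact summary in this issue.

```python
import json

# Lane counts copied from the artifact-backed results in this issue.
SUMMARY = json.loads("""
{
  "tool-defaults-direct": { "total": 20, "passed": 20, "skipped": 0, "failed": 0 },
  "openclaw-dynamic-tools-direct": { "total": 8, "passed": 8, "skipped": 0, "failed": 0 },
  "tool-defaults-searchable": { "total": 20, "passed": 15, "skipped": 5, "failed": 0 }
}
""")


def counts_are_consistent(counts: dict) -> bool:
    """True when every row is accounted for as passed, skipped, or failed."""
    return counts["total"] == counts["passed"] + counts["skipped"] + counts["failed"]


def lane_status(counts: dict) -> str:
    """Hypothetical status label that keeps skipped rows visible."""
    if counts["failed"] > 0:
        return "fail"
    if counts["skipped"] > 0:
        return "pass-with-report-only-rows"
    return "pass"


for lane, counts in SUMMARY.items():
    print(lane, counts_are_consistent(counts), lane_status(counts))
```

Keeping a separate label for lanes with skipped rows is one way to avoid reading the 15/20 searchable result as a full pass.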

What Was Actually Wrong

The old tool-defaults interpretation mixed three layers:

  1. Codex-native workspace behavior: must be tested by user-visible native outcomes, not by expecting duplicate OpenClaw dynamic tools.
  2. OpenClaw dynamic integration tools: must be tested through actual dynamic bridge calls/results.
  3. Provider mock plans: must be displayed as fixture intent only, never as actual runtime tool calls.
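
One way to keep layer 3 from leaking into layers 1 and 2 is to store planned and observed calls in separate fields and never merge them. The sketch below assumes a hypothetical `RowEvidence` record; the field names are illustrative and are not the harness's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class RowEvidence:
    """Hypothetical per-row record that keeps fixture intent apart from runtime facts."""

    # What the mock provider fixture planned to call (intent only).
    planned_tool_calls: list[str] = field(default_factory=list)
    # What actually appeared in the runtime transcript.
    runtime_tool_calls: list[str] = field(default_factory=list)


def planned_but_not_observed(evidence: RowEvidence) -> list[str]:
    """List planned tool names that never appeared in the runtime transcript."""
    observed = set(evidence.runtime_tool_calls)
    return [name for name in evidence.planned_tool_calls if name not in observed]


row = RowEvidence(
    planned_tool_calls=["read", "some_integration_tool"],  # illustrative names
    runtime_tool_calls=["some_integration_tool"],
)
print(planned_but_not_observed(row))
```

A non-empty result here is a prompt for investigation, not proof of a dropped call: a Codex-native tool such as `read` may have been handled natively without appearing as a dynamic bridge call.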

Correct Fix In PR #80323

  • Adds capability/bucket metadata for runtime tool rows.
  • Suppresses false hard failures for Codex-native workspace tools when the only difference is missing duplicate dynamic exposure.
  • Keeps OpenClaw dynamic integration tools hard-gated in --codex-tool-loading direct mode.
  • Keeps searchable/deferred loading as a report lane until the mock provider can model deferred discovery.
  • Shows counts.skipped in qa-suite-summary.json so report-only rows are not mistaken for passes.
  • Separates provider-plan tool calls from runtime transcript tool calls.
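
The gating rules in this list can be summarized as a small decision function. This is a minimal sketch under stated assumptions: the bucket strings, the outcome labels, and the `gate_outcome` name are hypothetical, and "searchable" as a loading-mode value is inferred from the lane names rather than confirmed from the CLI.

```python
def gate_outcome(bucket: str, loading_mode: str, dynamic_call_observed: bool) -> str:
    """Decide how a parity row is reported (hypothetical labels throughout).

    bucket: "codex-native" or "openclaw-dynamic" (assumed labels).
    loading_mode: "direct" or "searchable" (the latter is an assumed value).
    dynamic_call_observed: whether a dynamic bridge call appeared at runtime.
    """
    if bucket == "codex-native":
        # Missing duplicate dynamic exposure is expected for native tools,
        # so it must not produce a hard failure.
        return "suppressed"
    if dynamic_call_observed:
        return "pass"
    if loading_mode == "direct":
        # OpenClaw dynamic integration tools stay hard-gated in direct mode.
        return "fail"
    # Searchable/deferred loading is report-only until the mock can model it.
    return "skipped"
```

For example, `gate_outcome("openclaw-dynamic", "searchable", False)` yields a report-only skip, while the same row in direct mode yields a hard failure.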

Remaining Work

Links
