[QA harness] Mock approval followthrough emits undeclared read for Codex app-server lane #80236

@100yenadmin

Correction TLDR

Status: harness/mock-provider artifact, not a proven user-facing Codex app-server bug.

The original issue overclaimed this as a P1 Codex runtime problem. A higher-confidence audit shows the mock provider emits a provider-level read function call from prompt text even when the Codex app-server lane does not declare read as an OpenClaw dynamic tool. Codex intentionally owns workspace tools such as read/write/edit/exec/apply_patch natively rather than exposing them through the OpenClaw dynamic-tool bridge.

What actually breaks: the QA parity harness was comparing a malformed mock-provider plan against the Codex app-server lane. That mismatch alone is not evidence that real users lose approval-followthrough reads.

Product impact if OpenClaw moved fully to Codex today: P4 until live/native proof says otherwise. The remaining risk is a gap in live proof coverage, not a demonstrated production approval-read regression.

Latest Beta.5 Evidence

OpenClaw baseline: v2026.5.10-beta.5
PR: #80323
PR head: 3336dec6419c9cc9a87dc7cfa6f48118ca2d838e
Remote proof run: https://github.com/electricsheephq/openclaw-local-test/actions/runs/25719383976
Confidence tracker: #80936

The corrected first-hour-20 mock gate now has zero hard failures:

{
  "first-hour-20-direct": { "total": 18, "passed": 15, "skipped": 3, "failed": 0 }
}

approval-turn-tool-followthrough is one of the 3 report-only rows:

mock-openai still models approval followthrough as a Pi-style read call; Codex-native approval/read behavior requires native/live proof
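As a rough illustration of how a gate summary like the one above could reach zero hard failures while still carrying report-only rows, here is a minimal sketch. It assumes report-only rows are tallied under `skipped`, which the counts above suggest but this issue does not state. `GateRow`, `RowStatus`, and `summarizeGate` are hypothetical names, not harness APIs.

```typescript
// Hypothetical row model; the real harness schema is not shown in this issue.
type RowStatus = "passed" | "failed" | "report-only";

interface GateRow {
  id: string;
  status: RowStatus;
  note?: string; // e.g. why a row is report-only
}

interface GateSummary {
  total: number;
  passed: number;
  skipped: number;
  failed: number;
}

// Assumption: report-only rows count as skipped, never as hard failures.
function summarizeGate(rows: GateRow[]): GateSummary {
  const summary: GateSummary = { total: rows.length, passed: 0, skipped: 0, failed: 0 };
  for (const row of rows) {
    if (row.status === "passed") summary.passed += 1;
    else if (row.status === "failed") summary.failed += 1;
    else summary.skipped += 1;
  }
  return summary;
}

// The gate is green only when nothing failed outright.
function hasHardFailures(summary: GateSummary): boolean {
  return summary.failed > 0;
}
```

Under this assumption, a report-only row such as approval-turn-tool-followthrough stays visible in the report without blocking the gate.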

Correct Fix

  • Gate mock read planning on declared/available tools, or model Codex-native read through the real Codex app-server native tool protocol.
  • Keep mock provider-plan diagnostics separate from runtime transcript/tool-call evidence.
  • Reopen/escalate as a product bug only if a live/native Codex run shows approved reads fail outside this mock contract.
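The first two bullets above could be sketched roughly as follows. This is a minimal illustration, not the mock provider's actual code: `MockPlan`, `inferReadTarget`, and `planMockFollowthrough` are hypothetical names, and the prompt-matching heuristic is assumed purely for the example.

```typescript
// Hypothetical types; none of these names come from OpenClaw or Codex.
type MockToolCall = { name: string; args: Record<string, string> };

interface MockPlan {
  toolCalls: MockToolCall[];
  // Provider-plan diagnostics are kept apart from tool calls so they
  // cannot be mistaken for runtime transcript evidence.
  diagnostics: string[];
}

// Assumed heuristic: the mock infers a read from prompt text like "read notes.txt".
function inferReadTarget(prompt: string): string | null {
  const match = /\bread\s+(\S+)/i.exec(prompt);
  return match?.[1] ?? null;
}

function planMockFollowthrough(
  prompt: string,
  declaredTools: ReadonlySet<string>,
): MockPlan {
  const plan: MockPlan = { toolCalls: [], diagnostics: [] };
  const target = inferReadTarget(prompt);
  if (target === null) return plan;

  if (declaredTools.has("read")) {
    // Lane declares read as a dynamic tool: emit the call.
    plan.toolCalls.push({ name: "read", args: { path: target } });
  } else {
    // Lane owns read natively (or lacks it): record a diagnostic only.
    plan.diagnostics.push(
      `skipped read of ${target}: "read" is not a declared dynamic tool for this lane`,
    );
  }
  return plan;
}
```

With a gate like this, a lane that declares `read` still gets the followthrough call, while a lane that does not declare it gets an empty tool-call plan plus a diagnostic, so a parity comparison no longer sees an undeclared read.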

Superseded Original Report

The earlier reproduction and observed drift were useful for finding the harness flaw, but should not be read as proof of a Codex product/runtime bug.
