[Codex×Pi parity Phase 2] Per-tool fixture set #80173

**Tracking parent:** #80171
**Depends on:** Phase 1 #80172

## Goal

Build a deterministic per-tool fixture set so the runtime-parity harness can surface "tool X breaks under codex" at per-tool granularity, not just at the session level. This is the deliverable Eva called out: "test all tools and long runs in harness to get to 100% parity and use to debug all the edge cases."

## Scope

One fixture per tool family. Each fixture is deterministic: the prompt forces exactly one tool call with predictable arguments. The harness asserts that the tool was invoked, that it completed, and that the result shape matches between runtimes.
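The three assertions above could be sketched roughly as follows. This is only an illustration: `ToolCallRecord`, `shapeOf`, `compareToolCalls`, and the `not-invoked` / `not-completed` labels are hypothetical names, not part of the existing harness. Only the `tool-result-shape` label is taken from the coverage table in this issue. "Shape" is assumed here to mean key names and primitive types, ignoring concrete values.

```typescript
// Hypothetical record of one tool call as observed by the harness on one runtime.
interface ToolCallRecord {
  tool: string;
  invoked: boolean;
  completed: boolean;
  result: unknown;
}

type Drift = "none" | "not-invoked" | "not-completed" | "tool-result-shape";

// Reduce a value to its structure: sorted keys and primitive type names, no values.
// Arrays are summarised by the shape of their first element (an assumption).
function shapeOf(value: unknown): unknown {
  if (value === null) return "null";
  if (Array.isArray(value)) return value.length > 0 ? [shapeOf(value[0])] : [];
  if (typeof value === "object") {
    const source = value as Record<string, unknown>;
    const out: Record<string, unknown> = {};
    for (const key of Object.keys(source).sort()) {
      out[key] = shapeOf(source[key]);
    }
    return out;
  }
  return typeof value;
}

// Compare the same fixture's tool call across the two runtimes.
function compareToolCalls(pi: ToolCallRecord, codex: ToolCallRecord): Drift {
  if (!pi.invoked || !codex.invoked) return "not-invoked";
  if (!pi.completed || !codex.completed) return "not-completed";
  const same =
    JSON.stringify(shapeOf(pi.result)) === JSON.stringify(shapeOf(codex.result));
  return same ? "none" : "tool-result-shape";
}
```

Comparing shapes rather than values keeps the check stable when the two runtimes format output slightly differently (for example trailing newlines) while still catching a renamed or retyped field.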

## Tool families to cover

(Source: `src/agents/pi-tools.create-openclaw-coding-tools.ts` and Codex harness contract — finalise the list in the PR by reading both surfaces.)

- `bash` — `bash echo hello`
- `exec` — approval-required `exec "ls -la /tmp"` flow
- `fs.read`, `fs.write`, `fs.list` — read/write/list a temp file
- `grep` — grep for a literal in a fixture file
- `edit` / `apply-patch` — apply a small unified diff
- `web_search` — search for a fixed query (mock provider returns fixed results)
- `web_fetch` — fetch a fixed URL (mock provider returns fixed body)
- `tavily_search`, `tavily_extract`
- `image_generate` — generate against the qa-lab mock image provider
- `tts` — synth a fixed phrase against the mock TTS provider
- `message-tool` — `message-tool send` to a mock channel; media variant
- `session_status`, `sessions_spawn`
- `memory.recall`, `memory.add` (if pi-only, mark as expected drift with a known-broken marker)
- `skill_*` invocations

Each tool family also gets one fixture for a failure mode (denied input, oversized payload, etc.) so that error-path drift is captured.

## Concrete deliverables

### Fixtures

- `qa/scenarios/runtime/tools/<tool>.md` — one file per family. Reuse the existing scenario format already used by `approval-turn-tool-followthrough.md`.
- Each fixture exports both a happy-path and a failure-path scenario.

### Code

- **Extend** `extensions/qa-lab/src/runtime-parity.ts` (from Phase 1) — add `toolBreakdown` field to the report so per-tool drift surfaces alongside per-scenario drift.
- **New** `extensions/qa-lab/src/tool-coverage-report.ts` — generates a Markdown coverage table:
  ```
  | tool | pi | codex | drift | tracking |
  |------|----|-------|-------|----------|
  | bash | ✅  | ✅     | none  |          |
  | exec | ✅  | ❌     | tool-result-shape | #issue |
  ```
- **Extend** `extensions/qa-lab/src/cli.ts` — new `qa tool-coverage --runtime-pair pi,codex` command.
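As a starting point for `tool-coverage-report.ts`, the renderer might look something like the sketch below. The `ToolBreakdownRow` shape and the `renderToolCoverage` name are assumptions for illustration; the actual `toolBreakdown` field from Phase 1's report may carry different or additional data. Rows are sorted by tool name so the output is deterministic.

```typescript
// Hypothetical per-tool entry; one row of the coverage table.
interface ToolBreakdownRow {
  tool: string;
  pi: boolean; // fixture passed on the pi runtime
  codex: boolean; // fixture passed on the codex runtime
  drift: string; // "none" when the runtimes agree
  tracking?: string; // tracking issue reference, if any
}

// Render the rows as a Markdown table with the columns listed in this issue.
function renderToolCoverage(rows: ToolBreakdownRow[]): string {
  const mark = (ok: boolean): string => (ok ? "✅" : "❌");
  const lines: string[] = [
    "| tool | pi | codex | drift | tracking |",
    "|------|----|-------|-------|----------|",
  ];
  // Copy before sorting so the caller's array is not mutated.
  const sorted = [...rows].sort((a, b) => a.tool.localeCompare(b.tool));
  for (const row of sorted) {
    lines.push(
      `| ${row.tool} | ${mark(row.pi)} | ${mark(row.codex)} | ${row.drift} | ${row.tracking ?? ""} |`,
    );
  }
  return lines.join("\n");
}
```

Keeping the renderer a pure function of the rows makes the coverage report rendering test a plain string comparison.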

### Tests

- Each fixture has a self-test that runs it through the mock provider on both runtimes. The self-test must not depend on the qa-lab harness, which keeps fixtures portable.
- Coverage report rendering test.

## Acceptance criteria

- [ ] Each tool family in the list above has a `qa/scenarios/runtime/tools/<tool>.md` fixture.
- [ ] Each fixture passes both cells under `--runtime-pair pi,codex` against current main, OR is annotated with a `known-broken` marker pointing at a tracking issue (file the tracking issue as part of this PR if discovered).
- [ ] The runtime-parity report enumerates per-tool drift, not just per-scenario drift.
- [ ] `pnpm openclaw qa tool-coverage --runtime-pair pi,codex` produces a Markdown table suitable for the README of the harness.
- [ ] `pnpm check:test-types` and `pnpm exec oxlint` run clean.
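The "passes both cells OR is annotated `known-broken`" criterion can be checked mechanically. A minimal sketch, assuming a hypothetical `FixtureOutcome` record per fixture (the real marker format in the scenario files is not specified here):

```typescript
// Hypothetical outcome of running one fixture under --runtime-pair pi,codex.
interface FixtureOutcome {
  tool: string;
  piPassed: boolean;
  codexPassed: boolean;
  // Present when the fixture carries a known-broken marker with a tracking issue.
  knownBroken?: { issue: string };
}

// Return the tools that fail the criterion: not passing both cells
// and not covered by a known-broken marker that names a tracking issue.
function unmetFixtures(outcomes: FixtureOutcome[]): string[] {
  return outcomes
    .filter((o) => !(o.piPassed && o.codexPassed) && !o.knownBroken?.issue)
    .map((o) => o.tool);
}
```

An empty result means the criterion holds. This sketch does not flag a fixture that passes both cells but still carries a stale marker; that could be a separate check.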

## Out of scope

- Plugin-lifecycle stress (Phase 3).
- Token efficiency (Phase 4).
- Live-mode runs — fixtures must be hermetic in this PR.

## References

- Tracking parent: #80171
- Phase 1: #80172
- Existing scenario format: `qa/scenarios/runtime/approval-turn-tool-followthrough.md`
- Tool surface (Pi): `src/agents/pi-tools.create-openclaw-coding-tools.ts`
- Tool surface (Codex): codex harness contract — see `extensions/codex/src/`

