[QA-lab] Complete live-frontier token-efficiency and Testbox parity proof

Parent: #80171
Related PR: #80323
Confidence proof tracker: #80936
Related plugin wrapper issue: #80365
Related harness correction: #80319
Related scheduled soak tracker: #80433
Related live token guard: #80411

# TLDR

**Still open.** The beta.5 mock/static confidence proof for PR #80323 is green, but live-frontier token efficiency and scheduled/Testbox proof remain incomplete.

Latest proof:

```text
OpenClaw baseline: v2026.5.10-beta.5
PR head: 3336dec6419c9cc9a87dc7cfa6f48118ca2d838e
Remote proof run: https://github.com/electricsheephq/openclaw-local-test/actions/runs/25719383976
Strict confidence: pass=true, zeroUnknowns=true
```

The confidence report classified live lanes as `environment-blocked`, not passed.

# Why This Issue Exists

The runtime/prompt/tool parity harness now has artifact-backed mock/static proof across the implemented suites. That does not replace the live/Testbox proof requested by the expansion plan.

This issue tracks the remaining validation gap so the project does not accidentally treat mock-estimate token efficiency as live token truth.

# Completed Beta.5 Mock/Static Proof

From run `25719383976`:

```json
{
  "tool-defaults-direct": { "total": 20, "passed": 20, "skipped": 0, "failed": 0 },
  "openclaw-dynamic-tools-direct": { "total": 8, "passed": 8, "skipped": 0, "failed": 0 },
  "tool-defaults-searchable": { "total": 20, "passed": 15, "skipped": 5, "failed": 0 },
  "first-hour-20-direct": { "total": 18, "passed": 15, "skipped": 3, "failed": 0 },
  "fault-injection-mock": { "total": 5, "passed": 3, "skipped": 2, "failed": 0 },
  "jsonl-expanded": { "curatedTranscripts": 7, "turnsCompared": 15, "driftedTurns": 0 },
  "confidence-self-test": { "pass": true, "detectedCanaries": "7/7" }
}
```

Token-efficiency artifact from the mock lane:

```json
{
  "status": "estimated",
  "providerMode": "mock-openai",
  "usageSources": ["mock-estimate"],
  "rows": 18,
  "pass": true
}
```

# Remaining Proof Needed

- Run `codex-native-live` with live/OAuth credentials and attach `qa-suite-summary.json`.
- Run `first-hour-live` with live/OAuth credentials and attach `qa-suite-summary.json`.
- Generate live token-efficiency from assistant-message `usage` and confirm `usageSource=live-usage`.
- Run or schedule `soak-100` in Testbox/scheduled infrastructure and attach artifacts.
- Keep #80411 open until failed live zero-usage runs cannot masquerade as valid token-efficiency passes.

# Guardrail

Mock-mode token efficiency must remain clearly labeled as an estimate. Do not use mock-mode token rows as live-token proof.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[QA-lab] Complete live-frontier token-efficiency and Testbox parity proof #80397

TLDR

Why This Issue Exists

Completed Beta.5 Mock/Static Proof

Remaining Proof Needed

Guardrail

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Uh oh!

[QA-lab] Complete live-frontier token-efficiency and Testbox parity proof #80397

Description

TLDR

Why This Issue Exists

Completed Beta.5 Mock/Static Proof

Remaining Proof Needed

Guardrail

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions