Parent: #80171
Related PR: #80323
Confidence proof tracker: #80936
Related plugin wrapper issue: #80365
Related harness correction: #80319
Related scheduled soak tracker: #80433
Related live token guard: #80411
TLDR
Still open. The beta.5 mock/static confidence proof for PR #80323 is green, but live-frontier token efficiency and scheduled/Testbox proof remain incomplete.
The confidence report classified live lanes as environment-blocked, not passed.
Why This Issue Exists
The runtime/prompt/tool parity harness now has artifact-backed mock/static proof across the implemented suites. That does not replace the live/Testbox proof requested by the expansion plan.
This issue tracks the remaining validation gap so the project does not accidentally treat mock-estimate token efficiency as live token truth.
Completed Beta.5 Mock/Static Proof
From run 25719383976:
{
  "tool-defaults-direct": { "total": 20, "passed": 20, "skipped": 0, "failed": 0 },
  "openclaw-dynamic-tools-direct": { "total": 8, "passed": 8, "skipped": 0, "failed": 0 },
  "tool-defaults-searchable": { "total": 20, "passed": 15, "skipped": 5, "failed": 0 },
  "first-hour-20-direct": { "total": 18, "passed": 15, "skipped": 3, "failed": 0 },
  "fault-injection-mock": { "total": 5, "passed": 3, "skipped": 2, "failed": 0 },
  "jsonl-expanded": { "curatedTranscripts": 7, "turnsCompared": 15, "driftedTurns": 0 },
  "confidence-self-test": { "pass": true, "detectedCanaries": "7/7" }
}
Token-efficiency artifact from the mock lane:
{
  "status": "estimated",
  "providerMode": "mock-openai",
  "usageSources": ["mock-estimate"],
  "rows": 18,
  "pass": true
}
Remaining Proof Needed
codex-native-live with live/OAuth credentials and attach qa-suite-summary.json.
first-hour-live with live/OAuth credentials and attach qa-suite-summary.json.
usage and confirm usageSource=live-usage.
soak-100 in Testbox/scheduled infrastructure and attach artifacts.
Guardrail
Mock-mode token efficiency must remain clearly labeled as an estimate. Do not use mock-mode token rows as live-token proof.
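One way to enforce this guardrail mechanically is a small check that refuses to count an artifact as live-token proof unless every usage source is live. The sketch below is illustrative only: the field names (status, usageSources, pass) are taken from the mock-lane artifact above, but the helper name is_live_token_proof and the acceptance rule are assumptions, not part of the harness.

```python
import json

# Assumption: a live artifact reports "live-usage" as its only usage source,
# matching the usageSource=live-usage check listed under Remaining Proof Needed.
LIVE_USAGE_SOURCE = "live-usage"


def is_live_token_proof(artifact: dict) -> bool:
    """Hypothetical guard: accept an artifact only if it is live-backed and passing."""
    sources = artifact.get("usageSources", [])
    if not sources:
        # No declared usage source means there is nothing to trust.
        return False
    if artifact.get("status") == "estimated":
        # Estimates are never live proof, whatever else the artifact says.
        return False
    all_live = all(source == LIVE_USAGE_SOURCE for source in sources)
    return all_live and artifact.get("pass") is True


# The mock-lane artifact from this issue, reproduced verbatim.
mock_artifact = json.loads("""
{ "status": "estimated", "providerMode": "mock-openai",
  "usageSources": ["mock-estimate"], "rows": 18, "pass": true }
""")

print(is_live_token_proof(mock_artifact))  # False
```

The mock artifact is rejected even though its own pass field is true, which is the point of the guardrail: a green mock run says nothing about live token usage.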