fix(qa-lab): refresh parity models and approval timeout by steipete · Pull Request #79698 · openclaw/openclaw

steipete · 2026-05-09T07:09:22Z

Summary

Supersedes #79347 (fix(qa-lab): bump parity baseline to Opus 4.7 / GPT-5.5 and lengthen approval-followthrough timeouts), carrying the same QA parity model bumps rebased onto current main.
Updates release/nightly mock-openai parity lanes from Opus 4.6/GPT-5.4-alt to Opus 4.7/GPT-5.5-alt.
Raises approval-turn-tool-followthrough mock fallback timeouts to 60s and credits @100yenadmin in the changelog.

Real behavior proof

Behavior addressed: QA parity workflows still used older Opus 4.6 / GPT-5.4-alt labels, and the approval-turn-tool-followthrough scenario had short 20s/30s mock fallback timeouts.
Real environment tested: local OpenClaw checkout with private QA CLI enabled, built with OPENCLAW_BUILD_PRIVATE_QA=1 OPENCLAW_ENABLE_PRIVATE_QA_CLI=1.
Exact steps or command run after this patch: ran the changed approval-turn scenario through pnpm openclaw qa suite for both the candidate pair (openai/gpt-5.5 + openai/gpt-5.5-alt) and the baseline pair (anthropic/claude-opus-4-7 + anthropic/claude-sonnet-4-7), then generated the focused parity report.
Evidence after fix: terminal output from the patched QA CLI runs:

$ OPENCLAW_BUILD_PRIVATE_QA=1 OPENCLAW_ENABLE_PRIVATE_QA_CLI=1 pnpm openclaw qa suite --provider-mode mock-openai --scenario approval-turn-tool-followthrough --concurrency 1 --model openai/gpt-5.5 --alt-model openai/gpt-5.5-alt --output-dir .artifacts/qa-e2e/pr79347-approval-gpt55
passed 1/1

$ OPENCLAW_BUILD_PRIVATE_QA=1 OPENCLAW_ENABLE_PRIVATE_QA_CLI=1 pnpm openclaw qa suite --provider-mode mock-openai --scenario approval-turn-tool-followthrough --concurrency 1 --model anthropic/claude-opus-4-7 --alt-model anthropic/claude-sonnet-4-7 --output-dir .artifacts/qa-e2e/pr79347-approval-opus47
passed 1/1

Observed result after fix: the changed scenario passed with the new GPT-5.5-alt candidate ref and the Opus 4.7/Sonnet 4.7 baseline refs; focused parity metrics were 100%/100%. The full candidate --parity-pack agentic run also used openai/gpt-5.5-alt and passed the changed approval-turn scenario, while two unrelated existing scenarios timed out.
What was not tested: a full parity-report pass for the whole pack; the focused one-scenario parity report exits nonzero because it intentionally lacks full-pack coverage.

Verification

OPENCLAW_BUILD_PRIVATE_QA=1 OPENCLAW_ENABLE_PRIVATE_QA_CLI=1 pnpm build
pnpm test extensions/qa-lab/src/providers/mock-openai/server.test.ts extensions/qa-lab/src/qa-gateway-config.test.ts extensions/qa-lab/src/suite-planning.test.ts extensions/qa-lab/src/cli.runtime.test.ts
pnpm check:workflows
pnpm check:test-types
pnpm exec oxfmt --check --threads=1 .github/workflows/openclaw-release-checks.yml .github/workflows/qa-live-transports-convex.yml CHANGELOG.md qa/scenarios/runtime/approval-turn-tool-followthrough.md
git diff --check origin/main...HEAD

Refs #74290 / #74262. Supersedes #79347.

clawsweeper · 2026-05-09T07:14:47Z

ClawSweeper status: review started.

I am starting a fresh review of this pull request: fix(qa-lab): refresh parity models and approval timeout. This is item 1/1 in the current shard. Shard 0/1.

This placeholder means the worker is alive and reading the current context. I will edit this same comment with the actual review when the claws are done clicking.

Crustacean status: shell secured, claws on keyboard, evidence pebbles being sorted.

…approval-turn-tool-followthrough timeouts

Carries forward the surface-bump portion of #74290 (closed in favor of this slim follow-up, since the parity-gate.yml workflow file the original PR also touched was retired by #74622 'ci: fold parity into QA release validation').

The mock-openai parity lanes that now live in `openclaw-release-checks.yml` and `qa-live-transports-convex.yml` were still pinned to `anthropic/claude-opus-4-6` / `anthropic/claude-sonnet-4-6` for the baseline and `openai/gpt-5.4-alt` for the candidate alt model. That left the parity baseline one model generation behind the active Opus 4.7 / GPT-5.5 defaults already used elsewhere on main (CHANGELOG.md:803, docs/providers/anthropic.md:108, openclaw-live-and-e2e-checks-reusable.yml:1894).

The `approval-turn-tool-followthrough` scenario was using 20s/30s `liveTurnTimeoutMs` fallbacks that timed out on cold mock-gateway parity runs (the env-var comments in the deleted `parity-gate.yml` described exactly this scenario flake). Bumping all four turn fallbacks to 60s carries straight through to the mock provider, because its `resolveTurnTimeoutMs` returns `fallbackMs` unchanged, so cold starts have breathing room before the approval/follow-through chain has to complete.

This PR does NOT touch:
- The retired `.github/workflows/parity-gate.yml` (deleted on main by #74622)
- Internal artifact directory names `gpt54`/`opus46` (cosmetic, out of scope for a slim follow-up)
- The Discord QA scenario lane and the release-validation lane that intentionally pin `openai/gpt-5.4` (separate concerns)

Refs #74290.
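Why the fallback bump is the whole fix in mock mode can be sketched as follows. This is a hypothetical illustration, not the qa-lab source: the single-argument `resolveTurnTimeoutMs` signature, the `TURN_FALLBACKS_MS` array, and `PREVIOUS_MIN_FALLBACK_MS` are assumptions. It only encodes what the commit message states, namely that the mock provider returns the fallback unchanged and that all four turn fallbacks are now 60s.

```typescript
// Hypothetical sketch: names and shapes are illustrative, not the qa-lab source.

/**
 * Assumed mock-openai behavior, per the commit message:
 * the per-turn fallback is returned unchanged.
 */
function resolveTurnTimeoutMs(fallbackMs: number): number {
  return fallbackMs;
}

// Smallest pre-change fallback, taken from the PR text ("20s/30s").
const PREVIOUS_MIN_FALLBACK_MS = 20_000;

// The four approval-turn-tool-followthrough turn fallbacks after this change.
// (Hypothetical array; the real scenario file defines these per turn.)
const TURN_FALLBACKS_MS: readonly number[] = [60_000, 60_000, 60_000, 60_000];

// Effective per-turn timeouts under the mock provider: identical to the fallbacks.
const effectiveTimeoutsMs: number[] = TURN_FALLBACKS_MS.map((ms) => resolveTurnTimeoutMs(ms));

// Tightest per-turn budget a cold mock-gateway start now has to fit within.
const minTurnBudgetMs: number = Math.min(...effectiveTimeoutsMs);

// Extra headroom on the tightest turn compared with the old 20s fallback.
const addedHeadroomMs: number = minTurnBudgetMs - PREVIOUS_MIN_FALLBACK_MS;

console.log(effectiveTimeoutsMs, minTurnBudgetMs, addedHeadroomMs);
```

Because the resolver is an identity in mock mode, no other code path needs to change: raising the scenario fallbacks is what raises the effective timeouts.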

openclaw-barnacle Bot added the size: XS and maintainer (Maintainer-authored PR) labels May 9, 2026

steipete force-pushed the maint/79347-qa-parity-opus47-gpt55 branch from 0049f3a to a71e6b2 May 9, 2026 07:14

openclaw-barnacle Bot added the channel: qa-channel (Channel integration: qa-channel) label May 9, 2026

steipete force-pushed the maint/79347-qa-parity-opus47-gpt55 branch from a71e6b2 to d7210a2 May 9, 2026 07:15

openclaw-barnacle Bot removed the channel: qa-channel (Channel integration: qa-channel) label May 9, 2026

steipete merged commit 44d7d6f into main May 9, 2026
96 of 98 checks passed

steipete deleted the maint/79347-qa-parity-opus47-gpt55 branch May 9, 2026 07:22

steipete mentioned this pull request May 9, 2026 in fix(qa-lab): bump parity baseline to Opus 4.7 / GPT-5.5 and lengthen approval-followthrough timeouts #79347 (Closed, 4 tasks)

clawsweeper Bot mentioned this pull request May 13, 2026 in Update QA lab parity gate for GPT-5.5 vs Opus 4.7 and harden preflight #74262 (Closed)

steipete merged 1 commit into main from maint/79347-qa-parity-opus47-gpt55