fix(qa-lab): refresh parity models and approval timeout #79698
Merged
…approval-turn-tool-followthrough timeouts

Carries forward the surface-bump portion of #74290, which was closed in favor of this slim follow-up because the `parity-gate.yml` workflow file that the original PR also touched was retired by #74622 ('ci: fold parity into QA release validation').

The mock-openai parity lanes that now live in `openclaw-release-checks.yml` and `qa-live-transports-convex.yml` were still pinned to `anthropic/claude-opus-4-6` / `anthropic/claude-sonnet-4-6` for the baseline and `openai/gpt-5.4-alt` for the candidate alt model. That left the parity baseline one model generation behind the active Opus 4.7 / GPT-5.5 defaults already used elsewhere on main (CHANGELOG.md:803, docs/providers/anthropic.md:108, openclaw-live-and-e2e-checks-reusable.yml:1894).

The `approval-turn-tool-followthrough` scenario was using 20s/30s `liveTurnTimeoutMs` fallbacks that timed out on cold mock-gateway parity runs (the env-var comments in the deleted `parity-gate.yml` described exactly this scenario flake). The mock provider's `resolveTurnTimeoutMs` returns `fallbackMs` unchanged, so bumping all four turn fallbacks to 60s gives cold starts breathing room before the approval/follow-through chain has to complete.

This PR does NOT touch:
- The retired `.github/workflows/parity-gate.yml` (deleted on main by #74622)
- Internal artifact directory names `gpt54`/`opus46` (cosmetic, out of scope for a slim follow-up)
- The Discord QA scenario lane and the release-validation lane that intentionally pin `openai/gpt-5.4` (separate concerns)

Refs #74290.
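The timeout reasoning in the commit message can be illustrated with a small sketch. It is not the qa-lab source: `mockResolveTurnTimeoutMs`, `turnFits`, and the 45s cold-start duration are hypothetical names and values chosen for illustration. The only behavior taken from the commit message is that the mock provider returns the fallback unchanged, and that the fallbacks moved from 20s/30s to 60s.

```typescript
// Hypothetical sketch: names and shapes are illustrative, not the qa-lab code.

// Behavior described in the commit message: on the mock provider,
// the resolved turn timeout is simply the fallback that was passed in.
function mockResolveTurnTimeoutMs(fallbackMs: number): number {
  return fallbackMs;
}

// Fallbacks before this PR (20s/30s) and the single value used after it (60s).
const OLD_FALLBACKS_MS: number[] = [20_000, 30_000];
const NEW_FALLBACK_MS = 60_000;

// A turn fits when it completes within its resolved timeout.
function turnFits(observedTurnMs: number, fallbackMs: number): boolean {
  return observedTurnMs <= mockResolveTurnTimeoutMs(fallbackMs);
}

// Assumed cold-start duration, picked only to show the difference.
const coldStartTurnMs = 45_000;

const fitsOld = OLD_FALLBACKS_MS.map((ms) => turnFits(coldStartTurnMs, ms));
const fitsNew = turnFits(coldStartTurnMs, NEW_FALLBACK_MS);

console.log({ fitsOld, fitsNew });
```

Under these assumptions, a cold turn that overruns both old fallbacks still completes inside the 60s budget. Because the mock provider passes the fallback through unchanged, the scenario's fallback value is the effective timeout on mock parity runs.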
Summary
Real behavior proof
- `OPENCLAW_BUILD_PRIVATE_QA=1 OPENCLAW_ENABLE_PRIVATE_QA_CLI=1`.
- `pnpm openclaw qa suite` for both candidate `openai/gpt-5.5` + `openai/gpt-5.5-alt` and baseline `anthropic/claude-opus-4-7` + `anthropic/claude-sonnet-4-7`; then generated the focused parity report.
- `--parity-pack agentic` run also used `openai/gpt-5.5-alt` and passed the changed approval-turn scenario, while two unrelated existing scenarios timed out.

Verification
- `OPENCLAW_BUILD_PRIVATE_QA=1 OPENCLAW_ENABLE_PRIVATE_QA_CLI=1 pnpm build`
- `pnpm test extensions/qa-lab/src/providers/mock-openai/server.test.ts extensions/qa-lab/src/qa-gateway-config.test.ts extensions/qa-lab/src/suite-planning.test.ts extensions/qa-lab/src/cli.runtime.test.ts`
- `pnpm check:workflows`
- `pnpm check:test-types`
- `pnpm exec oxfmt --check --threads=1 .github/workflows/openclaw-release-checks.yml .github/workflows/qa-live-transports-convex.yml CHANGELOG.md qa/scenarios/runtime/approval-turn-tool-followthrough.md`
- `git diff --check origin/main...HEAD`

Refs #74290 / #74262. Supersedes #79347.