Commit 23dc8ce

fix(qa-lab): bump parity baseline to Opus 4.7 / GPT-5.5 and lengthen approval-turn-tool-followthrough timeouts
Carries forward the surface-bump portion of #74290 (closed in favor of this slim follow-up since the `parity-gate.yml` workflow file the original PR also touched was retired by #74622 'ci: fold parity into QA release validation').

The mock-openai parity lanes that now live in `openclaw-release-checks.yml` and `qa-live-transports-convex.yml` were still pinned to `anthropic/claude-opus-4-6` / `anthropic/claude-sonnet-4-6` for the baseline and `openai/gpt-5.4-alt` for the candidate alt model. That left the parity baseline one model generation behind the active Opus 4.7 / GPT-5.5 defaults already used elsewhere on main (CHANGELOG.md:803, docs/providers/anthropic.md:108, openclaw-live-and-e2e-checks-reusable.yml:1894).

The `approval-turn-tool-followthrough` scenario was using 20s/30s `liveTurnTimeoutMs` fallbacks that timed out on cold mock-gateway parity runs (the deleted `parity-gate.yml` env-var comments described exactly this scenario flake). Bumping all four turn fallbacks to 60s matches what the mock provider's `resolveTurnTimeoutMs` returns for `fallbackMs` (it returns the fallback unchanged), so cold starts have breathing room before the approval/follow-through chain has to complete.

This PR does NOT touch:

- The retired `.github/workflows/parity-gate.yml` (deleted on main by #74622)
- Internal artifact directory names `gpt54`/`opus46` (cosmetic, out of scope for a slim follow-up)
- The Discord QA scenario lane and the release-validation lane that intentionally pin `openai/gpt-5.4` (separate concerns)

Refs #74290.
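The timeout reasoning in the commit message can be illustrated with a small sketch. It assumes only what the message states: under the mock provider, the resolved turn timeout is the fallback passed in. The `QaEnvSketch` type and `liveTurnTimeoutMsSketch` function are hypothetical stand-ins, not the QA lab's real helpers, and live-provider behaviour is deliberately left unmodelled.

```typescript
// Hypothetical sketch: names ending in "Sketch" are illustrative only.
interface QaEnvSketch {
  providerMode: string;
}

// Assumed from the commit message: in mock-openai mode the fallback is
// returned unchanged, so the scenario's fallback IS the effective timeout.
function liveTurnTimeoutMsSketch(env: QaEnvSketch, fallbackMs: number): number {
  if (env.providerMode !== "mock-openai") {
    // Live-provider resolution is not described in this commit,
    // so the sketch refuses to guess at it.
    throw new Error(`sketch only models mock-openai, got ${env.providerMode}`);
  }
  return fallbackMs;
}

const mockEnv: QaEnvSketch = { providerMode: "mock-openai" };

// The four per-turn fallbacks in the scenario, in file order,
// before and after this commit.
const fallbacksBefore: number[] = [20000, 20000, 30000, 20000];
const fallbacksAfter: number[] = [60000, 60000, 60000, 60000];

const sumMs = (values: number[]): number =>
  values.reduce((total, ms) => total + ms, 0);

// Upper bound on waiting if every step used its full timeout.
const budgetBefore = sumMs(
  fallbacksBefore.map((ms) => liveTurnTimeoutMsSketch(mockEnv, ms)),
);
const budgetAfter = sumMs(
  fallbacksAfter.map((ms) => liveTurnTimeoutMsSketch(mockEnv, ms)),
);

console.log(`combined wait budget: ${budgetBefore} ms -> ${budgetAfter} ms`);
```

Because the mock path adds no headroom of its own, any cold-start slack has to come from the fallback values in the scenario file, which is why the fix lands there rather than in the resolver.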
1 parent e8d63b8 commit 23dc8ce

4 files changed

Lines changed: 15 additions & 13 deletions

.github/workflows/openclaw-release-checks.yml

Lines changed: 4 additions & 4 deletions
@@ -705,11 +705,11 @@ jobs:
           case "${QA_PARITY_LANE}" in
             candidate)
               model="${OPENCLAW_CI_OPENAI_MODEL}"
-              alt_model="openai/gpt-5.4-alt"
+              alt_model="openai/gpt-5.5-alt"
               ;;
             baseline)
-              model="anthropic/claude-opus-4-6"
-              alt_model="anthropic/claude-sonnet-4-6"
+              model="anthropic/claude-opus-4-7"
+              alt_model="anthropic/claude-sonnet-4-7"
               ;;
             *)
               echo "Unknown QA parity lane: ${QA_PARITY_LANE}" >&2
@@ -779,7 +779,7 @@ jobs:
             --candidate-summary .artifacts/qa-e2e/gpt54/qa-suite-summary.json \
             --baseline-summary .artifacts/qa-e2e/opus46/qa-suite-summary.json \
             --candidate-label "${OPENCLAW_CI_OPENAI_MODEL}" \
-            --baseline-label anthropic/claude-opus-4-6 \
+            --baseline-label anthropic/claude-opus-4-7 \
             --output-dir .artifacts/qa-e2e/parity
 
       - name: Upload parity artifacts

.github/workflows/qa-live-transports-convex.yml

Lines changed: 5 additions & 5 deletions
@@ -187,17 +187,17 @@ jobs:
             --parity-pack agentic \
             --concurrency "${QA_PARITY_CONCURRENCY}" \
             --model "${OPENCLAW_CI_OPENAI_MODEL}" \
-            --alt-model openai/gpt-5.4-alt \
+            --alt-model openai/gpt-5.5-alt \
             --output-dir .artifacts/qa-e2e/gpt54
 
-      - name: Run Opus 4.6 lane
+      - name: Run Opus 4.7 lane
         run: |
           pnpm openclaw qa suite \
             --provider-mode mock-openai \
             --parity-pack agentic \
             --concurrency "${QA_PARITY_CONCURRENCY}" \
-            --model anthropic/claude-opus-4-6 \
-            --alt-model anthropic/claude-sonnet-4-6 \
+            --model anthropic/claude-opus-4-7 \
+            --alt-model anthropic/claude-sonnet-4-7 \
             --output-dir .artifacts/qa-e2e/opus46
 
       - name: Generate parity report
@@ -207,7 +207,7 @@ jobs:
             --candidate-summary .artifacts/qa-e2e/gpt54/qa-suite-summary.json \
             --baseline-summary .artifacts/qa-e2e/opus46/qa-suite-summary.json \
             --candidate-label "${OPENCLAW_CI_OPENAI_MODEL}" \
-            --baseline-label anthropic/claude-opus-4-6 \
+            --baseline-label anthropic/claude-opus-4-7 \
             --output-dir .artifacts/qa-e2e/parity
 
       - name: Upload parity artifacts

CHANGELOG.md

Lines changed: 2 additions & 0 deletions
@@ -184,6 +184,8 @@ Docs: https://docs.openclaw.ai
 
 ### Fixes
 
+- QA-lab/parity: bump the live mock-openai parity baseline from `claude-opus-4-6`/`claude-sonnet-4-6` to `claude-opus-4-7`/`claude-sonnet-4-7` and the candidate alt from `gpt-5.4-alt` to `gpt-5.5-alt` in `openclaw-release-checks.yml` and `qa-live-transports-convex.yml`, matching the active Opus 4.7 / GPT-5.5 defaults already used elsewhere on main. Carries forward the surface-bump portion of #74290.
+- QA-lab/scenarios: raise the `approval-turn-tool-followthrough` per-turn fallback timeouts from 20s/30s to 60s so cold mock-gateway parity runs do not flake on the approval-turn chain. Carries forward the timeout-bump portion of #74290.
 - Agents/compaction: keep the recent tail after manual `/compact` when Pi returns an empty or no-op compaction summary, preventing blank checkpoints from replacing the live context.
 - fix(discord): gate user allowlist name resolution [AI]. (#79002) Thanks @pgondhi987.
 - fix(msteams): gate startup user allowlist resolution [AI]. (#79003) Thanks @pgondhi987.

qa/scenarios/runtime/approval-turn-tool-followthrough.md

Lines changed: 4 additions & 4 deletions
@@ -54,14 +54,14 @@ steps:
           message:
             expr: config.preActionPrompt
           timeoutMs:
-            expr: liveTurnTimeoutMs(env, 20000)
+            expr: liveTurnTimeoutMs(env, 60000)
       - call: waitForOutboundMessage
         args:
           - ref: state
           - lambda:
               params: [candidate]
               expr: "candidate.conversation.id === 'qa-operator'"
-          - expr: liveTurnTimeoutMs(env, 20000)
+          - expr: liveTurnTimeoutMs(env, 60000)
       - set: beforeApprovalCursor
         value:
           expr: state.getSnapshot().messages.length
@@ -72,7 +72,7 @@ steps:
           message:
             expr: config.approvalPrompt
           timeoutMs:
-            expr: liveTurnTimeoutMs(env, 30000)
+            expr: liveTurnTimeoutMs(env, 60000)
       - set: expectedReplyAny
         value:
           expr: config.expectedReplyAny.map(normalizeLowercaseStringOrEmpty)
@@ -81,7 +81,7 @@ steps:
         args:
           - lambda:
               expr: "state.getSnapshot().messages.slice(beforeApprovalCursor).filter((candidate) => candidate.direction === 'outbound' && candidate.conversation.id === 'qa-operator' && expectedReplyAny.some((needle) => normalizeLowercaseStringOrEmpty(candidate.text).includes(needle))).at(-1)"
-          - expr: liveTurnTimeoutMs(env, 20000)
+          - expr: liveTurnTimeoutMs(env, 60000)
           - expr: "env.providerMode === 'mock-openai' ? 100 : 250"
       detailsExpr: outbound.text
 ```
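
For readers unfamiliar with the scenario DSL, the lambda in the last hunk (the condition that the 60s timeout now bounds) can be unpacked into a standalone function. This is a sketch under assumptions: the `QaMessageSketch` shape is inferred from the fields the expression touches, and `normalizeSketch` is a hypothetical stand-in for `normalizeLowercaseStringOrEmpty`, whose real behaviour may differ.

```typescript
// Hypothetical message shape, inferred from the fields the lambda reads.
interface QaMessageSketch {
  direction: "inbound" | "outbound";
  conversation: { id: string };
  text: string;
}

// Assumed stand-in for normalizeLowercaseStringOrEmpty:
// lowercase strings, map anything else to "".
function normalizeSketch(value: unknown): string {
  return typeof value === "string" ? value.toLowerCase() : "";
}

// Returns the most recent outbound qa-operator message, sent at or after
// the cursor, whose text contains any expected needle; otherwise undefined.
function findApprovalReply(
  messages: QaMessageSketch[],
  beforeApprovalCursor: number,
  expectedReplyAny: string[],
): QaMessageSketch | undefined {
  const needles = expectedReplyAny.map(normalizeSketch);
  const matches = messages
    .slice(beforeApprovalCursor)
    .filter(
      (candidate) =>
        candidate.direction === "outbound" &&
        candidate.conversation.id === "qa-operator" &&
        needles.some((needle) => normalizeSketch(candidate.text).includes(needle)),
    );
  // Equivalent to .at(-1) in the scenario expression.
  return matches[matches.length - 1];
}

const transcript: QaMessageSketch[] = [
  { direction: "outbound", conversation: { id: "qa-operator" }, text: "Done with setup" },
  { direction: "inbound", conversation: { id: "qa-operator" }, text: "approved" },
  { direction: "outbound", conversation: { id: "other" }, text: "done" },
  { direction: "outbound", conversation: { id: "qa-operator" }, text: "Working on it" },
  { direction: "outbound", conversation: { id: "qa-operator" }, text: "All DONE: read the file" },
];

// The cursor of 1 skips the pre-approval message at index 0.
const reply = findApprovalReply(transcript, 1, ["done"]);
console.log(reply?.text);
```

The cursor matters because an earlier outbound message may already contain one of the expected phrases; slicing from `beforeApprovalCursor` restricts the match to messages sent after the approval turn.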
