[CI]: Add transport-parity gate (same-model cross-provider + cross-runtime) — sibling to QA parity-gate

## Summary

Propose a sibling QA gate to the existing model-parity gate (introduced in #74290, later folded into `openclaw-release-checks.yml` / `full-release-validation.yml` by #74622) that catches a different class of regression: **silent drift between the two paths to the same logical model**, and between **runtime harnesses for the same model+provider**.

This gate would have caught — or made trivially diagnosable — every issue in the cluster around #78055, including the doctor config-rewrite regression filed as #78407.

## Motivation

The existing parity gate compares **two different models**:
- candidate `openai/gpt-5.5-alt` vs baseline `anthropic/claude-opus-4-7`

That answers a *product* question (do GPT-5.5 and Opus 4.7 give equivalent answers for a user choosing between them). It does not exercise the surfaces that have produced the recent run of regressions:

1. **#78055 family** (#78147, #78146, #78142) — stale `response.completed` lineage on the `openai-codex` WebSocket transport. The same prompt routed through raw `openai` HTTP would have produced a divergent (correct) trajectory; a same-model-different-provider parity gate would have flagged the WS-only stale-final replay immediately.
2. **#78407** (doctor `--fix` rewrites `openai-codex/*` → `openai/*` on update) — config-migration silently flipped half the install from one transport to the other. A provider-parity gate would have failed when the post-doctor config produced a different (failing) auth resolution than the pre-doctor config for identical scenario inputs.
3. **#78060** (subagent thread-bound spawns implicitly forking requester history) — the implicit-fork path differs between `pi` native runtime and the `codex` CLI subprocess harness; a runtime-parity gate over the same scenarios would have surfaced the inconsistency.

#77221 (CLI tool-vs-subcommand error message) is in a different test family and is not in scope here.

## Proposed scope

A new gate, structured as a matrix in `extensions/qa-lab/`, asserting equivalence across two axes for the same scenario inputs already used by the existing character-eval / agentic-parity suites:

```
fixtures × { openai-api-http, openai-codex-ws } × { pi, codex }
```

- **Axis 1 — Provider parity (same model, different transport):** `openai/gpt-5.5` vs `openai-codex/gpt-5.5`. Same logical model, different auth surface, different request shape, different lineage code (HTTP vs WS, no `previous_response_id` vs `previous_response_id`-based incremental). Any divergence beyond a published tolerance is a bug.
- **Axis 2 — Runtime parity (same model+provider, different harness):** `pi` native runtime vs `codex` CLI subprocess. Different tool-loop, different streaming surface, different memory wiring. Any divergence is a bug in one of them.
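The two axes above can be enumerated mechanically. Below is a minimal TypeScript sketch of how each fixture would expand into four cells. All names here (`Cell`, `TRANSPORTS`, `buildMatrix`) are hypothetical and do not exist in `extensions/qa-lab/` today; the mapping from transport label to provider prefix is an assumption drawn from the model refs quoted in this issue.

```typescript
// Hypothetical types for illustration; not part of qa-lab.
type Transport = "openai-api-http" | "openai-codex-ws";
type Runtime = "pi" | "codex";

interface Cell {
  fixture: string;
  transport: Transport;
  runtime: Runtime;
  modelRef: string; // e.g. "openai-codex/gpt-5.5"
}

// Assumed mapping from transport label to the provider prefix used in model refs.
const TRANSPORTS: Record<Transport, string> = {
  "openai-api-http": "openai",
  "openai-codex-ws": "openai-codex",
};
const RUNTIMES: readonly Runtime[] = ["pi", "codex"];

// Expand every fixture into one cell per (transport, runtime) combination.
function buildMatrix(fixtures: readonly string[], model: string): Cell[] {
  const cells: Cell[] = [];
  for (const fixture of fixtures) {
    for (const transport of Object.keys(TRANSPORTS) as Transport[]) {
      for (const runtime of RUNTIMES) {
        cells.push({
          fixture,
          transport,
          runtime,
          modelRef: `${TRANSPORTS[transport]}/${model}`,
        });
      }
    }
  }
  return cells;
}
```

Keeping the model id fixed and varying only the provider prefix and the runtime is what distinguishes this matrix from the existing model-parity gate, which varies the model itself.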

Cell assertions per scenario:
- Final answer text equivalent (within the existing parity-report tolerance).
- No errors from the gateway boot or run paths.
- No stale-finalization markers in the trajectory (#78055-class).
- Auth resolution succeeds against the configured `auth-profiles.json` (catches #78407-class config corruption).
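The four cell assertions could be expressed as one pure check that returns a list of failures. This is a sketch under assumptions: the `CellResult` shape and its field names are hypothetical, and `normalizedEqual` is only a placeholder comparator standing in for the existing parity-report tolerance, which is not specified here.

```typescript
// Hypothetical result shape; field names are assumptions, not qa-lab types.
interface CellResult {
  finalText: string;
  errors: string[];        // errors captured from the gateway boot and run paths
  staleFinalCount: number; // stale-finalization markers found in the trajectory
  authResolved: boolean;   // auth resolved against the configured auth-profiles.json
}

type TextEquivalent = (a: string, b: string) => boolean;

// Placeholder comparator: whitespace-insensitive equality. The real gate would
// plug in the existing parity-report tolerance instead.
const normalizedEqual: TextEquivalent = (a, b) =>
  a.replace(/\s+/g, " ").trim() === b.replace(/\s+/g, " ").trim();

// Returns one message per violated assertion; an empty array means the cell passes.
function checkCell(
  result: CellResult,
  referenceText: string,
  equivalent: TextEquivalent = normalizedEqual,
): string[] {
  const failures: string[] = [];
  if (!equivalent(result.finalText, referenceText)) {
    failures.push("final answer diverges from reference cell");
  }
  if (result.errors.length > 0) {
    failures.push(`gateway errors: ${result.errors.join("; ")}`);
  }
  if (result.staleFinalCount > 0) {
    failures.push(`${result.staleFinalCount} stale-finalization marker(s) in trajectory`);
  }
  if (!result.authResolved) {
    failures.push("auth resolution failed");
  }
  return failures;
}
```

Returning messages instead of throwing on the first failure lets a single run report every violated assertion for a cell, which should make a #78055-class failure easier to tell apart from a #78407-class one.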

## Implementation sketch

Reuse the qa-lab primitives that already exist in this clone:

- `extensions/qa-lab/src/providers/mock-openai/server.ts` — already extended in #74290; add a second profile variant exposing the openai-codex Responses surface.
- `extensions/qa-lab/src/providers/shared/mock-model-config.ts` — add `openai-codex/gpt-5.5` alongside the existing `openai/gpt-5.5-alt` entry.
- `extensions/qa-lab/src/qa-gateway-config.test.ts` — extend the gateway-boot test pattern with the four-cell matrix.
- New `extensions/qa-lab/src/transport-parity.ts` + `transport-parity.test.ts` — orchestrator that runs the matrix per fixture and produces a parity-report-style summary.
- New `extensions/qa-lab/src/runtime-parity.ts` — codex-CLI sandbox (mirror the pattern in `qa-live-transports-convex.yml` for transport sandboxing).
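As a rough illustration of what the orchestrator in `transport-parity.ts` might do, the sketch below runs every cell through an injected runner and compares cells that differ along exactly one axis. Everything in it is an assumption: `runTransportParity`, `RunCell`, and the summary shape are hypothetical names, and the default comparator is strict equality rather than the real parity-report tolerance.

```typescript
// Hypothetical shapes; a real orchestrator would reuse qa-lab's own types.
interface CellKey { fixture: string; provider: string; runtime: string }
interface CellOutcome extends CellKey { finalText: string }
interface Divergence {
  fixture: string;
  axis: "provider" | "runtime";
  left: CellKey;
  right: CellKey;
}
interface ParitySummary { cellsRun: number; divergences: Divergence[]; pass: boolean }

// Injected runner: executes one cell and resolves to its final answer text.
type RunCell = (key: CellKey) => Promise<string>;

async function runTransportParity(
  fixtures: readonly string[],
  providers: readonly string[],
  runtimes: readonly string[],
  runCell: RunCell,
  equivalent: (a: string, b: string) => boolean = (a, b) => a === b,
): Promise<ParitySummary> {
  const outcomes: CellOutcome[] = [];
  for (const fixture of fixtures) {
    for (const provider of providers) {
      for (const runtime of runtimes) {
        const key: CellKey = { fixture, provider, runtime };
        outcomes.push({ ...key, finalText: await runCell(key) });
      }
    }
  }

  const divergences: Divergence[] = [];
  outcomes.forEach((left, i) => {
    for (const right of outcomes.slice(i + 1)) {
      if (left.fixture !== right.fixture) continue;
      const sameProvider = left.provider === right.provider;
      const sameRuntime = left.runtime === right.runtime;
      // Only compare cells that differ along exactly one axis.
      if (sameProvider === sameRuntime) continue;
      if (!equivalent(left.finalText, right.finalText)) {
        const axis = sameRuntime ? "provider" : "runtime";
        divergences.push({ fixture: left.fixture, axis, left, right });
      }
    }
  });

  return { cellsRun: outcomes.length, divergences, pass: divergences.length === 0 };
}
```

Tagging each divergence with its axis is the point of the design: a provider-axis failure points at transport or lineage code, while a runtime-axis failure points at one of the two harnesses.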

CI wiring: add a step in `openclaw-release-checks.yml` (the home that #74622 folded the parity gate into), gated behind the same `OPENCLAW_BUILD_PRIVATE_QA=1` build flag the existing parity tests use.

## Concrete starter (would also close #78407 as a side-effect)

A narrow first slice — fixture-replay regression for the doctor flow — can land independently of the broader matrix and is the smallest unit of value:

- New `src/commands/doctor-config-flow.codex-model-ref-preservation.test.ts` (sibling to the existing `doctor-config-flow.missing-default-account-bindings.test.ts`).
- Fixture config with `openai-codex/{gpt-5.4,gpt-5.4-mini,gpt-5.4-pro,gpt-5.5,gpt-5.5-pro}` across `agents.defaults.modelOverride.{primary,fallbacks}`, `agents.modelCatalog`, and per-agent + per-channel `modelOverride` blocks (mirrors the 5-location footprint observed in #78407).
- Fixture `auth-profiles.json` containing only `openai-codex:*` and `anthropic:*` (no raw `openai:*`).
- Run the full doctor `--fix` normalize pass.
- Invariants:
  - No `openai-codex/*` ref is rewritten to `openai/*`.
  - No rewritten ref points to a provider absent from `auth-profiles.json` (general invariant — applies to any future migration too).
  - No rewritten ref points to a model id absent from the post-migration `modelCatalog` (catches the lost `openai-codex/gpt-5.4-pro` ghost in #78407).
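The three invariants above can be checked by a small pure function over the refs the doctor pass rewrote. This is a sketch, not the doctor's actual API: `RefRewrite`, `providerOf`, `providersFromProfiles`, and `checkRewrites` are hypothetical helpers, and it assumes profile keys take the form `provider:name`, as the `openai-codex:*` pattern above suggests.

```typescript
// Hypothetical shape: one entry per model ref the doctor pass visited.
interface RefRewrite {
  location: string; // e.g. "agents.defaults.modelOverride.primary"
  before: string;
  after: string;
}

// Provider prefix of a "provider/model-id" ref.
function providerOf(ref: string): string {
  const slash = ref.indexOf("/");
  return slash === -1 ? ref : ref.slice(0, slash);
}

// Assumes profile keys look like "openai-codex:<name>" (provider before the colon).
function providersFromProfiles(profileKeys: readonly string[]): Set<string> {
  return new Set(
    profileKeys.map((key) => {
      const colon = key.indexOf(":");
      return colon === -1 ? key : key.slice(0, colon);
    }),
  );
}

// Returns one message per violated invariant; an empty array means the migration is safe.
function checkRewrites(
  rewrites: readonly RefRewrite[],
  authProviders: ReadonlySet<string>,
  postCatalog: ReadonlySet<string>,
): string[] {
  const violations: string[] = [];
  for (const { location, before, after } of rewrites) {
    if (before === after) continue; // untouched refs cannot violate a rewrite invariant
    if (providerOf(before) === "openai-codex" && providerOf(after) === "openai") {
      violations.push(`${location}: ${before} rewritten to ${after}`);
    }
    if (!authProviders.has(providerOf(after))) {
      violations.push(`${location}: provider of ${after} is absent from auth-profiles.json`);
    }
    if (!postCatalog.has(after)) {
      violations.push(`${location}: ${after} is absent from the post-migration modelCatalog`);
    }
  }
  return violations;
}
```

In the real test file these checks would presumably become assertions in the repository's test runner. The last two checks do not mention `openai-codex` at all, which is what lets them cover future migrations as well.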

I plan to open an umbrella draft PR that adds at least the doctor-flow fixture-replay test (failing, reproducing #78407) and lays out the transport-parity scaffolding as TODOs for the maintainers to flesh out. I'm happy to split it into smaller PRs if the maintainers prefer per-axis review.

## Related

- Sibling to: #74290 (existing model-parity gate), #74622 (parity gate folded into release validation)
- Motivating bugs: #78407 (doctor config rewrite), #78055 + #78147 + #78146 + #78142 (WS stale final lineage), #78060 (subagent isolation)
- Out of scope: #77221 (CLI tool-vs-subcommand error message — different test family)
