[Codex×Pi parity Phase 3] Codex-plugin lifecycle harness

**Tracking parent:** #80171
**Depends on:** Phase 1 #80172

## Goal

Stress the codex-as-plugin install / update / version-pinning lifecycle that pash flagged: "codex is a plugin like anything else, so it needs to be downloaded and installed before you can use it. obviously, considering openai use codex harness be default now, this is a source of stress for me, and I want to make sure all the edge cases are covered."

This phase codifies @ai-hpc's manual 4-cell doctor-migration verification plus the additional plugin-lifecycle cells the maintainer thread surfaced.

## Scope

Six automated cells that run in mock-openai mode and complete in under 60 s each. A live-mode variant is gated to scheduled runs.

## Cells

1. **Cold install** — clean home, no codex plugin → `openclaw doctor --fix` from a config that needs codex. Assert: clear remediation message, install completes, retry succeeds, no `$` leakage to api-key path.
2. **OAuth-only with mixed-profiles** — both `openai-codex:*` and `openai:*` profiles in `auth-profiles.json` → assert codex auth picked, not the api-key path. This is the residual #78499 case (`Codex app-server auth profile "openai:media-api" must belong to provider "openai-codex" or a supported alias`).
3. **Pinned-old codex plugin + new openclaw** — codex plugin pinned to release N-1, openclaw on N → assert version mismatch detected and reported with a clear remediation hint. Sets up the regression coverage pash asked for ("pinning a certain version of the codex harness with a version of openclaw, which is another potential source of bugs").
4. **Pinned-new codex plugin + old openclaw** — same axis flipped.
5. **Codex plugin install racing first agent turn** — concurrent install + agent run → assert ordering doesn't lose tokens or produce a duplicate response. Uses deterministic ordering primitives, not timing-based assertions.
6. **Doctor migration safety (`@ai-hpc`'s 4-cell matrix)** — codify the four manual cells:
   - oauth-only host (no `OPENAI_API_KEY`) → openai-codex profile picked, codex harness used
   - mixed-profile (codex OAuth + raw openai api-key) with no pin → openai-codex still picked
   - mixed-profile + `agents.defaults.agentRuntime.id="pi"` pin → doctor strips pin, codex auto-routes
   - mixed-profile + per-agent `agents.list[main].agentRuntime.id="pi"` pin → same, doctor strips pin, codex auto-routes
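
Cell 5 asks for deterministic ordering rather than timing. One way to get that, sketched here under assumptions, is a manually released gate: the test decides when the install "finishes", so the interleaving is identical on every run. `createGate`, `fakeInstall`, `fakeAgentTurn`, and `runRace` are hypothetical names used for illustration and are not part of qa-lab or openclaw.

```typescript
// Hypothetical sketch: a manually released gate for ordering concurrent steps
// without timers. None of these names exist in qa-lab.

interface Gate {
  wait: Promise<void>;
  open: () => void;
}

function createGate(): Gate {
  let open!: () => void;
  const wait = new Promise<void>((resolve) => {
    open = resolve;
  });
  return { wait, open };
}

// Stand-in for the plugin install: it blocks until the test opens the gate.
async function fakeInstall(gate: Gate, events: string[]): Promise<void> {
  events.push("install:start");
  await gate.wait;
  events.push("install:done");
}

// Stand-in for the first agent turn: it responds only after the install settles.
async function fakeAgentTurn(installed: Promise<void>, events: string[]): Promise<string> {
  events.push("turn:queued");
  await installed;
  events.push("turn:responded");
  return "ok";
}

async function runRace(): Promise<{ events: string[]; reply: string }> {
  const events: string[] = [];
  const gate = createGate();
  const installed = fakeInstall(gate, events);
  const turn = fakeAgentTurn(installed, events);
  // The test, not the clock, decides when the install finishes.
  gate.open();
  const reply = await turn;
  return { events, reply };
}
```

A real cell would replace the two stand-ins with the actual install and agent-run entry points and assert on the recorded event order plus the absence of a duplicate response.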

## Concrete deliverables

### Code

- **New** `extensions/qa-lab/src/codex-plugin-fixture.ts` — helpers:
  ```ts
  export async function seedCodexPluginAt(version: "missing" | "current" | "head" | string, agentDir: string): Promise<void>;
  export async function snapshotCodexPluginState(agentDir: string): Promise<{ version?: string; installed: boolean }>;
  ```
- **New** `extensions/qa-lab/src/codex-plugin-lifecycle.test.ts` — one `describe` block per cell. Asserted error messages are string-matched so wording regressions are caught.
- **New** `extensions/qa-lab/src/auth-profile-fixture.ts` — helpers to seed `auth-profiles.json` to a known shape (oauth-only, apikey-only, mixed).
- **Extend** `extensions/qa-lab/src/runtime-parity.ts` — add a `pluginState` axis to per-cell capture so the cells above plug into the unified report.
- **Extend** `.github/workflows/openclaw-release-checks.yml` — add a `qa_lab_codex_lifecycle_release_checks` step that runs the six cells.
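
As a rough illustration of what the auth-profile fixture could build, the sketch below produces the three shapes (oauth-only, apikey-only, mixed) in memory. The on-disk schema of `auth-profiles.json` is not given in this issue, so the `AuthProfile` shape, the `buildAuthProfiles` and `codexProfileIds` helpers, and the `openai-codex:default` profile id are all assumptions. Only the `openai-codex:*` / `openai:*` prefixes and the `openai:media-api` id come from the text above.

```typescript
// Hypothetical sketch: the real auth-profiles.json schema is not given in this
// issue, so the shape below is an assumption made for illustration only.

type ProfileShape = "oauth-only" | "apikey-only" | "mixed";

interface AuthProfile {
  provider: string; // assumed to match the prefix of the profile id
  kind: "oauth" | "api-key";
}

type AuthProfiles = Record<string, AuthProfile>;

// Build an in-memory fixture for one of the three named shapes.
function buildAuthProfiles(shape: ProfileShape): AuthProfiles {
  const profiles: AuthProfiles = {};
  if (shape === "oauth-only" || shape === "mixed") {
    profiles["openai-codex:default"] = { provider: "openai-codex", kind: "oauth" };
  }
  if (shape === "apikey-only" || shape === "mixed") {
    profiles["openai:media-api"] = { provider: "openai", kind: "api-key" };
  }
  return profiles;
}

// Return the ids of profiles that belong to the codex provider, sorted.
function codexProfileIds(profiles: AuthProfiles): string[] {
  return Object.entries(profiles)
    .filter(([, profile]) => profile.provider === "openai-codex")
    .map(([id]) => id)
    .sort();
}
```

A fixture along these lines would let cell 2 and the cell 6 matrix share one seeding path, with the helper serializing the object to the agent directory before each test.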

### Tests

- Each cell is its own test, deterministic, mock-mode by default.
- Live-mode variant gated to `OPENCLAW_LIVE_TEST=1` and the scheduled cron, not on every release.
- Asserted error messages: when cell 3 reports a version mismatch, the assertion is on the literal string emitted (or a regex with high specificity) so any wording drift is caught.
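
A high-specificity match of the kind described above could look like the following sketch. The message wording, `formatVersionMismatch`, `MISMATCH_PATTERN`, and `matchMismatch` are invented for illustration. The real text emitted on a version mismatch is not quoted in this issue, and reusing `openclaw doctor --fix` as the remediation hint here is an assumption.

```typescript
// Hypothetical sketch: the message wording and function names are invented for
// illustration; the real text emitted by openclaw is not quoted in this issue.

function formatVersionMismatch(pluginVersion: string, hostVersion: string): string {
  return (
    `codex plugin ${pluginVersion} is not compatible with openclaw ${hostVersion}. ` +
    "Run `openclaw doctor --fix` to install a matching plugin."
  );
}

// Anchored at both ends, with versions captured, so a reworded message fails.
const MISMATCH_PATTERN =
  /^codex plugin (\d+\.\d+\.\d+) is not compatible with openclaw (\d+\.\d+\.\d+)\. Run `openclaw doctor --fix` to install a matching plugin\.$/;

function matchMismatch(message: string): { plugin: string; host: string } | null {
  const m = MISMATCH_PATTERN.exec(message);
  if (!m) return null;
  const [, plugin = "", host = ""] = m;
  return { plugin, host };
}
```

Anchoring the pattern and including the remediation sentence means the test fails on any wording change to either the diagnosis or the hint, while still tolerating different version numbers.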

## Acceptance criteria

- [ ] All six cells are implemented and automated in mock-openai mode, and each completes in under 60 s.
- [ ] Failure-mode error messages are asserted by string-match so wording regressions are caught.
- [ ] Live-mode variant gated to scheduled runs (`OPENCLAW_LIVE_TEST=1`), not on every PR.
- [ ] Cell 5 (install race) uses deterministic ordering primitives — no `setTimeout` / sleep-based assertions.
- [ ] @ai-hpc's manual 4-cell matrix is fully reproduced as automated cells.
- [ ] Each failure mode a cell exercises emits a remediation hint, and the test asserts that hint (so the user-visible remediation doesn't drift).
- [ ] `pnpm check:test-types` and `pnpm exec oxlint` clean.

## Out of scope

- Token efficiency (Phase 4).
- JSONL replay (Phase 5).
- Real-customer transcript ingestion.

## References

- Tracking parent: #80171
- Phase 1: #80172
- @ai-hpc's manual matrix: maintainer thread (Yesterday)
- #78499 — Codex auth profile selection (cell 2 covers this)
- #78407 — original migration bug (cell 6 covers the fixed-on-main paths)
- #79238 — most recent runtime-policy fix (the migration safety cells must hold against this surface)

