[Codex×Pi parity Phase 3] Codex-plugin lifecycle harness #80174

@100yenadmin

Tracking parent: #80171
Depends on: Phase 1 #80172

Goal

Stress the codex-as-plugin install / update / version-pinning lifecycle that pash flagged: "codex is a plugin like anything else, so it needs to be downloaded and installed before you can use it. obviously, considering openai use codex harness be default now, this is a source of stress for me, and I want to make sure all the edge cases are covered."

This phase codifies @ai-hpc's manual 4-cell doctor-migration verification plus the additional plugin-lifecycle cells the maintainer thread surfaced.

Scope

Six automated cells, run in mock-openai mode, each completing in under 60s. A live-mode variant is gated to scheduled runs.

Cells

  1. Cold install — clean home, no codex plugin → openclaw doctor --fix from a config that needs codex. Assert: a clear remediation message is shown, the install completes, the retry succeeds, and no billable ($) traffic leaks to the api-key path.
  2. OAuth-only with mixed-profiles — both openai-codex:* and openai:* profiles in auth-profiles.json → assert codex auth picked, not the api-key path. This is the residual case from #78499 ("[Bug]: doctor --fix rewrites Codex runtime model refs to openai/* and breaks Codex auth profile selection"); error text: Codex app-server auth profile "openai:media-api" must belong to provider "openai-codex" or a supported alias.
  3. Pinned-old codex plugin + new openclaw — codex plugin pinned to release N-1, openclaw on N → assert version mismatch detected and reported with a clear remediation hint. Sets up the regression coverage pash asked for ("pinning a certain version of the codex harness with a version of openclaw, which is another potential source of bugs").
  4. Pinned-new codex plugin + old openclaw — same axis flipped.
  5. Codex plugin install racing first agent turn — concurrent install + agent run → assert ordering doesn't lose tokens or produce a duplicate response. Uses deterministic ordering primitives, not timing-based assertions.
  6. Doctor migration safety (@ai-hpc's 4-cell matrix) — codify the four manual cells:
    • oauth-only host (no OPENAI_API_KEY) → openai-codex profile picked, codex harness used
    • mixed-profile (codex OAuth + raw openai api-key) with no pin → openai-codex still picked
    • mixed-profile + agents.defaults.agentRuntime.id="pi" pin → doctor strips pin, codex auto-routes
    • mixed-profile + per-agent agents.list[main].agentRuntime.id="pi" pin → same, doctor strips pin, codex auto-routes
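The four doctor-migration cells in cell 6 lend themselves to a table-driven test. The sketch below is one way to lay that out, not the real implementation: `stripPiPin`, `migrate`, `pickProvider`, the `Config` shape, and the `openai-codex:default` profile id are all hypothetical stand-ins, and `agents.list` is assumed to be keyed by agent name. Only `openai:media-api` and the `agentRuntime.id="pi"` pins come from this issue.

```typescript
// Hypothetical, simplified shapes; the real openclaw config types may differ.
type AgentRuntime = { id?: string };
type RuntimeHolder = { agentRuntime?: AgentRuntime };
type Config = {
  agents: {
    defaults?: RuntimeHolder;
    list?: Record<string, RuntimeHolder>; // assumed: keyed by agent name
  };
};

// Stand-in for the doctor migration step: drop an agentRuntime.id === "pi" pin.
function stripPiPin(holder: RuntimeHolder): RuntimeHolder {
  if (holder.agentRuntime?.id !== "pi") return holder;
  const copy = { ...holder };
  delete copy.agentRuntime;
  return copy;
}

function migrate(config: Config): Config {
  const agents = config.agents.list ?? {};
  const list: Record<string, RuntimeHolder> = {};
  for (const name of Object.keys(agents)) {
    const agent = agents[name];
    if (agent) list[name] = stripPiPin(agent);
  }
  return { agents: { defaults: stripPiPin(config.agents.defaults ?? {}), list } };
}

function hasPiPin(config: Config): boolean {
  const holders: RuntimeHolder[] = [config.agents.defaults ?? {}];
  const agents = config.agents.list ?? {};
  for (const name of Object.keys(agents)) {
    const agent = agents[name];
    if (agent) holders.push(agent);
  }
  return holders.some((h) => h.agentRuntime?.id === "pi");
}

// Stand-in for profile selection: any openai-codex:* profile wins over openai:*.
function pickProvider(profileIds: string[]): "openai-codex" | "openai" | undefined {
  if (profileIds.some((id) => id.startsWith("openai-codex:"))) return "openai-codex";
  if (profileIds.some((id) => id.startsWith("openai:"))) return "openai";
  return undefined;
}

type Cell = { name: string; profiles: string[]; config: Config };
const mixed = ["openai-codex:default", "openai:media-api"];

const cells: Cell[] = [
  { name: "oauth-only", profiles: ["openai-codex:default"], config: { agents: {} } },
  { name: "mixed, no pin", profiles: mixed, config: { agents: {} } },
  {
    name: "mixed + defaults pin",
    profiles: mixed,
    config: { agents: { defaults: { agentRuntime: { id: "pi" } } } },
  },
  {
    name: "mixed + per-agent pin",
    profiles: mixed,
    config: { agents: { list: { main: { agentRuntime: { id: "pi" } } } } },
  },
];

// Run every cell through the migration and record what a test would assert on.
function runMatrix(): { name: string; provider?: string; pinLeft: boolean }[] {
  return cells.map((cell) => ({
    name: cell.name,
    provider: pickProvider(cell.profiles),
    pinLeft: hasPiPin(migrate(cell.config)),
  }));
}
```

In a real test file each row would become its own test case, so a failure names the cell directly.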

Concrete deliverables

Code

  • New extensions/qa-lab/src/codex-plugin-fixture.ts — helpers:
    export async function seedCodexPluginAt(version: "missing" | "current" | "head" | string, agentDir: string): Promise<void>;
    export async function snapshotCodexPluginState(agentDir: string): Promise<{ version?: string; installed: boolean }>;
  • New extensions/qa-lab/src/codex-plugin-lifecycle.test.ts — one describe block per cell. Asserted error messages are string-matched so wording regressions are caught.
  • New extensions/qa-lab/src/auth-profile-fixture.ts — helpers to seed auth-profiles.json to a known shape (oauth-only, apikey-only, mixed).
  • Extend extensions/qa-lab/src/runtime-parity.ts — add a pluginState axis to per-cell capture so the cells above plug into the unified report.
  • Extend .github/workflows/openclaw-release-checks.yml — add a qa_lab_codex_lifecycle_release_checks step that runs the six cells.
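As a rough illustration of the auth-profile-fixture.ts helpers, the sketch below builds the three known shapes as plain objects. The on-disk schema of auth-profiles.json is not described in this issue, so the `AuthProfilesFile` layout, the field names, the `buildAuthProfiles` and `serializeAuthProfiles` helpers, and the `openai-codex:default` id are all assumptions. The credentials are obvious placeholders.

```typescript
// Hypothetical schema; the real auth-profiles.json layout may differ.
type ProfileShape = "oauth-only" | "apikey-only" | "mixed";

type AuthProfile =
  | { provider: "openai-codex"; type: "oauth"; accessToken: string }
  | { provider: "openai"; type: "api_key"; key: string };

type AuthProfilesFile = { profiles: Record<string, AuthProfile> };

// Placeholder credentials: never real secrets, stable across runs.
const CODEX_OAUTH: AuthProfile = {
  provider: "openai-codex",
  type: "oauth",
  accessToken: "test-oauth-token",
};
const OPENAI_API_KEY: AuthProfile = {
  provider: "openai",
  type: "api_key",
  key: "sk-test-not-a-real-key",
};

// Build a known-shape profile set for a test cell.
function buildAuthProfiles(shape: ProfileShape): AuthProfilesFile {
  const profiles: Record<string, AuthProfile> = {};
  if (shape === "oauth-only" || shape === "mixed") {
    profiles["openai-codex:default"] = CODEX_OAUTH;
  }
  if (shape === "apikey-only" || shape === "mixed") {
    profiles["openai:media-api"] = OPENAI_API_KEY;
  }
  return { profiles };
}

// Serialize deterministically so the seeded file can be diffed or snapshotted.
function serializeAuthProfiles(file: AuthProfilesFile): string {
  return JSON.stringify(file, null, 2) + "\n";
}
```

Keeping the builder pure and leaving the file write to the caller means the shapes can be checked without touching disk.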

Tests

  • Each cell is its own test, deterministic, mock-mode by default.
  • Live-mode variant gated to OPENCLAW_LIVE_TEST=1 and the scheduled cron, not on every release.
  • Asserted error messages: when cell 3 reports a version mismatch, the assertion is on the literal string emitted (or a regex with high specificity) so any wording drift is caught.
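To make the string-match idea concrete, here is a minimal sketch of a mismatch check for cells 3 and 4, with an anchored, fully escaped regex as the assertion. The message wording, the `describeMismatch` and `compareVersions` helpers, and the rule that any version difference counts as a mismatch are illustrative assumptions, not the text or logic openclaw actually uses.

```typescript
// Compare dotted numeric versions component by component: -1, 0, or 1.
function compareVersions(a: string, b: string): number {
  const pa = a.split(".").map(Number);
  const pb = b.split(".").map(Number);
  const len = Math.max(pa.length, pb.length);
  for (let i = 0; i < len; i++) {
    const diff = (pa[i] ?? 0) - (pb[i] ?? 0);
    if (diff !== 0) return diff < 0 ? -1 : 1;
  }
  return 0;
}

// Hypothetical message wording; returns undefined when the versions match.
function describeMismatch(pluginVersion: string, hostVersion: string): string | undefined {
  const order = compareVersions(pluginVersion, hostVersion);
  if (order === 0) return undefined;
  if (order < 0) {
    return (
      `codex plugin ${pluginVersion} is older than openclaw ${hostVersion}; ` +
      `run "openclaw doctor --fix" to update the plugin`
    );
  }
  return (
    `codex plugin ${pluginVersion} is newer than openclaw ${hostVersion}; ` +
    `upgrade openclaw or pin the plugin to ${hostVersion}`
  );
}

// High-specificity regex: anchored at both ends, every literal dot escaped,
// so any wording drift in the message or the hint fails the assertion.
const OLDER_PLUGIN_MESSAGE =
  /^codex plugin 1\.4\.0 is older than openclaw 1\.5\.0; run "openclaw doctor --fix" to update the plugin$/;

function assertOlderPluginMessage(message: string | undefined): void {
  if (message === undefined || !OLDER_PLUGIN_MESSAGE.test(message)) {
    throw new Error(`unexpected mismatch message: ${String(message)}`);
  }
}
```

Comparing components numerically rather than as strings matters once a minor version reaches two digits, since "1.10.0" sorts before "1.9.0" lexically.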

Acceptance criteria

  • All six cells are implemented and automated, run in mock-openai mode, and each completes in under 60s.
  • Failure-mode error messages are asserted by string-match so wording regressions are caught.
  • Live-mode variant gated to scheduled runs (OPENCLAW_LIVE_TEST=1), not on every PR.
  • Cell 5 (install race) uses deterministic ordering primitives — no setTimeout / sleep-based assertions.
  • @ai-hpc's manual 4-cell matrix is fully reproduced as automated cells.
  • Each cell, when it fails, emits a remediation hint that is also asserted by the test (so the user-visible remediation doesn't drift).
  • pnpm check:test-types and pnpm exec oxlint clean.
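For the install-race criterion, one common deterministic ordering primitive is a deferred promise used as a gate. The sketch below is a self-contained illustration under that assumption; `deferred`, `runRaceScenario`, and the event names are hypothetical, and the install and agent turn are stand-ins rather than calls into openclaw.

```typescript
// A promise whose resolution is controlled from outside: a one-shot gate.
type Deferred<T> = { promise: Promise<T>; resolve: (value: T) => void };

function deferred<T>(): Deferred<T> {
  let resolve!: (value: T) => void;
  const promise = new Promise<T>((res) => {
    resolve = res;
  });
  return { promise, resolve };
}

// Forces the racy interleaving (turn arrives mid-install) without any timers.
async function runRaceScenario(): Promise<string[]> {
  const events: string[] = [];
  const turnRequested = deferred<void>();
  const installDone = deferred<void>();

  // Stand-in for the plugin install: it cannot finish until the turn has arrived.
  const install = (async () => {
    events.push("install:start");
    await turnRequested.promise;
    events.push("install:done");
    installDone.resolve();
  })();

  // Stand-in for the first agent turn: it must wait for the install to finish.
  const turn = (async () => {
    events.push("turn:requested");
    turnRequested.resolve();
    await installDone.promise;
    events.push("turn:response");
  })();

  await Promise.all([install, turn]);
  return events;
}
```

Because each side waits on an explicit gate, the event order is the same on every run, so a test can assert on the full sequence and on there being exactly one response.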

Out of scope

  • Token efficiency (Phase 4).
  • JSONL replay (Phase 5).
  • Real-customer transcript ingestion.
