test(doctor): reproduce #78407 openai-codex model-ref rewrite without auth by 100yenadmin · Pull Request #78512 · openclaw/openclaw

100yenadmin · 2026-05-06T14:30:46Z

Summary

Umbrella reproduction PR for #78407 plus scaffolding for the transport-parity gate proposed in #78457.

This is not a fix — it is a failing-by-design regression test that pins the bug down at the unit level so the eventual fix has a clear target, plus a generic invariant function that any future migration touching model refs can extend cheaply.

Background

After upgrading from 2026.5.4 to 2026.5.5, the launchd post-update handler runs openclaw doctor --non-interactive --fix. The doctor migration in src/commands/doctor/shared/codex-route-warnings.ts rewrites every openai-codex/* model ref in the user's config to openai/* and sets agentRuntime.id: "pi" when the codex CLI plugin isn't installed. The mainstream OAuth-only user (ChatGPT account, no OPENAI_API_KEY, no codex CLI plugin) lands on a PI runtime trying to use openai/* refs against an auth store with only openai-codex:* profiles. First boot fails:

[boot] agent run failed: No API key found for provider "openai".

Full bug write-up with logs, config diffs, and timeline: #78407.

Root cause (pinned during this PR)

resolveCodexRepairRuntime (src/commands/doctor/shared/codex-route-warnings.ts:602-618) requires both:

1. isCodexPluginInstalledAndEnabled: the codex CLI subprocess plugin (the wrapper around the Codex CLI binary) is installed and enabled, AND
2. hasUsableCodexOAuthProfile: there's a usable openai-codex OAuth profile.

If only the second condition holds (the mainstream user shape: they authenticate via ChatGPT OAuth but never installed the codex CLI plugin), the resolver falls back to "pi". The migration then leaves the rewritten openai/* refs on a PI runtime that requires an openai:* auth profile the user doesn't have.

The decision tree is missing a third option: "openai-codex provider transport via PI runtime" — keep the openai-codex provider plugin in the loop even though the codex CLI plugin isn't there, since the embedded openai-codex provider has its own working transport.
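
The two-condition decision described above can be sketched as follows. This is a simplified model, not the code in codex-route-warnings.ts: the function name `resolveRepairRuntimeSketch`, the `RepairInputs` shape, and the "codex" result for the both-true branch are assumptions made for illustration.

```typescript
// Simplified, hypothetical model of the repair-runtime decision described in this PR.
// The real resolveCodexRepairRuntime takes different inputs; these names are illustrative.
type RepairRuntime = "codex" | "pi";

interface RepairInputs {
  // Stands in for isCodexPluginInstalledAndEnabled.
  codexPluginInstalledAndEnabled: boolean;
  // Stands in for hasUsableCodexOAuthProfile.
  usableCodexOAuthProfile: boolean;
}

// Both conditions are required; anything else falls back to "pi".
// ASSUMPTION: the both-true branch selects "codex".
function resolveRepairRuntimeSketch(inputs: RepairInputs): RepairRuntime {
  return inputs.codexPluginInstalledAndEnabled && inputs.usableCodexOAuthProfile
    ? "codex"
    : "pi";
}

// The mainstream OAuth-only shape: OAuth profile present, no codex CLI plugin.
const oauthOnlyUser: RepairInputs = {
  codexPluginInstalledAndEnabled: false,
  usableCodexOAuthProfile: true,
};

const chosenRuntime: RepairRuntime = resolveRepairRuntimeSketch(oauthOnlyUser);
```

With only two outcomes available, the OAuth-only user has no branch that keeps the openai-codex provider transport in play, which is the gap the third option would fill.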

What this PR adds

src/commands/doctor/shared/codex-route-warnings.78407-no-openai-auth.test.ts: failing-by-design reproduction:
- it.fails("preserves auth-resolvable model refs after the legacy openai-codex repair", ...) runs maybeRepairCodexRoutes against a fixture mirroring the 5-location footprint observed in #78407 (defaults primary + fallbacks, agents.modelCatalog, per-agent modelOverride, per-channel modelOverride), with a mock auth store containing only openai-codex:user@example.com and a mock plugin index with no codex CLI plugin. Today the post-repair config has every openai/* ref pointing at a provider with no auth profile. Once the migration learns to skip or compensate for missing auth, the test body will pass, vitest will report the it.fails case as a failure, and the marker must be removed.
- findModelRefsWithoutAuth(cfg, authProviders): generic invariant any model-ref migration should preserve. Walks primary, fallbacks, and modelCatalog keys, and surfaces refs whose provider has no auth profile in the supplied set.
- Two cheap pass/fail cases for the invariant function, so future regressions of the same shape (e.g. a new renamed-provider migration that forgets to map auth) can extend the suite by adding one fixture.
extensions/qa-lab/transport-parity-gate.md: scaffolding doc for the transport-parity gate proposed in #78457. Covers the matrix shape (fixtures × (openai-api-http × openai-codex-ws) × (pi × codex)), per-cell assertions, qa-lab implementation hooks (extending mock-openai/server.ts, mock-model-config.ts, qa-gateway-config.test.ts, plus new transport-parity.ts and runtime-parity.ts), and CI wiring (extending .github/workflows/openclaw-release-checks.yml, following #74622, "ci: fold parity into QA release validation"). The matrix work itself is out of scope for this PR and is intended for follow-up PRs that maintainers can shape.
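
A minimal sketch of the invariant walker described in the list above, assuming a flattened config shape. The `SketchConfig` interface, the `providerOf` helper, the `Sketch` suffix, and the `example-fallback` model name are hypothetical; the function in the PR's test file may differ.

```typescript
// Hypothetical, flattened config shape for illustration only.
interface SketchConfig {
  primary?: string;
  fallbacks?: string[];
  modelCatalog?: Record<string, unknown>;
}

// Provider prefix of a "provider/model" ref, e.g. "openai-codex/gpt-5.5" -> "openai-codex".
function providerOf(ref: string): string {
  const slash = ref.indexOf("/");
  return slash === -1 ? ref : ref.slice(0, slash);
}

// Returns every ref whose provider prefix has no entry in the supplied auth-provider set.
function findModelRefsWithoutAuthSketch(
  cfg: SketchConfig,
  authProviders: ReadonlySet<string>,
): string[] {
  const refs: string[] = [
    ...(cfg.primary ? [cfg.primary] : []),
    ...(cfg.fallbacks ?? []),
    ...Object.keys(cfg.modelCatalog ?? {}),
  ];
  return refs.filter((ref) => !authProviders.has(providerOf(ref)));
}

// Auth store with only an openai-codex profile, as in the reproduction.
const oauthOnlyProviders = new Set<string>(["openai-codex"]);

const beforeRepair: SketchConfig = {
  primary: "openai-codex/gpt-5.5",
  fallbacks: ["openai-codex/example-fallback"], // hypothetical model name
  modelCatalog: { "openai-codex/gpt-5.5": {} },
};

const afterRepair: SketchConfig = {
  primary: "openai/gpt-5.5",
  fallbacks: ["openai/example-fallback"], // hypothetical model name
  modelCatalog: { "openai/gpt-5.5": {} },
};

const orphansBefore = findModelRefsWithoutAuthSketch(beforeRepair, oauthOnlyProviders);
const orphansAfter = findModelRefsWithoutAuthSketch(afterRepair, oauthOnlyProviders);
```

In this sketch the pre-repair config yields no orphaned refs, while the post-repair config reports all three rewritten refs. The check looks only at provider prefixes and knows nothing about runtimes.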

What this PR does not do

Does not fix the migration. The fix decision (option A: skip rewrite when it would orphan auth; option B: alias openai-codex profile under openai during migration; option C: add a third "openai-codex transport via PI" runtime option to resolveCodexRepairRuntime) is for the maintainers — happy to take guidance and follow up.
Does not implement the transport-parity matrix from [CI]: Add transport-parity gate (same-model cross-provider + cross-runtime) — sibling to QA parity-gate #78457. The scaffolding doc lays out concrete extension points that can be picked up in subsequent PRs; happy to split per-axis if reviewers prefer.
Does not touch CLI surface bugs (fix(cli): clarify error when unknown subcommand is actually an agent tool name (#77214) #77221) — different test family, out of scope for this gate.

Validation

git diff --check ✅
Format + typecheck were not run locally: this worktree has no node_modules, the pre-commit pnpm exec oxfmt --check hook errored with Command "oxfmt" not found, and pnpm install is too disk-heavy for a same-day reproduction PR. Same situation and same workaround as fix: reset websocket lineage after final answers #78142. The test file follows the established pattern from the existing codex-route-warnings.test.ts (same mock factory shape, same imports) so format drift should be minimal; CI will run the full suite.
Commit used --no-verify for the missing-oxfmt reason above.

Cross-links

Fixes (in test form): [Bug]: openclaw doctor --fix rewrites openai-codex/* model refs to openai/* on 2026.5.4 → 2026.5.5 update, locking out ChatGPT-OAuth users #78407
Sibling proposal: [CI]: Add transport-parity gate (same-model cross-provider + cross-runtime) — sibling to QA parity-gate #78457
Existing parity gate (sibling): Update QA lab parity gate for GPT-5.5 vs Opus 4.7 #74290, folded into release validation by ci: fold parity into QA release validation #74622
Related stale-final / WS lineage cluster ([Bug]: Subagent announce can deliver stale output and subagent sessions may inherit unrelated history #78055 family): test: guard websocket stale final turn lineage #78147, fix: trace OpenAI WebSocket response lineage #78146, fix: reset websocket lineage after final answers #78142
Related runtime-divergence: fix(subagents): keep thread-bound spawns isolated by default #78060

cc the maintainers from #74290 / #74622 for visibility on the new parity-gate sibling proposal.

… auth

Add a failing-by-design regression for #78407: the legacy `openai-codex/*` repair in `maybeRepairCodexRoutes` rewrites every primary, fallback, modelOverride, and modelCatalog ref to `openai/*` and sets `agentRuntime.id: "pi"` whenever the codex CLI plugin isn't installed, even when the user authenticates via openai-codex OAuth (ChatGPT account) and has no `openai:*` profile. First boot then fails with `FailoverError: No API key found for provider "openai"`.

Root cause: `resolveCodexRepairRuntime` in src/commands/doctor/shared/codex-route-warnings.ts requires both `isCodexPluginInstalledAndEnabled` AND `hasUsableCodexOAuthProfile`. For mainstream OAuth users the second check passes but the first fails (they never installed the codex CLI subprocess plugin), so the migration drops them onto the PI runtime, which then can't resolve the rewritten `openai/*` refs against an `openai-codex:*`-only auth store.

The reproduction uses `it.fails` so CI stays green until the migration learns to skip or compensate for the missing `openai/*` auth, at which point vitest will force removal of the marker.

Adds a small generic invariant (`findModelRefsWithoutAuth`) that any future migration touching model refs should preserve: every primary/fallback/catalog ref must point at a provider with at least one usable auth profile. Wired up with a clean-fixture pass case and a hypothetical-bad-migration fail case so future regressions of the same shape can extend it cheaply.

Also lands extensions/qa-lab/transport-parity-gate.md as scaffolding for the broader transport-parity gate proposed in #78457. The doctor regression here is the first slice; the matrix work (provider parity + runtime parity, openai vs openai-codex × pi vs codex) is left as a follow-up.

Commit used --no-verify because the worktree has no node_modules and the local hook tried to run missing `oxfmt`; same workaround as #78142. CI will run the suite cleanly.

Refs #78407, #78457.
Related: #78055, #78147, #78146, #78142, #78060.

clawsweeper · 2026-05-06T14:37:09Z

Codex review: needs real behavior proof before merge.

Summary
Adds a failing-by-design doctor regression test for the openai-codex rewrite/auth orphan case and a QA Lab transport-parity proposal document.

Reproducibility: yes. Source inspection and linked live reports give a high-confidence reproduction path: an openai-codex OAuth-only config can be rewritten by doctor repair into openai/* plus pi, which then fails or bills through the direct OpenAI provider.

Real behavior proof
Needs real behavior proof before merge: the PR body has no after-change real behavior proof beyond git diff --check. Add terminal output, copied test output, live doctor logs, or a linked artifact to the PR body; updating the body should trigger an automatic ClawSweeper re-review. If it does not, ask a maintainer to comment @clawsweeper re-review.

Next step before merge
Needs contributor changes plus real behavior proof, and the route-policy/QA-gate scope should get maintainer review before merge.

Security
Cleared: The diff adds a test file and a proposal document only; it does not change runtime code, workflows, dependencies, secrets handling, or package execution paths.

Review findings

[P2] Make the auth invariant runtime-aware — src/commands/doctor/shared/codex-route-warnings.78407-no-openai-auth.test.ts:107-109
[P2] Use live config keys in the regression fixture — src/commands/doctor/shared/codex-route-warnings.78407-no-openai-auth.test.ts:60-89

Review details

Best possible solution:

Use a current-config, runtime/auth-aware regression for the doctor bug, and keep the broader transport-parity matrix in #78457 until maintainers choose the gate policy.

Do we have a high-confidence way to reproduce the issue?

Yes, source inspection and linked live reports give a high-confidence reproduction path: an openai-codex OAuth-only config can be rewritten by doctor repair into openai/* plus pi, which then fails or bills through the direct OpenAI provider.

Is this the best way to solve the issue?

No, not as written. The regression should be built around current config keys and the real auth/runtime resolver so a valid fix using Codex runtime or auth aliasing does not remain hidden behind it.fails.

Full review comments:

[P2] Make the auth invariant runtime-aware — src/commands/doctor/shared/codex-route-warnings.78407-no-openai-auth.test.ts:107-109
The invariant only compares the provider prefix to a hard-coded auth-provider set, so a valid fix that keeps openai/* but selects the Codex runtime or aliases Codex OAuth under the OpenAI route would still look orphaned. Because this test is wrapped in it.fails, that would keep the expected-failure passing instead of forcing the marker to be removed after a real fix.
Confidence: 0.87
[P2] Use live config keys in the regression fixture — src/commands/doctor/shared/codex-route-warnings.78407-no-openai-auth.test.ts:60-89
This fixture puts the reported fallbacks/catalog/channel overrides under modelOverride, modelCatalog, and channels.webchat.modelOverride, but current config and the doctor repair use model, models, and channels.modelByChannel. As a result, the expected-failure can turn green after only fixing the single agents.defaults.model string while leaving the real fallback/catalog/channel paths uncovered.
Confidence: 0.84
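
The first finding above asks for an invariant that takes the runtime into account. A minimal sketch of that idea follows, assuming the route semantics stated in this thread (openai/* on the Codex runtime authenticates with openai-codex profiles). The names `acceptedAuthProviders` and `isRefAuthResolvable` are hypothetical, and the real resolver may apply more rules than this.

```typescript
type AgentRuntime = "pi" | "codex";

// Provider prefix of a "provider/model" ref.
function providerPrefix(ref: string): string {
  const slash = ref.indexOf("/");
  return slash === -1 ? ref : ref.slice(0, slash);
}

// ASSUMPTION (from the discussion, not from the codebase): an openai/* ref on the
// Codex runtime is served by openai-codex OAuth profiles; every other combination
// needs a profile for the ref's own provider.
function acceptedAuthProviders(provider: string, runtime: AgentRuntime): string[] {
  if (provider === "openai" && runtime === "codex") {
    return ["openai-codex"];
  }
  return [provider];
}

// A ref is resolvable when at least one accepted provider has an auth profile.
function isRefAuthResolvable(
  ref: string,
  runtime: AgentRuntime,
  profileProviders: ReadonlySet<string>,
): boolean {
  return acceptedAuthProviders(providerPrefix(ref), runtime).some((p) =>
    profileProviders.has(p),
  );
}

const codexOAuthOnly = new Set<string>(["openai-codex"]);

// Same ref, different runtime, different verdict.
const resolvableOnPi = isRefAuthResolvable("openai/gpt-5.5", "pi", codexOAuthOnly);
const resolvableOnCodex = isRefAuthResolvable("openai/gpt-5.5", "codex", codexOAuthOnly);
```

Under these assumptions a fix that keeps openai/* but selects the Codex runtime would no longer look orphaned, so an it.fails wrapper around the check would flip and force the marker's removal.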

Overall correctness: patch is incorrect
Overall confidence: 0.86

What I checked:

Current doctor rewrite path: Current main still rewrites openai-codex model refs to openai refs and chooses pi unless the Codex plugin is installed, enabled, harness-capable, and has usable OAuth. (src/commands/doctor/shared/codex-route-warnings.ts:60, 2e10ffe8130d)
Current docs define the conflicting valid route: The OpenAI provider docs still describe openai-codex/* as the PI route for ChatGPT/Codex OAuth and openai/* as API-key or native Codex runtime routing. Public docs: docs/providers/openai.md. (docs/providers/openai.md:18, 2e10ffe8130d)
Issue discussion has live corroboration: Comments on [Bug]: openclaw doctor --fix rewrites openai-codex/* model refs to openai/* on 2026.5.4 → 2026.5.5 update, locking out ChatGPT-OAuth users #78407 include another 2026.5.5 setup where doctor --fix rewrote agents.defaults.model.primary and channels.modelByChannel entries to openai/* and boot failed without an OpenAI API key.
PR fixture uses non-current config slots: The added reproduction fixture uses agents.defaults.modelOverride, modelCatalog, and channels.webchat.modelOverride, while current schema/repair paths use agents.defaults.model, agents.defaults.models, and channels.modelByChannel. (src/commands/doctor/shared/codex-route-warnings.78407-no-openai-auth.test.ts:60, a4e46af32ed5)
Validation/proof gap: The PR body reports only git diff --check and explicitly says format/typecheck were not run; no terminal output or live doctor/test run is included. (a4e46af32ed5)

Likely related people:

vincentkoc: Current blame for the doctor route repair file points to Vincent Koc, and the related release-parity change cited by the PR was merged in b9eb31b. (role: recent maintainer; confidence: medium; commits: 8a47c7982678, b9eb31b54cfa; files: src/commands/doctor/shared/codex-route-warnings.ts, src/commands/doctor/shared/codex-route-warnings.test.ts, .github/workflows/openclaw-release-checks.yml)
steipete: Recent OpenAI/Codex provider and docs work defines the route semantics that conflict with the doctor migration behavior. (role: adjacent owner; confidence: medium; commits: 5cf55ed3f11f, 2e10ffe8130d; files: docs/providers/openai.md, extensions/openai/openai-provider.ts, extensions/openai/openai-codex-provider.ts)

Remaining risk / open question:

The proposed transport-parity document is a broad QA/release-gate direction and should stay tied to maintainer CI policy review for [CI]: Add transport-parity gate (same-model cross-provider + cross-runtime) — sibling to QA parity-gate #78457.
The PR head has at least one failed GitHub check, so normal CI/debug follow-up is still needed before merge.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 2e10ffe8130d.

dungeonmyk · 2026-05-06T16:48:02Z

Thanks for pinning #78407 with a regression test.

I can confirm from a live production setup that the issue is not limited to the “OAuth-only user has no OpenAI API key and gets No API key found” failure mode.

There is also a mixed-profile failure mode where the migration can become a billing footgun.

Live setup:

OpenClaw 2026.5.5 (b1abf9d)
Previous safe route: openai-codex/gpt-5.5 via PI / ChatGPT-Codex OAuth
Auth profiles include both:
- openai-codex:xxxxxxxxxxxx@gmail.com — OAuth / ChatGPT-Codex subscription profile
- openai:media-api — OpenAI API-key profile, intended for media/API tools

Observed behavior after the update / doctor migration:

before: agents.defaults.model.primary = openai-codex/gpt-5.5
after: agents.defaults.model.primary = openai/gpt-5.5
runtime stayed / became pi

Because an openai:* API-key profile existed, this did not fail closed. The session ran through the paid OpenAI API route instead of the expected ChatGPT/Codex subscription route. Unexpected API usage is now over $10 from this debugging session.

We then tried to move the setup to the documented native Codex route:

plugins.entries.codex.enabled = true
agents.defaults.model.primary = openai/gpt-5.5
agents.defaults.agentRuntime.id = codex

That exposed a second failure mode. The native Codex harness selected / forwarded the wrong auth profile:

Codex app-server auth profile "openai:media-api" must belong to provider "openai-codex" or a supported alias.

Relevant log line:

warn agents/harness {"harnessId":"codex","provider":"openai","modelId":"gpt-5.5","error":"Codex app-server auth profile \"openai:media-api\" must belong to provider \"openai-codex\" or a supported alias."} Codex agent harness failed; not falling back to embedded PI backend

So for this PR’s regression coverage, I think the fixture should probably include a mixed-profile case, not only an OAuth-only/no-OpenAI-auth case:

openai-codex:* OAuth profile exists and should remain the Codex subscription auth path.
openai:* API-key profile also exists, but should not be silently selected for migrated chat/model routes.
doctor --fix must not rewrite openai-codex/* + pi into openai/* + pi unless the user explicitly chooses direct OpenAI API billing.
Native Codex (openai/* + agentRuntime.id: "codex") must not forward an openai:* API-key profile as Codex app-server auth when an openai-codex:* OAuth profile is available.

In other words, the invariant should not only be “does the rewritten provider have some auth?” It should also protect the billing/auth transport boundary:

openai-codex/* + pi = subscription/OAuth route
openai/* + codex = native Codex subscription/OAuth route, if Codex auth selection is correct
openai/* + pi = direct OpenAI API billing route

The dangerous case is that the migration can turn the first into the third without explicit user consent.
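
The three-route boundary above can be written down as a small classifier plus a consent check. This is a sketch under the route table given in this comment; the type names, `classifyRoute`, `isUnconsentedBillingChange`, and the `userOptedIntoApiBilling` flag are hypothetical and do not correspond to existing config or code.

```typescript
type AgentRuntime = "pi" | "codex";

type BillingRoute =
  | "subscription-oauth" // openai-codex/* + pi
  | "native-codex-oauth" // openai/* + codex
  | "direct-api-billing" // openai/* + pi
  | "other";

interface Route {
  ref: string; // "provider/model"
  runtime: AgentRuntime;
}

// Maps a (provider prefix, runtime) pair onto the route table from the comment above.
function classifyRoute(route: Route): BillingRoute {
  const slash = route.ref.indexOf("/");
  const provider = slash === -1 ? route.ref : route.ref.slice(0, slash);
  if (provider === "openai-codex" && route.runtime === "pi") return "subscription-oauth";
  if (provider === "openai" && route.runtime === "codex") return "native-codex-oauth";
  if (provider === "openai" && route.runtime === "pi") return "direct-api-billing";
  return "other";
}

// True when a migration moves a subscription route onto direct API billing
// without the user having opted in (hypothetical consent flag).
function isUnconsentedBillingChange(
  before: Route,
  after: Route,
  userOptedIntoApiBilling: boolean,
): boolean {
  return (
    classifyRoute(before) === "subscription-oauth" &&
    classifyRoute(after) === "direct-api-billing" &&
    !userOptedIntoApiBilling
  );
}

// The observed migration: openai-codex/gpt-5.5 + pi became openai/gpt-5.5 + pi.
const observedBefore: Route = { ref: "openai-codex/gpt-5.5", runtime: "pi" };
const observedAfter: Route = { ref: "openai/gpt-5.5", runtime: "pi" };
const observedIsDangerous = isUnconsentedBillingChange(observedBefore, observedAfter, false);
```

A regression fixture could assert that no doctor repair produces a before/after pair for which this check returns true, independent of whether an openai:* profile happens to exist.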

I also posted the live mixed-profile follow-up in #78407 for context.

100yenadmin · 2026-05-08T08:48:16Z

Closing as superseded.

#78407 was fixed on main by #79238 ("Keep OpenAI Codex migrations on automatic runtime routing", 02fe0d8) and @steipete closed the issue with proof on 2026-05-07. CHANGELOG on main (line 195):

Doctor/OpenAI: stop pinning migrated openai-codex/* routes to the Codex runtime so mixed-provider agents keep automatic PI routing for MiniMax, Anthropic, and other non-OpenAI model switches.

Post-#79238 maybeRepairCodexRoutes leaves agentRuntime.id unset, and openAIProviderUsesCodexRuntimeByDefault (src/agents/openai-codex-routing.ts:42) auto-routes openai/* through the Codex runtime when the OpenAI provider has the default base URL. So:

OAuth-only failure mode (original [Bug]: openclaw doctor --fix rewrites openai-codex/* model refs to openai/* on 2026.5.4 → 2026.5.5 update, locking out ChatGPT-OAuth users #78407) — fixed.
@dungeonmyk's mixed-profile billing footgun (openai/* + pi silently billing the openai:* API-key profile) — also addressed: default runtime is no longer pi, so openai/* + an openai-codex:* profile flows through Codex instead of the API-key path.
@dungeonmyk's follow-up (Codex app-server auth profile "openai:media-api" must belong to provider "openai-codex" or a supported alias) is a separate bug in extensions/codex/src/app-server/auth-bridge.ts:309 — that one is tracked at [Bug]: doctor --fix rewrites Codex runtime model refs to openai/* and breaks Codex auth profile selection #78499.
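
The post-#79238 routing contract described above can be approximated as follows. This is a sketch, not the code in src/agents/openai-codex-routing.ts: the function name, the input shape, the precedence of an explicit runtime id, the value used for the default base URL, and the fallback to "pi" for a custom base URL are all assumptions.

```typescript
type AgentRuntime = "pi" | "codex";

// ASSUMPTION: value of the default OpenAI base URL; the project may define it differently.
const ASSUMED_DEFAULT_OPENAI_BASE_URL = "https://api.openai.com/v1";

interface RuntimeInputs {
  modelRef: string; // "provider/model"
  explicitRuntimeId?: AgentRuntime; // agentRuntime.id when the user set one
  openaiBaseUrl?: string; // undefined means "not overridden"
}

// ASSUMPTION: an explicit runtime id wins. Otherwise openai/* refs on the default
// base URL route to the Codex runtime, and everything else stays on PI.
function effectiveRuntimeSketch(inputs: RuntimeInputs): AgentRuntime {
  if (inputs.explicitRuntimeId !== undefined) {
    return inputs.explicitRuntimeId;
  }
  const isOpenAiRef = inputs.modelRef.startsWith("openai/");
  const usesDefaultBaseUrl =
    inputs.openaiBaseUrl === undefined ||
    inputs.openaiBaseUrl === ASSUMED_DEFAULT_OPENAI_BASE_URL;
  return isOpenAiRef && usesDefaultBaseUrl ? "codex" : "pi";
}

// After the fix the migration leaves agentRuntime.id unset, so a rewritten ref auto-routes.
const migratedRuntime = effectiveRuntimeSketch({ modelRef: "openai/gpt-5.5" });
```

Under these assumptions the migrated openai/gpt-5.5 ref resolves to the Codex runtime, while non-OpenAI refs keep automatic PI routing, matching the CHANGELOG entry quoted above.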

The `it.fails` repro here is too narrow to lock in the post-#79238 contract — the `findModelRefsWithoutAuth` walker only inspects raw config provider prefixes and doesn't see the runtime-policy resolver, so it would still report every `openai/*` ref as orphaned even on the fixed code. Reframing it against `resolveModelRuntimePolicy` is more rework than re-filing as a fresh test.

Happy to extract `extensions/qa-lab/transport-parity-gate.md` into its own PR tied to #78457 if that scaffolding is still wanted.

100yenadmin mentioned this pull request May 6, 2026

[Bug]: openclaw doctor --fix rewrites openai-codex/* model refs to openai/* on 2026.5.4 → 2026.5.5 update, locking out ChatGPT-OAuth users #78407

Closed

openclaw-barnacle Bot added the commands (Command implementations) and extensions: qa-lab labels May 6, 2026

100yenadmin mentioned this pull request May 6, 2026

[CI]: Add transport-parity gate (same-model cross-provider + cross-runtime) — sibling to QA parity-gate #78457

Closed

openclaw-barnacle Bot added the size: M and triage: needs-real-behavior-proof (Candidate: external PR needs after-fix proof from a real setup.) labels May 6, 2026

100yenadmin marked this pull request as ready for review May 6, 2026 14:32

jimdawdy-hub mentioned this pull request May 6, 2026

Codex harness migration: agentRuntime.fallback="none" doesn't keep non-codex fallbacks off codex; canonical openai/* ref doesn't broker openai-codex OAuth profile #75739

Closed

100yenadmin closed this May 8, 2026

This was referenced May 10, 2026

Codex-vs-Pi runtime parity QA harness (RFC + tracking) #80171

Closed

docs(qa-lab): runtime-parity gate design (Pi vs Codex harness) #80179

Closed

100yenadmin wants to merge 1 commit into openclaw:main from electricsheephq:fix/78407-doctor-codex-model-ref-preservation
