fix(doctor): merge disjoint openai-codex model entries into canonical openai provider#90056

Merged
steipete merged 12 commits into
openclaw:mainfrom
openperf:fix/90047-codex-migration-merge-models
Jun 7, 2026

Conversation

@openperf openperf commented Jun 3, 2026

Member

Summary

  • Problem: Issue #90047 reports that the 2026.6.1 doctor migration silently drops the entire openai-codex provider whenever a canonical openai provider already exists, including cases where the two serve disjoint purposes (e.g. openai for embeddings via API key, openai-codex for chat via OAuth/ChatGPT backend). After a routine update, gpt-5.5 was gone and every agent turn failed.
  • Root Cause: In migrateLegacyOpenAICodexProvider (legacy-config-migrations.runtime.models.ts:997), the else branch for the shadowed-provider case recorded "Removed" and called delete providers[providerId]. The already-normalized normalized.value, which contains the API-renamed model entries, was discarded without checking for model rows absent from the canonical provider. The branch assumed that a canonical openai provider is always a superset of openai-codex, which is false when the two serve disjoint models.
  • Fix: Replace the silent drop with a targeted merge. For each model entry in the normalized legacy provider: (1) skip those whose id already appears in the canonical provider's model id Set or whose name appears in its name Set (separate Sets to avoid cross-type false positives); (2) stamp the legacy provider's baseUrl and api onto merged models that lack their own, so they continue routing to the correct endpoint (e.g. chatgpt.com/backend-api not api.openai.com/v1); (3) skip entries with neither id nor name to avoid duplicating malformed records. Only when the filtered set is empty is the old "Removed … already exists" message emitted.
  • What changed:
    • src/commands/doctor/shared/legacy-config-migrations.runtime.models.ts — replace the else drop with a merge-or-remove branch; separate canonicalModelIds/canonicalModelNames Sets; stamp legacy provider baseUrl/api onto merged models that lack them.
    • src/commands/doctor/shared/legacy-config-migrate.test.ts — 3 new tests: disjoint merge with baseUrl preserved (the exact reporter scenario); partial-conflict merge (one duplicate skipped, one new entry with endpoint stamped); all-duplicate scenario still triggers the "Removed" message.
  • What did NOT change (scope boundary):
    • The hasCanonicalOpenAIProvider check and normalizeLegacyOpenAIResponsesApi are unchanged.
    • The move branch (!hasCanonicalOpenAIProvider) is unchanged.
    • Config schema, defaults, and other doctor migrations are unchanged. Plugin surface unchanged.
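
The merge-or-remove branch described in the Fix bullet can be sketched roughly as follows. This is a minimal illustration, not the code from legacy-config-migrations.runtime.models.ts: the ModelEntry and ProviderEntry types, the mergeLegacyCodexModels helper, and its return shape are hypothetical. Only the separate canonicalModelIds/canonicalModelNames Sets and the baseUrl/api stamping are taken from the description above.

```typescript
// Hypothetical, simplified shapes; the real config types are not shown in this PR description.
type ModelEntry = { id?: string; name?: string; baseUrl?: string; api?: string };
type ProviderEntry = { baseUrl?: string; api?: string; models?: ModelEntry[] };

// Returns the legacy models that should be appended to the canonical provider.
// An empty result corresponds to the old "Removed ... already exists" path.
function mergeLegacyCodexModels(canonical: ProviderEntry, legacy: ProviderEntry): ModelEntry[] {
  // Separate Sets so that an id is never compared against a name (or vice versa).
  const canonicalModelIds = new Set<string>();
  const canonicalModelNames = new Set<string>();
  for (const model of canonical.models ?? []) {
    if (model.id !== undefined) canonicalModelIds.add(model.id);
    if (model.name !== undefined) canonicalModelNames.add(model.name);
  }

  const toAppend: ModelEntry[] = [];
  for (const model of legacy.models ?? []) {
    const { id, name } = model;
    // Entries with neither id nor name are treated as malformed and skipped.
    if (id === undefined && name === undefined) continue;
    // Skip anything the canonical provider already defines.
    if (id !== undefined && canonicalModelIds.has(id)) continue;
    if (name !== undefined && canonicalModelNames.has(name)) continue;

    // Stamp the legacy endpoint metadata only where the model lacks its own.
    const stamped: ModelEntry = { ...model };
    if (stamped.baseUrl === undefined && legacy.baseUrl !== undefined) stamped.baseUrl = legacy.baseUrl;
    if (stamped.api === undefined && legacy.api !== undefined) stamped.api = legacy.api;
    toAppend.push(stamped);
  }
  return toAppend;
}

// Usage with values modelled on the reporter's scenario.
const canonical: ProviderEntry = {
  baseUrl: "https://api.openai.com/v1",
  models: [{ id: "text-embedding-3-small" }],
};
const legacy: ProviderEntry = {
  baseUrl: "https://chatgpt.com/backend-api",
  api: "openai-chatgpt-responses",
  models: [{ id: "gpt-5.5" }, { id: "text-embedding-3-small" }, {}],
};
const appended = mergeLegacyCodexModels(canonical, legacy);
console.log(appended);
```

In this sketch only gpt-5.5 survives the filter: the duplicate embeddings id and the entry with neither id nor name are skipped, and the surviving entry carries the legacy baseUrl and api so it would keep routing to the ChatGPT backend rather than api.openai.com/v1.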

Reproduction

  1. Configure models.providers.openai (embeddings, API key, baseUrl: https://api.openai.com/v1) and models.providers.openai-codex (chat, OAuth, baseUrl: https://chatgpt.com/backend-api, models: [{ id: "gpt-5.5" }]).
  2. Run openclaw doctor --fix or openclaw update on OpenClaw 2026.6.1.
  3. Before this PR: migration deletes openai-codex entirely; gpt-5.5 is gone; agents fail.
  4. After this PR: migration merges gpt-5.5 (api → openai-chatgpt-responses, baseUrl: https://chatgpt.com/backend-api stamped) into openai.models[]; both endpoints preserved; agents continue working.
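
For orientation, the configuration in step 1 and the expected result in step 4 might look like the following, written as TypeScript object literals. The exact on-disk format and the auth fields are not shown in this PR, so they are omitted here; the embeddings model id is borrowed from the new test's assertion, and the variable names are purely illustrative.

```typescript
// Illustrative only: auth fields (API key, OAuth) are omitted because their
// exact keys are not shown in this PR description.
const configBefore = {
  models: {
    providers: {
      openai: {
        baseUrl: "https://api.openai.com/v1",
        models: [{ id: "text-embedding-3-small" }],
      },
      "openai-codex": {
        baseUrl: "https://chatgpt.com/backend-api",
        models: [{ id: "gpt-5.5" }],
      },
    },
  },
};

// Expected openai.models after the fixed migration, per the new test's assertion:
// the embeddings entry is untouched and gpt-5.5 carries the legacy endpoint metadata.
const expectedOpenAIModelsAfter = [
  { id: "text-embedding-3-small" },
  { id: "gpt-5.5", api: "openai-chatgpt-responses", baseUrl: "https://chatgpt.com/backend-api" },
];

console.log(Object.keys(configBefore.models.providers), expectedOpenAIModelsAfter);
```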

Real behavior proof

Behavior addressed (#90047): migrateLegacyOpenAICodexProvider now merges disjoint model entries with correct endpoint metadata instead of dropping them.

Real environment tested (Ubuntu Server Node 24, source-level — Vitest against migrateLegacyConfigForTest with the reporter's exact multi-provider config): new tests reproduce the exact upgrade scenario and assert that the merged model array contains the correct baseUrl and renamed api.

Exact steps or command run after this patch: node scripts/run-vitest.mjs src/commands/doctor/shared/legacy-config-migrate.test.ts src/commands/doctor/shared/legacy-config-migrations.runtime.models.test.ts — 116 tests pass (113 pre-existing + 3 new). pnpm exec oxlint clean. pnpm format:check clean.

Evidence after fix:

Tests  116 passed (116)

New tests:

  • merges disjoint model entries from legacy codex into canonical openai and preserves legacy baseUrl (#90047): asserts openai.models = [{ id: "text-embedding-3-small" }, { id: "gpt-5.5", api: "openai-chatgpt-responses", baseUrl: "https://chatgpt.com/backend-api" }].
  • skips already-present model ids when merging legacy codex into canonical openai: one duplicate skipped, one new entry stamped with legacy baseUrl.
  • removes openai-codex when all its models already exist in canonical openai: canonical unchanged, "Removed" message emitted.

What was not tested: live openclaw doctor --fix run against a real multi-provider config file.

Repro confirmation: the disjoint-merge test fails on the pre-patch tree (openai.models only contains the embeddings entry, gpt-5.5 absent) and passes with the fix.

Risk / Mitigation

  • Risk: Wrong endpoint after merge. Mitigation: legacy provider baseUrl/api are stamped onto merged models that lack their own, tested by the new baseUrl assertion.
  • Risk: Duplicate model rows. Mitigation: separate id/name Sets for dedup; cross-type false positives eliminated; all-duplicate path tested with "Removed" message assertion.
  • Risk: Config surface change requiring Peter's approval. Mitigation: bug fix in an existing migration's behavior — no new config key, schema, or migration added.

Change Type (select all)

  • Bug fix

Scope (select all touched areas)

  • Doctor / migrations

Linked Issue/PR

Fixes #90047

@openclaw-barnacle openclaw-barnacle Bot added commands Command implementations size: S labels Jun 3, 2026
clawsweeper Bot commented Jun 3, 2026

Contributor

Codex review: needs maintainer review before merge. Reviewed June 7, 2026, 4:27 AM ET / 08:27 UTC.

Summary
Review failed before ClawSweeper could summarize the requested change.

PR surface: Source +260, Tests +405. Total +665 across 4 files.

Reproducibility: unclear. The review failed before ClawSweeper could establish a reproduction path.

Review metrics: none identified.

Merge readiness
Overall: 🌊 off-meta tidepool
Proof: 🌊 off-meta tidepool
Patch quality: 🌊 off-meta tidepool
Result: rating does not apply to this item.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Risk before merge

  • [P1] No close action taken because the review did not complete.

Maintainer options:

  1. Decide the mitigation before merge
    Retry the Codex review after fixing the execution failure.
  2. Pause or close
    Do not merge this PR until maintainers decide whether the risk is worth taking.

Next step before merge

  • Review did not complete, so no work-lane recommendation was made.
Review details

Best possible solution:

Retry the Codex review after fixing the execution failure.

Do we have a high-confidence way to reproduce the issue?

Unclear. The review failed before ClawSweeper could establish a reproduction path.

Is this the best way to solve the issue?

Unclear. Retry the review first so ClawSweeper can evaluate the actual issue and fix direction.

AGENTS.md: unclear because the file could not be read completely.

Codex review notes: model gpt-5.5, reasoning high; reviewed against a58a6f63cac3.

Label changes

Label changes:

  • add rating: 🌊 off-meta tidepool: Overall readiness is 🌊 off-meta tidepool; proof is 🌊 off-meta tidepool and patch quality is 🌊 off-meta tidepool.
  • remove P1: Current review triage priority is none.
  • remove rating: 🐚 platinum hermit: Current PR rating is rating: 🌊 off-meta tidepool, so this older rating label is no longer current.
  • remove merge-risk: 🚨 compatibility: Current PR review selected no merge-risk labels.
  • remove merge-risk: 🚨 auth-provider: Current PR review selected no merge-risk labels.
  • remove status: 👀 ready for maintainer look: Current PR status no longer selects a status label.

Label justifications:

  • rating: 🌊 off-meta tidepool: Overall readiness is 🌊 off-meta tidepool; proof is 🌊 off-meta tidepool and patch quality is 🌊 off-meta tidepool.
Evidence reviewed

PR surface:

Source +260, Tests +405. Total +665 across 4 files.

PR surface stats

Area       Files  Added  Removed  Net
Source     2      272    12       +260
Tests      2      405    0        +405
Docs       0      0      0        0
Config     0      0      0        0
Generated  0      0      0        0
Other      0      0      0        0
Total      4      677    12       +665

What I checked:

  • failure reason: timeout.
  • codex failure detail: Codex review failed for this PR: spawnSync codex ETIMEDOUT.
  • codex stdout: Per-item Codex failure; continuing with the rest of the shard.

Likely related people:

  • unknown: Codex failed before it could trace repository history. (role: review did not complete; confidence: low)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@clawsweeper clawsweeper Bot added rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. status: ⏳ waiting on author ClawSweeper has contributor-facing work open and is waiting for author action. P1 High-priority user-facing bug, regression, or broken workflow. merge-risk: 🚨 compatibility 🚨 May break existing users, config, migrations, defaults, or upgrade paths. merge-risk: 🚨 auth-provider 🚨 May break OAuth, tokens, provider routing, model choice, or credentials. labels Jun 3, 2026
@openperf openperf force-pushed the fix/90047-codex-migration-merge-models branch from 91cdb40 to 1b1fa93 Compare June 4, 2026 01:19
@clawsweeper clawsweeper Bot added rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. and removed rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. status: ⏳ waiting on author ClawSweeper has contributor-facing work open and is waiting for author action. labels Jun 4, 2026
@steipete steipete self-assigned this Jun 7, 2026
@clawsweeper clawsweeper Bot added rating: 🌊 off-meta tidepool PR readiness rating does not apply to this item. and removed rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. labels Jun 7, 2026
@steipete steipete force-pushed the fix/90047-codex-migration-merge-models branch from 815da58 to e5e6a11 Compare June 7, 2026 08:48
steipete commented Jun 7, 2026

Contributor

Land-ready maintainer pass complete.

What changed after review:

  • Preserved safe legacy openai-codex model fields when merging into the canonical openai provider.
  • Blocked unsafe provider-level defaults and auth/header/request state from being silently copied into individual models.
  • Preserved models-add metadata markers and provider/model params without overwriting later canonical provider normalization.
  • Added preview warnings for blocked normalized legacy provider cleanup.
  • Rebased onto current main and fixed the unrelated qa-lab saved exit-code test type mismatch exposed by the current type lane.

Local proof:

  • node scripts/run-vitest.mjs src/commands/doctor/shared/legacy-config-migrate.test.ts src/commands/doctor/shared/legacy-config-migrations.runtime.models.test.ts src/commands/doctor/shared/legacy-models-add-metadata.test.ts src/commands/doctor/shared/preview-warnings.test.ts extensions/qa-lab/src/live-transports/whatsapp/cli.runtime.test.ts
  • pnpm check:test-types
  • pnpm exec oxfmt --check on touched doctor/qa-lab files
  • pnpm exec oxlint on touched doctor/qa-lab files
  • git diff --check

Autoreview:

  • .agents/skills/autoreview/scripts/autoreview --mode branch --base origin/main --parallel-tests "node scripts/run-vitest.mjs src/commands/doctor/shared/legacy-config-migrate.test.ts src/commands/doctor/shared/legacy-config-migrations.runtime.models.test.ts src/commands/doctor/shared/legacy-models-add-metadata.test.ts src/commands/doctor/shared/preview-warnings.test.ts"
  • Result: clean, no accepted/actionable findings.

GitHub CI on e5e6a11 is green, including check-test-types, check-lint, check-prod-types, build-artifacts, doctor-shared shard, and security/quality gates.

openperf and others added 8 commits June 7, 2026 09:53
… openai provider

When both an openai-codex and an openai provider existed (e.g. codex
for chat via OAuth, openai for embeddings via API key), the 2026.6.1
doctor migration silently dropped the entire openai-codex provider
without carrying its model definitions over to the canonical openai
provider, causing every agent turn to fail after a routine update.

In migrateLegacyOpenAICodexProvider, replace the silent drop with a
merge: collect model entries from the normalized legacy provider, skip
those whose id or name already exists in the canonical provider (using
separate id/name Sets to avoid cross-type false positives), and append
the remaining disjoint entries. For each merged model that lacks its
own baseUrl or api, stamp the legacy provider's values so the model
continues to route to the correct endpoint (e.g. chatgpt.com/backend-api
instead of api.openai.com/v1). An entry with neither id nor name is
excluded to avoid duplicating malformed records. Only when no entries
survive the filter is the old removal message emitted.

Fixes openclaw#90047
@steipete steipete force-pushed the fix/90047-codex-migration-merge-models branch from e5e6a11 to b9495fb Compare June 7, 2026 08:54
steipete commented Jun 7, 2026

Contributor

Final rebase/proof update:

  • Rebased the reviewed stack onto current main 3e4b10f.
  • Dropped the temporary qa-lab exit-code type commit because main already contains the stronger typeof process.exitCode fix.
  • Local proof after rebase: oxfmt/oxlint on touched doctor files, git diff --check, focused doctor Vitest suite, and pnpm check:test-types.
  • GitHub CI is green on b9495fb, including check-test-types, check-lint, check-prod-types, build-artifacts, doctor-shared shard, and security/quality gates.

@steipete steipete merged commit e06f6ff into openclaw:main Jun 7, 2026
159 of 161 checks passed

Labels

commands Command implementations merge-risk: 🚨 auth-provider 🚨 May break OAuth, tokens, provider routing, model choice, or credentials. merge-risk: 🚨 compatibility 🚨 May break existing users, config, migrations, defaults, or upgrade paths. P1 High-priority user-facing bug, regression, or broken workflow. rating: 🌊 off-meta tidepool PR readiness rating does not apply to this item. size: L

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Codex migration (2026.6.1) drops the gpt-5.5 model when a canonical openai provider exists for embeddings — agents go silent

2 participants