fix(doctor): preserve explicit agentRuntime pin during codex model migration [AI-assisted]#84142
fix(doctor): preserve explicit agentRuntime pin during codex model migration [AI-assisted]#84142nxmxbbd wants to merge 2 commits into
Conversation
|
Codex review: passed. Workflow note: Future ClawSweeper reviews update this same comment in place. How this review workflow works
Summary Reproducibility: yes. The current-main source path shows the synthesis-before-merge ordering, and the PR body supplies before/after terminal output invoking the production repair function on the reported config. PR rating Rank-up moves:
What the crustacean ranks mean
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics. PR egg Rarity: 🥚 common. What is this egg doing here?
Real behavior proof Risk before merge
Maintainer options:
Next step before merge Security Review detailsBest possible solution: Land this focused doctor migration fix after exact-head required checks pass, while keeping the broader native Codex token-usage policy separate from this config-preservation repair. Do we have a high-confidence way to reproduce the issue? Yes. The current-main source path shows the synthesis-before-merge ordering, and the PR body supplies before/after terminal output invoking the production repair function on the reported config. Is this the best way to solve the issue? Yes. The guard is narrowly placed at the Codex runtime synthesis point, preserves explicit non-default pins, and leaves existing Label justifications:
What I checked:
Likely related people:
Codex review notes: model gpt-5.5, reasoning high; reviewed against 5c9a8f33b312. |
|
@clawsweeper automerge |
|
🦞🔧 Source: I will update this PR branch, or open a safe credited replacement, if the repair worker finds a narrow CI fix. Automerge progress:
|
9d93263 to
47bd8c1
Compare
Adds a regression test in codex-route-warnings.test.ts that mirrors the exact user-reported config from openclaw#84038: - agents.defaults.model.primary = "openai-codex/gpt-5.4" - agents.defaults.models["openai-codex/gpt-5.4"].agentRuntime = { id: "pi" } The user expects doctor --fix to preserve the explicit non-default runtime pin while migrating the legacy model ref. Current behavior: the pin is silently overwritten by a synthesized { id: "codex" } during the rewriteModelsMap spread-merge. This commit intentionally lands the test RED so a follow-up fix can flip it to GREEN with a focused diff. Co-Authored-By: Nex <nex@dbitstec.com>
…gacy model-ref migration
`maybeRepairCodexRoutes` migrates `openai-codex/*` model refs to their
canonical `openai/*` form, then calls `ensureCodexRuntimePolicy` on each
hit so the rewritten ref keeps Codex auth routing. When the user already
had an explicit non-default pin on the legacy form (for example
`models["openai-codex/gpt-5.4"].agentRuntime = { id: "pi" }` as a
deliberate opt-out from the broken native Codex runtime), the previous
sequence silently overwrote it:
1. The `AGENT_MODEL_CONFIG_KEYS` loop rewrote `model.primary`, then
`preserveCodexRuntimePolicyForNewHits` synthesized
`models["openai/gpt-5.4"].agentRuntime = { id: "codex" }` because
that canonical entry did not yet exist.
2. `rewriteModelsMap` then renamed `openai-codex/gpt-5.4` →
`openai/gpt-5.4`. Its spread merge
`{ ...legacyRecord, ...canonicalRecord }` placed the synthesized
`{ id: "codex" }` last, silently dropping the user's `{ id: "pi" }`
pin.
End state: the explicit PI runtime opt-out was gone, gateway fell back
to the native Codex runtime, and per-request token usage grew 3-4x
because the upstream Codex runtime issue still applies.
Fix: in `ensureCodexRuntimePolicy`, skip synthesizing the default
`{ id: "codex" }` pin when the user already has an explicit non-default
pin on a legacy `openai-codex/*` form of the same canonical model ref.
`rewriteModelsMap` then carries the legacy pin forward unmodified via
its existing `canonicalEntry ?? legacyEntry` falsy branch.
This narrow fix matches the existing behavior covered by
`preserves explicit model-scoped runtime pins when repairing legacy
model map keys` (only the models map, no `model.primary` slot to trigger
synthesis) and `overwrites non-concrete model-scoped runtime pins when
preserving Codex route intent` (auto/default pins still get codex
applied). The combined-config path from openclaw#84038 was the previously
uncovered regression.
Fixes openclaw#84038.
Co-Authored-By: Nex <nex@dbitstec.com>
47bd8c1 to
6952ea7
Compare
|
ClawSweeper 🐠 reef update Thanks for the contribution. The source branch was not safely writable by ClawSweeper, so it opened a replacement PR and kept the credit trail visible. Why replacement: ClawSweeper could not update the source PR branch directly; GitHub did not grant sufficient push rights to the bot for that branch.
fish notes: model gpt-5.5, reasoning high; reviewed against 41e5043. |
Summary
openclaw doctor --fixsilently overwrote an explicitagentRuntime: { id: "pi" }opt-out with{ id: "codex" }while migrating legacyopenai-codex/*model refs to canonicalopenai/*form, flipping the user from the PI runtime back onto the native Codex runtime (3-4× token inflation per request, per [Bug]: doctor --fix silently migrates intentional openai-codex/ config to openai/, breaking PI+OAuth runtime and causing 3-4x token inflation #84038).openai-codex/*form of that model ref already has an explicit non-defaultagentRuntime.id(anything other thanauto/default/ missing).rewriteModelsMapthen carries the legacy pin forward through its existing legacy-wins fallback branch.src/commands/doctor/shared/codex-route-warnings.ts(+24, -0). New helperlegacyEntryHasExplicitNonDefaultRuntimePin(models, canonicalModelRef)plus one guard inensureCodexRuntimePolicybeforesetModelRuntimePolicy.repairCodexSessionStoreRoutes,clearStaleSessionRuntimePins) is untouched. Top-levelagents.defaults.agentRuntimeclearing (clearLegacyAgentRuntimePolicy) is untouched. The migration of model refs themselves (openai-codex/X→openai/X) is unchanged.auto/defaultruntime pins remain overwritable.Motivation
The native Codex runtime currently produces 3-4× the token usage of the PI runtime for the same GPT-5.x request (the upstream Codex issue OpenClaw cannot fix at the provider layer). Users who explicitly pin
agentRuntime: { id: "pi" }to opt out of this need that opt-out to survivedoctor --fix. Every doctor run today silently puts them back on the broken runtime.Change Type (select all)
Scope (select all touched areas)
Linked Issue/PR
tools.web.search; cited as design precedent below)Real behavior proof
Behavior or issue addressed:
doctor --fixoverwritesagents.defaults.models["openai-codex/gpt-5.4"].agentRuntime = { id: "pi" }with{ id: "codex" }while migrating the legacyopenai-codex/*model ref toopenai/*, silently re-enrolling the user into the native Codex runtime they explicitly opted out of.Real environment tested: Local fresh worktree off
upstream/main f07c87405con Linux 6.17.0 (Node 22.22.1, pnpm 11.1.0). Two sibling worktrees: baseline (upstream/main, no fix) at/tmp/openclaw-84038-baselineand fix branch at/tmp/openclaw-84038. ProductionmaybeRepairCodexRoutes(the same functionsrc/commands/doctor/repair-sequencing.ts:82calls duringopenclaw doctor --fix) invoked directly against the exact 4-key user config from [Bug]: doctor --fix silently migrates intentional openai-codex/ config to openai/, breaking PI+OAuth runtime and causing 3-4x token inflation #84038.Exact steps or command run after this patch:
Evidence after fix:
Terminal capture from this branch, copied live output:
Observed result after fix: the legacy
openai-codex/gpt-5.4model ref is migrated to canonicalopenai/gpt-5.4, the corresponding map entry follows the rename, and the user's explicitagentRuntime: { id: "pi" }opt-out is preserved on the migrated entry. The previous "Set agents.defaults.models.openai/gpt-5.4.agentRuntime.id to 'codex'" line is correctly absent because no synthesized default needs setting, andplugins.entries.codex.enabledstaysfalseas the user requested (no spurious auto-enable, because no route now claims the codex runtime).What was not tested: live gateway runtime / actual provider request emission (the runtime resolution side is exercised by existing
resolveAgentHarnessPolicytests incodex-route-warnings.test.tsat L3000-3006 and L3036-3042 covering both pi and codex shapes), and the macOS install path the original reporter is on (the migration code path is platform-agnostic, so the issue surface should reproduce / fix on macOS the same way).Before evidence (baseline, same script same input on
upstream/main f07c87405c):Root Cause
rewriteAgentModelRefs, theAGENT_MODEL_CONFIG_KEYSloop processesmodelfirst and callspreserveCodexRuntimePolicyForNewHits→ensureCodexRuntimePolicy(modelRef="openai/X")while the canonical entrymodels["openai/X"]does not yet exist.ensureCodexRuntimePolicysynthesizes{ agentRuntime: { id: "codex" } }on a fresh empty entry.rewriteModelsMapthen renamesopenai-codex/X→openai/X, and its{ ...legacyRecord, ...canonicalRecord }spread places the just-synthesized codex pin last, silently dropping the user's{ id: "pi" }.ensureCodexRuntimePolicyonly looked at the canonical entry's existingagentRuntimewhen deciding whether to synthesize. It did not consider that a legacy form of the same model ref in the samemodelsmap might carry an explicit non-default pin the user expected to keep through the migration.codex-route-warnings.test.tsalready encode the maintainer intent to preserve such pins —"preserves explicit model-scoped runtime pins when repairing legacy model map keys"(L2974) for the models-map-only case, and"overwrites non-concrete model-scoped runtime pins when preserving Codex route intent"(L3009) for theauto/defaultcase. The combined-config path (legacy ref present in BOTHmodel.primaryandmodels[...]) was the previously uncovered seam.Regression Test Plan
src/commands/doctor/shared/codex-route-warnings.test.tsagents.defaults.model.primary = "openai-codex/X"ANDagents.defaults.models["openai-codex/X"].agentRuntime = { id: "pi" }. AftermaybeRepairCodexRoutes({ shouldRepair: true }), the migrated entrymodels["openai/X"]must haveagentRuntime = { id: "pi" }, andmodels["openai-codex/X"]must be gone.rewriteAgentModelRefs; a unit-level config-in/config-out assertion againstmaybeRepairCodexRoutesexercises the full repair pipeline (model-slot rewrite,ensureCodexRuntimePolicydecision,rewriteModelsMaprename + merge) without needing a real gateway or provider auth wiring.model.primary-absent variant which already passes today; the combined-config variant from [Bug]: doctor --fix silently migrates intentional openai-codex/ config to openai/, breaking PI+OAuth runtime and causing 3-4x token inflation #84038 was uncovered.it(...)block (RED on the prior commitcd0f8f800e, GREEN on the fix commit9d932635ae).User-visible / Behavior Changes
openclaw doctor --fixno longer rewritesagentRuntime.idto"codex"on a canonicalopenai/Xentry when a legacyopenai-codex/Xform of the same model ref carries an explicit non-default pin (e.g.{ id: "pi" }). The legacy pin survives the rename.plugins.entries.codexis no longer auto-enabled in that scenario (the migrated route now claims thepiruntime, notcodex, soenableCodexPluginForRequiredRoutesdoes not need to touch the plugin entry). This matches the user's stated intent in [Bug]: doctor --fix silently migrates intentional openai-codex/ config to openai/, breaking PI+OAuth runtime and causing 3-4x token inflation #84038 ("explicitly disabled the codex plugin"). Users who leftagentRuntimeunset, or set it toauto/default/codex, see no behavior change.Diagram
Security Impact
NoNoNoNoNoThe patch is a local config-rewrite decision change. No auth, secret, capability, network, or filesystem surface changes.
Repro + Verification
Environment
agents,auth,plugins— verbatim as posted in [Bug]: doctor --fix silently migrates intentional openai-codex/ config to openai/, breaking PI+OAuth runtime and causing 3-4x token inflation #84038.Steps
upstream/main f07c87405cin worktree A andfix/84038-preserve-pi-runtimein worktree B.pnpm install --prefer-offline._proof-trace.mjsinto each worktree. The script importsmaybeRepairCodexRoutesfromsrc/commands/doctor/shared/codex-route-warnings.tsand calls it withshouldRepair: trueon the issue's exact config.node --import tsx ./_proof-trace.mjsand compare stdout.Expected
models["openai/gpt-5.4"].agentRuntime = { id: "pi" }. NoSet agents.defaults.models.openai/gpt-5.4.agentRuntime.id to "codex"line in changes. NoEnabled plugins.entries.codexline.Actual
models["openai/gpt-5.4"].agentRuntime = { id: "codex" }, with both theSet ...andEnabled ...lines present in changes.Evidence
Live-output evidence captured in the Real behavior proof section above. Repro test landed in
src/commands/doctor/shared/codex-route-warnings.test.ts(it("preserves an explicit non-default agentRuntime pin on the legacy model entry during migration (#84038)", ...)). Lint, typecheck, and the fullcodex-route-warnings.test.ts+repair-sequencing.test.tssweep all pass; these are supplemental to the live-output evidence.Human Verification
node --import tsxoutput captured both directions).upstream/mainbaseline → migration runs, pin overwritten (matches the user's reported regression).codex-route-warnings.test.ts, 10/10 inrepair-sequencing.test.ts. Specifically L2974 ("preserves explicit model-scoped runtime pins when repairing legacy model map keys" — pi pin preserved when onlymodelsmap is set, nomodel.primary) and L3009 ("overwrites non-concrete model-scoped runtime pins when preserving Codex route intent" —autopin still gets overwritten by codex).agentRuntime: { id: "auto" }legacy pin → not preserved (still overwritten by codex synthesis; matches existing maintainer intent).agentRuntime: { id: "default" }legacy pin → not preserved (same as auto).agentRuntimeat all → unchanged (codex synthesis still happens).ensureCodexRuntimePolicy's existing pre-check returns early, no synthesis happens,rewriteModelsMapkeeps canonical via its current spread order.resolveAgentHarnessPolicytests in the same file); the macOS install path the original reporter is on (the migration logic is platform-agnostic).Review Conversations
Compatibility / Migration
YesNoNoagentRuntimepin (the vast majority) see identical doctor behavior. Users on a non-default explicit pin keep it after upgrade — no manual reapply needed anymore.Risks and Mitigations
agentRuntime: { id: "pi" }(or similar non-codex pin) on a legacyopenai-codex/Xref to be REPLACED by{ id: "codex" }during migration no longer gets that automatic replacement."overwrites non-concrete model-scoped runtime pins when preserving Codex route intent"(L3009) keepsauto/defaultoverwritable for users who want default-policy behavior. A user who actually wants codex can set{ id: "codex" }explicitly, which the fix also leaves alone (existingensureCodexRuntimePolicyearly-exit on concrete non-default pin handles this case).plugins.entries.codexauto-enable. When the user's preserved pin is non-codex, the migrated route no longer claims the codex runtime, so the codex plugin is no longer auto-enabled.AI-assisted disclosure
maybeRepairCodexRoutesfunction from a Node script on the human-operator's machine using the exact user config from the issue body. Co-author trailer:Co-Authored-By: Nex <nex@dbitstec.com>.rewriteAgentModelRefs, whyensureCodexRuntimePolicyis the right place for the early-return, and the trade-off vs reorderingrewriteModelsMap(a second considered approach, rejected because it shifts an existing snapshot test's expectedchanges[]ordering for no semantic gain).codex review --base origin/mainwas not run because Codex CLI is not installed on the human-operator's local machine. Equivalent diff-review steps performed: typecheck (pnpm check:changed --base upstream/main, lanecheck-prod-types+check-test-types), lint (oxlint0 warnings 0 errors on 8650 files / 217 rules),oxfmt --checkclean, repro test exercises the exact behavior.