Codex migration (2026.6.1) drops the gpt-5.5 model when a canonical openai provider exists for embeddings — agents go silent #90047

@holgergruenhagen

Summary

First off — thank you for OpenClaw and for moving the OpenAI Codex handling onto a cleaner, canonical footing in #88605. The intent (no more openai-codex as a first-class runtime/provider id) makes a lot of sense.

We hit a sharp edge of that migration when upgrading to 2026.6.1: the legacy-config migration removed our openai-codex model provider but did not carry its model definition over to the canonical openai provider. The only place our chat model gpt-5.5 was defined was that provider, so after the upgrade the model simply no longer existed — and every agent went silent.

I think the scenario below may not have been considered, because it depends on having both providers configured at the same time, which we did for a perfectly legitimate reason.

Why we had a canonical openai provider and an openai-codex provider

These two providers served two different, legitimate purposes with two different auth modes:

  • openai (api_key, baseUrl: https://api.openai.com/v1) — existed only to host text-embedding-3-small for memory / vector search (agents.defaults.memorySearch.provider: "openai", model: "text-embedding-3-small"). This is the standard embeddings setup.
  • openai-codex (oauth, baseUrl: https://chatgpt.com/backend-api, api: openai-codex-responses) — hosted the actual chat model gpt-5.5, used by every agent via agents.defaults.model.primary: "openai/gpt-5.5" with agentRuntime.id: "codex".

So the canonical openai provider was present, but it was embeddings-only — it did not contain gpt-5.5. The migration treated the existence of any canonical openai provider as "the codex provider is now redundant" and dropped it, taking gpt-5.5 with it.
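For concreteness, here is a minimal sketch of that pre-upgrade shape as a plain object, plus a check for models that live only under openai-codex. The key paths (models.providers.*, agents.defaults.model.primary, agents.defaults.memorySearch) come from the description above. The `auth` field name and the helper `modelsOnlyInCodex` are assumptions for illustration, not OpenClaw's actual schema or API.

```javascript
// Sketch of the pre-upgrade config described above. The `auth` key name is an
// assumption; this report only states the auth modes (api_key vs. oauth).
const preUpgradeConfig = {
  models: {
    providers: {
      openai: {
        auth: "api_key",
        baseUrl: "https://api.openai.com/v1",
        models: [{ id: "text-embedding-3-small" }], // embeddings only
      },
      "openai-codex": {
        auth: "oauth",
        baseUrl: "https://chatgpt.com/backend-api",
        api: "openai-codex-responses",
        models: [{ id: "gpt-5.5" }], // the only definition of the chat model
      },
    },
  },
  agents: {
    defaults: {
      model: { primary: "openai/gpt-5.5" },
      memorySearch: { provider: "openai", model: "text-embedding-3-small" },
    },
  },
};

// Hypothetical helper: list model ids that exist only under openai-codex,
// i.e. the ones that vanish if that provider is deleted without a merge.
function modelsOnlyInCodex(config) {
  const providers = config.models?.providers ?? {};
  const canonicalIds = new Set(
    (providers.openai?.models ?? []).map((m) => m.id),
  );
  return (providers["openai-codex"]?.models ?? [])
    .map((m) => m.id)
    .filter((id) => !canonicalIds.has(id));
}

console.log(modelsOnlyInCodex(preUpgradeConfig)); // [ 'gpt-5.5' ]
```

A non-empty result means the two providers hold disjoint models, which is exactly the case the migration does not handle.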

Root cause

In migrateLegacyOpenAICodexProvider (dist legacy-config-migrations-*.js), the relevant branch is roughly:

if (!hasCanonicalOpenAIProvider(providers)) {
  providers[OPENAI_PROVIDER_ID] = normalized.value;          // move codex -> openai
  changes.push(`Moved models.providers.openai-codex → models.providers.openai.`);
} else {
  changes.push(`Removed models.providers.openai-codex because models.providers.openai already exists.`);
}
delete providers[providerId];

When a canonical openai provider already exists, the codex provider is deleted outright. Its models[] (including gpt-5.5) are not merged into the existing openai provider, and the already-computed normalized.value (with the openai-codex-responses → openai-chatgpt-responses api rename applied) is discarded. The assumption seems to be that a canonical openai provider already covers everything the codex provider did, which isn't true when the two providers hold disjoint models (embeddings vs. the ChatGPT-backend chat model).
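The effect can be reproduced in isolation with a simplified stand-in for that branch. This is a sketch, not the real OpenClaw code: `migrateDropping` is a hypothetical name, the normalization step is omitted, and `hasCanonicalOpenAIProvider` is approximated by a plain key check.

```javascript
// Simplified stand-in for the branch quoted above (assumption: the real
// function normalizes the provider first; that step is left out here).
const OPENAI_PROVIDER_ID = "openai";

function migrateDropping(providers, providerId = "openai-codex") {
  const changes = [];
  if (!(OPENAI_PROVIDER_ID in providers)) {
    // No canonical provider yet: the codex provider is moved over.
    providers[OPENAI_PROVIDER_ID] = providers[providerId];
    changes.push("moved");
  } else {
    // Canonical provider exists: nothing is merged.
    changes.push("removed");
  }
  delete providers[providerId];
  return changes;
}

const providers = {
  openai: { models: [{ id: "text-embedding-3-small" }] },
  "openai-codex": { models: [{ id: "gpt-5.5" }] },
};

const changes = migrateDropping(providers);
const remainingIds = Object.values(providers).flatMap((p) =>
  p.models.map((m) => m.id),
);

console.log(changes); // [ 'removed' ]
console.log(remainingIds); // [ 'text-embedding-3-small' ]
```

After the call, gpt-5.5 is not defined under any provider, which matches what we saw on our install.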

This behavior runs both during openclaw doctor --fix and during the post-openclaw update doctor step, so the upgrade applies it automatically.

Impact

After the upgrade, openai/gpt-5.5 was unresolvable. Every agent turn failed. The user-facing symptom chain:

  1. Codex app-server auth profile "openai:<account>" was not found. (native Codex path)
  2. embedded failover → 401 Unauthorized: Missing bearer or basic authentication in header, url: https://api.openai.com/v1/responses
  3. Embedded agent failed before reply: …

Net effect: a fully working multi-agent fleet went completely silent after a routine update, with no obvious pointer that a model had been removed (config still validated cleanly, since the dangling auth profile alone is accepted).

Environment

  • OpenClaw: 2026.6.1 (2e08f0f)
  • Node: v22.22.2
  • Install: npm global (Linux)
  • Auth: Codex/ChatGPT OAuth for the chat model; OpenAI api_key path only for embeddings
  • Default route: openai/gpt-5.5, agentRuntime.id: "codex"

Steps to reproduce

  1. On a pre-6.1 install, configure two model providers:
    • models.providers.openai (api_key) with only an embedding model (e.g. text-embedding-3-small), referenced by agents.defaults.memorySearch.
    • models.providers.openai-codex (oauth, api: openai-codex-responses) with the chat model gpt-5.5.
    • agents.defaults.model.primary: "openai/gpt-5.5", runtime codex.
  2. Upgrade to 2026.6.1 (or run openclaw doctor --fix).
  3. Observe: models.providers.openai-codex is removed; gpt-5.5 no longer exists under any provider; all agents fail with the auth-profile/401 chain above.

Suggested fix

When removing a "shadowed" openai-codex provider because a canonical openai provider already exists, merge rather than drop:

  • Move any models from openai-codex that are not already present in openai into openai.models[].
  • Apply the existing openai-codex-responses → openai-chatgpt-responses rename to those moved models (the code already computes this as normalized.value).
  • Preserve each moved model's baseUrl (the ChatGPT-backend models rely on a per-model baseUrl, since the openai provider's own baseUrl points at api.openai.com).

That way the canonical-openai-already-exists case keeps both the embeddings model and the ChatGPT-backend chat model, and the migration stays loss-free.
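A minimal sketch of what that merge could look like for the "canonical provider already exists" branch. The function and constant names are hypothetical, and I'm assuming each model can carry its own `baseUrl` and `api`, as in the workaround below. The real fix would presumably reuse normalized.value rather than re-deriving the rename.

```javascript
// Sketch only: helper and constant names are hypothetical, not OpenClaw's.
const LEGACY_API = "openai-codex-responses";
const CANONICAL_API = "openai-chatgpt-responses";

function renameApi(api) {
  return api === LEGACY_API ? CANONICAL_API : api;
}

// Merges codex-only models into the canonical provider, then removes the
// legacy provider. Returns the ids of the models that were moved.
function mergeShadowedCodexProvider(providers) {
  const codex = providers["openai-codex"];
  const openai = providers.openai;
  if (!codex || !openai) return []; // only the "both exist" case is sketched

  openai.models ??= [];
  const existingIds = new Set(openai.models.map((m) => m.id));
  const moved = [];

  for (const model of codex.models ?? []) {
    if (existingIds.has(model.id)) continue; // canonical definition wins
    openai.models.push({
      ...model,
      // Per-model overrides keep the ChatGPT backend reachable even though
      // the openai provider's own baseUrl points at api.openai.com.
      baseUrl: model.baseUrl ?? codex.baseUrl,
      api: renameApi(model.api ?? codex.api),
    });
    moved.push(model.id);
  }

  delete providers["openai-codex"];
  return moved;
}

const providers = {
  openai: {
    baseUrl: "https://api.openai.com/v1",
    models: [{ id: "text-embedding-3-small" }],
  },
  "openai-codex": {
    baseUrl: "https://chatgpt.com/backend-api",
    api: "openai-codex-responses",
    models: [{ id: "gpt-5.5" }],
  },
};

const moved = mergeShadowedCodexProvider(providers);
console.log(moved); // [ 'gpt-5.5' ]
console.log(providers.openai.models.map((m) => m.id));
// [ 'text-embedding-3-small', 'gpt-5.5' ]
```

Skipping ids that already exist in openai is a deliberate choice in this sketch: it leaves any hand-written canonical definition untouched instead of overwriting it.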

A secondary, lower-priority nicety: when a model route (e.g. openai/gpt-5.5) becomes unresolvable as a result of a migration, surfacing a doctor warning would have made this immediately diagnosable.
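As a rough illustration of such a check, the sketch below flags routes that no configured provider defines. It assumes routes have the form "provider/model", inferred from openai/gpt-5.5. `findUnresolvableRoutes` is a hypothetical helper, not an existing doctor API.

```javascript
// Hypothetical check: return the "provider/model" routes that do not match
// any model id under the named provider.
function findUnresolvableRoutes(config, routes) {
  const providers = config.models?.providers ?? {};
  return routes.filter((route) => {
    const slash = route.indexOf("/");
    if (slash < 0) return true; // not in provider/model form
    const providerId = route.slice(0, slash);
    const modelId = route.slice(slash + 1);
    const models = providers[providerId]?.models ?? [];
    return !models.some((m) => m.id === modelId);
  });
}

// Post-migration state from this report: only the embedding model is left.
const postMigration = {
  models: {
    providers: { openai: { models: [{ id: "text-embedding-3-small" }] } },
  },
  agents: { defaults: { model: { primary: "openai/gpt-5.5" } } },
};

const broken = findUnresolvableRoutes(postMigration, [
  postMigration.agents.defaults.model.primary,
]);
for (const route of broken) {
  console.warn(`Model route ${route} no longer resolves to any configured model.`);
}
```

Running a check like this before and after a migration and comparing the two results would point directly at routes the migration itself broke.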

Workaround (for anyone hitting this)

  1. Add the chat model back into the canonical openai provider's models[], using the new api id and a per-model baseUrl:
    {
      "id": "gpt-5.5",
      "name": "GPT-5.5",
      "baseUrl": "https://chatgpt.com/backend-api",
      "api": "openai-chatgpt-responses",
      "reasoning": false,
      "input": ["text"],
      "cost": { "input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0 },
      "contextWindow": 200000,
      "contextTokens": 195000,
      "maxTokens": 8192
    }
    (Do not recreate a separate openai-codex provider — it fails 6.1 schema validation, and doctor --fix would remove it again.)
  2. Run openclaw doctor --fix to rename the dangling openai-codex:<account> auth profile to the canonical openai:<account> and rewrite references. Existing OAuth credentials are reused — no re-login is required.
  3. Restart the gateway. openclaw config validate should be clean and agents respond again.

Thanks again for the great work, and happy to help confirm a fix against this configuration shape.

Metadata

Labels

  • P1 - High-priority user-facing bug, regression, or broken workflow.
  • clawsweeper:fix-shape-clear - ClawSweeper found a clear likely implementation shape for this issue.
  • clawsweeper:queueable-fix - ClawSweeper marked this issue as an existing queue_fix_pr work candidate.
  • clawsweeper:source-repro - ClawSweeper found a high-confidence source-level issue reproduction.
  • impact:auth-provider - Auth, provider routing, model choice, or SecretRef resolution may break.
  • impact:data-loss - Can lose, corrupt, or silently drop user/session/config data.
  • impact:message-loss - Channel message delivery can be lost, duplicated, or misrouted.
  • issue-rating: 🦞 diamond lobster - Very strong issue quality with high-confidence source-level or clear reproduction.
