[Bug]: v2026.5.28 — fallback iterator leaks one candidate's modelId into every subsequent provider lookup; produces doubled-prefix errors fleet-wide #88560

@cjalden

Description

Bug type

Regression (worked on v2026.5.22, broken on v2026.5.27+).

Beta release blocker

No (but production-breaking on the v2026.5.28 stable release).

Summary

Two compounding regressions in v2026.5.28 break failover for any agent whose agents.defaults.models map contains a fully-qualified key like "anthropic/claude-haiku-4-5":

  1. Doubled provider prefix. OC resolves agents.defaults.models["anthropic/claude-haiku-4-5"] with params.modelId = "anthropic/claude-haiku-4-5" (the key string, not stripped). Downstream code re-prepends the provider, producing "anthropic/anthropic/claude-haiku-4-5".
  2. Fallback iterator state leak. Once a candidate fails with this bad id, every subsequent candidate in the fallback chain is queried using the same leaked modelId — only the provider prefix swaps. So sonnet-4-6 → opus-4-7 → grok-4 → gemini-2.5-pro → gpt-4o → ... all fail with errors of the form <that-provider>/anthropic/claude-haiku-4-5.

Net effect: agents with any haiku reference in their model chain have no working fallback path at all. Heartbeats, cron jobs, and main-lane conversations all fail.
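The two regressions can be modelled in a few lines. This is a hypothetical reconstruction written only to illustrate the symptom; the function names (resolveCatalogKeyBuggy, qualifiedRef, walkFallbacksBuggy) are invented for this sketch and are not taken from the OpenClaw source.

```typescript
// Hypothetical model of the failure described above; NOT the OpenClaw source.
type Candidate = { provider: string; modelId: string };

// Regression 1: the catalog key is reused as the modelId without stripping
// its provider prefix.
function resolveCatalogKeyBuggy(key: string): Candidate {
  const provider = key.split("/")[0];
  return { provider, modelId: key }; // modelId still carries "anthropic/"
}

// Downstream code prepends the provider unconditionally.
function qualifiedRef(c: Candidate): string {
  return `${c.provider}/${c.modelId}`;
}

// Regression 2: the modelId captured from one candidate is reused for every
// later candidate; only the provider is taken from the candidate itself.
function walkFallbacksBuggy(leakedKey: string, chain: string[]): string[] {
  const leaked = resolveCatalogKeyBuggy(leakedKey);
  return chain.map((ref) => {
    const provider = ref.split("/")[0];
    return qualifiedRef({ provider, modelId: leaked.modelId });
  });
}

const attempted = walkFallbacksBuggy("anthropic/claude-haiku-4-5", [
  "anthropic/claude-sonnet-4-6",
  "xai/grok-4",
  "google/gemini-2.5-pro",
]);
// attempted is:
// [ "anthropic/anthropic/claude-haiku-4-5",
//   "xai/anthropic/claude-haiku-4-5",
//   "google/anthropic/claude-haiku-4-5" ]
```

The ids produced by this model match the shape of the errors in the logs: the same trailing id, with only the leading provider changing.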

This is related to but distinct from #88517 — that report covers the case where payload.model = anthropic/claude-haiku-4-5 is parsed cleanly (single-prefix error, the resolver gets provider=anthropic, modelId=claude-haiku-4-5). Our case is the catalog-keyed form where the resolver never strips the prefix from the dict key. Likely the same underlying fix in normalizeStaticProviderModelId would cover both, but the symptom surfaces differ.

Also related to #77167 (double-prefix nvidia, closed as implemented by clawsweeper; the cited fix in model-ref-shared.test.ts:5 only covers nvidia, not anthropic) and to #88470 (the openai-codex/anthropic/... strings in our fallback chain match its codex-runtime-prefix variant, so it belongs to the same root-cause family).

Steps to reproduce

  1. Configure an agent with the standard prefixed key form in its model catalog:
{
  "agents": {
    "defaults": {
      "model": {
        "primary": "anthropic/claude-sonnet-4-6",
        "fallbacks": ["anthropic/claude-haiku-4-5", "anthropic/claude-opus-4-7"]
      },
      "models": {
        "anthropic/claude-sonnet-4-6": {},
        "anthropic/claude-haiku-4-5": {},
        "anthropic/claude-opus-4-7": {}
      }
    }
  },
  "models": {}
}
  2. Trigger any session (cron, heartbeat, or message).
  3. Observe [model-fallback/decision] reason=model_not_found detail=Unknown model: anthropic/anthropic/claude-haiku-4-5 in gateway.err.log.
  4. Observe that all fallback candidates fail with the same leaked modelId rather than each being queried on its own merits.

Expected behavior

Both of the following should hold:

  • agents.defaults.models keys of the form "anthropic/claude-haiku-4-5" should resolve via the bundled static catalog without requiring models.providers["anthropic"].models[] to be populated (the v2026.5.22 behaviour); and
  • Each fallback candidate should be looked up using its own modelId, not the prior candidate's.
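As a minimal sketch of the second expectation: each candidate's provider and modelId are derived from that candidate's own reference, with no state shared between iterations. parseModelRef and planFallbacks are hypothetical names used for illustration, not OpenClaw APIs.

```typescript
// Hypothetical sketch of the expected behaviour; names are illustrative only.
type ModelRef = { provider: string; modelId: string };

// Split a fully-qualified ref at the first "/" only, so that model ids which
// themselves contain "/" are preserved intact.
function parseModelRef(ref: string): ModelRef {
  const slash = ref.indexOf("/");
  if (slash <= 0 || slash === ref.length - 1) {
    throw new Error(`Expected "provider/modelId", got: ${ref}`);
  }
  return { provider: ref.slice(0, slash), modelId: ref.slice(slash + 1) };
}

// Every candidate is parsed from its own ref; nothing is carried over from
// the previous candidate.
function planFallbacks(primary: string, fallbacks: string[]): ModelRef[] {
  return [primary, ...fallbacks].map(parseModelRef);
}

const plan = planFallbacks("anthropic/claude-sonnet-4-6", [
  "anthropic/claude-haiku-4-5",
  "anthropic/claude-opus-4-7",
]);
```

With the config from the reproduction steps, plan holds three entries whose modelId values are claude-sonnet-4-6, claude-haiku-4-5, and claude-opus-4-7 respectively.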

Actual behavior

From a single session attempting failover through 9 candidates:

[diagnostic] message processed: ... outcome=error ... error="FallbackSummaryError: All models failed (9):
  anthropic/claude-sonnet-4-6: Unknown model: anthropic/anthropic/claude-haiku-4-5 ... (model_not_found)
| openai/gpt-4o:               Requested agent harness "codex" does not support openai/anthropic/claude-haiku-4-5 ...
| xai/grok-4:                  Unknown model: xai/anthropic/claude-haiku-4-5 (model_not_found)
| google/gemini-2.5-pro:       Unknown model: google/anthropic/claude-haiku-4-5 (model_not_found)
| anthropic/claude-haiku-4-5:  Unknown model: anthropic/anthropic/claude-haiku-4-5 ... (model_not_found)
| openai/gpt-4o-mini:          Requested agent harness "codex" does not support openai/anthropic/claude-haiku-4-5 ...
| google/gemini-2.0-flash:     Unknown model: google/anthropic/claude-haiku-4-5 (model_not_found)
| xai/grok-4-mini:             Unknown model: xai/anthropic/claude-haiku-4-5 (model_not_found)
| anthropic/claude-opus-4-7:   Unknown model: anthropic/anthropic/claude-haiku-4-5 ..."

Note the modelId anthropic/claude-haiku-4-5 persists through every candidate; only the leading provider prefix changes per candidate. The candidate's own modelId is never queried.
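For anyone checking their own logs, a small diagnostic sketch follows. It assumes the error text uses the two phrasings seen above ("Unknown model: " and "does not support ") and groups every multi-segment reference by the part after the first "/". leakedModelIds is a hypothetical helper, not part of OpenClaw.

```typescript
// Diagnostic sketch: group suspicious refs in a FallbackSummaryError message
// by the id that follows the leading provider segment.
function leakedModelIds(summary: string): Record<string, string[]> {
  // Assumption: offending refs follow one of these two phrases.
  const marker = /(?:Unknown model: |does not support )(\S+)/g;
  const byTail: Record<string, string[]> = {};
  let match: RegExpExecArray | null;
  while ((match = marker.exec(summary)) !== null) {
    const parts = match[1].split("/");
    if (parts.length < 3) continue; // a single prefix is the normal form
    const tail = parts.slice(1).join("/");
    (byTail[tail] = byTail[tail] || []).push(parts[0]);
  }
  return byTail;
}

const sample = [
  "anthropic/claude-sonnet-4-6: Unknown model: anthropic/anthropic/claude-haiku-4-5 ... (model_not_found)",
  '| openai/gpt-4o: Requested agent harness "codex" does not support openai/anthropic/claude-haiku-4-5 ...',
  "| xai/grok-4: Unknown model: xai/anthropic/claude-haiku-4-5 (model_not_found)",
].join("\n");

const leaks = leakedModelIds(sample);
// leaks is { "anthropic/claude-haiku-4-5": ["anthropic", "openai", "xai"] }
```

A single trailing id shared across several different providers is the signature of the state leak, as opposed to the doubled prefix alone.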

Affected agents

6 of 7 agents in our fleet are affected, ranging from 17 to 953 failures over 72 hours. The one unaffected agent is the only one whose agents.defaults.models does not contain a haiku entry.

Source pointers

In the shipped bundle on disk (/opt/homebrew/lib/node_modules/openclaw/dist/):

  • model-B0BrTDWx.js, buildUnknownModelError: builds the error string as raw ${params.provider}/${params.modelId} with no normalization, so an already-prefixed modelId becomes the doubled form.
  • model-B0BrTDWx.js, buildMissingProviderModelRegistrationHint: uses modelKey(provider, modelId) (which DOES normalize via a startsWith check), but then in the suggested-fix string it interpolates the unnormalized params.modelId. Inconsistent normalization between sibling code paths in the same function.
  • model-ref-shared-DKdSOt8D.js, normalizeStaticProviderModelId: special-cases google/openrouter/xai/together but returns anthropic model ids unchanged, even when the model id already contains the anthropic/ prefix.
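One possible shape for a fix to the doubled prefix, sketched under the assumption that an idempotent prefix strip is acceptable for every provider. The names below (stripProviderPrefix, qualifiedKey, unknownModelMessage) are hypothetical and do not correspond to functions in the bundle; the point is only that the key and the error string go through the same normalization.

```typescript
// Hypothetical fix shape; function names are illustrative and not taken
// from the shipped bundle.

// Remove the provider prefix only if the id already starts with it, so that
// calling this twice gives the same result as calling it once.
function stripProviderPrefix(provider: string, modelId: string): string {
  const prefix = `${provider}/`;
  return modelId.startsWith(prefix) ? modelId.slice(prefix.length) : modelId;
}

// Build the fully-qualified key from the normalized id.
function qualifiedKey(provider: string, modelId: string): string {
  return `${provider}/${stripProviderPrefix(provider, modelId)}`;
}

// Error strings reuse the same key, so the message and the lookup cannot
// disagree about normalization.
function unknownModelMessage(provider: string, modelId: string): string {
  return `Unknown model: ${qualifiedKey(provider, modelId)}`;
}
```

For example, qualifiedKey("anthropic", "anthropic/claude-haiku-4-5") and qualifiedKey("anthropic", "claude-haiku-4-5") both yield "anthropic/claude-haiku-4-5". Note that qualifiedKey("xai", "anthropic/claude-haiku-4-5") still yields "xai/anthropic/claude-haiku-4-5": this normalization addresses only the doubled prefix and deliberately does not hide the iterator leak, which needs its own fix.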

Workaround used

For #88517 the reporter added { "id": "claude-haiku-4-5" } (unprefixed) to models.providers["anthropic"].models[]. For our catalog-keyed variant the same workaround would need the prefixed form per the hint text, which is itself suspect. Pinning to openclaw@2026.5.22 restores correct behaviour pending an upstream fix.

OpenClaw version

OpenClaw 2026.5.28 (e932160). First errors appeared in our logs 2026-05-28 22:02 PT (~10h after v2026.5.27 was released), so likely introduced in v2026.5.27.

Operating system

macOS Darwin 25.5.0 (arm64), Mac Mini M4.

Metadata

Labels

  • P1: High-priority user-facing bug, regression, or broken workflow.
  • clawsweeper:fix-shape-clear: ClawSweeper found a clear likely implementation shape for this issue.
  • clawsweeper:queueable-fix: ClawSweeper marked this issue as an existing queue_fix_pr work candidate.
  • clawsweeper:source-repro: ClawSweeper found a high-confidence source-level issue reproduction.
  • impact:auth-provider: Auth, provider routing, model choice, or SecretRef resolution may break.
  • issue-rating: 🦞 diamond lobster: Very strong issue quality with high-confidence source-level or clear reproduction.
