fix(agents): preserve fallback candidates in live sessions by goldmar · Pull Request #57227 · openclaw/openclaw

goldmar · 2026-03-29T18:12:03Z

Summary

stop live-session model selection from snapping back to the agent default when there is no explicit session override
preserve normal live /model behavior by still honoring stored session overrides from the session store
add regression coverage for agent-session fallback so cross-provider retries keep running after a primary rate limit

Root cause

resolveLiveSessionModelSelection() treated the agent default model as the persisted live-session selection whenever an agentId existed. During fallback, the embedded runner compared the current retry candidate against that agent default and threw LiveSessionModelSwitchError, which made internal fallback candidates look like user-requested live model switches.

Fix

Use the current attempt's provider/model as the default live-session selection baseline. Only switch away when the session store has an explicit override.

Validation

pnpm --dir /home/openclaw/openclaw exec vitest run src/agents/live-model-switch.test.ts
pnpm --dir /home/openclaw/openclaw exec vitest run --config vitest.e2e.config.ts src/agents/model-fallback.run-embedded.e2e.test.ts
rebuilt local dist/, restarted the gateway, reverted the live config back to anthropic/claude-sonnet-4-6, and verified a direct agent turn fell back to openai-codex/gpt-5.4
local command used for the live check:
- openclaw agent --agent main --message 'Reply with FALLBACK_FIX_OK and nothing else.' --json
observed result for run a2ac9644-34bf-41db-9457-50261e59b067:
- Anthropic primary returned 429 rate_limit_error
- fallback then succeeded on openai-codex/gpt-5.4
- no fresh live session model switch detected before attempt log entries were emitted for that run

greptile-apps · 2026-03-29T18:13:24Z

Greptile Summary

This PR fixes a bug in resolveLiveSessionModelSelection() where cross-provider fallback candidates were incorrectly treated as user-requested live model switches for agent sessions. The root cause was that when an agentId was present, the function used the agent's configured default model as the session baseline instead of the current attempt's provider/model — causing hasDifferentLiveSessionModelSelection to fire a LiveSessionModelSwitchError on every fallback attempt.

Changes:

live-model-switch.ts: Removes the resolveDefaultModelForAgent call and uses params.defaultProvider/params.defaultModel directly as the baseline. The call site in pi-embedded-runner/run.ts already passes the current attempt's provider and model for these fields, so no change was needed there.
live-model-switch.test.ts: Removes the now-unnecessary resolveDefaultModelForAgent mock and adds a dedicated test for the no-override fallback path.
model-fallback.run-embedded.e2e.test.ts: Adds an e2e regression test covering cross-provider fallback for agent sessions without explicit session store overrides (agentId: "main").

Confidence Score: 5/5

Safe to merge — targeted, well-understood fix with matching unit and e2e regression coverage.

The change is minimal and surgical: one function call removed, two field accesses simplified to use the already-correct caller-supplied values. The call site in run.ts already passes the current attempt's provider/model, so the fix is properly integrated. New unit test and e2e regression test both cover the exact scenario described in the root cause. No P0 or P1 issues found.

No files require special attention.

Important Files Changed

Filename	Overview
src/agents/live-model-switch.ts	Core fix: removes resolveDefaultModelForAgent call and uses params.defaultProvider/defaultModel directly as the session baseline, preventing false LiveSessionModelSwitchError during fallback.
src/agents/live-model-switch.test.ts	Removes now-unnecessary resolveDefaultModelForAgent mock and adds a dedicated unit test for the no-override fallback path.
src/agents/model-fallback.run-embedded.e2e.test.ts	Adds agentId param to runEmbeddedFallback and a new e2e regression test covering cross-provider fallback for agent sessions without explicit session store overrides.

_{Reviews (1): Last reviewed commit: "fix(agents): preserve fallback candidate..." | Re-trigger Greptile}

goldmar · 2026-03-29T18:28:30Z

Follow-up validation after local deploy:

rebuilt the patched repo and restarted the local gateway on build 0a9bfa8
exercised the previously failing Telegram/live-session path directly against session 27c82782-4e2a-40a3-a626-f7b94b796154

command used:

openclaw agent --agent main --session-id 27c82782-4e2a-40a3-a626-f7b94b796154 --message 'Reply with POST_DEPLOY_TELEGRAM_FALLBACK_OK and nothing else.' --json\n  ```\n- result completed on `openai-codex/gpt-5.4` with payload `POST_DEPLOY_TELEGRAM_FALLBACK_OK`\n- fresh logs for run `f2eded73-809c-4d58-a0ff-17d6316a343f` showed Anthropic `429` on the primary attempt, then `model_fallback_decision` `candidate_succeeded` for `openai-codex/gpt-5.4`\n- importantly, there were no new `live session model switch detected before attempt` lines for that run\n\nSo the follow-up commit fixes the remaining Telegram/live-session regression that still reproduced after the first patch.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 85e00a7252

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector · 2026-03-29T18:32:45Z

+  if (authProfileIdSource !== "user") {
+    return {
+      authProfileId: undefined,
+      authProfileIdSource: undefined,
+    };


Preserve auto auth overrides in live-selection baseline

Dropping every non-user authProfileOverride here makes persisted session state disagree with the runner’s current state in normal flows where the override source is auto (for example, overrides persisted by resolveSessionAuthProfileOverride during /btw, then forwarded by runAgentAttempt into runEmbeddedPiAgent). In that case resolveLiveSessionModelSelection now returns authProfileId: undefined, hasDifferentLiveSessionModelSelection flags a change before any attempt, and the run throws LiveSessionModelSwitchError, which can skip the primary candidate or fail the turn when no fallback remains.

Useful? React with 👍 / 👎.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b6cb6d8cf7

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector · 2026-03-29T18:47:13Z

+  const authProfileIdSource =
+    entry.authProfileOverrideSource ??
+    (typeof entry.authProfileOverrideCompactionCount === "number" ? "auto" : "user");


Keep unknown auth override source from forcing live switch

When a session entry has authProfileOverride but no authProfileOverrideSource (legacy/manual store data), this fallback now coerces the source to "user". runEmbeddedPiAgent still builds current from the raw session entry source (often undefined), so hasDifferentLiveSessionModelSelection() treats the same profile ID as a model-switch request and throws LiveSessionModelSwitchError before any attempt runs. On CLI paths that do not catch and rewrite this error, that can cause the turn to fail after exhausting candidates without actually calling a model.

Useful? React with 👍 / 👎.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 85037da0e2

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector · 2026-03-29T19:04:27Z

+    entry.authProfileOverrideSource === "auto"
+      ? "auto"
+      : normalizeSelectedAuthProfileSource(
+          authProfileId,
+          typeof entry.authProfileOverrideCompactionCount === "number" ? "auto" : "user",


Honor explicit user source before compaction fallback

This source resolution path can misclassify persisted manual profile locks: if a session entry contains authProfileOverrideSource: "user" plus legacy/stale authProfileOverrideCompactionCount, the explicit user source is ignored and recomputed as "auto", so resolveLiveSessionModelSelection drops the auth profile entirely. In that state, runs that currently have a user-pinned profile can be treated as changed and throw LiveSessionModelSwitchError before attempts. The compaction-count fallback should only run when authProfileOverrideSource is missing, not when it is explicitly "user".

Useful? React with 👍 / 👎.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a82be707f6

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector · 2026-03-29T19:14:49Z

+    entry.authProfileOverrideSource ??
+    (typeof entry.authProfileOverrideCompactionCount === "number" ? "auto" : "user");


Handle legacy auto auth overrides without source metadata

This fallback infers "auto" when authProfileOverrideSource is missing but authProfileOverrideCompactionCount exists, and resolvePersistedAuthProfileSelection() then drops the profile entirely. In CLI/agent runs, runEmbeddedPiAgent builds current from the raw session entry source (often still undefined for legacy stores), so hasDifferentLiveSessionModelSelection() treats current authProfileId vs next undefined as a real switch and can throw LiveSessionModelSwitchError before attempts. This is a regression for legacy session rows that have compaction metadata but no source field, causing unnecessary candidate skips/failures during fallback.

Useful? React with 👍 / 👎.

ether-btc · 2026-03-29T19:48:40Z

Staff Engineer Review: APPROVED

PR: fix(agents): preserve fallback candidates in live sessions
Author: goldmar

Root Cause Analysis: CORRECT

The fix properly addresses the root cause. resolveLiveSessionModelSelection() was using resolveDefaultModelForAgent() as the baseline whenever an agentId existed. During fallback retries, the runner compared the retry candidate (e.g., openai:gpt-5.4) against the agent-default baseline (e.g., anthropic:claude-opus-4-6) and threw LiveSessionModelSwitchError — treating internal retry candidates as user-initiated live switches.

The fix removes resolveDefaultModelForAgent() entirely and uses the passed-in defaultProvider/defaultModel (the current attempt's provider/model) as the baseline. Cross-provider fallback now works because the baseline moves with each retry attempt.

Code Quality: SOUND

normalizeSelectedAuthProfileSource — Clean utility. Correctly normalizes authProfileOverrideSource to auto | user | undefined. No edge cases missed.

resolvePersistedAuthProfileSelection — Correctly filters out auto-selected profiles during fallback. The compaction metadata heuristic is a sound backwards-compat approach for distinguishing legacy data.

hasDifferentLiveSessionModelSelection — The new comparison logic handles all edge cases:

Same profile + different source → detected as change
Missing → present → detected as change
Present → missing → detected as change
Both undefined → not a change

Return value consistency — The final return is cleaner: always returns authProfileIdSource directly. resolvePersistedAuthProfileSelection already handles the undefined case, so behavior is preserved.

Test Quality: STRONG

Unit tests cover all auth profile source edge cases including the tricky legacy/stale compaction metadata scenarios. The E2E tests directly validate the bug:

"falls back across providers for agent sessions without explicit model overrides" — the core fix
"ignores auto-selected persisted auth profiles when crossing providers during fallback" — regression coverage

Mock cleanup is correct: resolveDefaultModelForAgentMock was properly removed since that function is no longer called.

Minor Observation (non-blocking)

The compaction-count heuristic is a reasonable backwards-compat inference, but it's not obvious from the call site. A brief code comment would help future readers. Not worth blocking over.

VERDICT: APPROVED

The fix is correct, the tests are thorough, and the auth profile source handling is more rigorous than before. No blockers.

goldmar · 2026-03-29T21:19:59Z

@joshavant @tyler6204 I addressed the follow-up review comments and the current checks are looking healthy again. This should be ready for another look when you have a minute.

goldmar · 2026-03-30T05:04:03Z

Looks like this was independently addressed in bd89e07 on main — closing this one. Thanks for the quick fix!

fix(agents): preserve fallback candidates in live sessions

0a9bfa8

openclaw-barnacle Bot added agents Agent runtime and tooling size: S labels Mar 29, 2026

fix(agents): ignore auto auth overrides in live switches

85e00a7

chatgpt-codex-connector Bot reviewed Mar 29, 2026

View reviewed changes

Agents: tolerate auto auth live baseline drift

b6cb6d8

openclaw-barnacle Bot added size: M and removed size: S labels Mar 29, 2026

chatgpt-codex-connector Bot reviewed Mar 29, 2026

View reviewed changes

Agents: normalize legacy live auth source

85037da

chatgpt-codex-connector Bot reviewed Mar 29, 2026

View reviewed changes

Agents: preserve explicit auth profile source

a82be70

chatgpt-codex-connector Bot reviewed Mar 29, 2026

View reviewed changes

Agents: preserve legacy auto auth fallback state

86af8c9

goldmar closed this Mar 30, 2026

goldmar deleted the fix/live-session-fallback-regression branch March 30, 2026 05:06

ms7-git mentioned this pull request Mar 31, 2026

[Bug]: LiveSessionModelSwitchError in cron jobs with payload.model override on 2026.3.28 #58459

Closed

		entry.authProfileOverrideSource ??
		(typeof entry.authProfileOverrideCompactionCount === "number" ? "auto" : "user");

Uh oh!

Conversation

goldmar commented Mar 29, 2026

Summary

Root cause

Fix

Validation

Uh oh!

greptile-apps Bot commented Mar 29, 2026

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Uh oh!

goldmar commented Mar 29, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Mar 29, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Mar 29, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Mar 29, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Mar 29, 2026

Choose a reason for hiding this comment

Uh oh!

ether-btc commented Mar 29, 2026

Staff Engineer Review: APPROVED

Root Cause Analysis: CORRECT

Code Quality: SOUND

Test Quality: STRONG

Minor Observation (non-blocking)

VERDICT: APPROVED

Uh oh!

goldmar commented Mar 29, 2026

Uh oh!

goldmar commented Mar 30, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants