fix(agents): preserve fallback candidates in live sessions#57227
fix(agents): preserve fallback candidates in live sessions#57227goldmar wants to merge 6 commits into
Conversation
Greptile SummaryThis PR fixes a bug in Changes:
Confidence Score: 5/5Safe to merge — targeted, well-understood fix with matching unit and e2e regression coverage. The change is minimal and surgical: one function call removed, two field accesses simplified to use the already-correct caller-supplied values. The call site in run.ts already passes the current attempt's provider/model, so the fix is properly integrated. New unit test and e2e regression test both cover the exact scenario described in the root cause. No P0 or P1 issues found. No files require special attention.
|
| Filename | Overview |
|---|---|
| src/agents/live-model-switch.ts | Core fix: removes resolveDefaultModelForAgent call and uses params.defaultProvider/defaultModel directly as the session baseline, preventing false LiveSessionModelSwitchError during fallback. |
| src/agents/live-model-switch.test.ts | Removes now-unnecessary resolveDefaultModelForAgent mock and adds a dedicated unit test for the no-override fallback path. |
| src/agents/model-fallback.run-embedded.e2e.test.ts | Adds agentId param to runEmbeddedFallback and a new e2e regression test covering cross-provider fallback for agent sessions without explicit session store overrides. |
Reviews (1): Last reviewed commit: "fix(agents): preserve fallback candidate..." | Re-trigger Greptile
|
Follow-up validation after local deploy:
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 85e00a7252
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| if (authProfileIdSource !== "user") { | ||
| return { | ||
| authProfileId: undefined, | ||
| authProfileIdSource: undefined, | ||
| }; |
There was a problem hiding this comment.
Preserve auto auth overrides in live-selection baseline
Dropping every non-user authProfileOverride here makes persisted session state disagree with the runner’s current state in normal flows where the override source is auto (for example, overrides persisted by resolveSessionAuthProfileOverride during /btw, then forwarded by runAgentAttempt into runEmbeddedPiAgent). In that case resolveLiveSessionModelSelection now returns authProfileId: undefined, hasDifferentLiveSessionModelSelection flags a change before any attempt, and the run throws LiveSessionModelSwitchError, which can skip the primary candidate or fail the turn when no fallback remains.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b6cb6d8cf7
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| const authProfileIdSource = | ||
| entry.authProfileOverrideSource ?? | ||
| (typeof entry.authProfileOverrideCompactionCount === "number" ? "auto" : "user"); |
There was a problem hiding this comment.
Keep unknown auth override source from forcing live switch
When a session entry has authProfileOverride but no authProfileOverrideSource (legacy/manual store data), this fallback now coerces the source to "user". runEmbeddedPiAgent still builds current from the raw session entry source (often undefined), so hasDifferentLiveSessionModelSelection() treats the same profile ID as a model-switch request and throws LiveSessionModelSwitchError before any attempt runs. On CLI paths that do not catch and rewrite this error, that can cause the turn to fail after exhausting candidates without actually calling a model.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 85037da0e2
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| entry.authProfileOverrideSource === "auto" | ||
| ? "auto" | ||
| : normalizeSelectedAuthProfileSource( | ||
| authProfileId, | ||
| typeof entry.authProfileOverrideCompactionCount === "number" ? "auto" : "user", |
There was a problem hiding this comment.
Honor explicit user source before compaction fallback
This source resolution path can misclassify persisted manual profile locks: if a session entry contains authProfileOverrideSource: "user" plus legacy/stale authProfileOverrideCompactionCount, the explicit user source is ignored and recomputed as "auto", so resolveLiveSessionModelSelection drops the auth profile entirely. In that state, runs that currently have a user-pinned profile can be treated as changed and throw LiveSessionModelSwitchError before attempts. The compaction-count fallback should only run when authProfileOverrideSource is missing, not when it is explicitly "user".
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a82be707f6
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| entry.authProfileOverrideSource ?? | ||
| (typeof entry.authProfileOverrideCompactionCount === "number" ? "auto" : "user"); |
There was a problem hiding this comment.
Handle legacy auto auth overrides without source metadata
This fallback infers "auto" when authProfileOverrideSource is missing but authProfileOverrideCompactionCount exists, and resolvePersistedAuthProfileSelection() then drops the profile entirely. In CLI/agent runs, runEmbeddedPiAgent builds current from the raw session entry source (often still undefined for legacy stores), so hasDifferentLiveSessionModelSelection() treats current authProfileId vs next undefined as a real switch and can throw LiveSessionModelSwitchError before attempts. This is a regression for legacy session rows that have compaction metadata but no source field, causing unnecessary candidate skips/failures during fallback.
Useful? React with 👍 / 👎.
Staff Engineer Review: APPROVEDPR: fix(agents): preserve fallback candidates in live sessions Root Cause Analysis: CORRECTThe fix properly addresses the root cause. The fix removes Code Quality: SOUNDnormalizeSelectedAuthProfileSource — Clean utility. Correctly normalizes resolvePersistedAuthProfileSelection — Correctly filters out auto-selected profiles during fallback. The compaction metadata heuristic is a sound backwards-compat approach for distinguishing legacy data. hasDifferentLiveSessionModelSelection — The new comparison logic handles all edge cases:
Return value consistency — The final return is cleaner: always returns Test Quality: STRONGUnit tests cover all auth profile source edge cases including the tricky legacy/stale compaction metadata scenarios. The E2E tests directly validate the bug:
Mock cleanup is correct: Minor Observation (non-blocking)The compaction-count heuristic is a reasonable backwards-compat inference, but it's not obvious from the call site. A brief code comment would help future readers. Not worth blocking over. VERDICT: APPROVEDThe fix is correct, the tests are thorough, and the auth profile source handling is more rigorous than before. No blockers. |
|
@joshavant @tyler6204 I addressed the follow-up review comments and the current checks are looking healthy again. This should be ready for another look when you have a minute. |
|
Looks like this was independently addressed in bd89e07 on main — closing this one. Thanks for the quick fix! |
Summary
/modelbehavior by still honoring stored session overrides from the session storeRoot cause
resolveLiveSessionModelSelection()treated the agent default model as the persisted live-session selection whenever anagentIdexisted. During fallback, the embedded runner compared the current retry candidate against that agent default and threwLiveSessionModelSwitchError, which made internal fallback candidates look like user-requested live model switches.Fix
Use the current attempt's provider/model as the default live-session selection baseline. Only switch away when the session store has an explicit override.
Validation
pnpm --dir /home/openclaw/openclaw exec vitest run src/agents/live-model-switch.test.tspnpm --dir /home/openclaw/openclaw exec vitest run --config vitest.e2e.config.ts src/agents/model-fallback.run-embedded.e2e.test.tsdist/, restarted the gateway, reverted the live config back toanthropic/claude-sonnet-4-6, and verified a direct agent turn fell back toopenai-codex/gpt-5.4openclaw agent --agent main --message 'Reply with FALLBACK_FIX_OK and nothing else.' --jsona2ac9644-34bf-41db-9457-50261e59b067:429 rate_limit_erroropenai-codex/gpt-5.4live session model switch detected before attemptlog entries were emitted for that run