fix: handle LiveSessionModelSwitchError in cron isolated sessions#57972
fix: handle LiveSessionModelSwitchError in cron isolated sessions#57972obviyus merged 3 commits intoopenclaw:mainfrom
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 637d62a54e
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| provider = err.provider; | ||
| model = err.model; | ||
| fallbackProvider = err.provider; | ||
| fallbackModel = err.model; |
There was a problem hiding this comment.
Carry auth profile through model-switch retry
LiveSessionModelSwitchError is also used when only the auth profile changes (the live-selection diff includes authProfileId/authProfileIdSource), but this retry block updates only provider/model. Because runEmbeddedPiAgent is still called with the original authProfileId values captured earlier, a switch request that changes auth profile can keep throwing the same switch error and loop indefinitely inside while (true). Propagate err.authProfileId and err.authProfileIdSource into the retry state (and persisted session override as needed) before continue so the second attempt can actually converge.
Useful? React with 👍 / 👎.
Greptile SummaryThis PR fixes a regression (introduced in the 2026-03-28 release) where cron jobs with
Confidence Score: 5/5
Prompt To Fix All With AIThis is a comment left during a code review.
Path: src/cron/isolated-agent/run.live-session-model-switch.test.ts
Line: 133-146
Comment:
**Weak assertion on model-switch persistence**
The test only checks that `updateSessionStoreMock` was called at all — but `updateSessionStore` is also invoked during the pre-run persistence step (before the first `runPrompt` attempt, where `modelProvider` and `model` are written for the "in-progress" display). This means the assertion would pass even if the model-switch branch were deleted, as long as the pre-run call still fires.
To actually guard the model-switch path, consider asserting the _specific arguments_ passed on the call triggered by the switch, e.g.:
```typescript
const calls = updateSessionStoreMock.mock.calls;
// The second call should carry the switched model
const switchedEntry = calls[calls.length - 1][1]; // updater fn or direct arg
// Or inspect cronSession.sessionEntry.model directly if the harness exposes it
```
Alternatively, a stronger check would verify `callCount === 2` **and** that the second `updateSessionStoreMock` invocation contained `provider: "anthropic", model: "claude-sonnet-4-6"`.
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: src/cron/isolated-agent/run.ts
Line: 571-578
Comment:
**`authProfileId`/`authProfileIdSource` not updated on model switch**
The reference implementation in `agent-runner-execution.ts` (lines 619–622) also propagates `authProfileId` and `authProfileIdSource` from the error on every switch:
```typescript
params.followupRun.run.authProfileId = err.authProfileId;
params.followupRun.run.authProfileIdSource = err.authProfileId ? err.authProfileIdSource : undefined;
```
The cron runner omits these updates because `authProfileId` is a `const` at line 430. For isolated sessions, the auth profile is always resolved fresh before the first attempt (via `resolveSessionAuthProfileOverride`), so in practice the error's `authProfileId` should match the already-resolved value and this divergence is harmless. However, if the embedded runner ever returns a `LiveSessionModelSwitchError` with a _different_ auth-profile hint (e.g., the session store's `authProfileOverride` was written by an external agent between the resolution step and the actual run), the retry would proceed with the stale auth profile and could fail with a 401.
Consider converting `authProfileId` (and `authProfileIdSource`) to `let` and updating them in the catch block to stay aligned with the reference pattern.
How can I resolve this? If you propose a fix, please make it concise.Reviews (1): Last reviewed commit: "fix: handle LiveSessionModelSwitchError ..." | Re-trigger Greptile |
| } catch (err) { | ||
| if (err instanceof LiveSessionModelSwitchError) { | ||
| provider = err.provider; | ||
| model = err.model; | ||
| fallbackProvider = err.provider; | ||
| fallbackModel = err.model; | ||
| cronSession.sessionEntry.modelProvider = err.provider; | ||
| cronSession.sessionEntry.model = err.model; |
There was a problem hiding this comment.
authProfileId/authProfileIdSource not updated on model switch
The reference implementation in agent-runner-execution.ts (lines 619–622) also propagates authProfileId and authProfileIdSource from the error on every switch:
params.followupRun.run.authProfileId = err.authProfileId;
params.followupRun.run.authProfileIdSource = err.authProfileId ? err.authProfileIdSource : undefined;The cron runner omits these updates because authProfileId is a const at line 430. For isolated sessions, the auth profile is always resolved fresh before the first attempt (via resolveSessionAuthProfileOverride), so in practice the error's authProfileId should match the already-resolved value and this divergence is harmless. However, if the embedded runner ever returns a LiveSessionModelSwitchError with a different auth-profile hint (e.g., the session store's authProfileOverride was written by an external agent between the resolution step and the actual run), the retry would proceed with the stale auth profile and could fail with a 401.
Consider converting authProfileId (and authProfileIdSource) to let and updating them in the catch block to stay aligned with the reference pattern.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/cron/isolated-agent/run.ts
Line: 571-578
Comment:
**`authProfileId`/`authProfileIdSource` not updated on model switch**
The reference implementation in `agent-runner-execution.ts` (lines 619–622) also propagates `authProfileId` and `authProfileIdSource` from the error on every switch:
```typescript
params.followupRun.run.authProfileId = err.authProfileId;
params.followupRun.run.authProfileIdSource = err.authProfileId ? err.authProfileIdSource : undefined;
```
The cron runner omits these updates because `authProfileId` is a `const` at line 430. For isolated sessions, the auth profile is always resolved fresh before the first attempt (via `resolveSessionAuthProfileOverride`), so in practice the error's `authProfileId` should match the already-resolved value and this divergence is harmless. However, if the embedded runner ever returns a `LiveSessionModelSwitchError` with a _different_ auth-profile hint (e.g., the session store's `authProfileOverride` was written by an external agent between the resolution step and the actual run), the retry would proceed with the stale auth profile and could fail with a 401.
Consider converting `authProfileId` (and `authProfileIdSource`) to `let` and updating them in the catch block to stay aligned with the reference pattern.
How can I resolve this? If you propose a fix, please make it concise.The main agent runner catches LiveSessionModelSwitchError and retries with the requested model, but cron isolated sessions hit this error and fail immediately. This extends the retry to cover cron execution. When a cron job with `sessionTarget: 'isolated'` specifies a `model` different from the agent's primary, the embedded runner throws LiveSessionModelSwitchError (because the session initialized with the wrong model). The fix wraps the initial runPrompt call in a retry loop that catches this error, updates provider/model state, and re-runs — mirroring the existing retry logic in agent-runner-execution.ts. Fixes openclaw#57206
fd33bac to
baecfbb
Compare
obviyus
left a comment
There was a problem hiding this comment.
Reviewed latest changes; landing now.
@issaba1) * fix: handle LiveSessionModelSwitchError in cron isolated sessions The main agent runner catches LiveSessionModelSwitchError and retries with the requested model, but cron isolated sessions hit this error and fail immediately. This extends the retry to cover cron execution. When a cron job with `sessionTarget: 'isolated'` specifies a `model` different from the agent's primary, the embedded runner throws LiveSessionModelSwitchError (because the session initialized with the wrong model). The fix wraps the initial runPrompt call in a retry loop that catches this error, updates provider/model state, and re-runs — mirroring the existing retry logic in agent-runner-execution.ts. Fixes openclaw#57206 * fix: carry auth profile through cron model retry * fix: complete cron isolated model-switch retry (openclaw#57972) (thanks @issaba1) --------- Co-authored-by: Isaac Saba <isaacsaba@Isaacs-Mac-mini.local> Co-authored-by: Ayaan Zaidi <hi@obviy.us>
@issaba1) * fix: handle LiveSessionModelSwitchError in cron isolated sessions The main agent runner catches LiveSessionModelSwitchError and retries with the requested model, but cron isolated sessions hit this error and fail immediately. This extends the retry to cover cron execution. When a cron job with `sessionTarget: 'isolated'` specifies a `model` different from the agent's primary, the embedded runner throws LiveSessionModelSwitchError (because the session initialized with the wrong model). The fix wraps the initial runPrompt call in a retry loop that catches this error, updates provider/model state, and re-runs — mirroring the existing retry logic in agent-runner-execution.ts. Fixes openclaw#57206 * fix: carry auth profile through cron model retry * fix: complete cron isolated model-switch retry (openclaw#57972) (thanks @issaba1) --------- Co-authored-by: Isaac Saba <isaacsaba@Isaacs-Mac-mini.local> Co-authored-by: Ayaan Zaidi <hi@obviy.us>
@issaba1) * fix: handle LiveSessionModelSwitchError in cron isolated sessions The main agent runner catches LiveSessionModelSwitchError and retries with the requested model, but cron isolated sessions hit this error and fail immediately. This extends the retry to cover cron execution. When a cron job with `sessionTarget: 'isolated'` specifies a `model` different from the agent's primary, the embedded runner throws LiveSessionModelSwitchError (because the session initialized with the wrong model). The fix wraps the initial runPrompt call in a retry loop that catches this error, updates provider/model state, and re-runs — mirroring the existing retry logic in agent-runner-execution.ts. Fixes openclaw#57206 * fix: carry auth profile through cron model retry * fix: complete cron isolated model-switch retry (openclaw#57972) (thanks @issaba1) --------- Co-authored-by: Isaac Saba <isaacsaba@Isaacs-Mac-mini.local> Co-authored-by: Ayaan Zaidi <hi@obviy.us>
@issaba1) * fix: handle LiveSessionModelSwitchError in cron isolated sessions The main agent runner catches LiveSessionModelSwitchError and retries with the requested model, but cron isolated sessions hit this error and fail immediately. This extends the retry to cover cron execution. When a cron job with `sessionTarget: 'isolated'` specifies a `model` different from the agent's primary, the embedded runner throws LiveSessionModelSwitchError (because the session initialized with the wrong model). The fix wraps the initial runPrompt call in a retry loop that catches this error, updates provider/model state, and re-runs — mirroring the existing retry logic in agent-runner-execution.ts. Fixes openclaw#57206 * fix: carry auth profile through cron model retry * fix: complete cron isolated model-switch retry (openclaw#57972) (thanks @issaba1) --------- Co-authored-by: Isaac Saba <isaacsaba@Isaacs-Mac-mini.local> Co-authored-by: Ayaan Zaidi <hi@obviy.us>
@issaba1) * fix: handle LiveSessionModelSwitchError in cron isolated sessions The main agent runner catches LiveSessionModelSwitchError and retries with the requested model, but cron isolated sessions hit this error and fail immediately. This extends the retry to cover cron execution. When a cron job with `sessionTarget: 'isolated'` specifies a `model` different from the agent's primary, the embedded runner throws LiveSessionModelSwitchError (because the session initialized with the wrong model). The fix wraps the initial runPrompt call in a retry loop that catches this error, updates provider/model state, and re-runs — mirroring the existing retry logic in agent-runner-execution.ts. Fixes openclaw#57206 * fix: carry auth profile through cron model retry * fix: complete cron isolated model-switch retry (openclaw#57972) (thanks @issaba1) --------- Co-authored-by: Isaac Saba <isaacsaba@Isaacs-Mac-mini.local> Co-authored-by: Ayaan Zaidi <hi@obviy.us>
Problem
When a cron job with
sessionTarget: 'isolated'specifies amodeldifferent from the agent's primary, the session throwsLiveSessionModelSwitchErrorand fails immediately.The embedded runner initializes the isolated session using the gateway's default agent model. When the cron payload carries a different model override, the runner detects the mismatch mid-execution and throws
LiveSessionModelSwitchError. The main agent runner (agent-runner-execution.ts) already catches this error in a retry loop and re-runs with the requested model — but the cron isolated session runner (cron/isolated-agent/run.ts) does not, so every affected cron job fails.Fix
Wrap the initial
runPrompt(commandBody)call in awhile (true)retry loop that catchesLiveSessionModelSwitchError, updates theprovider/modelstate (and persists it to the session store), andcontinues — mirroring the existing retry pattern inagent-runner-execution.tsexactly.Changed files:
src/cron/isolated-agent/run.ts— addsLiveSessionModelSwitchErrorimport + retry loop around the initialrunPromptcallsrc/cron/isolated-agent/run.live-session-model-switch.test.ts— new test suite covering retry success, session entry update, non-infinite-loop on repeated errors, and no-retry on other errorsImpact
Testing
src/cron/isolated-agent/run.tsandsrc/agents/live-model-switch.tsagent-runner-execution.ts(lines 615–625)Fixes #57206