Summary
Cron jobs that specify model: openai/gpt-5.4-nano in their payload are not using that model when they execute in isolated sessions. Instead, resolvePersistedLiveSelection() forces the session back to anthropic/claude-sonnet-4-6, overriding the cron's declared model intent. Under Anthropic overload conditions, this causes every cron to amplify the overload instead of gracefully degrading.
Root Cause
The scheduler fires multiple crons simultaneously (observed: 24 UUIDs in one burst). Each cron session goes through resolvePersistedLiveSelection(), which promotes the persisted Sonnet selection over payload.model. This means:
- Crons configured for nano/codex still hit Anthropic Sonnet
- A burst of 24 simultaneous crons = 24 simultaneous Sonnet requests under overload
- Each gets a 503, retries, and amplifies the cascade
Impact
Observed 2026-03-31: SLBE nightly optimizer (configured model: openai/gpt-5.4-nano) failed at 22:00 PT with model_fallback_decision: candidate_failed because the effective model was Sonnet, not nano.
Expected Behavior
- Isolated cron sessions should honor payload.model as the effective model
- resolvePersistedLiveSelection() should not apply to ephemeral/isolated cron sessions
- The scheduler should add jitter to cron bursts (stagger simultaneous crons by 2-5s each)
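The model precedence described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the types CronPayload and SessionContext, the field names persistedModel and defaultModel, and the function resolveEffectiveModel are all hypothetical, and the precedence shown for non-isolated sessions is an assumption based on the behavior reported in this issue.

```typescript
// Hypothetical shapes; the real payload and session types are not shown in this issue.
type Runtime = "isolated" | "persistent";

interface CronPayload {
  model?: string; // e.g. "openai/gpt-5.4-nano"
}

interface SessionContext {
  runtime: Runtime;
  persistedModel?: string; // stands in for what resolvePersistedLiveSelection() would restore
  defaultModel: string;
}

function resolveEffectiveModel(payload: CronPayload, session: SessionContext): string {
  if (session.runtime === "isolated") {
    // Isolated sessions are ephemeral: ignore the persisted selection and
    // treat payload.model as authoritative, falling back to the default.
    return payload.model ?? session.defaultModel;
  }
  // Assumed precedence for non-isolated sessions: the persisted selection wins,
  // matching the behavior this issue reports.
  return session.persistedModel ?? payload.model ?? session.defaultModel;
}
```

With this shape, a cron declaring openai/gpt-5.4-nano in an isolated session resolves to nano even when a Sonnet selection is persisted.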
Proposed Fix
- Skip resolvePersistedLiveSelection() for sessions with runtime: isolated. Isolated sessions are ephemeral and have no meaningful persisted state to restore
- Add scheduler jitter: when N crons are due at the same tick, spread them across a configurable window (default: 5s)
- Honor payload.model as authoritative for isolated sessions
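One possible shape for the scheduler jitter is sketched below. It assumes a hypothetical helper, spreadDelays, that is not part of the existing scheduler: the window is divided into one slot per due cron, and each cron starts at a random offset inside its own slot, so a burst is spread across the whole window rather than firing at once. The 5000 ms default mirrors the 5s window proposed above.

```typescript
// Hypothetical helper: compute a start delay (in ms) for each cron due at the same tick.
function spreadDelays(
  count: number,
  windowMs: number = 5000,
  rand: () => number = Math.random // injectable for deterministic tests
): number[] {
  if (count <= 1 || windowMs <= 0) {
    // Nothing to stagger: a single job (or a disabled window) starts immediately.
    return Array.from({ length: Math.max(count, 0) }, () => 0);
  }
  const slot = windowMs / count;
  const delays: number[] = [];
  for (let i = 0; i < count; i++) {
    // Job i lands somewhere inside [i * slot, (i + 1) * slot).
    delays.push(Math.floor(i * slot + rand() * slot));
  }
  return delays;
}
```

A scheduler could then start each due job with something like setTimeout(() => run(job), delays[i]), where run is whatever function executes a cron. For the observed burst of 24 crons, this puts roughly 208 ms between slot boundaries instead of issuing 24 simultaneous requests.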
Related
#24378 #32533