fix: prefer queued.run.agentId over session key parsing for model fallback resolution by derricksy · Pull Request #24137 · openclaw/openclaw

derricksy · 2026-02-23T04:04:26Z

Summary

When a subagent spawned via `sessions_spawn` hits a rate limit or quota error, the configured `model.fallbacks` chain does not trigger. The agent retries the same primary model repeatedly and fails permanently instead of falling back to the next model.

Root Cause

In two places, the queue/followup runner resolves the agent ID for fallback lookup by parsing it from the session key string:

```ts
// src/auto-reply/reply/followup-runner.ts
fallbacksOverride: resolveAgentModelFallbacksOverride(
  queued.run.config,
  resolveAgentIdFromSessionKey(queued.run.sessionKey), // ← brittle
),

// src/auto-reply/reply/agent-runner-utils.ts
fallbacksOverride: resolveAgentModelFallbacksOverride(
  run.config,
  resolveAgentIdFromSessionKey(run.sessionKey), // ← same issue
),
```

`resolveAgentIdFromSessionKey` calls `parseAgentSessionKey`, which expects keys of the form `agent:<agentId>:<rest>`. If `sessionKey` is missing or in any other format, it falls back to `DEFAULT_AGENT_ID` (the main agent). The main agent typically has no fallbacks configured, so `resolveFallbackCandidates` builds a single-entry candidates array and the fallback chain never fires.
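A minimal sketch of the parsing behavior described above, assuming the key shape is `agent:<agentId>:<rest>` and that `DEFAULT_AGENT_ID` is `"main"`. Both function bodies are hypothetical reconstructions for illustration, not OpenClaw's actual source; they only show how any unparseable key silently collapses to the default agent:

```typescript
// Hypothetical stand-in for OpenClaw's DEFAULT_AGENT_ID (value assumed).
const DEFAULT_AGENT_ID = "main";

interface ParsedAgentSessionKey {
  agentId: string;
  rest: string;
}

// Hypothetical sketch: accept only keys shaped like "agent:<agentId>:<rest>".
function parseAgentSessionKey(key: string | undefined): ParsedAgentSessionKey | null {
  if (!key) return null;
  const parts = key.split(":");
  if (parts.length < 3 || parts[0] !== "agent" || parts[1] === "") return null;
  return { agentId: parts[1], rest: parts.slice(2).join(":") };
}

// Any key that fails to parse falls back to the default (main) agent.
function resolveAgentIdFromSessionKey(key: string | undefined): string {
  return parseAgentSessionKey(key)?.agentId ?? DEFAULT_AGENT_ID;
}

console.log(resolveAgentIdFromSessionKey("agent:coder:subagent:123")); // logs: coder
console.log(resolveAgentIdFromSessionKey(undefined)); // logs: main
console.log(resolveAgentIdFromSessionKey("subagent:123")); // logs: main
```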

Both `queued.run.agentId` and `run.agentId` are already explicitly set on the run object (passed directly to `runEmbeddedPiAgent` in the same call site), making the string-parsing approach redundant and fragile.

Fix

Prefer the explicit `agentId` field directly, falling back to session key parsing only when it's absent:

```ts
// followup-runner.ts
fallbacksOverride: resolveAgentModelFallbacksOverride(
  queued.run.config,
  queued.run.agentId ?? resolveAgentIdFromSessionKey(queued.run.sessionKey),
),

// agent-runner-utils.ts
fallbacksOverride: resolveAgentModelFallbacksOverride(
  run.config,
  run.agentId ?? resolveAgentIdFromSessionKey(run.sessionKey),
),
```
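The precedence rule can be isolated into a small helper. This is a hypothetical sketch: `RunLike`, `agentIdFromSessionKey`, and `resolveFallbackAgentId` are illustrative names, not OpenClaw APIs, and the stub parser assumes `"main"` as the default agent. It shows that the explicit field wins whenever present and parsing only runs as a last resort:

```typescript
// Hypothetical minimal shape of the run object; only the fields used here.
interface RunLike {
  agentId?: string;
  sessionKey?: string;
}

// Hypothetical stand-in for resolveAgentIdFromSessionKey (default "main" assumed).
function agentIdFromSessionKey(key: string | undefined): string {
  const parts = key?.split(":") ?? [];
  return parts.length >= 3 && parts[0] === "agent" && parts[1] ? parts[1] : "main";
}

// Prefer the explicit agentId; parse the session key only when agentId is null/undefined.
function resolveFallbackAgentId(run: RunLike): string {
  return run.agentId ?? agentIdFromSessionKey(run.sessionKey);
}

console.log(resolveFallbackAgentId({ agentId: "coder", sessionKey: "subagent:42" })); // logs: coder
console.log(resolveFallbackAgentId({ sessionKey: "agent:coder:subagent:42" })); // logs: coder
console.log(resolveFallbackAgentId({})); // logs: main
```

`??` falls through only on `null`/`undefined`, so an empty-string `agentId` would be passed along unchanged; `||` would also route `""` to the session-key parser. Which is preferable depends on whether empty IDs can occur upstream, which the PR does not discuss.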

Verification

- Confirmed `queued.run.agentId` / `run.agentId` are set on all spawned subagent run objects
- Confirmed the 429 quota error from `openai-codex` is classified as `"rate_limit"` (not `"billing"`) by `classifyFailoverReason`, so error classification is not a separate issue
- Confirmed `resolveAgentModelFallbacksOverride` returns the configured fallbacks array when given the correct `agentId`
- Patched the compiled JS locally and restarted the gateway; the swarm is running cleanly across 5 isolated agents
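The checks above hinge on one property: a `"rate_limit"` failure can only advance to another model if the candidate list has more than one entry. The simplified loop below is a hypothetical sketch of candidate iteration, not OpenClaw's actual `resolveFallbackCandidates` or runner; `FailoverError`, `runWithFallbacks`, and the simulated `attempt` provider are assumptions made for illustration:

```typescript
type FailoverReason = "rate_limit" | "billing";

// Hypothetical error type carrying an already-classified failover reason.
class FailoverError extends Error {
  readonly reason: FailoverReason;
  constructor(reason: FailoverReason, message: string) {
    super(message);
    this.name = "FailoverError";
    this.reason = reason;
    // Keeps `instanceof FailoverError` working when compiled to ES5.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// Hypothetical sketch: try each candidate in order. A FailoverError advances
// to the next model; any other error aborts the chain immediately.
function runWithFallbacks<T>(
  candidates: string[],
  attempt: (model: string) => T,
): { model: string; result: T } {
  let lastError: unknown = new Error("no candidates");
  for (const model of candidates) {
    try {
      return { model, result: attempt(model) };
    } catch (err) {
      if (!(err instanceof FailoverError)) throw err;
      lastError = err;
    }
  }
  throw lastError;
}

// Simulated provider: the Codex primary is out of quota, everything else works.
function attempt(model: string): string {
  if (model.startsWith("openai-codex/")) {
    throw new FailoverError("rate_limit", `429 from ${model}`);
  }
  return `ok from ${model}`;
}

const primary = "openai-codex/gpt-5.2-codex";

// Two candidates: the rate limit on the primary advances to the fallback.
console.log(runWithFallbacks([primary, "kimi-coding/k2p5"], attempt).model); // logs: kimi-coding/k2p5

// One candidate (what the main-agent lookup produces): nothing to advance to.
try {
  runWithFallbacks([primary], attempt);
} catch (err) {
  console.log((err as Error).message); // logs: 429 from openai-codex/gpt-5.2-codex
}
```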

Reproduction

1. Configure an agent in `agents.list` with `model.primary = openai-codex/gpt-5.2-codex` and `model.fallbacks = [kimi-coding/k2p5, minimax-portal/MiniMax-M2.5]`
2. Spawn a subagent with `sessions_spawn`, setting `agentId` to that agent
3. Exhaust the agent's Codex token quota
4. Observe: the agent retries `openai-codex` 4x with `stopReason: error`; the fallback models are never attempted
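The reproduction config can be modeled as plain data to show why the resolved agent ID matters. The `AgentEntry`/`Config` types and the `fallbacksFor`/`candidatesFor` helpers are hypothetical simplifications of `agents.list`, `resolveAgentModelFallbacksOverride`, and candidate building; the real schema and functions may differ. The agent ID `"coder"` is invented, and the sketch assumes no global default fallbacks are configured:

```typescript
interface AgentEntry {
  id: string;
  model?: { primary?: string; fallbacks?: string[] };
}

interface Config {
  agents: { list: AgentEntry[] };
}

const config: Config = {
  agents: {
    list: [
      { id: "main" }, // hypothetical main agent with no fallbacks configured
      {
        id: "coder", // hypothetical ID for the agent from the reproduction steps
        model: {
          primary: "openai-codex/gpt-5.2-codex",
          fallbacks: ["kimi-coding/k2p5", "minimax-portal/MiniMax-M2.5"],
        },
      },
    ],
  },
};

// Hypothetical per-agent override lookup: undefined means "no override".
function fallbacksFor(cfg: Config, agentId: string): string[] | undefined {
  return cfg.agents.list.find((a) => a.id === agentId)?.model?.fallbacks;
}

// Hypothetical candidate list: primary first, then any fallbacks.
function candidatesFor(primary: string, fallbacks: string[] | undefined): string[] {
  return [primary, ...(fallbacks ?? [])];
}

const primaryModel = "openai-codex/gpt-5.2-codex";
console.log(candidatesFor(primaryModel, fallbacksFor(config, "coder")).length); // logs: 3
console.log(candidatesFor(primaryModel, fallbacksFor(config, "main")).length); // logs: 1
```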

Related Issues

Possibly related to #11972, #19249, #5744. Addresses symptom reported in #24102.

OpenClaw version tested

2026.2.17 (macOS 14.x, npm global install)

greptile-apps

1 file reviewed, 1 comment


greptile-apps · 2026-02-23T04:07:08Z

src/auto-reply/reply/followup-runner.ts

@@ -133,7 +133,7 @@ export function createFollowupRunner(params: {
          agentDir: queued.run.agentDir,
          fallbacksOverride: resolveAgentModelFallbacksOverride(
            queued.run.config,
-            resolveAgentIdFromSessionKey(queued.run.sessionKey),
+            queued.run.agentId ?? resolveAgentIdFromSessionKey(queued.run.sessionKey),


Incomplete fix: the same pattern exists in `agent-runner-utils.ts`. The `resolveModelFallbackOptions` function at `src/auto-reply/reply/agent-runner-utils.ts:140-151` still uses `resolveAgentIdFromSessionKey(run.sessionKey)` without preferring `run.agentId`. This function is called by `agent-runner-execution.ts:173` and `agent-runner-memory.ts:100`, so those code paths will still have the fallback issue.

Apply the same fix there:

```typescript
export function resolveModelFallbackOptions(run: FollowupRun["run"]) {
  return {
    cfg: run.config,
    provider: run.provider,
    model: run.model,
    agentDir: run.agentDir,
    fallbacksOverride: resolveAgentModelFallbacksOverride(
      run.config,
      run.agentId ?? resolveAgentIdFromSessionKey(run.sessionKey),
    ),
  };
}
```


…lback resolution

When a subagent spawned via sessions_spawn hits a rate limit, the model fallback chain fails to trigger because fallbacksOverride is resolved using resolveAgentIdFromSessionKey(queued.run.sessionKey), which requires the session key to be in 'agent:<agentId>:<rest>' format. If the session key is absent, undefined, or in an unexpected format, resolveAgentIdFromSessionKey falls back to DEFAULT_AGENT_ID (main agent), which typically has no fallbacks configured. The result is a single-candidate array and no model fallback.

queued.run.agentId is already explicitly set on the run object (it is passed directly to runEmbeddedPiAgent three lines below this call), making the session key string-parsing approach redundant and fragile.

Fix: prefer queued.run.agentId directly, falling back to session key parsing only when agentId is absent.

Fixes: resolves the symptom reported in openclaw#24102
Related: openclaw#11972, openclaw#19249, openclaw#5744

steipete · 2026-02-25T04:36:09Z

Closing this one as not needed in current flow.

Reasoning:

- Subagent/session flows already carry canonical `agent:<id>:...` keys.
- `followupRun.run.agentId` is already set at construction and used downstream.
- This PR also includes a large runtime log artifact (`openclaw-2026-02-23.log`), so even as hardening it would need cleanup.

If we still want defensive hardening, we can do a tiny clean follow-up with only:

- `queued.run.agentId ?? resolveAgentIdFromSessionKey(...)`
- `run.agentId ?? resolveAgentIdFromSessionKey(...)`

for readability/safety, no behavior change intended in normal paths.

openclaw-barnacle bot added the size: XS label Feb 23, 2026

greptile-apps bot reviewed Feb 23, 2026


derricksy force-pushed the fix/subagent-model-fallback-agentid branch 5 times, most recently from b0d7cac to eb6888f (February 23, 2026 04:46)

derricksy force-pushed the fix/subagent-model-fallback-agentid branch from eb6888f to 46102a7 (February 23, 2026 05:01)

openclaw-barnacle bot added size: L and removed size: XS labels Feb 23, 2026

steipete closed this Feb 25, 2026

derricksy wants to merge 1 commit into openclaw:main from derricksy:fix/subagent-model-fallback-agentid
