
fix: prefer queued.run.agentId over session key parsing for model fallback resolution #24137

Closed
derricksy wants to merge 1 commit into openclaw:main from derricksy:fix/subagent-model-fallback-agentid


@derricksy derricksy commented Feb 23, 2026

Summary

When a subagent spawned via `sessions_spawn` hits a rate limit or quota error, the configured `model.fallbacks` chain does not trigger. The agent retries the same primary model repeatedly and fails permanently instead of falling back to the next model.

Root Cause

In two places, the queue/followup runner resolves the agent ID for fallback lookup by parsing it from the session key string:

```ts
// src/auto-reply/reply/followup-runner.ts
fallbacksOverride: resolveAgentModelFallbacksOverride(
  queued.run.config,
  resolveAgentIdFromSessionKey(queued.run.sessionKey), // ← brittle
)

// src/auto-reply/reply/agent-runner-utils.ts
fallbacksOverride: resolveAgentModelFallbacksOverride(
  run.config,
  resolveAgentIdFromSessionKey(run.sessionKey), // ← same issue
)
```

`resolveAgentIdFromSessionKey` calls `parseAgentSessionKey`, which requires the format `agent:<agentId>:<rest>`. If `sessionKey` is absent, undefined, or in an unexpected format, it falls back to `DEFAULT_AGENT_ID` (the main agent). The main agent typically has no fallbacks configured, so `resolveFallbackCandidates` builds a single-entry candidates array, and the fallback chain never fires.

Both `queued.run.agentId` and `run.agentId` are already explicitly set on the run object (passed directly to `runEmbeddedPiAgent` in the same call site), making the string-parsing approach redundant and fragile.
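To make the failure mode concrete, here is a minimal sketch of the resolution path described above. The function bodies are simplified stand-ins written for illustration, not the actual OpenClaw implementations, and the value `"main"` for `DEFAULT_AGENT_ID` is an assumption.

```ts
// Hypothetical stand-ins for illustration; not the real OpenClaw code.
const DEFAULT_AGENT_ID = "main"; // assumed value

// Accepts only keys shaped like "agent:<agentId>:<rest>".
function parseAgentSessionKey(
  sessionKey?: string,
): { agentId: string; rest: string } | null {
  if (!sessionKey) return null;
  const [prefix, agentId, ...rest] = sessionKey.split(":");
  if (prefix !== "agent" || !agentId || rest.length === 0) return null;
  return { agentId, rest: rest.join(":") };
}

// Any key that does not parse silently resolves to the default agent.
function resolveAgentIdFromSessionKey(sessionKey?: string): string {
  return parseAgentSessionKey(sessionKey)?.agentId ?? DEFAULT_AGENT_ID;
}

console.log(resolveAgentIdFromSessionKey("agent:coder:subagent:1234")); // "coder"
console.log(resolveAgentIdFromSessionKey(undefined)); // "main"
console.log(resolveAgentIdFromSessionKey("subagent:1234")); // "main"
```

The last two calls show the silent degradation: nothing throws, and the lookup simply proceeds with the default agent's (possibly empty) fallback list.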

Fix

Prefer the explicit `agentId` field directly, falling back to session key parsing only when it's absent:

```ts
// followup-runner.ts
fallbacksOverride: resolveAgentModelFallbacksOverride(
  queued.run.config,
  queued.run.agentId ?? resolveAgentIdFromSessionKey(queued.run.sessionKey),
)

// agent-runner-utils.ts
fallbacksOverride: resolveAgentModelFallbacksOverride(
  run.config,
  run.agentId ?? resolveAgentIdFromSessionKey(run.sessionKey),
)
```
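One property of the patched expression worth noting: `??` falls through only on `null` or `undefined`, so an empty-string `agentId` would be used as-is rather than triggering session key parsing. The sketch below illustrates this with a hypothetical `RunLike` type, a hypothetical `resolveFallbackAgentId` helper, and a stubbed parser; none of these are the real OpenClaw definitions, and `"main"` is an assumed default.

```ts
// Hypothetical minimal shape; the real run object has many more fields.
interface RunLike {
  agentId?: string;
  sessionKey?: string;
}

const DEFAULT_AGENT_ID = "main"; // assumed value

// Stub for the session key parser ("agent:<agentId>:<rest>").
function resolveAgentIdFromSessionKey(sessionKey?: string): string {
  const match = /^agent:([^:]+):/.exec(sessionKey ?? "");
  return match?.[1] ?? DEFAULT_AGENT_ID;
}

// The expression from the patch, extracted into a helper for illustration.
function resolveFallbackAgentId(run: RunLike): string {
  return run.agentId ?? resolveAgentIdFromSessionKey(run.sessionKey);
}

console.log(resolveFallbackAgentId({ agentId: "coder" })); // "coder"
console.log(resolveFallbackAgentId({ sessionKey: "agent:reviewer:sub:1" })); // "reviewer"
console.log(resolveFallbackAgentId({})); // "main"
// Empty string is not nullish, so it wins over the session key:
console.log(resolveFallbackAgentId({ agentId: "", sessionKey: "agent:reviewer:sub:1" })); // ""
```

If empty-string IDs can occur in practice, `||` or an explicit check would be needed instead; whether they can is not established in this PR.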

Verification

  • Confirmed `queued.run.agentId` / `run.agentId` are set on all spawned subagent run objects
  • Confirmed the 429 quota error from `openai-codex` correctly classifies as `"rate_limit"` (not `"billing"`) via `classifyFailoverReason` — error classification is not a separate issue
  • Confirmed `resolveAgentModelFallbacksOverride` correctly returns the configured fallbacks array when given the correct `agentId`
  • Patched compiled JS locally, restarted gateway — swarm running clean across 5 isolated agents

Reproduction

  1. Configure an agent in `agents.list` with `model.primary = openai-codex/gpt-5.2-codex` and `model.fallbacks = [kimi-coding/k2p5, minimax-portal/MiniMax-M2.5]`
  2. Spawn a subagent using `sessions_spawn` with `agentId` set to that agent
  3. Exhaust the agent's Codex token quota
  4. Observe: agent retries `openai-codex` 4x with `stopReason: error`, fallback models never attempted
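The effect of resolving the wrong agent in this setup can be sketched as follows. The `AgentEntry` shape, its `id` field, the `buildCandidates` helper, and the model assigned to the `main` entry are all assumptions made for illustration; only the `model.primary` and `model.fallbacks` keys and the model names come from the steps above.

```ts
// Hypothetical config shape, modeled on the keys named in the reproduction steps.
interface AgentEntry {
  id: string; // assumed field name
  model: { primary: string; fallbacks?: string[] };
}

const agentsList: AgentEntry[] = [
  // Assumed main agent with no fallbacks configured.
  { id: "main", model: { primary: "openai-codex/gpt-5.2-codex" } },
  {
    id: "coder",
    model: {
      primary: "openai-codex/gpt-5.2-codex",
      fallbacks: ["kimi-coding/k2p5", "minimax-portal/MiniMax-M2.5"],
    },
  },
];

// Illustrative only: primary first, then the resolved agent's fallbacks.
function buildCandidates(agents: AgentEntry[], agentId: string): string[] {
  const agent = agents.find((a) => a.id === agentId);
  if (!agent) return [];
  return [agent.model.primary, ...(agent.model.fallbacks ?? [])];
}

console.log(buildCandidates(agentsList, "coder").length); // 3
console.log(buildCandidates(agentsList, "main").length); // 1
```

With a single candidate there is nothing to fail over to, which matches the observed behavior of repeated retries against the primary model.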

Related Issues

Possibly related to #11972, #19249, #5744. Addresses symptom reported in #24102.

OpenClaw version tested

2026.2.17 (macOS 14.x, npm global install)

@greptile-apps greptile-apps bot (Contributor) left a comment


1 file reviewed, 1 comment


@@ -133,7 +133,7 @@ export function createFollowupRunner(params: {
       agentDir: queued.run.agentDir,
       fallbacksOverride: resolveAgentModelFallbacksOverride(
         queued.run.config,
-        resolveAgentIdFromSessionKey(queued.run.sessionKey),
+        queued.run.agentId ?? resolveAgentIdFromSessionKey(queued.run.sessionKey),

Incomplete fix - same pattern exists in `agent-runner-utils.ts`. The `resolveModelFallbackOptions` function at `src/auto-reply/reply/agent-runner-utils.ts:140-151` still uses `resolveAgentIdFromSessionKey(run.sessionKey)` without preferring `run.agentId`. This function is called by `agent-runner-execution.ts:173` and `agent-runner-memory.ts:100`, so those code paths will still have the fallback issue.

Apply the same fix there:

```typescript
export function resolveModelFallbackOptions(run: FollowupRun["run"]) {
  return {
    cfg: run.config,
    provider: run.provider,
    model: run.model,
    agentDir: run.agentDir,
    fallbacksOverride: resolveAgentModelFallbacksOverride(
      run.config,
      run.agentId ?? resolveAgentIdFromSessionKey(run.sessionKey),
    ),
  };
}
```

@derricksy derricksy force-pushed the fix/subagent-model-fallback-agentid branch 5 times, most recently from b0d7cac to eb6888f, February 23, 2026 04:46
…lback resolution

When a subagent spawned via sessions_spawn hits a rate limit, the model
fallback chain fails to trigger because the fallbacksOverride is resolved
using resolveAgentIdFromSessionKey(queued.run.sessionKey), which requires
the session key to be in 'agent:<agentId>:<rest>' format.

If the session key is absent, undefined, or in an unexpected format,
resolveAgentIdFromSessionKey falls back to DEFAULT_AGENT_ID (main agent),
which typically has no fallbacks configured — resulting in a single-candidate
array and no model fallback.

queued.run.agentId is already explicitly set on the run object (it is passed
directly to runEmbeddedPiAgent three lines below this call), making the
session key string-parsing approach redundant and fragile.

Fix: prefer queued.run.agentId directly, falling back to session key parsing
only when agentId is absent.

Fixes: resolves the symptom reported in openclaw#24102
Related: openclaw#11972, openclaw#19249, openclaw#5744
@derricksy derricksy force-pushed the fix/subagent-model-fallback-agentid branch from eb6888f to 46102a7, February 23, 2026 05:01
@steipete steipete (Contributor) commented

Closing this one as not needed in current flow.

Reasoning:

  • Subagent/session flows already carry canonical `agent:<id>:...` keys.
  • `followupRun.run.agentId` is already set at construction and used downstream.
  • This PR also includes a large runtime log artifact (openclaw-2026-02-23.log), so even as hardening it would need cleanup.

If we still want defensive hardening, we can do a tiny clean follow-up with only:

  • queued.run.agentId ?? resolveAgentIdFromSessionKey(...)
  • run.agentId ?? resolveAgentIdFromSessionKey(...)

for readability/safety, no behavior change intended in normal paths.

@steipete steipete closed this Feb 25, 2026