Embedded agent runs do not use model fallback chain #54698

@noamrazbuilds

Description

Summary

When the primary model returns overloaded (503) errors, the main agent lane correctly falls back to the configured fallback model. However, embedded agent runs (subagents, followups, heartbeats) never trigger the model fallback mechanism; they repeatedly retry the primary model and then fail without ever attempting the fallback.

Environment

  • OpenClaw 2026.3.24 (cff6dc9)
  • Ubuntu 22.04 on DigitalOcean (4 vCPU / 8 GB)
  • Node 22.22.1
  • Gateway: systemd service, loopback bind

Model configuration

  • Primary: anthropic/claude-sonnet-4-6
  • Fallback: openai/gpt-4.1

Steps to reproduce

  1. Configure a primary model and a fallback model with valid API keys
  2. Wait for the primary model's API to return 503 overloaded errors
  3. Send a message via Telegram (or any channel)

Expected behavior

All agent runs (main lane AND embedded) should fall back to the configured fallback model when the primary model is overloaded.

Actual behavior

  • Main lane run: Correctly triggers model_fallback_decision, falls back to openai/gpt-4.1, and succeeds.
  • Embedded agent runs: Only emit embedded_run_agent_end with isError: true and failoverReason: "overloaded". No model_fallback_decision or embedded_run_failover_decision is logged for these runs. They retry the primary model multiple times and then fail without attempting the fallback.

Log evidence

Main lane (working fallback):

model_fallback_decision: candidate_failed (anthropic/claude-sonnet-4-6, overloaded)
model_fallback_decision: candidate_succeeded (openai/gpt-4.1)

Embedded runs (no fallback):

embedded_run_agent_end: isError=true, model=claude-sonnet-4-6, failoverReason=overloaded
embedded_run_agent_end: isError=true, model=claude-sonnet-4-6, failoverReason=overloaded
embedded_run_agent_end: isError=true, model=claude-sonnet-4-6, failoverReason=overloaded
(repeats ~10 times with no fallback attempt)
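The log pattern above suggests two different code paths: the main lane walks the fallback chain, while embedded runs retry only the primary model. A minimal sketch of that suspected divergence (hypothetical names, not OpenClaw's actual internals):

```typescript
type Model = string;
type Caller = (model: Model) => string; // throws on a 503 "overloaded" error

// Main lane (observed behavior): walk the fallback chain until a candidate succeeds.
function runWithFallback(chain: Model[], call: Caller): string {
  for (const model of chain) {
    try {
      return call(model); // candidate_succeeded
    } catch {
      // candidate_failed: try the next model in the chain
    }
  }
  throw new Error("all candidates failed");
}

// Embedded run (observed behavior): retries the primary model only,
// never consulting the fallback chain.
function runEmbedded(primary: Model, call: Caller, retries = 3): string {
  for (let attempt = 0; attempt < retries; attempt++) {
    try {
      return call(primary);
    } catch {
      // overloaded: retry the same model
    }
  }
  throw new Error("overloaded");
}

// Simulated providers: primary always overloaded, fallback healthy.
const call: Caller = (model) => {
  if (model === "anthropic/claude-sonnet-4-6") throw new Error("503 overloaded");
  return `ok from ${model}`;
};

const chain = ["anthropic/claude-sonnet-4-6", "openai/gpt-4.1"];
console.log(runWithFallback(chain, call)); // main lane recovers via fallback
try {
  runEmbedded(chain[0], call);
} catch (e) {
  console.log(`embedded run failed: ${(e as Error).message}`);
}
```

With this shape, the same outage makes the main lane succeed and every embedded run fail, matching the logs. The fix would be to route embedded runs through the same chain-walking path as the main lane.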

Impact

During an API outage affecting the primary model, the bot becomes partially non-functional even when a healthy fallback model is configured. The user-facing reply may succeed (via main lane fallback), but embedded runs (followups, heartbeats, tool execution) continue to fail, causing error messages and degraded behavior.

Workaround

Switch the default model itself to the fallback model:

openclaw models set openai/gpt-4.1
openclaw gateway restart

This forces all runs (main and embedded) to use the working model directly rather than relying on the fallback chain.

Date observed

2026-03-25, during Anthropic incident "Elevated errors on Claude Opus 4.6" (status.claude.com)
