Summary
The gateway's model-fallback/routing subsystem incorrectly marks healthy, responsive local vLLM endpoints as "timed out" or "overloaded", causing cascading fallback chains that take 1–23 minutes to resolve. The endpoints themselves respond in 0.27–0.93s when tested directly via curl.
Environment
- OpenClaw: 2026.4.5 (container, Linux 6.12.63 x64)
- Gateway: loopback bind, port 18789
- Local vLLM endpoints:
  - `vllm-8001` (gemma4, 27B) on jupiter.wg.local:8001 — dedicated GPU
  - `vllm-7002` (qwen3.5-27b) on jupiter.wg.local:7002 — dedicated GPU
- Remote providers: Novita (GLM-5, Kimi), DeepInfra (Kimi), Anthropic (Sonnet)
- Config:
  - `agents.defaults.timeoutSeconds: 1200`
  - `agents.defaults.llm.idleTimeoutSeconds: 300`
Observed Behaviour
1. Endpoints are fast (direct curl, concurrent)
Both GPUs idle, tested concurrently:
| Endpoint | Avg latency (5 reqs) |
|---|---|
| vllm-8001/gemma4 | 0.28s |
| vllm-7002/qwen3.5-27b | 0.91s |
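For reference, the concurrent probe can be sketched in Python. The harness below is generic — the zero-argument callables stand in for HTTP POSTs to the vLLM endpoints — and is an illustration of the methodology, not the exact curl invocation used above.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def avg_latency(request_fn, n=5):
    """Mean wall-clock latency of n sequential requests, in seconds."""
    samples = []
    for _ in range(n):
        start = time.perf_counter()
        request_fn()
        samples.append(time.perf_counter() - start)
    return sum(samples) / n

def probe_concurrently(endpoints):
    """Probe all endpoints at the same time, mirroring the concurrent curl test.

    endpoints: dict mapping a label to a zero-argument callable that performs
    one request (e.g. an HTTP POST to /v1/chat/completions on each host).
    """
    with ThreadPoolExecutor(max_workers=len(endpoints)) as pool:
        futures = {name: pool.submit(avg_latency, fn)
                   for name, fn in endpoints.items()}
        return {name: fut.result() for name, fut in futures.items()}
```

Passing real HTTP calls (e.g. via `urllib.request`) as the callables reproduces the table above; keeping the harness endpoint-agnostic lets it be exercised offline.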
2. Gateway marks them as timed out or overloaded
From gateway logs, model fallback decisions for today:
Failure reasons:
- timeout: 17
- unknown: 8
- overloaded: 2

Error previews:
- "LLM request timed out.": 12
- "Gateway is draining for restart; new tasks are not accepted": 8
- "cron: job execution timed out": 4
- "Live session model switch requested: <model>": 2
- "Request was aborted.": 1
3. Fallback chains take minutes
Example fallback chains from today's logs:
| Run ID | Chain | Total time |
|---|---|---|
| 7c914aae | qwen→timeout → Kimi→timeout → gemma→timeout → Sonnet✓ | 23.4 min |
| 21ca97c0 | gemma→timeout → qwen✓ | 4.1 min |
| 0cb06206 | gemma→timeout → Kimi✓ | 56.6s |
| 66f5e9e5 | gemma→timeout → GLM-5✓ | 80.6s |
4. Gateway can't even spawn subagents
Attempting `sessions_spawn` returns:

```
gateway timeout after 10000ms
```

Gateway target: `ws://127.0.0.1:18789`
Meanwhile, direct curl to the same endpoints returns in <1s.
5. "Overloaded" misclassification
The gateway logs show reason: "overloaded" with errorPreview: "Live session model switch requested: novita/zai-org/glm-4.7". A session model mismatch is being classified as a provider overload — the gateway is conflating an internal session state error with provider unavailability.
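A minimal sketch of the proposed classification, assuming a hypothetical `classify_failure` helper: the exception name matches the one in the logs, but the `"model_switch"` reason string is a suggestion, not existing gateway behaviour.

```python
class LiveSessionModelSwitchError(Exception):
    """Raised when a session requests a model other than the live session's."""

def classify_failure(exc):
    """Map an exception to a fallback reason.

    A model-switch request is an internal session-state condition, so it
    gets its own reason instead of being lumped in with "overloaded".
    """
    if isinstance(exc, LiveSessionModelSwitchError):
        return "model_switch"  # proposed: should not trigger provider fallback
    if isinstance(exc, TimeoutError):
        return "timeout"
    return "unknown"
```

With a dedicated reason, the fallback system can skip cascading entirely for model-switch errors rather than retrying every provider in the chain.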
Expected Behaviour
- Requests to healthy, sub-second local endpoints should not time out
- Session model switch errors should not be classified as "overloaded"
- Fallback chains should not take minutes when all providers are responsive
- `sessions_spawn` should not time out when the gateway is under normal load
Root Cause Hypothesis
Two distinct bugs:
- Internal timeout too aggressive or misapplied: the gateway's LLM request timeout fires before the endpoint responds, or the timeout is applied to an internal queue wait rather than the actual HTTP request. Endpoints respond in <1s, yet the gateway logged `reason: "timeout"` 17 times today.
- `LiveSessionModelSwitchError` misclassified as "overloaded": when a cron job or isolated session requests a model different from the live session's current model, the gateway throws `LiveSessionModelSwitchError` and classifies it as `reason: "overloaded"` in the fallback system. This is semantically wrong and triggers unnecessary fallback cascades.
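The first bug can be demonstrated in isolation. The asyncio sketch below uses illustrative delays and function names, not OpenClaw's actual request path; it shows how a timeout that wraps an internal queue wait makes a fast endpoint appear to time out.

```python
import asyncio

async def fast_request():
    await asyncio.sleep(0.01)  # stands in for a sub-second vLLM response
    return "ok"

async def misapplied(queue_delay, timeout):
    # Bug pattern: the timeout covers the internal queue wait AND the
    # HTTP call, so queueing alone can make a fast endpoint "time out".
    async def queued():
        await asyncio.sleep(queue_delay)
        return await fast_request()
    return await asyncio.wait_for(queued(), timeout)

async def request_scoped(queue_delay, timeout):
    # Fix: only the HTTP call itself is bounded by the timeout.
    await asyncio.sleep(queue_delay)
    return await asyncio.wait_for(fast_request(), timeout)
```

With a 0.05s queue wait and a 0.03s timeout, the misapplied version raises `TimeoutError` even though the request itself needs only 0.01s, while the request-scoped version succeeds.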
Reproduction
- Configure two local vLLM providers with fast endpoints (<1s response)
- Configure 3+ agents with cron jobs using different model overrides
- Observe gateway logs: endpoints will be marked as "timed out" despite being healthy
- Run `curl` directly against the same endpoints to confirm sub-second response
Impact
- `sessions_spawn` and CLI commands time out
Additional Context
- `LiveSessionModelSwitchError` appears 17 times in today's logs, suggesting this is the dominant failure mode
- `agents.defaults.maxConcurrent: 8` with 5 agents may amplify the issue but is not the root cause; the endpoints are idle when failures occur