Summary
The gateway's model-fallback/routing subsystem incorrectly marks healthy, responsive local vLLM endpoints as "timed out" or "overloaded", causing cascading fallback chains that take 1–23 minutes to resolve. The endpoints themselves respond in 0.27–0.93s when tested directly via curl.
Environment
- OpenClaw: 2026.4.5 (container, Linux 6.12.63 x64)
- Gateway: loopback bind, port 18789
- Local vLLM endpoints:
  - `vllm-8001` (gemma4, 27B) on jupiter.wg.local:8001 — dedicated GPU
  - `vllm-7002` (qwen3.5-27b) on jupiter.wg.local:7002 — dedicated GPU
- Remote providers: Novita (GLM-5, Kimi), DeepInfra (Kimi), Anthropic (Sonnet)
- Config:
  - `agents.defaults.timeoutSeconds: 1200`
  - `agents.defaults.llm.idleTimeoutSeconds: 300`
Observed Behaviour
1. Endpoints are fast (direct curl, concurrent)
Both GPUs idle, tested concurrently:
| Endpoint | Avg latency (5 reqs) |
|---|---|
| vllm-8001/gemma4 | 0.28s |
| vllm-7002/qwen3.5-27b | 0.91s |
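For reference, the concurrent probe can be sketched in Python. The harness below is generic — the zero-argument callables stand in for HTTP POSTs to the vLLM endpoints — and is an illustration of the methodology, not the exact curl invocation used above.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def avg_latency(request_fn, n=5):
    """Mean wall-clock latency of n sequential requests, in seconds."""
    samples = []
    for _ in range(n):
        start = time.perf_counter()
        request_fn()
        samples.append(time.perf_counter() - start)
    return sum(samples) / n

def probe_concurrently(endpoints):
    """Probe all endpoints at the same time, mirroring the concurrent curl test.

    endpoints: dict mapping a label to a zero-argument callable that performs
    one request (e.g. an HTTP POST to /v1/chat/completions on each host).
    """
    with ThreadPoolExecutor(max_workers=len(endpoints)) as pool:
        futures = {name: pool.submit(avg_latency, fn)
                   for name, fn in endpoints.items()}
        return {name: fut.result() for name, fut in futures.items()}
```

Passing real HTTP calls (e.g. via `urllib.request`) as the callables reproduces the table above; keeping the harness endpoint-agnostic lets it be exercised offline.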
2. Gateway marks them as timed out or overloaded
From gateway logs, model fallback decisions for today:
Failure reasons:
- timeout: 17
- unknown: 8
- overloaded: 2

Error previews:
- "LLM request timed out.": 12
- "Gateway is draining for restart; new tasks are not accepted": 8
- "cron: job execution timed out": 4
- "Live session model switch requested: <model>": 2
- "Request was aborted.": 1
3. Fallback chains take minutes
Example fallback chains from today's logs:
| Run ID | Chain | Total time |
|---|---|---|
| 7c914aae | qwen→timeout → Kimi→timeout → gemma→timeout → Sonnet✓ | 23.4 min |
| 21ca97c0 | gemma→timeout → qwen✓ | 4.1 min |
| 0cb06206 | gemma→timeout → Kimi✓ | 56.6s |
| 66f5e9e5 | gemma→timeout → GLM-5✓ | 80.6s |
4. Gateway can't even spawn subagents
Attempting `sessions_spawn` returns:

```
gateway timeout after 10000ms
```

Gateway target: `ws://127.0.0.1:18789`
Meanwhile, direct curl to the same endpoints returns in <1s.
5. "Overloaded" misclassification
The gateway logs show reason: "overloaded" with errorPreview: "Live session model switch requested: novita/zai-org/glm-4.7". A session model mismatch is being classified as a provider overload — the gateway is conflating an internal session state error with provider unavailability.
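A minimal sketch of the proposed classification, assuming a hypothetical `classify_failure` helper: the exception name matches the one in the logs, but the `"model_switch"` reason string is a suggestion, not existing gateway behaviour.

```python
class LiveSessionModelSwitchError(Exception):
    """Raised when a session requests a model other than the live session's."""

def classify_failure(exc):
    """Map an exception to a fallback reason.

    A model-switch request is an internal session-state condition, so it
    gets its own reason instead of being lumped in with "overloaded".
    """
    if isinstance(exc, LiveSessionModelSwitchError):
        return "model_switch"  # proposed: should not trigger provider fallback
    if isinstance(exc, TimeoutError):
        return "timeout"
    return "unknown"
```

With a dedicated reason, the fallback system can skip cascading entirely for model-switch errors rather than retrying every provider in the chain.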
Expected Behaviour
- Requests to healthy, sub-second local endpoints should not time out
- Session model switch errors should not be classified as "overloaded"
- Fallback chains should not take minutes when all providers are responsive
- `sessions_spawn` should not time out when the gateway is under normal load
Root Cause Hypothesis
Two distinct bugs:
- Internal timeout too aggressive or misapplied: the gateway's LLM request timeout fires before the endpoint responds, or the timeout is applied to an internal queue wait rather than the actual HTTP request. Endpoints respond in <1s, yet the gateway logged `reason: "timeout"` 17 times today.
- `LiveSessionModelSwitchError` misclassified as "overloaded": when a cron job or isolated session requests a model different from the live session's current model, the gateway throws `LiveSessionModelSwitchError` and classifies it as `reason: "overloaded"` in the fallback system. This is semantically wrong and triggers unnecessary fallback cascades.
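The first bug can be demonstrated in isolation. The asyncio sketch below uses illustrative delays and function names, not OpenClaw's actual request path; it shows how a timeout that wraps an internal queue wait makes a fast endpoint appear to time out.

```python
import asyncio

async def fast_request():
    await asyncio.sleep(0.01)  # stands in for a sub-second vLLM response
    return "ok"

async def misapplied(queue_delay, timeout):
    # Bug pattern: the timeout covers the internal queue wait AND the
    # HTTP call, so queueing alone can make a fast endpoint "time out".
    async def queued():
        await asyncio.sleep(queue_delay)
        return await fast_request()
    return await asyncio.wait_for(queued(), timeout)

async def request_scoped(queue_delay, timeout):
    # Fix: only the HTTP call itself is bounded by the timeout.
    await asyncio.sleep(queue_delay)
    return await asyncio.wait_for(fast_request(), timeout)
```

With a 0.05s queue wait and a 0.03s timeout, the misapplied version raises `TimeoutError` even though the request itself needs only 0.01s, while the request-scoped version succeeds.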
Reproduction
- Configure two local vLLM providers with fast endpoints (<1s response)
- Configure 3+ agents with cron jobs using different model overrides
- Observe gateway logs: endpoints will be marked as "timed out" despite being healthy
- Run `curl` directly against the same endpoints to confirm sub-second response
Impact
- `sessions_spawn` and CLI commands time out
Additional Context
- `LiveSessionModelSwitchError` appears 17 times in today's logs, suggesting this is the dominant failure mode
- `agents.defaults.maxConcurrent: 8` with 5 agents may amplify the issue but is not the root cause; the endpoints are idle when failures occur