Summary
When using local Ollama models, the first request to a model that is not yet loaded into memory triggers a cold start of roughly 13 to 46 seconds, depending on model size. The default LLM request timeout in OpenClaw appears too short for this scenario, so the request fails with a timeout (status 408) and falls back to the next model in the fallback chain, typically a cloud model.
This is a secondary issue to the auth-based silent fallback (see #43945), but creates the same problematic outcome: user data intended for local processing is silently routed to cloud providers.
Problem
Ollama models need significant time to load into memory on first request:
- 27B model: ~13 seconds cold-start
- 122B model: ~46 seconds cold-start
- Larger models with CPU offloading: potentially 60+ seconds
During cold-start, Ollama doesn't respond until the model is fully loaded. OpenClaw's LLM request timeout fires before Ollama can respond, resulting in a timeout error that triggers the fallback chain.
Even after fixing the auth issue (#43945), this timeout problem independently causes the same silent fallback behavior.
Steps to Reproduce
- Configure an Ollama provider with a large model (e.g. qwen3.5:122b)
- Ensure the model is NOT pre-loaded (cold state: ollama ps shows no running models)
- Send a request through OpenClaw targeting that model
- Observed: Request times out (model load + inference exceeds default timeout), falls back to cloud model
- Expected: Request waits for model load + inference, or timeout is configurable per provider
Log Evidence
[WARN] model_fallback_decision: candidate_failed
requestedProvider: ollama-remote
requestedModel: qwen3.5:122b
reason: timeout, status: 408
nextCandidate: openai/gpt-4.1-mini
Privacy Impact
Same as the auth issue: if a user chose a local model for data sovereignty, a timeout-triggered fallback silently sends their data to a cloud provider. Combined with the lack of user-visible fallback notifications, this is a privacy concern.
Proposed Fixes
Fix 1: Configurable requestTimeout per provider (minimal)
Add a requestTimeout (or timeoutMs) field to the provider configuration:
{
  "models": {
    "providers": {
      "ollama-remote": {
        "baseUrl": "http://192.168.178.122:11434",
        "api": "ollama",
        "requestTimeout": 120000,
        "models": [...]
      }
    }
  }
}
This allows users to set longer timeouts for local providers with slow cold-starts while keeping tight timeouts for cloud APIs.
Fix 2: Configurable timeout per model (granular)
Allow timeout overrides at the model level for cases where different models on the same provider have vastly different load times:
{
  "models": {
    "providers": {
      "ollama-remote": {
        "api": "ollama",
        "requestTimeout": 60000,
        "models": [
          { "id": "qwen3.5:27b", "requestTimeout": 30000 },
          { "id": "qwen3.5:122b", "requestTimeout": 180000 }
        ]
      }
    }
  }
}
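To make the intended precedence concrete, here is a minimal Python sketch of how a per-model value could override a per-provider value, which in turn overrides a global default. It is an illustration only, not OpenClaw's actual resolution code: `resolve_timeout_ms` and `DEFAULT_TIMEOUT_MS` are hypothetical names, and the 30-second default is an assumption, since the real default is not stated in this issue.

```python
from typing import Any

# Assumed global default; OpenClaw's real default timeout is not known here.
DEFAULT_TIMEOUT_MS = 30_000


def resolve_timeout_ms(config: dict[str, Any], provider: str, model_id: str,
                       default_ms: int = DEFAULT_TIMEOUT_MS) -> int:
    """Pick the most specific requestTimeout: model, then provider, then default."""
    provider_cfg = config.get("models", {}).get("providers", {}).get(provider, {})
    for model in provider_cfg.get("models", []):
        if model.get("id") == model_id and "requestTimeout" in model:
            return int(model["requestTimeout"])
    if "requestTimeout" in provider_cfg:
        return int(provider_cfg["requestTimeout"])
    return default_ms


# Same shape as the proposed configuration above.
config = {
    "models": {
        "providers": {
            "ollama-remote": {
                "api": "ollama",
                "requestTimeout": 60000,
                "models": [
                    {"id": "qwen3.5:27b", "requestTimeout": 30000},
                    {"id": "qwen3.5:122b", "requestTimeout": 180000},
                ],
            }
        }
    }
}
```

With this configuration, `resolve_timeout_ms(config, "ollama-remote", "qwen3.5:122b")` yields the model-level value, a model without its own entry inherits the provider's 60000, and an unknown provider falls through to the default.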
Fix 3: Auto-detect Ollama cold-start and extend timeout (smart)
When api: "ollama" is configured, OpenClaw could probe the Ollama /api/ps endpoint to check if the model is loaded. If not, automatically extend the timeout to accommodate the cold-start. This provides a zero-config experience for Ollama users.
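A rough Python sketch of that probe follows, assuming `/api/ps` returns a JSON object with a `models` array whose entries carry `name` and `model` fields. The helper names and the 120-second extension are hypothetical choices for illustration, not existing OpenClaw behavior.

```python
import json
import urllib.request
from typing import Any


def loaded_model_names(ps_payload: dict[str, Any]) -> set[str]:
    """Collect the names of running models from an /api/ps response body."""
    names: set[str] = set()
    for entry in ps_payload.get("models", []):
        for key in ("name", "model"):
            if entry.get(key):
                names.add(entry[key])
    return names


def effective_timeout_ms(ps_payload: dict[str, Any], model_id: str, base_ms: int,
                         cold_start_extra_ms: int = 120_000) -> int:
    """Extend the timeout only when the requested model is not yet loaded."""
    if model_id in loaded_model_names(ps_payload):
        return base_ms
    # Assumed flat extension; a real implementation might scale it by model size.
    return base_ms + cold_start_extra_ms


def fetch_ps(base_url: str, timeout_s: float = 5.0) -> dict[str, Any]:
    """GET {base_url}/api/ps and decode the JSON body."""
    url = f"{base_url.rstrip('/')}/api/ps"
    with urllib.request.urlopen(url, timeout=timeout_s) as resp:
        return json.load(resp)
```

A caller would pass `fetch_ps(base_url)` into `effective_timeout_ms` just before sending the request. The probe itself should use a short timeout so that an unreachable Ollama host fails fast instead of adding to the delay.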
Fix 4: Respect allowCloudFallback: false on timeout too
If the fail-closed policy flag from #43945 is implemented, it should also apply to timeout-triggered fallbacks — not just auth failures. A timeout on a local model should result in an error, not a silent cloud redirect.
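One way to picture the requested behavior is the Python sketch below. It is an assumption-laden illustration: `Candidate`, `is_local`, `FallbackBlocked`, and `next_candidate` are hypothetical names, and how OpenClaw actually classifies a provider as local or cloud is not specified in this issue. The point is only that the policy check ignores the failure reason.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    provider: str
    model: str
    is_local: bool  # hypothetical flag: True when the provider runs on user-controlled hardware


class FallbackBlocked(Exception):
    """Raised when policy forbids every remaining candidate in the chain."""


def next_candidate(chain: list[Candidate], failed_index: int, reason: str,
                   allow_cloud_fallback: bool) -> Candidate:
    """Return the next permitted candidate after a failure, or raise."""
    for candidate in chain[failed_index + 1:]:
        # The failure reason ("auth", "timeout", ...) deliberately plays no part
        # in the policy check: cloud candidates are skipped either way.
        if candidate.is_local or allow_cloud_fallback:
            return candidate
    failed = chain[failed_index]
    raise FallbackBlocked(
        f"{failed.provider}/{failed.model} failed ({reason}) "
        "and no permitted fallback remains"
    )
```

With `allow_cloud_fallback=False` and a chain of one local model followed by one cloud model, a timeout surfaces as a `FallbackBlocked` error to the user rather than a quiet switch to the cloud candidate.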
Workaround
Pre-load models before use via the Ollama API:
curl http://192.168.178.122:11434/api/generate \
-d '{"model":"qwen3.5:122b","prompt":"hi","stream":false,"keep_alive":"60m","options":{"num_predict":1}}'
This forces the model into memory with a 60-minute keep-alive, avoiding cold-start timeouts on subsequent requests. However, this requires manual intervention or scripting before each session.
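For anyone who wants to script the pre-load step, the following Python sketch sends the same request body as the curl command above to each model in a list. The function names are made up for illustration, and the 300-second client timeout is an assumed value chosen to sit comfortably above the cold-start times reported here.

```python
import json
import urllib.request


def preload_request(base_url: str, model: str,
                    keep_alive: str = "60m") -> urllib.request.Request:
    """Build the POST /api/generate request that forces a model into memory."""
    body = {
        "model": model,
        "prompt": "hi",
        "stream": False,
        "keep_alive": keep_alive,
        "options": {"num_predict": 1},  # generate a single token, loading is the goal
    }
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/generate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def preload(base_url: str, models: list[str], timeout_s: float = 300.0) -> None:
    """Warm up each model in turn; blocks until Ollama has answered for all of them."""
    for model in models:
        request = preload_request(base_url, model)
        with urllib.request.urlopen(request, timeout=timeout_s) as resp:
            resp.read()  # drain the body so the connection closes cleanly
            print(f"loaded {model}: HTTP {resp.status}")
```

Calling `preload("http://192.168.178.122:11434", ["qwen3.5:122b"])` from a login script or scheduled job would replace the manual curl step, though the models still expire once the keep-alive window passes.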
Related
Environment
- OpenClaw 2026.3.11
- Ollama 0.17.7 (remote, Windows + local, Linux)
- Models tested: qwen3.5:27b (13s cold-start), qwen3.5:122b (46s cold-start)