
Configurable LLM request timeout per provider/model (Ollama cold-start causes silent fallback) #43946

@Meli73

Summary

When using local Ollama models, the first request to a model that is not yet in memory triggers a cold start while Ollama loads it. This takes roughly 13 to 46 seconds, depending on model size. OpenClaw's default LLM request timeout appears to be too short for this scenario, so the request fails with a timeout (status 408) and OpenClaw falls back to the next model in the fallback chain, typically a cloud model.

This is a secondary issue to the auth-based silent fallback (see #43945), but creates the same problematic outcome: user data intended for local processing is silently routed to cloud providers.

Problem

Ollama models need significant time to load into memory on first request:

  • 27B model: ~13 seconds cold-start
  • 122B model: ~46 seconds cold-start
  • Larger models with CPU offloading: potentially 60+ seconds

During cold-start, Ollama doesn't respond until the model is fully loaded. OpenClaw's LLM request timeout fires before Ollama can respond, resulting in a timeout error that triggers the fallback chain.

Even after fixing the auth issue (#43945), this timeout problem independently causes the same silent fallback behavior.
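The cold-start figures above can be measured directly. A non-streaming Ollama `/api/generate` response includes `load_duration` and `total_duration`, both in nanoseconds. The minimal Python sketch below (standard library only) sends a one-token request and reports how much of the time went to loading the model. The 30-second timeout in the usage line is purely illustrative, because OpenClaw's actual default is not stated in this issue. Also, `total_duration` is measured server-side, so it excludes network latency.

```python
import json
import urllib.request

NS_PER_S = 1_000_000_000  # Ollama reports durations in nanoseconds


def summarize_timing(response: dict, timeout_s: float) -> dict:
    """Split an Ollama /api/generate response into load vs. total time (seconds)."""
    load_s = response.get("load_duration", 0) / NS_PER_S
    total_s = response.get("total_duration", 0) / NS_PER_S
    return {
        "load_s": load_s,
        "total_s": total_s,
        # True if a client-side timeout of timeout_s would have fired first
        "would_time_out": total_s > timeout_s,
    }


def measure(base_url: str, model: str, timeout_s: float) -> dict:
    """Send a one-token request and report how much of it was model loading."""
    payload = json.dumps({
        "model": model,
        "prompt": "hi",
        "stream": False,
        "options": {"num_predict": 1},
    }).encode()
    req = urllib.request.Request(
        f"{base_url}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    # Generous timeout so the measurement itself is not cut short by a cold start
    with urllib.request.urlopen(req, timeout=600) as resp:
        return summarize_timing(json.load(resp), timeout_s)
```

Usage against a cold model: `measure("http://192.168.178.122:11434", "qwen3.5:122b", timeout_s=30)`. With the reported 46-second load, `would_time_out` comes back `True`.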

Steps to Reproduce

  1. Configure an Ollama provider with a large model (e.g. qwen3.5:122b)
  2. Ensure the model is NOT pre-loaded (cold state: ollama ps shows no running models)
  3. Send a request through OpenClaw targeting that model

Observed: The request times out (model load + inference exceeds the default timeout) and falls back to a cloud model.
Expected: The request waits for model load + inference, or the timeout is configurable per provider.

Log Evidence

[WARN] model_fallback_decision: candidate_failed
  requestedProvider: ollama-remote
  requestedModel: qwen3.5:122b
  reason: timeout, status: 408
  nextCandidate: openai/gpt-4.1-mini

Privacy Impact

Same as the auth issue: if a user chose a local model for data sovereignty, a timeout-triggered fallback silently sends their data to a cloud provider. Combined with the lack of user-visible fallback notifications, this is a privacy concern.

Proposed Fixes

Fix 1: Configurable requestTimeout per provider (minimal)

Add a requestTimeout (or timeoutMs) field to the provider configuration:

{
  "models": {
    "providers": {
      "ollama-remote": {
        "baseUrl": "http://192.168.178.122:11434",
        "api": "ollama",
        "requestTimeout": 120000,
        "models": [...]
      }
    }
  }
}

This allows users to set longer timeouts for local providers with slow cold-starts while keeping tight timeouts for cloud APIs.
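The following minimal Python sketch shows how a client could honor the proposed field. `requestTimeout` (in milliseconds) is the field proposed above, not an existing OpenClaw option. `DEFAULT_TIMEOUT_MS` is a hypothetical stand-in for whatever default OpenClaw actually uses. The `timeout` argument of `urllib.request.urlopen` bounds each blocking socket operation, not the whole request. A non-streaming Ollama call sends nothing until the model is loaded and the answer is complete, so in practice this limits the wait for the first byte.

```python
import json
import urllib.request

# Hypothetical fallback when a provider sets no requestTimeout;
# OpenClaw's real default is not documented in this issue.
DEFAULT_TIMEOUT_MS = 30_000


def provider_timeout_s(config: dict, provider_id: str) -> float:
    """Return the proposed per-provider requestTimeout, converted to seconds."""
    provider = config["models"]["providers"][provider_id]
    return provider.get("requestTimeout", DEFAULT_TIMEOUT_MS) / 1000


def post_json(url: str, body: dict, timeout_s: float) -> dict:
    """POST JSON; raises a timeout/URLError if a socket operation exceeds timeout_s."""
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        return json.load(resp)
```

With the configuration above, `provider_timeout_s(cfg, "ollama-remote")` gives 120 seconds for the local provider. Cloud providers without the field keep the tighter default.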

Fix 2: Configurable timeout per model (granular)

Allow timeout overrides at the model level for cases where different models on the same provider have vastly different load times:

{
  "models": {
    "providers": {
      "ollama-remote": {
        "api": "ollama",
        "requestTimeout": 60000,
        "models": [
          { "id": "qwen3.5:27b", "requestTimeout": 30000 },
          { "id": "qwen3.5:122b", "requestTimeout": 180000 }
        ]
      }
    }
  }
}
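This implies a lookup order of model override first, then the provider value, then a global default. A sketch of that order follows. Field names follow the proposal above. `DEFAULT_TIMEOUT_MS` and `resolve_timeout_ms` are hypothetical names for illustration, not OpenClaw code.

```python
DEFAULT_TIMEOUT_MS = 30_000  # hypothetical global default, not OpenClaw's actual value


def resolve_timeout_ms(config: dict, provider_id: str, model_id: str,
                       default_ms: int = DEFAULT_TIMEOUT_MS) -> int:
    """Most specific setting wins: model override > provider value > default."""
    provider = config.get("models", {}).get("providers", {}).get(provider_id, {})
    for model in provider.get("models", []):
        if model.get("id") == model_id and "requestTimeout" in model:
            return model["requestTimeout"]
    return provider.get("requestTimeout", default_ms)
```

With the configuration above, this yields 30000 ms for qwen3.5:27b, 180000 ms for qwen3.5:122b, and 60000 ms for any other model on ollama-remote.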

Fix 3: Auto-detect Ollama cold-start and extend timeout (smart)

When api: "ollama" is configured, OpenClaw could probe the Ollama /api/ps endpoint to check if the model is loaded. If not, automatically extend the timeout to accommodate the cold-start. This provides a zero-config experience for Ollama users.
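Here is a sketch of that probe. It assumes model IDs in the OpenClaw config match the `name` Ollama reports in `/api/ps`, including the tag (e.g. `qwen3.5:122b`). `WARM_TIMEOUT_MS` and `COLD_TIMEOUT_MS` are illustrative values chosen to cover the 46-second cold start reported above, not OpenClaw defaults. The helper names are hypothetical.

```python
import json
import urllib.request

# Illustrative values, not OpenClaw defaults.
WARM_TIMEOUT_MS = 30_000
COLD_TIMEOUT_MS = 180_000


def is_model_loaded(ps_response: dict, model_id: str) -> bool:
    """Check an Ollama /api/ps payload for a running model with this name."""
    return any(
        m.get("name") == model_id or m.get("model") == model_id
        for m in ps_response.get("models", [])
    )


def fetch_running_models(base_url: str, timeout_s: float = 5.0) -> dict:
    """GET /api/ps, which lists the models currently loaded in memory."""
    with urllib.request.urlopen(f"{base_url}/api/ps", timeout=timeout_s) as resp:
        return json.load(resp)


def pick_timeout_ms(base_url: str, model_id: str) -> int:
    """Use a longer timeout when the model is not loaded yet (cold start)."""
    try:
        loaded = is_model_loaded(fetch_running_models(base_url), model_id)
    except OSError:
        # Probe failed (URLError and socket timeouts are OSError subclasses):
        # assume cold rather than risk a premature timeout
        loaded = False
    return WARM_TIMEOUT_MS if loaded else COLD_TIMEOUT_MS
```

The probe is only a heuristic. A model can be unloaded between the probe and the real request once its keep-alive expires. Treating a failed or stale probe as "cold" errs toward waiting longer instead of falling back early.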

Fix 4: Respect allowCloudFallback: false on timeout too

If the fail-closed policy flag from #43945 is implemented, it should also apply to timeout-triggered fallbacks — not just auth failures. A timeout on a local model should result in an error, not a silent cloud redirect.
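The sketch below shows a fail-closed fallback selector in which the failure reason (auth, timeout, or anything else) does not change the policy. `allowCloudFallback` is the flag proposed in #43945 and may not exist yet. `Candidate`, its `is_local` field, `FallbackBlocked`, and `next_candidate` are hypothetical names for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Candidate:
    provider: str
    model: str
    is_local: bool  # hypothetical: how providers get classified is an open design question


class FallbackBlocked(Exception):
    """Raised instead of silently routing a local request to a cloud provider."""


def next_candidate(chain: Sequence[Candidate], failed_index: int, reason: str,
                   allow_cloud_fallback: bool) -> Optional[Candidate]:
    """Return the next usable candidate after chain[failed_index] failed.

    `reason` (e.g. "auth" or "timeout") only appears in the error message:
    the policy treats every failure reason the same way.
    """
    failed = chain[failed_index]
    skipped_cloud = False
    for cand in chain[failed_index + 1:]:
        if failed.is_local and not cand.is_local and not allow_cloud_fallback:
            skipped_cloud = True  # would leave the local boundary; skip it
            continue
        return cand
    if skipped_cloud:
        raise FallbackBlocked(
            f"{failed.provider}/{failed.model} failed ({reason}); "
            "cloud fallback disabled (allowCloudFallback=false)"
        )
    return None  # chain exhausted: caller surfaces the original error
```

Consider the chain from the log above (`ollama-remote/qwen3.5:122b` followed by `openai/gpt-4.1-mini`) with the flag set to false. Here the 408 timeout raises `FallbackBlocked` instead of routing to the cloud model. A second local model later in the chain would still be tried.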

Workaround

Pre-load models before use via the Ollama API:

curl http://192.168.178.122:11434/api/generate \
  -d '{"model":"qwen3.5:122b","prompt":"hi","stream":false,"keep_alive":"60m","options":{"num_predict":1}}'

This forces the model into memory with a 60-minute keep-alive, avoiding cold-start timeouts on subsequent requests. However, this requires manual intervention or scripting before each session.
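The pre-load step can be scripted to run before each session, for example from a login script or a scheduler. The minimal Python sketch below sends the same request body as the curl command above. The host and model are taken from this report and should be adjusted. The function names are illustrative.

```python
import json
import urllib.request

OLLAMA_URL = "http://192.168.178.122:11434"  # remote host from this report; adjust
MODELS = ["qwen3.5:122b"]  # models to warm up before the session


def preload_payload(model: str, keep_alive: str = "60m") -> dict:
    """Same body as the curl workaround: generate one token, keep the model resident."""
    return {
        "model": model,
        "prompt": "hi",
        "stream": False,
        "keep_alive": keep_alive,
        "options": {"num_predict": 1},
    }


def preload(base_url: str, model: str, timeout_s: float = 300) -> float:
    """Load the model and return Ollama's reported load time in seconds."""
    req = urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(preload_payload(model)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        body = json.load(resp)
    return body.get("load_duration", 0) / 1e9  # nanoseconds -> seconds


def preload_all(base_url: str, models: list) -> None:
    for model in models:
        print(f"{model}: loaded in {preload(base_url, model):.1f}s")
```

Run `preload_all(OLLAMA_URL, MODELS)` before starting OpenClaw. If several large models do not fit in memory together, loading them in a row may cause Ollama to unload an earlier one. For that reason, list only the models the session actually needs.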

Related

  • #43945 (auth-based silent fallback to cloud providers)

Environment

  • OpenClaw 2026.3.11
  • Ollama 0.17.7 (one remote instance on Windows, one local instance on Linux)
  • Models tested: qwen3.5:27b (13s cold-start), qwen3.5:122b (46s cold-start)
