Configurable LLM request timeout per provider/model (Ollama cold-start causes silent fallback)

## Summary

When using local Ollama models, the first request to a model that is not yet loaded triggers a cold start that takes **~13-46 seconds** (depending on model size). The default LLM request timeout in OpenClaw appears to be too short for this scenario, causing a **timeout-based fallback** (status 408) to the next model in the fallback chain, typically a cloud model.

This is a secondary issue to the auth-based silent fallback (see #43945), but creates the same problematic outcome: **user data intended for local processing is silently routed to cloud providers.**

## Problem

Ollama models need significant time to load into memory on first request:
- **27B model:** ~13 seconds cold-start
- **122B model:** ~46 seconds cold-start
- Larger models with CPU offloading: potentially 60+ seconds

During cold-start, Ollama doesn't respond until the model is fully loaded. OpenClaw's LLM request timeout fires before Ollama can respond, resulting in a timeout error that triggers the fallback chain.

Even after fixing the auth issue (#43945), this timeout problem independently causes the same silent fallback behavior.

## Steps to Reproduce

1. Configure an Ollama provider with a large model (e.g. `qwen3.5:122b`)
2. Ensure the model is NOT pre-loaded (cold state — `ollama ps` shows no running models)
3. Send a request through OpenClaw targeting that model
4. **Observed:** Request times out (model load + inference exceeds default timeout), falls back to cloud model
5. **Expected:** Request waits for model load + inference, or timeout is configurable per provider

## Log Evidence

```
[WARN] model_fallback_decision: candidate_failed
  requestedProvider: ollama-remote
  requestedModel: qwen3.5:122b
  reason: timeout, status: 408
  nextCandidate: openai/gpt-4.1-mini
```

## Privacy Impact

Same as the auth issue: if a user chose a local model for data sovereignty, a timeout-triggered fallback silently sends their data to a cloud provider. Combined with the lack of user-visible fallback notifications, this is a privacy concern.

## Proposed Fixes

### Fix 1: Configurable `requestTimeout` per provider (minimal)
Add a `requestTimeout` (or `timeoutMs`) field, in milliseconds, to the provider configuration:

```json
{
  "models": {
    "providers": {
      "ollama-remote": {
        "baseUrl": "http://192.168.178.122:11434",
        "api": "ollama",
        "requestTimeout": 120000,
        "models": [...]
      }
    }
  }
}
```

This allows users to set longer timeouts for local providers with slow cold-starts while keeping tight timeouts for cloud APIs.

### Fix 2: Configurable timeout per model (granular)
Allow timeout overrides at the model level for cases where different models on the same provider have vastly different load times:

```json
{
  "models": {
    "providers": {
      "ollama-remote": {
        "api": "ollama",
        "requestTimeout": 60000,
        "models": [
          { "id": "qwen3.5:27b", "requestTimeout": 30000 },
          { "id": "qwen3.5:122b", "requestTimeout": 180000 }
        ]
      }
    }
  }
}
```
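
The precedence implied by Fix 1 and Fix 2 (model-level value first, then provider-level, then a global default) could be resolved roughly as follows. This is only a sketch, not OpenClaw's actual code: `resolve_request_timeout` and `DEFAULT_TIMEOUT_MS` are hypothetical names, and the 30-second default is an assumption, since the real default is not stated in this issue.

```python
from typing import Any

# Hypothetical global default; the actual OpenClaw default is unknown here.
DEFAULT_TIMEOUT_MS = 30_000

# Example config mirroring the proposal above.
EXAMPLE_CONFIG: dict[str, Any] = {
    "models": {
        "providers": {
            "ollama-remote": {
                "api": "ollama",
                "requestTimeout": 60000,
                "models": [
                    {"id": "qwen3.5:27b", "requestTimeout": 30000},
                    {"id": "qwen3.5:122b", "requestTimeout": 180000},
                ],
            }
        }
    }
}


def resolve_request_timeout(config: dict[str, Any], provider: str, model_id: str) -> int:
    """Return the most specific timeout: model, then provider, then default."""
    provider_cfg = config.get("models", {}).get("providers", {}).get(provider, {})
    for model in provider_cfg.get("models", []):
        # A model-level override wins over the provider-level value.
        if model.get("id") == model_id and "requestTimeout" in model:
            return model["requestTimeout"]
    # Fall back to the provider value, then to the global default.
    return provider_cfg.get("requestTimeout", DEFAULT_TIMEOUT_MS)
```

With this precedence, `qwen3.5:122b` gets its own 180-second budget, any other model on `ollama-remote` inherits the provider's 60 seconds, and providers without a `requestTimeout` keep the existing default.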

### Fix 3: Auto-detect Ollama cold-start and extend timeout (smart)
When `api: "ollama"` is configured, OpenClaw could probe the Ollama `/api/ps` endpoint to check if the model is loaded. If not, automatically extend the timeout to accommodate the cold-start. This provides a zero-config experience for Ollama users.
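
A minimal sketch of that probe, assuming the `/api/ps` response contains a `models` array whose entries carry a `name` field. The function names and the 120-second cold-start allowance are hypothetical, chosen only for illustration:

```python
import json
import urllib.request


def fetch_loaded_models(base_url: str, timeout_s: float = 5.0) -> list[str]:
    """Ask Ollama (GET /api/ps) which models are currently loaded."""
    with urllib.request.urlopen(f"{base_url}/api/ps", timeout=timeout_s) as resp:
        payload = json.load(resp)
    return [m["name"] for m in payload.get("models", [])]


def effective_timeout_ms(
    model_id: str,
    loaded_models: list[str],
    base_timeout_ms: int,
    cold_start_extra_ms: int = 120_000,  # hypothetical allowance, not a measured value
) -> int:
    """Extend the timeout only when the model is not loaded yet."""
    # Exact-match comparison; a real implementation may need to normalize
    # tags (for example an implicit ":latest") before comparing.
    if model_id in loaded_models:
        return base_timeout_ms
    return base_timeout_ms + cold_start_extra_ms
```

The probe itself should use a short timeout of its own so that an unreachable Ollama host fails fast instead of adding to the request latency.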

### Fix 4: Respect `allowCloudFallback: false` on timeout too
If the fail-closed policy flag from #43945 is implemented, it should also apply to timeout-triggered fallbacks — not just auth failures. A timeout on a local model should result in an error, not a silent cloud redirect.
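
One way to picture the fail-closed behavior is as a filter on the fallback chain that is applied regardless of why the previous candidate failed. This is an illustrative sketch only: `Candidate`, its `is_local` flag, and `next_candidate` are hypothetical and do not reflect OpenClaw's internal types.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    provider: str
    model: str
    is_local: bool  # hypothetical marker for providers that keep data on-premises


def next_candidate(
    chain: Sequence[Candidate],
    failed_index: int,
    allow_cloud_fallback: bool,
) -> Optional[Candidate]:
    """Pick the next usable candidate after a failure (timeout or auth alike)."""
    for candidate in chain[failed_index + 1:]:
        # Cloud candidates are skipped when the fail-closed policy is active.
        if candidate.is_local or allow_cloud_fallback:
            return candidate
    # No permitted candidate left: the caller should surface an error to the user.
    return None
```

For the chain from the log above (`ollama-remote` followed by `openai/gpt-4.1-mini`), a timeout with `allow_cloud_fallback=False` yields `None`, so the request fails visibly instead of being redirected.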

## Workaround

Pre-load models before use via the Ollama API:

```bash
curl http://192.168.178.122:11434/api/generate \
  -d '{"model":"qwen3.5:122b","prompt":"hi","stream":false,"keep_alive":"60m","options":{"num_predict":1}}'
```

This forces the model into memory with a 60-minute keep-alive, avoiding cold-start timeouts on subsequent requests. However, this requires manual intervention or scripting before each session.
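
Until a proper fix lands, the pre-load step can be scripted, for example from a session startup hook. The sketch below sends the same payload as the `curl` command above; the function names and the 300-second client timeout are assumptions made for this example.

```python
import json
import urllib.request


def build_preload_request(
    base_url: str, model: str, keep_alive: str = "60m"
) -> urllib.request.Request:
    """Build the POST /api/generate request that loads the model into memory."""
    body = {
        "model": model,
        "prompt": "hi",
        "stream": False,
        "keep_alive": keep_alive,
        "options": {"num_predict": 1},  # generate a single token only
    }
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )


def preload(base_url: str, model: str, keep_alive: str = "60m", timeout_s: float = 300.0) -> None:
    """Send the request and wait until Ollama has finished loading the model."""
    request = build_preload_request(base_url, model, keep_alive)
    with urllib.request.urlopen(request, timeout=timeout_s) as resp:
        resp.read()  # the response body is not needed, only the completion


if __name__ == "__main__":
    preload("http://192.168.178.122:11434", "qwen3.5:122b")
```

The client timeout here is deliberately generous because the whole point of the call is to sit through the cold start.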

## Related

- #43945 — Subagents miss Ollama credentials and silently fall back to cloud models (privacy regression)
- Both issues independently cause the same outcome: silent data leakage from local to cloud

## Environment

- OpenClaw 2026.3.11
- Ollama 0.17.7 (remote on Windows, local on Linux)
- Models tested: qwen3.5:27b (13s cold-start), qwen3.5:122b (46s cold-start)
