LLM idle timeout cannot be configured for long local llama.cpp requests in 2026.5.3-1 #77744

@juaps

Description

Bug type

Regression (worked before, now fails)

Beta release blocker

No

Summary

After updating to OpenClaw 2026.5.3-1, long local llama.cpp requests time out even though the backend is still actively processing the prompt.

The old agents.defaults.llm.idleTimeoutSeconds config is rejected as an unrecognized key. The suggested models.providers.<id>.timeoutSeconds config is accepted and hot reloaded, but it does not prevent the chat from being cut off during long local model prefill.

This suggests the idle watchdog still reads a different timeout value, one that the current config schema does not expose.

Steps to reproduce

  1. Install or update OpenClaw to 2026.5.3-1.
  2. Run OpenClaw in Docker.
  3. Configure a local llama.cpp / OpenAI-compatible provider.
  4. Use a large-context local model, for example qwen3.6-35b.
  5. Configure the agent primary model as llamacpp/qwen3.6-35b.
  6. Send a very large prompt, around 150k tokens, through webchat.
  7. Observe that llama.cpp keeps processing the prompt, but OpenClaw cancels before the model produces a reply.
  8. Try adding agents.defaults.llm.idleTimeoutSeconds.
  9. Observe that OpenClaw rejects the config with Unrecognized key: "llm".
  10. Try using models.providers.llamacpp.timeoutSeconds.
  11. Observe that the config is accepted and hot reloaded, but the chat still times out.
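One way to narrow this down is to send the same large prompt straight to the llama.cpp server, bypassing OpenClaw. The Python sketch below uses only the standard library. The base URL, model id and dummy key come from the provider config in this report; the script itself, the minimal request body (model plus input), and the prompt-file argument are assumptions. If this direct call completes while the same prompt through webchat does not, the cancellation is happening on the OpenClaw side.

```python
import json
import sys
import time
import urllib.request

BASE_URL = "http://cerebro-mac:8080/v1"  # from the provider config in this report
MODEL = "qwen3.6-35b"                    # assumed to be accepted by the server as-is
API_KEY = "dummy"                        # placeholder key from the same config


def build_request(prompt: str, base_url: str = BASE_URL, model: str = MODEL) -> urllib.request.Request:
    # Minimal Responses-style body; assumed to be sufficient for this server.
    body = {"model": model, "input": prompt}
    return urllib.request.Request(
        f"{base_url}/responses",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {API_KEY}"},
        method="POST",
    )


def run(prompt: str, socket_timeout_s: float = 14400) -> None:
    # urlopen's timeout bounds each blocking socket operation (connect, each read),
    # not the whole request, so here it acts like a generous idle timeout.
    start = time.monotonic()
    with urllib.request.urlopen(build_request(prompt), timeout=socket_timeout_s) as resp:
        payload = json.load(resp)
        print(f"HTTP {resp.status} after {time.monotonic() - start:.0f} s")
    print(json.dumps(payload)[:500])  # first part of the response body


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file name): python direct_repro.py big_prompt.txt
    with open(sys.argv[1], encoding="utf-8") as f:
        run(f.read())
```

The request is non-streaming on purpose: the server sends nothing until prefill and generation finish, so a run that succeeds with a 14400 s socket timeout shows the backend can complete the prompt on its own.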

Expected behavior

OpenClaw should wait for the local provider according to the configured timeout, especially when the provider is still processing a long prefill and has not failed.

A long local llama.cpp prefill should not be treated as a failed model request merely because no output token has been produced yet.
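To make the suspected failure mode concrete, here is a minimal Python model of a request guarded by two separate timers: a total timeout and an idle timeout that only resets when data arrives. This is an assumption about how such a watchdog could behave, not OpenClaw's implementation; the class name and the 120 s idle value are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RequestWatchdog:
    """Hypothetical model of a request guarded by two independent timeouts.

    Not OpenClaw's code: it only illustrates why raising a total timeout
    cannot help if a separate idle timeout keeps firing during prefill.
    """

    total_timeout_s: float  # wall-clock cap on the whole request
    idle_timeout_s: float   # longest allowed gap between received chunks
    started_at: float       # request start time, in seconds
    last_activity_at: float = field(init=False)

    def __post_init__(self) -> None:
        # No chunk has arrived yet, so the idle clock starts with the request.
        self.last_activity_at = self.started_at

    def on_chunk(self, now: float) -> None:
        # In this model, any streamed data resets the idle clock.
        self.last_activity_at = now

    def check(self, now: float) -> Optional[str]:
        # Report which timeout (if any) has expired at time `now`.
        if now - self.started_at >= self.total_timeout_s:
            return "total timeout"
        if now - self.last_activity_at >= self.idle_timeout_s:
            return "idle timeout"
        return None


# Hypothetical numbers: the 14400 s total cap from this report, plus an
# assumed 120 s idle cap that the config schema does not expose.
wd = RequestWatchdog(total_timeout_s=14400, idle_timeout_s=120, started_at=0.0)
# Ten minutes of silent prefill: the total cap is far away, but idle expires.
print(wd.check(now=600.0))  # idle timeout
```

Under this model, raising total_timeout_s alone never rescues a silent prefill; only a larger idle cap, or data arriving from the backend during prefill, keeps the request alive.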

Actual behavior

OpenClaw cuts off the agent request before the local llama.cpp backend finishes prompt prefill and produces a reply.

The llama.cpp server is still alive and processing the prompt; it has not crashed. The request is cancelled from the OpenClaw side before completion.

When fallbacks are configured, OpenClaw then tries fallback providers even though the local model was still working.

OpenClaw version

2026.5.3-1

Operating system

Docker container on a NAS, with the local llama.cpp backend running on a Mac.

Install method

Docker / custom image based on ghcr.io/openclaw/openclaw:latest.

Model

OpenClaw model id:

llamacpp/qwen3.6-35b

Local backend model:

Qwen3.6-35B-A3B-GGUF

Qwen3.6-35B-A3B-Q6_K.gguf

Provider / routing chain

OpenClaw → local llama.cpp OpenAI-compatible server → /v1/responses

Additional provider/model setup details

Provider config:

"models": {
  "mode": "merge",
  "providers": {
    "llamacpp": {
      "baseUrl": "http://cerebro-mac:8080/v1",
      "apiKey": "dummy",
      "api": "openai-responses",
      "timeoutSeconds": 14400,
      "models": [
        {
          "id": "qwen3.6-35b",
          "name": "Qwen 3.6 35B A3B local llama.cpp",
          "reasoning": true,
          "input": ["text"],
          "contextWindow": 262144,
          "maxTokens": 100000
        }
      ]
    }
  }
}

Agent config:

"agents": {
  "defaults": {
    "timeoutSeconds": 14400,
    "model": {
      "primary": "llamacpp/qwen3.6-35b",
      "fallbacks": []
    },
    "compaction": {
      "timeoutSeconds": 10800
    },
    "models": {
      "llamacpp/qwen3.6-35b": {
        "timeoutSeconds": 14400,
        "streaming": true
      }
    }
  }
}

Config that no longer works:

{
  "agents": {
    "defaults": {
      "llm": {
        "idleTimeoutSeconds": 3600
      }
    }
  }
}

This raises:

agents.defaults: Unrecognized key: "llm"

models.providers.llamacpp.timeoutSeconds is accepted by hot reload, but the long request still gets cancelled.

Logs, screenshots, and evidence

OpenClaw logs show fallback and timeout behavior:

Embedded agent failed before reply: All models failed
llamacpp/qwen3.6-35b: LLM request timed out

Config reload confirms models.providers.llamacpp.timeoutSeconds is accepted:

config hot reload applied (... models.providers.llamacpp.timeoutSeconds ...)

Adding agents.defaults.llm.idleTimeoutSeconds fails:

config reload skipped (invalid config): agents.defaults: Unrecognized key: "llm"

The local llama.cpp backend receives the request and starts processing a very large prompt:

task.n_tokens = 152780
prompt processing progress ...
n_tokens = 67584
srv stop: cancel task
done request: POST /v1/responses 200

This suggests the backend was still processing and did not crash: the cancel arrived at n_tokens = 67584 of 152780, so prefill was roughly 44% complete. The request was cancelled before the model had a chance to produce a reply.
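The prefill progress at cancellation can be pulled from the backend log with a small script. This Python sketch assumes the abbreviated line format shown in the excerpt above (task.n_tokens = N, n_tokens = M, cancel task); real llama.cpp log lines carry more fields, and the function name is hypothetical.

```python
import re
from typing import Iterable, Optional, Tuple

# Matches "task.n_tokens = 152780" (prompt size for the task).
TOTAL_RE = re.compile(r"\btask\.n_tokens\s*=\s*(\d+)")
# Matches a bare "n_tokens = 67584" (tokens processed so far), not "task.n_tokens".
PROGRESS_RE = re.compile(r"(?<![.\w])n_tokens\s*=\s*(\d+)")


def prefill_progress_at_cancel(lines: Iterable[str]) -> Optional[Tuple[int, int, float]]:
    """Return (processed, total, fraction) at the first 'cancel task' line, if any."""
    total: Optional[int] = None
    processed: Optional[int] = None
    for line in lines:
        if m := TOTAL_RE.search(line):
            total = int(m.group(1))
        elif m := PROGRESS_RE.search(line):
            processed = int(m.group(1))
        if "cancel task" in line and total and processed is not None:
            return processed, total, processed / total
    return None


excerpt = """\
task.n_tokens = 152780
prompt processing progress ...
n_tokens = 67584
srv stop: cancel task
done request: POST /v1/responses 200
""".splitlines()

result = prefill_progress_at_cancel(excerpt)
if result:
    processed, total, frac = result
    print(f"cancelled after {processed}/{total} prompt tokens ({frac:.0%})")
    # cancelled after 67584/152780 prompt tokens (44%)
```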

Impact and severity

Affected: local large-context llama.cpp users.

Severity: Critical for long-running local model workflows.

Frequency: 100% on very large prompts since updating to 2026.5.3-1.

Consequence: long local model requests cannot complete because OpenClaw cancels them before the model produces its first reply.

Additional information

This appears to be a regression or schema/config mismatch around idle timeout behavior.

The old agents.defaults.llm.idleTimeoutSeconds path is rejected.

The new suggested models.providers.<id>.timeoutSeconds path is accepted but does not prevent the idle watchdog from cancelling the request.

Question:

What is the correct supported config key in 2026.5.3-1 to increase the idle watchdog timeout for long local model prefill?

Is there a separate hidden/default idle timeout still active even when models.providers.<id>.timeoutSeconds is set?

Metadata

Assignees

No one assigned

    Labels

    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
