doctor: agents.defaults.llm.idleTimeoutSeconds auto-fix discards the user value; runtime gives no signal until doctor runs #74910

@andhai

Summary

agents.defaults.llm.idleTimeoutSeconds (legacy timeout knob) is correctly recognized by openclaw doctor's deprecation rule introduced in v2026.4.27. Two gaps remain that combined to silently produce 120s timeouts on slow local models:

  1. doctor --fix deletes the legacy block without preserving the user's value. The migrated key (models.providers.<id>.timeoutSeconds) is not populated; the user's "I want 180s" intent is silently dropped.
  2. The runtime emits no warning when the legacy key is present, so users who haven't run openclaw doctor since v2026.4.27 see only the symptom: prefills on slow local models hit the default 120s idle timeout and the run falls back to the configured fallback model.

What I observed

On v2026.4.27 (cbc2ba0931), with this config:

"agents": {
  "defaults": {
    "model": { "primary": "mlx/<slow-local-30b-model>" },
    "llm": { "idleTimeoutSeconds": 180 }
  }
}
  • openclaw agent --message "<prompt that triggers >120s prefill>" consistently hit a 120s idle timeout, not 180s, then fell back to OpenAI.
  • src/config/agent-timeout-defaults.ts: DEFAULT_LLM_IDLE_TIMEOUT_SECONDS = 120.
  • src/agents/pi-embedded-runner/run/llm-idle-timeout.ts: resolveLlmIdleTimeoutMs(...) uses params.modelRequestTimeoutMs (derived from models.providers.<id>.timeoutSeconds) as the override path. There is no path from agents.defaults.llm.idleTimeoutSeconds into this resolver.

The actual user-side fix is to set models.providers.<id>.timeoutSeconds instead, which works as expected.
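
To make the observed behavior concrete, here is a simplified model of the resolution described above. It is not the real OpenClaw code: the Config shape and the resolveIdleTimeoutMs helper are hypothetical stand-ins, and only the 120s default and the two config paths come from the observations in this issue.

```typescript
// Simplified model only; the real resolver is resolveLlmIdleTimeoutMs(...)
// and takes params.modelRequestTimeoutMs rather than a config object.
const DEFAULT_LLM_IDLE_TIMEOUT_SECONDS = 120;

interface ProviderConfig {
  timeoutSeconds?: number;
}

interface Config {
  agents?: { defaults?: { llm?: { idleTimeoutSeconds?: number } } };
  models?: { providers?: Record<string, ProviderConfig> };
}

// Hypothetical helper: the provider timeout is the only override path.
function resolveIdleTimeoutMs(config: Config, providerId: string): number {
  const providerSeconds = config.models?.providers?.[providerId]?.timeoutSeconds;
  // agents.defaults.llm.idleTimeoutSeconds is never read here, which is
  // why the legacy key has no effect on the timeout.
  const seconds = providerSeconds ?? DEFAULT_LLM_IDLE_TIMEOUT_SECONDS;
  return seconds * 1000;
}

const legacyOnly: Config = {
  agents: { defaults: { llm: { idleTimeoutSeconds: 180 } } },
  models: { providers: { mlx: {} } },
};
const providerTimeout: Config = {
  models: { providers: { mlx: { timeoutSeconds: 180 } } },
};

console.log(resolveIdleTimeoutMs(legacyOnly, "mlx")); // 120000
console.log(resolveIdleTimeoutMs(providerTimeout, "mlx")); // 180000
```

In this model the legacy-only config resolves to the 120s default, while the same value placed under models.providers.mlx.timeoutSeconds takes effect.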

What doctor does today

src/commands/doctor/shared/legacy-config-migrations.runtime.agents.ts:

{
  id: "agents.defaults.llm->models.providers.timeoutSeconds",
  legacyRules: [{
    path: ["agents","defaults","llm"],
    message: 'agents.defaults.llm is legacy; use models.providers.<id>.timeoutSeconds for slow model/provider timeouts. Run "openclaw doctor --fix".'
  }],
  apply: (raw, changes) => {
    delete defaults.llm;
    changes.push("Removed agents.defaults.llm; model idle timeout now follows models.providers.<id>.timeoutSeconds.");
  }
}

The detection rule is correct. The apply step:

  • Deletes the legacy block.
  • Does not copy idleTimeoutSeconds into any models.providers.<id>.timeoutSeconds.
  • Does not quote the user's number in the change message, so the user has no breadcrumb to recreate their intent.

End state: the user's explicit "180s" preference is silently dropped on auto-fix.
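
For illustration, a minimal sketch of an apply step that still removes the legacy block but keeps the user's number visible in the change message. Everything here is an assumption rather than the project's code: the RawConfig type, the applyLlmTimeoutMigration name, and the choice to list every key under models.providers without filtering for slow providers.

```typescript
// Hypothetical shape; the real migration receives the raw parsed config.
type RawConfig = Record<string, any>;

// Sketch: delete agents.defaults.llm, but quote the legacy value so the
// user can recreate it under models.providers.<id>.timeoutSeconds.
function applyLlmTimeoutMigration(raw: RawConfig, changes: string[]): void {
  const defaults = raw.agents?.defaults;
  if (!defaults || defaults.llm === undefined) return;

  // Read the value before deleting the block.
  const legacySeconds: unknown = defaults.llm?.idleTimeoutSeconds;
  delete defaults.llm;

  if (typeof legacySeconds !== "number") {
    // No usable value to echo; fall back to the existing message.
    changes.push(
      "Removed agents.defaults.llm; model idle timeout now follows models.providers.<id>.timeoutSeconds.",
    );
    return;
  }

  // Assumption: every configured provider is listed, with no filtering.
  const providerIds = Object.keys(raw.models?.providers ?? {});
  const detected =
    providerIds.length > 0 ? ` (detected providers: ${providerIds.join(", ")})` : "";

  changes.push(
    `Removed agents.defaults.llm.idleTimeoutSeconds: ${legacySeconds}. ` +
      `To preserve this behavior, add models.providers.<id>.timeoutSeconds: ${legacySeconds} to slow providers${detected}.`,
  );
}
```

The sketch deliberately does not write the value into any provider; it only leaves a breadcrumb, which is the lower-risk half of the change.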

Suggested improvements

Either alone would help; both would be ideal.

  1. Preserve intent in the change message. Echo the legacy value back so the user can recreate it:

    "Removed agents.defaults.llm.idleTimeoutSeconds: 180. To preserve this behavior, add models.providers.<id>.timeoutSeconds: 180 to slow providers (detected providers: mlx, ollama)."

    If there's exactly one non-OpenAI provider configured, optionally offer to apply the value there.

  2. Runtime warning at startup when agents.defaults.llm exists, regardless of whether doctor has been run. One-shot warning naming the legacy path and pointing at openclaw doctor. Today, deprecation visibility is effectively opt-in to running doctor.
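
The runtime warning in item 2 could look roughly like the sketch below. The warnOnLegacyLlmConfig function, its module-level flag, and the injectable warn callback are hypothetical; the message text reuses the wording of the existing doctor rule.

```typescript
// Module-level flag so the warning fires at most once per process.
let legacyLlmWarningShown = false;

// Hypothetical startup hook. `warn` defaults to console.warn and can be
// replaced with the application's logger.
function warnOnLegacyLlmConfig(
  config: { agents?: { defaults?: { llm?: unknown } } },
  warn: (message: string) => void = console.warn,
): boolean {
  if (legacyLlmWarningShown) return false;
  if (config.agents?.defaults?.llm === undefined) return false;

  legacyLlmWarningShown = true;
  warn(
    "agents.defaults.llm is legacy; use models.providers.<id>.timeoutSeconds " +
      'for slow model/provider timeouts. Run "openclaw doctor --fix".',
  );
  return true;
}
```

Called once during config load, this would surface the deprecation to users who never run doctor, without repeating the message on every agent invocation within the same process.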

Repro

// ~/.openclaw/openclaw.json
{
  "agents": {
    "defaults": {
      "model": { "primary": "mlx/<some-slow-local-30b-model>" },
      "llm": { "idleTimeoutSeconds": 600 }
    }
  },
  "models": {
    "providers": {
      "mlx": {
        "baseUrl": "http://127.0.0.1:8080/v1",
        "api": "openai-completions",
        "models": [{ "id": "<some-slow-local-30b-model>" }]
      }
    }
  }
}

Then openclaw agent --message "<prompt that triggers >120s prefill>" times out at 120s, not 600s, with the user having no in-band signal that agents.defaults.llm.idleTimeoutSeconds was the wrong place to set it.

Related diagnostic gap (not a separate issue, just adjacent)

While debugging the timeout symptom, we also bumped into agents.list[i].model.primary silently shadowing agents.defaults.model.primary. The behavior is correct (resolveAgentEffectiveModelPrimary does what it says on the tin), but there's no diagnostic when a user changes their default and a per-agent override quietly keeps the old value. There's already a precedent for inspecting agents.list[].model in src/commands/doctor/shared/codex-route-warnings.ts; a generalized "explicit override differs from defaults" info-level hit there (or in openclaw models list grouping) would close the same kind of "config-says-X-but-runtime-uses-Y" debugging gap.
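
As a rough sketch of what such an info-level check might do, assuming only the agents.defaults.model.primary and agents.list[i].model.primary paths named above (the AgentsConfig type and findShadowedModelDefaults are hypothetical, not existing doctor code):

```typescript
interface AgentEntry {
  model?: { primary?: string };
}

interface AgentsConfig {
  defaults?: { model?: { primary?: string } };
  list?: AgentEntry[];
}

// Returns one note per agent whose explicit primary model differs from
// the default. Agents without an override are not reported.
function findShadowedModelDefaults(agents: AgentsConfig): string[] {
  const defaultPrimary = agents.defaults?.model?.primary;
  if (defaultPrimary === undefined) return [];

  const notes: string[] = [];
  (agents.list ?? []).forEach((agent, index) => {
    const override = agent.model?.primary;
    if (override !== undefined && override !== defaultPrimary) {
      notes.push(
        `agents.list[${index}].model.primary ("${override}") overrides ` +
          `agents.defaults.model.primary ("${defaultPrimary}")`,
      );
    }
  });
  return notes;
}
```

The check reports rather than changes anything, since the override behavior itself is correct.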

Environment

  • OpenClaw v2026.4.27 (cbc2ba0931)
  • macOS Darwin 25.3.0, Apple silicon, 48 GB unified memory
  • mlx_lm.server v0.31.1
