doctor: agents.defaults.llm.idleTimeoutSeconds auto-fix discards the user value; runtime gives no signal until doctor runs

## Summary

`agents.defaults.llm.idleTimeoutSeconds` (legacy timeout knob) is correctly recognized by `openclaw doctor`'s deprecation rule introduced in v2026.4.27. Two gaps remain that combined to silently produce 120s timeouts on slow local models:

1. **`doctor --fix` deletes the legacy block without preserving the user's value.** The migrated key (`models.providers.<id>.timeoutSeconds`) is not populated; the user's "I want 180s" intent is silently dropped.
2. **The runtime emits no warning** when the legacy key is present, so users who haven't run `openclaw doctor` since v2026.4.27 see only the symptom: prefills on slow local models hit a hardcoded 120s ceiling and fall back to the configured fallback model.

## What I observed

On `v2026.4.27` (`cbc2ba0931`), with this config:

```jsonc
"agents": {
  "defaults": {
    "model": { "primary": "mlx/<slow-local-30b-model>" },
    "llm": { "idleTimeoutSeconds": 180 }
  }
}
```

- `openclaw agent --message "<prompt that triggers >120s prefill>"` consistently hit a **120s** idle timeout, not 180s, then fell back to OpenAI.
- `src/config/agent-timeout-defaults.ts` → `DEFAULT_LLM_IDLE_TIMEOUT_SECONDS = 120`.
- `src/agents/pi-embedded-runner/run/llm-idle-timeout.ts` → `resolveLlmIdleTimeoutMs(...)` uses `params.modelRequestTimeoutMs` (derived from `models.providers.<id>.timeoutSeconds`) as the override path. There is no path from `agents.defaults.llm.idleTimeoutSeconds` into this resolver.

The actual user-side fix is to set `models.providers.<id>.timeoutSeconds` instead, which works as expected.

## What doctor does today

`src/commands/doctor/shared/legacy-config-migrations.runtime.agents.ts`:

```ts
{
  id: "agents.defaults.llm->models.providers.timeoutSeconds",
  legacyRules: [{
    path: ["agents","defaults","llm"],
    message: 'agents.defaults.llm is legacy; use models.providers.<id>.timeoutSeconds for slow model/provider timeouts. Run "openclaw doctor --fix".'
  }],
  apply: (raw, changes) => {
    delete defaults.llm;
    changes.push("Removed agents.defaults.llm; model idle timeout now follows models.providers.<id>.timeoutSeconds.");
  }
}
```

The detection rule is correct. The `apply` step:

- Deletes the legacy block.
- Does **not** copy `idleTimeoutSeconds` into any `models.providers.<id>.timeoutSeconds`.
- Does **not** quote the user's number in the change message, so the user has no breadcrumb to recreate their intent.

End state: the user's *explicit* "180s" preference is silently dropped on auto-fix.

## Suggested improvements

Either alone would help; both would be ideal.

1. **Preserve intent in the change message.** Echo the legacy value back so the user can recreate it:
   > *"Removed `agents.defaults.llm.idleTimeoutSeconds: 180`. To preserve this behavior, add `models.providers.<id>.timeoutSeconds: 180` to slow providers (detected providers: `mlx`, `ollama`)."*

   If there's exactly one non-OpenAI provider configured, optionally offer to apply the value there.

2. **Runtime warning at startup** when `agents.defaults.llm` exists, regardless of whether doctor has been run. One-shot warning naming the legacy path and pointing at `openclaw doctor`. Today, deprecation visibility is effectively opt-in to running doctor.

## Repro

```jsonc
// ~/.openclaw/openclaw.json
{
  "agents": {
    "defaults": {
      "model": { "primary": "mlx/<some-slow-local-30b-model>" },
      "llm": { "idleTimeoutSeconds": 600 }
    }
  },
  "models": {
    "providers": {
      "mlx": {
        "baseUrl": "http://127.0.0.1:8080/v1",
        "api": "openai-completions",
        "models": [{ "id": "<some-slow-local-30b-model>" }]
      }
    }
  }
}
```

Then `openclaw agent --message "<prompt that triggers >120s prefill>"` times out at **120s, not 600s**, with the user having no in-band signal that `agents.defaults.llm.idleTimeoutSeconds` was the wrong place to set it.

## Related diagnostic gap (not a separate issue, just adjacent)

While debugging the timeout symptom, we also bumped into `agents.list[i].model.primary` silently shadowing `agents.defaults.model.primary`. The behavior is correct (`resolveAgentEffectiveModelPrimary` does what it says on the tin), but there's no diagnostic when a user changes their default and a per-agent override quietly keeps the old value. There's already a precedent for inspecting `agents.list[].model` in `src/commands/doctor/shared/codex-route-warnings.ts`; a generalized "explicit override differs from defaults" info-level hit there (or in `openclaw models list` grouping) would close the same kind of "config-says-X-but-runtime-uses-Y" debugging gap.

## Environment

- OpenClaw `v2026.4.27` (`cbc2ba0931`)
- macOS Darwin 25.3.0, Apple silicon, 48 GB unified memory
- `mlx_lm.server` v0.31.1


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

doctor: agents.defaults.llm.idleTimeoutSeconds auto-fix discards the user value; runtime gives no signal until doctor runs #74910

Summary

What I observed

What doctor does today

Suggested improvements

Repro

Related diagnostic gap (not a separate issue, just adjacent)

Environment

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Uh oh!

doctor: agents.defaults.llm.idleTimeoutSeconds auto-fix discards the user value; runtime gives no signal until doctor runs #74910

Description

Summary

What I observed

What doctor does today

Suggested improvements

Repro

Related diagnostic gap (not a separate issue, just adjacent)

Environment

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions