Fix legacy LLM timeout diagnostics #74940

Open
chiyouYCH wants to merge 1 commit into openclaw:main from chiyouYCH:fix/agent-llm-timeout-legacy-diagnostics
Contributor

@chiyouYCH chiyouYCH commented Apr 30, 2026

Summary

  • Preserve the numeric agents.defaults.llm.idleTimeoutSeconds value in doctor --fix output instead of silently dropping it.
  • Add a one-shot config-load warning when agents.defaults.llm is still present, pointing users to models.providers.<id>.timeoutSeconds.
  • Cover doctor migration, partial-validation migration, and config-load diagnostics with focused tests.
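
A minimal sketch of the one-shot config-load warning described above, not the helper this PR adds to src/config/io.ts. The names `createLegacyLlmWarner`, `Logger`, and `isRecord` are hypothetical, and reusing the legacy-key message text for the load-time warning is an assumption.

```typescript
// Hypothetical sketch: warn at most once per process when the retired
// agents.defaults.llm block is still present in the raw config.
type Logger = { warn: (message: string) => void };

// Assumption: the load-time warning reuses the legacy-key message text.
const LEGACY_LLM_WARNING =
  'agents.defaults.llm is legacy; use models.providers.<id>.timeoutSeconds ' +
  'for slow model/provider timeouts. Run "openclaw doctor --fix".';

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Returns a function that reports whether it emitted the warning on this call.
function createLegacyLlmWarner(logger: Logger): (rawConfig: unknown) => boolean {
  let warned = false;
  return (rawConfig: unknown): boolean => {
    if (warned || !isRecord(rawConfig)) return false;
    const agents = rawConfig["agents"];
    const defaults = isRecord(agents) ? agents["defaults"] : undefined;
    if (!isRecord(defaults) || !("llm" in defaults)) return false;
    warned = true; // latch so repeated config loads stay quiet
    logger.warn(LEGACY_LLM_WARNING);
    return true;
  };
}
```

Keeping the latch inside a closure rather than in module state makes the one-shot behavior easy to reset between tests.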

Fixes #74910

Real behavior proof

Behavior or issue addressed: Legacy agents.defaults.llm.idleTimeoutSeconds was removed by doctor --fix with only a generic message, so users lost the old timeout value and had no actionable provider-specific migration hint.

Real environment tested: Local macOS OpenClaw checkout at 82e9c03, run with isolated temporary OPENCLAW_CONFIG_PATH and OPENCLAW_STATE_DIR. The temp config contained agents.defaults.llm.idleTimeoutSeconds: 180 and models.providers.mlx.

Exact steps or command run after this patch:

OPENCLAW_CONFIG_PATH=<tmp>/openclaw.json OPENCLAW_STATE_DIR=<tmp>/state pnpm openclaw health
OPENCLAW_CONFIG_PATH=<tmp>/openclaw.json OPENCLAW_STATE_DIR=<tmp>/state pnpm openclaw doctor --fix

Evidence after fix: Copied live terminal output from the real CLI run:

$ OPENCLAW_CONFIG_PATH=<tmp>/openclaw.json OPENCLAW_STATE_DIR=<tmp>/state pnpm openclaw health
Config invalid
Problem:
  - agents.defaults: Unrecognized key: "llm"
Legacy config keys detected:
  - agents.defaults.llm: agents.defaults.llm is legacy; use models.providers.<id>.timeoutSeconds for slow model/provider timeouts. Run "openclaw doctor --fix".

Run: openclaw doctor --fix

$ OPENCLAW_CONFIG_PATH=<tmp>/openclaw.json OPENCLAW_STATE_DIR=<tmp>/state pnpm openclaw doctor --fix
Legacy config keys detected
  - agents.defaults.llm: agents.defaults.llm is legacy; use models.providers.<id>.timeoutSeconds for slow model/provider timeouts. Run "openclaw doctor --fix".

Doctor changes
  Removed agents.defaults.llm.idleTimeoutSeconds: 180; to preserve this behavior, set models.providers.<id>.timeoutSeconds: 180 for slow providers. Configured providers: mlx.

Observed result after fix: The real CLI now preserves the old timeout value in the doctor migration output and gives the user the exact provider-level replacement value to configure: models.providers.<id>.timeoutSeconds: 180. It also shows the configured provider hint: Configured providers: mlx.
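
For illustration, a sketch of how a value-preserving change message like the one above could be assembled. It is not the code in legacy-config-migrations.runtime.agents.ts: `describeLegacyLlmRemoval`, `finiteNumber`, and `isRecord` are hypothetical names, and the ", " separator for multiple providers is an assumption, since the run above had only one.

```typescript
// Hypothetical sketch: build the doctor change line from the raw config.
function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Accept only real finite numbers; strings, NaN, and Infinity are rejected.
function finiteNumber(value: unknown): number | undefined {
  return typeof value === "number" && Number.isFinite(value) ? value : undefined;
}

// Returns the value-preserving message, or undefined when there is no numeric
// idleTimeoutSeconds to echo (the generic removal path is out of scope here).
function describeLegacyLlmRemoval(rawConfig: unknown): string | undefined {
  if (!isRecord(rawConfig)) return undefined;
  const agents = rawConfig["agents"];
  const defaults = isRecord(agents) ? agents["defaults"] : undefined;
  const llm = isRecord(defaults) ? defaults["llm"] : undefined;
  if (!isRecord(llm)) return undefined;

  const seconds = finiteNumber(llm["idleTimeoutSeconds"]);
  if (seconds === undefined) return undefined;

  // Sort provider ids so the hint is stable across runs.
  const models = rawConfig["models"];
  const providers = isRecord(models) ? models["providers"] : undefined;
  const ids = isRecord(providers) ? Object.keys(providers).sort() : [];

  const base =
    `Removed agents.defaults.llm.idleTimeoutSeconds: ${seconds}; ` +
    `to preserve this behavior, set models.providers.<id>.timeoutSeconds: ${seconds} ` +
    `for slow providers.`;
  // Assumption: multiple provider ids are joined with ", ".
  return ids.length > 0 ? `${base} Configured providers: ${ids.join(", ")}.` : base;
}
```

With the temp config from this run (idleTimeoutSeconds: 180 and a single mlx provider), the sketch yields the same text as the "Doctor changes" line above, minus its indentation. It only reports; it does not pick a provider to mutate.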

What was not tested: No known gaps for this diagnostic-only change; no live LLM request was needed because the fix changes config migration and diagnostics only.

Tests

  • pnpm test src/commands/doctor/shared/legacy-config-migrate.test.ts src/commands/doctor/shared/legacy-config-migrate.validation.test.ts src/config/io.compat.test.ts
  • pnpm exec oxfmt --check --threads=1 src/commands/doctor/shared/legacy-config-migrations.runtime.agents.ts src/commands/doctor/shared/legacy-config-migrate.test.ts src/commands/doctor/shared/legacy-config-migrate.validation.test.ts src/config/io.ts src/config/io.compat.test.ts

@openclaw-barnacle openclaw-barnacle Bot added commands Command implementations size: S labels Apr 30, 2026
@chiyouYCH chiyouYCH force-pushed the fix/agent-llm-timeout-legacy-diagnostics branch from 371ba4c to 33c50f1 Compare April 30, 2026 07:37
@chiyouYCH chiyouYCH force-pushed the fix/agent-llm-timeout-legacy-diagnostics branch from 33c50f1 to 85316e5 Compare April 30, 2026 07:39
Contributor

clawsweeper Bot commented Apr 30, 2026

Codex review: needs maintainer review before merge.

Summary
The PR updates legacy agents.defaults.llm timeout doctor/config-load diagnostics, focused tests, and the changelog so removed numeric timeout values are echoed and users are pointed to provider timeoutSeconds.

Reproducibility: yes. Source inspection on current main shows doctor --fix deletes the legacy block with only a generic message, and the linked issue comments include production confirmation of the dropped timeout value.

Real behavior proof
Sufficient (terminal): The PR body includes copied live terminal output from an isolated real CLI run showing the after-fix doctor migration message preserves the legacy timeout value and provider hint.

Next step before merge
No repair lane is needed; the previous changelog gap is fixed and the remaining action is ordinary maintainer review and merge gating for this contributor PR.

Security
Cleared: The latest diff only touches config diagnostics, tests, and changelog text, with no dependency, workflow, package-resolution, lifecycle-hook, artifact-download, permission, or secret-handling changes.

Review details

Best possible solution:

Land this narrow diagnostic fix, keeping the retired key out of runtime semantics while giving users enough information to move their timeout to the provider config.

Do we have a high-confidence way to reproduce the issue?

Yes. Source inspection on current main shows doctor --fix deletes the legacy block with only a generic message, and the linked issue comments include production confirmation of the dropped timeout value.

Is this the best way to solve the issue?

Yes. Echoing the retired value and warning during config load is narrower than reviving deprecated runtime semantics or guessing which provider to mutate when multiple providers may be configured.

What I checked:

  • current_main_doctor_gap: Current main deletes agents.defaults.llm and emits only the generic removal message, so the old idleTimeoutSeconds value is not preserved in doctor --fix output. (src/commands/doctor/shared/legacy-config-migrations.runtime.agents.ts:253, b680360fde68)
  • current_main_load_warning_gap: Current main only calls warnOnConfigMiskeys before config validation; there is no legacy LLM timeout-specific load warning path in createConfigIO().loadConfig(). (src/config/io.ts:1522, b680360fde68)
  • pr_value_preserving_migration: The latest PR diff adds finite-number extraction, sorted configured-provider hints, and a value-preserving doctor change message for agents.defaults.llm.idleTimeoutSeconds. (src/commands/doctor/shared/legacy-config-migrations.runtime.agents.ts:265, a31467686e27)
  • pr_warning_tests_changelog: The latest PR diff adds the config-load warning helper, config IO coverage, doctor migration coverage, partial-validation coverage, and a changelog entry. (src/config/io.ts:886, a31467686e27)
  • linked_issue_confirmation: The linked issue discussion includes a second production confirmation that doctor dropped a configured legacy timeout and that setting models.providers.<id>.timeoutSeconds stopped the failures.
  • real_behavior_proof_supplied: The PR body includes copied after-fix terminal output from a real isolated macOS OpenClaw checkout showing openclaw doctor --fix reporting idleTimeoutSeconds: 180 and the configured provider hint. (a31467686e27)

Likely related people:

  • steipete: GitHub path history attributes the retired LLM timeout migration and nearby local-model timeout policy changes to this maintainer. (role: introduced behavior and adjacent maintainer; confidence: high; commits: 531a0ddfe4e9, e899b32e1d79, d9dc75774bcb; files: src/commands/doctor/shared/legacy-config-migrations.runtime.agents.ts, src/agents/pi-embedded-runner/run/llm-idle-timeout.ts, src/config/io.ts)
  • hclsys: Recent merged work on partial doctor legacy migration persistence touched the same validation and config IO path this PR extends. (role: recent maintainer; confidence: medium; commits: b1db87fb3646; files: src/config/io.ts, src/commands/doctor/shared/legacy-config-migrate.validation.test.ts, src/commands/doctor/shared/legacy-config-migrate.test.ts)
  • pashpashpash: Recent work in the same doctor runtime migration file modified adjacent agent runtime legacy migration behavior. (role: recent adjacent maintainer; confidence: medium; commits: 8f4eaa9c00be; files: src/commands/doctor/shared/legacy-config-migrations.runtime.agents.ts)

Codex review notes: model gpt-5.5, reasoning high; reviewed against b680360fde68.

@chiyouYCH chiyouYCH force-pushed the fix/agent-llm-timeout-legacy-diagnostics branch from 85316e5 to f03a09b Compare May 6, 2026 03:47
@openclaw-barnacle openclaw-barnacle Bot added size: S triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. proof: supplied External PR includes structured after-fix real behavior proof. and removed size: XL triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. labels May 6, 2026
@clawsweeper clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 6, 2026
@chiyouYCH chiyouYCH force-pushed the fix/agent-llm-timeout-legacy-diagnostics branch from 82e9c03 to a314676 Compare May 7, 2026 04:10
@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 7, 2026
@clawsweeper clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 7, 2026
Labels

commands Command implementations proof: sufficient ClawSweeper judged the real behavior proof convincing. proof: supplied External PR includes structured after-fix real behavior proof. size: S

Development

Successfully merging this pull request may close these issues.

doctor: agents.defaults.llm.idleTimeoutSeconds auto-fix discards the user value; runtime gives no signal until doctor runs

1 participant