[Bug]: Config validation error causes infinite retry loop with no backoff — 1GB error log from single invalid key #29745

@JoyaWang

Summary

When openclaw.json contains an unrecognized config key (e.g., channels.whatsapp.groups.*.groupPolicy), the gateway enters an infinite config validation retry loop with zero backoff. The same error message is written to gateway.err.log on every cycle, causing the log file to grow to ~1GB within hours.

The gateway process does not crash: it stays alive and burns CPU in the retry loop, but it is completely non-functional and processes no messages.

Steps to reproduce

  1. Add an unrecognized key to openclaw.json, e.g.:
    {
      "channels": {
        "whatsapp": {
          "groups": {
            "*": {
              "groupPolicy": "open",
              "requireMention": false
            }
          }
        }
      }
    }
    (Note: groupPolicy is valid at channels.whatsapp.groupPolicy but not inside groups.*)
  2. Start the OpenClaw gateway
  3. Wait a few hours
  4. Check ~/.openclaw/logs/gateway.err.log

Expected behavior

  • Gateway should fail fast with a clear error and exit (non-zero), OR
  • Log the config error once, then wait with exponential backoff before retrying, OR
  • At minimum, deduplicate identical consecutive error messages
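To make the backoff option concrete, here is a minimal sketch of a capped exponential delay schedule. The function name, the 1 s base, and the 60 s cap are my assumptions for illustration; none of them come from the OpenClaw codebase.

```typescript
// Hypothetical helper, not an OpenClaw API.
// Returns how long to wait before retry number `attempt` (0-based).
function backoffDelayMs(attempt: number, baseMs: number = 1000, maxMs: number = 60000): number {
  // Clamp the exponent so very large attempt counts cannot overflow.
  const exponent = Math.min(Math.max(0, Math.floor(attempt)), 30);
  // Double the delay on every attempt, but never exceed maxMs.
  return Math.min(maxMs, baseMs * Math.pow(2, exponent));
}

// Delays for the first eight consecutive validation failures.
const schedule: number[] = [];
for (let attempt = 0; attempt < 8; attempt++) {
  schedule.push(backoffDelayMs(attempt));
}
console.log(schedule.join(", "));
// prints "1000, 2000, 4000, 8000, 16000, 32000, 60000, 60000"
```

With a schedule like this, a permanently invalid config would produce about one log entry per minute after the first few retries instead of one per loop iteration.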

Actual behavior

Gateway enters a tight loop, logging the same error on every iteration:

Invalid config at /Users/.../.openclaw/openclaw.json:
- channels.whatsapp.groups.*: Unrecognized key: "groupPolicy"
Config invalid
File: ~/.openclaw/openclaw.json
Problem:
  - channels.whatsapp.groups.*: Unrecognized key: "groupPolicy"

Run: openclaw doctor --fix

This message repeats indefinitely. In my case:

  • gateway.err.log grew to 983MB
  • gateway.log grew to 21MB
  • Gateway was completely unresponsive (no Telegram/WhatsApp messages processed)
  • The only fix was to manually edit the config and restart

OpenClaw version

2026.2.26

Operating system

macOS 15 (Darwin 24.6.0, Apple Silicon)

Install method

npm global

Impact and severity

  • Affected: Anyone whose config becomes invalid after an upgrade (common — config schema changes between versions)
  • Severity: High — gateway silently becomes non-functional, fills disk with log spam, no alert to user
  • Frequency: 100% reproducible with any invalid config key
  • Consequence: Bot goes offline with no obvious indication. On unattended machines, disk can fill up. In my case, the bot was unresponsive and I only noticed when messages stopped being answered.

Suggested fix

  1. Fail fast: If config is invalid, log the error once and exit with non-zero status. Let the process manager (systemd/launchd/pm2) handle restarts with its own backoff.
  2. Log deduplication: At minimum, don't log the same error message more than once per config file change.
  3. Log rotation: Add built-in log rotation or a max file size limit (see also #21080, which reports a similar log bomb with no retry backoff and no log rotation).
  4. Doctor auto-run on validation failure: If openclaw doctor --fix can resolve the issue, consider running it automatically (or at least prompting once).
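As a rough illustration of suggestion 2, the sketch below logs a validation error only when either the config file or the message has changed since the last logged error. The class name, method names, and the idea of passing a "fingerprint" (for example a content hash or mtime chosen by the caller) are all hypothetical; I have not checked how the gateway actually tracks config changes.

```typescript
// Hypothetical sketch, not an OpenClaw API.
class ConfigErrorGate {
  private lastKey: string | null = null;
  private suppressed: number = 0;

  // Returns true if this error should be written to the log.
  // `configFingerprint` identifies the current state of the config file.
  shouldLog(configFingerprint: string, message: string): boolean {
    const key = configFingerprint + "\u0000" + message;
    if (key === this.lastKey) {
      // Same file state and same error as last time: skip it.
      this.suppressed += 1;
      return false;
    }
    this.lastKey = key;
    this.suppressed = 0;
    return true;
  }

  // Number of repeats skipped since the last logged error.
  suppressedCount(): number {
    return this.suppressed;
  }
}

const gate = new ConfigErrorGate();
const message = 'channels.whatsapp.groups.*: Unrecognized key: "groupPolicy"';
const written: string[] = [];

// Simulate 1000 validation cycles against an unchanged, invalid config.
for (let i = 0; i < 1000; i++) {
  if (gate.shouldLog("fingerprint-1", message)) written.push(message);
}
const skipped = gate.suppressedCount();

// The user edits the file but it is still invalid: log once more.
if (gate.shouldLog("fingerprint-2", message)) written.push(message);

console.log(written.length + " written, " + skipped + " skipped");
// prints "2 written, 999 skipped"
```

This does not stop the retry loop from burning CPU, so it would complement fail-fast or backoff rather than replace them.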
