Add web.messageTimeoutMs for WhatsApp watchdog #63939

Open
oromeis wants to merge 2 commits into openclaw:main from oromeis:feature/whatsapp-message-timeout-config

@oromeis oromeis commented Apr 9, 2026

Summary

Adds a small, backward-compatible config option for the WhatsApp web watchdog timeout.

This exposes web.messageTimeoutMs so deployments that need a longer watchdog window can configure it without carrying a local patch.

Closes #63925.

What changed

  • added web.messageTimeoutMs to config types and schema
  • added resolveMessageTimeoutMs(cfg, overrideMs?)
  • updated the WhatsApp web monitor to use config-aware timeout resolution
  • kept the default behavior unchanged at 30 minutes
  • documented the new field in the configuration reference and WhatsApp docs
  • added test coverage for default and override behavior
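
The resolver listed above could look roughly like the following. This is a minimal sketch under stated assumptions, not the PR's actual code: the `WebConfigLike` type is hypothetical, and only the names `resolveMessageTimeoutMs`, `web.messageTimeoutMs`, `DEFAULT_MESSAGE_TIMEOUT_MS`, and the 30-minute default come from this thread.

```typescript
// Hypothetical config shape; the real config types are not shown in this thread.
type WebConfigLike = { web?: { messageTimeoutMs?: number } };

// 30-minute default, as stated in the PR notes.
const DEFAULT_MESSAGE_TIMEOUT_MS = 30 * 60 * 1000;

// Assumed precedence: explicit override (test-oriented tuning) > config value > default.
function resolveMessageTimeoutMs(cfg: WebConfigLike, overrideMs?: number): number {
  return overrideMs ?? cfg.web?.messageTimeoutMs ?? DEFAULT_MESSAGE_TIMEOUT_MS;
}

// Usage: a deployment that wants a 2-hour watchdog window.
const twoHoursMs = 2 * 60 * 60 * 1000;
const fromDefault = resolveMessageTimeoutMs({}); // 1800000
const fromConfig = resolveMessageTimeoutMs({ web: { messageTimeoutMs: twoHoursMs } }); // 7200000
```

Using `??` rather than `||` means only `undefined` or `null` falls through to the next source; whether a literal `0` should be treated specially is left open here.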

Why

Right now the watchdog timeout is effectively hard-coded. That makes it awkward to tune for quieter or longer-lived sessions, and local patches get overwritten on update.

This change keeps the existing default while making the timeout configurable in a narrow, low-risk way.

Testing

AI-assisted PR.

Tested locally:

  • pnpm test:extension whatsapp
  • pnpm check

Notes

  • default remains 30 * 60 * 1000
  • config field is optional
  • tuning overrides still take precedence in test-oriented paths

openclaw-barnacle Bot added the docs, channel: whatsapp-web, gateway, and size: XS labels Apr 9, 2026
Contributor

greptile-apps Bot commented Apr 9, 2026

Greptile Summary

This PR adds web.messageTimeoutMs as a configurable option for the WhatsApp watchdog timeout, exposing what was previously a hard-coded 30-minute constant (DEFAULT_MESSAGE_TIMEOUT_MS). The change is backward-compatible, touches all the right surfaces (type, zod schema, resolver, generated schema, help text, labels, tests, and docs), and follows the existing heartbeatSeconds pattern faithfully.

Confidence Score: 5/5

Safe to merge; only remaining finding is a P2 cosmetic inconsistency in observability logging.

All changed surfaces (type, schema, resolver, generated baseline, labels, help, tests, docs) are consistent and correct. The one new finding — the heartbeat warning threshold hardcoded at 30 minutes while the watchdog timeout is now configurable — is purely observability/cosmetic and does not affect correctness or reliability. Prior P1 concerns (in-loop resolution, sha256 baseline) are already tracked in the existing thread.

extensions/whatsapp/src/auto-reply/monitor.ts — heartbeat warning threshold at lines 332 and 337 should be derived from MESSAGE_TIMEOUT_MS rather than hardcoded to 30.

Comments Outside Diff (1)

  1. extensions/whatsapp/src/auto-reply/monitor.ts, lines 337-338

    P2 Heartbeat warning threshold hardcoded at 30 minutes

    The heartbeat callback still uses the literal 30 as the "idle for too long" threshold, but MESSAGE_TIMEOUT_MS (which controls the actual watchdog) is now configurable. If someone sets messageTimeoutMs to, say, 5 minutes, the watchdog will fire long before the heartbeat ever emits its warning; if they set it to 2 hours, the heartbeat will warn four times before the watchdog acts. MESSAGE_TIMEOUT_MS is in scope here, so the threshold can be derived from it.

```suggestion
  if (minutesSinceLastMessage && minutesSinceLastMessage > MESSAGE_TIMEOUT_MS / 60000) {
    heartbeatLogger.warn(logData, `⚠️ web gateway heartbeat - no messages in ${Math.floor(MESSAGE_TIMEOUT_MS / 60000)}+ minutes`);
```
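
To make the effect of the suggestion easier to check in isolation, here is a small self-contained sketch. The helper name `idleWarning` and its signature are hypothetical; the real heartbeat callback in monitor.ts is not shown in this thread. It derives the warning threshold from the configured timeout instead of the literal `30`.

```typescript
// Hypothetical helper: returns the warning text, or null when no warning is due.
function idleWarning(
  minutesSinceLastMessage: number | null,
  messageTimeoutMs: number,
): string | null {
  // Threshold in minutes, derived from the configured watchdog timeout.
  const thresholdMinutes = messageTimeoutMs / 60000;
  if (minutesSinceLastMessage === null || minutesSinceLastMessage <= thresholdMinutes) {
    return null;
  }
  return `web gateway heartbeat - no messages in ${Math.floor(thresholdMinutes)}+ minutes`;
}
```

With a 2-hour timeout, a session that has been idle for 31 minutes no longer triggers the warning; with a 5-minute timeout, 6 idle minutes is already enough.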

Reviews (1): Last reviewed commit: "Add web.messageTimeoutMs for WhatsApp wa..."

Comment thread extensions/whatsapp/src/auto-reply/monitor.ts
Comment on line 195
  // Watchdog to detect stuck message processing (e.g., event emitter died).
  // Tuning overrides are test-oriented; production defaults remain unchanged.
- const MESSAGE_TIMEOUT_MS = tuning.messageTimeoutMs ?? 30 * 60 * 1000; // 30m default
+ const MESSAGE_TIMEOUT_MS = resolveMessageTimeoutMs(cfg, tuning.messageTimeoutMs);
Contributor

P2 resolveMessageTimeoutMs called inside reconnect loop

heartbeatSeconds and reconnectPolicy are both resolved once before the while loop (lines 124–125), but MESSAGE_TIMEOUT_MS is re-resolved on every reconnect iteration. Since cfg and tuning are stable across iterations this is a harmless but redundant computation that diverges from the established pattern. Consider hoisting it above the loop alongside the other resolve calls.

(Move this line to just after line 125, before the while (true) loop.)
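
The hoisting the reviewer asks for can be illustrated with a stand-alone sketch. Everything here is hypothetical scaffolding: `runMonitor`, the call counter, and the bounded loop (which stands in for `while (true)` so the example terminates) are assumptions, not code from monitor.ts.

```typescript
type Cfg = { web?: { messageTimeoutMs?: number } };

let resolveCalls = 0;

// Stand-in resolver that counts how often it is invoked (hypothetical).
function resolveMessageTimeoutMs(cfg: Cfg, overrideMs?: number): number {
  resolveCalls += 1;
  return overrideMs ?? cfg.web?.messageTimeoutMs ?? 30 * 60 * 1000;
}

// Simulated reconnect loop; returns the timeout used by each connection attempt.
function runMonitor(cfg: Cfg, maxReconnects: number): number[] {
  // Resolved once, before the loop, next to the other one-time settings.
  const messageTimeoutMs = resolveMessageTimeoutMs(cfg);
  const usedPerAttempt: number[] = [];
  for (let attempt = 0; attempt < maxReconnects; attempt++) {
    usedPerAttempt.push(messageTimeoutMs); // every attempt reuses the same value
  }
  return usedPerAttempt;
}
```

Because `cfg` and `tuning` do not change between iterations, resolving inside the loop would return the same number every time; hoisting only removes the repeated call and matches the existing pattern.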

Author

Fixed in commit 3d3c356.

Comment thread src/config/schema.help.ts
Comment on lines +338 to +339
"web.messageTimeoutMs":
"Maximum allowed idle time in milliseconds since the last inbound message before the web-channel watchdog forces a reconnect. Increase this when long-lived quiet sessions are expected and false-positive reconnects are undesirable.",
Contributor

P1 Missing config-baseline.sha256 regeneration

schema.help.ts and schema.labels.ts both changed, but docs/.generated/config-baseline.sha256 is not in the PR's changed files. pnpm config:docs:check (part of pnpm release:check) hashes the config doc baseline at runtime using FIELD_HELP from schema.help.js, so it will detect drift and fail. Per CLAUDE.md: "If you change config schema/help … run the matching gen command and commit the updated .sha256 hash file." — run pnpm config:docs:gen and include the updated hash file before merging.

Author

Fixed in commit 3d3c356.

Author

oromeis commented Apr 13, 2026

Friendly ping on #63939 when someone has a minute.
This is a small, backward-compatible change to expose the WhatsApp watchdog timeout as config, so deployments do not need to carry a local patch.

Follow-up review comments were addressed in 3d3c356.

This came from an actual deployment pain point: the current 30-minute timeout is a bit too rigid in practice. Thanks!

@PatrickTrent

Bumping this — we're hitting the exact problem described in #66965, #55030, #59589, #60337, #60626. Daily false-positive "WhatsApp DOWN" boot-alerts because a health-check script reads the 30m idle-reconnect pattern as "session not connected", while inbound messages actually work fine (confirmed by receiving WA messages during the exact window the health-check claimed DOWN).

We're running v2026.4.10 on Mac Mini. Inbound messages arrive normally, but gateway.err.log cycles every ~30 minutes:

[whatsapp] No messages received in 30m - restarting connection
[whatsapp] Web connection closed (status 499). Retry 1/12 in 2.3s…

Until this PR lands we've applied a local patch that replaces 1800 * 1e3 with 86400 * 1e3 (24h) in the bundled dist. Design rationale: Baileys' own connection.update event already detects real disconnects; this watchdog is duplicative and fires on the WhatsApp-Web-side idle-kill cycle itself. The 24h value preserves a safety net for genuinely stuck sockets that never emit a close event.

Happy to help test this PR with production traffic once it's ready to merge — we have a single-account personal deployment with realistic idle patterns (overnight 8h+ quiet periods).

If a maintainer sees value: the 0/disable option described in #66965 would be ideal for always-on personal deployments where any idle reconnect is a false positive.
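
As an illustration of the 0/disable idea mentioned above, here is a hedged sketch. The convention that `0` means "watchdog off", and the helper names `resolveWatchdogTimeoutMs` and `watchdogExpired`, are assumptions made for this example; neither this PR nor the thread confirms how #66965 would actually be implemented.

```typescript
const DEFAULT_MESSAGE_TIMEOUT_MS = 30 * 60 * 1000;

// Returns null when the watchdog is disabled (assumed convention: 0 means "off").
function resolveWatchdogTimeoutMs(configuredMs?: number): number | null {
  if (configuredMs === 0) return null;
  return configuredMs ?? DEFAULT_MESSAGE_TIMEOUT_MS;
}

// True when the idle time exceeds the timeout; a disabled watchdog never expires.
function watchdogExpired(lastMessageAt: number, now: number, timeoutMs: number | null): boolean {
  return timeoutMs !== null && now - lastMessageAt > timeoutMs;
}
```

Under this sketch, the 24h local patch described above corresponds to `resolveWatchdogTimeoutMs(86400 * 1e3)`, while `resolveWatchdogTimeoutMs(0)` turns the idle reconnect off entirely and relies on the connection's own close events.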

@vincentkoc
Member

thanks for pushing on the same stability symptom.

I opened #71466 as the canonical path for the quiet-session 499 loop. The issue is the production default using inbound app-message silence as liveness, so making the timeout configurable helps operators but still leaves the bad default in place.

I’m leaving this open until #71466 lands; after that we can decide whether any remaining config surface is still useful or close this as superseded.

vincentkoc added a commit that referenced this pull request Apr 26, 2026
Fixes #70678.

Keeps quiet but healthy WhatsApp linked-device sessions connected by tracking WhatsApp Web transport activity, while retaining a longer app-silence cap so frame activity cannot mask a stuck session forever. Also cleans up transport activity listeners on failed connection-open paths.

Carries forward the focused #71466 approach and keeps #63939 as related configurable-timeout follow-up. Thanks @vincentkoc and @oromeis.

Validation:
- pnpm test:serial extensions/whatsapp/src/auto-reply.web-auto-reply.connection-and-logging.e2e.test.ts extensions/whatsapp/src/connection-controller.test.ts
- pnpm check:changed
- codex review --base origin/main
steipete pushed a commit that referenced this pull request Apr 26, 2026
ayesha-aziz123 pushed a commit to ayesha-aziz123/openclaw that referenced this pull request Apr 26, 2026
bminicore pushed a commit to bminicore/openclaw-fork that referenced this pull request Apr 27, 2026
Contributor

clawsweeper Bot commented Apr 30, 2026

Codex review: found issues before merge.

Summary
The PR adds a web.messageTimeoutMs config field, resolver, schema/help/generated docs updates, WhatsApp monitor wiring, and tests while preserving the 30-minute default.

Reproducibility: not applicable, as this is a feature/config PR. Source inspection confirms current main still lacks web.messageTimeoutMs, while tests/docs show the original quiet-session reconnect symptom was addressed through transport activity.

Next step before merge
Current main fixed the motivating false-positive reconnect bug differently, so accepting this PR now requires a maintainer decision on whether and where to expose a remaining timeout knob.

Security
Cleared: The diff changes config schema/docs/tests and WhatsApp timeout resolution only; it does not alter workflows, dependencies, lockfiles, secrets, package resolution, or code-download/execution paths.

Review findings

  • [P2] Align timeout help with the transport-aware watchdog — src/config/schema.help.ts:338-339
  • [P3] Add the required changelog entry — src/config/types.base.ts:270

Review details

Best possible solution:

Decide whether operators still need a public watchdog timeout after the transport-aware fix; if yes, expose it through the current WhatsApp timing model with aligned schema, generated docs, changelog, and focused regression tests.

Do we have a high-confidence way to reproduce the issue?

Not applicable as a feature/config PR. Source inspection confirms current main still lacks web.messageTimeoutMs, while tests/docs show the original quiet-session reconnect symptom was addressed through transport activity.

Is this the best way to solve the issue?

No, not as-is. The branch adds a top-level inbound-message timeout against the older watchdog model, while current main uses transport-aware liveness and web.whatsapp.* timing controls, so the final API/docs need maintainer choice.

Full review comments:

  • [P2] Align timeout help with the transport-aware watchdog — src/config/schema.help.ts:338-339
    Current main no longer treats only inbound-message silence as the reconnect signal; it also uses transport activity and a longer app-silence cap. This new help text would publish the old “last inbound message” model, so the config docs/schema wording need to be rebased onto the current watchdog semantics before merge.
    Confidence: 0.86
  • [P3] Add the required changelog entry — src/config/types.base.ts:270
    This PR adds a user-facing config option, but the diff does not update CHANGELOG.md. Repo policy requires user-facing feature/config changes to include an active-version changelog entry before merge.
    Confidence: 0.88

Overall correctness: patch is incorrect
Overall confidence: 0.84

What I checked:

  • PR scope: The PR branch adds web.messageTimeoutMs to config/schema/docs and wires the WhatsApp monitor through a new resolveMessageTimeoutMs helper. (src/config/types.base.ts:270, 3d3c35670147)
  • Current watchdog model changed: Current main separates transport inactivity from app-message silence: the controller checks lastTransportActivityAt, transportTimeoutMs, and a longer app-silence window before force-closing. (extensions/whatsapp/src/connection-controller.ts:625, ae87f7800b2a)
  • Config field still absent: Current WebConfig exposes enabled, heartbeatSeconds, reconnect, and whatsapp; the strict zod schema likewise has no web.messageTimeoutMs field. (src/config/types.base.ts:314, ae87f7800b2a)
  • Current docs point to transport activity and web.whatsapp timing: The WhatsApp docs now describe the watchdog as transport-activity based and list Baileys socket timing controls under web.whatsapp.*. Public docs: docs/channels/whatsapp.md. (docs/channels/whatsapp.md:152, ae87f7800b2a)
  • Follow-up preserved in changelog: The changelog entry for the quiet-session fix says the transport-aware approach landed while preserving this configurable-timeout work as a related follow-up. (CHANGELOG.md:1240, ae87f7800b2a)
  • Discussion context: A maintainer comment identified #71466 ("fix(whatsapp): stop reconnecting quiet sockets") as the canonical path for the quiet-session 499 loop and left this PR open for a later decision on whether a remaining config surface is still useful.
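
The transport-aware model described in the checks above can be sketched as follows. This is an illustration only: `lastTransportActivityAt` and `transportTimeoutMs` are names quoted in the review, while `lastAppMessageAt`, `appSilenceCapMs`, the `shouldForceClose` function, and the example durations are assumptions, not the code in connection-controller.ts.

```typescript
// Timestamps in milliseconds since an arbitrary epoch.
type Liveness = { lastTransportActivityAt: number; lastAppMessageAt: number };
type Timing = { transportTimeoutMs: number; appSilenceCapMs: number };

// Force-close when the transport has gone quiet, or when app-level silence
// exceeds the longer cap so frame activity cannot mask a stuck session forever.
function shouldForceClose(state: Liveness, timing: Timing, now: number): boolean {
  const transportIdle = now - state.lastTransportActivityAt > timing.transportTimeoutMs;
  const appSilentTooLong = now - state.lastAppMessageAt > timing.appSilenceCapMs;
  return transportIdle || appSilentTooLong;
}

// Example durations (assumed): 1 minute of transport silence, 4 hours of app silence.
const exampleTiming: Timing = { transportTimeoutMs: 60000, appSilenceCapMs: 4 * 60 * 60 * 1000 };
```

A session with no inbound messages for an hour but recent transport frames stays connected under this sketch, which is the quiet-session case the older inbound-message watchdog treated as dead.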

Likely related people:

  • vincentkoc: Authored the transport-aware WhatsApp quiet-session fix, participated in this PR discussion, and the current blame for the watchdog/config-adjacent lines in the shallow checkout points to recent Vincent Koc history. (role: recent maintainer and replacement-fix author; confidence: high; commits: e672b61417af, 1f194f1d55a4; files: extensions/whatsapp/src/auto-reply/monitor.ts, extensions/whatsapp/src/connection-controller.ts, extensions/whatsapp/src/auto-reply.web-auto-reply.connection-and-logging.e2e.test.ts)
  • steipete: Committed the transport-aware watchdog fix variant and is adjacent to release/config acceptance for a public config-schema addition. (role: adjacent maintainer and merger/committer; confidence: medium; commits: 1f194f1d55a4; files: CHANGELOG.md, docs/channels/whatsapp.md, src/config/types.base.ts)

Remaining risk / open question:

  • The PR branch predates the transport-activity watchdog model, so a direct rebase would need to remap the public option onto the current transport/app-silence semantics.
  • The API location remains a product decision: a remaining timeout knob may belong under web.whatsapp.*, or may need separate transport and app-silence controls.

Codex review notes: model gpt-5.5, reasoning high; reviewed against ae87f7800b2a.

ogt-redknie pushed a commit to ogt-redknie/OPENX that referenced this pull request May 2, 2026
github-actions Bot pushed a commit to Desicool/openclaw that referenced this pull request May 9, 2026
Labels

  • channel: whatsapp-web
  • docs
  • gateway
  • size: S

Development

Successfully merging this pull request may close these issues.

[Feature]: Expose WhatsApp web message watchdog timeout as config

3 participants