Add web.messageTimeoutMs for WhatsApp watchdog #63939
oromeis wants to merge 2 commits into openclaw:main
Greptile Summary
This PR adds
Confidence Score: 5/5
Safe to merge; the only remaining finding is a P2 cosmetic inconsistency in observability logging.
All changed surfaces (type, schema, resolver, generated baseline, labels, help, tests, docs) are consistent and correct. The one new finding, that the heartbeat warning threshold is hardcoded at 30 minutes while the watchdog timeout is now configurable, is purely observability/cosmetic and does not affect correctness or reliability. Prior P1 concerns (in-loop resolution, sha256 baseline) are already tracked in the existing thread.
extensions/whatsapp/src/auto-reply/monitor.ts: the heartbeat warning threshold at lines 332 and 337 should be derived from MESSAGE_TIMEOUT_MS rather than hardcoded to 30.
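The P2 finding above could be addressed by deriving the warning threshold from the resolved timeout instead of a literal 30. The following is a minimal sketch, assuming the warning compares minutes since the last inbound message against a threshold; `timeoutMinutes` and `shouldWarnStale` are hypothetical names and do not come from monitor.ts.

```typescript
// Hypothetical sketch: only MESSAGE_TIMEOUT_MS is a name from the PR.
const MESSAGE_TIMEOUT_MS = 45 * 60 * 1000; // stand-in for the resolved value

// Convert the timeout to whole minutes once, so the log threshold and the
// watchdog are derived from the same number.
function timeoutMinutes(timeoutMs: number): number {
  return Math.floor(timeoutMs / (60 * 1000));
}

// Warn when the idle time exceeds the configured timeout (not a fixed 30).
function shouldWarnStale(minutesSinceLastMessage: number, timeoutMs: number): boolean {
  return minutesSinceLastMessage > timeoutMinutes(timeoutMs);
}
```

With a 45-minute timeout, an idle time of 31 minutes would no longer trigger the warning, while 46 minutes would.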
  // Watchdog to detect stuck message processing (e.g., event emitter died).
  // Tuning overrides are test-oriented; production defaults remain unchanged.
- const MESSAGE_TIMEOUT_MS = tuning.messageTimeoutMs ?? 30 * 60 * 1000; // 30m default
+ const MESSAGE_TIMEOUT_MS = resolveMessageTimeoutMs(cfg, tuning.messageTimeoutMs);
`resolveMessageTimeoutMs` called inside reconnect loop
Path: extensions/whatsapp/src/auto-reply/monitor.ts, line 195
`heartbeatSeconds` and `reconnectPolicy` are both resolved once before the `while` loop (lines 124–125), but `MESSAGE_TIMEOUT_MS` is re-resolved on every reconnect iteration. Since `cfg` and `tuning` are stable across iterations, this is a harmless but redundant computation that diverges from the established pattern. Consider hoisting it above the loop alongside the other resolve calls.
(Move this line to just after line 125, before the `while (true)` loop.)
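The hoisting suggested in this comment can be sketched as follows. This is an illustration only: `Cfg`, the call counter, the bounded loop, and the body of `resolveMessageTimeoutMs` are stand-ins invented for the example, since the real resolver and reconnect loop are not shown in this thread.

```typescript
// Hypothetical stand-ins; the real cfg, tuning, and resolver live in the PR.
type Cfg = { web?: { messageTimeoutMs?: number } };

let resolveCalls = 0; // counts resolver invocations for demonstration
function resolveMessageTimeoutMs(cfg: Cfg, overrideMs?: number): number {
  resolveCalls += 1;
  return overrideMs ?? cfg.web?.messageTimeoutMs ?? 30 * 60 * 1000;
}

function runReconnectLoop(
  cfg: Cfg,
  tuning: { messageTimeoutMs?: number },
  maxAttempts: number,
): number[] {
  // Resolved once, before the loop, alongside the other per-run settings.
  const MESSAGE_TIMEOUT_MS = resolveMessageTimeoutMs(cfg, tuning.messageTimeoutMs);
  const seen: number[] = [];
  let attempt = 0;
  // The real loop is `while (true)`; it is bounded here so the sketch ends.
  while (attempt < maxAttempts) {
    seen.push(MESSAGE_TIMEOUT_MS); // each reconnect reuses the hoisted value
    attempt += 1;
  }
  return seen;
}
```

Three simulated reconnects then call the resolver once rather than three times, matching how `heartbeatSeconds` and `reconnectPolicy` are described as being handled.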
  "web.messageTimeoutMs":
    "Maximum allowed idle time in milliseconds since the last inbound message before the web-channel watchdog forces a reconnect. Increase this when long-lived quiet sessions are expected and false-positive reconnects are undesirable.",
Missing `config-baseline.sha256` regeneration
Path: src/config/schema.help.ts, lines 338-339
`schema.help.ts` and `schema.labels.ts` both changed, but `docs/.generated/config-baseline.sha256` is not in the PR's changed files. `pnpm config:docs:check` (part of `pnpm release:check`) hashes the config doc baseline at runtime using `FIELD_HELP` from `schema.help.js`, so it will detect drift and fail. Per CLAUDE.md: _"If you change config schema/help … run the matching gen command and commit the updated `.sha256` hash file."_ Run `pnpm config:docs:gen` and include the updated hash file before merging.
Friendly ping on #63939 when someone has a minute. Follow-up review comments were addressed in
This came from an actual deployment pain point: the current 30-minute timeout is a bit too rigid in practice. Thanks!
Bumping this: we're hitting the exact problem described in #66965, #55030, #59589, #60337, #60626. We get daily false-positive "WhatsApp DOWN" boot alerts because a health-check script reads the 30m idle-reconnect pattern as "session not connected", while inbound messages actually work fine (confirmed by receiving WA messages during the exact window in which the health check claimed DOWN).
We're running v2026.4.10 on a Mac Mini. Inbound messages arrive normally, but
Until this PR lands we've applied a local patch that replaces
Happy to help test this PR with production traffic once it's ready to merge. We have a single-account personal deployment with realistic idle patterns (overnight 8h+ quiet periods).
If a maintainer sees value: the
Thanks for pushing on the same stability symptom. I opened #71466 as the canonical path for the quiet-session 499 loop. The issue is that the production default uses inbound app-message silence as liveness, so making the timeout configurable helps operators but still leaves the bad default in place. I’m leaving this open until #71466 lands; after that we can decide whether any remaining config surface is still useful or close this as superseded.
Fixes #70678.

Keeps quiet but healthy WhatsApp linked-device sessions connected by tracking WhatsApp Web transport activity, while retaining a longer app-silence cap so frame activity cannot mask a stuck session forever. Also cleans up transport activity listeners on failed connection-open paths.

Carries forward the focused #71466 approach and keeps #63939 as related configurable-timeout follow-up. Thanks @vincentkoc and @oromeis.

Validation:
- pnpm test:serial extensions/whatsapp/src/auto-reply.web-auto-reply.connection-and-logging.e2e.test.ts extensions/whatsapp/src/connection-controller.test.ts
- pnpm check:changed
- codex review --base origin/main
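The two-tier liveness idea described in this commit message (transport activity keeps a quiet session alive, with a longer cap on app-message silence) might look roughly like the sketch below. Every name here is hypothetical and the limits are illustrative; the actual implementation from the referenced change is not shown in this thread.

```typescript
// Hypothetical sketch of a two-tier liveness check; all names are illustrative.
interface LivenessState {
  lastTransportActivityAt: number; // ms timestamp of the last transport frame
  lastAppMessageAt: number; // ms timestamp of the last inbound app message
}

interface LivenessLimits {
  transportTimeoutMs: number; // maximum silence tolerated on the transport
  appSilenceCapMs: number; // longer cap on app-message silence
}

type Verdict = "healthy" | "transport-stalled" | "app-silence-cap";

function checkLiveness(now: number, s: LivenessState, l: LivenessLimits): Verdict {
  // No frames at all for too long: the connection itself looks dead.
  if (now - s.lastTransportActivityAt > l.transportTimeoutMs) {
    return "transport-stalled";
  }
  // Frames are flowing, but app messages have been silent past the cap, so
  // frame activity is not allowed to mask a stuck session forever.
  if (now - s.lastAppMessageAt > l.appSilenceCapMs) {
    return "app-silence-cap";
  }
  return "healthy";
}
```

Under this scheme an overnight quiet period with regular transport frames stays "healthy" until the app-silence cap is reached, which is the behavior the older inbound-message-only watchdog could not provide.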
Codex review: found issues before merge.
Summary
Reproducibility: not applicable, as a feature/config PR. Source inspection confirms current main still lacks
Next step before merge
Security
Review findings
Review details
Best possible solution: Decide whether operators still need a public watchdog timeout after the transport-aware fix; if yes, expose it through the current WhatsApp timing model with aligned schema, generated docs, changelog, and focused regression tests.
Do we have a high-confidence way to reproduce the issue? Not applicable as a feature/config PR. Source inspection confirms current main still lacks
Is this the best way to solve the issue? No, not as-is. The branch adds a top-level inbound-message timeout against the older watchdog model, while current main uses transport-aware liveness and
Full review comments:
Overall correctness: patch is incorrect
What I checked:
Likely related people:
Remaining risk / open question:
Codex review notes: model gpt-5.5, reasoning high; reviewed against ae87f7800b2a.
Summary
Adds a small, backward-compatible config option for the WhatsApp web watchdog timeout.
This exposes `web.messageTimeoutMs` so deployments that need a longer watchdog window can configure it without carrying a local patch.
Closes #63925.
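As an illustration of how a deployment might set the new option, here is a small sketch. The `Config` and `WebConfig` types are hypothetical stand-ins for the project's real config types, and the 2-hour value is only an example; the key path `web.messageTimeoutMs` is the one part taken from this PR.

```typescript
// Hypothetical config shape; only the web.messageTimeoutMs key is from this PR.
type WebConfig = { messageTimeoutMs?: number };
type Config = { web?: WebConfig };

// Raise the watchdog window to 2 hours for a deployment with long quiet periods.
const config: Config = {
  web: {
    messageTimeoutMs: 2 * 60 * 60 * 1000, // 7,200,000 ms
  },
};
```

Leaving the key unset would keep the existing 30-minute default described in this PR.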
What changed
- Adds `web.messageTimeoutMs` to config types and schema
- `resolveMessageTimeoutMs(cfg, overrideMs?)`
Why
Right now the watchdog timeout is effectively hard-coded. That makes it awkward to tune for quieter or longer-lived sessions, and local patches get overwritten on update.
This change keeps the existing default while making the timeout configurable in a narrow, low-risk way.
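A resolver with the signature `resolveMessageTimeoutMs(cfg, overrideMs?)` could be written along these lines. This is a sketch under assumptions: the precedence (override, then config, then default) is inferred from the replaced `tuning.messageTimeoutMs ?? 30 * 60 * 1000` line, while the `Config` type, the `DEFAULT_MESSAGE_TIMEOUT_MS` constant, and the validation of non-positive or non-finite values are additions for illustration and may not match the PR's actual code.

```typescript
// Hypothetical config shape; only web.messageTimeoutMs is from this PR.
type Config = { web?: { messageTimeoutMs?: number } };

const DEFAULT_MESSAGE_TIMEOUT_MS = 30 * 60 * 1000; // existing 30m default

// Assumption: accept only finite, positive numbers; anything else falls through.
function isValidTimeout(value: unknown): value is number {
  return typeof value === "number" && Number.isFinite(value) && value > 0;
}

function resolveMessageTimeoutMs(cfg: Config, overrideMs?: number): number {
  // 1. A test-oriented tuning override wins when present and valid.
  if (isValidTimeout(overrideMs)) return overrideMs;
  // 2. Otherwise use the operator-configured value.
  const configured = cfg.web?.messageTimeoutMs;
  if (isValidTimeout(configured)) return configured;
  // 3. Fall back to the unchanged default.
  return DEFAULT_MESSAGE_TIMEOUT_MS;
}
```

Keeping the default as the final fallback is what makes the change backward compatible: configs that never mention the key behave as before.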
Testing
AI-assisted PR.
Tested locally:
- pnpm test:extension whatsapp ✅
- pnpm check ✅
Notes
30 * 60 * 1000