fix(bluebubbles): dedupe inbound webhooks across restarts (#19176, #12053) #66816
omarshahine merged 2 commits into main from fix/bb-inbound-dedupe
Conversation
🔒 Aisle Security Analysis
We found 3 potential security issues in this PR:
1. 🟡 Replay can re-run tool side effects when reply delivery fails (dedupe claim released)
Description: When reply delivery fails, the inbound deduplication wrapper releases its claim, so a replayed webhook re-runs the full processing path. Any tools/actions with side effects that already ran the first time are executed again.
Vulnerable code:
if (signal.deliveryFailed) {
  ...
  claim.release();
} else {
  await claim.finalize();
}
and:
if (info.kind === "final") {
  dedupeSignal.deliveryFailed = true;
}
Recommendation: Treat inbound processing as at-most-once for side effects, even if reply delivery fails. Options (choose based on product requirements):
Illustrative approach (finalize processing, retry delivery):
// After successful tool run / response generation
await claim.finalize();
try {
  await deliverFinalReply(...);
} catch (e) {
  // schedule retry of delivery only; do NOT release inbound dedupe
  await enqueueDeliveryRetry({ dedupeKey, responseId });
}
2. 🟡 Disk/CPU DoS via file-backed inbound dedupe store rewriting large JSON map
Description: The new BlueBubbles inbound GUID dedupe persists attacker-influenced GUIDs to a per-account JSON file for 7 days (TTL) with a hard cap of 50,000 entries. Each successful message processing calls into the store and triggers a full-file rewrite.
Because inbound GUIDs originate from remote webhook/poller events, a remote party can send many unique messages (unique GUIDs) to drive the store to its maximum size and then keep it near the cap. This can cause sustained high disk I/O and CPU usage due to repeated full-file read/parse/sort/write cycles, potentially degrading the gateway or exhausting disk throughput.
Recommendation: Reduce the ability for untrusted inbound traffic to force large persistent state and full-file rewrites. Suggested mitigations (pick a combination):
import { createHash } from "node:crypto";
function normalizeGuidForStore(guid: string): string {
return createHash("sha256").update(guid, "utf8").digest("hex");
}
3. 🟡 Symlink/hardlink file clobber risk in Windows fallback path for atomic JSON writes
Description: On Windows, the atomic JSON write helper falls back from an atomic rename to copying the temp file onto the destination path. If that destination has been swapped for a symlink or hardlink, the copy writes through the link and clobbers its target. This is relevant to the new BlueBubbles inbound dedupe feature because it writes its per-account store through this helper.
Vulnerable code:
await fs.copyFile(tempPath, filePath);
Recommendation: Avoid overwriting an attacker-controlled link destination. Options:
Example (sketch):
import fs from "node:fs/promises";
async function safeReplaceFile(tempPath: string, filePath: string) {
try {
// Best: attempt atomic rename
await fs.rename(tempPath, filePath);
return;
} catch (e: any) {
if (process.platform !== "win32" || (e?.code !== "EPERM" && e?.code !== "EEXIST")) {
throw e;
}
}
// Windows fallback: refuse to write through links
try {
const st = await fs.lstat(filePath);
if (st.isSymbolicLink()) {
throw new Error(`Refusing to overwrite symlink: ${filePath}`);
}
// Optionally also ensure st.isFile() and that filePath is within an expected base dir.
await fs.rm(filePath, { force: true });
} catch (e: any) {
if (e?.code !== "ENOENT") throw e;
}
await fs.rename(tempPath, filePath);
}
Analyzed PR: #66816 | Last updated on: 2026-04-14T22:40:55Z
Greptile Summary
Adds a persistent file-backed GUID dedupe for inbound BlueBubbles messages. All four claim outcomes are handled correctly.
Confidence Score: 5/5
Safe to merge — all remaining findings are minor style/clarity concerns that do not affect correctness. The core dedupe logic is correct across all four claim outcomes. The finalize-on-success / release-on-failure lifecycle is sound. Disk persistence is properly guarded against errors. Tests cover the key behavioral contracts. The two P2 findings (a misleading comment in a catch block whose release() call is a no-op, and an unnecessary export on an internal type) have no runtime impact. No files require special attention.
Prompt To Fix All With AI
This is a comment left during a code review.
Path: extensions/bluebubbles/src/monitor-processing.ts
Line: 602
Comment:
**`InboundDedupeDeliverySignal` exported but only used within this file**
The type is defined and consumed entirely within `monitor-processing.ts` — `processMessageAfterDedupe` (unexported) takes it as a parameter, and `processMessage` (exported) creates and owns it. Exporting the type leaks an internal implementation detail of the dedupe wrapper into the module's public surface. Consider removing the `export` keyword unless downstream consumers (e.g., tests) need to reference it explicitly.
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: extensions/bluebubbles/src/monitor-processing.ts
Line: 662-673
Comment:
**`claim.release()` in the `catch (finalizeError)` block is a no-op**
When `finalize()` throws, it means `impl.commit` threw. `createClaimableDedupe.commit` always calls `inflight.delete(scopedKey)` in its `finally` block before re-throwing, so by the time `claim.release()` runs here, the inflight slot is already gone — `release` calls `inflight.get` on an empty map and returns immediately.
The GUID won't get "stuck" for 7 days, but not because `release()` unsticks it: `commit` failed before persisting anything (disk errors inside `checkAndRecordInner` are caught and fall back to memory, so `commit` itself rarely throws at all). The comment overstates what `release()` accomplishes and could mislead future maintainers into thinking the call is load-bearing when it's actually redundant in this path.
How can I resolve this? If you propose a fix, please make it concise.
Reviews (1): Last reviewed commit: "fix(bluebubbles): dedupe inbound webhook..." | Re-trigger Greptile
 * dispatcher's `onError` callback rather than as thrown exceptions, so a
 * plain try/catch cannot detect them — see review thread `rwF8` on #66230.
 */
export type InboundDedupeDeliverySignal = { deliveryFailed: boolean };
InboundDedupeDeliverySignal exported but only used within this file
The type is defined and consumed entirely within monitor-processing.ts — processMessageAfterDedupe (unexported) takes it as a parameter, and processMessage (exported) creates and owns it. Exporting the type leaks an internal implementation detail of the dedupe wrapper into the module's public surface. Consider removing the export keyword unless downstream consumers (e.g., tests) need to reference it explicitly.
Prompt To Fix With AI
This is a comment left during a code review.
Path: extensions/bluebubbles/src/monitor-processing.ts
Line: 602
Comment:
**`InboundDedupeDeliverySignal` exported but only used within this file**
The type is defined and consumed entirely within `monitor-processing.ts` — `processMessageAfterDedupe` (unexported) takes it as a parameter, and `processMessage` (exported) creates and owns it. Exporting the type leaks an internal implementation detail of the dedupe wrapper into the module's public surface. Consider removing the `export` keyword unless downstream consumers (e.g., tests) need to reference it explicitly.
How can I resolve this? If you propose a fix, please make it concise.
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Fixed — removed the export keyword. The type is now file-local.
  await claim.finalize();
} catch (finalizeError) {
  // Release the in-flight claim so the GUID doesn't get stuck for the
  // 7-day TTL if persistence fails (disk full, permissions, etc.).
  claim.release();
  logVerbose(
    core,
    runtime,
    `inbound-dedupe: finalize failed for key=${sanitizeForLog(dedupeKey ?? "")}: ${sanitizeForLog(finalizeError)}`,
  );
}
}
claim.release() in the catch (finalizeError) block is a no-op
When finalize() throws, it means impl.commit threw. createClaimableDedupe.commit always calls inflight.delete(scopedKey) in its finally block before re-throwing, so by the time claim.release() runs here, the inflight slot is already gone — release calls inflight.get on an empty map and returns immediately.
The GUID won't get "stuck" for 7 days, but not because release() unsticks it: commit failed before persisting anything (disk errors inside checkAndRecordInner are caught and fall back to memory, so commit itself rarely throws at all). The comment overstates what release() accomplishes and could mislead future maintainers into thinking the call is load-bearing when it's actually redundant in this path.
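The no-op can be seen in a reduced model of the behavior described above (illustrative, not the SDK source): commit clears the inflight slot in its finally block even when persistence throws, so the caller's later release() finds nothing to release:

```typescript
// Reduced model of the commit/release interaction described above;
// not the SDK source.
const inflight = new Map<string, true>();

function commit(key: string, persist: () => void): void {
  try {
    persist(); // may throw (disk full, permissions, ...)
  } finally {
    // Cleared before the caller's catch block ever runs.
    inflight.delete(key);
  }
}

function release(key: string): boolean {
  // Returns false when the slot is already gone, i.e. a no-op.
  return inflight.delete(key);
}

inflight.set("guid", true);
try {
  commit("guid", () => {
    throw new Error("disk full");
  });
} catch {
  // By the time a caller could invoke release() here, the slot is gone.
}
```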
Prompt To Fix With AI
This is a comment left during a code review.
Path: extensions/bluebubbles/src/monitor-processing.ts
Line: 662-673
Comment:
**`claim.release()` in the `catch (finalizeError)` block is a no-op**
When `finalize()` throws, it means `impl.commit` threw. `createClaimableDedupe.commit` always calls `inflight.delete(scopedKey)` in its `finally` block before re-throwing, so by the time `claim.release()` runs here, the inflight slot is already gone — `release` calls `inflight.get` on an empty map and returns immediately.
The GUID won't get "stuck" for 7 days, but not because `release()` unsticks it: `commit` failed before persisting anything (disk errors inside `checkAndRecordInner` are caught and fall back to memory, so `commit` itself rarely throws at all). The comment overstates what `release()` accomplishes and could mislead future maintainers into thinking the call is load-bearing when it's actually redundant in this path.
How can I resolve this? If you propose a fix, please make it concise.
Fixed — removed the redundant claim.release() call since commit() already clears inflight state in its finally block. Now just logs the error.
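The finalize/release lifecycle discussed in this thread can be modeled with a small in-memory sketch (the real createClaimableDedupe is file-backed; this stand-in only shows the claim semantics):

```typescript
// In-memory stand-in for the claim lifecycle: finalize commits the key so
// future replays are dropped; release abandons the claim so a later replay
// can retry. Illustrative only — the real primitive persists to disk.
type Claim =
  | { kind: "duplicate" }
  | { kind: "inflight" }
  | { kind: "fresh"; finalize: () => void; release: () => void };

function createClaimableDedupe() {
  const committed = new Set<string>();
  const inflight = new Set<string>();
  return {
    claim(key: string): Claim {
      if (committed.has(key)) return { kind: "duplicate" };
      if (inflight.has(key)) return { kind: "inflight" };
      inflight.add(key);
      return {
        kind: "fresh",
        finalize: () => {
          committed.add(key);
          inflight.delete(key);
        },
        release: () => {
          inflight.delete(key);
        },
      };
    },
  };
}
```

Finalized GUIDs come back as "duplicate"; released GUIDs come back as "fresh" and can be retried.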
Force-pushed 82bce26 to 64020da (Compare)
Aisle findings response
#1 (High) Symlink-following file overwrite: This is a pre-existing concern that predates this PR.
#2 (Medium) PII in verbose logs: Consistent with existing BB channel behavior.
#3 (Medium) Attacker-controlled dedupe key: If an attacker can forge BB webhooks (requires the webhook password), they can already inject arbitrary messages, not just suppress them. The dedupe key derivation mirrors the existing debouncer key logic. The webhook password is the trust boundary here.
#4 (Info) Plaintext GUIDs on disk: All BB state (sessions, reply cache, history) is already stored as plaintext JSON under the same state directory with the same permissions. Consistent with existing behavior.
None of these are blockers or require changes in this PR scope.
…66816 The inbound-dedupe PR was reopened again as #66816 (closed-without-merge trail: #66230 → #66810 → #66816). The branch was force-pushed and the new PR uses the parallel `fix/bb-inbound-dedupe` branch. Updating code comments and the catchup CHANGELOG entry to point at the live PR. Stacking on top of the dedupe branch will be addressed in a follow-up rebase. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Aisle findings response (commit d27603b)
#1 (High) Replay re-executes tool side effects when final delivery fails:
#2 (Medium) Unbounded fields in verbose logs:
#3 (Medium) Symlink-following in atomic JSON writer:
Force-pushed c772ce8 to be43c31 (Compare)
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: be43c315aa
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
  onDiskError: (error) =>
    logVerbose(core, runtime, `inbound-dedupe disk error: ${sanitizeForLog(error)}`),
});
if (claim.kind === "duplicate" || claim.kind === "inflight") {
Skip dedupe for from-me webhook messages
processMessage now claims/drops dedupe entries before the message.fromMe branch runs, so after a gateway restart replayed from-me events are treated as duplicate and never reach cacheInboundMessage() in processMessageAfterDedupe. That regresses post-restart cache hydration for BlueBubbles reply context/short IDs (the cache is in-memory), so replies to pre-restart assistant messages can lose context resolution even though the replay webhooks still arrived. Consider bypassing duplicate-drop for fromMe events (or running their cache-update path before dedupe return).
Useful? React with 👍 / 👎.
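One way to act on this suggestion is to run the from-me cache path before the dedupe gate, since cache hydration is idempotent. All names here (`processMessage` signature, `cacheInboundMessage`, the claim result strings) are illustrative stand-ins for the code under review:

```typescript
// Sketch: hydrate the reply-context cache for from-me events before the
// dedupe gate can drop them as duplicates. Names are illustrative.
type Message = { guid: string; isFromMe: boolean; text: string };

function processMessage(
  message: Message,
  claimGuid: (guid: string) => "fresh" | "duplicate" | "inflight",
  cacheInboundMessage: (m: Message) => void,
): "cached" | "dropped" | "processed" {
  if (message.isFromMe) {
    // Cache hydration is idempotent, so replayed from-me webhooks are
    // harmless here and the in-memory cache is rebuilt after a restart.
    cacheInboundMessage(message);
    return "cached";
  }
  const claim = claimGuid(message.guid);
  if (claim !== "fresh") return "dropped";
  // ...normal inbound handling for fresh messages...
  return "processed";
}
```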
…2053) BlueBubbles MessagePoller replays its ~1-week lookback window as new-message webhooks after BB Server restart or reconnect. Without persistent dedup, the gateway re-replies to messages it already handled before the restart.

Add a persistent file-backed GUID dedupe (TTL=7d, matching BB's lookback window) at the top of processMessage, using the same createClaimableDedupe SDK primitive as Feishu. The on-disk store at ~/.openclaw/bluebubbles/inbound-dedupe/<account>.json survives gateway restarts. Claim/finalize/release semantics ensure transient delivery failures release the GUID so a later replay can retry, while successful deliveries are committed and block future replays.

Fixes #19176, #12053.
Force-pushed be43c31 to cbeb3ca (Compare)
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: cbeb3ca130
if (info.kind === "final") {
  dedupeSignal.deliveryFailed = true;
}
Release dedupe claim on block-stream delivery failures
When BlueBubbles blockStreaming is enabled, replies can be delivered as block payloads without a final payload, but this handler only marks dedupeSignal.deliveryFailed for info.kind === "final". If a block send fails (transient network/API error), onError logs it but leaves deliveryFailed false, so processMessage commits the dedupe claim and future replay webhooks for that message are dropped as duplicates for 7 days instead of retrying delivery. This regresses reliability specifically for block-streaming accounts with block-only output.
Useful? React with 👍 / 👎.
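A minimal fix in the spirit of this suggestion would mark the signal on any failed delivery kind, not only `final`. The types below (`DeliveryInfo`, `DedupeSignal`) are assumed stand-ins for the handler's real types:

```typescript
// Sketch: treat a failed block send the same as a failed final send for
// dedupe purposes, so the claim is released and a replay can retry.
type DeliveryInfo = { kind: "block" | "final" };
type DedupeSignal = { deliveryFailed: boolean };

function onDeliveryError(info: DeliveryInfo, signal: DedupeSignal): void {
  // Any failed outbound payload means the user may not have seen a reply;
  // releasing the claim is safer than dropping replays for 7 days.
  if (info.kind === "final" || info.kind === "block") {
    signal.deliveryFailed = true;
  }
}
```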
…rt (#66721) Adds an in-process startup catchup pass to the BlueBubbles channel that queries BB Server for messages delivered since a persisted per-account cursor and re-feeds each through the existing processMessage pipeline.

Fixes the missed-message hole documented in #66721: BB's WebhookService is fire-and-forget on POST failure (no retries), and BB's MessagePoller only re-fires webhooks on BB-side reconnection events (Messages.app / APNs), not on webhook-receiver recovery. So inbound messages delivered while the gateway was down, restarting, or wedged were permanently lost.

Design
------
New extensions/bluebubbles/src/catchup.ts:
- fetchBlueBubblesMessagesSince(sinceMs, limit, opts) calls /api/v1/message/query with {after, sort:"ASC", with:[chat, chat.participants, attachment]} so replays carry the same shape normalizeWebhookMessage already handles on live dispatch.
- loadBlueBubblesCatchupCursor / saveBlueBubblesCatchupCursor persist {lastSeenMs, updatedAt} per account under <stateDir>/bluebubbles/catchup/<accountId>__<hash>.json using the plugin-sdk's atomic JSON helpers. File layout mirrors the inbound-dedupe store from #66816, and the resolver is the canonical openclaw/plugin-sdk/state-paths.resolveStateDir (same helper dedupe uses) so the two stores share a single root.
- runBlueBubblesCatchup(target) orchestrates: clamp config, fetch, filter isFromMe and pre-cursor records, dispatch to processMessage, advance cursor.

Modified extensions/bluebubbles/src/monitor.ts: after the webhook target registers, fire catchup as a background task; errors are logged but never block the channel-ready signal.
Modified extensions/bluebubbles/src/config-schema.ts: new optional `catchup` block (enabled, maxAgeMinutes, perRunLimit, firstRunLookbackMinutes); defaults are on with 2h lookback / 50 msg cap / 30-min first-run lookback.
Modified extensions/bluebubbles/src/accounts.ts: adds `catchup` to the account-merge nestedObjectKeys list so per-account overrides deep-merge on top of channel-level defaults, mirroring the existing `network` precedent.

Safety
------
- Goes through the same processMessage path webhooks use, so auth, allowlist, pairing, and downstream agent dispatch all apply unchanged.
- Dedupes against #66816's persistent inbound GUID cache: a webhook delivery that already succeeded cannot be reprocessed by catchup.
- Never dispatches isFromMe records (double-checked before and after normalization) so the agent's own sends cannot enter the inbound path.
- Catchup runs once per gateway startup and does NOT skip on rapid restarts. Skipping would permanently lose any messages that arrived during the brief downtime between the two startups; the bounded query (perRunLimit, maxAge) and inbound-dedupe cache cap the cost of running every restart.
- Cursor only advances to nowMs on fully-successful runs. On processMessage failure, the cursor is held just before the earliest failure timestamp so the next run retries from there. On truncation (fetchedCount === perRunLimit), the cursor advances only to the last fetched timestamp so the next gateway startup picks up the unfetched tail.
- A future-dated cursor (NTP rollback, manual clock adjust) is treated as unusable and falls through to the firstRunLookback path; the cursor is repaired at the end of the run.
- First-run lookback clamped to the maxAge ceiling so a config with maxAgeMinutes:5, firstRunLookbackMinutes:30 cannot exceed the operator's stated cap.
- Hard ceilings: 12h max lookback, 500 messages per run.
- Loud WARNING emitted when fetchedCount hits perRunLimit so operators know a single startup didn't drain the full backlog.

Why this approach
The fix mirrors a workspace-level shell script that's been running on a real OpenClaw install for ~4 weeks (~100 LoC of bash + python doing the same query/filter/POST flow). Porting it into the BB channel itself means every install gets recovery for free, calls processMessage directly (no re-POST hop), and benefits from #66816's persistent dedupe automatically.

Validation
- New scoped tests in extensions/bluebubbles/src/catchup.test.ts (21 cases): cursor round-trip, per-account scoping, FS-unsafe account IDs, firstRunLookback default, maxAge clamp on both existing-cursor and first-run branches, enabled:false, rapid-restart-still-runs, isFromMe filter (pre- and post-normalize), query-failure-preserves-cursor, per-message failure isolation, held-cursor-on-retryable-failure, clamp-to-prior-cursor, future-cursor recovery, pre-cursor defense-in-depth, perRunLimit warn / no-warn, and truncation-cursor advances only to page boundary.
- Full BlueBubbles suite: 410/410.
- pnpm check green.
- Live E2E on macOS 26.3 / BB Server 1.9.x: stop gateway, send 3 messages (verified 3x ECONNREFUSED in BB log), start gateway; catchup replayed all 3 through processMessage, cursor file appeared at ~/.openclaw/bluebubbles/catchup/<accountId>__<hash>.json, subsequent restart was a no-op.

Closes #66721.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
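The cursor-advancement rules described in this commit message can be sketched as a pure function (illustrative, not the catchup.ts source; the failure case uses the min(earliestFailureTs - 1, previousCursor) formulation, and all type/field names are assumptions):

```typescript
// Sketch of the cursor-advancement rules; illustrative, not catchup.ts.
type RunResult = {
  fetchedCount: number;
  perRunLimit: number;
  latestFetchedTs: number;          // ts of the last fetched record
  earliestFailureTs: number | null; // null when every dispatch succeeded
};

function nextCursor(prev: number, nowMs: number, run: RunResult): number {
  if (run.earliestFailureTs !== null) {
    // Hold the cursor so the next run retries the failed records.
    return Math.min(run.earliestFailureTs - 1, prev);
  }
  if (run.fetchedCount === run.perRunLimit) {
    // Truncated page: advance only to the page boundary so the next
    // startup picks up the unfetched tail.
    return run.latestFetchedTs;
  }
  return nowMs; // fully-successful run: advance to now
}
```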
…rt (#66721) Adds an in-process startup catchup pass to the BlueBubbles channel that queries BB Server for messages delivered since a persisted per-account cursor and re-feeds each through the existing processMessage pipeline.

Fixes the missed-message hole documented in #66721: BB's WebhookService is fire-and-forget on POST failure (no retries), and BB's MessagePoller only re-fires webhooks on BB-side reconnection events (Messages.app / APNs), not on webhook-receiver recovery. So inbound messages delivered while the gateway was down, restarting, or wedged were permanently lost.

Design
- New extensions/bluebubbles/src/catchup.ts with fetchBlueBubblesMessagesSince (POSTs /api/v1/message/query with {after, sort:"ASC", with:[chat, chat.participants, attachment]}), load/saveBlueBubblesCatchupCursor (file-backed {lastSeenMs, updatedAt} per account under <stateDir>/bluebubbles/catchup/<accountId>__<hash>.json using the plugin-sdk's atomic JSON helpers, same state-dir root as inbound-dedupe via the canonical SDK resolver, and resolvePreferredOpenClawTmpDir for test isolation to satisfy the messaging-tmpdir and temp-path-guard lints), and the runBlueBubblesCatchup orchestrator.
- monitor.ts: fire catchup as a background task after the webhook target registers; errors are logged but never block the channel-ready signal.
- config-schema.ts: new optional `catchup` block (enabled, maxAgeMinutes, perRunLimit, firstRunLookbackMinutes); defaults on with 2h lookback / 50 msg cap / 30-min first-run lookback.
- accounts.ts: adds `catchup` to nestedObjectKeys so per-account overrides deep-merge on top of channel-level defaults (mirroring the existing `network` precedent).

Safety
- Goes through the same processMessage path webhooks use, so auth, allowlist, pairing, and downstream agent dispatch apply unchanged.
- Dedupes against #66816's persistent inbound GUID cache.
- Never dispatches isFromMe records (checked before and after normalization).
- Runs once per gateway startup and does NOT skip on rapid restarts; skipping would permanently lose any messages that arrived during the brief downtime between two startups.
- Cursor advances to nowMs on full success, is held at min(earliestFailureTs - 1, previousCursor) on any processMessage failure so retries pick up exactly the failed records, or advances to latestFetchedTs on truncation (fetchedCount === perRunLimit) so the next gateway startup picks up the unfetched tail.
- Future-dated cursor (NTP rollback, manual clock adjust) treated as unusable and recovered via firstRunLookback; cursor is repaired at end of run.
- First-run lookback clamped to the maxAge ceiling.
- Hard ceilings: 12h max lookback, 500 messages per run.
- Loud WARNING on perRunLimit truncation pointing at the config knob to raise.

Why this approach
The fix mirrors a workspace-level shell script that's been running on a real OpenClaw install for ~4 weeks (~100 LoC of bash + python doing the same query/filter/POST flow). Porting it into the BB channel itself means every install gets recovery for free, calls processMessage directly (no re-POST hop), and benefits from #66816's persistent dedupe automatically.

Validation
- 21 scoped tests in extensions/bluebubbles/src/catchup.test.ts.
- Full BB suite 410/410.
- pnpm check green.
- src/security/temp-path-guard.test.ts and lint:tmp:no-random-messaging both pass (use resolvePreferredOpenClawTmpDir + string concatenation instead of os.tmpdir + template literal).
- Live E2E on macOS 26.3 / BB Server 1.9.x: 3/3 messages replayed.

Closes #66721.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…rt (#66857) Adds an in-process startup catchup pass to the BlueBubbles channel that queries BB Server for messages delivered since a persisted per-account cursor and re-feeds each through the existing processMessage pipeline.

Fixes the missed-message hole documented in #66721: BB's WebhookService is fire-and-forget on POST failure, and MessagePoller only re-fires webhooks on BB-side reconnection events, not on webhook-receiver recovery.

- New extensions/bluebubbles/src/catchup.ts with singleflight per accountId, cursor persistence via the canonical state-paths resolver, bounded query (perRunLimit + maxAgeMinutes), failure-held cursor, truncation-aware page-boundary advancement, future-cursor recovery, isFromMe filter (pre- and post-normalization).
- monitor.ts fires catchup as a background task after the webhook target registers.
- config-schema.ts adds optional catchup block; accounts.ts adds catchup to nestedObjectKeys for deep-merge per-account overrides.
- Dedupes against #66816's persistent inbound GUID cache.
- 22 scoped tests; full BB suite 411/411; pnpm check green; live E2E on macOS 26.3 / BB Server 1.9.x recovered 3/3 missed messages.

Closes #66721.
Co-authored-by: Omar Shahine <omar@shahine.com>
…9176, openclaw#12053) (openclaw#66816) BlueBubbles MessagePoller replays its ~1-week lookback window as new-message webhooks after BB Server restart or reconnect. Add a persistent file-backed GUID dedupe (TTL=7d) at the top of processMessage using createClaimableDedupe from the Plugin SDK. Claim/finalize/release semantics ensure transient delivery failures release the GUID so a later replay can retry. Fixes openclaw#19176, openclaw#12053. Co-authored-by: Omar Shahine <omar@shahine.com>
Summary
BlueBubbles `MessagePoller` keeps a ~1-week lookback and re-fires `new-message` webhooks after BB Server restart or reconnection. Because the BB webhook protocol carries no sequence number or ack, the gateway previously had no way to recognize replays and would re-reply to messages it had already handled before the restart — producing duplicated outbound messages and confusing replies to stale inbound messages the user had already moved on from (see #19176, #12053).
This PR adds a persistent, file-backed inbound dedupe keyed by message GUID, modeled after the same `createPersistentDedupe` pattern used by the Feishu plugin.
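The claim/finalize/release semantics can be modeled with a small state machine. The sketch below is an illustrative in-memory stand-in, not the actual SDK helper: the real store is file-backed with a 7-day TTL, and the exact SDK signature is an assumption here.

```typescript
// Illustrative model of claimable dedupe semantics (in-memory only; the
// real implementation persists to disk with a TTL).
type ClaimState = "claimed" | "finalized";

class ClaimableDedupe {
  private entries = new Map<string, ClaimState>();

  // Returns null if the key is already claimed or finalized (a replay).
  claim(key: string): { finalize: () => void; release: () => void } | null {
    if (this.entries.has(key)) return null;
    this.entries.set(key, "claimed");
    return {
      // Delivery succeeded: remember the GUID so later replays are dropped.
      finalize: () => this.entries.set(key, "finalized"),
      // Transient delivery failure: forget the GUID so a replay can retry.
      release: () => this.entries.delete(key),
    };
  }
}

// Usage at the top of processMessage: claim the inbound GUID before any work.
const dedupe = new ClaimableDedupe();

function handleInbound(guid: string, deliver: () => boolean): string {
  const claim = dedupe.claim(guid);
  if (!claim) return "duplicate-dropped";
  if (deliver()) {
    claim.finalize();
    return "delivered";
  }
  claim.release(); // allow a future replay to retry
  return "released-for-retry";
}
```

Note the trade-off the security analysis above calls out: releasing on delivery failure favors at-least-once delivery, at the cost of re-running processing for that message on a later replay.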
Why this approach
Other channels with monotonic sequence IDs (Telegram's `update_id`, Matrix's sync token, Discord's gateway sequence) can dedupe natively via protocol. BlueBubbles does not expose anything like that, so an identity-based persistent dedupe at the message layer is the closest equivalent that fits how BB actually delivers webhooks.
Interaction with edit events (`updated-message`)
PR #52277 raised a related concern: if dedupe keys are GUID-only, a legitimate `updated-message` event would share a GUID with its original `new-message` and get dropped as a duplicate.
In the current codebase this cannot happen: `monitor.ts` routes `updated-message` payloads differently — without a reaction they are dropped at the webhook layer ("ignored without reaction"), and with a reaction they flow through `processReaction`, not `processMessage`. Our dedupe sits inside `processMessage`, so only `new-message` events are gated. Edits can't collide today.
If the separate work in #52277 lands and begins routing text-edit bodies into `processMessage`, the dedupe key will need to expand to include the event type and edit metadata (e.g. `guid + eventType + (dateEdited||"")`) so an edit is treated as a distinct key rather than dropped as a replay of the original. That is a straightforward forward-compatible change when it's needed.
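A composite key could be built along these lines. This is a sketch under the assumptions above; the payload field names (`eventType`, `dateEdited`) are illustrative, not guaranteed by the current BB payload shape:

```typescript
// Hypothetical inbound event shape; only `guid` is part of today's scheme.
interface InboundEvent {
  guid: string;
  eventType: "new-message" | "updated-message";
  dateEdited?: string; // present only on edit events
}

// GUID-only today; expanding to guid + eventType + dateEdited keeps an edit
// of a message from colliding with its original new-message event.
function dedupeKey(ev: InboundEvent): string {
  return [ev.guid, ev.eventType, ev.dateEdited ?? ""].join("|");
}
```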
Validation
Credits
Re-creates and improves on the focused fix from #31159 by @dashhuang — same behavioral goal (drop stale BB webhook replays), implemented as a persistent on-disk dedupe so it actually survives the gateway-restart case that drives the bug, and without the module-global mutable state that made the original patch need test-reset plumbing.
Fixes #19176, #12053.
Test plan