feat(bluebubbles): replay missed webhook messages after gateway restart (#66721) by omarshahine · Pull Request #66760 · openclaw/openclaw

omarshahine · 2026-04-14T19:29:37Z

Summary

Fixes #66721. Adds an in-process startup catchup pass to the BlueBubbles channel that queries BB Server for messages delivered while the gateway was unreachable and replays them through the existing processMessage pipeline.

The hole this closes: BB Server's WebhookService is fire-and-forget on POST failure (no retries) and BB's MessagePoller only re-fires webhooks on BB-side reconnection events (Messages.app / APNs), not on webhook-receiver recovery. So inbound messages delivered while the OpenClaw gateway was down, restarting, or wedged were permanently lost.

This PR is stacked on top of #66810 (persistent inbound dedupe). The dedupe makes catchup safe to be aggressive: if a webhook delivery already succeeded for a GUID, the catchup replay of that same GUID is dropped at the dedupe boundary. Recommend landing #66810 first; this PR will rebase cleanly onto main once it does.

Design

extensions/bluebubbles/src/catchup.ts (new, ~290 LoC):
- fetchBlueBubblesMessagesSince(sinceMs, limit, opts) calls /api/v1/message/query with {after, sort:\"ASC\", with:[chat, chat.participants, attachment]} so replays carry the same shape normalizeWebhookMessage already handles on live dispatch.
- loadBlueBubblesCatchupCursor / saveBlueBubblesCatchupCursor persist a single {lastSeenMs, updatedAt} per account at <stateDir>/bluebubbles/catchup/<accountId>__<hash>.json, using the plugin-sdk's atomic JSON helpers. File layout mirrors the inbound-dedupe store from fix(bluebubbles): dedupe inbound webhooks across restarts (#19176, #12053) #66810.
- runBlueBubblesCatchup(target) orchestrates: clamp config, skip recent runs (<30s), clamp window to maxAgeMinutes, fetch, filter isFromMe and pre-cursor records, dispatch to processMessage, advance cursor to nowMs on success.
monitor.ts: after the webhook target registers, fire catchup as a background task; errors are logged but never block the channel-ready signal.
config-schema.ts: new optional catchup block (enabled, maxAgeMinutes, perRunLimit, firstRunLookbackMinutes); defaults are on with 2h lookback / 50 msg cap / 30-min first-run lookback.

Why this approach

The fix mirrors a workspace-level shell script that's been running on a real OpenClaw install for ~4 weeks (~100 LoC of bash + python doing the same query/filter/POST flow). Porting it into the BB channel itself means every install gets recovery for free, calls processMessage directly (no re-POST hop), and benefits from #66810's persistent dedupe automatically.

Safety

Goes through the same processMessage path webhooks use, so auth, allowlist, pairing, and downstream agent dispatch all apply unchanged.
Dedupes against fix(bluebubbles): dedupe inbound webhooks across restarts (#19176, #12053) #66810's persistent inbound GUID cache: a webhook delivery that already succeeded cannot be reprocessed by catchup.
Never dispatches isFromMe records (double-checked before and after normalization) so the agent's own sends cannot enter the inbound path.
Cursor is only advanced on a successful query; a failed fetch leaves the prior cursor in place so the next run retries the same window.
30s minimum interval between runs prevents churn on rolling restarts.
Hard ceilings: 12h max lookback, 500 messages per run.
Cursor uses nowMs rather than the latest observed message timestamp, so messages with dateCreated > nowMs (BB host vs gateway host clock skew) are not silently skipped — they replay on the next sweep, where dedupe handles any duplicate.

Validation

Automated

New scoped tests in extensions/bluebubbles/src/catchup.test.ts (14 cases): cursor round-trip, per-account scoping, FS-unsafe account IDs, firstRunLookbackMinutes, maxAgeMinutes clamp, enabled: false, rapid-restart skip, isFromMe filter (pre- and post-normalization), query-failure-preserves-cursor, per-message failure isolation, pre-cursor defense-in-depth.
Full BlueBubbles suite passes: 403/403.
pnpm check green (madge, tsgo, oxlint, webhook-auth-body-order, no-pairing-store-group, pairing-account-scope).

Live end-to-end (macOS, BB Server 1.9.x, 2026-04-14)

Repeating the original repro from #66721's issue body, this time against the new in-process catchup with the workspace shim disabled:

Stopped gateway cleanly. Verified port refused, no process.
Sent 3 distinct iMessages from a second device. BB-server log shows all 3 dispatches failed with connect ECONNREFUSED 127.0.0.1:18789 and never retried (confirms the hole this PR fixes is real and that BB does not replay on receiver-side reachability).
Started gateway. Webhook target registered; catchup fired in the background.

Gateway log:

[bluebubbles] [default] BlueBubbles catchup:
  replayed=3 skipped_fromMe=0 skipped_preCursor=0 failed=0 fetched=3 window_ms=517184

All 3 messages produced agent replies that were delivered back via BB outbound. Persistent cursor file appeared at ~/.openclaw/bluebubbles/catchup/<accountId>__<hash>.json. A subsequent gateway restart with no new inbound activity logged replayed=0 fetched=0 (no-op, as expected).

Test plan

pnpm test extensions/bluebubbles/src/catchup.test.ts — 14/14
pnpm test extensions/bluebubbles/ — 403/403
pnpm check — green
Live macOS end-to-end repro: stop gateway, send N messages (verified N×ECONNREFUSED in BB log), start gateway, assert N messages replayed via the catchup path with cursor advanced and dedupe state populated, and no-op on subsequent restart
Maintainer review

Order of operations

#66810 (persistent inbound dedupe) is a prerequisite. Once it lands, this branch will rebase onto main and the PR will retarget automatically.

…2053) BlueBubbles MessagePoller replays its ~1-week lookback window as new-message webhooks after BB Server restart or reconnect. Without persistent dedup, the gateway re-replies to messages it already handled before the restart. Add a persistent file-backed GUID dedupe (TTL=7d, matching BB's lookback window) at the top of processMessage, using the same createClaimableDedupe SDK primitive as Feishu. The on-disk store at ~/.openclaw/bluebubbles/inbound-dedupe/<account>.json survives gateway restarts. Claim/finalize/release semantics ensure transient delivery failures release the GUID so a later replay can retry, while successful deliveries are committed and block future replays. Fixes #19176, #12053.

aisle-research-bot · 2026-04-14T19:29:45Z

🔒 Aisle Security Analysis

We found 3 potential security issue(s) in this PR:

#	Severity	Title
1	🟡 Medium	BlueBubbles catchup sends API password in URL query string
2	🟡 Medium	Insecure test-state directory under /tmp allows symlink/hardlink overwrite of persisted JSON state
3	🟡 Medium	Local file overwrite via symlinked state subdirectories when persisting BlueBubbles cursor/dedupe JSON

1. 🟡 BlueBubbles catchup sends API password in URL query string

Property	Value
Severity	Medium
CWE	CWE-598
Location	`extensions/bluebubbles/src/catchup.ts:120-125`

Description

fetchBlueBubblesMessagesSince() constructs the BlueBubbles API URL using buildBlueBubblesApiUrl(..., password) which appends the password as a ?password=... query parameter.

Putting credentials in the URL is sensitive because:

URLs are commonly logged by reverse proxies, HTTP access logs, observability/telemetry tooling, and error reports
URLs may be stored in browser/network caches or intermediate components
If any redirect occurs, the full URL can be propagated (and may end up in Referer headers in other contexts)

Vulnerable code:

const url = buildBlueBubblesApiUrl({
  baseUrl: opts.baseUrl,
  path: "/api/v1/message/query",
  password: opts.password,
});

This newly added catchup path increases the surface area where the password-in-URL pattern is used (additional endpoint and new network call on startup).

Recommendation

Avoid placing secrets in URLs. Prefer an HTTP header (e.g., Authorization) or a dedicated header expected by BlueBubbles.

Example (using a header instead of a query param):

const url = buildBlueBubblesApiUrl({
  baseUrl: opts.baseUrl,
  path: "/api/v1/message/query",
});

const res = await blueBubblesFetchWithTimeout(
  url,
  {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      // Prefer a header; exact scheme depends on BB server support
      "Authorization": `Bearer ${opts.password}`,
    },
    body,
  },
  opts.timeoutMs ?? FETCH_TIMEOUT_MS,
  ssrfPolicy,
);

If BlueBubbles only supports password as a query parameter, mitigate by:

ensuring request/URL logging is disabled or redacts query strings
redacting password from any error reporting paths
using TLS and restricting network access to the BB server

2. 🟡 Insecure test-state directory under /tmp allows symlink/hardlink overwrite of persisted JSON state

Property	Value
Severity	Medium
CWE	CWE-59
Location	`extensions/bluebubbles/src/inbound-dedupe.ts:23-29`

Description

The BlueBubbles extension switches its persistence root to a predictable per-PID directory under os.tmpdir() whenever VITEST is set or NODE_ENV === "test".

If this code path is reachable in a non-test deployment (e.g., environment misconfiguration, untrusted env injection, or accidentally shipping with NODE_ENV=test), a local attacker on the same host can pre-create /tmp/openclaw-vitest-<pid> (or subpaths) as a symlink to another location. Subsequent writes via writeJsonFileAtomically() / createClaimableDedupe() ultimately call writeJsonAtomic()/writeTextAtomic(), which:

creates directories with fs.mkdir(..., { recursive: true }) (follows symlinks)
writes a temp file and then rename()s/copies it to the final path
does not perform lstat/realpath validation on the full path components
does not use O_NOFOLLOW (not available portably in Node, but symlink-safe patterns exist)

This enables link-following writes outside the intended state directory, potentially overwriting arbitrary files writable by the gateway process.

Vulnerable code (state dir selection):

if (env.VITEST || env.NODE_ENV === "test") {
  const name = "openclaw-vitest-" + process.pid;
  return path.join(os.tmpdir(), name);
}

Affected persistence sinks include:

extensions/bluebubbles/src/inbound-dedupe.ts → persistent dedupe JSON file
extensions/bluebubbles/src/catchup.ts → catchup cursor JSON file

Recommendation

Avoid predictable, attacker-precreatable directories under /tmp for any code path that can run outside a fully controlled test harness.

Options (prefer A):

A. Always require an explicit per-test state dir override (e.g., OPENCLAW_STATE_DIR set by the test harness) and remove the implicit /tmp fallback.

B. Use fs.mkdtemp to create an unguessable directory, and store it in a process-scoped global so all module instances share it during the process lifetime.

C. Add symlink defenses before writing: resolve realpath of the parent directory and verify it is within the intended root; reject if any path component is a symlink (via lstat).

Example using mkdtemp (process-scoped):

import fs from "node:fs/promises";
import os from "node:os";
import path from "node:path";

let testStateDir: string | null = null;

async function resolveStateDirFromEnv(env: NodeJS.ProcessEnv = process.env): Promise<string> {
  if (env.OPENCLAW_STATE_DIR?.trim()) return resolveStateDir(env);
  if (env.VITEST || env.NODE_ENV === "test") {
    if (!testStateDir) {
      testStateDir = await fs.mkdtemp(path.join(os.tmpdir(), "openclaw-vitest-"));
    }
    return testStateDir;
  }
  return resolveStateDir(env);
}

Additionally, consider hard-failing startup if NODE_ENV==="test" is detected outside an explicit test runner context.

3. 🟡 Local file overwrite via symlinked state subdirectories when persisting BlueBubbles cursor/dedupe JSON

Property	Value
Severity	Medium
CWE	CWE-59
Location	`extensions/bluebubbles/src/catchup.ts:94-101`

Description

The BlueBubbles extension persists two new pieces of state to disk (catchup cursor and inbound dedupe cache) under the OpenClaw state directory:

catchup.ts writes ${stateDir}/bluebubbles/catchup/<prefix>__<hash>.json
inbound-dedupe.ts writes ${stateDir}/bluebubbles/inbound-dedupe/<prefix>__<hash>.json

These paths are ultimately written using writeJsonFileAtomically() → writeJsonAtomic() → writeTextAtomic(), which creates directories with fs.mkdir(path.dirname(filePath), { recursive: true }) and then renames/copies a temp file into place.

On POSIX and Windows, this directory creation / open / rename sequence does not defend against pre-existing symlinks/junctions in parent directory components (e.g., ${stateDir}/bluebubbles being a symlink to /etc). If an attacker can write within the state directory tree (or can cause OPENCLAW_STATE_DIR to point at a shared/writable location), they can pre-create a symlinked path so the gateway will write these JSON files outside the intended state directory, potentially overwriting arbitrary files writable by the gateway process.

Vulnerable write sites include:

// extensions/bluebubbles/src/catchup.ts
await writeJsonFileAtomically(filePath, cursor);

// extensions/bluebubbles/src/inbound-dedupe.ts (via createClaimableDedupe)
resolveFilePath: resolveNamespaceFilePath,

This is primarily a local attack scenario, but is relevant in multi-tenant/shared-host deployments or when environment variables / state directory permissions are not tightly controlled.

Recommendation

Harden state-file writes against link-following in parent directories and against unsafe state dir overrides.

Recommended defense-in-depth (choose what is feasible cross-platform):

Ensure the state root is not attacker-writable and is created with 0700 (already attempted via ensureDirMode) and document this requirement.
Reject symlinks/junctions in the state path components before writing. For example, walk each path segment under stateDir and lstat() it; if any component is a symbolic link (or on Windows, a reparse point), refuse to write.
Prefer opening the destination securely (POSIX): use open() with O_NOFOLLOW (where available) and write via file-descriptor-based APIs, or use a library that supports O_NOFOLLOW and secure atomic replace.
Constrain OPENCLAW_STATE_DIR: if running in a context where untrusted parties can influence env vars, ignore OPENCLAW_STATE_DIR or require it to be an absolute path under an allowlisted base.

Example (conceptual) path-component validation:

import fs from "node:fs/promises";
import path from "node:path";

async function assertNoSymlinksUnder(root: string, targetDir: string) {
  const rel = path.relative(root, targetDir);
  const parts = rel.split(path.sep).filter(Boolean);
  let cur = root;
  for (const p of parts) {
    cur = path.join(cur, p);
    const st = await fs.lstat(cur);
    if (st.isSymbolicLink()) throw new Error(`refusing to write under symlink: ${cur}`);
  }
}

Apply this check (or equivalent) before calling writeJsonFileAtomically / persistent dedupe writes.

Analyzed PR: #66760 at commit f53bcc6

_{Last updated on: 2026-04-14T21:36:39Z}

…rt (#66721) Adds a per-account startup catchup pass that queries BB Server for messages delivered since a persisted cursor and re-feeds each through the existing processMessage pipeline. Fixes the missed-message hole documented in #66721: BB's WebhookService is fire-and-forget on POST failure and MessagePoller does not replay webhook-receiver recovery, so inbound messages delivered while the gateway was down/wedged/restarting were permanently lost. Design - New `extensions/bluebubbles/src/catchup.ts`: - `fetchBlueBubblesMessagesSince(sinceMs, limit, opts)` calls `/api/v1/message/query` with `{after, sort:"ASC", with:[chat, chat.participants, attachment]}` so replays carry the same shape `normalizeWebhookMessage` already handles on live dispatch. - `loadBlueBubblesCatchupCursor` / `saveBlueBubblesCatchupCursor` persist a single `{lastSeenMs, updatedAt}` per account at `<stateDir>/bluebubbles/catchup/<accountId>__<hash>.json`, using the plugin-sdk's atomic JSON helpers. File layout mirrors the inbound-dedupe store from #66230. - `runBlueBubblesCatchup(target)` orchestrates: clamp config, skip recent runs (<30s), clamp window to `maxAgeMinutes`, fetch, filter `isFromMe` and pre-cursor records, dispatch to `processMessage`, advance cursor to `nowMs` on success. - `monitor.ts`: after the webhook target registers, fire catchup as a background task; errors are logged but never block readiness. - `config-schema.ts`: new optional `catchup` block (`enabled`, `maxAgeMinutes`, `perRunLimit`, `firstRunLookbackMinutes`), defaults on with 2h lookback / 50 msg cap / 30-min first-run lookback. Safety - Goes through the same `processMessage` path webhooks use, so auth, allowlist, pairing, and downstream agent dispatch all apply unchanged. - Dedupes against #66230's persistent inbound GUID cache, so a webhook delivery that already succeeded cannot be reprocessed by catchup. - Never dispatches `isFromMe` records (double-checked before and after normalization) so the agent's own sends cannot enter the inbound path. - Cursor is only advanced on a successful query; a failed fetch leaves the prior cursor in place so the next run retries the same window. - 30s minimum interval between runs prevents churn on rolling restarts. - Hard ceilings: 12h max lookback, 500 messages per run. Validation - New scoped tests in `extensions/bluebubbles/src/catchup.test.ts` (14 cases covering cursor round-trip, account scoping, FS-unsafe account IDs, firstRunLookback, maxAge clamp, enabled:false, rapid restart skip, isFromMe filter, query failure preserving cursor, per-message failure isolation, and pre-cursor defense-in-depth). - Full BlueBubbles suite (403/403) and `pnpm check` green. - End-to-end verified on macOS with a live BB Server 1.9.x: stop gateway, send N messages to monitored threads (verified 3/3 ECONNREFUSED in BB's `main.log`), restart gateway; catchup queried and re-POSTed all N missed messages through `processMessage`, and subsequent boots were no-ops (empty delta). Retires the reference implementation at `openclaw-agents/lobster/scripts/bb-catchup.sh` in the Lobster workspace, which has been running this exact shape for ~4 weeks. Closes #66721. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8cd511dae7

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

greptile-apps · 2026-04-14T19:35:14Z

Greptile Summary

Adds an in-process startup catchup pass to the BlueBubbles channel that queries BB Server for messages missed while the gateway was unreachable and replays them through the existing processMessage pipeline. The implementation is well-designed: it integrates cleanly with #66230's persistent inbound-dedupe (which drops already-handled GUIDs), double-filters isFromMe records, uses atomic cursor persistence, and is fire-and-forget from monitor.ts so it never blocks the channel-ready signal.

Confidence Score: 5/5

Safe to merge; all findings are P2 style/observability suggestions that don't affect correctness.

No P0/P1 issues found. The two P2 items are: (1) resolveStateDirFromEnv is duplicated between catchup.ts and inbound-dedupe.ts — a drift risk but both copies are currently identical; and (2) no warning is emitted when fetchedCount hits perRunLimit, making silent truncation during long outages hard to diagnose. Core logic, dedupe integration, cursor semantics, and test coverage are solid.

extensions/bluebubbles/src/catchup.ts — see comments on resolveStateDirFromEnv duplication and the perRunLimit truncation warning.

Prompt To Fix All With AI

This is a comment left during a code review.
Path: extensions/bluebubbles/src/catchup.ts
Line: 50-59

Comment:
**Duplicated `resolveStateDirFromEnv` across catchup.ts and inbound-dedupe.ts**

This function is byte-for-byte identical to the one in `inbound-dedupe.ts`. If either copy drifts (e.g., adding a new env-var override), the two stores will resolve different directories and cursor/dedupe files will be written to different roots, silently breaking the catchup ↔ dedupe pairing. Extract to a shared local utility (e.g. `state-dir.ts` inside the plugin) and import it from both files.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: extensions/bluebubbles/src/catchup.ts
Line: 308-319

Comment:
**No warning logged when fetch results are at the `perRunLimit` cap**

When BB returns exactly `perRunLimit` messages it has likely truncated the result set — the remaining messages (with `dateCreated` between the last fetched message and `nowMs`) are permanently unreachable because the cursor advances to `nowMs`. The existing log line shows `fetched=N` but gives no indication that `N` equals the configured cap. Emitting a distinct warning when `messages.length >= perRunLimit` would help operators know to raise the limit before a long outage silently drops messages.

How can I resolve this? If you propose a fix, please make it concise.

_{Reviews (1): Last reviewed commit: "feat(bluebubbles): replay missed webhook..." | Re-trigger Greptile}

…rn on perRunLimit truncation Addresses review feedback on PR #66760: - Greptile P2: catchup.ts and inbound-dedupe.ts each had their own `resolveStateDirFromEnv`. Catchup's was a half-baked local copy that missed `~` expansion and the legacy/new state-dir fallback that the canonical `openclaw/plugin-sdk/state-paths.resolveStateDir` provides. Switch catchup to the canonical resolver so the catchup cursor and the inbound-dedupe state are guaranteed to land under the same root, preserving the dedupe-after-replay invariant the catchup design depends on. Test isolation preserved by honoring an explicit `OPENCLAW_STATE_DIR` override before the VITEST/test-mode default. - Greptile P2 + Aisle CWE-532-adjacent: when the BB result hits `perRunLimit`, emit a distinct WARNING through the runtime `error` channel pointing at the config knob to raise. Previously the run summary just showed `fetched=N` with no signal that older messages in the window may have been silently truncated (the cursor still advances to nowMs, so they're unreachable on the next sweep). - Tests: 2 new cases covering "warn on cap-hit" and "no warn below cap"; full BB suite remains 405/405 (was 403/403; +2 new). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0e8ddbaa61

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

…kew gate Addresses Codex review findings on PR #66760: - P1: Avoid committing catchup cursor after replay failures Previously the cursor advanced to nowMs unconditionally, so a transient processMessage failure during startup catchup permanently dropped the failing record — the next sweep queried only `after nowMs` and never retried it. Now we track the earliest timestamp where processMessage threw and hold the cursor just before it (still clamped to >= the prior cursor and <= nowMs), so retries pick up exactly the failed records on the next run. The inbound-dedupe cache from #66230 keeps successfully-replayed messages from being re-processed on retry. Normalize failures (record didn't yield a usable NormalizedWebhookMessage) are still treated as permanent skips, so a malformed payload doesn't wedge catchup forever. - P2: Skip min-interval gate when stored cursor is in the future If the host clock jumped backwards (NTP correction, manual adjust), `nowMs - existing.lastSeenMs` was negative, which still satisfies `< MIN_INTERVAL_MS`, so catchup returned null repeatedly until wall time caught up. Added `nowMs >= existing.lastSeenMs` precondition so future-dated cursors stop disabling catchup. Tests: 3 new cases (cursor-held-on-failure, clamp-to-prior-cursor, clock-skew-bypass-gate); BB suite remains 408/408. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…e catchup overrides Addresses two more Codex P2 findings on PR #66760: - Enforce maxAgeMinutes on first catchup run Previously the no-cursor branch used `nowMs - firstRunLookbackMs` directly without applying `earliestAllowed`, so a config like `maxAgeMinutes: 5, firstRunLookbackMinutes: 30` silently scanned 30 minutes on first startup. Now both branches clamp against the earliestAllowed ceiling. - Preserve channel-level `catchup.enabled` in account overrides BlueBubbles account merging only deep-merges `network`. With catchup outside that list, an override like `accounts.work.catchup: { perRunLimit: 20 }` would replace the whole channel-level catchup block and silently drop a global `catchup.enabled: false`, re-enabling replay for that account. Adding `catchup` to `nestedObjectKeys` deep-merges the override on top of inherited defaults, matching the existing `network` precedent. Tests: 1 new case for the first-run maxAge clamp; BB suite 409/409. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 00de3cc1b7

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

…covers via firstRunLookback Addresses Codex P2 finding on PR #66760 (cursor in future causes silent skip). Previously, after a host clock rollback (NTP correction or manual adjust), `existing.lastSeenMs` could be ahead of `nowMs`. The earlier `Math.max(existing.lastSeenMs, earliestAllowed)` then computed `windowStartMs` in the future, so the BB query ran with `after=future`, returned zero records, and we still saved cursor = nowMs at the end of the run — permanently skipping every real message delivered in the [earliestAllowed, nowMs] window. Now: a cursor strictly in the future is treated as if no cursor exists at all. The window falls through to the firstRunLookback path (clamped to maxAge), so catchup looks back a sensible distance and replays through the existing dedupe-protected pipeline. Cursor save at the end of the run repairs the future timestamp. Test: extends the previous "skips rate-limit gate" case to assert that the recovery window is firstRunLookback-clamped (not nowMs) and that the cursor is repaired to nowMs after the run. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7990aeb7da

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 80d2393528

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

…ge boundary on truncation Addresses two more Codex P1 findings on PR #66760: - Run catchup even when cursor age is under MIN_INTERVAL Catchup is the *only* mechanism that recovers messages dropped during the gateway-down window, and it runs once per gateway startup. The 30s MIN_INTERVAL_MS gate was protecting against ~nothing (a few extra BB queries on rolling restarts) at the cost of permanent message loss when restarts happened within that window. A real failure: t0 startup, cursor=t0; t0+10s gateway down, webhook ECONNREFUSED at t0+15s; t0+20s restart skips catchup entirely. Gate removed; bounded by perRunLimit/maxAge + dedupe-protected so the cost of always running is capped. - Keep cursor behind unfetched pages when limit is hit Previously a long outage with >perRunLimit messages would fetch the oldest perRunLimit (sort:ASC), process them, and then advance the cursor to nowMs — permanently skipping the unfetched newer tail. Now we track the latest fetched timestamp regardless of fate, and on truncation (fetchedCount === perRunLimit) the cursor advances only to that page boundary so the next gateway startup picks up the rest. Updated the existing perRunLimit warn to reflect the new recoverable-on-next-startup semantics. Tests: replaced obsolete "skips rapid restart" with "runs catchup even on rapid restarts"; added "advances cursor only to last fetched ts when result is truncated". BB suite 410/410. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

omarshahine · 2026-04-14T20:38:41Z

Review feedback round-up

Thanks for the thorough reviews. Pushed 5 follow-up commits since the original 4a89afbe. Summary of what's in / not in:

Greptile (✅ both addressed)

Duplicated resolveStateDirFromEnv → 0e8ddbaa61 switches catchup to the canonical openclaw/plugin-sdk/state-paths.resolveStateDir that inbound-dedupe already uses, so the two stores can never drift apart. (Note: catchup tests use OPENCLAW_STATE_DIR for per-test isolation, so the test-mode default short-circuit now only fires when that env var is unset; explicit overrides win.)
No warning on perRunLimit truncation → 0e8ddbaa61 adds a distinct WARN line; ed4e403d7c then made truncation actually recoverable (cursor advances only to the page boundary), and the warn message was updated to reflect the new "next startup picks up the rest" semantics.

Codex (✅ all 7 addressed)

P1: cursor advances after replay failure → 00de3cc1b7 tracks the earliest processMessage failure timestamp and holds the cursor just before it. Successful replays from this run are dedupe-protected by fix(bluebubbles): dedupe inbound webhooks across restarts (#19176, #12053) #66230. Normalize failures (record didn't yield a NormalizedWebhookMessage) are still treated as permanent skips so a malformed payload doesn't wedge catchup forever.
P2: clock-skew negative-delta hits gate → addressed inside the broader fix in ed4e403d7c (see below).
P2: maxAgeMinutes not enforced on first run → 7990aeb7da clamps both the existing-cursor and first-run branches to earliestAllowed.
P2: account override drops catchup.enabled → 7990aeb7da adds catchup to nestedObjectKeys in accounts.ts, mirroring the existing network precedent so deep-merge semantics apply.
P2: future cursor causes silent skip → 80d2393528 treats a strictly-future cursor as if no cursor exists, falling through to firstRunLookback so the [earliestAllowed, nowMs] window does get queried; cursor save at the end repairs the future timestamp.
P1: run catchup even when cursor age is under MIN_INTERVAL → ed4e403d7c removes the MIN_INTERVAL_MS gate entirely. Catchup runs once per gateway startup and is the only recovery mechanism, so skipping the second startup's pass after a quick restart loses any messages that arrived during the brief downtime. The bounded query (perRunLimit, maxAge) and fix(bluebubbles): dedupe inbound webhooks across restarts (#19176, #12053) #66230's dedupe cap the cost of running every restart. (This also obviates the clock-skew gate that the P2 above asked for — solved by removal rather than >= precondition.)
P1: keep cursor behind unfetched pages when limit is hit → ed4e403d7c tracks latestFetchedTs regardless of message fate, and on truncation (fetchedCount === perRunLimit) the cursor advances only to the page boundary so the next gateway startup picks up the unfetched tail. Previously the unfetched tail past the cap would have been permanently unreachable — exactly the failure mode the feature is meant to prevent.

Tests grew from 14 → 21 cases; full BlueBubbles suite remains 410/410; pnpm check green.

Aisle (5 findings, summary)

fix: add @lid format support and allowFrom wildcard handling #1 Password in URL query (catchup.ts) — Real concern, but follows the existing BB plugin convention used everywhere (history.ts, probe.ts, send.ts). Fixing only catchup would create inconsistency; this should be a cross-cutting BB-wide PR that updates buildBlueBubblesApiUrl to use header-based auth, and is out of scope here.
Images not passed to Claude CLI - only path reference in text #4 OPENCLAW_STATE_DIR symlink risk (catchup.ts) — Same pattern as fix(bluebubbles): dedupe inbound webhooks across restarts (#19176, #12053) #66230's inbound-dedupe (catchup now actually shares its resolver via the SDK helper, per the Greptile fix). The proper hardening (lstat + reject symlink) belongs in the plugin-sdk's resolveStateDir helper itself so all callers benefit, not bolted onto extension files.
Login fails with 'WebSocket Error (socket hang up)' ECONNRESET #2 sendPolicy "deny", WA business, groups & office hours #3 reply-media-paths symlink, CLI: add Opencode integration #5 PII in dedupe drop logs — Not in this PR's diff (src/auto-reply/... and extensions/bluebubbles/src/monitor-processing.ts:634-639 are untouched here). These appear to have been picked up because Aisle scanned the pre-rebase SHA 8cd511d when this PR's diff still included ghost commits from the force-pushed dedupe base. They properly belong to the already-merged PRs fix: sendPolicy deny should suppress delivery, not inbound processing (#53328) #65461 / fix(bluebubbles): lazy-refresh Private API status on send (#43764) #65447 / docs(changelog): backfill #65447 and #65461 entries #66220 (or follow-up PRs against main), not here.

Live end-to-end repro of the original failure mode is in the PR body and reproduced post-fix on macOS 26.3 / BB Server 1.9.x. Ready for another look.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ed4e403d7c

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector · 2026-04-14T20:42:06Z

+  } else if (isTruncated) {
+    // Use latestFetchedTs (clamped to >= prior cursor and <= nowMs) so the
+    // next run starts where this page ended.
+    nextCursorMs = Math.min(Math.max(latestFetchedTs, cursorBefore ?? windowStartMs), nowMs);


Avoid dropping same-timestamp tail at truncation boundary

When a catchup run hits perRunLimit, this branch advances the cursor to latestFetchedTs. If there are additional unfetched messages with that exact same dateCreated value (which is realistic when timestamps are coarse or bursts occur), the next run starts from that same timestamp and the loop’s ts <= windowStartMs guard drops them, so they are never replayed. In outage backlogs this causes permanent message loss even though catchup is enabled; truncated pagination needs a stable tie-breaker (or a cursor rewind below the boundary) to avoid skipping equal-timestamp records.

Useful? React with 👍 / 👎.

…66810 The inbound-dedupe PR was reopened as #66810 (#66230 closed without merging at 21:10:28Z, #66810 created 7s later on the same lobster/bb-inbound-dedupe branch). Updating code comments and the catchup CHANGELOG entry to point at the live PR. The branch dependency chain (this PR stacked on lobster/bb-inbound-dedupe) is unchanged. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…66816 The inbound-dedupe PR was reopened again as #66816 (closed-without-merge trail: #66230 → #66810 → #66816). The branch was force-pushed and the new PR uses the parallel `fix/bb-inbound-dedupe` branch. Updating code comments and the catchup CHANGELOG entry to point at the live PR. Stacking on top of the dedupe branch will be addressed in a follow-up rebase. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…rn on perRunLimit truncation Addresses review feedback on PR #66760: - Greptile P2: catchup.ts and inbound-dedupe.ts each had their own `resolveStateDirFromEnv`. Catchup's was a half-baked local copy that missed `~` expansion and the legacy/new state-dir fallback that the canonical `openclaw/plugin-sdk/state-paths.resolveStateDir` provides. Switch catchup to the canonical resolver so the catchup cursor and the inbound-dedupe state are guaranteed to land under the same root, preserving the dedupe-after-replay invariant the catchup design depends on. Test isolation preserved by honoring an explicit `OPENCLAW_STATE_DIR` override before the VITEST/test-mode default. - Greptile P2 + Aisle CWE-532-adjacent: when the BB result hits `perRunLimit`, emit a distinct WARNING through the runtime `error` channel pointing at the config knob to raise. Previously the run summary just showed `fetched=N` with no signal that older messages in the window may have been silently truncated (the cursor still advances to nowMs, so they're unreachable on the next sweep). - Tests: 2 new cases covering "warn on cap-hit" and "no warn below cap"; full BB suite remains 405/405 (was 403/403; +2 new). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…kew gate Addresses Codex review findings on PR #66760: - P1: Avoid committing catchup cursor after replay failures Previously the cursor advanced to nowMs unconditionally, so a transient processMessage failure during startup catchup permanently dropped the failing record — the next sweep queried only `after nowMs` and never retried it. Now we track the earliest timestamp where processMessage threw and hold the cursor just before it (still clamped to >= the prior cursor and <= nowMs), so retries pick up exactly the failed records on the next run. The inbound-dedupe cache from #66230 keeps successfully-replayed messages from being re-processed on retry. Normalize failures (record didn't yield a usable NormalizedWebhookMessage) are still treated as permanent skips, so a malformed payload doesn't wedge catchup forever. - P2: Skip min-interval gate when stored cursor is in the future If the host clock jumped backwards (NTP correction, manual adjust), `nowMs - existing.lastSeenMs` was negative, which still satisfies `< MIN_INTERVAL_MS`, so catchup returned null repeatedly until wall time caught up. Added `nowMs >= existing.lastSeenMs` precondition so future-dated cursors stop disabling catchup. Tests: 3 new cases (cursor-held-on-failure, clamp-to-prior-cursor, clock-skew-bypass-gate); BB suite remains 408/408. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…e catchup overrides Addresses two more Codex P2 findings on PR #66760: - Enforce maxAgeMinutes on first catchup run Previously the no-cursor branch used `nowMs - firstRunLookbackMs` directly without applying `earliestAllowed`, so a config like `maxAgeMinutes: 5, firstRunLookbackMinutes: 30` silently scanned 30 minutes on first startup. Now both branches clamp against the earliestAllowed ceiling. - Preserve channel-level `catchup.enabled` in account overrides BlueBubbles account merging only deep-merges `network`. With catchup outside that list, an override like `accounts.work.catchup: { perRunLimit: 20 }` would replace the whole channel-level catchup block and silently drop a global `catchup.enabled: false`, re-enabling replay for that account. Adding `catchup` to `nestedObjectKeys` deep-merges the override on top of inherited defaults, matching the existing `network` precedent. Tests: 1 new case for the first-run maxAge clamp; BB suite 409/409. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…covers via firstRunLookback Addresses Codex P2 finding on PR #66760 (cursor in future causes silent skip). Previously, after a host clock rollback (NTP correction or manual adjust), `existing.lastSeenMs` could be ahead of `nowMs`. The earlier `Math.max(existing.lastSeenMs, earliestAllowed)` then computed `windowStartMs` in the future, so the BB query ran with `after=future`, returned zero records, and we still saved cursor = nowMs at the end of the run — permanently skipping every real message delivered in the [earliestAllowed, nowMs] window. Now: a cursor strictly in the future is treated as if no cursor exists at all. The window falls through to the firstRunLookback path (clamped to maxAge), so catchup looks back a sensible distance and replays through the existing dedupe-protected pipeline. Cursor save at the end of the run repairs the future timestamp. Test: extends the previous "skips rate-limit gate" case to assert that the recovery window is firstRunLookback-clamped (not nowMs) and that the cursor is repaired to nowMs after the run. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…ge boundary on truncation Addresses two more Codex P1 findings on PR #66760: - Run catchup even when cursor age is under MIN_INTERVAL Catchup is the *only* mechanism that recovers messages dropped during the gateway-down window, and it runs once per gateway startup. The 30s MIN_INTERVAL_MS gate was protecting against ~nothing (a few extra BB queries on rolling restarts) at the cost of permanent message loss when restarts happened within that window. A real failure: t0 startup, cursor=t0; t0+10s gateway down, webhook ECONNREFUSED at t0+15s; t0+20s restart skips catchup entirely. Gate removed; bounded by perRunLimit/maxAge + dedupe-protected so the cost of always running is capped. - Keep cursor behind unfetched pages when limit is hit Previously a long outage with >perRunLimit messages would fetch the oldest perRunLimit (sort:ASC), process them, and then advance the cursor to nowMs — permanently skipping the unfetched newer tail. Now we track the latest fetched timestamp regardless of fate, and on truncation (fetchedCount === perRunLimit) the cursor advances only to that page boundary so the next gateway startup picks up the rest. Updated the existing perRunLimit warn to reflect the new recoverable-on-next-startup semantics. Tests: replaced obsolete "skips rapid restart" with "runs catchup even on rapid restarts"; added "advances cursor only to last fetched ts when result is truncated". BB suite 410/410. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

omarshahine requested a review from a team as a code owner April 14, 2026 19:29

openclaw-barnacle Bot added channel: bluebubbles Channel integration: bluebubbles channel: slack Channel integration: slack gateway Gateway runtime scripts Repository scripts agents Agent runtime and tooling extensions: qa-lab size: XL maintainer Maintainer-authored PR labels Apr 14, 2026

omarshahine force-pushed the lobster/bb-catchup branch from 8cd511d to 4a89afb Compare April 14, 2026 19:32

openclaw-barnacle Bot added size: L and removed channel: slack Channel integration: slack gateway Gateway runtime scripts Repository scripts agents Agent runtime and tooling extensions: qa-lab size: XL labels Apr 14, 2026

chatgpt-codex-connector Bot reviewed Apr 14, 2026

View reviewed changes

Comment thread extensions/bluebubbles/src/catchup.ts Outdated

Comment thread extensions/bluebubbles/src/catchup.ts Outdated

greptile-apps Bot reviewed Apr 14, 2026

View reviewed changes

Comment thread extensions/bluebubbles/src/catchup.ts

Comment thread extensions/bluebubbles/src/catchup.ts Outdated

chatgpt-codex-connector Bot reviewed Apr 14, 2026

View reviewed changes

Comment thread extensions/bluebubbles/src/catchup.ts Outdated

Comment thread extensions/bluebubbles/src/catchup.ts

Lobster and others added 2 commits April 14, 2026 13:09

chatgpt-codex-connector Bot reviewed Apr 14, 2026

View reviewed changes

Comment thread extensions/bluebubbles/src/catchup.ts Outdated

chatgpt-codex-connector Bot reviewed Apr 14, 2026

View reviewed changes

Comment thread extensions/bluebubbles/src/catchup.ts Outdated

chatgpt-codex-connector Bot reviewed Apr 14, 2026

View reviewed changes

Comment thread extensions/bluebubbles/src/catchup.ts

openclaw-barnacle Bot added size: XL and removed size: L labels Apr 14, 2026

chatgpt-codex-connector Bot reviewed Apr 14, 2026

View reviewed changes

omarshahine force-pushed the lobster/bb-inbound-dedupe branch from 4d63af6 to 82bce26 Compare April 14, 2026 21:02

Lobster and others added 2 commits April 14, 2026 14:13

omarshahine deleted the branch lobster/bb-inbound-dedupe April 14, 2026 22:48

omarshahine closed this Apr 14, 2026

Uh oh!

Conversation

omarshahine commented Apr 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Design

Why this approach

Safety

Validation

Automated

Live end-to-end (macOS, BB Server 1.9.x, 2026-04-14)

Test plan

Order of operations

Uh oh!

aisle-research-bot Bot commented Apr 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🔒 Aisle Security Analysis

Description

Recommendation

Description

Recommendation

Description

Recommendation

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

Uh oh!

greptile-apps Bot commented Apr 14, 2026

Greptile Summary

Confidence Score: 5/5

Uh oh!

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

omarshahine commented Apr 14, 2026

Review feedback round-up

Greptile (✅ both addressed)

Codex (✅ all 7 addressed)

Aisle (5 findings, summary)

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Apr 14, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

omarshahine commented Apr 14, 2026 •

edited

Loading

aisle-research-bot Bot commented Apr 14, 2026 •

edited

Loading