fix(imessage): always-on inbound recovery and dedupe #91335
Codex review: needs real behavior proof before merge. Reviewed June 8, 2026, 3:46 AM ET / 07:46 UTC.
Summary
PR surface: Source +569, Tests +384, Docs -43, Generated 0. Total +910 across 18 files.
Reproducibility: yes. source-reproducible: current main's live iMessage monitor path has no persistent inbound replay guard or stale age fence, and upstream
Review metrics: 1 noteworthy metric.
Merge readiness
Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.
Rank-up moves:
Proof guidance:
Risk before merge
Maintainer options:
Next step before merge
Security
Review details
Best possible solution: Land the monitor-layer dedupe/recovery design only after the PR body is updated with final-head remote
Do we have a high-confidence way to reproduce the issue? Yes, source-reproducible: current main's live iMessage monitor path has no persistent inbound replay guard or stale age fence, and upstream
Is this the best way to solve the issue? Mostly yes: fixing before iMessage dispatch with
AGENTS.md: found and applied where relevant.
Codex review notes: model gpt-5.5, reasoning high; reviewed against 538d36eaaaa6.
Label changes:
Label justifications:
Evidence reviewed
PR surface: Source +569, Tests +384, Docs -43, Generated 0. Total +910 across 18 files.
What I checked:
Likely related people:
What the crustacean ranks mean
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics. How this review workflow works
|
Replaces the opt-in catchup subsystem with always-on inbound replay protection that brings iMessage in line with the other channels, and fixes #89237 (stale backlog dispatched as fresh after bridge recovery). - New inbound-dedupe.ts: persistent claimable GUID dedupe (claim/commit/ release) plus a stale-backlog age fence that suppresses live rows whose send date is materially older than arrival (logged, never silent). - monitor-provider: claim at ingestion, carry the exact claimed key on the debouncer entry, commit on successful flush / release on dispatch failure (per-unit so a coalesced bucket cannot strand a sibling claim). Keeps the local startup since_rowid watermark so startup-window rows are not skipped. - Deprecate catchup: delete catchup.ts + catchup-bridge.ts, remove the channels.imessage.catchup schema, cursor migration, and config-guard nag. Back-compat: strip the retired key before validation; new imessage doctor contract reports + removes it on doctor --fix. - Docs updated for the new recovery model. Net -947 prod LOC. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
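The claim/commit/release lifecycle and the age fence described in this commit can be sketched roughly as follows. This is a minimal illustration, not the contents of inbound-dedupe.ts: the class name `ClaimableDedupeSketch`, the function `isStaleBacklog`, the in-memory `Map` (standing in for the persistent store), and the reading of "fail-open" as "do not suppress when the send date is unusable" are all assumptions.

```typescript
// Hypothetical sketch; names and shapes are illustrative, not the PR's API.
type ClaimState = "in-flight" | "committed";

class ClaimableDedupeSketch {
  // In-memory stand-in for the persistent store the commit describes.
  private readonly states = new Map<string, ClaimState>();

  /** Returns true if the caller now owns the key, false for a duplicate. */
  claim(key: string): boolean {
    if (this.states.has(key)) return false;
    this.states.set(key, "in-flight");
    return true;
  }

  /** Marks a claimed key as handled so later replays are dropped. */
  commit(key: string): void {
    if (this.states.get(key) === "in-flight") this.states.set(key, "committed");
  }

  /** Frees an in-flight key after a dispatch failure so a retry can claim it. */
  release(key: string): void {
    if (this.states.get(key) === "in-flight") this.states.delete(key);
  }
}

/**
 * Age fence: true when the send date is older than arrival by more than
 * maxAgeMs. Assumed fail-open: an unusable send date is never suppressed.
 */
function isStaleBacklog(sentAtMs: number, arrivedAtMs: number, maxAgeMs: number): boolean {
  if (!Number.isFinite(sentAtMs)) return false;
  return arrivedAtMs - sentAtMs > maxAgeMs;
}
```

In this sketch a caller would claim at ingestion, skip the row when `claim` returns false, and then call `commit` or `release` once per dispatch unit depending on whether the flush succeeded.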
Builds downtime recovery on the new inbound dedupe instead of restoring the old catchup subsystem. On startup the monitor passes the last dispatched rowid (a persisted per-account cursor) to imsg watch.subscribe as since_rowid, so imsg replays the messages that landed while the gateway was down, then tails live. The GUID dedupe drops anything already handled, so no cursor/retry bookkeeping is needed. - recovery-cursor.ts: minimal persisted per-account lastDispatchedRowid. - monitor-provider: since_rowid = cursor (capped to the most recent IMESSAGE_RECOVERY_MAX_ROWS); split the age fence on the startup rowid boundary so replayed rows (<= boundary) use the wider recovery window and live rows (> boundary) keep the tight #89237 fence; advance the cursor on commit. - Local only: remote SSH cliPath cannot read chat.db, so it tails from the current rowid (suppress-and-move-on) as before. Restores missed-message recovery that the catchup removal dropped, with no config and a fraction of the old LOC. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
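A rough sketch of the startup arithmetic this commit describes: pick since_rowid from the persisted cursor, cap the replay to the most recent rows, and choose the fence window by which side of the startup boundary a row falls on. The function names, the `ReplayPlan` shape, and the behavior when no cursor exists yet (tail from the boundary) are assumptions; the cap and window sizes are passed in because the source does not state their values.

```typescript
// Hypothetical sketch; not the monitor-provider implementation.
interface ReplayPlan {
  sinceRowid: number;    // value handed to the watch subscription
  boundaryRowid: number; // highest rowid present at startup
}

function planStartupReplay(
  cursorRowid: number | null,
  boundaryRowid: number,
  maxRecoveryRows: number,
): ReplayPlan {
  // Assumption: with no cursor there is nothing to replay, so tail live.
  if (cursorRowid === null) return { sinceRowid: boundaryRowid, boundaryRowid };
  // Cap the replay to the most recent maxRecoveryRows rows.
  const oldestAllowed = Math.max(0, boundaryRowid - maxRecoveryRows);
  const sinceRowid = Math.min(boundaryRowid, Math.max(cursorRowid, oldestAllowed));
  return { sinceRowid, boundaryRowid };
}

/** Replayed rows (<= boundary) get the wider window; live rows keep the tight one. */
function ageFenceMsFor(
  rowid: number,
  plan: ReplayPlan,
  liveFenceMs: number,
  recoveryFenceMs: number,
): number {
  return rowid <= plan.boundaryRowid ? recoveryFenceMs : liveFenceMs;
}
```

With a cursor of 127953 and a boundary of 127955, the plan yields since_rowid 127953, so rows 127954 and 127955 are treated as recovery rows and anything above 127955 as live.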
e9b5b23 to 408d00a
|
@clawsweeper re-review Reshaped since the last review: instead of deprecating catchup outright, this now replaces the ~850-LOC catchup subsystem with downtime recovery built on the new inbound dedupe (startup |
|
🦞🧹 I asked ClawSweeper to review this item again. |
…safe Addresses two cursor-state regressions in the downtime-recovery path: - Failed replay rows could be skipped forever: a released (failed) row keeps its dedupe claim for retry, but a later successful row in the same flush advanced the cursor past it, so the next startup's since_rowid skipped it. Hold a per-session floor at the lowest released rowid and never advance the cursor past it. - Suppressed live backlog could be re-delivered after a restart: a live row suppressed under the tight live fence was not recorded, so after a restart it fell under the wider recovery window (its rowid now below the new boundary) and was delivered. Commit its dedupe key on suppression so the recovery replay treats it as already handled. Both caught by Codex autoreview. Adds regression tests for the floor and the suppression record. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
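The per-session floor from the first fix can be illustrated with a small sketch. It assumes since_rowid is exclusive (rows strictly greater are replayed), which matches the proof trace quoted elsewhere on this page; the class and method names are hypothetical and persistence is omitted.

```typescript
// Hypothetical sketch of a failure-safe cursor; not recovery-cursor.ts itself.
class RecoveryCursorSketch {
  private lastDispatchedRowid: number;
  // Lowest rowid released (failed) during this session, if any.
  private floorRowid: number | null = null;

  constructor(initialRowid: number) {
    this.lastDispatchedRowid = initialRowid;
  }

  /** Record a failed dispatch so the cursor never moves past it. */
  noteReleased(rowid: number): void {
    this.floorRowid = this.floorRowid === null ? rowid : Math.min(this.floorRowid, rowid);
  }

  /** Advance on commit, monotonically, but stay below the failure floor. */
  noteCommitted(rowid: number): void {
    const ceiling = this.floorRowid === null ? rowid : Math.min(rowid, this.floorRowid - 1);
    this.lastDispatchedRowid = Math.max(this.lastDispatchedRowid, ceiling);
  }

  get value(): number {
    return this.lastDispatchedRowid;
  }
}
```

If row 102 fails and row 103 then succeeds in the same flush, the cursor stays at 101, so the next startup replays from 102 and the dedupe drops the already-committed 103.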
Hash the composite fallback key's variable parts (conversation, sender, created_at, text) so the key is length-bounded regardless of message text. The persistent dedupe store already hashes keys internally, so this was not a live overflow, but the bounded key removes the dependency on that and keeps the fallback fail-open. Flagged by autoreview. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
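One way to bound such a key is to digest the variable parts and keep only a short fixed prefix in the clear. The sketch below assumes SHA-256 via Node's `node:crypto` and a made-up key layout (`imessage:<account>:fallback:<hex>`); the source names the four hashed fields but not the hash function or the prefix, so treat both as placeholders.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shape for a message that arrived without a GUID.
interface GuidlessMessage {
  conversation: string;
  sender: string;
  createdAt: string;
  text: string;
}

/** Builds a fixed-length fallback key regardless of message text length. */
function fallbackReplayKey(accountId: string, msg: GuidlessMessage): string {
  // JSON-encoding the tuple avoids ambiguity between field boundaries.
  const digest = createHash("sha256")
    .update(JSON.stringify([msg.conversation, msg.sender, msg.createdAt, msg.text]))
    .digest("hex");
  return `imessage:${accountId}:fallback:${digest}`;
}
```

The key length then depends only on the account id, and identical rows still map to the same key, which is what replay dedupe needs.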
|
@clawsweeper re-review Your P1 ("users who enabled catchup would silently move to suppress-and-move-on and miss downtime messages") was reviewed against the pre-reshape commit
On the Two cursor-state edges in the new recovery path (failed-replay cursor leapfrog, suppressed-row re-delivery after restart) and a bounded-key concern were caught by Codex autoreview and fixed in |
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
The since_rowid replay runs over the imsg RPC client, so driving it from the persisted recovery cursor (not the local chat.db boundary) makes downtime recovery work for remote SSH cliPath gateways — the topology the old RPC-based catchup served and that the rowid-boundary-only version regressed. Local setups keep the wider, capped recovery window via the chat.db boundary; remote uses the live age-fence window. Flagged by autoreview. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…grade A one-time, self-cleaning migration: when the recovery cursor is empty on the first startup after upgrade, seed it from the retired imessage.catchup-cursors lastSeenRowid and consume the legacy entry. Without this a user who had catchup enabled would not replay messages missed across the upgrade restart. Flagged by autoreview. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
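The seed-and-consume step could look something like the sketch below. Two `Map`s stand in for the persisted recovery cursor and the retired catchup-cursor store; the function name and the choice to leave the legacy entry untouched when a recovery cursor already exists are assumptions.

```typescript
/**
 * Hypothetical one-time migration: if the account has no recovery cursor,
 * seed it from the legacy catchup cursor and delete the legacy entry.
 * Returns the cursor to use, or undefined when neither store has one.
 */
function seedRecoveryCursor(
  accountId: string,
  recoveryCursors: Map<string, number>,
  legacyCatchupCursors: Map<string, number>,
): number | undefined {
  const existing = recoveryCursors.get(accountId);
  if (existing !== undefined) return existing; // already initialized

  const legacy = legacyCatchupCursors.get(accountId);
  if (legacy === undefined) return undefined; // nothing to migrate

  recoveryCursors.set(accountId, legacy);
  legacyCatchupCursors.delete(accountId); // self-cleaning: consume the entry
  return legacy;
}
```

Running it a second time is a no-op because the recovery cursor is no longer empty.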
|
Maintainer fix pushed in d84c84d. What changed:
Proof:
@clawsweeper re-review |
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
* feat(imessage): always-on inbound recovery, deprecate catchup
* feat(imessage): recover downtime messages via since_rowid replay
* fix(imessage): make recovery cursor advance failure- and suppression-safe
* fix(imessage): bound the GUID-less replay key length
* fix(imessage): recover downtime messages on remote cliPath setups too
* fix(imessage): seed recovery cursor from retired catchup cursor on upgrade
* fix(imessage): preserve catchup recovery on upgrade
---------
Co-authored-by: Omar Shahine <10343873+omarshahine@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Vincent Koc <vincentkoc@ieee.org>
…26.6.6) (#1040) This PR contains the following updates: ghcr.io/openclaw/openclaw (source: https://github.com/openclaw/openclaw), patch, 2026.6.5 → 2026.6.6. The v2026.6.6 release notes (https://github.com/openclaw/openclaw/blob/HEAD/CHANGELOG.md#202666) list this change under Highlights: iMessage recovery and delivery now cover always-on inbound restart, durable echo markers, block streaming, idle approval discovery, hardened outbound transport, and actionable inbound startup diagnostics. (#91335, #91449, #88969, #88530, #91783, #91785) Thanks @omarshahine, @jmissig, and @colmbrogan. This PR has been generated by Renovate Bot. Reviewed-on: https://git.erwanleboucher.dev/eleboucher/homelab/pulls/1040
Summary
What problem does this PR solve?
chat.db with fresh ROWIDs but old send dates; imsg watch emits them and OpenClaw dispatched them as fresh user requests (iMessage bridge recovery can dispatch stale inbound backlog as fresh requests #89237, "backlog bomb"). Separately, iMessage was the only channel with no inbound replay protection, and messages that arrived while the gateway was down were either lost (catchup off) or recovered by a heavy opt-in subsystem.
Why does this matter now?
catchup subsystem.
What is the intended outcome?
createClaimableDedupe, the same primitive whatsapp/discord/signal/mattermost/zalo/line/nextcloud-talk use): claim at ingestion, carry the exact claimed key on the debouncer entry, commit on successful flush, release on dispatch failure (per dispatch unit, so a coalesced bucket cannot strand a sibling's claim).
imsg watch.subscribe as since_rowid, so imsg replays the rows that landed while the gateway was down, then tails live. The dedupe drops anything already handled, so none of the old cursor/messages.history/retry bookkeeping is needed.
What is intentionally out of scope?
chats.list + messages.history catchup mechanism (deleted). Recovery needs local chat.db access; over a remote SSH cliPath the monitor tails from the current rowid (suppress-and-move-on).
What does success look like?
What should reviewers focus on?
Linked context
Which issue does this close?
Closes #89237
Which issues, PRs, or discussions are related?
Related #91243 (supersedes the catchup half).
Was this requested by a maintainer or owner?
Maintainer-authored; fixes a P1 issue-rating: diamond lobster report.
Real behavior proof (required for external PRs)
imsg 0.11.0 / Messages.app. Empty allowlist so the test gateway sent nothing.
cliPath, the exact setup #89237 was reported on): a real inbound (age ~1.5s) under the 15min fence → CLAIMED + ENQUEUED (dispatched); with the fence forced to 1ms the same message → AGE-FENCE-SUPPRESSED (no enqueue). The fence gates exactly on send-date age.
imsg/Messages on the gateway Mac; messages scripted from a second Mac so it is end-to-end): startup cursor(L)=127953 boundary(M)=127955 since_rowid=127953, then inbound rowid=127954 recovery=true text="DOWNTIME MSG A (sent while gateway down)" DELIVERED and rowid=127955 recovery=true ... MSG B ... DELIVERED
recovery=true); without the cursor/since_rowid they are skipped by imsg's self-fence. The #89237 backlog (old date, live rowid) is still suppressed.
[PROOF-...] instrumentation was added only on a throwaway local checkout to capture the traces and has been reverted; no host state was modified.
Tests and validation
Which commands did you run?
node scripts/run-vitest.mjs extensions/imessage (537 pass) + config-guard + config-validation
pnpm tsgo:core / tsgo:extensions / tsgo:extensions:test / tsgo:core:test (clean); oxlint / oxfmt clean; generated config artifacts + docs format in sync
What regression coverage was added or updated?
inbound-dedupe.test.ts (claim/commit/release, retry, in-flight duplicate, composite-key round-trip, age-fence threshold/fail-open), recovery-cursor.test.ts (load/advance/monotonic/per-account), doctor-contract-api.test.ts (catchup strip), monitor.last-route.test.ts (startup since_rowid = cursor, recovery replay delivered vs live old suppressed, stale-vs-fresh). Catchup tests deleted.
What failed before this fix, if known?
Risk checklist
Did user-visible behavior change? (Yes/No)
Yes. Inbound is deduped; the Push-flush backlog is suppressed (logged); downtime recovery is now automatic on local setups (was opt-in catchup).
Did config, environment, or migration behavior change? (Yes/No)
Yes. channels.imessage.catchup.* is retired; existing configs still load (key stripped before validation) and openclaw doctor --fix removes it via a new iMessage doctor contract that also reports it.
Did security, auth, secrets, network, or tool execution behavior change? (Yes/No)
No.
What is the highest-risk area?
The catchup config retirement and the claim/commit/release + rowid-boundary recovery wiring.
How is that risk mitigated?
Strip-before-validation (no load break) + doctor report/migrate; per-unit commit/release and the dual-threshold split are unit-tested; the dedupe backstops cursor imprecision; both behaviors were live-proven on real imsg/Messages.
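The strip-before-validation step mentioned here can be pictured as a pure function that removes the retired key and reports whether it was present, so a doctor-style check can surface it. This is only a sketch under assumptions: the function name, the `JsonObject` alias, and the return shape are hypothetical, and only the channels.imessage.catchup path comes from the PR text.

```typescript
type JsonObject = Record<string, unknown>;

function isObject(value: unknown): value is JsonObject {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

/**
 * Returns a copy of the config without channels.imessage.catchup, plus a
 * flag a caller could use to report the retired key. The input is not mutated.
 */
function stripRetiredCatchup(config: JsonObject): { config: JsonObject; removed: boolean } {
  const channels = config["channels"];
  if (!isObject(channels)) return { config, removed: false };
  const imessage = channels["imessage"];
  if (!isObject(imessage) || !("catchup" in imessage)) return { config, removed: false };

  const cleaned: JsonObject = { ...imessage };
  delete cleaned["catchup"];
  return {
    config: { ...config, channels: { ...channels, imessage: cleaned } },
    removed: true,
  };
}
```

Running the strip before schema validation means an old config with the retired key still loads, while the `removed` flag leaves room for a report-and-fix path.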
Current review state
What is the next action?
Maintainer review; Greptile + CI.
What is still waiting on author, maintainer, CI, or external proof?
CI + Greptile.
Which bot or reviewer comments were addressed?
Codex autoreview flagged a premature replay record, a check-then-record race, a coalesced GUID-less claim leak, and a startup-window message-loss window; all were fixed, and the branch-level Codex review came back clean.