Commit fc6400e

omarshahine, claude, and vincentkoc authored
fix(imessage): always-on inbound recovery and dedupe (#91335)
* feat(imessage): always-on inbound recovery, deprecate catchup

  Replaces the opt-in catchup subsystem with always-on inbound replay protection that brings iMessage in line with the other channels, and fixes #89237 (stale backlog dispatched as fresh after bridge recovery).

  - New inbound-dedupe.ts: persistent claimable GUID dedupe (claim/commit/release) plus a stale-backlog age fence that suppresses live rows whose send date is materially older than arrival (logged, never silent).
  - monitor-provider: claim at ingestion, carry the exact claimed key on the debouncer entry, commit on successful flush / release on dispatch failure (per-unit so a coalesced bucket cannot strand a sibling claim). Keeps the local startup since_rowid watermark so startup-window rows are not skipped.
  - Deprecate catchup: delete catchup.ts + catchup-bridge.ts, remove the channels.imessage.catchup schema, cursor migration, and config-guard nag. Back-compat: strip the retired key before validation; new imessage doctor contract reports + removes it on doctor --fix.
  - Docs updated for the new recovery model.

  Net -947 prod LOC.

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(imessage): recover downtime messages via since_rowid replay

  Builds downtime recovery on the new inbound dedupe instead of restoring the old catchup subsystem. On startup the monitor passes the last dispatched rowid (a persisted per-account cursor) to imsg watch.subscribe as since_rowid, so imsg replays the messages that landed while the gateway was down, then tails live. The GUID dedupe drops anything already handled, so no cursor/retry bookkeeping is needed.

  - recovery-cursor.ts: minimal persisted per-account lastDispatchedRowid.
  - monitor-provider: since_rowid = cursor (capped to the most recent IMESSAGE_RECOVERY_MAX_ROWS); split the age fence on the startup rowid boundary so replayed rows (<= boundary) use the wider recovery window and live rows (> boundary) keep the tight #89237 fence; advance the cursor on commit.
  - Local only: remote SSH cliPath cannot read chat.db, so it tails from the current rowid (suppress-and-move-on) as before.

  Restores missed-message recovery that the catchup removal dropped, with no config and a fraction of the old LOC.

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(imessage): make recovery cursor advance failure- and suppression-safe

  Addresses two cursor-state regressions in the downtime-recovery path:

  - Failed replay rows could be skipped forever: a released (failed) row keeps its dedupe claim for retry, but a later successful row in the same flush advanced the cursor past it, so the next startup's since_rowid skipped it. Hold a per-session floor at the lowest released rowid and never advance the cursor past it.
  - Suppressed live backlog could be re-delivered after a restart: a live row suppressed under the tight live fence was not recorded, so after a restart it fell under the wider recovery window (its rowid now below the new boundary) and was delivered. Commit its dedupe key on suppression so the recovery replay treats it as already handled.

  Both caught by Codex autoreview. Adds regression tests for the floor and the suppression record.

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(imessage): bound the GUID-less replay key length

  Hash the composite fallback key's variable parts (conversation, sender, created_at, text) so the key is length-bounded regardless of message text. The persistent dedupe store already hashes keys internally, so this was not a live overflow, but the bounded key removes the dependency on that and keeps the fallback fail-open. Flagged by autoreview.

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(imessage): recover downtime messages on remote cliPath setups too

  The since_rowid replay runs over the imsg RPC client, so driving it from the persisted recovery cursor (not the local chat.db boundary) makes downtime recovery work for remote SSH cliPath gateways — the topology the old RPC-based catchup served and that the rowid-boundary-only version regressed. Local setups keep the wider, capped recovery window via the chat.db boundary; remote uses the live age-fence window. Flagged by autoreview.

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(imessage): seed recovery cursor from retired catchup cursor on upgrade

  A one-time, self-cleaning migration: when the recovery cursor is empty on the first startup after upgrade, seed it from the retired imessage.catchup-cursors lastSeenRowid and consume the legacy entry. Without this a user who had catchup enabled would not replay messages missed across the upgrade restart. Flagged by autoreview.

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(imessage): preserve catchup recovery on upgrade

---------

Co-authored-by: Omar Shahine <10343873+omarshahine@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Vincent Koc <vincentkoc@ieee.org>
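The cursor rule from the third commit (hold a floor at the lowest released rowid and never advance past it) can be sketched roughly as below. This is an illustration only, not the code in recovery-cursor.ts: the class and method names are hypothetical, and it assumes since_rowid is exclusive, so that imsg replays rows whose rowid is greater than the cursor.

```typescript
// Hypothetical sketch; names do not come from the commit.
class RecoveryCursorSketch {
  private lastDispatchedRowid: number;
  // Lowest rowid released (failed) in this session, or null if none failed.
  private floorRowid: number | null = null;

  constructor(persistedRowid: number) {
    this.lastDispatchedRowid = persistedRowid;
  }

  /** A row failed and was released for retry: remember the lowest such rowid. */
  noteReleased(rowid: number): void {
    this.floorRowid =
      this.floorRowid === null ? rowid : Math.min(this.floorRowid, rowid);
  }

  /** A row was committed: advance, but stay strictly below any released row. */
  noteCommitted(rowid: number): void {
    const ceiling =
      this.floorRowid === null ? rowid : Math.min(rowid, this.floorRowid - 1);
    this.lastDispatchedRowid = Math.max(this.lastDispatchedRowid, ceiling);
  }

  /** Value to pass as since_rowid on the next startup (assumed exclusive). */
  get sinceRowid(): number {
    return this.lastDispatchedRowid;
  }
}
```

With a failed row at rowid 12, a later commit of rowid 13 leaves the cursor at 11, so the next startup replays 12 again and the GUID dedupe drops 13 as already handled.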
1 parent 538d36e commit fc6400e

18 files changed

Lines changed: 1380 additions & 470 deletions
Lines changed: 3 additions & 3 deletions
@@ -1,4 +1,4 @@
-a5a97a8b484acd13e68604037c8d8f448699700103c6ea2186f5914ad35a0623 config-baseline.json
-b0d668dbd794d2f54738152a4bcfd2a306c7954901e78d4dfbde7545a8301ce5 config-baseline.core.json
-0637c9bdcb9517f56049dd786563366877458d35df575328a6b80a890c8bc915 config-baseline.channel.json
+37b56008790612b8293930b6a29d74490e98daa90f954fca9d133fcc28645c4c config-baseline.json
+75b64c2ea081369ba4306493313a8a4cd48b784145f92fed995e6b77a5df350d config-baseline.core.json
+17d64c9799dfa239a49493413f1100bdd9237e9b67aaeae331a4604dbc227023 config-baseline.channel.json
 f9d1f50bfa8403891e76cd99dc1357cdece4a71e8ae18a39b190c2a14e6f97b0 config-baseline.plugin.json

docs/channels/imessage-from-bluebubbles.md

Lines changed: 16 additions & 16 deletions
@@ -221,22 +221,22 @@ If the gateway logs `imessage: dropping group message from chat_id=<id>` or the
 
 ## Action parity at a glance
 
-| Action | legacy BlueBubbles | bundled iMessage |
-| ---------------------------------------------------------- | ----------------------------------- | ----------------------------------------------------------------------------------------------------------------------- |
-| Send text / SMS fallback || |
-| Send media (photo, video, file, voice) || |
-| Threaded reply (`reply_to_guid`) || ✅ (closes [#51892](https://github.com/openclaw/openclaw/issues/51892)) |
-| Tapback (`react`) || |
-| Edit / unsend (macOS 13+ recipients) || |
-| Send with screen effect || ✅ (closes part of [#9394](https://github.com/openclaw/openclaw/issues/9394)) |
-| Rich text bold / italic / underline / strikethrough || ✅ (typed-run formatting via attributedBody) |
-| Rename group / set group icon || |
-| Add / remove participant, leave group || |
-| Read receipts and typing indicator || ✅ (gated on private API probe) |
-| Same-sender DM coalescing || ✅ (DM-only; opt-in via `channels.imessage.coalesceSameSenderDms`) |
-| Catchup of inbound messages received while gateway is down | ✅ (webhook replay + history fetch) | ✅ (opt-in via `channels.imessage.catchup.enabled`; closes [#78649](https://github.com/openclaw/openclaw/issues/78649)) |
-
-iMessage catchup is now available as an opt-in feature on the bundled plugin. On gateway startup, if `channels.imessage.catchup.enabled` is `true`, the gateway runs one `chats.list` + per-chat `messages.history` pass against the same JSON-RPC client used by `imsg watch`, replays each missed inbound row through the live dispatch path (allowlists, group policy, debouncer, echo cache), and persists a per-account cursor so subsequent startups pick up where they left off. See [Catching up after gateway downtime](/channels/imessage#catching-up-after-gateway-downtime) for tuning.
+| Action | legacy BlueBubbles | bundled iMessage |
+| --------------------------------------------------- | ----------------------------------- | ----------------------------------------------------------------------------- |
+| Send text / SMS fallback |||
+| Send media (photo, video, file, voice) |||
+| Threaded reply (`reply_to_guid`) || ✅ (closes [#51892](https://github.com/openclaw/openclaw/issues/51892)) |
+| Tapback (`react`) |||
+| Edit / unsend (macOS 13+ recipients) |||
+| Send with screen effect || ✅ (closes part of [#9394](https://github.com/openclaw/openclaw/issues/9394)) |
+| Rich text bold / italic / underline / strikethrough || ✅ (typed-run formatting via attributedBody) |
+| Rename group / set group icon |||
+| Add / remove participant, leave group |||
+| Read receipts and typing indicator || ✅ (gated on private API probe) |
+| Same-sender DM coalescing || ✅ (DM-only; opt-in via `channels.imessage.coalesceSameSenderDms`) |
+| Inbound recovery after a restart | ✅ (webhook replay + history fetch) | ✅ (automatic: replay missed via since_rowid + dedupe; wider window on local) |
+
+iMessage recovers messages missed while the gateway was down: on startup it replays from the last dispatched rowid via `imsg watch.subscribe` `since_rowid` and dedupes by GUID, while a stale-backlog age fence suppresses the Push-flush "backlog bomb". This runs over the `imsg` RPC connection, so it works for remote SSH `cliPath` setups too; local setups get a wider recovery window because they can read `chat.db`. See [Inbound recovery after a bridge or gateway restart](/channels/imessage#inbound-recovery-after-a-bridge-or-gateway-restart).
 
 ## Pairing, sessions, and ACP bindings
 
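The GUID dedupe that the replay in this hunk relies on follows a claim, commit, release cycle. A minimal in-memory sketch of that cycle is shown here under stated assumptions: the real store is persistent plugin state, and every name below (InboundDedupeSketch, handleInbound, the result strings) is hypothetical rather than taken from inbound-dedupe.ts.

```typescript
// Hypothetical in-memory sketch; the real dedupe store is persistent.
type ClaimResult = "claimed" | "duplicate" | "in-flight";

class InboundDedupeSketch {
  private committed = new Set<string>();
  private inFlight = new Set<string>();

  /** Claim a key at ingestion; only a "claimed" result may be dispatched. */
  claim(key: string): ClaimResult {
    if (this.committed.has(key)) return "duplicate";
    if (this.inFlight.has(key)) return "in-flight";
    this.inFlight.add(key);
    return "claimed";
  }

  /** Record the key as handled after a successful dispatch. */
  commit(key: string): void {
    this.inFlight.delete(key);
    this.committed.add(key);
  }

  /** Drop the claim after a transient failure so a later replay can retry. */
  release(key: string): void {
    this.inFlight.delete(key);
  }
}

/** Dispatch one message at most once; returns true only if it was handled. */
async function handleInbound(
  dedupe: InboundDedupeSketch,
  guid: string,
  dispatch: () => Promise<void>,
): Promise<boolean> {
  if (dedupe.claim(guid) !== "claimed") return false;
  try {
    await dispatch();
    dedupe.commit(guid);
    return true;
  } catch {
    dedupe.release(guid);
    return false;
  }
}
```

Because a committed GUID is always reported as a duplicate, a startup replay can resend an overlapping range of rows without tracking each message individually.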
docs/channels/imessage.md

Lines changed: 12 additions & 52 deletions
@@ -9,7 +9,7 @@ title: "iMessage"
 <Note>
 For OpenClaw iMessage deployments, use `imsg` on a signed-in macOS Messages host. If your Gateway runs on Linux or Windows, point `channels.imessage.cliPath` at an SSH wrapper that runs `imsg` on the Mac.
 
-**Gateway-downtime catchup is opt-in.** When enabled (`channels.imessage.catchup.enabled: true`), the gateway replays inbound messages that landed in `chat.db` while it was offline (crash, restart, Mac sleep) on next startup. Disabled by default — see [Catching up after gateway downtime](#catching-up-after-gateway-downtime). Closes [openclaw#78649](https://github.com/openclaw/openclaw/issues/78649).
+**Inbound recovery is automatic.** After a bridge or gateway restart, iMessage replays the messages missed while it was down and suppresses the stale "backlog bomb" Apple can flush after a Push recovery, deduping so nothing is dispatched twice. There is no config to enable — see [Inbound recovery after a bridge or gateway restart](#inbound-recovery-after-a-bridge-or-gateway-restart).
 </Note>
 
 <Warning>
@@ -725,67 +725,27 @@ The "Flag on" column shows behavior on an `imsg` build that emits `balloon_bundl
 | Rapid flood (>10 small DMs inside window) | N rows without URL balloon metadata | N turns | N turns (legacy merge on metadata-less builds) |
 | Two people typing in a group chat | N rows from M senders | M+ turns (one per sender bucket) | M+ turns — group chats are not coalesced |
 
-## Catching up after gateway downtime
+## Inbound recovery after a bridge or gateway restart
 
-When the gateway is offline (crash, restart, Mac sleep, machine off), `imsg watch` resumes from the current `chat.db` state once the gateway comes back up — anything that arrived during the gap is, by default, never seen. Catchup replays those messages on the next startup so the agent does not silently miss inbound traffic.
+iMessage recovers messages missed while the gateway was down, and at the same time suppresses the stale "backlog bomb" Apple can flush after a Push recovery. The default behavior is always on, built on the inbound dedupe.
 
-Catchup is **disabled by default**. Enable it per channel:
+- **Replay dedupe.** Every dispatched inbound message is recorded by its Apple GUID in persistent plugin state (`imessage.inbound-dedupe`), claimed at ingestion and committed after handling (released on a transient failure so it can retry). Anything already handled is dropped instead of dispatched twice. This is what lets recovery replay aggressively without per-message bookkeeping.
+- **Downtime recovery.** On startup the monitor remembers the last dispatched `chat.db` rowid (a persisted per-account cursor) and passes it to `imsg watch.subscribe` as `since_rowid`, so imsg replays the rows that landed while the gateway was down, then tails live. Replay is bounded to the most recent rows and to messages up to ~2 hours old, and the dedupe drops anything already handled.
+- **Stale-backlog age fence.** Rows above the startup boundary are genuinely live; one whose send date is more than ~15 minutes older than its arrival is the Push-flush backlog and is suppressed. Replayed rows (at or below the boundary) use the wider recovery window instead, so a recently-missed message is delivered while ancient history is not.
 
-```ts
-channels: {
-  imessage: {
-    catchup: {
-      enabled: true, // master switch (default: false)
-      maxAgeMinutes: 120, // skip rows older than now - 2h (default: 120, clamp 1..720)
-      perRunLimit: 50, // max rows replayed per startup (default: 50, clamp 1..500)
-      firstRunLookbackMinutes: 30, // first run with no cursor: look back 30 min (default: 30)
-      maxFailureRetries: 10, // give up on a wedged guid after 10 dispatch failures (default: 10)
-    },
-  },
-}
-```
+Recovery works over both local and remote `cliPath` setups, because `since_rowid` replay runs over the same `imsg` RPC connection. The difference is the window: when the gateway can read `chat.db` (local), it anchors the startup rowid boundary, caps the replay span, and delivers missed messages up to a couple of hours old. Over a remote SSH `cliPath` it cannot read the database, so the replay is uncapped and every row uses the live age fence — it still recovers recently-missed messages and still suppresses old backlog, just with the narrower live window. Run the gateway on the Messages Mac for the wider recovery window.
 
-### How it runs
+### Operator-visible signal
 
-One pass per `monitorIMessageProvider` startup, sequenced as `imsg launch` ready → `watch.subscribe` → `performIMessageCatchup` → live dispatch loop. Catchup itself uses `chats.list` + per-chat `messages.history` against the same JSON-RPC client used by `imsg watch`. Anything that arrives during the catchup pass flows through live dispatch normally; the existing inbound-dedupe cache absorbs any overlap with replayed rows.
+Suppressed backlog is logged at the default level, never silently dropped (the `recovery` flag shows which window applied):
 
-Each replayed row is fed through the live dispatch path (`evaluateIMessageInbound` + `dispatchInboundMessage`), so allowlists, group policy, debouncer, echo cache, and read receipts behave identically on replayed and live messages.
-
-### Cursor and retry semantics
-
-Catchup keeps a per-account cursor in SQLite plugin state:
-
-```json
-{
-  "lastSeenMs": 1717900800000,
-  "lastSeenRowid": 482910,
-  "updatedAt": 1717900801234,
-  "failureRetries": { "<guid>": 1 }
-}
 ```
-
-- The cursor advances on each successful dispatch and is held when a row's dispatch throws — the next startup retries the same row from the held cursor.
-- After the startup catchup query succeeds, later live-handled rows also advance the same cursor so a gateway restart does not replay messages that were already handled live. Live cursor writes do not jump past catchup failures that are still below `maxFailureRetries`.
-- After `maxFailureRetries` consecutive throws against the same `guid`, catchup logs a `warn` and force-advances the cursor past the wedged message so subsequent startups can make progress.
-- Already-given-up guids are skipped on sight (no dispatch attempt) on later runs and counted under `skippedGivenUp` in the run summary.
-- `openclaw doctor --fix` imports legacy `<openclawStateDir>/imessage/catchup/*.json` cursor files into SQLite plugin state and archives the old files.
-
-### Operator-visible signals
-
+imessage: suppressed stale inbound backlog account=<id> sent=<iso> recovery=<bool> (<N> suppressed since start)
 ```
-imessage catchup: replayed=N skippedFromMe=… skippedGivenUp=… failed=… givenUp=… fetchedCount=…
-imessage catchup: giving up on guid=<guid> after <N> failures; advancing cursor past it
-imessage catchup: fetched <X> rows across chats, capped to perRunLimit=<Y>
-```
-
-A `WARN ... capped to perRunLimit` line means a single startup did not drain the full backlog. Raise `perRunLimit` (max 500) if your gaps regularly exceed the default 50-row pass.
-
-### When to leave it off
 
-- Gateway runs continuously with watchdog auto-restart and gaps are always < a few seconds — the default of off is fine.
-- DM volume is low and missed messages would not change agent behavior — the `firstRunLookbackMinutes` initial window can dispatch surprising old context on first enable.
+### Migration
 
-When you turn catchup on, the first startup with no cursor only looks back `firstRunLookbackMinutes` (30 min default), not the full `maxAgeMinutes` window — this avoids replaying a long history of pre-enable messages.
+`channels.imessage.catchup.*` is deprecated — downtime recovery is now automatic and needs no config for new setups. Existing configs with `catchup.enabled: true` remain honored as a compatibility profile for the recovery replay window. Disabled catchup blocks (`enabled: false` or no `enabled: true`) are retired; `openclaw doctor --fix` removes those.
 
 ## Troubleshooting
 
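The age-fence rule described in the added docs above (tight window for live rows, wider window for replayed rows, live window for every row on a remote `cliPath`) might be sketched as follows. This is a hedged illustration, not the plugin's code: the function, type, and constant names are hypothetical, and the two window sizes are assumptions taken from the "~15 minutes" and "~2 hours" figures in the docs.

```typescript
// Assumed values: the docs only say "~15 minutes" and "~2 hours".
const LIVE_FENCE_MS = 15 * 60 * 1000;
const RECOVERY_FENCE_MS = 2 * 60 * 60 * 1000;

interface FenceInput {
  rowid: number;
  sentAtMs: number; // message send date
  arrivedAtMs: number; // when the gateway saw the row
  /** Startup rowid boundary; null when chat.db is unreadable (remote cliPath). */
  startupBoundaryRowid: number | null;
}

interface FenceDecision {
  suppress: boolean;
  recovery: boolean; // true when the wider recovery window applied
}

function evaluateAgeFence(input: FenceInput): FenceDecision {
  // Rows at or below the boundary were replayed; rows above it are live.
  // Without a boundary (remote), every row is treated as live.
  const recovery =
    input.startupBoundaryRowid !== null &&
    input.rowid <= input.startupBoundaryRowid;
  const windowMs = recovery ? RECOVERY_FENCE_MS : LIVE_FENCE_MS;
  const ageMs = input.arrivedAtMs - input.sentAtMs;
  return { suppress: ageMs > windowMs, recovery };
}
```

Under these assumptions a message sent one hour before arrival is delivered when it is a replayed row on a local setup, and suppressed when it arrives as a live row or over a remote `cliPath`.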
docs/gateway/config-channels.md

Lines changed: 1 addition & 4 deletions
@@ -624,9 +624,6 @@ Before relying on an SSH wrapper for production sends, verify an outbound `imsg
         sendWithEffect: true,
         sendAttachment: true,
       },
-      catchup: {
-        enabled: false,
-      },
     },
   },
 }
@@ -642,7 +639,7 @@ Before relying on an SSH wrapper for production sends, verify an outbound `imsg
 - `channels.imessage.configWrites`: allow or deny iMessage-initiated config writes.
 - `channels.imessage.actions.*`: enable private API actions that are also gated by `imsg status` / `openclaw channels status --probe`.
 - `channels.imessage.includeAttachments` is off by default; set it to `true` before expecting inbound media in agent turns.
-- `channels.imessage.catchup.enabled`: opt in to replaying inbound messages that arrived while the Gateway was down.
+- Inbound recovery after a bridge/gateway restart is automatic (GUID dedupe plus a stale-backlog age fence). Existing `channels.imessage.catchup.enabled: true` configs are still honored as a deprecated compatibility profile.
 - `channels.imessage.groups`: group registry and per-group settings. With `groupPolicy: "allowlist"`, configure either explicit `chat_id` keys or a `"*"` wildcard entry so group messages can pass the registry gate.
 - Top-level `bindings[]` entries with `type: "acp"` can bind iMessage conversations to persistent ACP sessions. Use a normalized handle or explicit chat target (`chat_id:*`, `chat_guid:*`, `chat_identifier:*`) in `match.peer.id`. Shared field semantics: [ACP Agents](/tools/acp-agents#persistent-channel-bindings).
 
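The compatibility rule stated in these docs (a catchup block with `enabled: true` is kept, any other catchup block is retired) can be illustrated with the small sketch below. It is an assumption-laden example, not the doctor contract shipped in this commit: the function name and the config interfaces are hypothetical, and only `catchup.enabled` and `cliPath` are keys that appear in the docs.

```typescript
// Hypothetical config shapes; only catchup.enabled and cliPath come from the docs.
interface CatchupBlockSketch {
  enabled?: boolean;
}

interface IMessageChannelConfigSketch {
  cliPath?: string;
  catchup?: CatchupBlockSketch;
}

interface RetireResult {
  config: IMessageChannelConfigSketch;
  removedCatchup: boolean;
}

/** Keep an enabled catchup block; drop a disabled or empty one. */
function retireDisabledCatchup(
  config: IMessageChannelConfigSketch,
): RetireResult {
  if (config.catchup === undefined || config.catchup.enabled === true) {
    return { config, removedCatchup: false };
  }
  // Copy before deleting so the caller's object is left untouched.
  const next: IMessageChannelConfigSketch = { ...config };
  delete next.catchup;
  return { config: next, removedCatchup: true };
}
```

Returning a flag alongside the new config lets a caller report what it changed, which matches the "reports + removes" wording in the commit message.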