fix(imessage): always-on inbound recovery and dedupe (#91335)
* feat(imessage): always-on inbound recovery, deprecate catchup
Replaces the opt-in catchup subsystem with always-on inbound replay
protection that brings iMessage in line with the other channels, and
fixes #89237 (stale backlog dispatched as fresh after bridge recovery).
- New inbound-dedupe.ts: persistent claimable GUID dedupe (claim/commit/
release) plus a stale-backlog age fence that suppresses live rows whose
send date is materially older than arrival (logged, never silent).
- monitor-provider: claim at ingestion, carry the exact claimed key on the
debouncer entry, commit on successful flush / release on dispatch failure
(per-unit so a coalesced bucket cannot strand a sibling claim). Keeps the
local startup since_rowid watermark so startup-window rows are not skipped.
- Deprecate catchup: delete catchup.ts + catchup-bridge.ts, remove the
channels.imessage.catchup schema, cursor migration, and config-guard nag.
Back-compat: strip the retired key before validation; new imessage doctor
contract reports + removes it on doctor --fix.
- Docs updated for the new recovery model.
Net -947 prod LOC.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
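The claim/commit/release flow described in this commit can be sketched roughly as below. This is an in-memory illustration only: `InboundDedupe` and `handleInbound` are hypothetical names, and the real inbound-dedupe.ts is persistent and may be shaped differently.

```typescript
// Hypothetical sketch of claimable dedupe; not the actual inbound-dedupe.ts API.
type ClaimState = "claimed" | "committed";

class InboundDedupe {
  private readonly entries = new Map<string, ClaimState>();

  /** Returns true if the caller now owns the key; false if it is in flight or already handled. */
  claim(key: string): boolean {
    if (this.entries.has(key)) return false;
    this.entries.set(key, "claimed");
    return true;
  }

  /** Marks a claimed key as handled so later replays of the same message are dropped. */
  commit(key: string): void {
    if (this.entries.get(key) === "claimed") this.entries.set(key, "committed");
  }

  /** Drops an uncommitted claim so the message can be claimed again on retry. */
  release(key: string): void {
    if (this.entries.get(key) === "claimed") this.entries.delete(key);
  }
}

// Claim at ingestion, commit on success, release on dispatch failure.
async function handleInbound(
  dedupe: InboundDedupe,
  guid: string,
  dispatch: () => Promise<void>,
): Promise<"dispatched" | "duplicate" | "failed"> {
  if (!dedupe.claim(guid)) return "duplicate";
  try {
    await dispatch();
    dedupe.commit(guid);
    return "dispatched";
  } catch {
    dedupe.release(guid);
    return "failed";
  }
}
```

Because a committed key ignores `release`, a late failure path cannot reopen a message that was already handled.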
* feat(imessage): recover downtime messages via since_rowid replay
Builds downtime recovery on the new inbound dedupe instead of restoring the
old catchup subsystem. On startup the monitor passes the last dispatched rowid
(a persisted per-account cursor) to imsg watch.subscribe as since_rowid, so imsg
replays the messages that landed while the gateway was down, then tails live.
The GUID dedupe drops anything already handled, so no cursor/retry bookkeeping
is needed.
- recovery-cursor.ts: minimal persisted per-account lastDispatchedRowid.
- monitor-provider: since_rowid = cursor (capped to the most recent
IMESSAGE_RECOVERY_MAX_ROWS); split the age fence on the startup rowid boundary
so replayed rows (<= boundary) use the wider recovery window and live rows
(> boundary) keep the tight #89237 fence; advance the cursor on commit.
- Local only: remote SSH cliPath cannot read chat.db, so it tails from the
current rowid (suppress-and-move-on) as before.
Restores missed-message recovery that the catchup removal dropped, with no
config and a fraction of the old LOC.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
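A minimal sketch of the split age fence, assuming the windows quoted in the docs text of this change (about 15 minutes for live rows, about 2 hours for replayed rows). The constant names, the `InboundRow` shape, and `classifyRow` are hypothetical and only illustrate the boundary rule.

```typescript
// Assumed windows, taken from the docs text in this change; real constants may differ.
const LIVE_MAX_AGE_MS = 15 * 60 * 1000;
const RECOVERY_MAX_AGE_MS = 2 * 60 * 60 * 1000;

// Hypothetical row shape for illustration.
interface InboundRow {
  rowid: number;
  sentAtMs: number; // send date of the message
  arrivedAtMs: number; // when the gateway saw the row
}

interface FenceResult {
  recovery: boolean; // true when the wider recovery window applied
  stale: boolean; // true when the row should be suppressed (and logged)
}

function classifyRow(row: InboundRow, startupBoundaryRowid: number): FenceResult {
  // Rows at or below the startup boundary are replayed; rows above it are live.
  const recovery = row.rowid <= startupBoundaryRowid;
  const maxAgeMs = recovery ? RECOVERY_MAX_AGE_MS : LIVE_MAX_AGE_MS;
  const ageMs = row.arrivedAtMs - row.sentAtMs;
  return { recovery, stale: ageMs > maxAgeMs };
}
```

With these assumed values, a message sent 30 minutes before arrival is delivered when it is replayed from below the boundary but suppressed when it arrives as a live row.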
* fix(imessage): make recovery cursor advance failure- and suppression-safe
Addresses two cursor-state regressions in the downtime-recovery path:
- Failed replay rows could be skipped forever: a released (failed) row keeps
its dedupe claim for retry, but a later successful row in the same flush
advanced the cursor past it, so the next startup's since_rowid skipped it.
Hold a per-session floor at the lowest released rowid and never advance the
cursor past it.
- Suppressed live backlog could be re-delivered after a restart: a live row
suppressed under the tight live fence was not recorded, so after a restart it
fell under the wider recovery window (its rowid now below the new boundary)
and was delivered. Commit its dedupe key on suppression so the recovery
replay treats it as already handled.
Both caught by Codex autoreview. Adds regression tests for the floor and the
suppression record.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
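The per-session floor can be illustrated with the sketch below. `RecoveryCursor` and its methods are hypothetical names, and the sketch assumes `since_rowid` is exclusive (replay starts after the cursor), which this commit message does not state.

```typescript
// Hypothetical sketch of a failure-safe cursor; not the actual recovery-cursor.ts API.
class RecoveryCursor {
  private lastDispatchedRowid: number;
  // Lowest rowid released (failed) in this session; Infinity means no failures yet.
  private floor = Number.POSITIVE_INFINITY;

  constructor(initialRowid: number) {
    this.lastDispatchedRowid = initialRowid;
  }

  /** Record a failed row so the cursor can never move to or past it. */
  onRelease(rowid: number): void {
    this.floor = Math.min(this.floor, rowid);
  }

  /** Advance on a successful commit, clamped just below the lowest failed row. */
  onCommit(rowid: number): number {
    const next = Math.min(rowid, this.floor - 1);
    if (next > this.lastDispatchedRowid) this.lastDispatchedRowid = next;
    return this.lastDispatchedRowid;
  }
}
```

Under that assumption, a failed row 13 pins the cursor at 12 even after rows 14 and later commit, so the next startup replays from 13 and the dedupe drops the rows that already succeeded.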
* fix(imessage): bound the GUID-less replay key length
Hash the composite fallback key's variable parts (conversation, sender,
created_at, text) so the key is length-bounded regardless of message text.
The persistent dedupe store already hashes keys internally, so this was not a
live overflow, but the bounded key removes the dependency on that and keeps the
fallback fail-open. Flagged by autoreview.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
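One way to bound the fallback key is sketched here, assuming SHA-256 via Node's `crypto` module. The `noguid:` prefix, the `FallbackKeyParts` shape, and the function name are hypothetical, and the hash actually used in the plugin is not stated in this commit message.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shape for the variable parts named in the commit message.
interface FallbackKeyParts {
  conversation: string;
  sender: string;
  createdAt: string;
  text: string;
}

function guidlessReplayKey(parts: FallbackKeyParts): string {
  const hash = createHash("sha256");
  for (const part of [parts.conversation, parts.sender, parts.createdAt, parts.text]) {
    // Length-prefix each field so ("ab", "c") and ("a", "bc") hash differently.
    hash.update(`${Buffer.byteLength(part, "utf8")}:`, "utf8");
    hash.update(part, "utf8");
  }
  // Fixed-size output: a short prefix plus 64 hex characters, regardless of text length.
  return `noguid:${hash.digest("hex")}`;
}
```

The key length is then independent of the message body, so it no longer relies on the store hashing keys internally.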
* fix(imessage): recover downtime messages on remote cliPath setups too
The since_rowid replay runs over the imsg RPC client, so driving it from the
persisted recovery cursor (not the local chat.db boundary) makes downtime
recovery work for remote SSH cliPath gateways — the topology the old RPC-based
catchup served and that the rowid-boundary-only version regressed. Local setups
keep the wider, capped recovery window via the chat.db boundary; remote uses the
live age-fence window. Flagged by autoreview.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* fix(imessage): seed recovery cursor from retired catchup cursor on upgrade
A one-time, self-cleaning migration: when the recovery cursor is empty on the
first startup after upgrade, seed it from the retired imessage.catchup-cursors
lastSeenRowid and consume the legacy entry. Without this a user who had catchup
enabled would not replay messages missed across the upgrade restart. Flagged by
autoreview.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* fix(imessage): preserve catchup recovery on upgrade
---------
Co-authored-by: Omar Shahine <10343873+omarshahine@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Vincent Koc <vincentkoc@ieee.org>
|Catchup of inbound messages received while gateway is down | ✅ (webhook replay + history fetch) | ✅ (opt-in via `channels.imessage.catchup.enabled`; closes [#78649](https://github.com/openclaw/openclaw/issues/78649)) |
238
-
239
-
iMessage catchup is now available as an opt-in feature on the bundled plugin. On gateway startup, if `channels.imessage.catchup.enabled` is `true`, the gateway runs one `chats.list` + per-chat `messages.history` pass against the same JSON-RPC client used by `imsg watch`, replays each missed inbound row through the live dispatch path (allowlists, group policy, debouncer, echo cache), and persists a per-account cursor so subsequent startups pick up where they left off. See [Catching up after gateway downtime](/channels/imessage#catching-up-after-gateway-downtime) for tuning.
|Inbound recovery after a restart | ✅ (webhook replay + history fetch) | ✅ (automatic: replay missed via since_rowid + dedupe; wider window on local) |
238
+
239
+
iMessage recovers messages missed while the gateway was down: on startup it replays from the last dispatched rowid via `imsg watch.subscribe` `since_rowid` and dedupes by GUID, while a stale-backlog age fence suppresses the Push-flush "backlog bomb". This runs over the `imsg` RPC connection, so it works for remote SSH `cliPath` setups too; local setups get a wider recovery window because they can read `chat.db`. See [Inbound recovery after a bridge or gateway restart](/channels/imessage#inbound-recovery-after-a-bridge-or-gateway-restart).
docs/channels/imessage.md
12 additions & 52 deletions
@@ -9,7 +9,7 @@ title: "iMessage"
9
9
<Note>
10
10
For OpenClaw iMessage deployments, use `imsg` on a signed-in macOS Messages host. If your Gateway runs on Linux or Windows, point `channels.imessage.cliPath` at an SSH wrapper that runs `imsg` on the Mac.
11
11
12
-
**Gateway-downtime catchup is opt-in.** When enabled (`channels.imessage.catchup.enabled: true`), the gateway replays inbound messages that landed in `chat.db` while it was offline (crash, restart, Mac sleep) on next startup. Disabled by default; see [Catching up after gateway downtime](#catching-up-after-gateway-downtime). Closes [openclaw#78649](https://github.com/openclaw/openclaw/issues/78649).
12
+
**Inbound recovery is automatic.** After a bridge or gateway restart, iMessage replays the messages missed while it was down and suppresses the stale "backlog bomb" Apple can flush after a Push recovery, deduping so nothing is dispatched twice. There is no config to enable; see [Inbound recovery after a bridge or gateway restart](#inbound-recovery-after-a-bridge-or-gateway-restart).
13
13
</Note>
14
14
15
15
<Warning>
@@ -725,67 +725,27 @@ The "Flag on" column shows behavior on an `imsg` build that emits `balloon_bundl
725
725
| Rapid flood (>10 small DMs inside window) | N rows without URL balloon metadata | N turns | N turns (legacy merge on metadata-less builds) |
726
726
| Two people typing in a group chat | N rows from M senders | M+ turns (one per sender bucket) | M+ turns — group chats are not coalesced |
727
727
728
-
## Catching up after gateway downtime
728
+
## Inbound recovery after a bridge or gateway restart
729
729
730
-
When the gateway is offline (crash, restart, Mac sleep, machine off), `imsg watch` resumes from the current `chat.db` state once the gateway comes back up — anything that arrived during the gap is, by default, never seen. Catchup replays those messages on the next startup so the agent does not silently miss inbound traffic.
730
+
iMessage recovers messages missed while the gateway was down, and at the same time suppresses the stale "backlog bomb" Apple can flush after a Push recovery. The default behavior is always on, built on the inbound dedupe.
731
731
732
-
Catchup is **disabled by default**. Enable it per channel:
732
+
- **Replay dedupe.** Every dispatched inbound message is recorded by its Apple GUID in persistent plugin state (`imessage.inbound-dedupe`), claimed at ingestion and committed after handling (released on a transient failure so it can retry). Anything already handled is dropped instead of dispatched twice. This is what lets recovery replay aggressively without per-message bookkeeping.
733
+
- **Downtime recovery.** On startup the monitor remembers the last dispatched `chat.db` rowid (a persisted per-account cursor) and passes it to `imsg watch.subscribe` as `since_rowid`, so imsg replays the rows that landed while the gateway was down, then tails live. Replay is bounded to the most recent rows and to messages up to ~2 hours old, and the dedupe drops anything already handled.
734
+
- **Stale-backlog age fence.** Rows above the startup boundary are genuinely live; one whose send date is more than ~15 minutes older than its arrival is the Push-flush backlog and is suppressed. Replayed rows (at or below the boundary) use the wider recovery window instead, so a recently-missed message is delivered while ancient history is not.
733
735
734
-
```ts
735
-
channels: {
736
-
imessage: {
737
-
catchup: {
738
-
enabled: true, // master switch (default: false)
739
-
maxAgeMinutes: 120, // skip rows older than now - 2h (default: 120, clamp 1..720)
740
-
perRunLimit: 50, // max rows replayed per startup (default: 50, clamp 1..500)
741
-
firstRunLookbackMinutes: 30, // first run with no cursor: look back 30 min (default: 30)
742
-
maxFailureRetries: 10, // give up on a wedged guid after 10 dispatch failures (default: 10)
743
-
},
744
-
},
745
-
}
746
-
```
736
+
Recovery works over both local and remote `cliPath` setups, because `since_rowid` replay runs over the same `imsg` RPC connection. The difference is the window: when the gateway can read `chat.db` (local), it anchors the startup rowid boundary, caps the replay span, and delivers missed messages up to a couple of hours old. Over a remote SSH `cliPath` it cannot read the database, so the replay is uncapped and every row uses the live age fence — it still recovers recently-missed messages and still suppresses old backlog, just with the narrower live window. Run the gateway on the Messages Mac for the wider recovery window.
747
737
748
-
### How it runs
738
+
### Operator-visible signal
749
739
750
-
One pass per `monitorIMessageProvider` startup, sequenced as `imsg launch` ready → `watch.subscribe` → `performIMessageCatchup` → live dispatch loop. Catchup itself uses `chats.list` + per-chat `messages.history` against the same JSON-RPC client used by `imsg watch`. Anything that arrives during the catchup pass flows through live dispatch normally; the existing inbound-dedupe cache absorbs any overlap with replayed rows.
740
+
Suppressed backlog is logged at the default level, never silently dropped (the `recovery` flag shows which window applied):
751
741
752
-
Each replayed row is fed through the live dispatch path (`evaluateIMessageInbound` + `dispatchInboundMessage`), so allowlists, group policy, debouncer, echo cache, and read receipts behave identically on replayed and live messages.
753
-
754
-
### Cursor and retry semantics
755
-
756
-
Catchup keeps a per-account cursor in SQLite plugin state:
757
-
758
-
```json
759
-
{
760
-
"lastSeenMs": 1717900800000,
761
-
"lastSeenRowid": 482910,
762
-
"updatedAt": 1717900801234,
763
-
"failureRetries": { "<guid>": 1 }
764
-
}
765
742
```
766
-
767
-
- The cursor advances on each successful dispatch and is held when a row's dispatch throws — the next startup retries the same row from the held cursor.
768
-
- After the startup catchup query succeeds, later live-handled rows also advance the same cursor so a gateway restart does not replay messages that were already handled live. Live cursor writes do not jump past catchup failures that are still below `maxFailureRetries`.
769
-
- After `maxFailureRetries` consecutive throws against the same `guid`, catchup logs a `warn` and force-advances the cursor past the wedged message so subsequent startups can make progress.
770
-
- Already-given-up guids are skipped on sight (no dispatch attempt) on later runs and counted under `skippedGivenUp` in the run summary.
771
-
- `openclaw doctor --fix` imports legacy `<openclawStateDir>/imessage/catchup/*.json` cursor files into SQLite plugin state and archives the old files.
772
-
773
-
### Operator-visible signals
774
-
743
+
imessage: suppressed stale inbound backlog account=<id> sent=<iso> recovery=<bool> (<N> suppressed since start)
imessage catchup: giving up on guid=<guid> after <N> failures; advancing cursor past it
778
-
imessage catchup: fetched <X> rows across chats, capped to perRunLimit=<Y>
779
-
```
780
-
781
-
A `WARN ... capped to perRunLimit` line means a single startup did not drain the full backlog. Raise `perRunLimit` (max 500) if your gaps regularly exceed the default 50-row pass.
782
-
783
-
### When to leave it off
784
745
785
-
- Gateway runs continuously with watchdog auto-restart and gaps are always < a few seconds — the default of off is fine.
786
-
- DM volume is low and missed messages would not change agent behavior — the `firstRunLookbackMinutes` initial window can dispatch surprising old context on first enable.
746
+
### Migration
787
747
788
-
When you turn catchup on, the first startup with no cursor only looks back `firstRunLookbackMinutes` (30 min default), not the full `maxAgeMinutes` window — this avoids replaying a long history of pre-enable messages.
748
+
`channels.imessage.catchup.*` is deprecated — downtime recovery is now automatic and needs no config for new setups. Existing configs with `catchup.enabled: true` remain honored as a compatibility profile for the recovery replay window. Disabled catchup blocks (`enabled: false` or no `enabled: true`) are retired; `openclaw doctor --fix` removes those.
docs/gateway/config-channels.md
1 addition & 4 deletions
@@ -624,9 +624,6 @@ Before relying on an SSH wrapper for production sends, verify an outbound `imsg
624
624
sendWithEffect: true,
625
625
sendAttachment: true,
626
626
},
627
-
catchup: {
628
-
enabled: false,
629
-
},
630
627
},
631
628
},
632
629
}
@@ -642,7 +639,7 @@ Before relying on an SSH wrapper for production sends, verify an outbound `imsg
642
639
- `channels.imessage.configWrites`: allow or deny iMessage-initiated config writes.
643
640
- `channels.imessage.actions.*`: enable private API actions that are also gated by `imsg status` / `openclaw channels status --probe`.
644
641
- `channels.imessage.includeAttachments` is off by default; set it to `true` before expecting inbound media in agent turns.
645
-
- `channels.imessage.catchup.enabled`: opt in to replaying inbound messages that arrived while the Gateway was down.
642
+
- Inbound recovery after a bridge/gateway restart is automatic (GUID dedupe plus a stale-backlog age fence). Existing `channels.imessage.catchup.enabled: true` configs are still honored as a deprecated compatibility profile.
646
643
- `channels.imessage.groups`: group registry and per-group settings. With `groupPolicy: "allowlist"`, configure either explicit `chat_id` keys or a `"*"` wildcard entry so group messages can pass the registry gate.
647
644
- Top-level `bindings[]` entries with `type: "acp"` can bind iMessage conversations to persistent ACP sessions. Use a normalized handle or explicit chat target (`chat_id:*`, `chat_guid:*`, `chat_identifier:*`) in `match.peer.id`. Shared field semantics: [ACP Agents](/tools/acp-agents#persistent-channel-bindings).