
Commit 81e0a1a

omarshahine and claude authored
feat(imessage): inbound catchup (cursor + replay loop + monitor wiring) (#79387)
Closes #78649. Adds opt-in inbound iMessage catchup that recovers messages landing in chat.db while the gateway is offline (crash, restart, mac sleep). Mirrors the design of the retired BlueBubbles catchup, adapted for the imsg JSON-RPC chats.list + messages.history fetch path.

- Schema: new channels.imessage.catchup block with enabled / maxAgeMinutes (1..720) / perRunLimit (1..500) / firstRunLookbackMinutes (1..720) / maxFailureRetries (1..1000). Disabled by default (opt-in).
- Cursor + replay loop (extensions/imessage/src/monitor/catchup.ts): per-account state under <openclawStateDir>/imessage/catchup/. Walks rows oldest-first, advances on success/give-up, holds at failed.rowid - 1 when a failure is below maxFailureRetries (cannot leapfrog held failures even when later rows in the same batch succeed). Watermark floor for parse-rejected rows.
- Bridge (extensions/imessage/src/monitor/catchup-bridge.ts): live chats.list + per-chat messages.history fetch adapter; dispatch adapter routes through the live handleMessageNow path so allowlists / group policy / dedupe / echo cache behave identically on replayed and live messages. Watermark clamped to last dispatched rowid when the cap truncates.
- Monitor wiring (extensions/imessage/src/monitor/monitor-provider.ts): catchup runs once between watch.subscribe and the live dispatch loop when enabled. Bypasses the inbound debouncer for serial per-row dispatch.
- Echo-cache TTL bumped 2 min → 12 h so own outbound rows from before a gap are not re-fed as inbound on replay.
- Generated bundled-channel-config-metadata.generated.ts so the runtime AJV schema accepts the new catchup block.
- Docs: new "Catching up after gateway downtime" section + BlueBubbles migration parity update.

Tests: 322/322 in extensions/imessage/, including 5 regression tests covering the cursor-leapfrog, parse-rejected stall, watermark vs held failure, and cap-truncation-cursor-floor edge cases that codex (gpt-5.4) and clawsweeper (gpt-5.5) found during review. Live-tested end-to-end against the running gateway: replayed=1 fetchedCount=1, agent reply observed, cursor persisted at the test row's exact rowid.

Co-authored-by: Omar Shahine <10343873+omarshahine@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
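
The cursor rules described in the commit message (walk rows oldest-first, advance on success or give-up, hold at `failed.rowid - 1`, never leapfrog a held failure) can be sketched roughly as follows. This is an illustrative sketch, not the code in `catchup.ts`: `Row`, `CursorState`, `replayRows`, and the synchronous boolean `dispatch` callback are hypothetical names, and clearing a guid's failure count on success is an assumption.

```typescript
// Illustrative sketch only; names and shapes are hypothetical, not taken from catchup.ts.
interface Row {
  rowid: number;
  guid: string;
}

interface CursorState {
  lastSeenRowid: number;
  failureRetries: Record<string, number>;
}

function replayRows(
  rows: Row[],
  cursor: CursorState,
  maxFailureRetries: number,
  dispatch: (row: Row) => boolean, // returns false (or throws) on failure
): CursorState {
  const failureRetries: Record<string, number> = { ...cursor.failureRetries };
  let lastSeenRowid = cursor.lastSeenRowid;
  let held = false; // once true, the cursor stays pinned for the rest of the batch

  // Only rows past the cursor are replayed, oldest first.
  const pending = rows
    .filter((row) => row.rowid > cursor.lastSeenRowid)
    .sort((a, b) => a.rowid - b.rowid);

  for (const row of pending) {
    const priorFailures = failureRetries[row.guid] ?? 0;
    let advance: boolean;

    if (priorFailures >= maxFailureRetries) {
      // Given up on an earlier run: skip without dispatching.
      advance = true;
    } else {
      let ok: boolean;
      try {
        ok = dispatch(row);
      } catch {
        ok = false;
      }
      if (ok) {
        delete failureRetries[row.guid]; // assumption: success clears the counter
        advance = true;
      } else {
        failureRetries[row.guid] = priorFailures + 1;
        // Reaching the retry budget counts as a give-up, which advances.
        advance = priorFailures + 1 >= maxFailureRetries;
      }
    }

    if (advance) {
      if (!held) lastSeenRowid = row.rowid;
    } else if (!held) {
      // Hold just before the failed row so the next run retries it.
      held = true;
      lastSeenRowid = Math.max(lastSeenRowid, row.rowid - 1);
    }
  }

  return { lastSeenRowid, failureRetries };
}
```

In this sketch, rows that succeed after a held failure are still dispatched but do not move the cursor, so they are fetched again on the next run; the commit message relies on the live path's dedupe to absorb that overlap.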
1 parent 8989d0a commit 81e0a1a

14 files changed

Lines changed: 1807 additions & 32 deletions

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -193,6 +193,7 @@ Docs: https://docs.openclaw.ai
193193
- Plugin SDK: add a generic `api.runtime.llm.complete` host completion helper with runtime-derived caller attribution, config-gated model/agent overrides, session-bound context-engine access, request-scoped config, audit metadata, and normalized usage attribution. (#64294) Thanks @DaevMithran.
194194
- Control UI/exec approvals: highlight parsed shell command fragments that may deserve extra review in approval prompts. (#77153) Thanks @jesse-merhi.
195195
- Channels/iMessage: honor `channels.imessage.groups.<chat_id>.systemPrompt` (and the `groups["*"]` wildcard) by forwarding it as `GroupSystemPrompt` on inbound group turns, mirroring the byte-identical resolver semantic from WhatsApp where defining the key as an empty string on a specific group suppresses the wildcard fallback. Brings iMessage to parity with the per-group `systemPrompt` pattern already supported by Discord, Telegram, IRC, Slack, GoogleChat, and the retired BlueBubbles channel. Fixes #78285. (#79383) Thanks @omarshahine.
196+
- iMessage: add opt-in inbound catchup that replays messages received while the gateway was offline (crash, restart, mac sleep) on next startup. Enable with `channels.imessage.catchup.enabled: true`; tunables for `maxAgeMinutes`, `perRunLimit`, `firstRunLookbackMinutes`, and `maxFailureRetries`. Persists a per-account cursor under the OpenClaw state dir (`<openclawStateDir>/imessage/catchup/`), replays each row through the live dispatch path so allowlists/group policy/dedupe behave identically on replayed and live messages, and force-advances past wedged guids after `maxFailureRetries` to prevent stuck cursors. Extends the persisted echo-cache retention window so the agent's own outbound rows from before a gap are not re-fed as inbound on replay. Includes a regenerated `src/config/bundled-channel-config-metadata.generated.ts` so the runtime AJV schema accepts the new `channels.imessage.catchup` block. Fixes #78649. (#79387) Thanks @omarshahine.
196197

197198
### Breaking
198199

Lines changed: 3 additions & 3 deletions
@@ -1,4 +1,4 @@
1-
91480b7bb68280f5b762f4352e456b294d673efcb3989874f70f618714985c71 config-baseline.json
2-
7c4f1417784024d6942de993f1b4dcb9f20c82cec7674047d6b351ab1f586fde config-baseline.core.json
3-
d851534e7f7f44b427d7fa82b7ad287349f069461e3569d23583929611821c31 config-baseline.channel.json
1+
c8a698cf0968fe5b27b2bbc798d3c811ba989a7207ed372cbfd95965c894f65b config-baseline.json
2+
67c7db6eeb7f74dd454118e17304c5486ab59d33e7899c501b003c326d35db0f config-baseline.core.json
3+
e3160218e86959dfa00f35b8b9eca85c3bf436d83dbbe3e7204247dcb692f0a1 config-baseline.channel.json
44
7a9ed89a6ff7e578bfcab7828ab660af59e62402a85bfbfc05d5ae3d975e9728 config-baseline.plugin.json

docs/channels/imessage-from-bluebubbles.md

Lines changed: 16 additions & 16 deletions
@@ -205,22 +205,22 @@ If the gateway logs `imessage: dropping group message from chat_id=<id>` or the
205205

206206
## Action parity at a glance
207207

208-
| Action | legacy BlueBubbles | bundled iMessage |
209-
| ---------------------------------------------------------- | ----------------------------------- | ------------------------------------------------------------------------------------ |
210-
| Send text / SMS fallback |||
211-
| Send media (photo, video, file, voice) |||
212-
| Threaded reply (`reply_to_guid`) || ✅ (closes [#51892](https://github.com/openclaw/openclaw/issues/51892)) |
213-
| Tapback (`react`) |||
214-
| Edit / unsend (macOS 13+ recipients) |||
215-
| Send with screen effect || ✅ (closes part of [#9394](https://github.com/openclaw/openclaw/issues/9394)) |
216-
| Rich text bold / italic / underline / strikethrough || ✅ (typed-run formatting via attributedBody) |
217-
| Rename group / set group icon |||
218-
| Add / remove participant, leave group |||
219-
| Read receipts and typing indicator || ✅ (gated on private API probe) |
220-
| Same-sender DM coalescing || ✅ (DM-only; opt-in via `channels.imessage.coalesceSameSenderDms`) |
221-
| Catchup of inbound messages received while gateway is down | ✅ (webhook replay + history fetch) | _(not yet — tracked at [#78649](https://github.com/openclaw/openclaw/issues/78649))_ |
222-
223-
The catchup gap is the most operationally significant one for production deployments: planned restarts, mac sleep, or an unexpected gateway crash that takes more than a few seconds will silently drop any inbound iMessage traffic that arrives during the gap when running on bundled iMessage. BlueBubbles' webhook + history-fetch flow recovered those messages on reconnect, but BlueBubbles is no longer supported. There is no supported migration path that preserves catchup today; wait for [#78649](https://github.com/openclaw/openclaw/issues/78649).
208+
| Action | legacy BlueBubbles | bundled iMessage |
209+
| ---------------------------------------------------------- | ----------------------------------- | ----------------------------------------------------------------------------------------------------------------------- |
210+
| Send text / SMS fallback || |
211+
| Send media (photo, video, file, voice) || |
212+
| Threaded reply (`reply_to_guid`) || ✅ (closes [#51892](https://github.com/openclaw/openclaw/issues/51892)) |
213+
| Tapback (`react`) || |
214+
| Edit / unsend (macOS 13+ recipients) || |
215+
| Send with screen effect || ✅ (closes part of [#9394](https://github.com/openclaw/openclaw/issues/9394)) |
216+
| Rich text bold / italic / underline / strikethrough || ✅ (typed-run formatting via attributedBody) |
217+
| Rename group / set group icon || |
218+
| Add / remove participant, leave group || |
219+
| Read receipts and typing indicator || ✅ (gated on private API probe) |
220+
| Same-sender DM coalescing || ✅ (DM-only; opt-in via `channels.imessage.coalesceSameSenderDms`) |
221+
| Catchup of inbound messages received while gateway is down | ✅ (webhook replay + history fetch) | ✅ (opt-in via `channels.imessage.catchup.enabled`; closes [#78649](https://github.com/openclaw/openclaw/issues/78649)) |
222+
223+
iMessage catchup is now available as an opt-in feature on the bundled plugin. On gateway startup, if `channels.imessage.catchup.enabled` is `true`, the gateway runs one `chats.list` + per-chat `messages.history` pass against the same JSON-RPC client used by `imsg watch`, replays each missed inbound row through the live dispatch path (allowlists, group policy, debouncer, echo cache), and persists a per-account cursor so subsequent startups pick up where they left off. See [Catching up after gateway downtime](/channels/imessage#catching-up-after-gateway-downtime) for tuning.
224224

225225
## Pairing, sessions, and ACP bindings
226226

docs/channels/imessage.md

Lines changed: 61 additions & 1 deletion
@@ -9,7 +9,7 @@ title: "iMessage"
99
<Note>
1010
For OpenClaw iMessage deployments, use `imsg` on a signed-in macOS Messages host. If your Gateway runs on Linux or Windows, point `channels.imessage.cliPath` at an SSH wrapper that runs `imsg` on the Mac.
1111

12-
**Known gap: no gateway-downtime catchup.** Messages that arrive while the gateway is down (crash, restart, Mac sleep, machine off) are not delivered to the agent once the gateway comes back up `imsg watch` resumes from the current state and ignores anything that landed in `chat.db` during the gap. Tracked at [openclaw#78649](https://github.com/openclaw/openclaw/issues/78649).
12+
**Gateway-downtime catchup is opt-in.** When enabled (`channels.imessage.catchup.enabled: true`), the gateway replays inbound messages that landed in `chat.db` while it was offline (crash, restart, Mac sleep) on next startup. Disabled by default — see [Catching up after gateway downtime](#catching-up-after-gateway-downtime). Closes [openclaw#78649](https://github.com/openclaw/openclaw/issues/78649).
1313
</Note>
1414

1515
<Warning>
@@ -634,6 +634,66 @@ The two rows arrive at OpenClaw ~0.8-2.0 s apart on most setups. Without coalesc
634634
| Rapid flood (>10 small DMs inside window) | N rows | N turns | One turn, bounded output (first + latest, text/attachment caps applied) |
635635
| Two people typing in a group chat | N rows from M senders | M+ turns (one per sender bucket) | M+ turns — group chats are not coalesced |
636636

637+
## Catching up after gateway downtime
638+
639+
When the gateway is offline (crash, restart, Mac sleep, machine off), `imsg watch` resumes from the current `chat.db` state once the gateway comes back up — anything that arrived during the gap is, by default, never seen. Catchup replays those messages on the next startup so the agent does not silently miss inbound traffic.
640+
641+
Catchup is **disabled by default**. Enable it per channel:
642+
```ts
channels: {
  imessage: {
    catchup: {
      enabled: true, // master switch (default: false)
      maxAgeMinutes: 120, // skip rows older than now - 2h (default: 120, clamp 1..720)
      perRunLimit: 50, // max rows replayed per startup (default: 50, clamp 1..500)
      firstRunLookbackMinutes: 30, // first run with no cursor: look back 30 min (default: 30)
      maxFailureRetries: 10, // give up on a wedged guid after 10 dispatch failures (default: 10)
    },
  },
}
```
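
As a rough illustration of how the defaults and clamp ranges above could be applied, here is a minimal sketch. `CatchupConfig`, `CATCHUP_DEFAULTS`, `clampInt`, and `resolveCatchupConfig` are hypothetical names; the ranges for `firstRunLookbackMinutes` (1..720) and `maxFailureRetries` (1..1000) come from the commit message, and whether the runtime clamps or rejects out-of-range values is an assumption here.

```typescript
// Hypothetical helper; not the plugin's actual config resolver.
interface CatchupConfig {
  enabled: boolean;
  maxAgeMinutes: number;
  perRunLimit: number;
  firstRunLookbackMinutes: number;
  maxFailureRetries: number;
}

// Defaults as documented in the config block above.
const CATCHUP_DEFAULTS: CatchupConfig = {
  enabled: false,
  maxAgeMinutes: 120,
  perRunLimit: 50,
  firstRunLookbackMinutes: 30,
  maxFailureRetries: 10,
};

// Falls back to the default when unset, otherwise clamps to [min, max].
function clampInt(value: number | undefined, fallback: number, min: number, max: number): number {
  if (value === undefined || !Number.isFinite(value)) return fallback;
  return Math.min(max, Math.max(min, Math.trunc(value)));
}

function resolveCatchupConfig(raw: Partial<CatchupConfig> = {}): CatchupConfig {
  return {
    enabled: raw.enabled === true, // opt-in: anything but an explicit true stays off
    maxAgeMinutes: clampInt(raw.maxAgeMinutes, CATCHUP_DEFAULTS.maxAgeMinutes, 1, 720),
    perRunLimit: clampInt(raw.perRunLimit, CATCHUP_DEFAULTS.perRunLimit, 1, 500),
    firstRunLookbackMinutes: clampInt(
      raw.firstRunLookbackMinutes,
      CATCHUP_DEFAULTS.firstRunLookbackMinutes,
      1,
      720,
    ),
    maxFailureRetries: clampInt(raw.maxFailureRetries, CATCHUP_DEFAULTS.maxFailureRetries, 1, 1000),
  };
}
```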
656+
657+
### How it runs
658+
659+
One pass per `monitorIMessageProvider` startup, sequenced as `imsg launch` ready → `watch.subscribe` → `performIMessageCatchup` → live dispatch loop. Catchup itself uses `chats.list` + per-chat `messages.history` against the same JSON-RPC client used by `imsg watch`. Anything that arrives during the catchup pass flows through live dispatch normally; the existing inbound-dedupe cache absorbs any overlap with replayed rows.
660+
661+
Each replayed row is fed through the live dispatch path (`evaluateIMessageInbound` + `dispatchInboundMessage`), so allowlists, group policy, debouncer, echo cache, and read receipts behave identically on replayed and live messages.
662+
663+
### Cursor and retry semantics
664+
665+
Catchup keeps a per-account cursor at `<openclawStateDir>/imessage/catchup/<account>__<hash>.json` (the OpenClaw state dir defaults to `~/.openclaw`, overridable with `OPENCLAW_STATE_DIR`):
666+
```json
{
  "lastSeenMs": 1717900800000,
  "lastSeenRowid": 482910,
  "updatedAt": 1717900801234,
  "failureRetries": { "<guid>": 1 }
}
```
675+
676+
- The cursor advances on each successful dispatch and is held when a row's dispatch throws — the next startup retries the same row from the held cursor.
677+
- After `maxFailureRetries` consecutive throws against the same `guid`, catchup logs a `warn` and force-advances the cursor past the wedged message so subsequent startups can make progress.
678+
- Already-given-up guids are skipped on sight (no dispatch attempt) on later runs and counted under `skippedGivenUp` in the run summary.
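
A minimal sketch of reading a cursor file of the shape shown above and checking whether a guid has already been given up on. `CatchupCursor`, `parseCatchupCursor`, and `isGivenUp` are hypothetical names, and treating a malformed file as "no cursor" is an assumption rather than documented behavior.

```typescript
// Hypothetical reader for the cursor JSON shape documented above.
interface CatchupCursor {
  lastSeenMs: number;
  lastSeenRowid: number;
  updatedAt: number;
  failureRetries: Record<string, number>;
}

// Returns null for anything that does not look like a cursor file,
// so the caller can fall back to the first-run path (assumption).
function parseCatchupCursor(text: string): CatchupCursor | null {
  let value: unknown;
  try {
    value = JSON.parse(text);
  } catch {
    return null;
  }
  if (typeof value !== "object" || value === null) return null;

  const { lastSeenMs, lastSeenRowid, updatedAt, failureRetries } = value as Record<string, unknown>;
  if (
    typeof lastSeenMs !== "number" ||
    typeof lastSeenRowid !== "number" ||
    typeof updatedAt !== "number"
  ) {
    return null;
  }

  // Keep only well-formed retry counters; a missing map means no failures yet.
  const retries: Record<string, number> = {};
  if (typeof failureRetries === "object" && failureRetries !== null) {
    for (const [guid, count] of Object.entries(failureRetries)) {
      if (typeof count === "number" && count > 0) retries[guid] = count;
    }
  }

  return { lastSeenMs, lastSeenRowid, updatedAt, failureRetries: retries };
}

// A guid is skipped on sight once its counter has reached the retry budget.
function isGivenUp(cursor: CatchupCursor, guid: string, maxFailureRetries: number): boolean {
  return (cursor.failureRetries[guid] ?? 0) >= maxFailureRetries;
}
```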
679+
680+
### Operator-visible signals
681+
```
imessage catchup: replayed=N skippedFromMe=… skippedGivenUp=… failed=… givenUp=… fetchedCount=…
imessage catchup: giving up on guid=<guid> after <N> failures; advancing cursor past it
imessage catchup: fetched <X> rows across chats, capped to perRunLimit=<Y>
```
687+
688+
A `WARN ... capped to perRunLimit` line means a single startup did not drain the full backlog. Raise `perRunLimit` (max 500) if your gaps regularly exceed the default 50-row pass.
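
To illustrate what the cap means for the backlog, here is a small sketch of truncating a fetched batch to `perRunLimit`. `capBatch` is a hypothetical helper, and keeping the oldest rows (so the remainder is picked up on a later startup) is an assumption based on the oldest-first walk described in the commit message.

```typescript
// Hypothetical helper: keep the oldest perRunLimit rows and report truncation.
function capBatch<T extends { rowid: number }>(
  rows: T[],
  perRunLimit: number,
): { batch: T[]; truncated: boolean } {
  const sorted = [...rows].sort((a, b) => a.rowid - b.rowid);
  const batch = sorted.slice(0, perRunLimit);
  // truncated === true corresponds to the "capped to perRunLimit" log line.
  return { batch, truncated: sorted.length > batch.length };
}
```

When `truncated` is true, the cursor should not move past the last row in `batch`; the commit message describes this as clamping the watermark to the last dispatched rowid.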
689+
690+
### When to leave it off
691+
692+
- Gateway runs continuously with watchdog auto-restart and gaps never last more than a few seconds: leaving catchup off (the default) is fine.
693+
- DM volume is low and missed messages would not change agent behavior — the `firstRunLookbackMinutes` initial window can dispatch surprising old context on first enable.
694+
695+
When you turn catchup on, the first startup with no cursor only looks back `firstRunLookbackMinutes` (30 min default), not the full `maxAgeMinutes` window — this avoids replaying a long history of pre-enable messages.
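
The interaction between `firstRunLookbackMinutes` and `maxAgeMinutes` can be sketched as a single window-start calculation. `catchupWindowStartMs` and `WindowConfig` are hypothetical names, and applying the `maxAgeMinutes` floor to the first-run window as well is an assumption.

```typescript
// Hypothetical sketch of how the replay window's lower bound could be chosen.
const MS_PER_MINUTE = 60 * 1000;

interface WindowConfig {
  maxAgeMinutes: number;
  firstRunLookbackMinutes: number;
}

function catchupWindowStartMs(
  nowMs: number,
  lastSeenMs: number | null, // null means no cursor file exists yet
  config: WindowConfig,
): number {
  // Rows older than now - maxAgeMinutes are never replayed.
  const maxAgeFloorMs = nowMs - config.maxAgeMinutes * MS_PER_MINUTE;
  // First run: short lookback. Later runs: resume from the cursor.
  const candidateMs =
    lastSeenMs === null ? nowMs - config.firstRunLookbackMinutes * MS_PER_MINUTE : lastSeenMs;
  return Math.max(candidateMs, maxAgeFloorMs);
}
```

With the defaults, a first run looks back 30 minutes, while a cursor that is ten hours stale is cut off at the two-hour `maxAgeMinutes` floor.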
696+
637697
## Troubleshooting
638698

639699
<AccordionGroup>
