fix(imessage): always-on inbound recovery and dedupe by omarshahine · Pull Request #91335 · openclaw/openclaw

omarshahine · 2026-06-08T05:39:36Z

Summary

What problem does this PR solve?

After a bridge/Push recovery, Apple writes the messages that queued during the gap into chat.db with fresh ROWIDs but old send dates; imsg watch emits them, and OpenClaw dispatched them as fresh user requests (#89237, "iMessage bridge recovery can dispatch stale inbound backlog as fresh requests", the "backlog bomb"). Separately, iMessage was the only channel with no inbound replay protection, and messages that arrived while the gateway was down were either lost (catchup off) or recovered by a heavy opt-in subsystem.

Why does this matter now?

#89237 is a P1 message-integrity bug. The fix also lets a much smaller, always-on design replace the ~850-LOC opt-in catchup subsystem.

What is the intended outcome?

Persistent claimable GUID dedupe (createClaimableDedupe, the same primitive whatsapp/discord/signal/mattermost/zalo/line/nextcloud-talk use): claim at ingestion, carry the exact claimed key on the debouncer entry, commit on successful flush, release on dispatch failure (per dispatch unit, so a coalesced bucket cannot strand a sibling's claim).
Downtime recovery, rebuilt on the dedupe. On startup the monitor passes the last dispatched rowid (a persisted per-account cursor) to imsg watch.subscribe as since_rowid, so imsg replays the rows that landed while the gateway was down, then tails live. The dedupe drops anything already handled, so none of the old cursor/messages.history/retry bookkeeping is needed.
Stale-backlog age fence, split on the startup rowid boundary. Replayed rows (at/below the boundary) use a wide recovery window; genuinely live rows (above it) use the tight live fence, which is where #89237's Push-flush backlog appears.
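The claim/commit/release lifecycle described above can be sketched roughly as follows. This is a minimal in-memory illustration, not the real createClaimableDedupe (which is persistent and whose API is not shown in this thread); `ClaimableDedupeSketch`, `DispatchUnit`, and `flushUnits` are hypothetical names, and the sketch assumes a released key becomes claimable again on retry.

```typescript
// Hypothetical in-memory stand-in for a claimable dedupe store.
// The real primitive is persistent; names and shapes here are assumptions.
type ClaimState = "in-flight" | "committed";

class ClaimableDedupeSketch {
  private readonly entries = new Map<string, ClaimState>();

  /** Returns true if the caller now owns the key; false if it is in flight or already handled. */
  claim(key: string): boolean {
    if (this.entries.has(key)) return false;
    this.entries.set(key, "in-flight");
    return true;
  }

  /** Marks a claimed key as handled so later replays of the same key are dropped. */
  commit(key: string): void {
    if (this.entries.get(key) === "in-flight") this.entries.set(key, "committed");
  }

  /** Drops an in-flight claim so a retry can claim the key again. */
  release(key: string): void {
    if (this.entries.get(key) === "in-flight") this.entries.delete(key);
  }
}

interface DispatchUnit {
  claimedKey: string; // the exact key claimed at ingestion
  text: string;
}

/** Settles each unit on its own: commit on success, release on failure. */
function flushUnits(
  dedupe: ClaimableDedupeSketch,
  units: DispatchUnit[],
  dispatch: (unit: DispatchUnit) => void,
): string[] {
  const delivered: string[] = [];
  for (const unit of units) {
    try {
      dispatch(unit);
      dedupe.commit(unit.claimedKey);
      delivered.push(unit.claimedKey);
    } catch {
      // A failing unit releases only its own claim, never a sibling's.
      dedupe.release(unit.claimedKey);
    }
  }
  return delivered;
}
```

Settling per unit rather than per bucket is the point: one failed unit in a coalesced flush does not leave another unit's claim stuck in flight.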

What is intentionally out of scope?

The old chats.list + messages.history catchup mechanism (deleted). Recovery needs local chat.db access; over a remote SSH cliPath the monitor tails from the current rowid (suppress-and-move-on).

What does success look like?

Missed-during-downtime messages are replayed; the Push-flush backlog is suppressed; duplicates are deduped; net -710 prod LOC.

What should reviewers focus on?

The claim/commit/release wiring (per-unit commit so coalesced GUID-less rows cannot leak a claim), the rowid-boundary split of the age fence, and the catchup config retirement + back-compat.
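A rough sketch of the rowid-boundary split of the age fence, for orientation while reviewing. The 15-minute live window comes from the proof section of this PR; the 24-hour recovery window, the type names, and the reading of "fail-open" as "deliver when the send date is unusable" are assumptions made for illustration.

```typescript
interface InboundRow {
  rowid: number;
  /** Send date in epoch milliseconds; may be missing on malformed rows. */
  sentAtMs: number | undefined;
}

interface AgeFenceOptions {
  startupBoundaryRowid: number; // chat.db max rowid observed at startup
  liveWindowMs: number;         // tight fence for rows above the boundary
  recoveryWindowMs: number;     // wide fence for replayed rows (value assumed)
}

type FenceVerdict = "deliver" | "suppress";

function applyAgeFence(row: InboundRow, nowMs: number, opts: AgeFenceOptions): FenceVerdict {
  // Assumed fail-open behavior: without a usable send date, do not suppress.
  if (row.sentAtMs === undefined || !Number.isFinite(row.sentAtMs)) return "deliver";

  // Rows at or below the startup boundary are downtime replay; rows above it are live.
  const isReplay = row.rowid <= opts.startupBoundaryRowid;
  const windowMs = isReplay ? opts.recoveryWindowMs : opts.liveWindowMs;

  return nowMs - row.sentAtMs > windowMs ? "suppress" : "deliver";
}

const exampleOptions: AgeFenceOptions = {
  startupBoundaryRowid: 127955,
  liveWindowMs: 15 * 60 * 1000,
  recoveryWindowMs: 24 * 60 * 60 * 1000, // hypothetical width
};
```

With these options, a one-hour-old row at rowid 127954 is delivered as replay, while the same age at rowid 127956 is suppressed as stale live backlog.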

Linked context

Which issue does this close?

Closes #89237

Which issues, PRs, or discussions are related?

Related #91243 (supersedes the catchup half).

Was this requested by a maintainer or owner?

Maintainer-authored; fixes a P1 report labeled issue-rating: diamond lobster.

Real behavior proof (required for external PRs)

Behavior or issue addressed: (a) #89237 stale-backlog suppression on the live path, and (b) recovery of messages missed while the gateway was down.
Real environment tested: the patched gateway built from this branch, run from source against real imsg 0.11.0 / Messages.app. Empty allowlist so the test gateway sent nothing.
Exact steps and evidence (redacted live monitor output):
- #89237 age fence (real imsg over a remote SSH cliPath, the exact setup #89237 was reported on): a real inbound (age ~1.5s) under the 15min fence → CLAIMED+ENQUEUED (dispatched); with the fence forced to 1ms the same message → AGE-FENCE-SUPPRESSED (no enqueue). The fence gates exactly on send-date age.
- Downtime recovery (local imsg/Messages on the gateway Mac; messages scripted from a second Mac so it is end-to-end):
  - baseline message → dispatched, recovery cursor persisted at its rowid 127953
  - gateway stopped; two messages sent while down (chat.db max → 127955)
  - gateway restarted → startup cursor(L)=127953 boundary(M)=127955 since_rowid=127953, then inbound rowid=127954 recovery=true text="DOWNTIME MSG A (sent while gateway down)" DELIVERED and rowid=127955 recovery=true ... MSG B ... DELIVERED
Observed result: both messages sent while the gateway was down were replayed and delivered on restart (recovery=true); without the cursor/since_rowid they are skipped by imsg's self-fence. The #89237 backlog (old date, live rowid) is still suppressed.
What was not tested: the spontaneous bridge-recovery Push-flush burst is not reproducible on demand; covered by unit tests + imsg source analysis. Remote-cliPath recovery is intentionally not implemented (no local db).
Proof limitations: [PROOF-...] instrumentation was added only on a throwaway local checkout to capture the traces and has been reverted; no host state was modified.

Tests and validation

Which commands did you run?

node scripts/run-vitest.mjs extensions/imessage (537 pass) + config-guard + config-validation
pnpm tsgo:core / tsgo:extensions / tsgo:extensions:test / tsgo:core:test (clean); oxlint / oxfmt clean; generated config artifacts + docs format in sync

What regression coverage was added or updated?

inbound-dedupe.test.ts (claim/commit/release, retry, in-flight duplicate, composite-key round-trip, age-fence threshold/fail-open), recovery-cursor.test.ts (load/advance/monotonic/per-account), doctor-contract-api.test.ts (catchup strip), monitor.last-route.test.ts (startup since_rowid = cursor, recovery replay delivered vs live old suppressed, stale-vs-fresh). Catchup tests deleted.

What failed before this fix, if known?

Stale backlog dispatched as fresh; no inbound dedupe; missed-during-downtime messages lost unless the opt-in catchup was enabled.

Risk checklist

Did user-visible behavior change? (Yes/No)

Yes. Inbound is deduped; the Push-flush backlog is suppressed (logged); downtime recovery is now automatic on local setups (was opt-in catchup).

Did config, environment, or migration behavior change? (Yes/No)

Yes. channels.imessage.catchup.* is retired; existing configs still load (key stripped before validation) and openclaw doctor --fix removes it via a new iMessage doctor contract that also reports it.
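A minimal sketch of the strip-before-validation idea, assuming the raw config is a plain JSON object. `stripRetiredCatchup` is a hypothetical helper, not the function in this PR. Because a later maintainer comment in this thread says blocks with enabled: true were kept as a compatibility path, the sketch leaves those in place and strips only other catchup blocks.

```typescript
type JsonObject = Record<string, unknown>;

function isObject(value: unknown): value is JsonObject {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

/**
 * Returns a copy of the raw config without a retired channels.imessage.catchup
 * block, so strict schema validation does not reject older config files.
 * The input object is not mutated.
 */
function stripRetiredCatchup(raw: JsonObject): { config: JsonObject; stripped: boolean } {
  const channels = raw["channels"];
  if (!isObject(channels)) return { config: raw, stripped: false };

  const imessage = channels["imessage"];
  if (!isObject(imessage) || !("catchup" in imessage)) return { config: raw, stripped: false };

  // Assumption based on the later maintainer comment: enabled catchup is preserved.
  const catchup = imessage["catchup"];
  if (isObject(catchup) && catchup["enabled"] === true) return { config: raw, stripped: false };

  const rest: JsonObject = { ...imessage };
  delete rest["catchup"];
  return {
    config: { ...raw, channels: { ...channels, imessage: rest } },
    stripped: true,
  };
}
```

The `stripped` flag is what a doctor-style check could report on, while the returned copy is what goes on to validation.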

Did security, auth, secrets, network, or tool execution behavior change? (Yes/No)

No.

What is the highest-risk area?

The catchup config retirement and the claim/commit/release + rowid-boundary recovery wiring.

How is that risk mitigated?

Strip-before-validation (no load break) + doctor report/migrate; per-unit commit/release and the dual-threshold split are unit-tested; the dedupe backstops cursor imprecision; both behaviors were live-proven on real imsg/Messages.

Current review state

What is the next action?

Maintainer review; Greptile + CI.

What is still waiting on author, maintainer, CI, or external proof?

CI + Greptile.

Which bot or reviewer comments were addressed?

Codex autoreview flagged a premature replay record, a check-then-record race, a coalesced GUID-less claim leak, and a message-loss window at startup. All are fixed; the branch-level Codex review came back clean.

clawsweeper · 2026-06-08T05:41:47Z

Codex review: needs real behavior proof before merge. Reviewed June 8, 2026, 3:46 AM ET / 07:46 UTC.

Summary
The PR adds iMessage persistent inbound dedupe, startup since_rowid downtime recovery, stale-backlog age fencing, catchup config/doctor cleanup, docs, and regression tests.

PR surface: Source +569, Tests +384, Docs -43, Generated 0. Total +910 across 18 files.

Reproducibility: yes. source-reproducible: current main's live iMessage monitor path has no persistent inbound replay guard or stale age fence, and upstream imsg starts from current max rowid when no cursor is supplied. I did not run a real Messages.app bridge recovery in this review.

Review metrics: 1 noteworthy metric.

iMessage config/default surface: 1 deprecated compatibility surface. channels.imessage.catchup.* remains schema-valid for enabled compatibility but disabled blocks are retired and removed by doctor, so upgrade behavior needs maintainer attention.

Merge readiness
Overall: 🦪 silver shellfish
Proof: 🦪 silver shellfish
Patch quality: 🐚 platinum hermit
Result: blocked until stronger real behavior proof is added.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Rank-up moves:

[P1] Add redacted live remote cliPath recovery proof for the final since_rowid path.
Update the stale PR-body language that still says remote cliPath recovery is out of scope.
[P1] Get maintainer acceptance of the catchup compatibility/deprecation behavior before merge.

Proof guidance:

[P1] Needs stronger real behavior proof before merge: The PR has useful redacted live logs, but final-head remote cliPath since_rowid recovery still needs real proof; redact private identifiers, update the PR body, and ClawSweeper should re-review automatically or a maintainer can request @clawsweeper re-review.

Risk before merge

[P1] Final remote cliPath downtime recovery is not yet proven in a real remote setup on this head; the current proof covers remote age fencing and local downtime replay, not the final remote since_rowid replay path.
[P1] The PR changes a shipped channels.imessage.catchup.* surface: enabled catchup is preserved, but disabled blocks are retired and the new default remote recovery uses the narrower live fence unless legacy catchup is enabled.
[P1] The branch is stale relative to current main by one commit, so maintainers should refresh normal CI/merge checks before landing without treating stale-base noise as PR-owned code.

Maintainer options:

Require final remote recovery proof (recommended)
Ask for redacted live output showing a real remote cliPath gateway restarting from a persisted cursor and receiving the missed message through the final since_rowid path.
Accept the compatibility profile
Maintainers can accept that enabled catchup remains the wide-window compatibility path while disabled catchup blocks are retired by doctor and default remote recovery uses the 15-minute live fence.
Pause if remote defaults are not settled
If maintainers are not ready to narrow default remote recovery semantics, hold the PR until the remote catchup/deprecation policy is explicit.

Next step before merge

[P1] A maintainer needs to judge the compatibility/default behavior and require contributor-side real remote proof; this is not a safe automated repair lane.

Security
Cleared: The PR does not change workflows, dependencies, lockfiles, secrets handling, or other supply-chain surfaces; the security pass found no concrete concern.

Review details

Best possible solution:

Land the monitor-layer dedupe/recovery design only after the PR body is updated with final-head remote cliPath recovery proof and maintainers explicitly accept the catchup compatibility/deprecation behavior.

Do we have a high-confidence way to reproduce the issue?

Yes, source-reproducible: current main's live iMessage monitor path has no persistent inbound replay guard or stale age fence, and upstream imsg starts from current max rowid when no cursor is supplied. I did not run a real Messages.app bridge recovery in this review.

Is this the best way to solve the issue?

Mostly yes: fixing before iMessage dispatch with since_rowid, persistent GUID dedupe, and an age fence is the right layer, and preserving enabled catchup is the safer compatibility shape. The remaining gap is proof and maintainer acceptance for final remote recovery/default semantics.

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 538d36eaaaa6.

Label changes

Label changes:

add rating: 🦪 silver shellfish: Overall readiness is 🦪 silver shellfish; proof is 🦪 silver shellfish and patch quality is 🐚 platinum hermit.
remove rating: 🧂 unranked krab: Current PR rating is rating: 🦪 silver shellfish, so this older rating label is no longer current.

Label justifications:

P1: The PR targets a P1 iMessage message-integrity bug where stale inbound backlog can be dispatched as fresh requests.
merge-risk: 🚨 compatibility: The PR changes shipped iMessage catchup configuration semantics and default recovery behavior for existing setups.
merge-risk: 🚨 message-delivery: The PR changes when iMessage inbound rows are replayed, suppressed, deduped, or delivered to agents.
rating: 🦪 silver shellfish: Overall readiness is 🦪 silver shellfish; proof is 🦪 silver shellfish and patch quality is 🐚 platinum hermit.
status: 📣 needs proof: The PR needs real behavior proof before ClawSweeper can clear the contributor ask. Needs stronger real behavior proof before merge: The PR has useful redacted live logs, but final-head remote cliPath since_rowid recovery still needs real proof; redact private identifiers, update the PR body, and ClawSweeper should re-review automatically or a maintainer can request @clawsweeper re-review.

Evidence reviewed

PR surface:

Source +569, Tests +384, Docs -43, Generated 0. Total +910 across 18 files.

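| Area | Files | Added | Removed | Net |
|---|---|---|---|---|
| Source | 6 | 625 | 56 | +569 |
| Tests | 7 | 722 | 338 | +384 |
| Docs | 3 | 29 | 72 | -43 |
| Config | 0 | 0 | 0 | 0 |
| Generated | 2 | 4 | 4 | 0 |
| Other | 0 | 0 | 0 | 0 |
| Total | 18 | 1380 | 470 | +910 |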

What I checked:

Repository policy applied: Root and scoped policy were read; iMessage config/default changes and message delivery changes are compatibility-sensitive and need proof beyond diff-only review. (AGENTS.md:1, 538d36eaaaa6)
Current main behavior: Current main has opt-in catchup and a local startup rowid watermark, but no persistent iMessage inbound claim/commit/release dedupe or live stale-backlog age fence before enqueue. (extensions/imessage/src/monitor/monitor-provider.ts:343, 538d36eaaaa6)
PR implementation shape: Final head computes watchSinceRowid from the recovery cursor, splits the age fence by local recovery boundary, commits suppressed stale rows to dedupe, claims inbound rows before enqueue, and keeps enabled legacy catchup as the compatibility path. (extensions/imessage/src/monitor/monitor-provider.ts:358, d84c84d5fa3c)
Regression coverage: The PR adds tests for remote cursor replay, enabled catchup preservation, local downtime replay using the wider recovery window, and live old-row suppression. (extensions/imessage/src/monitor.last-route.test.ts:608, d84c84d5fa3c)
Dependency contract: Upstream imsg documents since_rowid as an exclusive watch cursor, passes it through RPC, and self-fences to maxRowID() only when the watcher cursor is zero. (openclaw/imsg:Sources/IMsgCore/MessageWatcher.swift:100, 041b40686a6d)
Proof gap: The PR body gives redacted live logs for remote age-fence behavior and local downtime replay, but final head also implements remote cliPath since_rowid recovery while the body still describes remote recovery as out of scope. (d84c84d5fa3c)

Likely related people:

vincentkoc: Recent main history for iMessage monitor startup/retry and catchup-adjacent code includes multiple commits, and the final PR head commit is authored by Vincent Koc. (role: recent area contributor; confidence: high; commits: 8dff52958701, 35a784c16563, d84c84d5fa3c; files: extensions/imessage/src/monitor/monitor-provider.ts, extensions/imessage/src/monitor.last-route.test.ts)
steipete: Shortlog shows Peter Steinberger as the largest contributor across the central iMessage monitor/docs files, with repeated refactors around plugin/channel seams. (role: historical area contributor; confidence: medium; commits: d85b2a0e8148, e4b5027c5e29, 2766c27b2aa6; files: extensions/imessage/src/monitor/monitor-provider.ts, docs/channels/imessage.md)
omarshahine: Beyond opening this PR, Omar Shahine recently touched the same iMessage monitor/coalescing surface in the merged split-send coalescing work. (role: recent adjacent contributor; confidence: medium; commits: 9caff5f873cd, e95bf048af9c, 408d00a7adfa; files: extensions/imessage/src/monitor/monitor-provider.ts, extensions/imessage/src/monitor/coalesce.ts)

What the crustacean ranks mean

🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works

ClawSweeper keeps one durable marker-backed review comment per issue or PR.
Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
Maintainers can also comment @clawsweeper review to request a fresh review only.
Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

Replaces the opt-in catchup subsystem with always-on inbound replay protection that brings iMessage in line with the other channels, and fixes #89237 (stale backlog dispatched as fresh after bridge recovery).

- New inbound-dedupe.ts: persistent claimable GUID dedupe (claim/commit/release) plus a stale-backlog age fence that suppresses live rows whose send date is materially older than arrival (logged, never silent).
- monitor-provider: claim at ingestion, carry the exact claimed key on the debouncer entry, commit on successful flush / release on dispatch failure (per-unit so a coalesced bucket cannot strand a sibling claim). Keeps the local startup since_rowid watermark so startup-window rows are not skipped.
- Deprecate catchup: delete catchup.ts + catchup-bridge.ts, remove the channels.imessage.catchup schema, cursor migration, and config-guard nag. Back-compat: strip the retired key before validation; new imessage doctor contract reports + removes it on doctor --fix.
- Docs updated for the new recovery model.

Net -947 prod LOC.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Builds downtime recovery on the new inbound dedupe instead of restoring the old catchup subsystem. On startup the monitor passes the last dispatched rowid (a persisted per-account cursor) to imsg watch.subscribe as since_rowid, so imsg replays the messages that landed while the gateway was down, then tails live. The GUID dedupe drops anything already handled, so no cursor/retry bookkeeping is needed.

- recovery-cursor.ts: minimal persisted per-account lastDispatchedRowid.
- monitor-provider: since_rowid = cursor (capped to the most recent IMESSAGE_RECOVERY_MAX_ROWS); split the age fence on the startup rowid boundary so replayed rows (<= boundary) use the wider recovery window and live rows (> boundary) keep the tight #89237 fence; advance the cursor on commit.
- Local only: remote SSH cliPath cannot read chat.db, so it tails from the current rowid (suppress-and-move-on) as before.

Restores missed-message recovery that the catchup removal dropped, with no config and a fraction of the old LOC.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
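One way to read "since_rowid = cursor (capped to the most recent IMESSAGE_RECOVERY_MAX_ROWS)" is sketched below. `computeSinceRowid` and its parameters are hypothetical, the cap value is passed in because this thread does not state it, and the sketch relies on the review's note that imsg treats since_rowid as an exclusive cursor and starts from the current max rowid when no cursor is supplied.

```typescript
/**
 * Picks the since_rowid to pass to the watch subscription.
 * - No persisted cursor: return undefined so the watcher tails from its current max rowid.
 * - Otherwise: replay from the cursor, but never further back than `maxRows` rows.
 */
function computeSinceRowid(
  persistedCursor: number | undefined,
  currentMaxRowid: number,
  maxRows: number, // stand-in for IMESSAGE_RECOVERY_MAX_ROWS; actual value not given here
): number | undefined {
  if (persistedCursor === undefined || persistedCursor <= 0) return undefined;

  // since_rowid is exclusive, so rows with rowid > result are replayed.
  const oldestAllowed = currentMaxRowid - maxRows;
  return Math.max(persistedCursor, oldestAllowed);
}
```

For the live proof in the PR description (cursor 127953, chat.db max 127955), any cap of two rows or more yields since_rowid=127953, so rows 127954 and 127955 are replayed.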

omarshahine · 2026-06-08T06:19:32Z

@clawsweeper re-review

Reshaped since the last review: instead of deprecating catchup outright, this now replaces the ~850-LOC catchup subsystem with downtime recovery built on the new inbound dedupe (startup since_rowid = persisted last-dispatched rowid; age fence split on the startup rowid boundary). Missed-during-downtime messages are recovered again, with no config. Live down→up→recovered proof and the #89237 age-fence proof are in the PR description. Net -710 prod LOC.

clawsweeper · 2026-06-08T06:19:35Z

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

…safe

Addresses two cursor-state regressions in the downtime-recovery path:

- Failed replay rows could be skipped forever: a released (failed) row keeps its dedupe claim for retry, but a later successful row in the same flush advanced the cursor past it, so the next startup's since_rowid skipped it. Hold a per-session floor at the lowest released rowid and never advance the cursor past it.
- Suppressed live backlog could be re-delivered after a restart: a live row suppressed under the tight live fence was not recorded, so after a restart it fell under the wider recovery window (its rowid now below the new boundary) and was delivered. Commit its dedupe key on suppression so the recovery replay treats it as already handled.

Both caught by Codex autoreview. Adds regression tests for the floor and the suppression record.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
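The released-row floor from the first fix can be illustrated with a small sketch. `RecoveryCursorSketch` is a hypothetical class; persistence is omitted, and the "strictly below the released row" rule assumes since_rowid is exclusive, as the review's dependency notes describe.

```typescript
class RecoveryCursorSketch {
  private cursor: number;
  private lowestReleasedRowid: number | undefined;

  constructor(initialCursor = 0) {
    this.cursor = initialCursor;
  }

  /** Records a row whose dispatch failed and was released for retry. */
  noteReleased(rowid: number): void {
    this.lowestReleasedRowid =
      this.lowestReleasedRowid === undefined ? rowid : Math.min(this.lowestReleasedRowid, rowid);
  }

  /**
   * Advances the cursor after a successful commit. The cursor is monotonic and
   * stays strictly below the lowest released row, so the next startup's
   * exclusive since_rowid still replays that row.
   */
  advance(committedRowid: number): number {
    const ceiling =
      this.lowestReleasedRowid === undefined
        ? committedRowid
        : Math.min(committedRowid, this.lowestReleasedRowid - 1);
    this.cursor = Math.max(this.cursor, ceiling);
    return this.cursor;
  }
}
```

In this sketch the floor lives only for the session, matching the "per-session floor" wording; after a restart the replay plus dedupe decides what still needs delivery.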

Hash the composite fallback key's variable parts (conversation, sender, created_at, text) so the key is length-bounded regardless of message text. The persistent dedupe store already hashes keys internally, so this was not a live overflow, but the bounded key removes the dependency on that and keeps the fallback fail-open. Flagged by autoreview.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
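To illustrate why hashing bounds the key length, here is a small sketch. The commit does not say which hash is used; the FNV-1a-style 32-bit function below is a dependency-free stand-in chosen for illustration only (a real implementation would more likely use a cryptographic digest, since 32 bits is collision-prone). The key prefix and field names are hypothetical.

```typescript
/** FNV-1a-style 32-bit hash over UTF-16 code units (illustrative stand-in only). */
function fnv1a32(input: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  // Left-pad to a fixed 8 hex characters.
  return ("00000000" + hash.toString(16)).slice(-8);
}

interface GuidlessRow {
  conversation: string;
  sender: string;
  createdAt: string;
  text: string;
}

/** Builds a fixed-length fallback key for rows that carry no GUID. */
function compositeFallbackKey(row: GuidlessRow): string {
  // JSON array encoding keeps field boundaries unambiguous without raw delimiter bytes.
  const variableParts = JSON.stringify([row.conversation, row.sender, row.createdAt, row.text]);
  return "imessage:composite:" + fnv1a32(variableParts);
}
```

Whatever the message length, the resulting key is the prefix plus eight hex characters, so the caller no longer depends on the store hashing long keys internally.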

omarshahine · 2026-06-08T06:41:24Z

@clawsweeper re-review

Your P1 ("users who enabled catchup would silently move to suppress-and-move-on and miss downtime messages") was reviewed against the pre-reshape commit e9b5b23. The PR has since been reshaped so that is no longer the behavior:

Catchup is replaced, not removed. Downtime recovery is now automatic on local setups: on startup the monitor passes the last dispatched rowid (a persisted cursor) to imsg watch.subscribe as since_rowid, so imsg replays the messages that landed while the gateway was down, then tails live. The dedupe drops anything already handled. A user who had catchup.enabled: true keeps the capability — now with no config.
Live proof of the exact concern is in the PR description (Real behavior proof): gateway stopped → 2 messages sent while down → gateway restarted → both replayed and delivered (recovery=true), end to end on real imsg/Messages.

On the merge-risk: compatibility / message-delivery labels: there is no upgrade risk to the config removal. Existing configs still load (the retired catchup key is stripped before validation), the key is inert, and recovery runs automatically — so a catchup-on user loses nothing on upgrade. openclaw doctor --fix removes the stale key and a doctor rule reports it.

Two cursor-state edges in the new recovery path (failed-replay cursor leapfrog, suppressed-row re-delivery after restart) and a bounded-key concern were caught by Codex autoreview and fixed in b3e66b83 / 2bf16dc5, with regression tests.

clawsweeper · 2026-06-08T06:41:26Z

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

State: Superseded
Detail: A newer re-review for this item started before this run finished, so GitHub cancelled this older run. Check the latest ClawSweeper run for the current result.
Run: https://github.com/openclaw/clawsweeper/actions/runs/27120568240
Updated: 2026-06-08T06:49:09.234Z

The since_rowid replay runs over the imsg RPC client, so driving it from the persisted recovery cursor (not the local chat.db boundary) makes downtime recovery work for remote SSH cliPath gateways, the topology the old RPC-based catchup served and that the rowid-boundary-only version regressed. Local setups keep the wider, capped recovery window via the chat.db boundary; remote uses the live age-fence window. Flagged by autoreview.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…grade

A one-time, self-cleaning migration: when the recovery cursor is empty on the first startup after upgrade, seed it from the retired imessage.catchup-cursors lastSeenRowid and consume the legacy entry. Without this a user who had catchup enabled would not replay messages missed across the upgrade restart. Flagged by autoreview.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
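The one-time seed can be sketched as below. The two stores are modeled as in-memory maps keyed by account id, which is an assumption; only the lastSeenRowid field name and the seed-then-consume behavior come from the commit message, and `seedRecoveryCursor` is a hypothetical name.

```typescript
interface CursorStores {
  /** accountId -> last dispatched rowid (the new recovery cursor). */
  recovery: Map<string, number>;
  /** accountId -> retired catchup cursor entry. */
  legacyCatchup: Map<string, { lastSeenRowid: number }>;
}

/**
 * Returns the recovery cursor for an account, seeding it from the retired
 * catchup cursor when empty. The legacy entry is consumed so the migration
 * runs at most once per account.
 */
function seedRecoveryCursor(stores: CursorStores, accountId: string): number | undefined {
  const existing = stores.recovery.get(accountId);
  if (existing !== undefined) return existing;

  const legacy = stores.legacyCatchup.get(accountId);
  if (legacy === undefined) return undefined;

  stores.recovery.set(accountId, legacy.lastSeenRowid);
  stores.legacyCatchup.delete(accountId); // self-cleaning: consume the legacy entry
  return legacy.lastSeenRowid;
}
```

A second call for the same account simply returns the already seeded cursor, and an account with no legacy entry starts with no cursor at all.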

vincentkoc · 2026-06-08T07:48:44Z

Maintainer fix pushed in d84c84d.

What changed:

Preserved channels.imessage.catchup.enabled: true as the legacy compatibility replay path, including remote/SSH setups.
Kept disabled/non-enabled catchup blocks retired and removable by doctor --fix.
Restored legacy catchup cursor sidecar detection/import so upgrade cursors are not lost.
Tightened recovery cursor advancement so failed or lower pending replay rows cannot be skipped.
Kept first-run startup-boundary rows under the live stale fence unless a real prior recovery cursor exists.
Restored coalesced catchup high-water metadata and removed raw NUL bytes from the inbound dedupe source.

Proof:

pnpm test:serial src/cli/program/config-guard.test.ts extensions/imessage/src/monitor.last-route.test.ts extensions/imessage/src/monitor/recovery-cursor.test.ts
pnpm test:serial extensions/imessage/src/monitor.last-route.test.ts extensions/imessage/src/monitor/coalesce.test.ts extensions/imessage/src/monitor/inbound-dedupe.test.ts extensions/imessage/src/monitor/catchup.test.ts extensions/imessage/src/monitor/catchup-bridge.test.ts extensions/imessage/src/doctor-contract-api.test.ts extensions/imessage/src/state-migrations.test.ts
pnpm config:channels:check
pnpm config:docs:check
git diff --check
.agents/skills/autoreview/scripts/autoreview --mode branch --base origin/main clean for accepted/actionable findings
OPENCLAW_TESTBOX=1 pnpm check:changed passed on Testbox tbx_01ktk2pr836sg06dx1tv2366cn (Actions run: https://github.com/openclaw/openclaw/actions/runs/27122967788). The wrapper exited 0; the trailing external runner portal sync warning was a coordinator 401 after completion.

@clawsweeper re-review

clawsweeper · 2026-06-08T07:48:46Z

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

State: Complete
Detail: The targeted re-review finished, the durable review comment was updated, and the synced verdict was routed.
Run: https://github.com/openclaw/clawsweeper/actions/runs/27123453538
Updated: 2026-06-08T08:01:20.787Z

* feat(imessage): always-on inbound recovery, deprecate catchup
* feat(imessage): recover downtime messages via since_rowid replay
* fix(imessage): make recovery cursor advance failure- and suppression-safe
* fix(imessage): bound the GUID-less replay key length
* fix(imessage): recover downtime messages on remote cliPath setups too
* fix(imessage): seed recovery cursor from retired catchup cursor on upgrade
* fix(imessage): preserve catchup recovery on upgrade

---------

Co-authored-by: Omar Shahine <10343873+omarshahine@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Vincent Koc <vincentkoc@ieee.org>

…26.6.6) (#1040) This PR contains the following updates: | Package | Update | Change | |---|---|---| | [ghcr.io/openclaw/openclaw](https://openclaw.ai) ([source](https://github.com/openclaw/openclaw)) | patch | `2026.6.5` → `2026.6.6` | --- ### Release Notes <details> <summary>openclaw/openclaw (ghcr.io/openclaw/openclaw)</summary> ### [`v2026.6.6`](https://github.com/openclaw/openclaw/blob/HEAD/CHANGELOG.md#202666) [Compare Source](openclaw/openclaw@v2026.6.5...v2026.6.6) ##### Highlights - Security boundaries are substantially tighter across transcripts, sandbox binds, host environment inheritance, MCP stdio, Codex HTTP access, native search policy, elevated sender checks, deleted-agent ACP bypasses, loopback tools, Discord moderation, and Teams group actions; exec approvals now fail closed on timeout. ([#91529](openclaw/openclaw#91529), [#91618](openclaw/openclaw#91618), [#91615](openclaw/openclaw#91615), [#91619](openclaw/openclaw#91619), [#91741](openclaw/openclaw#91741), [#91745](openclaw/openclaw#91745), [#91746](openclaw/openclaw#91746), [#91748](openclaw/openclaw#91748), [#91749](openclaw/openclaw#91749), [#91750](openclaw/openclaw#91750), [#91751](openclaw/openclaw#91751), [#91752](openclaw/openclaw#91752), [#91763](openclaw/openclaw#91763), [#89938](openclaw/openclaw#89938)) Thanks [@joshavant](https://github.com/joshavant), [@pgondhi987](https://github.com/pgondhi987), [@mmaps](https://github.com/mmaps), [@eleqtrizit](https://github.com/eleqtrizit), [@shakkernerd](https://github.com/shakkernerd), and [@drobison00](https://github.com/drobison00). - Telegram delivery is safer and more coherent: account-scoped topics route to the right agent, streamed text survives tool calls, `/compact` works on generic ingress, callback handling uses concrete APIs, draft chunking is shared, durable dispatch dedupe moved into the SDK, and unauthorized DM text stays out of cache and prompt context. 
([#91189](openclaw/openclaw#91189), [#88682](openclaw/openclaw#88682), [#89588](openclaw/openclaw#89588), [#90212](openclaw/openclaw#90212), [#91876](openclaw/openclaw#91876), [#91874](openclaw/openclaw#91874), [#91904](openclaw/openclaw#91904), [#91478](openclaw/openclaw#91478), [#91915](openclaw/openclaw#91915)) Thanks [@codysai001](https://github.com/codysai001), [@alexzhu0](https://github.com/alexzhu0), [@joelnishanth](https://github.com/joelnishanth), [@snowzlm](https://github.com/snowzlm), [@obviyus](https://github.com/obviyus), and [@sallyom](https://github.com/sallyom). - iMessage recovery and delivery now cover always-on inbound restart, durable echo markers, block streaming, idle approval discovery, hardened outbound transport, and actionable inbound startup diagnostics. ([#91335](openclaw/openclaw#91335), [#91449](openclaw/openclaw#91449), [#88969](openclaw/openclaw#88969), [#88530](openclaw/openclaw#88530), [#91783](openclaw/openclaw#91783), [#91785](openclaw/openclaw#91785)) Thanks [@omarshahine](https://github.com/omarshahine), [@jmissig](https://github.com/jmissig), and [@colmbrogan](https://github.com/colmbrogan). - Browser and MCP connectivity gained existing-session CDP support, discovered WebSocket validation, default-profile `cdpUrl` handling, safer browser-output boundaries, Streamable HTTP loopback transport, corrected OAuth/SSE authorization handling, and broader schema compatibility. ([#91422](openclaw/openclaw#91422), [#89851](openclaw/openclaw#89851), [#91736](openclaw/openclaw#91736), [#91747](openclaw/openclaw#91747), [#91451](openclaw/openclaw#91451), [#80143](openclaw/openclaw#80143)) Thanks [@pgondhi987](https://github.com/pgondhi987), [@anagnorisis2peripeteia](https://github.com/anagnorisis2peripeteia), [@lifuyue](https://github.com/lifuyue), [@eleqtrizit](https://github.com/eleqtrizit), [@LiuwqGit](https://github.com/LiuwqGit), and [@HemantSudarshan](https://github.com/HemantSudarshan). 
- Control UI startup and first-reply latency are lower through cached model metadata, removal of the startup catalog wait, lazy slash-command loading, and first-event tracing with slow-reply diagnostics. ([#91531](openclaw/openclaw#91531), [#91538](openclaw/openclaw#91538), [#91568](openclaw/openclaw#91568), [#91583](openclaw/openclaw#91583), [#91598](openclaw/openclaw#91598)) - Provider support expands with OpenRouter OAuth onboarding and Claude Fable 5 adaptive thinking, while Codex sessions keep correct compaction ownership, local models skip guardian review, dynamic tool progress normalizes cleanly, and Gemma 4 reasoning replay is preserved. ([#91830](openclaw/openclaw#91830), [#91882](openclaw/openclaw#91882), [#91590](openclaw/openclaw#91590), [#88630](openclaw/openclaw#88630), [#88768](openclaw/openclaw#88768), [#91696](openclaw/openclaw#91696)) Thanks [@Patrick-Erichsen](https://github.com/Patrick-Erichsen), [@joshavant](https://github.com/joshavant), [@bdjben](https://github.com/bdjben), and [@Coder-Wangyankun](https://github.com/Coder-Wangyankun). ##### Changes - CLI progress: emit Claude CLI commentary progress events and bridge inter-tool commentary into channel progress without exposing internal protocol scaffolding. ([#89834](openclaw/openclaw#89834), [#90883](openclaw/openclaw#90883)) Thanks [@anagnorisis2peripeteia](https://github.com/anagnorisis2peripeteia). - Observability: allow trusted diagnostics channels to capture tool input/output content, add first-assistant-event traces, and warn on slow initial replies. ([#91256](openclaw/openclaw#91256), [#91568](openclaw/openclaw#91568), [#91583](openclaw/openclaw#91583)) Thanks [@amknight](https://github.com/amknight). - Plugins/ClawHub: dogfood reusable package publishing, let dry runs skip publish approval, allow declared installed trusted hooks, report managed plugin version drift, and warn instead of failing on retired Skill Workshop configuration. 
([#91574](openclaw/openclaw#91574), [#91591](openclaw/openclaw#91591), [#90004](openclaw/openclaw#90004), [#90927](openclaw/openclaw#90927), [#90838](openclaw/openclaw#90838)) Thanks [@Patrick-Erichsen](https://github.com/Patrick-Erichsen), [@brokemac79](https://github.com/brokemac79), and [@lonexreb](https://github.com/lonexreb). - Memory/providers: move the local llama.cpp runtime into its provider plugin, batch embeddings across files, persist the agent model catalog cache, and keep QMD JSON search one-shot while filtering stale REM recall previews. ([#91324](openclaw/openclaw#91324), [#89138](openclaw/openclaw#89138), [#90457](openclaw/openclaw#90457), [#91837](openclaw/openclaw#91837), [#91851](openclaw/openclaw#91851)) Thanks [@osolmaz](https://github.com/osolmaz), [@mushuiyu886](https://github.com/mushuiyu886), [@ai-hpc](https://github.com/ai-hpc), and [@TurboTheTurtle](https://github.com/TurboTheTurtle). - Channels/mobile: add the QQBot group mention toggle, improve iPad and iPhone control surfaces, and expose the active connection host in the TUI footer. ([#91423](openclaw/openclaw#91423), [#91557](openclaw/openclaw#91557), [#89909](openclaw/openclaw#89909)) Thanks [@cxyhhhhh](https://github.com/cxyhhhhh), [@Solvely-Colin](https://github.com/Solvely-Colin), and [@baskduf](https://github.com/baskduf). - Performance: prewarm TUI runtime plugins, deduplicate plugin auto-enable fanout, trim dense text-delta snapshots, and reuse prepared startup model metadata. ([#90782](openclaw/openclaw#90782), [#89978](openclaw/openclaw#89978), [#91580](openclaw/openclaw#91580), [#91531](openclaw/openclaw#91531)) Thanks [@RomneyDa](https://github.com/RomneyDa) and [@ai-hpc](https://github.com/ai-hpc). 
##### Fixes - Agent/session recovery: drop stale approval follow-ups after session rebind, remove drained reply-queue items by identity, recover stale main and visible replies, preserve Codex context-engine compaction ownership, lower the default compaction timeout to 180 seconds while respecting explicit configuration, and keep provider-failure terminal lifecycle state correct. ([#85679](openclaw/openclaw#85679), [#91450](openclaw/openclaw#91450), [#91566](openclaw/openclaw#91566), [#91840](openclaw/openclaw#91840), [#91590](openclaw/openclaw#91590), [#91361](openclaw/openclaw#91361), [#91895](openclaw/openclaw#91895)) Thanks [@openperf](https://github.com/openperf), [@yetval](https://github.com/yetval), [@joshavant](https://github.com/joshavant), [@wangmiao0668000666](https://github.com/wangmiao0668000666), and [@TurboTheTurtle](https://github.com/TurboTheTurtle). - User-visible content boundaries: suppress Codex/Harmony protocol artifacts, neutralize browser and LanceDB memory media directives, redact transcript images, and preserve native `/compact` replies through source suppression. ([#89151](openclaw/openclaw#89151), [#91422](openclaw/openclaw#91422), [#91425](openclaw/openclaw#91425), [#91529](openclaw/openclaw#91529), [#90212](openclaw/openclaw#90212)) Thanks [@joelnishanth](https://github.com/joelnishanth), [@pgondhi987](https://github.com/pgondhi987), [@joshavant](https://github.com/joshavant), and [@snowzlm](https://github.com/snowzlm). - Channel delivery: keep WhatsApp captured replies attached to the successor controller after restart, retry Feishu rate limits, preserve Mattermost thread replies, canonicalize LINE webhook paths, restore Discord reply hydration and runtime timeout exports, and show OpenAI Realtime WebRTC assistant transcripts. 
([#85823](openclaw/openclaw#85823), [#89659](openclaw/openclaw#89659), [#91684](openclaw/openclaw#91684), [#91649](openclaw/openclaw#91649), [#90263](openclaw/openclaw#90263), [#91686](openclaw/openclaw#91686), [#90426](openclaw/openclaw#90426)) Thanks [@itsuzef](https://github.com/itsuzef), [@ladygege](https://github.com/ladygege), [@jacobtomlinson](https://github.com/jacobtomlinson), [@fuller-stack-dev](https://github.com/fuller-stack-dev), and [@shushushv](https://github.com/shushushv). - Cron: cancel active task runs cleanly, preserve terminal timeout/cancel state, and recover no-deliver tool warnings instead of silently losing the outcome. ([#90666](openclaw/openclaw#90666), [#90678](openclaw/openclaw#90678)) Thanks [@ai-hpc](https://github.com/ai-hpc). - Gateway/config/auth: share the approval runtime socket token, replace arrays explicitly in `config.patch`, skip the deleted-agent guard only for valid ACP harness sessions, surface headless LaunchAgent state, verify SQLite auth migration before cleanup, and arm QMD startup maintenance. ([#87105](openclaw/openclaw#87105), [#91551](openclaw/openclaw#91551), [#91219](openclaw/openclaw#91219), [#91614](openclaw/openclaw#91614), [#91740](openclaw/openclaw#91740), [#91978](openclaw/openclaw#91978)) Thanks [@fuller-stack-dev](https://github.com/fuller-stack-dev) and [@scotthuang](https://github.com/scotthuang). - Providers/Codex: clarify quota errors, restore the Codex synthetic usage line, canonicalize Codex protocol assets, require API-key auth for realtime voice, normalize ACP model refs, preserve Gemma 4 `reasoning_content`, and avoid guardian review for local models. 
([#91390](openclaw/openclaw#91390), [#91709](openclaw/openclaw#91709), [#91507](openclaw/openclaw#91507), [#91567](openclaw/openclaw#91567), [#88630](openclaw/openclaw#88630), [#91696](openclaw/openclaw#91696)) Thanks [@hxy91819](https://github.com/hxy91819), [@brokemac79](https://github.com/brokemac79), [@RomneyDa](https://github.com/RomneyDa), [@joshavant](https://github.com/joshavant), and [@Coder-Wangyankun](https://github.com/Coder-Wangyankun). - Updates/builds: recover package Gateway restarts after refresh failure, expose plugin convergence repair, fall back to Corepack in PATH-less pnpm environments, seed the correct Docker store packages, and keep ClawHub dry-run and publish paths reusable. ([#91581](openclaw/openclaw#91581), [#91599](openclaw/openclaw#91599), [#91547](openclaw/openclaw#91547), [#91591](openclaw/openclaw#91591)) Thanks [@fuller-stack-dev](https://github.com/fuller-stack-dev), [@sallyom](https://github.com/sallyom), and [@Patrick-Erichsen](https://github.com/Patrick-Erichsen). - UI: require explicit user intent before opening chat sessions and drain restored chat queues after session switches. ([#91480](openclaw/openclaw#91480)) Thanks [@TurboTheTurtle](https://github.com/TurboTheTurtle). - Android: avoid the `dataSync` foreground-service type for persistent nodes. ([#80082](openclaw/openclaw#80082)) Thanks [@davelutztx](https://github.com/davelutztx). - Native hooks: bound relay lifetimes so abandoned native hook connections cannot linger indefinitely. ([#91550](openclaw/openclaw#91550)) Thanks [@joshavant](https://github.com/joshavant). </details> --- ### Configuration 📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined). 🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied. ♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox. 🔕 **Ignore**: Close this PR and you won't be reminded about these updates again. 
--- - [ ] If you want to rebase/retry this PR, check this box --- This PR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).  Reviewed-on: https://git.erwanleboucher.dev/eleboucher/homelab/pulls/1040

openclaw-barnacle Bot added the docs (Improvements or additions to documentation), channel: imessage (Channel integration: imessage), gateway (Gateway runtime), cli (CLI command changes), size: XL, and maintainer (Maintainer-authored PR) labels Jun 8, 2026

omarshahine and others added 2 commits June 8, 2026 06:11

omarshahine force-pushed the fix/imessage-89237-inbound-recovery branch from e9b5b23 to 408d00a Compare June 8, 2026 06:18

omarshahine changed the title ~~fix(imessage): always-on inbound recovery, deprecate catchup~~ fix(imessage): inbound dedupe + downtime recovery, replace catchup subsystem Jun 8, 2026

omarshahine changed the title ~~fix(imessage): inbound dedupe + downtime recovery, replace catchup subsystem~~ fix(imessage): always-on inbound recovery and dedupe Jun 8, 2026

omarshahine and others added 2 commits June 8, 2026 06:32

omarshahine and others added 2 commits June 8, 2026 06:48

clawsweeper Bot removed the proof: sufficient (ClawSweeper judged the real behavior proof convincing.) and rating: 🦐 gold shrimp (Decent PR readiness signal, but merge confidence is limited.) labels Jun 8, 2026

fix(imessage): preserve catchup recovery on upgrade

d84c84d
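
The upgrade-preservation behavior named in this commit (seed the recovery cursor from the retired catchup cursor once, then consume the legacy entry) might look roughly like the sketch below. The store shapes and the `seedRecoveryCursor` name are assumptions for illustration; only the `lastSeenRowid` field name comes from the commit message, and the real persistence layer is not shown on this page.

```typescript
// Hypothetical in-memory stand-ins for the persisted stores.
interface CursorStores {
  // accountId -> last dispatched rowid (the new recovery cursor).
  recovery: Map<string, number>;
  // accountId -> retired catchup cursor entry (legacy imessage.catchup-cursors).
  legacyCatchup: Map<string, { lastSeenRowid: number }>;
}

/**
 * Returns the recovery cursor for an account, seeding it from the legacy
 * catchup cursor when the recovery cursor is empty. The legacy entry is
 * deleted after seeding, so the migration runs at most once per account.
 */
function seedRecoveryCursor(
  stores: CursorStores,
  accountId: string,
): number | undefined {
  const existing = stores.recovery.get(accountId);
  if (existing !== undefined) {
    return existing; // Already initialized: nothing to migrate.
  }
  const legacy = stores.legacyCatchup.get(accountId);
  if (legacy === undefined) {
    return undefined; // Fresh install: no cursor to replay from.
  }
  stores.legacyCatchup.delete(accountId); // Consume the legacy entry.
  stores.recovery.set(accountId, legacy.lastSeenRowid);
  return legacy.lastSeenRowid;
}
```

A second call for the same account returns the seeded value from the recovery store without touching the legacy store, which is the "self-cleaning" property in this sketch.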

clawsweeper Bot added the rating: 🦪 silver shellfish (Thin PR readiness signal; proof, validation, or implementation needs work.) label and removed the rating: 🧂 unranked krab (Not merge-ready due to missing proof or serious correctness/safety concerns.) label Jun 8, 2026

vincentkoc merged commit fc6400e into main Jun 8, 2026
291 of 296 checks passed

vincentkoc deleted the fix/imessage-89237-inbound-recovery branch June 8, 2026 07:54

github-actions Bot mentioned this pull request Jun 8, 2026

📡 Upstream Digest — 2026-06-08 08:46 UTC curtismercier/openclaw-mods#1037

Open

fix(imessage): always-on inbound recovery and dedupe #91335

vincentkoc merged 7 commits into main from fix/imessage-89237-inbound-recovery

omarshahine commented Jun 8, 2026 • edited

clawsweeper Bot commented Jun 8, 2026 • edited

omarshahine commented Jun 8, 2026

clawsweeper Bot commented Jun 8, 2026 • edited

omarshahine commented Jun 8, 2026

clawsweeper Bot commented Jun 8, 2026 • edited

vincentkoc commented Jun 8, 2026

clawsweeper Bot commented Jun 8, 2026 • edited

2 participants
