fix(imessage): self-explaining private-API failures and dedicated send timeout #91041

Merged
omarshahine merged 1 commit into main from fix/imessage-macos26-resilience
Jun 7, 2026
@omarshahine omarshahine commented Jun 7, 2026

Summary

Two macOS 26 resilience fixes for the iMessage channel, plus a companion imsg bridge fix:

  1. Surface why the private API is unavailable. The probe now carries imsg's own status --json message (SIP / library validation / macOS 26 AMFI gate) into IMessagePrivateApiStatus.statusMessage, and blocked actions append it. Operators on macOS 26.5 stop getting the misleading "run imsg launch" with no reason.

  2. Decouple the send timeout from the probe timeout. Sends inherited the 10s probe default via client.request. On macOS 26 the private-API bridge intermittently stalls up to ~124s; the 10s abort dropped attachment/reply sends (non-recoverable shapes). New DEFAULT_IMESSAGE_SEND_TIMEOUT_MS = 150_000 for sends only (covers the observed upper bound + headroom); probes/health checks stay fast. No new config surface — explicit opts/probeTimeoutMs still win. Mirrors the BlueBubbles send-timeout fix (fix(bluebubbles): configurable sendTimeoutMs, bump send default to 30s #69193).

  3. Companion imsg bridge fix. ClawSweeper correctly found that OpenClaw's 150s client wait was incomplete while upstream imsg bridge sends still defaulted to 10s. Companion PR open: fix: use longer bridge timeout for send actions imsg#139 at 4c85d165c31f4711f75d7bfda4c31723219ebadd changes send-style bridge actions (send-message, send-multipart, send-attachment, send-poll, send-reaction) to use a 150s bridge response timeout while keeping non-send RPC waits at 10s and preserving explicit timeout overrides. send-reaction is included because tapbacks build an IMMessage and dispatch through sendMessage:; notify-anyways intentionally stays at 10s because it is a message-item mutation path.
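
As a minimal sketch of the timeout decoupling described in item 2, the precedence might look like the following. Only `DEFAULT_IMESSAGE_SEND_TIMEOUT_MS` and its value come from this PR; `DEFAULT_IMESSAGE_PROBE_TIMEOUT_MS`, `SendTimeoutInputs`, and both resolver functions are hypothetical names, and the ordering between an explicit option and `probeTimeoutMs` is an assumption.

```typescript
// Name and value taken from the PR description.
const DEFAULT_IMESSAGE_SEND_TIMEOUT_MS = 150_000;
// Hypothetical name for the 10s probe default mentioned in the PR.
const DEFAULT_IMESSAGE_PROBE_TIMEOUT_MS = 10_000;

// Hypothetical input shape; the real option types in extensions/imessage may differ.
interface SendTimeoutInputs {
  timeoutMs?: number; // explicit per-call override
  probeTimeoutMs?: number; // configured probe timeout
}

// Assumed precedence: explicit option, then probeTimeoutMs, then the send default.
function resolveSendTimeoutMs(inputs: SendTimeoutInputs = {}): number {
  return inputs.timeoutMs ?? inputs.probeTimeoutMs ?? DEFAULT_IMESSAGE_SEND_TIMEOUT_MS;
}

// Probes and health checks keep the short default.
function resolveProbeTimeoutMs(inputs: SendTimeoutInputs = {}): number {
  return inputs.timeoutMs ?? inputs.probeTimeoutMs ?? DEFAULT_IMESSAGE_PROBE_TIMEOUT_MS;
}
```

With no overrides, a send would wait up to 150s while a probe still gives up after 10s; any explicit value replaces both defaults.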

Verification

  • pnpm tsgo:extensions clean.
  • Full imessage extension suite: 548 tests pass, incl. 2 new tests (statusMessage propagation; 150s send default).
  • node scripts/run-vitest.mjs extensions/imessage/src/status.test.ts extensions/imessage/src/actions.test.ts extensions/imessage/src/send.test.ts (3 files, 81 tests passed).
  • Codex $autoreview clean (it caught an initial 60s-too-short default vs the documented 124s stall window; fixed to 150s).
  • Dependency source inspected: local sibling imsg source had IMsgBridgeProtocol.defaultResponseTimeout = 10.0, and RPCServer / CLI bridge send paths called IMsgBridgeClient.shared.invoke without a longer send timeout. Companion fix: use longer bridge timeout for send actions imsg#139 fixes that dependency-layer timeout contract.
  • Companion imsg verification for fix: use longer bridge timeout for send actions imsg#139 (4c85d165c31f4711f75d7bfda4c31723219ebadd):
    • swift format lint Sources/IMsgCore/IMsgBridgeProtocol.swift Tests/IMsgCoreTests/IMsgBridgeProtocolTests.swift
    • DEVELOPER_DIR=/Applications/Xcode.app/Contents/Developer swift test --filter IMsgBridgeProtocolTests (7 tests passed)
    • DEVELOPER_DIR=/Applications/Xcode.app/Contents/Developer make test (325 tests in 2 suites passed)
    • GitHub CI: macos and linux-read-core passed
  • Timeout tradeoff: The 150s default is intentional for send-style private API bridge actions because the observed macOS 26 stalls reached about 124s. A bridge send that never returns can now hold the caller up to 150s; probes, health checks, explicit overrides, and non-send bridge actions keep the short timeout.
  • Live macOS 26.4.1 healthy-host status proof on this machine: imsg status --json returned message: "Connected to Messages.app. IMCore features available.", v2_ready: true, and bridge_version: 2; pnpm openclaw channels status --probe imessage --json reported iMessage configured/running with probe.ok: true and privateApi.available: true.
  • Live macOS 26.5.1 blocked-action proof in a clean VM: OpenClaw built from this PR was installed and run as a gateway with imsg 0.11.0 present but the private-API bridge intentionally not injected. A real CLI openclaw message react --channel imessage ... dispatch returned the blocked-action error with imsg's own reason appended: "imsg reports: SIP is disabled and the helper dylib is present, but Messages.app is not currently injected. Run `imsg launch` to enable advanced IMCore features." See the proof comment: fix(imessage): self-explaining private-API failures and dedicated send timeout #91041 (comment)
  • Live regression smoke on macOS 26.4.1 (lobster): built 2026.6.1 + this fix, deployed to the gateway, restarted (PID bounced), iMessage + Telegram both healthy, no startup errors.
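
The bridge-side timeout contract from the companion imsg PR can be pictured roughly as below. The real change lives in imsg's Swift sources, so this is only a TypeScript transliteration under stated assumptions: the action names, the 150s send value, and the 10s default come from the text above, while `SEND_STYLE_ACTIONS`, the two constants, and `bridgeResponseTimeoutSeconds` are hypothetical names.

```typescript
// Send-style bridge actions listed in the PR description.
const SEND_STYLE_ACTIONS = new Set<string>([
  "send-message",
  "send-multipart",
  "send-attachment",
  "send-poll",
  "send-reaction",
]);

// Values from the PR text; constant names are hypothetical.
const DEFAULT_BRIDGE_TIMEOUT_S = 10;
const SEND_BRIDGE_TIMEOUT_S = 150;

// An explicit override is preserved; otherwise send-style actions get the long wait.
function bridgeResponseTimeoutSeconds(action: string, explicitTimeoutS?: number): number {
  if (explicitTimeoutS !== undefined) return explicitTimeoutS;
  return SEND_STYLE_ACTIONS.has(action) ? SEND_BRIDGE_TIMEOUT_S : DEFAULT_BRIDGE_TIMEOUT_S;
}
```

Under this sketch, `send-reaction` would wait up to 150s because tapbacks dispatch through the send path, while `notify-anyways` stays at 10s.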

🤖 Generated with Claude Code

@openclaw-barnacle openclaw-barnacle Bot added channel: imessage Channel integration: imessage size: S maintainer Maintainer-authored PR labels Jun 7, 2026
@clawsweeper

clawsweeper Bot commented Jun 7, 2026

Codex review: needs real behavior proof before merge. Reviewed June 7, 2026, 5:04 PM ET / 21:04 UTC.

Summary
Review failed before ClawSweeper could summarize the requested change.

PR surface: Source +28, Tests +46. Total +74 across 7 files.

Reproducibility: unclear. The review failed before ClawSweeper could establish a reproduction path.

Review metrics: none identified.

Merge readiness
Overall: 🌊 off-meta tidepool
Proof: 🌊 off-meta tidepool
Patch quality: 🌊 off-meta tidepool
Result: rating does not apply to this item.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Risk before merge

  • [P1] No close action taken because the review did not complete.

Maintainer options:

  1. Decide the mitigation before merge
    Retry the Codex review after fixing the execution failure.
  2. Pause or close
    Do not merge this PR until maintainers decide whether the risk is worth taking.

Next step before merge

  • [P1] Review did not complete, so no work-lane recommendation was made.
Review details

Best possible solution:

Retry the Codex review after fixing the execution failure.

Do we have a high-confidence way to reproduce the issue?

Unclear. The review failed before ClawSweeper could establish a reproduction path.

Is this the best way to solve the issue?

Unclear. Retry the review first so ClawSweeper can evaluate the actual issue and fix direction.

AGENTS.md: unclear because the file could not be read completely.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 3b6bcbfb5045.

Label changes

Label changes:

  • add rating: 🌊 off-meta tidepool: Overall readiness is 🌊 off-meta tidepool; proof is 🌊 off-meta tidepool and patch quality is 🌊 off-meta tidepool.
  • remove proof: sufficient: Current real behavior proof status is not_applicable, not sufficient.
  • remove P2: Current review triage priority is none.
  • remove rating: 🦞 diamond lobster: Current PR rating is rating: 🌊 off-meta tidepool, so this older rating label is no longer current.
  • remove merge-risk: 🚨 message-delivery: Current PR review selected no merge-risk labels.
  • remove merge-risk: 🚨 availability: Current PR review selected no merge-risk labels.
  • remove status: 👀 ready for maintainer look: Current PR status no longer selects a status label.

Label justifications:

  • rating: 🌊 off-meta tidepool: Overall readiness is 🌊 off-meta tidepool; proof is 🌊 off-meta tidepool and patch quality is 🌊 off-meta tidepool.
Evidence reviewed

PR surface:

Source +28, Tests +46. Total +74 across 7 files.

View PR surface stats
Area       Files  Added  Removed  Net
Source     5      31     3        +28
Tests      2      46     0        +46
Docs       0      0      0        0
Config     0      0      0        0
Generated  0      0      0        0
Other      0      0      0        0
Total      7      77     3        +74

What I checked:

  • failure reason: timeout.
  • codex failure detail: Codex review failed for this PR: spawnSync codex ETIMEDOUT.
  • codex stdout: Per-item Codex failure; continuing with the rest of the shard.

Likely related people:

  • unknown: Codex failed before it could trace repository history. (role: review did not complete; confidence: low)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@clawsweeper clawsweeper Bot added rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. labels Jun 7, 2026
@omarshahine

@clawsweeper re-review

Added the missing dependency-layer proof from your review:

  • Opened companion upstream imsg PR: fix: use longer bridge timeout for send actions imsg#139
  • That PR changes send-style bridge actions to use a 150s bridge response timeout while keeping non-send RPC waits at 10s and preserving explicit timeout overrides.
  • Verified imsg locally with focused protocol tests, RPC bridge/send tests, and full make test (325 tests in 2 suites passed).
  • Re-ran the focused OpenClaw iMessage tests here: node scripts/run-vitest.mjs extensions/imessage/src/status.test.ts extensions/imessage/src/actions.test.ts extensions/imessage/src/send.test.ts (3 files, 81 tests passed).
  • Updated this PR body with the companion PR, source-inspection notes, and live macOS 26.4.1 status proof.

The local host is healthy (imsg status --json reports Messages.app connected and OpenClaw probe reports privateApi available), so I cannot honestly produce the blocked-action error path from this machine without intentionally breaking the bridge; that path remains covered by the focused unit tests.

@clawsweeper

clawsweeper Bot commented Jun 7, 2026

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

@clawsweeper clawsweeper Bot added rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. P2 Normal backlog priority with limited blast radius. merge-risk: 🚨 message-delivery 🚨 May drop, duplicate, misroute, suppress, or wrongly target messages. merge-risk: 🚨 availability 🚨 May cause crashes, hangs, restart loops, stalls, or process outages. and removed rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. labels Jun 7, 2026
@omarshahine

@clawsweeper re-review

Updated this PR body with the latest companion imsg state:

No OpenClaw code changes were needed after the latest review; the remaining gap is still live delayed-send or blocked-action proof.

@clawsweeper

clawsweeper Bot commented Jun 7, 2026

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

@omarshahine

Live behavior proof — statusMessage on a blocked private-API action

Beyond the unit tests, here is the statusMessage change demonstrated end-to-end against a real bridge-down imsg.

Environment: OpenClaw built from this PR (2026.6.1 + this commit, 71fb7f9) installed and running as a gateway in a clean macOS 26.5.1 VM. imsg 0.11.0 present but the private-API bridge intentionally not injected (SIP off, no DisableLibraryValidation), so imsg status reports a reason.

Command (real CLI dispatch, no mocks):

openclaw message react --channel imessage --target "+15555550123" --message-id "p:0/test-guid" --emoji "👍"

Result — the blocked-action error now carries imsg's own status message:

[imessage] react blocked: private API bridge unavailable (accountId=default, cliPath=/Users/lume/0.11.0/libexec/imsg).
Run `imsg launch` to re-inject the dylib, then `openclaw channels status` to refresh.
imsg reports: SIP is disabled and the helper dylib is present, but Messages.app is not currently injected.
Run `imsg launch` to enable advanced IMCore features.

Note: macOS 26/Tahoe can still block advanced IMCore features through
library validation or imagent private entitlement checks...

Error: iMessage react requires the imsg private API bridge. ... imsg reports: SIP is disabled and the helper dylib is present, but Messages.app is not currently injected. ...

The imsg reports: <...> suffix is the change in this PR. Before it, the error stopped at the generic "Run imsg launch, then openclaw channels status" with no reason — so an operator on macOS 26 couldn't tell why the bridge was down (SIP / library validation / not injected). Now the real reason from imsg is surfaced on both the WARN log line and the thrown error.
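
A minimal sketch of how the suffix could be appended, assuming the probe exposes an optional `statusMessage` string. `appendImsgReason` is a hypothetical helper for illustration, not the function in this PR, and the base message is passed in by the caller rather than reproduced here.

```typescript
// Hypothetical helper: appends imsg's own status message to a blocked-action error.
// When no reason is available, the base message is returned unchanged.
function appendImsgReason(baseMessage: string, statusMessage?: string): string {
  const reason = statusMessage?.trim();
  return reason ? `${baseMessage} imsg reports: ${reason}` : baseMessage;
}

// Usage with placeholder text standing in for the real base error.
const withReason = appendImsgReason(
  "react blocked: private API bridge unavailable.",
  "SIP is disabled and the helper dylib is present, but Messages.app is not currently injected.",
);
console.log(withReason);
```

Keeping the reason optional means hosts whose imsg returns no message still see the earlier generic guidance.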

The 150s send-timeout half stays unit-asserted (send.test.ts: client.request is called with timeoutMs: 150_000) — a live demo would require inducing a real 60–124s bridge stall on demand, which isn't practical.

@omarshahine

@clawsweeper re-review

The live proof gap changed after the prior re-review started. I updated the PR body to include the new macOS 26.5.1 VM proof from #91041 (comment).

What the new proof covers:

  • Real OpenClaw CLI dispatch against iMessage, not a unit mock.
  • imsg 0.11.0 present with the private-API bridge intentionally not injected.
  • A blocked message react action now includes imsg's own private-API status message in the user-facing OpenClaw error.

The remaining timeout tradeoff is still explicit in the body: send-style private API bridge actions may wait up to 150s, and the companion imsg PR is open at openclaw/imsg#139 with green CI.

@clawsweeper

clawsweeper Bot commented Jun 7, 2026

🦞👀
ClawSweeper picked this up.

Command router queued. I will update this comment with the next step.

Re-review progress:

@clawsweeper clawsweeper Bot added rating: 🦐 gold shrimp Decent PR readiness signal, but merge confidence is limited. and removed rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. labels Jun 7, 2026
@omarshahine omarshahine force-pushed the fix/imessage-macos26-resilience branch from 7175ea0 to 82f8eeb Compare June 7, 2026 03:47
@omarshahine omarshahine force-pushed the fix/imessage-macos26-resilience branch from 82f8eeb to f6338db Compare June 7, 2026 20:03
@omarshahine

Maintainer proof waiver + merge rationale

Recording a maintainer waiver on the delayed-send proof gate, with the dependency behavior verified directly against imsg source.

Why a waiver: the delayed-send timeout targets the 60–124s macOS 26 stuck-send tail, which can't be reproduced on demand — it's inherently source/unit-provable, not live-reproducible. The blocked-action statusMessage half is separately live-proven.

Verified no-regression (read IMsgBridgeClient.swift on imsg main): invokeV2 polls the bridge outbox until deadline = now + timeout and, on expiry, throws IMsgBridgeError.timeout(action:). So on the currently released imsg (10s default), a slow send fails at ~10s and OpenClaw surfaces that error at that point. This PR's 150s client wait is therefore dormant with released imsg: no hang, no behavior change for the normal path. The longer wait only engages once an imsg build with the companion bridge timeout (openclaw/imsg#139) is installed.

Availability tradeoff (accepted): the only new behavior shipping ahead of #139 is that a fully wedged imsg RPC (no reply at all, not merely a slow send) would hold the send path up to 150s instead of 10s. Rare, and worth the macOS 26 send-reliability and diagnostics gains.
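
The interaction between the two timers can be summarized in a small model. This is a sketch of the reasoning above, not code from either repository; `SendOutcome` and `worstCaseWaitMs` are hypothetical names.

```typescript
// "slow-send": the bridge eventually times out and reports an error to the client.
// "wedged-rpc": imsg never replies at all, so only the client timer can fire.
type SendOutcome = "slow-send" | "wedged-rpc";

// Worst-case time before the caller sees a failure, in milliseconds.
function worstCaseWaitMs(
  clientTimeoutMs: number,
  bridgeTimeoutMs: number,
  outcome: SendOutcome,
): number {
  if (outcome === "wedged-rpc") return clientTimeoutMs;
  // Whichever timer expires first ends the wait.
  return Math.min(clientTimeoutMs, bridgeTimeoutMs);
}

// Released imsg (10s bridge) with this PR's 150s client wait.
console.log(worstCaseWaitMs(150_000, 10_000, "slow-send")); // 10000
// Same pairing, but the RPC never answers.
console.log(worstCaseWaitMs(150_000, 10_000, "wedged-rpc")); // 150000
```

The model shows why the longer client wait is latent until the bridge timeout is also raised, and why the wedged case is the only one that changes ahead of #139.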

Dependency state: openclaw/imsg#139 is green + mergeable but lives in an upstream repo I don't own; it merges/releases on its own schedule. The timeout half activates automatically when it ships — no further OpenClaw change needed.

Checks: rebased onto latest main, all required checks green (0 failures), mergeStateStatus: CLEAN.

Landing as the partial mitigation (diagnostics now, timeout latent until #139 releases).

@clawsweeper re-review

@clawsweeper

clawsweeper Bot commented Jun 7, 2026

🦞👀
ClawSweeper picked this up.

Command router queued. I will update this comment with the next step.

Re-review progress:

@clawsweeper clawsweeper Bot added rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. and removed rating: 🦐 gold shrimp Decent PR readiness signal, but merge confidence is limited. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. labels Jun 7, 2026
@clawsweeper clawsweeper Bot added the status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. label Jun 7, 2026
@omarshahine

omarshahine commented Jun 7, 2026

@clawsweeper re-review

(Prior re-review run was cancelled mid-flight, not a review failure. Maintainer waiver + verified no-regression rationale already recorded above; head is f6338db, all checks green.)

@clawsweeper

clawsweeper Bot commented Jun 7, 2026

🦞👀
ClawSweeper picked this up.

Command router queued. I will update this comment with the next step.

Re-review progress:

@clawsweeper clawsweeper Bot added proof: sufficient ClawSweeper judged the real behavior proof convincing. rating: 🦞 diamond lobster Very strong PR readiness with only minor maintainer review expected. rating: 🌊 off-meta tidepool PR readiness rating does not apply to this item. and removed rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. proof: sufficient ClawSweeper judged the real behavior proof convincing. rating: 🦞 diamond lobster Very strong PR readiness with only minor maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. labels Jun 7, 2026
@omarshahine omarshahine merged commit 6c35c0d into main Jun 7, 2026
199 of 212 checks passed
@omarshahine omarshahine deleted the fix/imessage-macos26-resilience branch June 7, 2026 21:07
github-actions Bot pushed a commit to Desicool/openclaw that referenced this pull request Jun 8, 2026
…d timeout (openclaw#91041)

Append imsg's own status message (SIP / library validation / macOS 26 AMFI gate)
to iMessage private-API blocked-action errors so operators see the real blocker
instead of a generic "run imsg launch". Add a dedicated 150s default timeout for
iMessage send RPCs (explicit opts and probeTimeoutMs still win) so macOS 26
bridge stalls are not aborted mid-send.

Staged mitigation: the longer wait fully activates once the companion bridge
timeout (openclaw/imsg#139) ships; on current imsg the bridge still returns at
its own 10s, so there is no regression. Diagnostics half is live-proven; the
delayed-send timeout is covered by source + unit proof + maintainer waiver.
wangmiao0668000666 pushed a commit to wangmiao0668000666/openclaw that referenced this pull request Jun 9, 2026
…d timeout (openclaw#91041)
Labels

channel: imessage Channel integration: imessage maintainer Maintainer-authored PR merge-risk: 🚨 availability 🚨 May cause crashes, hangs, restart loops, stalls, or process outages. merge-risk: 🚨 message-delivery 🚨 May drop, duplicate, misroute, suppress, or wrongly target messages. P2 Normal backlog priority with limited blast radius. rating: 🌊 off-meta tidepool PR readiness rating does not apply to this item. size: S