feat(outbox): add write-ahead outbox, recovery worker, and plugin compat layer#29148
feat(outbox): add write-ahead outbox, recovery worker, and plugin compat layer#29148nohat wants to merge 11 commits intoopenclaw:mainfrom
Conversation
Replace unbounded file-based delivery queue with queryable SQLite message_outbox table. Adds TTL/expiry for stale entries, delivery outcome retention, and one-time legacy file queue import on startup. Closes openclaw#23777, openclaw#16555, openclaw#29128
…pat layer Write-ahead delivery pattern: enqueue outbox entry before sending, ack on success, retry on failure. Continuous outbox worker replaces one-shot recovery. Plugin channels get durable delivery guarantees via v1/v2 adapter compat layer. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Greptile SummaryThis PR implements a write-ahead outbox pattern for durable message delivery, replacing the previous fire-and-forget approach with persistent, crash-recoverable delivery semantics. Key Changes
ArchitectureThe outbox worker runs continuously with a time-budgeted recovery pass (75% of interval). Entries go through states: Test CoverageComprehensive test suite in
All 948 existing tests pass, and E2E tests verify write-ahead delivery, continuous recovery, startup cutoff filtering, and outbox pruning. Confidence Score: 5/5
Last reviewed commit: 8db9185 |
There was a problem hiding this comment.
💡 Codex Review
openclaw/src/infra/outbound/deliver.ts
Lines 268 to 271 in 8db9185
This wrapper only marks hadPartialFailure when a caller provides onError, but deliverOutboundPayloadsCore can still swallow per-payload failures when bestEffort is true and onError is absent. In that case the outbox row is acked as delivered even though one or more sends failed, so recovery will never retry those failed payloads (for example, current bestEffort: true call sites that do not pass onError).
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
Addressed the P1 Track best-effort send failures without relying on onError from the Codex review. Change: When
Commit: 2c37968 |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2c379683c4
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
…low v2 sendFinal-only plugin adapters (Codex P1+P2 openclaw#29148)
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8e5d3dfe1c
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2b34946507
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
src/infra/outbound/deliver.ts
Outdated
| sendPayload: outbound.sendFinal | ||
| ? async (payload, overrides) => | ||
| outbound.sendPayload!({ | ||
| outbound.sendFinal({ | ||
| ...resolveCtx(overrides), |
There was a problem hiding this comment.
Preserve chunking/media fan-out for inferred sendFinal adapters
createPluginHandler now enables sendPayload whenever outbound.sendFinal exists, but normalizeChannelOutboundAdapter synthesizes sendFinal for plain sendText/sendMedia adapters too. That means any payload with channelData now takes the sendPayload fast path and skips the normal chunking and per-media loop in deliverOutboundPayloadsCore; with the inferred implementation this can drop all but the first mediaUrl (and bypass text chunking), so channelData messages with attachments or long text can be delivered incorrectly.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Fixed in aabc291: sendPayload is now only set when normalized.contract === "v2" (native sendFinal). Inferred sendFinal from v1 sendText/sendMedia adapters no longer takes the sendPayload fast path, so channelData payloads continue through the normal chunking and per-media fan-out loop.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: aabc291c50
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
src/infra/outbound/deliver.ts
Outdated
| mediaUrl, | ||
| }), | ||
| sendPayload: | ||
| isNativeV2 && outbound.sendFinal |
There was a problem hiding this comment.
Preserve sendPayload path for legacy adapters
Restricting sendPayload to normalized.contract === "v2" drops channel-data delivery for existing v1 adapters that still implement sendPayload (for example extensions/line/src/channel.ts handles quick replies/templates via payload.channelData but does not declare outboundContract: "v2"). In that configuration, deliverOutboundPayloadsCore no longer calls the adapter’s sendPayload and falls back to sendText/sendMedia, which silently strips LINE-specific rich message metadata.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Good catch. Fixed in 377a635: createPluginHandler now also exposes sendPayload when the original v1 adapter implements it natively (e.g. LINE). The isNativeV2 guard still prevents inferred sendFinal from taking the fast path, but v1 adapters with their own sendPayload (like LINE's channelData handler) are passed through directly via hasNativeSendPayload.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 377a635868
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| queueId = await enqueueDelivery({ | ||
| channel: deliveryQueueContext.channel, | ||
| to: deliveryQueueContext.to, | ||
| accountId: deliveryQueueContext.accountId, | ||
| payloads: [normalized], |
There was a problem hiding this comment.
Preserve dispatch kind when queuing dispatcher payloads
These rows enqueue every normalized payload without recording whether it was a tool, block, or final send, but crash recovery replays all queued rows through deliverOutboundPayloads (see src/gateway/server-message-lifecycle.ts). That means channels that intentionally suppress non-final sends in their live dispatcher (for example src/web/auto-reply/monitor/process-message.ts returns early when info.kind !== "final") can leak those suppressed payloads after a crash if the process dies between enqueueDelivery and ackDelivery; on restart, recovery will treat them as normal outbound messages and send them to users.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Fixed in 46581bc. enqueueDelivery now accepts and stores dispatchKind ("tool" / "block" / "final") in the outbox payload JSON. The reply dispatcher passes kind through at the call site.
During recoverPendingDeliveries, after loading pending entries, non-final entries (those with dispatchKind set and not "final") are marked failed_terminal with terminal_reason='non_final_recovery_skip' and excluded from recovery. This is safe because the turn recovery worker replays the entire turn, regenerating tool/block outputs — recovering them from the outbox would bypass channel-specific kind filtering (e.g. web channel suppresses non-final sends in the live path).
Entries without dispatchKind (legacy rows, or entries from deliverOutboundPayloads) are treated as final and recovered normally.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 46581bc267
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
|
||
| export const slackOutbound: ChannelOutboundAdapter = { | ||
| deliveryMode: "direct", | ||
| outboundContract: "v2", |
There was a problem hiding this comment.
Keep text/media adapters on v1 until sendFinal is native
Fresh evidence in this commit is that core adapters like Slack are now marked outboundContract: "v2" even though they only implement sendText/sendMedia. That makes createPluginHandler treat them as native v2 and route any channelData payload through sendPayload/sendFinal; for inferred sendFinal, compat.ts only forwards the first media URL, so multi-media and chunked deliveries can be truncated when channelData is present (the fast path is taken whenever effectivePayload.channelData exists). Keep these adapters on v1 until they provide a real sendFinal that preserves fan-out/chunk semantics.
Useful? React with 👍 / 👎.
| } catch (err) { | ||
| const code = | ||
| err && typeof err === "object" && "code" in err | ||
| ? String((err as { code?: unknown }).code) | ||
| : null; | ||
| if (code !== "ENOENT") { | ||
| throw err; | ||
| } | ||
| // Already removed — no-op. | ||
| logVerbose(`delivery-queue: ackDelivery failed: ${String(err)}`); | ||
| } |
There was a problem hiding this comment.
Propagate ackDelivery failures instead of swallowing them
ackDelivery now catches and suppresses every DB error, but recovery still counts the entry as recovered immediately afterward. If the SQLite update fails (for example SQLITE_BUSY during concurrent access), the row remains pending and will be replayed on later worker ticks, producing duplicate outbound sends while logs report success. Letting this error propagate (or returning an explicit failure) would allow recovery to treat it as a failed attempt instead of silently requeueing duplicates.
Useful? React with 👍 / 👎.
…l and separate ackDelivery errors in recovery
|
Closing this PR because it looks dirty (too many unrelated or unexpected changes). This usually happens when a branch picks up unrelated commits or a merge went sideways. Please recreate the PR from a clean branch. |
7 similar comments
|
Closing this PR because it looks dirty (too many unrelated or unexpected changes). This usually happens when a branch picks up unrelated commits or a merge went sideways. Please recreate the PR from a clean branch. |
|
Closing this PR because it looks dirty (too many unrelated or unexpected changes). This usually happens when a branch picks up unrelated commits or a merge went sideways. Please recreate the PR from a clean branch. |
|
Closing this PR because it looks dirty (too many unrelated or unexpected changes). This usually happens when a branch picks up unrelated commits or a merge went sideways. Please recreate the PR from a clean branch. |
|
Closing this PR because it looks dirty (too many unrelated or unexpected changes). This usually happens when a branch picks up unrelated commits or a merge went sideways. Please recreate the PR from a clean branch. |
|
Closing this PR because it looks dirty (too many unrelated or unexpected changes). This usually happens when a branch picks up unrelated commits or a merge went sideways. Please recreate the PR from a clean branch. |
|
Closing this PR because it looks dirty (too many unrelated or unexpected changes). This usually happens when a branch picks up unrelated commits or a merge went sideways. Please recreate the PR from a clean branch. |
|
Closing this PR because it looks dirty (too many unrelated or unexpected changes). This usually happens when a branch picks up unrelated commits or a merge went sideways. Please recreate the PR from a clean branch. |
|
Closing this PR because it looks dirty (too many unrelated or unexpected changes). This usually happens when a branch picks up unrelated commits or a merge went sideways. Please recreate the PR from a clean branch. |
12 similar comments
|
Closing this PR because it looks dirty (too many unrelated or unexpected changes). This usually happens when a branch picks up unrelated commits or a merge went sideways. Please recreate the PR from a clean branch. |
|
Closing this PR because it looks dirty (too many unrelated or unexpected changes). This usually happens when a branch picks up unrelated commits or a merge went sideways. Please recreate the PR from a clean branch. |
|
Closing this PR because it looks dirty (too many unrelated or unexpected changes). This usually happens when a branch picks up unrelated commits or a merge went sideways. Please recreate the PR from a clean branch. |
|
Closing this PR because it looks dirty (too many unrelated or unexpected changes). This usually happens when a branch picks up unrelated commits or a merge went sideways. Please recreate the PR from a clean branch. |
|
Closing this PR because it looks dirty (too many unrelated or unexpected changes). This usually happens when a branch picks up unrelated commits or a merge went sideways. Please recreate the PR from a clean branch. |
|
Closing this PR because it looks dirty (too many unrelated or unexpected changes). This usually happens when a branch picks up unrelated commits or a merge went sideways. Please recreate the PR from a clean branch. |
|
Closing this PR because it looks dirty (too many unrelated or unexpected changes). This usually happens when a branch picks up unrelated commits or a merge went sideways. Please recreate the PR from a clean branch. |
|
Closing this PR because it looks dirty (too many unrelated or unexpected changes). This usually happens when a branch picks up unrelated commits or a merge went sideways. Please recreate the PR from a clean branch. |
|
Closing this PR because it looks dirty (too many unrelated or unexpected changes). This usually happens when a branch picks up unrelated commits or a merge went sideways. Please recreate the PR from a clean branch. |
|
Closing this PR because it looks dirty (too many unrelated or unexpected changes). This usually happens when a branch picks up unrelated commits or a merge went sideways. Please recreate the PR from a clean branch. |
|
Closing this PR because it looks dirty (too many unrelated or unexpected changes). This usually happens when a branch picks up unrelated commits or a merge went sideways. Please recreate the PR from a clean branch. |
|
Closing this PR because it looks dirty (too many unrelated or unexpected changes). This usually happens when a branch picks up unrelated commits or a merge went sideways. Please recreate the PR from a clean branch. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 03ced37e59
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| channel, | ||
| to, | ||
| accountId: params.ctx.AccountId?.trim() || undefined, | ||
| threadId: params.ctx.MessageThreadId, |
There was a problem hiding this comment.
Queue the effective thread target for crash recovery
In resolveDeliveryQueueContext, the queued threadId is copied only from ctx.MessageThreadId, but some channels compute the actual reply thread at send time; for example Slack with replyToMode: "first" resolves the first reply thread via replyPlan.nextThreadTs() in src/slack/monitor/message-handler/dispatch.ts while MessageThreadId remains unset for top-level messages. If the process crashes after enqueue and before send, recovery replays through deliverOutboundPayloads without a thread id and the recovered reply is posted to the channel root instead of the intended thread.
Useful? React with 👍 / 👎.
Summary
Change Type (select all)
Scope (select all touched areas)
Linked Issue/PR
User-visible / Behavior Changes
outboundContract: "v2"get write-ahead delivery; v1 plugins continue to work via compat layeroutboundContractfield appears in channel status/schemaSecurity Impact (required)
Repro + Verification
Environment
Steps
Expected
Actual
Evidence
Human Verification (required)
pnpm build,pnpm test(948 pass),pnpm checkCompatibility / Migration
Failure Recovery (if this breaks)
Risks and Mitigations
attempt_count > 0orlast_attempt_at IS NOT NULLare always includedoutboundContractfield on channel config; fallback to v1 wrapping when absentE2E Test Results
All 4 tests pass ✅
status='delivered'before=0, after=1attempt_count: 1→2(both entries)A=queued, B=failed_retryable (attempts: 1→2)Test script: https://gist.github.com/nohat/d98c213adfeaebf196e4137ed7d693c0