[codex] add durable message lifecycle delivery#77205
Conversation
|
@clawsweeper autofix |
4fd6f75 to
af88d5e
Compare
|
Codex review: needs real behavior proof before merge. Summary Reproducibility: not applicable. as a current-main bug reproduction: this PR is a broad feature/refactor. The PR body proves one dedupe code path, but not a current-head live durable delivery or recovery scenario. Real behavior proof Next step before merge Security Review detailsBest possible solution: Land this after a maintainer SDK/channel review and current-head runtime proof for at least one representative durable final-send or recovery/live-preview path. Do we have a high-confidence way to reproduce the issue? Not applicable as a current-main bug reproduction: this PR is a broad feature/refactor. The PR body proves one dedupe code path, but not a current-head live durable delivery or recovery scenario. Is this the best way to solve the issue? Unclear for merge readiness: the source and tests now cover the prior lifecycle correctness concerns, but this broad public SDK/channel change still needs maintainer review and current-head real behavior proof before it is the best path to merge. What I checked:
Likely related people:
Remaining risk / open question:
Codex review notes: model gpt-5.5, reasoning high; reviewed against fcb188a41a48. |
|
🦞🔧 Source: I will update this PR branch, or open a safe credited replacement, if the repair worker finds a narrow CI fix. Automerge progress:
|
|
🦞🧹
This is fix-only; I will not merge this PR. |
ed178d3 to
50de0e6
Compare
50de0e6 to
dc20eb3
Compare
a68c719 to
40092a1
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6bd79a61b8
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| ...deliveryParams, | ||
| payloads: rendered.payloads, | ||
| renderedBatchPlan: rendered.plan, | ||
| queuePolicy: "required", |
There was a problem hiding this comment.
Respect requested best-effort durability in send context
When withDurableMessageSendContext is created with durability: "best_effort", the actual send still hard-codes queuePolicy: "required", so queue-write failures throw instead of falling back to direct send. This makes the runtime behavior contradict the context’s declared durability and can unexpectedly fail deliveries in environments where the durable queue is temporarily unavailable.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: dc4c9488f6
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if (queueId) { | ||
| if (hadPartialFailure) { | ||
| await failDelivery(queueId, "partial delivery failure (bestEffort)").catch(() => {}); | ||
| } else { | ||
| await ackDelivery(queueId).catch(() => {}); // Best-effort cleanup. | ||
| const acked = await ackDelivery(queueId) |
There was a problem hiding this comment.
Run afterCommit hooks for non-queued sends
When delivery falls back to direct send (queueId is null, e.g. queuePolicy: "best_effort" with queue write failure or explicit skipQueue), this branch skips runOutboundDeliveryCommitHooks(...) entirely, so channel send.lifecycle.afterCommit never runs even though the platform send succeeded. Plugins that use afterCommit to clear pending state can be left inconsistent specifically in fallback/recovery paths.
Useful? React with 👍 / 👎.
| await params.lifecycle.afterSendFailure?.({ | ||
| ...params.ctx, |
There was a problem hiding this comment.
Preserve original send error when failure cleanup fails
In the send lifecycle catch path, afterSendFailure is awaited without isolation, so if cleanup itself throws, that new error replaces the original send failure. This only appears when a send fails and failure cleanup also errors, but it obscures root-cause diagnostics and can misclassify queue failure reasons for retries/recovery.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3978dbbbc3
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if (result.status !== "failed") { | ||
| await ctx.commit(result.receipt); | ||
| } else { | ||
| await ctx.fail(result.error); |
There was a problem hiding this comment.
Preserve original send failure when failure hook throws
When outbound delivery fails, this path awaits ctx.fail(result.error) and will throw if onSendFailure itself errors, which replaces the original delivery failure and changes sendDurableMessageBatch from returning { status: "failed", error } to rejecting with cleanup-hook errors. This only appears when both send and failure cleanup fail, but it obscures root cause and breaks callers that rely on the structured failed result for retry/recovery decisions.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 54ac28e396
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| const adapter = await loadChannelMessageAdapter(opts.entry.channel); | ||
| if (adapter?.durableFinal?.capabilities?.reconcileUnknownSend !== true) { | ||
| return null; |
There was a problem hiding this comment.
Bootstrap message adapter before unknown-send reconciliation
Recovery now calls loadChannelMessageAdapter directly for unknown_after_send entries, but this loader only reads the active registry and does not bootstrap bundled channel plugins. On startup, if the channel plugin has not been hydrated yet, this returns undefined, reconciliation is skipped, and the entry is moved to failed/ without replay, which can permanently drop deliveries after a crash. This path should bootstrap or otherwise ensure the channel message adapter is loaded before deciding reconciliation is unavailable.
Useful? React with 👍 / 👎.
| export { | ||
| createChannelReplyPipeline as createChannelMessageReplyPipeline, | ||
| createReplyPrefixContext, | ||
| resolveChannelSourceReplyDeliveryMode as resolveChannelMessageSourceReplyDeliveryMode, | ||
| } from "./channel-reply-core.js"; |
There was a problem hiding this comment.
Re-export reply-prefix helpers from channel-message facade
The new openclaw/plugin-sdk/channel-message facade re-exports createReplyPrefixContext but omits createReplyPrefixOptions and createTypingCallbacks, even though those helpers still exist in the message pipeline surface. Plugin code migrating off the deprecated channel-reply-pipeline subpath will hit missing-export compile/runtime errors for these helpers, creating an avoidable SDK compatibility regression.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: cee371bf8a
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| const requiredCapabilities = | ||
| params.requiredCapabilities ?? | ||
| deriveDurableFinalDeliveryRequirements({ | ||
| payload: params.payload, | ||
| replyToId, | ||
| threadId, | ||
| silent: params.silent, |
There was a problem hiding this comment.
Require unknown-send reconciliation for required durability
deliverInboundReplyWithMessageSendContext derives durable requirements without requesting reconcileUnknownSend, so channels can pass preflight and enqueue queuePolicy: "required" sends even when they cannot reconcile unknown_after_send. In this commit, queue entries are marked unknown_after_send once platform send starts and recovery refuses blind replay without reconciliation (moving the entry to failed/), so a crash between send-start and ack can permanently drop a required delivery. This is currently reachable because no bundled production adapter declares reconcileUnknownSend capability.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6c29246647
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| } catch (ackErr) { | ||
| if (getErrnoCode(ackErr) === "ENOENT") { | ||
| return "already-gone"; | ||
| } | ||
| return "failed"; | ||
| } |
There was a problem hiding this comment.
Record retry metadata when ack after reconciliation fails
When reconcileUnknownSend returns sent but ackDelivery(...) throws a non-ENOENT error, this branch returns "failed" immediately without calling failDelivery(...). That leaves retryCount/lastAttemptAt unchanged, so the same entry is treated as first-attempt eligible on every recovery run and can loop indefinitely without backoff or max-retry cutoff if ack keeps failing (for example, persistent FS permission issues).
Useful? React with 👍 / 👎.
| function resolveThreadId( | ||
| params: DurableInboundReplyDeliveryParams, | ||
| ): string | number | null | undefined { | ||
| return params.threadId ?? params.ctxPayload.MessageThreadId; |
There was a problem hiding this comment.
Preserve explicit null thread override in durable delivery
This uses ?? to resolve the thread id, so an explicit threadId: null is discarded and replaced with ctxPayload.MessageThreadId. That prevents callers from intentionally clearing thread targeting in durable mode, and replies can be forced back into the inbound thread even when the delivery adapter requested no thread.
Useful? React with 👍 / 👎.
|
🦞✅ Source: I added |
|
🦞✅ Source: I added |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: aa7545e803
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| result, | ||
| ...(attemptToken !== undefined ? { attemptToken } : {}), | ||
| }; | ||
| await params.lifecycle.afterSendSuccess?.(successCtx); |
There was a problem hiding this comment.
Do not fail delivery when afterSendSuccess hook throws
If params.send() succeeds but afterSendSuccess throws, this path rethrows as if the platform send failed, so callers receive a failed send even though the message may already be delivered. In queued flows this can trigger unnecessary retries/recovery state changes, and in direct flows callers may retry manually, producing duplicate outbound messages; the hook failure should be isolated from the actual send outcome.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c33613d03d
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| await ackDelivery(entry.id, opts.stateDir); | ||
| opts.onRecovered?.(entry); | ||
| opts.log.info(`Delivery entry ${entry.id} reconciled unknown_after_send as already sent`); | ||
| return "recovered"; |
There was a problem hiding this comment.
Run afterCommit lifecycle on reconciled-sent recovery
When reconcileUnknownSend returns status: "sent", this branch acks the queue entry and returns immediately, but it never executes any commit lifecycle (afterCommit) for that delivery. That means adapters that rely on commit-time cleanup (for example, clearing pending outbound state after a crash) are skipped specifically in the send_attempt_started / unknown_after_send recovery path, leaving plugin state inconsistent even though recovery marks the delivery as recovered.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2dbc6958eb
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| fallbackOnMissingDependency: true, | ||
| fallbackOnNativeError: true, | ||
| }); |
There was a problem hiding this comment.
Preserve native loader errors instead of falling back unconditionally
Passing fallbackOnNativeError: true here causes every native require failure (including plugin runtime exceptions) to be treated as a cache miss and retried through the source-transform loader. When a plugin module executes side effects before throwing at top level, this path can execute those side effects twice and leave duplicate registrations or partially-mutated process state before ultimately failing. Restricting fallback to resolution/format errors avoids re-running modules that already failed during execution.
Useful? React with 👍 / 👎.
|
🦞✅ Source: I added |
Summary
This PR completes the message lifecycle refactor for durable final replies, message-tool sends, live-preview finalization, receipts, and receive acknowledgement policy. It moves bundled channel delivery paths onto
openclaw/plugin-sdk/channel-message, so core owns the shared lifecycle and plugins own platform-native I/O.Core now owns queue intent, retry/recovery state, rendered batch plans, message-sending hooks, receipts, commit/failure hooks, and SDK compatibility shims. Channel plugins own target normalization, native send/edit/delete calls, threading, quotes, silent delivery, pending IDs, account state, and platform-specific receipt construction.
Public API
The primary SDK surface is
openclaw/plugin-sdk/channel-message.Channel plugins express delivery through
defineChannelMessageAdapter(...):durableFinal.capabilities: declares the final-delivery features this adapter preserves.send.text(...),send.media(...), and optional payload/batch senders: perform platform I/O and return normalized receipts.send.lifecycle.beforeSendAttempt(...): runs before platform I/O and may return an opaque attempt token.send.lifecycle.afterSendSuccess(...): runs after successful platform I/O with the result, receipt, and attempt token.send.lifecycle.afterSendFailure(...): runs after failed/canceled sends without replacing the primary provider error.send.lifecycle.afterCommit(...): runs after queued ack, recovered commit, or direct best-effort fallback success.live.finalizer: owns live preview edit/delete/final-send transitions.receive: exposes receive context and acknowledgement policy. Plugins that keep platform acknowledgement outside core declaremanualpolicy.Capability keys include
text,media,replyTo,thread,silent,nativeQuote,messageSendingHooks,afterSendSuccess,afterCommit,payload,batch, andreconcileUnknownSend.Default final-reply delivery and prepared
message(action="send")use the shared lifecycle withbest_effortqueueing unless a caller explicitly requests required durability. Required durable delivery must preflightreconcileUnknownSendand the adapter must both declare the capability and implementdurableFinal.reconcileUnknownSend(...). This keeps bundled adapters reachable through the new lifecycle without falsely promising no-duplicate crash recovery.Compatibility
Existing SDK entry points remain importable for third-party plugins, but are deprecated compatibility wrappers:
openclaw/plugin-sdk/channel-reply-pipeline: useopenclaw/plugin-sdk/channel-message.createChannelTurnReplyPipeline(...): usecreateChannelMessageReplyPipeline(...)or amessageadapter.deliverDurableInboundReplyPayload(...): usedeliverInboundReplyWithMessageSendContext(...).dispatchInboundReplyWithBase(...): usedispatchChannelMessageReplyWithBase(...)for compatibility dispatchers.recordInboundSessionAndDispatchReply(...): userecordChannelMessageReplyDispatch(...)where the old shape is still needed.resolveChannelSourceReplyDeliveryMode(...): useresolveChannelMessageSourceReplyDeliveryMode(...).deliverFinalizableDraftPreview(...): usedefineFinalizableLivePreviewAdapter(...),deliverFinalizableLivePreview(...), ordeliverWithFinalizableLivePreviewAdapter(...).DraftPreviewFinalizerDraft/DraftPreviewFinalizerResult: useLivePreviewFinalizerDraft/LivePreviewFinalizerResult.createReplyPrefixContext(...),createReplyPrefixOptions(...), andcreateTypingCallbacks(...)are available through the message facade for compatibility dispatchers.Runtime Behavior
Real behavior proof
message(action="send")payloads should use the shared channel-message lifecycle without requiring every bundled adapter to implement unknown-send reconciliation; generic same-target reply dedupe should not hang the auto-reply shard by loading Feishu just to comparemessageplaceholder routes.codex/durable-message-lifecycle-design, PR head3c92b086b13a90cbcad0bd66455f252207e656f5.messageplaceholder route without loading a channel plugin for normalization, and Telegram's bundled reply-suppression fallback still matches topic replies when the active plugin registry omits Telegram.Validation
Additional proof on the current branch:
Testbox was warmed, but the run stayed queued and was stopped cleanly; the broad changed proof above was run locally with Testbox delegation disabled.
Not Tested
Live credentials for bundled messaging platforms were not exercised manually. Proof is from a real local code-path command, SDK checks, channel contract tests, owner-unit tests, durable queue tests, docs checks, and the changed gate.