Skip to content

feat(outbox): SQLite outbox with write-ahead delivery, recovery worker, and sendFinal-required adapters#29953

Closed
nohat wants to merge 11 commits intoopenclaw:mainfrom
nohat:lifecycle/write-ahead-outbox
Closed

feat(outbox): SQLite outbox with write-ahead delivery, recovery worker, and sendFinal-required adapters#29953
nohat wants to merge 11 commits intoopenclaw:mainfrom
nohat:lifecycle/write-ahead-outbox

Conversation

@nohat
Copy link
Contributor

@nohat nohat commented Feb 28, 2026

Summary

Combines the scope of old PRs #29147 and #29148 into a single, cleaned-up PR.

Key changes from the old PRs:

  • outboundContract v1/v2 concept removed entirely — sendFinal is now required on all ChannelOutboundAdapters
  • Plugin compat layer simplified (no inference, no legacy warnings)
  • src/plugin-sdk/outbound-adapter.ts deleted
  • All 22+ extension adapters updated with sendFinal
  • ackDelivery errors separated from send errors in recovery

Carries forward from old PRs:

  • SQLite migration for delivery queue (file-based → WAL-backed outbox table)
  • Write-ahead pattern: outbox entry created before AI streaming, marked delivered on success
  • Continuous recovery worker with exponential backoff
  • Startup cutoff filter (only recover entries from current boot)
  • Comprehensive test coverage for outbox, recovery, and adapter compat

Closes #22376, #9208, #14827, #29126, #23777, #16555, #29128
Related: #28941

Test plan

  • pnpm build passes
  • pnpm test passes (outbox, delivery, recovery, plugin compat tests)
  • Manual: send messages through multiple channels, verify delivery
  • Manual: kill gateway mid-stream, restart, verify recovery picks up pending messages
  • Verify extension adapters compile with sendFinal required

🤖 Generated with Claude Code

nohat and others added 11 commits February 28, 2026 09:34
Replace unbounded file-based delivery queue with queryable SQLite
message_outbox table. Adds TTL/expiry for stale entries, delivery
outcome retention, and one-time legacy file queue import on startup.

Closes openclaw#23777, openclaw#16555, openclaw#29128
…pat layer

Write-ahead delivery pattern: enqueue outbox entry before sending, ack on
success, retry on failure. Continuous outbox worker replaces one-shot
recovery. Plugin channels get durable delivery guarantees via v1/v2
adapter compat layer.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…l and separate ackDelivery errors in recovery
@openclaw-barnacle openclaw-barnacle bot added channel: bluebubbles Channel integration: bluebubbles channel: discord Channel integration: discord channel: googlechat Channel integration: googlechat channel: imessage Channel integration: imessage channel: line Channel integration: line channel: matrix Channel integration: matrix channel: mattermost Channel integration: mattermost channel: msteams Channel integration: msteams channel: nextcloud-talk Channel integration: nextcloud-talk channel: nostr Channel integration: nostr channel: signal Channel integration: signal channel: slack Channel integration: slack channel: telegram Channel integration: telegram channel: tlon Channel integration: tlon channel: whatsapp-web Channel integration: whatsapp-web channel: zalo Channel integration: zalo channel: zalouser Channel integration: zalouser gateway Gateway runtime commands Command implementations labels Feb 28, 2026
@openclaw-barnacle
Copy link

Closing this PR because it looks dirty (too many unrelated or unexpected changes). This usually happens when a branch picks up unrelated commits or a merge went sideways. Please recreate the PR from a clean branch.

19 similar comments
@openclaw-barnacle
Copy link

Closing this PR because it looks dirty (too many unrelated or unexpected changes). This usually happens when a branch picks up unrelated commits or a merge went sideways. Please recreate the PR from a clean branch.

@openclaw-barnacle
Copy link

Closing this PR because it looks dirty (too many unrelated or unexpected changes). This usually happens when a branch picks up unrelated commits or a merge went sideways. Please recreate the PR from a clean branch.

@openclaw-barnacle
Copy link

Closing this PR because it looks dirty (too many unrelated or unexpected changes). This usually happens when a branch picks up unrelated commits or a merge went sideways. Please recreate the PR from a clean branch.

@openclaw-barnacle
Copy link

Closing this PR because it looks dirty (too many unrelated or unexpected changes). This usually happens when a branch picks up unrelated commits or a merge went sideways. Please recreate the PR from a clean branch.

@openclaw-barnacle
Copy link

Closing this PR because it looks dirty (too many unrelated or unexpected changes). This usually happens when a branch picks up unrelated commits or a merge went sideways. Please recreate the PR from a clean branch.

@openclaw-barnacle
Copy link

Closing this PR because it looks dirty (too many unrelated or unexpected changes). This usually happens when a branch picks up unrelated commits or a merge went sideways. Please recreate the PR from a clean branch.

@openclaw-barnacle
Copy link

Closing this PR because it looks dirty (too many unrelated or unexpected changes). This usually happens when a branch picks up unrelated commits or a merge went sideways. Please recreate the PR from a clean branch.

@openclaw-barnacle
Copy link

Closing this PR because it looks dirty (too many unrelated or unexpected changes). This usually happens when a branch picks up unrelated commits or a merge went sideways. Please recreate the PR from a clean branch.

@openclaw-barnacle
Copy link

Closing this PR because it looks dirty (too many unrelated or unexpected changes). This usually happens when a branch picks up unrelated commits or a merge went sideways. Please recreate the PR from a clean branch.

@openclaw-barnacle
Copy link

Closing this PR because it looks dirty (too many unrelated or unexpected changes). This usually happens when a branch picks up unrelated commits or a merge went sideways. Please recreate the PR from a clean branch.

@openclaw-barnacle
Copy link

Closing this PR because it looks dirty (too many unrelated or unexpected changes). This usually happens when a branch picks up unrelated commits or a merge went sideways. Please recreate the PR from a clean branch.

@openclaw-barnacle
Copy link

Closing this PR because it looks dirty (too many unrelated or unexpected changes). This usually happens when a branch picks up unrelated commits or a merge went sideways. Please recreate the PR from a clean branch.

@openclaw-barnacle
Copy link

Closing this PR because it looks dirty (too many unrelated or unexpected changes). This usually happens when a branch picks up unrelated commits or a merge went sideways. Please recreate the PR from a clean branch.

@openclaw-barnacle
Copy link

Closing this PR because it looks dirty (too many unrelated or unexpected changes). This usually happens when a branch picks up unrelated commits or a merge went sideways. Please recreate the PR from a clean branch.

@openclaw-barnacle
Copy link

Closing this PR because it looks dirty (too many unrelated or unexpected changes). This usually happens when a branch picks up unrelated commits or a merge went sideways. Please recreate the PR from a clean branch.

@openclaw-barnacle
Copy link

Closing this PR because it looks dirty (too many unrelated or unexpected changes). This usually happens when a branch picks up unrelated commits or a merge went sideways. Please recreate the PR from a clean branch.

@openclaw-barnacle
Copy link

Closing this PR because it looks dirty (too many unrelated or unexpected changes). This usually happens when a branch picks up unrelated commits or a merge went sideways. Please recreate the PR from a clean branch.

@openclaw-barnacle
Copy link

Closing this PR because it looks dirty (too many unrelated or unexpected changes). This usually happens when a branch picks up unrelated commits or a merge went sideways. Please recreate the PR from a clean branch.

@openclaw-barnacle
Copy link

Closing this PR because it looks dirty (too many unrelated or unexpected changes). This usually happens when a branch picks up unrelated commits or a merge went sideways. Please recreate the PR from a clean branch.

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7f4421ef55

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines 161 to 163
} catch (err) {
const code =
err && typeof err === "object" && "code" in err
? String((err as { code?: unknown }).code)
: null;
if (code !== "ENOENT") {
throw err;
}
// Already removed — no-op.
logVerbose(`delivery-queue: ackDelivery failed: ${String(err)}`);
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Surface ackDelivery write failures to recovery loop

When SQLite writes fail during ackDelivery (for example, a transient SQLITE_BUSY from another process), this catch block logs and returns success, so the recovery pass treats the send as recovered while the row stays queued. On the next outbox pass, the same pre-start entry is still eligible and gets sent again, causing duplicate outbound messages until an ack eventually succeeds. The recovery path already tries to handle ack failures separately, but this swallow prevents that path from running.

Useful? React with 👍 / 👎.

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Feb 28, 2026

Greptile Summary

This PR implements a SQLite-based outbox pattern with write-ahead delivery, recovery worker, and makes sendFinal required on all channel adapters. Major architectural improvement for reliable message delivery.

Key Changes

  • Outbox table: Migrated from file-based queue to SQLite with WAL mode for durability
  • Recovery worker: Continuous background worker with exponential backoff and startup cutoff filter
  • Adapter contract: sendFinal now required on all ChannelOutboundAdapters (22+ extensions updated)
  • Error handling: Separated ackDelivery errors from send errors in recovery path

Critical Issue Found

  • Double queueing bug (reply-dispatcher.ts:175-215): Messages sent through dispatcher are enqueued twice - once by dispatcher, once by deliverOutboundPayloads. This creates duplicate outbox entries and wastes storage/processing.

Performance Concerns

  • dispatchKind stored in JSON payload instead of table column, requiring all pending entries to be loaded and parsed during recovery. Adding a column would enable SQL-level filtering.

Positive Notes

  • Write-ahead logging pattern correctly separates ack failures from send failures (delivery-queue.ts:548-554)
  • Non-final entries properly filtered during recovery to avoid duplicate sends (delivery-queue.ts:471-496)
  • Comprehensive test coverage for core outbox and recovery logic
  • Recovery worker properly integrated with gateway lifecycle

Confidence Score: 2/5

  • This PR has a critical double-queueing bug that will create duplicate outbox entries for every message
  • Score reflects the critical double-queueing bug in reply-dispatcher that causes every message to be enqueued twice. While the architecture is sound and test coverage is good, this bug will cause storage bloat and potential delivery issues in production.
  • Pay close attention to src/auto-reply/reply/reply-dispatcher.ts for the double queueing issue and verify the fix before merging

Last reviewed commit: 7f4421e

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

65 files reviewed, 3 comments

Edit Code Review Agent Settings | Greptile

Comment on lines +477 to +496
for (const entry of pending) {
if (entry.dispatchKind && entry.dispatchKind !== "final") {
try {
db2
.prepare(
`UPDATE message_outbox
SET status='failed_terminal',
error_class='terminal',
terminal_reason='non_final_recovery_skip',
completed_at=?
WHERE id=?`,
)
.run(Date.now(), entry.id);
} catch {
// non-fatal
}
continue;
}
finalOnly.push(entry);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dispatchKind is stored in JSON payload, not as a column. recovery loads ALL pending entries, then filters in memory. storing dispatchKind as a column with an index would allow filtering in SQL: WHERE (dispatch_kind IS NULL OR dispatch_kind = 'final')

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/infra/outbound/delivery-queue.ts
Line: 477-496

Comment:
`dispatchKind` is stored in JSON payload, not as a column. recovery loads ALL pending entries, then filters in memory. storing `dispatchKind` as a column with an index would allow filtering in SQL: `WHERE (dispatch_kind IS NULL OR dispatch_kind = 'final')`

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +48 to +66
CREATE TABLE IF NOT EXISTS message_outbox (
id TEXT NOT NULL PRIMARY KEY,
turn_id TEXT,
channel TEXT NOT NULL,
account_id TEXT NOT NULL DEFAULT '',
target TEXT NOT NULL DEFAULT '',
payload TEXT NOT NULL,
queued_at INTEGER NOT NULL,
status TEXT NOT NULL DEFAULT 'queued',
attempt_count INTEGER NOT NULL DEFAULT 0,
next_attempt_at INTEGER NOT NULL,
last_attempt_at INTEGER,
last_error TEXT,
error_class TEXT,
delivered_at INTEGER,
terminal_reason TEXT,
completed_at INTEGER,
idempotency_key TEXT
);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

schema missing dispatch_kind column but the field is used in delivery-queue.ts (stored in JSON payload). consider adding: dispatch_kind TEXT for efficient filtering during recovery

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/infra/message-lifecycle/db.ts
Line: 48-66

Comment:
schema missing `dispatch_kind` column but the field is used in delivery-queue.ts (stored in JSON payload). consider adding: `dispatch_kind TEXT` for efficient filtering during recovery

How can I resolve this? If you propose a fix, please make it concise.

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Feb 28, 2026

Additional Comments (1)

src/auto-reply/reply/reply-dispatcher.ts
dispatcher enqueues here, then calls options.deliver which eventually calls deliverOutboundPayloads (deliver.ts:274-288), which enqueues again unless skipQueue: true. this creates duplicate outbox entries for every message sent through the dispatcher

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/auto-reply/reply/reply-dispatcher.ts
Line: 175-215

Comment:
dispatcher enqueues here, then calls `options.deliver` which eventually calls `deliverOutboundPayloads` (deliver.ts:274-288), which enqueues again unless `skipQueue: true`. this creates duplicate outbox entries for every message sent through the dispatcher

How can I resolve this? If you propose a fix, please make it concise.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

channel: bluebubbles Channel integration: bluebubbles channel: discord Channel integration: discord channel: feishu Channel integration: feishu channel: googlechat Channel integration: googlechat channel: imessage Channel integration: imessage channel: irc channel: line Channel integration: line channel: matrix Channel integration: matrix channel: mattermost Channel integration: mattermost channel: msteams Channel integration: msteams channel: nextcloud-talk Channel integration: nextcloud-talk channel: nostr Channel integration: nostr channel: signal Channel integration: signal channel: slack Channel integration: slack channel: telegram Channel integration: telegram channel: tlon Channel integration: tlon channel: twitch Channel integration: twitch channel: whatsapp-web Channel integration: whatsapp-web channel: zalo Channel integration: zalo channel: zalouser Channel integration: zalouser commands Command implementations gateway Gateway runtime size: XL

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Telegram sendMessage fails during gateway restart — messages lost with no retry

1 participant