fix(outbound): preserve retries for budget-deferred deliveries by 849261680 · Pull Request #91241 · openclaw/openclaw

849261680 · 2026-06-07T20:32:58Z

Summary

Keep recovery-budget-deferred outbound deliveries pending without calling failDelivery.
Preserve retry accounting for real attempted delivery failures; only the unattempted budget-deferral path changes.
Update the recovery regression so maxRecoveryMs exhaustion does not consume retry budget.

Addresses the budget-deferral retry-accounting part of #91212. The broader Feishu transport-readiness restart behavior remains intentionally open for maintainer follow-up.

Verification

node scripts/run-vitest.mjs src/infra/outbound/delivery-queue.recovery.test.ts — 1 file passed, 22 tests passed.
pnpm format:check src/infra/outbound/delivery-queue-recovery.ts src/infra/outbound/delivery-queue.recovery.test.ts — passed.
node scripts/run-oxlint.mjs src/infra/outbound/delivery-queue-recovery.ts src/infra/outbound/delivery-queue.recovery.test.ts — passed.
.agents/skills/autoreview/scripts/autoreview --mode local — clean, no accepted/actionable findings.

Scope note: this intentionally fixes only the source-proven retry-accounting bug for unattempted budget deferrals. It does not add a fixed startup sleep, reset retry counts across restarts, or change Feishu transport-readiness policy.

Real behavior proof

Behavior addressed: pending outbound deliveries that are deferred because the startup recovery wall-clock budget is exhausted no longer have retryCount or lastAttemptAt changed when no delivery attempt ran.
Real environment tested: Local OpenClaw source checkout using the production delivery queue APIs with a temporary state directory and real queue persistence.
Exact steps or command run after this patch: node --import tsx script that enqueued two Feishu outbound deliveries, called recoverPendingDeliveries({ maxRecoveryMs: 0 }), and printed the pending queue state before and after recovery.
Evidence after fix: Captured terminal output:

{
  "result": {
    "recovered": 0,
    "failed": 0,
    "skippedMaxRetries": 0,
    "deferredBackoff": 0
  },
  "before": [
    {
      "id": "f8481ba9-9b34-4ee6-94ec-84f2dc2d791a",
      "retryCount": 0,
      "lastAttemptAt": null
    },
    {
      "id": "1f442292-c726-4591-8387-540b440e863e",
      "retryCount": 0,
      "lastAttemptAt": null
    }
  ],
  "after": [
    {
      "id": "f8481ba9-9b34-4ee6-94ec-84f2dc2d791a",
      "retryCount": 0,
      "lastAttemptAt": null
    },
    {
      "id": "1f442292-c726-4591-8387-540b440e863e",
      "retryCount": 0,
      "lastAttemptAt": null
    }
  ],
  "deliverCallCount": 0,
  "warning": "Recovery time budget exceeded — remaining entries deferred to next startup"
}

Observed result after fix: no delivery function was called, both queued entries remained pending, and their retry metadata stayed unchanged.
What was not tested: a live Feishu WebSocket restart scenario; this PR avoids transport readiness policy and fixes the generic queue retry-accounting path that is source-reproducible.

clawsweeper · 2026-06-07T20:35:13Z

Codex review: passed. Reviewed June 7, 2026, 9:41 PM ET / 01:41 UTC.

Summary
The branch removes the budget-deferred failDelivery path from outbound recovery and updates the maxRecoveryMs regression expectation so unattempted deliveries keep retry counts at zero.

PR surface: Source -11, Tests -1. Total -12 across 2 files.

Reproducibility: yes. at source level: current main reaches failDelivery from the exhausted recovery-budget branch before any delivery attempt. I did not run tests in this read-only review, but the PR body also supplies terminal output showing the after-fix queue state.

Review metrics: none identified.

Merge readiness
Overall: 🦞 diamond lobster
Proof: 🦞 diamond lobster
Patch quality: 🦞 diamond lobster
Result: ready for maintainer review.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Risk before merge

[P1] This PR only fixes generic retry accounting for budget-deferred outbound recovery; the live Feishu transport-readiness restart behavior remains tracked in [Bug]: delivery-recovery 0 recovered / N failed after gateway restart — recovery starts before channel transport ready #91212.

Maintainer options:

Decide the mitigation before merge
Land this narrow outbound queue accounting fix and keep the linked Feishu transport-readiness issue open for the remaining startup/readiness behavior.
Pause or close
Do not merge this PR until maintainers decide whether the risk is worth taking.

Next step before merge

[P2] No repair lane is needed because the PR has no actionable findings; the remaining action is normal checks plus automerge or maintainer landing.

Security
Cleared: The diff changes only outbound recovery control flow and a focused test; it does not touch dependencies, workflows, secrets, package metadata, or external code execution surfaces.

Review details

Best possible solution:

Land this narrow outbound queue accounting fix and keep the linked Feishu transport-readiness issue open for the remaining startup/readiness behavior.

Do we have a high-confidence way to reproduce the issue?

Yes, at source level: current main reaches failDelivery from the exhausted recovery-budget branch before any delivery attempt. I did not run tests in this read-only review, but the PR body also supplies terminal output showing the after-fix queue state.

Is this the best way to solve the issue?

Yes. Removing the no-attempt failDelivery call is the narrowest owner-boundary fix for retry accounting; broader transport readiness should stay separate from this generic queue repair.

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 8b03fd1f5f83.

Label changes

Label changes:

add status: 🚀 automerge armed: This PR is in ClawSweeper's automerge lane. Sufficient (terminal): The PR body includes after-fix terminal output from a local source checkout using production delivery queue APIs and real queue persistence, showing no delivery call and unchanged retry metadata.
remove status: 👀 ready for maintainer look: Current PR status label is status: 🚀 automerge armed.

Label justifications:

P1: The PR addresses a restart recovery path that can consume retry budget for queued outbound messages without any send attempt, contributing to real message-loss workflows.
rating: 🦞 diamond lobster: Overall readiness is 🦞 diamond lobster; proof is 🦞 diamond lobster and patch quality is 🦞 diamond lobster.
status: 🚀 automerge armed: This PR is in ClawSweeper's automerge lane. Sufficient (terminal): The PR body includes after-fix terminal output from a local source checkout using production delivery queue APIs and real queue persistence, showing no delivery call and unchanged retry metadata.
proof: sufficient: Contributor real behavior proof is sufficient. The PR body includes after-fix terminal output from a local source checkout using production delivery queue APIs and real queue persistence, showing no delivery call and unchanged retry metadata.

Evidence reviewed

PR surface:

Source -11, Tests -1. Total -12 across 2 files.

View PR surface stats

Area	Files	Added	Removed	Net
Source	1	3	14	-11
Tests	1	2	3	-1
Docs	0	0	0	0
Config	0	0	0	0
Generated	0	0	0	0
Other	0	0	0	0
Total	2	5	17	-12

What I checked:

Current main increments retries for no-attempt budget deferrals: On current main, the recovery deadline branch calls deferRemainingEntriesForBudget(pending.slice(i), opts.stateDir), and that helper maps each remaining entry to failDelivery, so entries can consume retry budget without any delivery attempt. (src/infra/outbound/delivery-queue-recovery.ts:632, 8b03fd1f5f83)
Retry accounting side effect: failDelivery is the storage operation that increments retryCount, records lastAttemptAt, and sets lastError, which makes the current-main budget deferral look like a real failed delivery attempt. (src/infra/outbound/delivery-queue-storage.ts:139, 8b03fd1f5f83)
PR head preserves pending entries without consuming retries: At the PR head, the deadline branch logs the deferral and breaks without calling failDelivery; attempted failures still flow through drainQueuedEntry and failDelivery later in the same module. (src/infra/outbound/delivery-queue-recovery.ts:618, aff2b9d16ea0)
Focused regression coverage: The updated recovery regression exercises maxRecoveryMs: 0, verifies no delivery function call, keeps all entries pending, and now expects retry counts [0, 0, 0]. (src/infra/outbound/delivery-queue.recovery.test.ts:618, aff2b9d16ea0)
Runtime caller checked: Gateway startup imports recoverPendingDeliveries and passes the real outbound delivery function, so the changed accounting path is on the startup recovery surface rather than a test-only helper. (src/gateway/server-runtime-services.ts:142, 8b03fd1f5f83)
History provenance for removed behavior: Commit b07312c55b5dba4cec37e339a3ed3d30310876cc introduced the deadline-deferred failDelivery retry increment; the current PR is a narrow reversal of that accounting choice for entries that were not attempted. (src/infra/outbound/delivery-queue.ts, b07312c55b5d)

Likely related people:

Stephen Schoettler: Authored the commit that added failDelivery for deadline-deferred outbound entries, which is the behavior this PR changes. (role: introduced budget-deferral retry accounting; confidence: high; commits: b07312c55b5d, 4e92807f10d1, 20f758d4cb13; files: src/infra/outbound/delivery-queue.ts, src/infra/outbound/delivery-queue-recovery.ts)
steipete: Committed the budget-deferral retry change and authored the later split from delivery-queue.ts into storage and recovery modules. (role: area refactor owner and merger/committer; confidence: high; commits: b07312c55b5d, 562e4a1791fb; files: src/infra/outbound/delivery-queue.ts, src/infra/outbound/delivery-queue-recovery.ts)
nabbilkhan: Authored the initial outbound delivery crash-recovery implementation that introduced the recovery surface this PR modifies. (role: original feature contributor; confidence: medium; commits: 207e2c5affa9; files: src/infra/outbound/delivery-queue.ts)
vincentkoc: Recent current-main history/blame for the recovered split files points to work touching the outbound recovery module and tests shortly before this PR. (role: recent area contributor; confidence: medium; commits: f36e54cd688b; files: src/infra/outbound/delivery-queue-recovery.ts, src/infra/outbound/delivery-queue.recovery.test.ts)

What the crustacean ranks mean

🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works

ClawSweeper keeps one durable marker-backed review comment per issue or PR.
Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
Maintainers can also comment @clawsweeper review to request a fresh review only.
Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

Takhoffman · 2026-06-08T01:22:25Z

@clawsweeper automerge

clawsweeper · 2026-06-08T01:22:28Z

🦞✅
ClawSweeper merged this PR after the passing review.

Source: clawsweeper[bot]
Feedback: structured ClawSweeper verdict: pass (sha=aff2b9d16ea0d365e20914b7bb04c97e13a2a650)
Merge status: merged by ClawSweeper automerge
Merged at: 2026-06-08T01:43:03Z
Merge commit: 75c1790b504c

What merged:

The branch removes the budget-deferred failDelivery path from outbound recovery and updates the maxRecoveryMs regression expectation so unattempted deliveries keep retry counts at zero.
PR surface: Source -11, Tests -1. Total -12 across 2 files.
Reproducibility: yes. at source level: current main reaches failDelivery from the exhausted recovery-budge ... in this read-only review, but the PR body also supplies terminal output showing the after-fix queue state.

Automerge notes:

PR branch already contained follow-up commit before automerge: fix(outbound): preserve retries for budget-deferred deliveries

The automerge loop is complete.

Automerge progress:

2026-06-08 01:22:25 UTC review queued f3fa2a022e4b (queued)

2026-06-08 01:23:22 UTC repair queued f3fa2a022e4b (autonomous) Run: https://github.com/openclaw/clawsweeper/actions/runs/27110873288

2026-06-08 01:25:09 UTC repair started (running) in 1s Run: https://github.com/openclaw/clawsweeper/actions/runs/27110873288 automerge-openclaw-openclaw-91241

2026-06-08 01:25:31 UTC validation plan (passed) in 23s Run: https://github.com/openclaw/clawsweeper/actions/runs/27110873288 pnpm check:changed; pnpm lint; pnpm check:test-types

2026-06-08 01:25:59 UTC Codex write preflight (passed) in 51s Run: https://github.com/openclaw/clawsweeper/actions/runs/27110873288 danger-full-access

2026-06-08 01:31:38 UTC Codex edit 1 752ae02a649d (complete) in 6m 30s Run: https://github.com/openclaw/clawsweeper/actions/runs/27110873288 exit 0

2026-06-08 01:34:37 UTC validation and review 1 aff2b9d16ea0 (base moved) in 9m 29s Run: https://github.com/openclaw/clawsweeper/actions/runs/27110873288 rebased

2026-06-08 01:34:43 UTC repair completed aff2b9d16ea0 (branch updated) in 9m 35s Run: https://github.com/openclaw/clawsweeper/actions/runs/27110873288 initial automerge rebase is delegated to Codex repair

2026-06-08 01:34:43 UTC review queued aff2b9d16ea0 (after repair)

2026-06-08 01:43:15 UTC automerge wait aff2b9d16ea0 (merged) in 18m 7s Run: https://github.com/openclaw/clawsweeper/actions/runs/27110873288 pull request already merged

2026-06-08 01:41:28 UTC review passed aff2b9d16ea0 (structured ClawSweeper verdict: pass (sha=aff2b9d16ea0d365e20914b7bb04c97e13a2a...)

2026-06-08 01:43:05 UTC merged aff2b9d16ea0 (merged by ClawSweeper automerge)

2026-06-08 01:43:16 UTC repair finished aff2b9d16ea0 (pushed) in 18m 8s Run: https://github.com/openclaw/clawsweeper/actions/runs/27110873288 repair_contributor_branch

…law#91241) Summary: - The branch removes the budget-deferred `failDelivery` path from outbound recovery and updates the `maxRecoveryMs` regression expectation so unattempted deliveries keep retry counts at zero. - PR surface: Source -11, Tests -1. Total -12 across 2 files. - Reproducibility: yes. at source level: current main reaches `failDelivery` from the exhausted recovery-budge ... in this read-only review, but the PR body also supplies terminal output showing the after-fix queue state. Automerge notes: - PR branch already contained follow-up commit before automerge: fix(outbound): preserve retries for budget-deferred deliveries Validation: - ClawSweeper review passed for head aff2b9d. - Required merge gates passed before the squash merge. Prepared head SHA: aff2b9d Review: openclaw#91241 (comment) Co-authored-by: 宇宙熊Yzx <53250620+849261680@users.noreply.github.com> Co-authored-by: clawsweeper <274271284+clawsweeper[bot]@users.noreply.github.com> Co-authored-by: clawsweeper[bot] <274271284+clawsweeper[bot]@users.noreply.github.com> Approved-by: takhoffman Co-authored-by: takhoffman <781889+takhoffman@users.noreply.github.com>

openclaw-barnacle Bot added size: XS proof: supplied External PR includes structured after-fix real behavior proof. labels Jun 7, 2026

openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label Jun 7, 2026

clawsweeper Bot added the clawsweeper:automerge Maintainer opted this PR into bounded ClawSweeper-reviewed automerge label Jun 8, 2026

849261680 and others added 2 commits June 8, 2026 01:34

fix(outbound): preserve retries for budget-deferred deliveries

1858898

fix(outbound): preserve retries for budget-deferred deliveries

aff2b9d

clawsweeper Bot force-pushed the fix/91212-delivery-budget-deferral branch from f3fa2a0 to aff2b9d Compare June 8, 2026 01:34

clawsweeper Bot added status: 🚀 automerge armed This PR is in ClawSweeper's automerge lane. and removed status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. labels Jun 8, 2026

clawsweeper Bot merged commit 75c1790 into openclaw:main Jun 8, 2026
166 of 168 checks passed

Haderach-Ram mentioned this pull request Jun 8, 2026

Ecosystem Digest — 2026-06-08 Haderach-Ram/openclaw-radar#32

Open

github-actions Bot mentioned this pull request Jun 8, 2026

📡 Upstream Digest — 2026-06-08 02:48 UTC curtismercier/openclaw-mods#1036

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(outbound): preserve retries for budget-deferred deliveries#91241

fix(outbound): preserve retries for budget-deferred deliveries#91241
clawsweeper[bot] merged 2 commits into
openclaw:mainfrom
849261680:fix/91212-delivery-budget-deferral

849261680 commented Jun 7, 2026 •

edited

Loading

Uh oh!

clawsweeper Bot commented Jun 7, 2026 •

edited

Loading

Uh oh!

Takhoffman commented Jun 8, 2026

Uh oh!

clawsweeper Bot commented Jun 8, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

849261680 commented Jun 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Verification

Real behavior proof

Uh oh!

clawsweeper Bot commented Jun 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Takhoffman commented Jun 8, 2026

Uh oh!

clawsweeper Bot commented Jun 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

849261680 commented Jun 7, 2026 •

edited

Loading

clawsweeper Bot commented Jun 7, 2026 •

edited

Loading

clawsweeper Bot commented Jun 8, 2026 •

edited

Loading