Skip to content

fix(outbound): preserve retries for budget-deferred deliveries#91241

Merged
clawsweeper[bot] merged 2 commits into
openclaw:mainfrom
849261680:fix/91212-delivery-budget-deferral
Jun 8, 2026
Merged

fix(outbound): preserve retries for budget-deferred deliveries#91241
clawsweeper[bot] merged 2 commits into
openclaw:mainfrom
849261680:fix/91212-delivery-budget-deferral

Conversation

@849261680

@849261680 849261680 commented Jun 7, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Keep recovery-budget-deferred outbound deliveries pending without calling failDelivery.
  • Preserve retry accounting for real attempted delivery failures; only the unattempted budget-deferral path changes.
  • Update the recovery regression so maxRecoveryMs exhaustion does not consume retry budget.

Addresses the budget-deferral retry-accounting part of #91212. The broader Feishu transport-readiness restart behavior remains intentionally open for maintainer follow-up.

Verification

  • node scripts/run-vitest.mjs src/infra/outbound/delivery-queue.recovery.test.ts — 1 file passed, 22 tests passed.
  • pnpm format:check src/infra/outbound/delivery-queue-recovery.ts src/infra/outbound/delivery-queue.recovery.test.ts — passed.
  • node scripts/run-oxlint.mjs src/infra/outbound/delivery-queue-recovery.ts src/infra/outbound/delivery-queue.recovery.test.ts — passed.
  • .agents/skills/autoreview/scripts/autoreview --mode local — clean, no accepted/actionable findings.

Scope note: this intentionally fixes only the source-proven retry-accounting bug for unattempted budget deferrals. It does not add a fixed startup sleep, reset retry counts across restarts, or change Feishu transport-readiness policy.

Real behavior proof

Behavior addressed: pending outbound deliveries that are deferred because the startup recovery wall-clock budget is exhausted no longer have retryCount or lastAttemptAt changed when no delivery attempt ran.
Real environment tested: Local OpenClaw source checkout using the production delivery queue APIs with a temporary state directory and real queue persistence.
Exact steps or command run after this patch: node --import tsx script that enqueued two Feishu outbound deliveries, called recoverPendingDeliveries({ maxRecoveryMs: 0 }), and printed the pending queue state before and after recovery.
Evidence after fix: Captured terminal output:

{
  "result": {
    "recovered": 0,
    "failed": 0,
    "skippedMaxRetries": 0,
    "deferredBackoff": 0
  },
  "before": [
    {
      "id": "f8481ba9-9b34-4ee6-94ec-84f2dc2d791a",
      "retryCount": 0,
      "lastAttemptAt": null
    },
    {
      "id": "1f442292-c726-4591-8387-540b440e863e",
      "retryCount": 0,
      "lastAttemptAt": null
    }
  ],
  "after": [
    {
      "id": "f8481ba9-9b34-4ee6-94ec-84f2dc2d791a",
      "retryCount": 0,
      "lastAttemptAt": null
    },
    {
      "id": "1f442292-c726-4591-8387-540b440e863e",
      "retryCount": 0,
      "lastAttemptAt": null
    }
  ],
  "deliverCallCount": 0,
  "warning": "Recovery time budget exceeded — remaining entries deferred to next startup"
}

Observed result after fix: no delivery function was called, both queued entries remained pending, and their retry metadata stayed unchanged.
What was not tested: a live Feishu WebSocket restart scenario; this PR avoids transport readiness policy and fixes the generic queue retry-accounting path that is source-reproducible.

@openclaw-barnacle openclaw-barnacle Bot added size: XS proof: supplied External PR includes structured after-fix real behavior proof. labels Jun 7, 2026
@clawsweeper

clawsweeper Bot commented Jun 7, 2026

Copy link
Copy Markdown
Contributor

Codex review: passed. Reviewed June 7, 2026, 9:41 PM ET / 01:41 UTC.

Summary
The branch removes the budget-deferred failDelivery path from outbound recovery and updates the maxRecoveryMs regression expectation so unattempted deliveries keep retry counts at zero.

PR surface: Source -11, Tests -1. Total -12 across 2 files.

Reproducibility: yes. at source level: current main reaches failDelivery from the exhausted recovery-budget branch before any delivery attempt. I did not run tests in this read-only review, but the PR body also supplies terminal output showing the after-fix queue state.

Review metrics: none identified.

Merge readiness
Overall: 🦞 diamond lobster
Proof: 🦞 diamond lobster
Patch quality: 🦞 diamond lobster
Result: ready for maintainer review.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Risk before merge

Maintainer options:

  1. Decide the mitigation before merge
    Land this narrow outbound queue accounting fix and keep the linked Feishu transport-readiness issue open for the remaining startup/readiness behavior.
  2. Pause or close
    Do not merge this PR until maintainers decide whether the risk is worth taking.

Next step before merge

  • [P2] No repair lane is needed because the PR has no actionable findings; the remaining action is normal checks plus automerge or maintainer landing.

Security
Cleared: The diff changes only outbound recovery control flow and a focused test; it does not touch dependencies, workflows, secrets, package metadata, or external code execution surfaces.

Review details

Best possible solution:

Land this narrow outbound queue accounting fix and keep the linked Feishu transport-readiness issue open for the remaining startup/readiness behavior.

Do we have a high-confidence way to reproduce the issue?

Yes, at source level: current main reaches failDelivery from the exhausted recovery-budget branch before any delivery attempt. I did not run tests in this read-only review, but the PR body also supplies terminal output showing the after-fix queue state.

Is this the best way to solve the issue?

Yes. Removing the no-attempt failDelivery call is the narrowest owner-boundary fix for retry accounting; broader transport readiness should stay separate from this generic queue repair.

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 8b03fd1f5f83.

Label changes

Label changes:

  • add status: 🚀 automerge armed: This PR is in ClawSweeper's automerge lane. Sufficient (terminal): The PR body includes after-fix terminal output from a local source checkout using production delivery queue APIs and real queue persistence, showing no delivery call and unchanged retry metadata.
  • remove status: 👀 ready for maintainer look: Current PR status label is status: 🚀 automerge armed.

Label justifications:

  • P1: The PR addresses a restart recovery path that can consume retry budget for queued outbound messages without any send attempt, contributing to real message-loss workflows.
  • rating: 🦞 diamond lobster: Overall readiness is 🦞 diamond lobster; proof is 🦞 diamond lobster and patch quality is 🦞 diamond lobster.
  • status: 🚀 automerge armed: This PR is in ClawSweeper's automerge lane. Sufficient (terminal): The PR body includes after-fix terminal output from a local source checkout using production delivery queue APIs and real queue persistence, showing no delivery call and unchanged retry metadata.
  • proof: sufficient: Contributor real behavior proof is sufficient. The PR body includes after-fix terminal output from a local source checkout using production delivery queue APIs and real queue persistence, showing no delivery call and unchanged retry metadata.
Evidence reviewed

PR surface:

Source -11, Tests -1. Total -12 across 2 files.

View PR surface stats
Area Files Added Removed Net
Source 1 3 14 -11
Tests 1 2 3 -1
Docs 0 0 0 0
Config 0 0 0 0
Generated 0 0 0 0
Other 0 0 0 0
Total 2 5 17 -12

What I checked:

  • Current main increments retries for no-attempt budget deferrals: On current main, the recovery deadline branch calls deferRemainingEntriesForBudget(pending.slice(i), opts.stateDir), and that helper maps each remaining entry to failDelivery, so entries can consume retry budget without any delivery attempt. (src/infra/outbound/delivery-queue-recovery.ts:632, 8b03fd1f5f83)
  • Retry accounting side effect: failDelivery is the storage operation that increments retryCount, records lastAttemptAt, and sets lastError, which makes the current-main budget deferral look like a real failed delivery attempt. (src/infra/outbound/delivery-queue-storage.ts:139, 8b03fd1f5f83)
  • PR head preserves pending entries without consuming retries: At the PR head, the deadline branch logs the deferral and breaks without calling failDelivery; attempted failures still flow through drainQueuedEntry and failDelivery later in the same module. (src/infra/outbound/delivery-queue-recovery.ts:618, aff2b9d16ea0)
  • Focused regression coverage: The updated recovery regression exercises maxRecoveryMs: 0, verifies no delivery function call, keeps all entries pending, and now expects retry counts [0, 0, 0]. (src/infra/outbound/delivery-queue.recovery.test.ts:618, aff2b9d16ea0)
  • Runtime caller checked: Gateway startup imports recoverPendingDeliveries and passes the real outbound delivery function, so the changed accounting path is on the startup recovery surface rather than a test-only helper. (src/gateway/server-runtime-services.ts:142, 8b03fd1f5f83)
  • History provenance for removed behavior: Commit b07312c55b5dba4cec37e339a3ed3d30310876cc introduced the deadline-deferred failDelivery retry increment; the current PR is a narrow reversal of that accounting choice for entries that were not attempted. (src/infra/outbound/delivery-queue.ts, b07312c55b5d)

Likely related people:

  • Stephen Schoettler: Authored the commit that added failDelivery for deadline-deferred outbound entries, which is the behavior this PR changes. (role: introduced budget-deferral retry accounting; confidence: high; commits: b07312c55b5d, 4e92807f10d1, 20f758d4cb13; files: src/infra/outbound/delivery-queue.ts, src/infra/outbound/delivery-queue-recovery.ts)
  • steipete: Committed the budget-deferral retry change and authored the later split from delivery-queue.ts into storage and recovery modules. (role: area refactor owner and merger/committer; confidence: high; commits: b07312c55b5d, 562e4a1791fb; files: src/infra/outbound/delivery-queue.ts, src/infra/outbound/delivery-queue-recovery.ts)
  • nabbilkhan: Authored the initial outbound delivery crash-recovery implementation that introduced the recovery surface this PR modifies. (role: original feature contributor; confidence: medium; commits: 207e2c5affa9; files: src/infra/outbound/delivery-queue.ts)
  • vincentkoc: Recent current-main history/blame for the recovered split files points to work touching the outbound recovery module and tests shortly before this PR. (role: recent area contributor; confidence: medium; commits: f36e54cd688b; files: src/infra/outbound/delivery-queue-recovery.ts, src/infra/outbound/delivery-queue.recovery.test.ts)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@clawsweeper clawsweeper Bot added proof: sufficient ClawSweeper judged the real behavior proof convincing. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. P1 High-priority user-facing bug, regression, or broken workflow. merge-risk: 🚨 other 🚨 Merging this PR has meaningful risk outside the owned taxonomy. labels Jun 7, 2026
@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label Jun 7, 2026
@clawsweeper clawsweeper Bot added proof: sufficient ClawSweeper judged the real behavior proof convincing. rating: 🦞 diamond lobster Very strong PR readiness with only minor maintainer review expected. and removed rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. merge-risk: 🚨 other 🚨 Merging this PR has meaningful risk outside the owned taxonomy. labels Jun 7, 2026
@Takhoffman

Copy link
Copy Markdown
Contributor

@clawsweeper automerge

@clawsweeper

clawsweeper Bot commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

🦞✅
ClawSweeper merged this PR after the passing review.

Source: clawsweeper[bot]
Feedback: structured ClawSweeper verdict: pass (sha=aff2b9d16ea0d365e20914b7bb04c97e13a2a650)
Merge status: merged by ClawSweeper automerge
Merged at: 2026-06-08T01:43:03Z
Merge commit: 75c1790b504c

What merged:

  • The branch removes the budget-deferred failDelivery path from outbound recovery and updates the maxRecoveryMs regression expectation so unattempted deliveries keep retry counts at zero.
  • PR surface: Source -11, Tests -1. Total -12 across 2 files.
  • Reproducibility: yes. at source level: current main reaches failDelivery from the exhausted recovery-budge ... in this read-only review, but the PR body also supplies terminal output showing the after-fix queue state.

Automerge notes:

  • PR branch already contained follow-up commit before automerge: fix(outbound): preserve retries for budget-deferred deliveries

The automerge loop is complete.

Automerge progress:

  • 2026-06-08 01:22:25 UTC review queued f3fa2a022e4b (queued)
  • 2026-06-08 01:34:43 UTC review queued aff2b9d16ea0 (after repair)
  • 2026-06-08 01:41:28 UTC review passed aff2b9d16ea0 (structured ClawSweeper verdict: pass (sha=aff2b9d16ea0d365e20914b7bb04c97e13a2a...)
  • 2026-06-08 01:43:05 UTC merged aff2b9d16ea0 (merged by ClawSweeper automerge)

@clawsweeper clawsweeper Bot added the clawsweeper:automerge Maintainer opted this PR into bounded ClawSweeper-reviewed automerge label Jun 8, 2026
@clawsweeper clawsweeper Bot force-pushed the fix/91212-delivery-budget-deferral branch from f3fa2a0 to aff2b9d Compare June 8, 2026 01:34
@clawsweeper clawsweeper Bot added status: 🚀 automerge armed This PR is in ClawSweeper's automerge lane. and removed status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. labels Jun 8, 2026
@clawsweeper clawsweeper Bot merged commit 75c1790 into openclaw:main Jun 8, 2026
166 of 168 checks passed
github-actions Bot pushed a commit to Desicool/openclaw that referenced this pull request Jun 8, 2026
…law#91241)

Summary:
- The branch removes the budget-deferred `failDelivery` path from outbound recovery and updates the `maxRecoveryMs` regression expectation so unattempted deliveries keep retry counts at zero.
- PR surface: Source -11, Tests -1. Total -12 across 2 files.
- Reproducibility: yes. at source level: current main reaches `failDelivery` from the exhausted recovery-budge ...  in this read-only review, but the PR body also supplies terminal output showing the after-fix queue state.

Automerge notes:
- PR branch already contained follow-up commit before automerge: fix(outbound): preserve retries for budget-deferred deliveries

Validation:
- ClawSweeper review passed for head aff2b9d.
- Required merge gates passed before the squash merge.

Prepared head SHA: aff2b9d
Review: openclaw#91241 (comment)

Co-authored-by: 宇宙熊Yzx <53250620+849261680@users.noreply.github.com>
Co-authored-by: clawsweeper <274271284+clawsweeper[bot]@users.noreply.github.com>
Co-authored-by: clawsweeper[bot] <274271284+clawsweeper[bot]@users.noreply.github.com>
Approved-by: takhoffman
Co-authored-by: takhoffman <781889+takhoffman@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

clawsweeper:automerge Maintainer opted this PR into bounded ClawSweeper-reviewed automerge P1 High-priority user-facing bug, regression, or broken workflow. proof: sufficient ClawSweeper judged the real behavior proof convincing. proof: supplied External PR includes structured after-fix real behavior proof. rating: 🦞 diamond lobster Very strong PR readiness with only minor maintainer review expected. size: XS status: 🚀 automerge armed This PR is in ClawSweeper's automerge lane.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants