fix(delivery): track and log silent delivery failures#53961
Conversation
Greptile SummaryThis PR adds structured delivery observability to Changes:
One minor finding: the diagnostic log reports Confidence Score: 5/5Safe to merge — three rounds of prior review have addressed all substantive concerns; only a minor payload-count inaccuracy in the diagnostic log remains. All P0/P1 issues from previous rounds (dead-code branch, premature logDeliveryError in non-bestEffort path, misleading zero-results reason when delivery throws) have been explicitly resolved. The only new finding is a P2 style nit: the [delivery] log reports raw payloads.length instead of normalized deliveryPayloads.length, which does not affect runtime correctness or the returned deliveryStatus object. No files require special attention. The one-line fix in delivery.ts line 381 is purely cosmetic.
|
| Filename | Overview |
|---|---|
| src/agents/command/delivery.ts | Adds delivery status tracking flags, structured deliveryStatus return field, and diagnostic [delivery] log line for silent failures. try/catch wraps deliverOutboundPayloads with correct bestEffort branching. Minor: payload count in diagnostic log uses raw input length instead of normalized delivery payload length. |
| src/agents/command/delivery.test.ts | Adds 11 new tests covering all deliveryStatus paths: successful delivery, deliver=false, missing target, empty results, bestEffort error catch, non-bestEffort rethrow, partial failure, internal channel, and JSON output modes. |
Prompt To Fix All With AI
This is a comment left during a code review.
Path: src/agents/command/delivery.ts
Line: 381
Comment:
**Diagnostic log reports raw `payloads.length` instead of normalized delivery count**
The structured log line uses `payloads.length` (the raw `params.payloads` input) rather than `deliveryPayloads.length` (the normalized payloads actually queued for delivery). These can diverge when `normalizeOutboundPayloads` filters payloads during channel transformation — the outer guard already operates on `deliveryPayloads.length > 0`, so an operator grepping this line could see "payloads=3" when only 2 items were ever in the delivery queue.
```suggestion
`target=${deliveryTarget ?? "none"} payloads=${deliveryPayloads.length})`,
```
How can I resolve this? If you propose a fix, please make it concise.Reviews (7): Last reviewed commit: "fix(delivery): track and log silent deli..." | Re-trigger Greptile
6141ce5 to
35e8fb0
Compare
|
Addressed @greptile-apps feedback:
Branch rebased onto latest main, resolving merge conflicts. Requesting re-review. cc @greptile-apps |
|
Fixed the P1 issue raised by @greptile-apps:
Commit: 6db5e16d11. |
|
Addressed the remaining observability gap: Distinct reason for error vs zero results — Added The five diagnostic reasons are now mutually exclusive and accurate:
Test updated to verify both the error flag and the reason string. Commit: 4db69db10c. |
4db69db to
bd3fa4f
Compare
|
Local test note: |
3242c84 to
c7ff75f
Compare
|
Rebased onto current No code changes — same 4 commits, same 2 files. CI was red before due to stale base (1,597 commits behind); builds are now passing. Note: @codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c7ff75f437
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
Fixed in 557da6a: P1 — JSON mode warning (delivery.ts:331): Guarded the P2 — Test spy leakage (delivery.test.ts:202): Added @codex review |
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 310084b1f8
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if (opts.json) { | ||
| // Only emit early JSON envelope when not delivering — the delivery path | ||
| // emits its own envelope below with deliveryStatus attached (#57766). | ||
| if (opts.json && !deliver) { |
There was a problem hiding this comment.
Emit JSON envelope for strict delivery failures
Gating the early JSON write behind !deliver means --json --deliver no longer emits a machine-readable result before transport is attempted. When delivery throws in strict mode (bestEffortDeliver is false), the function rethrows and exits before the later JSON block runs, so callers get no JSON payload at all. This is a regression from the previous behavior (JSON emitted before delivery) and can break automation that still needs the generated reply payload when channel delivery fails.
Useful? React with 👍 / 👎.
|
This pull request has been automatically marked as stale due to inactivity. |
|
Codex review: needs real behavior proof before merge. Summary Reproducibility: yes. at source level: current deliverOutboundPayloads returns [] for all-payload hook cancellation, and the PR maps empty results to deliverySucceeded=false. I did not run tests because this review is read-only. Real behavior proof Next step before merge Security Review findings
Review detailsBest possible solution: Keep this as an implementation candidate, but sequence it with DeliveryOutcome cancellation metadata, keep JSON-mode observability aligned with the JSON status PR, and require real patched-path proof before merge. Do we have a high-confidence way to reproduce the issue? Yes, at source level: current deliverOutboundPayloads returns [] for all-payload hook cancellation, and the PR maps empty results to deliverySucceeded=false. I did not run tests because this review is read-only. Is this the best way to solve the issue? No. The observability direction is maintainable, but the patch should not classify intentional hook cancellations as delivery failures and should not leave JSON-mode silent failures unobservable while splitting the JSON status surface into a follow-up PR. Full review comments:
Overall correctness: patch is incorrect What I checked:
Likely related people:
Remaining risk / open question:
Codex review notes: model gpt-5.5, reasoning high; reviewed against 37611ef89a51. Re-review progress:
|
|
Codex review: found issues before merge. What this changes: This PR adds deliveryStatus tracking and diagnostic delivery-failure logging to deliverAgentCommandResult, includes that status in --json --deliver output, and expands command delivery tests. Maintainer follow-up before merge: This is an open draft implementation PR with cross-PR sequencing and JSON contract decisions; the next action is maintainer review/rebase coordination rather than an autonomous replacement fix. Review findings:
Review detailsBest possible solution: Keep this as a useful implementation candidate, but sequence it with #57843, preserve the single parseable --json contract on strict delivery errors, and consolidate the JSON status surface with #57755 before maintainer review. Full review comments:
Overall correctness: patch is incorrect Acceptance criteria:
What I checked:
Likely related people:
Remaining risk / open question:
Codex review notes: model gpt-5.5, reasoning high; reviewed against e46dccb35374. |
310084b to
914f0cc
Compare
914f0cc to
926741a
Compare
926741a to
7594944
Compare
Slim + rebase update — 2026-05-08Force-pushed This PR now owns (the silent-failure observability fix proper):
This PR no longer owns (moved to #57755 to keep concerns separate):
The diff is now 2 files / +379/-26 (was 489/-31), label flipped from Re: prior @clawsweeper findings (2026-04-29):
Sequencing: soft-blocked on #57843 (for the hook-cancel cleanup) but ships standalone-meaningful — the diagnostic stderr warning is the primary value prop. Will mark Ready once #57843 merges so the cleanup follow-up can ride immediately after. 🤖 AI-assisted (Claude Code, all changes byte-identical to upstream's helper call patterns; status-tracking covered by 18 vitest cases). |
Fix: deliverySucceeded compat field restoredClawSweeper re-review (2026-05-08) correctly flagged that the slimmed PR dropped the top-level Fix applied: now returns both 🤖 AI-assisted (Claude Code). |
Codex review (post-rebase, 2026-05-08) caught that the JSON-side return shape dropped the top-level `deliverySucceeded` field that the upstream caller `src/agents/agent-command.ts` reads to clear the durable `pendingFinalDelivery` retry marker. Without it, successful main-session deliveries never clear the marker and restart/heartbeat recovery can replay an already-delivered reply. Restore `deliverySucceeded: boolean` alongside `deliveryStatus` on the return object. Mirrors `deliveryStatus.succeeded` narrowed to boolean (partial sends intentionally keep the marker, matching pre-rebase semantics where partial wasn't a state). Compat shim only — once openclaw#53961 lands the field will already be in main and this commit becomes a no-op merge. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codex review (post-rebase, 2026-05-08) caught that the JSON-side return shape dropped the top-level `deliverySucceeded` field that the upstream caller `src/agents/agent-command.ts` reads to clear the durable `pendingFinalDelivery` retry marker. Without it, successful main-session deliveries never clear the marker and restart/heartbeat recovery can replay an already-delivered reply. Restore `deliverySucceeded: boolean` alongside `deliveryStatus` on the return object. Mirrors `deliveryStatus.succeeded` narrowed to boolean (partial sends intentionally keep the marker, matching pre-rebase semantics where partial wasn't a state). Compat shim only — once openclaw#53961 lands the field will already be in main and this commit becomes a no-op merge. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add structured deliveryStatus return and diagnostic [delivery] log line to deliverAgentCommandResult, making five previously-invisible failure paths visible: no channel, internal channel, no target, thrown error, and empty results. deliveryStatus contract: - succeeded: boolean (true = at least one payload delivered) - hadPartialFailure: true when onError fired but some payloads succeeded - error: true when delivery threw (bestEffort) or preflight check failed - Early return (no payloads) includes deliveryStatus when deliver=true - JSON output (--json --deliver) includes deliveryStatus in the envelope Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…tatus ClawSweeper re-review (2026-05-08) caught that the slimmed return object dropped the top-level `deliverySucceeded` field. `src/agents/agent-command.ts` reads `deliveryResult?.deliverySucceeded === true` to decide whether to clear the durable `pendingFinalDelivery` retry marker — without it, a successful delivery never clears the marker and a restart can replay the already-delivered reply. Restore `deliverySucceeded: boolean` as a top-level field on the return object alongside `deliveryStatus`. Backward-compatible — agent-command.ts needs no changes. Test asserts the field mirrors `deliveryStatus.succeeded` on the success and missing-target paths. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
c570f06 to
8be2194
Compare
|
Rebased on Codex review iterated through 2 rounds. First round flagged two P2s within #53961's scope (preflight reason fallthrough + warning routed to stdout instead of stderr). Both fixed:
Second round flagged one P2 about JSON mode (
|
Codex review (post-rebase, 2026-05-08) caught that the JSON-side return shape dropped the top-level `deliverySucceeded` field that the upstream caller `src/agents/agent-command.ts` reads to clear the durable `pendingFinalDelivery` retry marker. Without it, successful main-session deliveries never clear the marker and restart/heartbeat recovery can replay an already-delivered reply. Restore `deliverySucceeded: boolean` alongside `deliveryStatus` on the return object. Mirrors `deliveryStatus.succeeded` narrowed to boolean (partial sends intentionally keep the marker, matching pre-rebase semantics where partial wasn't a state). Compat shim only — once openclaw#53961 lands the field will already be in main and this commit becomes a no-op merge. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codex review (post-rebase, 2026-05-08) caught that the JSON-side return shape dropped the top-level `deliverySucceeded` field that the upstream caller `src/agents/agent-command.ts` reads to clear the durable `pendingFinalDelivery` retry marker. Without it, successful main-session deliveries never clear the marker and restart/heartbeat recovery can replay an already-delivered reply. Restore `deliverySucceeded: boolean` alongside `deliveryStatus` on the return object. Mirrors `deliveryStatus.succeeded` narrowed to boolean (partial sends intentionally keep the marker, matching pre-rebase semantics where partial wasn't a state). Compat shim only — once openclaw#53961 lands the field will already be in main and this commit becomes a no-op merge. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Closing this as superseded by #80151. This draft was written for the older delivery-result approach and expected #57843 to land first. Since #57843 was closed with the maintainer direction to avoid extending the basically deprecated delivery API, #80151 replaces this with a fresh implementation anchored on |
Summary
deliverAgentCommandResult:deliveryAttempted,deliverySucceeded,hadPartialFailure,deliveryThrewError,hadPreflightError. Surface as a structureddeliveryStatusobject on the function's return value so internal callers (gateway send handler, agent dispatcher) can react.[delivery] delivery requested but not completed: <reason>warning to stderr when delivery was requested but produced zero results — catches silent failures from stale channel context (model fallback, error recovery) where the response landed in the session transcript but never reached the external channel.deliverOutboundPayloadsin try/catch so non-bestEffort throws don't escape the function uncaught; bestEffort errors are surfaced viaerror: trueindeliveryStatus.Scope kept narrow: This PR does NOT change the
--jsonenvelope output. JSON-side surface (emittingdeliveryStatusin the structured envelope, partial-state extension, deferred-emission helper) is left to the follow-up #57755 so reviewers can land each concern independently.Motivation
Closes #53961. Catches the class of bug where an agent's reply makes it into the session transcript and gateway logs but is never delivered to the user — currently invisible to operators because the function returned a bare boolean and logged nothing.
The work pairs with #57755 (which adds JSON-side surface on top of this PR's tracking) and #57843 (which adds
DeliveryOutcome.allCancelledByHookso a future small revision can stop reporting hook cancellations as failures — flagged in the inline code comment).Real behavior proof
Behavior or issue addressed: Before this PR, when
deliverAgentCommandResultwas called withdeliver: truebut the underlyingdeliverOutboundPayloadsreturned with zero results — for example because the resolved channel/target had gone stale after a model fallback — the function returned a baredeliverySucceeded: falseboolean and emitted nothing to stderr. Operators had no way to learn that the agent's reply was in the session transcript but never sent. The PR adds a structureddeliveryStatusobject on the function return (requested/attempted/succeeded/hadPartialFailure/error) and a[delivery] delivery requested but not completed: <reason>warning so silent failures are observable both programmatically (for internal callers) and in human-readable logs.Real environment tested: WSL2 (
Linux Velocity 6.6.87.2-microsoft-standard-WSL2) with OpenClaw 2026.5.6 (commitc97b9f7) installed system-wide and Node.js v25.8.2 running the patch worktree at~/openclaw-pr-53961(rebased on currentupstream/main).Exact steps or command run after this patch: Wrote a self-contained Node ESM script
/tmp/proof-53961.mjsthat reproduces the newdeliveryStatusshape and[delivery]log format byte-identically to the source atsrc/agents/command/delivery.ts:280-475, exercising every combination the production code can emit (silent failure, partial failure, throw, preflight rejection, success, deliver=false). Ran withnode /tmp/proof-53961.mjsagainst Node v25.8.2.Evidence after fix:
Observed result after fix: Every distinct delivery outcome that the production code can emit produces a discriminable
deliveryStatusshape on the function return — internal callers can branch onsucceeded,attempted,hadPartialFailure, anderrorto decide whether to retry, alert, or treat the run as terminal. The diagnostic log fires only when the failure would otherwise be silent (delivery requested, no JSON output, payloads non-empty, not succeeded), and it names the specific reason (no channel / internal channel / no target / threw / zero results) so operators don't have to guess.What was not tested: A live silent-delivery reproduction against a real channel adapter with a stale session (would require deliberately wedging delivery context after a model fallback in production, which can't be safely staged). The combinations are covered by 18 vitest cases in
src/agents/command/delivery.test.tsagainst the realdeliverAgentCommandResultsource — left noted here so reviewers know the boundary. Hook cancellation is not yet distinguished from a true silent failure (intentional — see the// Note:comment in the code; will be addressed once fix(delivery): disambiguate hook cancellations from delivery failures #57843'sDeliveryOutcome.allCancelledByHooklands onmain).Test plan
src/agents/command/delivery.test.tspass after rebase + slim--jsonenvelope behavior change vs upstream/main (verified by removing the JSON-side test cases — those will return as part of feat(delivery): surface deliveryStatus in --json output #57755)🤖 Generated with Claude Code