Summary
When cron announce delivery fails for multiple consecutive runs (e.g., due to misconfigured delivery target), the failed entries accumulate in the subagent registry. When the configuration is fixed, these entries drain one per heartbeat/trigger instead of being flushed sequentially in a single pass. Each trigger processes one entry, forcing the user to manually trigger N times to clear the backlog.
Reproduction
- Configure an isolated cron job with a broken delivery target (e.g., user:D... instead of channel:D...)
- Let 6+ scheduled runs execute — each produces output but delivery fails
- Fix the delivery target
- Observe: each subsequent trigger delivers one old result, not the full queue
- Must trigger 6+ times to see all past results + fresh output
Expected Behavior
When delivery starts working again, the system should flush the entire pending queue in one pass, oldest-to-newest. In normal operation the queue is 1 (the latest run). When delivery was broken, the queue accumulates — and once delivery recovers, all entries should be delivered sequentially in a single pass, preserving chronological order.
This is NOT a staleness problem — every run's output is valuable (it was the delivery that failed, not the content). The user expects to receive all missed results in order once the delivery issue is resolved.
Root Cause
In src/agents/subagent-registry.ts:
- One-at-a-time processing: retryDeferredCompletedAnnounces (lines 750-776) triggers resumeSubagentRun for pending entries, but only one entry is processed per cycle
- Requires external trigger: each retry needs a heartbeat tick or a manual cron run to process the next entry
- No queue drain loop: there is no mechanism to say "delivery is working now, flush all pending entries for this job"
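A minimal model of the one-at-a-time behavior described above (all names in this sketch are illustrative, not the actual registry API; the real logic lives in retryDeferredCompletedAnnounces):

```typescript
// Hypothetical sketch of the current retry behavior: each heartbeat tick
// attempts delivery for at most one pending entry, so a backlog of N
// failed announces needs N separate ticks to drain.
function retryOnePerTick(
  queue: string[],
  deliver: (entry: string) => boolean,
): string[] {
  if (queue.length === 0) return queue;
  const [head, ...rest] = queue;
  // Only the head entry is tried; everything else waits for the next tick.
  return deliver(head) ? rest : queue;
}
```

With a backlog of six entries and a now-working delivery target, six separate ticks (or manual triggers) are needed before this queue empties, which matches the reproduction above.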
Proposed Fix
When a deferred announce succeeds (delivery confirmed), immediately check for additional pending entries for the same cron job and process them in a loop until the queue is empty:
retryDeferredCompletedAnnounces:
  for each pending entry (oldest first):
    attempt delivery
    if success → continue to next entry
    if failure → stop (delivery still broken)
This gives the correct behavior:
- Normal operation: Queue is 1, delivers immediately
- Recovery after outage: Queue is N, flushes all N entries oldest-to-newest in one pass
- Partial recovery: If delivery breaks again mid-flush, stops at the failure point
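The drain loop above can be sketched in TypeScript. This is a standalone illustration of the proposed behavior, not the registry's real types or functions; PendingAnnounce and deliver are assumed names:

```typescript
// Hypothetical entry shape for a deferred announce awaiting delivery.
interface PendingAnnounce {
  jobId: string;
  createdAt: number; // epoch ms of the run that produced this output
  payload: string;
}

// Drain all pending announces for a job in one pass, oldest first.
// Stops at the first failure so remaining entries stay queued for the
// next retry cycle (the "partial recovery" case).
async function drainPendingAnnounces(
  queue: PendingAnnounce[],
  deliver: (entry: PendingAnnounce) => Promise<boolean>,
): Promise<{ delivered: number; remaining: PendingAnnounce[] }> {
  // Sort oldest-to-newest so results arrive in chronological order.
  const sorted = [...queue].sort((a, b) => a.createdAt - b.createdAt);
  let delivered = 0;
  for (const entry of sorted) {
    const ok = await deliver(entry);
    if (!ok) break; // delivery still broken: stop and keep the rest queued
    delivered++;
  }
  return { delivered, remaining: sorted.slice(delivered) };
}
```

In normal operation the queue holds a single entry and the loop delivers it immediately; after an outage the same loop flushes the whole backlog in one pass.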
Related