Fix in-session reminder delivery confirmation (current-session reminders)#1387
Merged
Aaronontheweb merged 5 commits intoJun 11, 2026
Merged
Conversation
DeliveryRequired current-session reminders confirmed delivery via a fire-and-forget EventStream broadcast that the execution actor awaited inside RunTask. RunTask suspends the actor mailbox until the task completes, so the actor could never process the ReminderDeliveryObserved message it was waiting for. Existing tests passed only weakly (asserting "no failure yet" at t0) or via the timeout, so this was never caught. The result: every such reminder fell through to the 1-hour timeout, reported failure, and auto-disabled after 5 consecutive "failures" — regardless of whether the post actually succeeded. Separately, Discord and Mattermost set their per-turn delivered flag unconditionally, even when the post threw, so a failed post was reported as a successful delivery. Slack and SignalR already gated on real success. Replace the EventStream broadcast (the codebase's only use of it) with a point-to-point ReminderDeliveryResult told directly to the execution actor, carried as MessageSource.DeliveryObserver. The result reports success OR failure, so a failed delivery redelivers immediately instead of waiting out the timeout, which is demoted to a crash backstop. The execution actor now waits via actor state + a scheduled backstop message rather than awaiting inside RunTask, so the confirmation is actually received. - Gate Discord/Mattermost delivered flag on real post success - Add MessageSource.DeliveryObserver (ephemeral, never persisted) - All four binding actors tell ReminderDeliveryResult on turn completion - Cross-channel contract tests for both success and failed-post delivery - Execution-actor fast-fail test (explicit failure, not timeout) Known gap: SignalR's new observer tell has no dedicated test (not in the contract harness; faked IHubContext fixture is disproportionate).
Addresses defects found reviewing the point-to-point delivery-confirmation change: - Restart loop: a current-session reminder re-run from the deferred queue runs with a null envelope (the concurrency gate already acked+dropped it). The success path's TryAckEnvelopeAsync called AckAsync(null), which threw inside an unguarded RunTask and escalated to a supervisor restart, re-posting the reminder in a loop. Skip the ack when the envelope is null, and wrap the ack RunTask so a fault reports failure instead of restarting. - Concurrent-reminder clobber: the single per-actor _reminderDeliveryObserver field was overwritten when two reminders targeted the same session before the first turn completed, misrouting the first result and stranding both on the backstop. Key observers by reminder delivery id (Dictionary) in all four channels so each turn resolves its own observer on TurnCompleted. - SignalR result drop: HandleDeliveryResult gated on an exact ChannelType match layered on an already-unique point-to-point Self ref. A cold SignalR actor reports its default Tui channel, mismatching the origin SignalR and silently dropping a valid result until the backstop. Correlate on the delivery key alone. - Pipeline reinit: reset per-turn delivery flags (Slack leaked a stale _postedThisTurn into the next turn, falsely reporting a later reminder delivered) and tell in-flight observers Delivered=false so they redeliver immediately instead of stalling on the backstop. Adds a cross-channel regression test for concurrent same-session reminders.
Discord and Mattermost posted the "I didn't manage to produce a reply" fallback whenever a turn ended without a successful post — conflating a genuinely empty turn (model produced nothing) with a failed post (model produced a reply, transport rejected it). On a failed post the message is misleading (a reply WAS produced, and the session was already notified via DeliveryFailed) and, for reminders, doubles up with the reply that arrives on redelivery. Track _postFailedThisTurn (set when a content post/upload throws) and post the fallback only for genuinely empty turns (!delivered && !postFailed) — matching the behavior Slack already had via _lastFailedPost. Adds a cross-channel regression test (Failed_content_post_does_not_post_ empty_turn_fallback) plus a SetReplyClientThrowsOnce harness hook that fails one post and recovers, so the fallback attempt is observable.
The session deduplicates Akka.Reminders redeliveries by the delivery key
({reminderId}:{fireTimestampMs}), but the key was built from _dispatchedAt
— a wall-clock captured fresh in each ReminderExecutionActor. Every
redelivery spawns a new execution actor with a new _dispatchedAt, so the
key drifted and never matched the processed key: the dedup never fired and
the reminder was delivered again (duplicate).
Build the key from the envelope's scheduled fire time (DueTimeUtc), which
is identical across redeliveries of the same fire — matching the key's
documented {reminderId}:{fireTimestampMs} definition. No new construct;
just sources the timestamp from the right place. Deferred re-runs carry no
envelope and are never redelivered, so the dispatch time remains a safe
fallback there.
Adds a regression test asserting the dispatched key is derived from the
envelope's fire time rather than the dispatch wall-clock.
Aaronontheweb
commented
Jun 11, 2026
|
|
||
| var resultB = await observerB.ExpectMsgAsync<ReminderDeliveryResult>( | ||
| TimeSpan.FromSeconds(5), cancellationToken: ct); | ||
| Assert.Equal(keyB, resultB.ReminderDeliveryKey); |
| binding.Tell(new DeliverTrustedSessionTurn(sid, "run the reminder", CreateReminderSource(reminderKey, observer.Ref))); | ||
|
|
||
| CreateBindingActor(sid, pipeline, detector); | ||
| var result = await observer.ExpectMsgAsync<ReminderDeliveryResult>( |
| TimeSpan.FromSeconds(5), cancellationToken: ct); | ||
| Assert.Equal(reminderKey, result.ReminderDeliveryKey); | ||
| Assert.Equal(ExpectedChannelType, result.ChannelType); | ||
| Assert.False(result.Delivered); |
| // A distinctive fire time far from "now": a key built from the dispatch | ||
| // wall-clock would not match it. | ||
| var fireTime = DateTimeOffset.FromUnixTimeMilliseconds(1_700_000_000_000); | ||
| var envelope = new ReminderEnvelope<ReminderPayload>( |
| /// <see cref="IActorRef"/> is not serializable and <see cref="MessageSource"/> | ||
| /// is never persisted. | ||
| /// </summary> | ||
| public IActorRef? DeliveryObserver { get; init; } |
Collaborator
Author
There was a problem hiding this comment.
not happy about how many properties are on MessageSource now but I guess that's not the end of the world since most are optional. Probably should refactor this into something resembling a DU.
| return; | ||
| _awaitingDeliveryResult = true; | ||
| _expectedReminderDeliveryKey = reminderDeliveryKey; | ||
| _deliveryTimeoutCancelable = Context.System.Scheduler.ScheduleTellOnceCancelable( |
Collaborator
Author
There was a problem hiding this comment.
have to use this instead of IWithTimers because akkadotnet/akka.net#7630 is not yet implemented
| public sealed record ReminderDeliveryResult( | ||
| string ReminderDeliveryKey, | ||
| Channels.ChannelType ChannelType, | ||
| bool Delivered, |
Collaborator
Author
There was a problem hiding this comment.
LGTM, disambiguates failed deliveries from successful ones.
This was referenced Jun 11, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
In-session reminders (
delivery_kind=current_session,DeliveryRequired=true) did not reliably confirm delivery, and could silently stop firing.Two root causes:
ReminderExecutionActorawaited the delivery signal insideRunTask, which suspends the actor mailbox until the task completes — so the actor could never process theReminderDeliveryObservedmessage it was waiting for. EveryDeliveryRequiredcurrent-session reminder therefore fell through to the 1-hour timeout, reported failure, and auto-disabled after 5 consecutive "failures" regardless of whether the post actually succeeded. Existing tests only passed weakly (assert-at-t0) or via the timeout, so it was never caught.Approach
Replace the fire-and-forget
EventStreambroadcast (the codebase's only use of it) with a point-to-pointReminderDeliveryResulttold directly to the dispatching execution actor, carried as a transient, never-persistedMessageSource.DeliveryObserverref. The result reports success or failure, so a failed delivery redelivers immediately instead of waiting out the (now backstop-only) timeout. The execution actor waits via actor state + a scheduled backstop message rather than awaiting insideRunTask, so the confirmation is actually received.What's included
Core fix
SafeReplyAsyncreturnsTask<bool>).MessageSource.DeliveryObserver(ephemeral) + point-to-pointReminderDeliveryResult.ReminderExecutionActor(no await-in-RunTask).Hardening (from a multi-angle review)
AckAsync(null)inside an unguardedRunTask, escalating to a supervisor restart + duplicate-post loop. Null-guard the ack and wrap theRunTask.Dictionary) in all four channels.ChannelTypeguard that failed closed (a cold SignalR actor reports its defaultTui); the point-to-point ref + unique key already correlate.Delivered=falseso they redeliver fast.Empty-vs-failed fallback
Testing
Delivered=false, concurrent same-session reminders each get their own result, and failed posts don't trigger the empty-turn fallback.Known gap: SignalR's binding lacks a dedicated contract test (it's not in the shared harness; a faked
IHubContextfixture is disproportionate). Its change mirrors the three covered channels.