Skip to content

Fix in-session reminder delivery confirmation (current-session reminders)#1387

Merged
Aaronontheweb merged 5 commits into
netclaw-dev:devfrom
Aaronontheweb:fix/reminder-delivery-ack
Jun 11, 2026
Merged

Fix in-session reminder delivery confirmation (current-session reminders)#1387
Aaronontheweb merged 5 commits into
netclaw-dev:devfrom
Aaronontheweb:fix/reminder-delivery-ack

Conversation

@Aaronontheweb

Copy link
Copy Markdown
Collaborator

Problem

In-session reminders (delivery_kind=current_session, DeliveryRequired=true) did not reliably confirm delivery, and could silently stop firing.

Two root causes:

  1. Confirmation never reached the execution actor. ReminderExecutionActor awaited the delivery signal inside RunTask, which suspends the actor mailbox until the task completes — so the actor could never process the ReminderDeliveryObserved message it was waiting for. Every DeliveryRequired current-session reminder therefore fell through to the 1-hour timeout, reported failure, and auto-disabled after 5 consecutive "failures" regardless of whether the post actually succeeded. Existing tests only passed weakly (assert-at-t0) or via the timeout, so it was never caught.
  2. Discord/Mattermost reported false success. Both set their per-turn "delivered" flag unconditionally, even when the post threw — so a failed post was reported as a successful delivery (silent loss). Slack and SignalR already gated on real success.

Approach

Replace the fire-and-forget EventStream broadcast (the codebase's only use of it) with a point-to-point ReminderDeliveryResult told directly to the dispatching execution actor, carried as a transient, never-persisted MessageSource.DeliveryObserver ref. The result reports success or failure, so a failed delivery redelivers immediately instead of waiting out the (now backstop-only) timeout. The execution actor waits via actor state + a scheduled backstop message rather than awaiting inside RunTask, so the confirmation is actually received.

What's included

Core fix

  • Gate Discord/Mattermost delivered flag on real post success (SafeReplyAsync returns Task<bool>).
  • MessageSource.DeliveryObserver (ephemeral) + point-to-point ReminderDeliveryResult.
  • Message-driven wait in ReminderExecutionActor (no await-in-RunTask).

Hardening (from a multi-angle review)

  • Restart loop: a deferred-queue re-run has a null envelope; the success path called AckAsync(null) inside an unguarded RunTask, escalating to a supervisor restart + duplicate-post loop. Null-guard the ack and wrap the RunTask.
  • Concurrent-reminder clobber: a single observer field was overwritten when two reminders targeted the same session; key observers by reminder delivery id (Dictionary) in all four channels.
  • SignalR result drop: removed a redundant exact-ChannelType guard that failed closed (a cold SignalR actor reports its default Tui); the point-to-point ref + unique key already correlate.
  • Pipeline reinit: reset per-turn delivery flags (Slack leaked a stale flag → false success) and tell in-flight observers Delivered=false so they redeliver fast.

Empty-vs-failed fallback

  • Discord/Mattermost posted "I didn't manage to produce a reply" on a failed post (a reply was produced; transport failed) — misleading and doubled up with the redelivered reply. Now suppressed for failed posts (matching Slack's existing behavior); the fallback only fires for genuinely empty turns.

Testing

  • New cross-channel contract tests: delivery success, failed-post reports Delivered=false, concurrent same-session reminders each get their own result, and failed posts don't trigger the empty-turn fallback.
  • New execution-actor test: explicit delivery failure reports fast (not via timeout).
  • Full reminders/channels/sessions and daemon-gateway suites pass; slopwatch clean; copyright headers present.

Known gap: SignalR's binding lacks a dedicated contract test (it's not in the shared harness; a faked IHubContext fixture is disproportionate). Its change mirrors the three covered channels.

DeliveryRequired current-session reminders confirmed delivery via a
fire-and-forget EventStream broadcast that the execution actor awaited
inside RunTask. RunTask suspends the actor mailbox until the task
completes, so the actor could never process the ReminderDeliveryObserved
message it was waiting for. Existing tests passed only weakly (asserting
"no failure yet" at t0) or via the timeout, so this was never caught.
The result: every such reminder fell through to the 1-hour timeout,
reported failure, and auto-disabled after 5 consecutive "failures" —
regardless of whether the post actually succeeded.

Separately, Discord and Mattermost set their per-turn delivered flag
unconditionally, even when the post threw, so a failed post was reported
as a successful delivery. Slack and SignalR already gated on real
success.

Replace the EventStream broadcast (the codebase's only use of it) with a
point-to-point ReminderDeliveryResult told directly to the execution
actor, carried as MessageSource.DeliveryObserver. The result reports
success OR failure, so a failed delivery redelivers immediately instead
of waiting out the timeout, which is demoted to a crash backstop. The
execution actor now waits via actor state + a scheduled backstop message
rather than awaiting inside RunTask, so the confirmation is actually
received.

- Gate Discord/Mattermost delivered flag on real post success
- Add MessageSource.DeliveryObserver (ephemeral, never persisted)
- All four binding actors tell ReminderDeliveryResult on turn completion
- Cross-channel contract tests for both success and failed-post delivery
- Execution-actor fast-fail test (explicit failure, not timeout)

Known gap: SignalR's new observer tell has no dedicated test (not in the
contract harness; faked IHubContext fixture is disproportionate).
Addresses defects found reviewing the point-to-point delivery-confirmation
change:

- Restart loop: a current-session reminder re-run from the deferred queue
  runs with a null envelope (the concurrency gate already acked+dropped it).
  The success path's TryAckEnvelopeAsync called AckAsync(null), which threw
  inside an unguarded RunTask and escalated to a supervisor restart, re-posting
  the reminder in a loop. Skip the ack when the envelope is null, and wrap the
  ack RunTask so a fault reports failure instead of restarting.

- Concurrent-reminder clobber: the single per-actor _reminderDeliveryObserver
  field was overwritten when two reminders targeted the same session before the
  first turn completed, misrouting the first result and stranding both on the
  backstop. Key observers by reminder delivery id (Dictionary) in all four
  channels so each turn resolves its own observer on TurnCompleted.

- SignalR result drop: HandleDeliveryResult gated on an exact ChannelType match
  layered on an already-unique point-to-point Self ref. A cold SignalR actor
  reports its default Tui channel, mismatching the origin SignalR and silently
  dropping a valid result until the backstop. Correlate on the delivery key
  alone.

- Pipeline reinit: reset per-turn delivery flags (Slack leaked a stale
  _postedThisTurn into the next turn, falsely reporting a later reminder
  delivered) and tell in-flight observers Delivered=false so they redeliver
  immediately instead of stalling on the backstop.

Adds a cross-channel regression test for concurrent same-session reminders.
Discord and Mattermost posted the "I didn't manage to produce a reply"
fallback whenever a turn ended without a successful post — conflating a
genuinely empty turn (model produced nothing) with a failed post (model
produced a reply, transport rejected it). On a failed post the message is
misleading (a reply WAS produced, and the session was already notified via
DeliveryFailed) and, for reminders, doubles up with the reply that arrives
on redelivery.

Track _postFailedThisTurn (set when a content post/upload throws) and post
the fallback only for genuinely empty turns (!delivered && !postFailed) —
matching the behavior Slack already had via _lastFailedPost.

Adds a cross-channel regression test (Failed_content_post_does_not_post_
empty_turn_fallback) plus a SetReplyClientThrowsOnce harness hook that fails
one post and recovers, so the fallback attempt is observable.
@Aaronontheweb Aaronontheweb added bug Something isn't working channels Discord, Slack, and other channels. reminders Reminder scheduling, execution, and history labels Jun 10, 2026
The session deduplicates Akka.Reminders redeliveries by the delivery key
({reminderId}:{fireTimestampMs}), but the key was built from _dispatchedAt
— a wall-clock captured fresh in each ReminderExecutionActor. Every
redelivery spawns a new execution actor with a new _dispatchedAt, so the
key drifted and never matched the processed key: the dedup never fired and
the reminder was delivered again (duplicate).

Build the key from the envelope's scheduled fire time (DueTimeUtc), which
is identical across redeliveries of the same fire — matching the key's
documented {reminderId}:{fireTimestampMs} definition. No new construct;
just sources the timestamp from the right place. Deferred re-runs carry no
envelope and are never redelivered, so the dispatch time remains a safe
fallback there.

Adds a regression test asserting the dispatched key is derived from the
envelope's fire time rather than the dispatch wall-clock.

@Aaronontheweb Aaronontheweb left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM


var resultB = await observerB.ExpectMsgAsync<ReminderDeliveryResult>(
TimeSpan.FromSeconds(5), cancellationToken: ct);
Assert.Equal(keyB, resultB.ReminderDeliveryKey);

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

binding.Tell(new DeliverTrustedSessionTurn(sid, "run the reminder", CreateReminderSource(reminderKey, observer.Ref)));

CreateBindingActor(sid, pipeline, detector);
var result = await observer.ExpectMsgAsync<ReminderDeliveryResult>(

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

TimeSpan.FromSeconds(5), cancellationToken: ct);
Assert.Equal(reminderKey, result.ReminderDeliveryKey);
Assert.Equal(ExpectedChannelType, result.ChannelType);
Assert.False(result.Delivered);

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

// A distinctive fire time far from "now": a key built from the dispatch
// wall-clock would not match it.
var fireTime = DateTimeOffset.FromUnixTimeMilliseconds(1_700_000_000_000);
var envelope = new ReminderEnvelope<ReminderPayload>(

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

/// <see cref="IActorRef"/> is not serializable and <see cref="MessageSource"/>
/// is never persisted.
/// </summary>
public IActorRef? DeliveryObserver { get; init; }

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not happy about how many properties are on MessageSource now but I guess that's not the end of the world since most are optional. Probably should refactor this into something resembling a DU.

return;
_awaitingDeliveryResult = true;
_expectedReminderDeliveryKey = reminderDeliveryKey;
_deliveryTimeoutCancelable = Context.System.Scheduler.ScheduleTellOnceCancelable(

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

have to use this instead of IWithTimers because akkadotnet/akka.net#7630 is not yet implemented

public sealed record ReminderDeliveryResult(
string ReminderDeliveryKey,
Channels.ChannelType ChannelType,
bool Delivered,

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, disambiguates failed deliveries from successful ones.

@Aaronontheweb Aaronontheweb merged commit 5f4a8f9 into netclaw-dev:dev Jun 11, 2026
15 checks passed
@Aaronontheweb Aaronontheweb deleted the fix/reminder-delivery-ack branch June 11, 2026 02:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working channels Discord, Slack, and other channels. reminders Reminder scheduling, execution, and history

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant