fix(channels): close delivery feedback loop so Slack errors reach the LLM session #311

Merged: Aaronontheweb merged 2 commits into dev from claude-wt-stuck-netclaw-sessions on Mar 20, 2026

Conversation

@Aaronontheweb

Collaborator

Summary

  • Slack delivery failures (timeouts, generic exceptions, permission errors) were silently lost: PostResult lacked a FailureKind, and the channel adapter reported only 3 of 6 failure kinds to the session
  • The LLM session never learned about the error, so it could neither retry nor inform the user, and the agent appeared non-responsive (observed in sessions D0AC6CKBK5K/1773983518.003979 and D0AC6CKBK5K/1773981376.795159)
  • The root cause was three bugs in SlackThreadBindingActor: a missing FailureKind on timeouts and generic exceptions, a selective ShouldNotifySession gate, and silent exception swallowing in NotifyDeliveryFailedAsync

Changes

SlackThreadBindingActor.cs:

  • Assign DeliveryFailureKind.TransportFailure to timeout errors and Unknown to generic exceptions (both SafePostAsync and SafeUploadFileAsync)
  • Widen ShouldNotifySession from 3 specific kinds (ContentRejected, MessageTooLarge, UnsupportedContent) to the check FailureKind is not null; the session now decides retryability
  • Remove the dead else if branch for non-retryable failures (every failure now has a kind)
  • Rethrow in NotifyDeliveryFailedAsync so broken pipelines trigger OutputStreamTerminated → reinit
  • Record ErrorOutput post failures in _lastFailedPost for consistency with other output types
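
The classification and the widened gate described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual SlackThreadBindingActor code: only the names DeliveryFailureKind, PostResult, FailureKind, ShouldNotifySession, SafePostAsync, TransportFailure and Unknown come from this PR. The enum members listed, the record shape, the placement of ShouldNotifySession on PostResult, and the SafePoster helper class are all hypothetical.

```csharp
#nullable enable
using System;
using System.Threading.Tasks;

// Assumed enum shape: the PR mentions six kinds and names these six,
// but the real declaration may differ.
public enum DeliveryFailureKind
{
    ContentRejected,
    MessageTooLarge,
    UnsupportedContent,
    PermissionDenied,
    TransportFailure,
    Unknown
}

// Hypothetical result type: every failure carries a kind.
public sealed record PostResult(bool Success, DeliveryFailureKind? FailureKind, string? Error)
{
    public static PostResult Ok() => new(true, null, null);

    public static PostResult Failed(DeliveryFailureKind kind, string error) =>
        new(false, kind, error);

    // Widened gate: any classified failure is reported to the session,
    // which then decides whether a retry makes sense.
    public bool ShouldNotifySession => FailureKind is not null;
}

// Hypothetical helper standing in for the actor's SafePostAsync.
public static class SafePoster
{
    public static async Task<PostResult> SafePostAsync(Func<Task> post, TimeSpan timeout)
    {
        try
        {
            // Task.WaitAsync throws TimeoutException when the timeout elapses.
            await post().WaitAsync(timeout);
            return PostResult.Ok();
        }
        catch (TimeoutException ex)
        {
            // Timeouts are classified as transport failures.
            return PostResult.Failed(DeliveryFailureKind.TransportFailure, ex.Message);
        }
        catch (Exception ex)
        {
            // Anything else is still classified, so it is never silently lost.
            return PostResult.Failed(DeliveryFailureKind.Unknown, ex.Message);
        }
    }
}
```

With this shape, a failed post can no longer leave FailureKind unset, so the gate cannot filter a failure out by accident.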

LlmSessionActor.cs:

  • Non-retryable failures in Ready state inject a nudge via AddSystemNudge() (visible on next user turn) instead of being silently dropped
  • Non-retryable failures during Processing also inject nudge instead of being dropped
  • Stale-turn check moved before retryable check for correct log messages
  • Explicit nudge guidance for TransportFailure, PermissionDenied, and Unknown
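
The session-side ordering described above (stale-turn check first, then the retryable check, then a nudge for everything else) might look roughly like this. It is a sketch under assumptions, not the LlmSessionActor implementation: AddSystemNudge is the only member name taken from the PR. The DeliveryFailed message, the turn-id comparison, the FeedbackOutcome enum, the choice of which kinds count as retryable, and the guidance wording are all hypothetical.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Assumed enum shape, built from the kinds named in this PR.
public enum DeliveryFailureKind
{
    ContentRejected,
    MessageTooLarge,
    UnsupportedContent,
    PermissionDenied,
    TransportFailure,
    Unknown
}

// Hypothetical feedback message sent by the channel adapter.
public sealed record DeliveryFailed(int TurnId, DeliveryFailureKind Kind, string Error);

public enum FeedbackOutcome { IgnoredStale, RetryRequested, NudgeInjected }

public sealed class SessionFeedbackSketch
{
    private readonly List<string> _pendingNudges = new();

    public int CurrentTurnId { get; set; }

    // Nudges queued here would become visible on the next user turn.
    public IReadOnlyList<string> PendingNudges => _pendingNudges;

    // Assumption: only failures the LLM can fix by changing its output are retryable.
    private static bool IsRetryable(DeliveryFailureKind kind) =>
        kind is DeliveryFailureKind.ContentRejected
            or DeliveryFailureKind.MessageTooLarge
            or DeliveryFailureKind.UnsupportedContent;

    public FeedbackOutcome Handle(DeliveryFailed failure)
    {
        // 1. Stale-turn check comes first so the outcome reflects what really happened.
        if (failure.TurnId != CurrentTurnId)
            return FeedbackOutcome.IgnoredStale;

        // 2. Retryable failures go back to the LLM for another attempt.
        if (IsRetryable(failure.Kind))
            return FeedbackOutcome.RetryRequested;

        // 3. Everything else is recorded as a nudge instead of being dropped.
        AddSystemNudge(GuidanceFor(failure.Kind, failure.Error));
        return FeedbackOutcome.NudgeInjected;
    }

    private void AddSystemNudge(string text) => _pendingNudges.Add(text);

    // Illustrative wording only; the real guidance text is not shown in the PR.
    private static string GuidanceFor(DeliveryFailureKind kind, string error) => kind switch
    {
        DeliveryFailureKind.TransportFailure =>
            $"Your last message may not have been delivered (transport failure: {error}).",
        DeliveryFailureKind.PermissionDenied =>
            $"Your last message was not delivered because permission was denied: {error}.",
        _ =>
            $"Your last message failed to deliver for an unknown reason: {error}."
    };
}
```

Returning an outcome rather than logging inside each branch keeps the ordering easy to assert on in tests.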

Test plan

  • 4 new integration tests covering the full feedback loop
  • Transport_failure_injects_nudge_without_triggering_retry — session receives TransportFailure, no LLM retry, nudge visible on next turn
  • Unknown_delivery_failure_injects_nudge_without_triggering_retry — same for Unknown kind
  • Timeout_during_post_sends_transport_failure_feedback_to_session — channel adapter correctly classifies and reports timeouts
  • Generic_exception_during_post_sends_unknown_failure_feedback_to_session — channel adapter correctly classifies and reports generic exceptions
  • All 6 existing delivery feedback tests still pass
  • Full suite: 1,125 tests pass, 0 failures
  • dotnet slopwatch analyze — 0 new violations

… LLM session

Slack delivery failures (timeouts, generic exceptions, permission errors)
were silently lost because PostResult lacked a FailureKind and the
channel adapter only reported 3 of 6 failure kinds to the session.
The session never learned about the error and could not retry or inform
the user, causing the agent to appear non-responsive.

Changes:
- Assign FailureKind to all PostResult failures (TransportFailure for
  timeouts, Unknown for generic exceptions)
- Widen ShouldNotifySession to report all failure kinds to the session
  instead of only ContentRejected/MessageTooLarge/UnsupportedContent
- Rethrow in NotifyDeliveryFailedAsync so broken pipelines trigger
  reinit instead of silently swallowing the feedback
- Handle non-retryable failures in LlmSessionActor by injecting a
  nudge visible on the next turn (transport errors can't be fixed by
  changing output, but the LLM can acknowledge the issue)
- Add explicit nudge guidance for TransportFailure and PermissionDenied
- Record ErrorOutput post failures in _lastFailedPost for consistency

Aaronontheweb enabled auto-merge (squash) March 20, 2026 15:19
Aaronontheweb merged commit 080911a into dev Mar 20, 2026 (3 checks passed)
Aaronontheweb deleted the claude-wt-stuck-netclaw-sessions branch March 20, 2026 15:27
Aaronontheweb added a commit that referenced this pull request Mar 20, 2026
…econnect, and init race fix

Bump VersionPrefix to 0.7.1 and update PackageReleaseNotes in
Directory.Build.props. Add 0.7.1 entry to RELEASE_NOTES.md covering:

- Slack delivery failures (timeouts, permission errors) now propagate
  back to the LLM session (#311)
- netclaw stats token counts fixed for OpenAI-compatible providers (#303)
- MCP OAuth servers auto-reconnect after authorization; mcp list uses
  daemon-side statuses for OAuth-protected servers (#301)
- netclaw init no longer triggers a restart race on existing installs (#300)
Aaronontheweb mentioned this pull request Mar 20, 2026 (3 tasks)
Aaronontheweb added a commit that referenced this pull request Mar 20, 2026
…econnect, and init race fix (#313)
