fix: prevent duplicate messages on partial stream delivery by trevorgordon981 · Pull Request #4871 · NousResearch/hermes-agent

trevorgordon981 · 2026-04-03T21:15:47Z

Problem

When streaming responses to platforms (Slack, Telegram, etc.), if a connection error occurs after partial delivery, the retry logic would create a NEW message instead of recovering the original. This resulted in duplicate messages being sent to users.

Solution

Partial delivery guard: Mark streaming errors with flag
Skip retry on partial delivery: Check this flag before retrying API calls
Return early: If partial delivery occurred, persist session and return with instead of retrying

Additional Changes

Increased from 3 to 6 for better resilience
Added debug logging for response validation failures
Fixed when extracting response attributes from non-dict responses

Testing

Tested with Slack platform adapter during connection interruptions. No duplicate messages observed after fix.

Impact

Users: No more duplicate messages during transient errors
Platforms: Reduced API calls and message spam
Backwards compatible: Yes, only affects error handling path

- Mark streaming errors with _partial_stream_delivered flag to prevent retry logic from sending duplicate messages to platforms - Increase max_retries from 3 to 6 for better resilience - Add debug logging for response validation failures - Handle TypeError when extracting response attributes This fixes the issue where connection errors after partial delivery would cause the retry loop to send the same message twice to Slack/Telegram.

@trevorgordon981

When streaming fails after tokens are already delivered to the platform, _interruptible_streaming_api_call re-raised the error into the outer retry loop, which would make a new API call — creating a duplicate message. Now checks deltas_were_sent before re-raising: if partial content was already streamed, returns a stub response instead. The outer loop treats the turn as complete (no retry, no fallback, no duplicate). Inspired by PR #4871 (@trevorgordon981) which identified the bug. This implementation avoids monkey-patching exception objects and keeps the fix within the streaming call boundary.

teknium1 · 2026-04-03T21:43:55Z

Thanks for identifying this bug @trevorgordon981 — the partial stream delivery → outer retry loop → duplicate message path is real and we confirmed it by tracing the code.

We fixed it with a different approach in PR #4878: instead of monkey-patching exception objects with _partial_stream_delivered, we check deltas_were_sent directly in _interruptible_streaming_api_call and return a stub response rather than re-raising. This keeps the fix within the streaming call boundary.

The other changes in your PR (max_retries bump, DEBUG_RESP logging, vars() TypeError) were not included — the max_retries change is a separate concern and the debug logging shouldn't ship to production. If you'd like to submit those individually, happy to review.

Closing in favor of #4878 — your contribution is credited in the commit message.

@trevorgordon981

#4878) * fix(gateway): add message deduplication to Discord and Slack adapters (#4777) Discord RESUME replays events after reconnects (~7/day observed), and Slack Socket Mode can redeliver events if the ack was lost. Neither adapter tracked which messages were already processed, causing duplicate bot responses. Add _seen_messages dedup cache (message ID → timestamp) with 5-min TTL and 2000-entry cap to both adapters, matching the pattern already used by Mattermost, Matrix, WeCom, Feishu, DingTalk, and Email. The check goes at the very top of the message handler, before any other logic, so replayed events are silently dropped. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: prevent duplicate messages on partial stream delivery When streaming fails after tokens are already delivered to the platform, _interruptible_streaming_api_call re-raised the error into the outer retry loop, which would make a new API call — creating a duplicate message. Now checks deltas_were_sent before re-raising: if partial content was already streamed, returns a stub response instead. The outer loop treats the turn as complete (no retry, no fallback, no duplicate). Inspired by PR #4871 (@trevorgordon981) which identified the bug. This implementation avoids monkey-patching exception objects and keeps the fix within the streaming call boundary. --------- Co-authored-by: Mibayy <mibayy@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@trevorgordon981

When streaming fails after tokens are already delivered to the platform, the agent now attempts to continue the response: Option A: append partial content as an assistant message and make a non-streaming API call — the model sees its previous partial output and naturally continues from where it left off. Option B (fallback): if trailing assistant is rejected, inject a user 'continue' instruction and retry — explicitly asks the model to resume without repeating. Last resort: if both fail, return the partial content as the final response (user sees what was delivered, no duplicate). Tested with real Sonnet and Opus models via both Anthropic native API and OpenRouter — continuation works seamlessly on all providers. Also adds partial text accumulation to the Anthropic streaming path (previously only chat_completions tracked deltas_were_sent). Inspired by PR #4871 (@trevorgordon981) which identified the bug.

@trevorgordon981

NousResearch#4878) * fix(gateway): add message deduplication to Discord and Slack adapters (NousResearch#4777) Discord RESUME replays events after reconnects (~7/day observed), and Slack Socket Mode can redeliver events if the ack was lost. Neither adapter tracked which messages were already processed, causing duplicate bot responses. Add _seen_messages dedup cache (message ID → timestamp) with 5-min TTL and 2000-entry cap to both adapters, matching the pattern already used by Mattermost, Matrix, WeCom, Feishu, DingTalk, and Email. The check goes at the very top of the message handler, before any other logic, so replayed events are silently dropped. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: prevent duplicate messages on partial stream delivery When streaming fails after tokens are already delivered to the platform, _interruptible_streaming_api_call re-raised the error into the outer retry loop, which would make a new API call — creating a duplicate message. Now checks deltas_were_sent before re-raising: if partial content was already streamed, returns a stub response instead. The outer loop treats the turn as complete (no retry, no fallback, no duplicate). Inspired by PR NousResearch#4871 (@trevorgordon981) which identified the bug. This implementation avoids monkey-patching exception objects and keeps the fix within the streaming call boundary. --------- Co-authored-by: Mibayy <mibayy@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@trevorgordon981

NousResearch#4878) * fix(gateway): add message deduplication to Discord and Slack adapters (NousResearch#4777) Discord RESUME replays events after reconnects (~7/day observed), and Slack Socket Mode can redeliver events if the ack was lost. Neither adapter tracked which messages were already processed, causing duplicate bot responses. Add _seen_messages dedup cache (message ID → timestamp) with 5-min TTL and 2000-entry cap to both adapters, matching the pattern already used by Mattermost, Matrix, WeCom, Feishu, DingTalk, and Email. The check goes at the very top of the message handler, before any other logic, so replayed events are silently dropped. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: prevent duplicate messages on partial stream delivery When streaming fails after tokens are already delivered to the platform, _interruptible_streaming_api_call re-raised the error into the outer retry loop, which would make a new API call — creating a duplicate message. Now checks deltas_were_sent before re-raising: if partial content was already streamed, returns a stub response instead. The outer loop treats the turn as complete (no retry, no fallback, no duplicate). Inspired by PR NousResearch#4871 (@trevorgordon981) which identified the bug. This implementation avoids monkey-patching exception objects and keeps the fix within the streaming call boundary. --------- Co-authored-by: Mibayy <mibayy@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@trevorgordon981

NousResearch#4878) * fix(gateway): add message deduplication to Discord and Slack adapters (NousResearch#4777) Discord RESUME replays events after reconnects (~7/day observed), and Slack Socket Mode can redeliver events if the ack was lost. Neither adapter tracked which messages were already processed, causing duplicate bot responses. Add _seen_messages dedup cache (message ID → timestamp) with 5-min TTL and 2000-entry cap to both adapters, matching the pattern already used by Mattermost, Matrix, WeCom, Feishu, DingTalk, and Email. The check goes at the very top of the message handler, before any other logic, so replayed events are silently dropped. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: prevent duplicate messages on partial stream delivery When streaming fails after tokens are already delivered to the platform, _interruptible_streaming_api_call re-raised the error into the outer retry loop, which would make a new API call — creating a duplicate message. Now checks deltas_were_sent before re-raising: if partial content was already streamed, returns a stub response instead. The outer loop treats the turn as complete (no retry, no fallback, no duplicate). Inspired by PR NousResearch#4871 (@trevorgordon981) which identified the bug. This implementation avoids monkey-patching exception objects and keeps the fix within the streaming call boundary. --------- Co-authored-by: Mibayy <mibayy@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@trevorgordon981

When streaming fails after tokens are already delivered to the platform, the agent now attempts to continue the response: Option A: append partial content as an assistant message and make a non-streaming API call — the model sees its previous partial output and naturally continues from where it left off. Option B (fallback): if trailing assistant is rejected, inject a user 'continue' instruction and retry — explicitly asks the model to resume without repeating. Last resort: if both fail, return the partial content as the final response (user sees what was delivered, no duplicate). Tested with real Sonnet and Opus models via both Anthropic native API and OpenRouter — continuation works seamlessly on all providers. Also adds partial text accumulation to the Anthropic streaming path (previously only chat_completions tracked deltas_were_sent). Inspired by PR NousResearch#4871 (@trevorgordon981) which identified the bug.

@trevorgordon981

NousResearch#4878) * fix(gateway): add message deduplication to Discord and Slack adapters (NousResearch#4777) Discord RESUME replays events after reconnects (~7/day observed), and Slack Socket Mode can redeliver events if the ack was lost. Neither adapter tracked which messages were already processed, causing duplicate bot responses. Add _seen_messages dedup cache (message ID → timestamp) with 5-min TTL and 2000-entry cap to both adapters, matching the pattern already used by Mattermost, Matrix, WeCom, Feishu, DingTalk, and Email. The check goes at the very top of the message handler, before any other logic, so replayed events are silently dropped. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: prevent duplicate messages on partial stream delivery When streaming fails after tokens are already delivered to the platform, _interruptible_streaming_api_call re-raised the error into the outer retry loop, which would make a new API call — creating a duplicate message. Now checks deltas_were_sent before re-raising: if partial content was already streamed, returns a stub response instead. The outer loop treats the turn as complete (no retry, no fallback, no duplicate). Inspired by PR NousResearch#4871 (@trevorgordon981) which identified the bug. This implementation avoids monkey-patching exception objects and keeps the fix within the streaming call boundary. --------- Co-authored-by: Mibayy <mibayy@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@trevorgordon981

NousResearch#4878) * fix(gateway): add message deduplication to Discord and Slack adapters (NousResearch#4777) Discord RESUME replays events after reconnects (~7/day observed), and Slack Socket Mode can redeliver events if the ack was lost. Neither adapter tracked which messages were already processed, causing duplicate bot responses. Add _seen_messages dedup cache (message ID → timestamp) with 5-min TTL and 2000-entry cap to both adapters, matching the pattern already used by Mattermost, Matrix, WeCom, Feishu, DingTalk, and Email. The check goes at the very top of the message handler, before any other logic, so replayed events are silently dropped. * fix: prevent duplicate messages on partial stream delivery When streaming fails after tokens are already delivered to the platform, _interruptible_streaming_api_call re-raised the error into the outer retry loop, which would make a new API call — creating a duplicate message. Now checks deltas_were_sent before re-raising: if partial content was already streamed, returns a stub response instead. The outer loop treats the turn as complete (no retry, no fallback, no duplicate). Inspired by PR NousResearch#4871 (@trevorgordon981) which identified the bug. This implementation avoids monkey-patching exception objects and keeps the fix within the streaming call boundary. --------- Co-authored-by: Mibayy <mibayy@users.noreply.github.com>

@trevorgordon981

NousResearch#4878) * fix(gateway): add message deduplication to Discord and Slack adapters (NousResearch#4777) Discord RESUME replays events after reconnects (~7/day observed), and Slack Socket Mode can redeliver events if the ack was lost. Neither adapter tracked which messages were already processed, causing duplicate bot responses. Add _seen_messages dedup cache (message ID → timestamp) with 5-min TTL and 2000-entry cap to both adapters, matching the pattern already used by Mattermost, Matrix, WeCom, Feishu, DingTalk, and Email. The check goes at the very top of the message handler, before any other logic, so replayed events are silently dropped. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: prevent duplicate messages on partial stream delivery When streaming fails after tokens are already delivered to the platform, _interruptible_streaming_api_call re-raised the error into the outer retry loop, which would make a new API call — creating a duplicate message. Now checks deltas_were_sent before re-raising: if partial content was already streamed, returns a stub response instead. The outer loop treats the turn as complete (no retry, no fallback, no duplicate). Inspired by PR NousResearch#4871 (@trevorgordon981) which identified the bug. This implementation avoids monkey-patching exception objects and keeps the fix within the streaming call boundary. --------- Co-authored-by: Mibayy <mibayy@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@trevorgordon981

NousResearch#4878) * fix(gateway): add message deduplication to Discord and Slack adapters (NousResearch#4777) Discord RESUME replays events after reconnects (~7/day observed), and Slack Socket Mode can redeliver events if the ack was lost. Neither adapter tracked which messages were already processed, causing duplicate bot responses. Add _seen_messages dedup cache (message ID → timestamp) with 5-min TTL and 2000-entry cap to both adapters, matching the pattern already used by Mattermost, Matrix, WeCom, Feishu, DingTalk, and Email. The check goes at the very top of the message handler, before any other logic, so replayed events are silently dropped. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: prevent duplicate messages on partial stream delivery When streaming fails after tokens are already delivered to the platform, _interruptible_streaming_api_call re-raised the error into the outer retry loop, which would make a new API call — creating a duplicate message. Now checks deltas_were_sent before re-raising: if partial content was already streamed, returns a stub response instead. The outer loop treats the turn as complete (no retry, no fallback, no duplicate). Inspired by PR NousResearch#4871 (@trevorgordon981) which identified the bug. This implementation avoids monkey-patching exception objects and keeps the fix within the streaming call boundary. --------- Co-authored-by: Mibayy <mibayy@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

teknium1 mentioned this pull request Apr 3, 2026

fix: prevent duplicate messages — gateway dedup + partial stream guard #4878

Merged

teknium1 closed this Apr 3, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: prevent duplicate messages on partial stream delivery#4871

fix: prevent duplicate messages on partial stream delivery#4871
trevorgordon981 wants to merge 1 commit into
NousResearch:mainfrom
trevorgordon981:fix/duplicate-message-streaming-retry

trevorgordon981 commented Apr 3, 2026

Uh oh!

teknium1 commented Apr 3, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

trevorgordon981 commented Apr 3, 2026

Problem

Solution

Additional Changes

Testing

Impact

Uh oh!

teknium1 commented Apr 3, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants