Skip to content

fix: prevent duplicate messages on partial stream delivery#4871

Closed
trevorgordon981 wants to merge 1 commit into
NousResearch:mainfrom
trevorgordon981:fix/duplicate-message-streaming-retry
Closed

fix: prevent duplicate messages on partial stream delivery#4871
trevorgordon981 wants to merge 1 commit into
NousResearch:mainfrom
trevorgordon981:fix/duplicate-message-streaming-retry

Conversation

@trevorgordon981

Copy link
Copy Markdown

Problem

When streaming responses to platforms (Slack, Telegram, etc.), if a connection error occurs after partial delivery, the retry logic would create a NEW message instead of recovering the original. This resulted in duplicate messages being sent to users.

Solution

  1. Partial delivery guard: Mark streaming errors with flag
  2. Skip retry on partial delivery: Check this flag before retrying API calls
  3. Return early: If partial delivery occurred, persist session and return with instead of retrying

Additional Changes

  • Increased from 3 to 6 for better resilience
  • Added debug logging for response validation failures
  • Fixed when extracting response attributes from non-dict responses

Testing

Tested with Slack platform adapter during connection interruptions. No duplicate messages observed after fix.

Impact

  • Users: No more duplicate messages during transient errors
  • Platforms: Reduced API calls and message spam
  • Backwards compatible: Yes, only affects error handling path

- Mark streaming errors with _partial_stream_delivered flag to prevent
  retry logic from sending duplicate messages to platforms
- Increase max_retries from 3 to 6 for better resilience
- Add debug logging for response validation failures
- Handle TypeError when extracting response attributes

This fixes the issue where connection errors after partial delivery
would cause the retry loop to send the same message twice to Slack/Telegram.
teknium1 added a commit that referenced this pull request Apr 3, 2026
When streaming fails after tokens are already delivered to the platform,
_interruptible_streaming_api_call re-raised the error into the outer
retry loop, which would make a new API call — creating a duplicate
message.

Now checks deltas_were_sent before re-raising: if partial content was
already streamed, returns a stub response instead. The outer loop treats
the turn as complete (no retry, no fallback, no duplicate).

Inspired by PR #4871 (@trevorgordon981) which identified the bug.
This implementation avoids monkey-patching exception objects and keeps
the fix within the streaming call boundary.
@teknium1

teknium1 commented Apr 3, 2026

Copy link
Copy Markdown
Contributor

Thanks for identifying this bug @trevorgordon981 — the partial stream delivery → outer retry loop → duplicate message path is real and we confirmed it by tracing the code.

We fixed it with a different approach in PR #4878: instead of monkey-patching exception objects with _partial_stream_delivered, we check deltas_were_sent directly in _interruptible_streaming_api_call and return a stub response rather than re-raising. This keeps the fix within the streaming call boundary.

The other changes in your PR (max_retries bump, DEBUG_RESP logging, vars() TypeError) were not included — the max_retries change is a separate concern and the debug logging shouldn't ship to production. If you'd like to submit those individually, happy to review.

Closing in favor of #4878 — your contribution is credited in the commit message.

@teknium1 teknium1 closed this Apr 3, 2026
teknium1 added a commit that referenced this pull request Apr 4, 2026
#4878)

* fix(gateway): add message deduplication to Discord and Slack adapters (#4777)

Discord RESUME replays events after reconnects (~7/day observed),
and Slack Socket Mode can redeliver events if the ack was lost.
Neither adapter tracked which messages were already processed,
causing duplicate bot responses.

Add _seen_messages dedup cache (message ID → timestamp) with 5-min
TTL and 2000-entry cap to both adapters, matching the pattern already
used by Mattermost, Matrix, WeCom, Feishu, DingTalk, and Email.

The check goes at the very top of the message handler, before any
other logic, so replayed events are silently dropped.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: prevent duplicate messages on partial stream delivery

When streaming fails after tokens are already delivered to the platform,
_interruptible_streaming_api_call re-raised the error into the outer
retry loop, which would make a new API call — creating a duplicate
message.

Now checks deltas_were_sent before re-raising: if partial content was
already streamed, returns a stub response instead. The outer loop treats
the turn as complete (no retry, no fallback, no duplicate).

Inspired by PR #4871 (@trevorgordon981) which identified the bug.
This implementation avoids monkey-patching exception objects and keeps
the fix within the streaming call boundary.

---------

Co-authored-by: Mibayy <mibayy@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
teknium1 added a commit that referenced this pull request Apr 4, 2026
When streaming fails after tokens are already delivered to the platform,
the agent now attempts to continue the response:

  Option A: append partial content as an assistant message and make a
  non-streaming API call — the model sees its previous partial output
  and naturally continues from where it left off.

  Option B (fallback): if trailing assistant is rejected, inject a user
  'continue' instruction and retry — explicitly asks the model to
  resume without repeating.

  Last resort: if both fail, return the partial content as the final
  response (user sees what was delivered, no duplicate).

Tested with real Sonnet and Opus models via both Anthropic native API
and OpenRouter — continuation works seamlessly on all providers.

Also adds partial text accumulation to the Anthropic streaming path
(previously only chat_completions tracked deltas_were_sent).

Inspired by PR #4871 (@trevorgordon981) which identified the bug.
saxster pushed a commit to saxster/hermes-agent that referenced this pull request Apr 8, 2026
NousResearch#4878)

* fix(gateway): add message deduplication to Discord and Slack adapters (NousResearch#4777)

Discord RESUME replays events after reconnects (~7/day observed),
and Slack Socket Mode can redeliver events if the ack was lost.
Neither adapter tracked which messages were already processed,
causing duplicate bot responses.

Add _seen_messages dedup cache (message ID → timestamp) with 5-min
TTL and 2000-entry cap to both adapters, matching the pattern already
used by Mattermost, Matrix, WeCom, Feishu, DingTalk, and Email.

The check goes at the very top of the message handler, before any
other logic, so replayed events are silently dropped.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: prevent duplicate messages on partial stream delivery

When streaming fails after tokens are already delivered to the platform,
_interruptible_streaming_api_call re-raised the error into the outer
retry loop, which would make a new API call — creating a duplicate
message.

Now checks deltas_were_sent before re-raising: if partial content was
already streamed, returns a stub response instead. The outer loop treats
the turn as complete (no retry, no fallback, no duplicate).

Inspired by PR NousResearch#4871 (@trevorgordon981) which identified the bug.
This implementation avoids monkey-patching exception objects and keeps
the fix within the streaming call boundary.

---------

Co-authored-by: Mibayy <mibayy@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Tommyeds pushed a commit to Tommyeds/hermes-agent that referenced this pull request Apr 12, 2026
NousResearch#4878)

* fix(gateway): add message deduplication to Discord and Slack adapters (NousResearch#4777)

Discord RESUME replays events after reconnects (~7/day observed),
and Slack Socket Mode can redeliver events if the ack was lost.
Neither adapter tracked which messages were already processed,
causing duplicate bot responses.

Add _seen_messages dedup cache (message ID → timestamp) with 5-min
TTL and 2000-entry cap to both adapters, matching the pattern already
used by Mattermost, Matrix, WeCom, Feishu, DingTalk, and Email.

The check goes at the very top of the message handler, before any
other logic, so replayed events are silently dropped.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: prevent duplicate messages on partial stream delivery

When streaming fails after tokens are already delivered to the platform,
_interruptible_streaming_api_call re-raised the error into the outer
retry loop, which would make a new API call — creating a duplicate
message.

Now checks deltas_were_sent before re-raising: if partial content was
already streamed, returns a stub response instead. The outer loop treats
the turn as complete (no retry, no fallback, no duplicate).

Inspired by PR NousResearch#4871 (@trevorgordon981) which identified the bug.
This implementation avoids monkey-patching exception objects and keeps
the fix within the streaming call boundary.

---------

Co-authored-by: Mibayy <mibayy@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
angelburgosrosado pushed a commit to angelburgosrosado/hermes-agent that referenced this pull request Apr 27, 2026
NousResearch#4878)

* fix(gateway): add message deduplication to Discord and Slack adapters (NousResearch#4777)

Discord RESUME replays events after reconnects (~7/day observed),
and Slack Socket Mode can redeliver events if the ack was lost.
Neither adapter tracked which messages were already processed,
causing duplicate bot responses.

Add _seen_messages dedup cache (message ID → timestamp) with 5-min
TTL and 2000-entry cap to both adapters, matching the pattern already
used by Mattermost, Matrix, WeCom, Feishu, DingTalk, and Email.

The check goes at the very top of the message handler, before any
other logic, so replayed events are silently dropped.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: prevent duplicate messages on partial stream delivery

When streaming fails after tokens are already delivered to the platform,
_interruptible_streaming_api_call re-raised the error into the outer
retry loop, which would make a new API call — creating a duplicate
message.

Now checks deltas_were_sent before re-raising: if partial content was
already streamed, returns a stub response instead. The outer loop treats
the turn as complete (no retry, no fallback, no duplicate).

Inspired by PR NousResearch#4871 (@trevorgordon981) which identified the bug.
This implementation avoids monkey-patching exception objects and keeps
the fix within the streaming call boundary.

---------

Co-authored-by: Mibayy <mibayy@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
angelburgosrosado pushed a commit to angelburgosrosado/hermes-agent that referenced this pull request Apr 28, 2026
When streaming fails after tokens are already delivered to the platform,
the agent now attempts to continue the response:

  Option A: append partial content as an assistant message and make a
  non-streaming API call — the model sees its previous partial output
  and naturally continues from where it left off.

  Option B (fallback): if trailing assistant is rejected, inject a user
  'continue' instruction and retry — explicitly asks the model to
  resume without repeating.

  Last resort: if both fail, return the partial content as the final
  response (user sees what was delivered, no duplicate).

Tested with real Sonnet and Opus models via both Anthropic native API
and OpenRouter — continuation works seamlessly on all providers.

Also adds partial text accumulation to the Anthropic streaming path
(previously only chat_completions tracked deltas_were_sent).

Inspired by PR NousResearch#4871 (@trevorgordon981) which identified the bug.
02356abc pushed a commit to 02356abc/hermes-agent that referenced this pull request May 14, 2026
NousResearch#4878)

* fix(gateway): add message deduplication to Discord and Slack adapters (NousResearch#4777)

Discord RESUME replays events after reconnects (~7/day observed),
and Slack Socket Mode can redeliver events if the ack was lost.
Neither adapter tracked which messages were already processed,
causing duplicate bot responses.

Add _seen_messages dedup cache (message ID → timestamp) with 5-min
TTL and 2000-entry cap to both adapters, matching the pattern already
used by Mattermost, Matrix, WeCom, Feishu, DingTalk, and Email.

The check goes at the very top of the message handler, before any
other logic, so replayed events are silently dropped.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: prevent duplicate messages on partial stream delivery

When streaming fails after tokens are already delivered to the platform,
_interruptible_streaming_api_call re-raised the error into the outer
retry loop, which would make a new API call — creating a duplicate
message.

Now checks deltas_were_sent before re-raising: if partial content was
already streamed, returns a stub response instead. The outer loop treats
the turn as complete (no retry, no fallback, no duplicate).

Inspired by PR NousResearch#4871 (@trevorgordon981) which identified the bug.
This implementation avoids monkey-patching exception objects and keeps
the fix within the streaming call boundary.

---------

Co-authored-by: Mibayy <mibayy@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
olympus-terminal pushed a commit to olympus-terminal/hermes-agent that referenced this pull request May 16, 2026
NousResearch#4878)

* fix(gateway): add message deduplication to Discord and Slack adapters (NousResearch#4777)

Discord RESUME replays events after reconnects (~7/day observed),
and Slack Socket Mode can redeliver events if the ack was lost.
Neither adapter tracked which messages were already processed,
causing duplicate bot responses.

Add _seen_messages dedup cache (message ID → timestamp) with 5-min
TTL and 2000-entry cap to both adapters, matching the pattern already
used by Mattermost, Matrix, WeCom, Feishu, DingTalk, and Email.

The check goes at the very top of the message handler, before any
other logic, so replayed events are silently dropped.

* fix: prevent duplicate messages on partial stream delivery

When streaming fails after tokens are already delivered to the platform,
_interruptible_streaming_api_call re-raised the error into the outer
retry loop, which would make a new API call — creating a duplicate
message.

Now checks deltas_were_sent before re-raising: if partial content was
already streamed, returns a stub response instead. The outer loop treats
the turn as complete (no retry, no fallback, no duplicate).

Inspired by PR NousResearch#4871 (@trevorgordon981) which identified the bug.
This implementation avoids monkey-patching exception objects and keeps
the fix within the streaming call boundary.

---------

Co-authored-by: Mibayy <mibayy@users.noreply.github.com>
gweeteve pushed a commit to gweeteve/hermes-agent that referenced this pull request Jun 2, 2026
NousResearch#4878)

* fix(gateway): add message deduplication to Discord and Slack adapters (NousResearch#4777)

Discord RESUME replays events after reconnects (~7/day observed),
and Slack Socket Mode can redeliver events if the ack was lost.
Neither adapter tracked which messages were already processed,
causing duplicate bot responses.

Add _seen_messages dedup cache (message ID → timestamp) with 5-min
TTL and 2000-entry cap to both adapters, matching the pattern already
used by Mattermost, Matrix, WeCom, Feishu, DingTalk, and Email.

The check goes at the very top of the message handler, before any
other logic, so replayed events are silently dropped.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: prevent duplicate messages on partial stream delivery

When streaming fails after tokens are already delivered to the platform,
_interruptible_streaming_api_call re-raised the error into the outer
retry loop, which would make a new API call — creating a duplicate
message.

Now checks deltas_were_sent before re-raising: if partial content was
already streamed, returns a stub response instead. The outer loop treats
the turn as complete (no retry, no fallback, no duplicate).

Inspired by PR NousResearch#4871 (@trevorgordon981) which identified the bug.
This implementation avoids monkey-patching exception objects and keeps
the fix within the streaming call boundary.

---------

Co-authored-by: Mibayy <mibayy@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Egavasyug pushed a commit to Egavasyug/hermes-agent that referenced this pull request Jun 10, 2026
NousResearch#4878)

* fix(gateway): add message deduplication to Discord and Slack adapters (NousResearch#4777)

Discord RESUME replays events after reconnects (~7/day observed),
and Slack Socket Mode can redeliver events if the ack was lost.
Neither adapter tracked which messages were already processed,
causing duplicate bot responses.

Add _seen_messages dedup cache (message ID → timestamp) with 5-min
TTL and 2000-entry cap to both adapters, matching the pattern already
used by Mattermost, Matrix, WeCom, Feishu, DingTalk, and Email.

The check goes at the very top of the message handler, before any
other logic, so replayed events are silently dropped.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: prevent duplicate messages on partial stream delivery

When streaming fails after tokens are already delivered to the platform,
_interruptible_streaming_api_call re-raised the error into the outer
retry loop, which would make a new API call — creating a duplicate
message.

Now checks deltas_were_sent before re-raising: if partial content was
already streamed, returns a stub response instead. The outer loop treats
the turn as complete (no retry, no fallback, no duplicate).

Inspired by PR NousResearch#4871 (@trevorgordon981) which identified the bug.
This implementation avoids monkey-patching exception objects and keeps
the fix within the streaming call boundary.

---------

Co-authored-by: Mibayy <mibayy@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants