fix: prevent duplicate messages — gateway dedup + partial stream guard by teknium1 · Pull Request #4878 · NousResearch/hermes-agent

teknium1 · 2026-04-03T21:43:13Z

Summary

Two complementary fixes for duplicate message bugs:

1. Gateway message deduplication (Discord + Slack)

Cherry-picked from PR #4866 by @Mibayy (closes #4777).

Discord RESUME replays events after reconnects (~7/day in production). Slack Socket Mode can redeliver events on reconnect. Adds _seen_messages dict with 5-min TTL and 2000-entry cap to both adapters, matching WeCom/Mattermost/Feishu/DingTalk.

2. Partial stream recovery (run_agent.py)

Inspired by PR #4871 by @trevorgordon981 who identified the bug.

When streaming fails after tokens are already delivered to the platform, instead of retrying (duplicate) or giving up (truncated), the agent now attempts to continue the response:

Option A: Append partial content as an assistant message, make a non-streaming API call. The model sees its previous partial output in conversation history and naturally continues from where it left off.
Option B (fallback): If trailing assistant is rejected by the provider, inject a user "continue" instruction and retry.
Last resort: If both fail, return partial content as the final response.

Tested with real Sonnet and Opus models via both Anthropic native API and OpenRouter — the model seamlessly continues mid-sentence in all cases.

Also fixes: Anthropic streaming path was missing deltas_were_sent flag and partial text accumulation.

Test results

test_run_agent.py: 221 passed
tests/gateway/: 1858 passed (6 pre-existing failures, unrelated)
E2E: Option A recovery verified, A→B fallback verified, both-fail partial return verified
Live model tests: Sonnet + Opus continuation works on Anthropic native + OpenRouter

Files changed

gateway/platforms/discord.py — dedup cache (+19 lines)
gateway/platforms/slack.py — dedup cache (+20 lines)
run_agent.py — partial stream recovery with A→B fallback (+137 lines)

…#4777) Discord RESUME replays events after reconnects (~7/day observed), and Slack Socket Mode can redeliver events if the ack was lost. Neither adapter tracked which messages were already processed, causing duplicate bot responses. Add _seen_messages dedup cache (message ID → timestamp) with 5-min TTL and 2000-entry cap to both adapters, matching the pattern already used by Mattermost, Matrix, WeCom, Feishu, DingTalk, and Email. The check goes at the very top of the message handler, before any other logic, so replayed events are silently dropped. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@trevorgordon981

When streaming fails after tokens are already delivered to the platform, _interruptible_streaming_api_call re-raised the error into the outer retry loop, which would make a new API call — creating a duplicate message. Now checks deltas_were_sent before re-raising: if partial content was already streamed, returns a stub response instead. The outer loop treats the turn as complete (no retry, no fallback, no duplicate). Inspired by PR #4871 (@trevorgordon981) which identified the bug. This implementation avoids monkey-patching exception objects and keeps the fix within the streaming call boundary.

britrik · 2026-04-03T22:32:44Z

Code Review: PR #4878

Thanks for addressing the duplicate message issue! Here's my review:

Positive Aspects

Clean implementation of deduplication with proper TTL and size limits
Good defensive coding with cache pruning to prevent unbounded growth
The partial stream guard in run_agent.py is a smart solution to prevent retries from creating duplicates

Suggestions

1. Discord adapter - variable capture concern
The handler uses from an outer closure. Consider explicitly passing self or using a clearer pattern:

This works but could be confusing to future maintainers.

2. run_agent.py - misleading finish_reason
The stub response uses , which implies natural completion rather than error recovery. Consider using a more descriptive approach:

This would make debugging easier and be more accurate.

3. Minor: import statement
The import is added to slack.py but not discord.py - double-check if discord.py already imports it or if it's needed there too.

4. Edge case consideration
What happens if both dedup checks pass but the message should still be ignored (e.g., channel/role permissions)? The current flow is correct, but worth documenting that dedup happens before permission checks.

Overall: Solid implementation that addresses the core issue well. The suggestions above are minor improvements rather than blockers.

Resolved conflicts: - plugins/memory/openviking/__init__.py: kept our is_available() .env fallback, adopted upstream env-var defaults for account/user - plugins/memory/byterover/__init__.py: adopted upstream synchronous prefetch (replaces threaded queue_prefetch pattern) - hermes_cli/main.py: adopted upstream find_gateway_pids() approach for multi-gateway restart (replaces per-service-type restart block) - hermes_cli/__init__.py + pyproject.toml: version 0.7.0 (auto-resolved) Key upstream features: - Memory provider tools routed in sequential execution (NousResearch#4803) - Clean user message for memory operations (NousResearch#5099) - Duplicate message prevention (gateway dedup) (NousResearch#4878) - O(n^2) regex backtracking fix (831067c) - OpenViking tenant-scoping headers (NousResearch#4825)

@trevorgordon981

NousResearch#4878) * fix(gateway): add message deduplication to Discord and Slack adapters (NousResearch#4777) Discord RESUME replays events after reconnects (~7/day observed), and Slack Socket Mode can redeliver events if the ack was lost. Neither adapter tracked which messages were already processed, causing duplicate bot responses. Add _seen_messages dedup cache (message ID → timestamp) with 5-min TTL and 2000-entry cap to both adapters, matching the pattern already used by Mattermost, Matrix, WeCom, Feishu, DingTalk, and Email. The check goes at the very top of the message handler, before any other logic, so replayed events are silently dropped. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: prevent duplicate messages on partial stream delivery When streaming fails after tokens are already delivered to the platform, _interruptible_streaming_api_call re-raised the error into the outer retry loop, which would make a new API call — creating a duplicate message. Now checks deltas_were_sent before re-raising: if partial content was already streamed, returns a stub response instead. The outer loop treats the turn as complete (no retry, no fallback, no duplicate). Inspired by PR NousResearch#4871 (@trevorgordon981) which identified the bug. This implementation avoids monkey-patching exception objects and keeps the fix within the streaming call boundary. --------- Co-authored-by: Mibayy <mibayy@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@trevorgordon981

NousResearch#4878) * fix(gateway): add message deduplication to Discord and Slack adapters (NousResearch#4777) Discord RESUME replays events after reconnects (~7/day observed), and Slack Socket Mode can redeliver events if the ack was lost. Neither adapter tracked which messages were already processed, causing duplicate bot responses. Add _seen_messages dedup cache (message ID → timestamp) with 5-min TTL and 2000-entry cap to both adapters, matching the pattern already used by Mattermost, Matrix, WeCom, Feishu, DingTalk, and Email. The check goes at the very top of the message handler, before any other logic, so replayed events are silently dropped. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: prevent duplicate messages on partial stream delivery When streaming fails after tokens are already delivered to the platform, _interruptible_streaming_api_call re-raised the error into the outer retry loop, which would make a new API call — creating a duplicate message. Now checks deltas_were_sent before re-raising: if partial content was already streamed, returns a stub response instead. The outer loop treats the turn as complete (no retry, no fallback, no duplicate). Inspired by PR NousResearch#4871 (@trevorgordon981) which identified the bug. This implementation avoids monkey-patching exception objects and keeps the fix within the streaming call boundary. --------- Co-authored-by: Mibayy <mibayy@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@trevorgordon981

NousResearch#4878) * fix(gateway): add message deduplication to Discord and Slack adapters (NousResearch#4777) Discord RESUME replays events after reconnects (~7/day observed), and Slack Socket Mode can redeliver events if the ack was lost. Neither adapter tracked which messages were already processed, causing duplicate bot responses. Add _seen_messages dedup cache (message ID → timestamp) with 5-min TTL and 2000-entry cap to both adapters, matching the pattern already used by Mattermost, Matrix, WeCom, Feishu, DingTalk, and Email. The check goes at the very top of the message handler, before any other logic, so replayed events are silently dropped. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: prevent duplicate messages on partial stream delivery When streaming fails after tokens are already delivered to the platform, _interruptible_streaming_api_call re-raised the error into the outer retry loop, which would make a new API call — creating a duplicate message. Now checks deltas_were_sent before re-raising: if partial content was already streamed, returns a stub response instead. The outer loop treats the turn as complete (no retry, no fallback, no duplicate). Inspired by PR NousResearch#4871 (@trevorgordon981) which identified the bug. This implementation avoids monkey-patching exception objects and keeps the fix within the streaming call boundary. --------- Co-authored-by: Mibayy <mibayy@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@trevorgordon981

NousResearch#4878) * fix(gateway): add message deduplication to Discord and Slack adapters (NousResearch#4777) Discord RESUME replays events after reconnects (~7/day observed), and Slack Socket Mode can redeliver events if the ack was lost. Neither adapter tracked which messages were already processed, causing duplicate bot responses. Add _seen_messages dedup cache (message ID → timestamp) with 5-min TTL and 2000-entry cap to both adapters, matching the pattern already used by Mattermost, Matrix, WeCom, Feishu, DingTalk, and Email. The check goes at the very top of the message handler, before any other logic, so replayed events are silently dropped. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: prevent duplicate messages on partial stream delivery When streaming fails after tokens are already delivered to the platform, _interruptible_streaming_api_call re-raised the error into the outer retry loop, which would make a new API call — creating a duplicate message. Now checks deltas_were_sent before re-raising: if partial content was already streamed, returns a stub response instead. The outer loop treats the turn as complete (no retry, no fallback, no duplicate). Inspired by PR NousResearch#4871 (@trevorgordon981) which identified the bug. This implementation avoids monkey-patching exception objects and keeps the fix within the streaming call boundary. --------- Co-authored-by: Mibayy <mibayy@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@trevorgordon981

NousResearch#4878) * fix(gateway): add message deduplication to Discord and Slack adapters (NousResearch#4777) Discord RESUME replays events after reconnects (~7/day observed), and Slack Socket Mode can redeliver events if the ack was lost. Neither adapter tracked which messages were already processed, causing duplicate bot responses. Add _seen_messages dedup cache (message ID → timestamp) with 5-min TTL and 2000-entry cap to both adapters, matching the pattern already used by Mattermost, Matrix, WeCom, Feishu, DingTalk, and Email. The check goes at the very top of the message handler, before any other logic, so replayed events are silently dropped. * fix: prevent duplicate messages on partial stream delivery When streaming fails after tokens are already delivered to the platform, _interruptible_streaming_api_call re-raised the error into the outer retry loop, which would make a new API call — creating a duplicate message. Now checks deltas_were_sent before re-raising: if partial content was already streamed, returns a stub response instead. The outer loop treats the turn as complete (no retry, no fallback, no duplicate). Inspired by PR NousResearch#4871 (@trevorgordon981) which identified the bug. This implementation avoids monkey-patching exception objects and keeps the fix within the streaming call boundary. --------- Co-authored-by: Mibayy <mibayy@users.noreply.github.com>

@trevorgordon981

NousResearch#4878) * fix(gateway): add message deduplication to Discord and Slack adapters (NousResearch#4777) Discord RESUME replays events after reconnects (~7/day observed), and Slack Socket Mode can redeliver events if the ack was lost. Neither adapter tracked which messages were already processed, causing duplicate bot responses. Add _seen_messages dedup cache (message ID → timestamp) with 5-min TTL and 2000-entry cap to both adapters, matching the pattern already used by Mattermost, Matrix, WeCom, Feishu, DingTalk, and Email. The check goes at the very top of the message handler, before any other logic, so replayed events are silently dropped. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: prevent duplicate messages on partial stream delivery When streaming fails after tokens are already delivered to the platform, _interruptible_streaming_api_call re-raised the error into the outer retry loop, which would make a new API call — creating a duplicate message. Now checks deltas_were_sent before re-raising: if partial content was already streamed, returns a stub response instead. The outer loop treats the turn as complete (no retry, no fallback, no duplicate). Inspired by PR NousResearch#4871 (@trevorgordon981) which identified the bug. This implementation avoids monkey-patching exception objects and keeps the fix within the streaming call boundary. --------- Co-authored-by: Mibayy <mibayy@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@trevorgordon981

NousResearch#4878) * fix(gateway): add message deduplication to Discord and Slack adapters (NousResearch#4777) Discord RESUME replays events after reconnects (~7/day observed), and Slack Socket Mode can redeliver events if the ack was lost. Neither adapter tracked which messages were already processed, causing duplicate bot responses. Add _seen_messages dedup cache (message ID → timestamp) with 5-min TTL and 2000-entry cap to both adapters, matching the pattern already used by Mattermost, Matrix, WeCom, Feishu, DingTalk, and Email. The check goes at the very top of the message handler, before any other logic, so replayed events are silently dropped. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: prevent duplicate messages on partial stream delivery When streaming fails after tokens are already delivered to the platform, _interruptible_streaming_api_call re-raised the error into the outer retry loop, which would make a new API call — creating a duplicate message. Now checks deltas_were_sent before re-raising: if partial content was already streamed, returns a stub response instead. The outer loop treats the turn as complete (no retry, no fallback, no duplicate). Inspired by PR NousResearch#4871 (@trevorgordon981) which identified the bug. This implementation avoids monkey-patching exception objects and keeps the fix within the streaming call boundary. --------- Co-authored-by: Mibayy <mibayy@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Mibayy and others added 2 commits April 3, 2026 14:27

This was referenced Apr 3, 2026

fix(gateway): add message deduplication to Discord and Slack adapters #4866

Closed

fix: prevent duplicate messages on partial stream delivery #4871

Closed

teknium1 merged commit cee761e into main Apr 4, 2026
5 of 6 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: prevent duplicate messages — gateway dedup + partial stream guard#4878

fix: prevent duplicate messages — gateway dedup + partial stream guard#4878
teknium1 merged 2 commits into
mainfrom
hermes/hermes-1e1d81f5

teknium1 commented Apr 3, 2026 •

edited

Loading

Uh oh!

britrik commented Apr 3, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

teknium1 commented Apr 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

1. Gateway message deduplication (Discord + Slack)

2. Partial stream recovery (run_agent.py)

Test results

Files changed

Uh oh!

britrik commented Apr 3, 2026

Code Review: PR #4878

Positive Aspects

Suggestions

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

teknium1 commented Apr 3, 2026 •

edited

Loading