Skip to content

fix: prevent duplicate messages — gateway dedup + partial stream guard#4878

Merged
teknium1 merged 2 commits into
mainfrom
hermes/hermes-1e1d81f5
Apr 4, 2026
Merged

fix: prevent duplicate messages — gateway dedup + partial stream guard#4878
teknium1 merged 2 commits into
mainfrom
hermes/hermes-1e1d81f5

Conversation

@teknium1

@teknium1 teknium1 commented Apr 3, 2026

Copy link
Copy Markdown
Contributor

Summary

Two complementary fixes for duplicate message bugs:

1. Gateway message deduplication (Discord + Slack)

Cherry-picked from PR #4866 by @Mibayy (closes #4777).

Discord RESUME replays events after reconnects (~7/day in production). Slack Socket Mode can redeliver events on reconnect. Adds _seen_messages dict with 5-min TTL and 2000-entry cap to both adapters, matching WeCom/Mattermost/Feishu/DingTalk.

2. Partial stream recovery (run_agent.py)

Inspired by PR #4871 by @trevorgordon981 who identified the bug.

When streaming fails after tokens are already delivered to the platform, instead of retrying (duplicate) or giving up (truncated), the agent now attempts to continue the response:

  • Option A: Append partial content as an assistant message, make a non-streaming API call. The model sees its previous partial output in conversation history and naturally continues from where it left off.
  • Option B (fallback): If trailing assistant is rejected by the provider, inject a user "continue" instruction and retry.
  • Last resort: If both fail, return partial content as the final response.

Tested with real Sonnet and Opus models via both Anthropic native API and OpenRouter — the model seamlessly continues mid-sentence in all cases.

Also fixes: Anthropic streaming path was missing deltas_were_sent flag and partial text accumulation.

Test results

  • test_run_agent.py: 221 passed
  • tests/gateway/: 1858 passed (6 pre-existing failures, unrelated)
  • E2E: Option A recovery verified, A→B fallback verified, both-fail partial return verified
  • Live model tests: Sonnet + Opus continuation works on Anthropic native + OpenRouter

Files changed

  • gateway/platforms/discord.py — dedup cache (+19 lines)
  • gateway/platforms/slack.py — dedup cache (+20 lines)
  • run_agent.py — partial stream recovery with A→B fallback (+137 lines)

Mibayy and others added 2 commits April 3, 2026 14:27
…#4777)

Discord RESUME replays events after reconnects (~7/day observed),
and Slack Socket Mode can redeliver events if the ack was lost.
Neither adapter tracked which messages were already processed,
causing duplicate bot responses.

Add _seen_messages dedup cache (message ID → timestamp) with 5-min
TTL and 2000-entry cap to both adapters, matching the pattern already
used by Mattermost, Matrix, WeCom, Feishu, DingTalk, and Email.

The check goes at the very top of the message handler, before any
other logic, so replayed events are silently dropped.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When streaming fails after tokens are already delivered to the platform,
_interruptible_streaming_api_call re-raised the error into the outer
retry loop, which would make a new API call — creating a duplicate
message.

Now checks deltas_were_sent before re-raising: if partial content was
already streamed, returns a stub response instead. The outer loop treats
the turn as complete (no retry, no fallback, no duplicate).

Inspired by PR #4871 (@trevorgordon981) which identified the bug.
This implementation avoids monkey-patching exception objects and keeps
the fix within the streaming call boundary.
@britrik

britrik commented Apr 3, 2026

Copy link
Copy Markdown

Code Review: PR #4878

Thanks for addressing the duplicate message issue! Here's my review:

Positive Aspects

  • Clean implementation of deduplication with proper TTL and size limits
  • Good defensive coding with cache pruning to prevent unbounded growth
  • The partial stream guard in run_agent.py is a smart solution to prevent retries from creating duplicates

Suggestions

1. Discord adapter - variable capture concern
The handler uses from an outer closure. Consider explicitly passing self or using a clearer pattern:

This works but could be confusing to future maintainers.

2. run_agent.py - misleading finish_reason
The stub response uses , which implies natural completion rather than error recovery. Consider using a more descriptive approach:

This would make debugging easier and be more accurate.

3. Minor: import statement
The import is added to slack.py but not discord.py - double-check if discord.py already imports it or if it's needed there too.

4. Edge case consideration
What happens if both dedup checks pass but the message should still be ignored (e.g., channel/role permissions)? The current flow is correct, but worth documenting that dedup happens before permission checks.

Overall: Solid implementation that addresses the core issue well. The suggestions above are minor improvements rather than blockers.

@teknium1 teknium1 merged commit cee761e into main Apr 4, 2026
5 of 6 checks passed
zebster-cmd added a commit to zebster-cmd/hermes-agent that referenced this pull request Apr 4, 2026
Resolved conflicts:
- plugins/memory/openviking/__init__.py: kept our is_available() .env
  fallback, adopted upstream env-var defaults for account/user
- plugins/memory/byterover/__init__.py: adopted upstream synchronous
  prefetch (replaces threaded queue_prefetch pattern)
- hermes_cli/main.py: adopted upstream find_gateway_pids() approach
  for multi-gateway restart (replaces per-service-type restart block)
- hermes_cli/__init__.py + pyproject.toml: version 0.7.0 (auto-resolved)

Key upstream features:
- Memory provider tools routed in sequential execution (NousResearch#4803)
- Clean user message for memory operations (NousResearch#5099)
- Duplicate message prevention (gateway dedup) (NousResearch#4878)
- O(n^2) regex backtracking fix (831067c)
- OpenViking tenant-scoping headers (NousResearch#4825)
saxster pushed a commit to saxster/hermes-agent that referenced this pull request Apr 8, 2026
NousResearch#4878)

* fix(gateway): add message deduplication to Discord and Slack adapters (NousResearch#4777)

Discord RESUME replays events after reconnects (~7/day observed),
and Slack Socket Mode can redeliver events if the ack was lost.
Neither adapter tracked which messages were already processed,
causing duplicate bot responses.

Add _seen_messages dedup cache (message ID → timestamp) with 5-min
TTL and 2000-entry cap to both adapters, matching the pattern already
used by Mattermost, Matrix, WeCom, Feishu, DingTalk, and Email.

The check goes at the very top of the message handler, before any
other logic, so replayed events are silently dropped.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: prevent duplicate messages on partial stream delivery

When streaming fails after tokens are already delivered to the platform,
_interruptible_streaming_api_call re-raised the error into the outer
retry loop, which would make a new API call — creating a duplicate
message.

Now checks deltas_were_sent before re-raising: if partial content was
already streamed, returns a stub response instead. The outer loop treats
the turn as complete (no retry, no fallback, no duplicate).

Inspired by PR NousResearch#4871 (@trevorgordon981) which identified the bug.
This implementation avoids monkey-patching exception objects and keeps
the fix within the streaming call boundary.

---------

Co-authored-by: Mibayy <mibayy@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Tommyeds pushed a commit to Tommyeds/hermes-agent that referenced this pull request Apr 12, 2026
NousResearch#4878)

* fix(gateway): add message deduplication to Discord and Slack adapters (NousResearch#4777)

Discord RESUME replays events after reconnects (~7/day observed),
and Slack Socket Mode can redeliver events if the ack was lost.
Neither adapter tracked which messages were already processed,
causing duplicate bot responses.

Add _seen_messages dedup cache (message ID → timestamp) with 5-min
TTL and 2000-entry cap to both adapters, matching the pattern already
used by Mattermost, Matrix, WeCom, Feishu, DingTalk, and Email.

The check goes at the very top of the message handler, before any
other logic, so replayed events are silently dropped.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: prevent duplicate messages on partial stream delivery

When streaming fails after tokens are already delivered to the platform,
_interruptible_streaming_api_call re-raised the error into the outer
retry loop, which would make a new API call — creating a duplicate
message.

Now checks deltas_were_sent before re-raising: if partial content was
already streamed, returns a stub response instead. The outer loop treats
the turn as complete (no retry, no fallback, no duplicate).

Inspired by PR NousResearch#4871 (@trevorgordon981) which identified the bug.
This implementation avoids monkey-patching exception objects and keeps
the fix within the streaming call boundary.

---------

Co-authored-by: Mibayy <mibayy@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
angelburgosrosado pushed a commit to angelburgosrosado/hermes-agent that referenced this pull request Apr 27, 2026
NousResearch#4878)

* fix(gateway): add message deduplication to Discord and Slack adapters (NousResearch#4777)

Discord RESUME replays events after reconnects (~7/day observed),
and Slack Socket Mode can redeliver events if the ack was lost.
Neither adapter tracked which messages were already processed,
causing duplicate bot responses.

Add _seen_messages dedup cache (message ID → timestamp) with 5-min
TTL and 2000-entry cap to both adapters, matching the pattern already
used by Mattermost, Matrix, WeCom, Feishu, DingTalk, and Email.

The check goes at the very top of the message handler, before any
other logic, so replayed events are silently dropped.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: prevent duplicate messages on partial stream delivery

When streaming fails after tokens are already delivered to the platform,
_interruptible_streaming_api_call re-raised the error into the outer
retry loop, which would make a new API call — creating a duplicate
message.

Now checks deltas_were_sent before re-raising: if partial content was
already streamed, returns a stub response instead. The outer loop treats
the turn as complete (no retry, no fallback, no duplicate).

Inspired by PR NousResearch#4871 (@trevorgordon981) which identified the bug.
This implementation avoids monkey-patching exception objects and keeps
the fix within the streaming call boundary.

---------

Co-authored-by: Mibayy <mibayy@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
02356abc pushed a commit to 02356abc/hermes-agent that referenced this pull request May 14, 2026
NousResearch#4878)

* fix(gateway): add message deduplication to Discord and Slack adapters (NousResearch#4777)

Discord RESUME replays events after reconnects (~7/day observed),
and Slack Socket Mode can redeliver events if the ack was lost.
Neither adapter tracked which messages were already processed,
causing duplicate bot responses.

Add _seen_messages dedup cache (message ID → timestamp) with 5-min
TTL and 2000-entry cap to both adapters, matching the pattern already
used by Mattermost, Matrix, WeCom, Feishu, DingTalk, and Email.

The check goes at the very top of the message handler, before any
other logic, so replayed events are silently dropped.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: prevent duplicate messages on partial stream delivery

When streaming fails after tokens are already delivered to the platform,
_interruptible_streaming_api_call re-raised the error into the outer
retry loop, which would make a new API call — creating a duplicate
message.

Now checks deltas_were_sent before re-raising: if partial content was
already streamed, returns a stub response instead. The outer loop treats
the turn as complete (no retry, no fallback, no duplicate).

Inspired by PR NousResearch#4871 (@trevorgordon981) which identified the bug.
This implementation avoids monkey-patching exception objects and keeps
the fix within the streaming call boundary.

---------

Co-authored-by: Mibayy <mibayy@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
olympus-terminal pushed a commit to olympus-terminal/hermes-agent that referenced this pull request May 16, 2026
NousResearch#4878)

* fix(gateway): add message deduplication to Discord and Slack adapters (NousResearch#4777)

Discord RESUME replays events after reconnects (~7/day observed),
and Slack Socket Mode can redeliver events if the ack was lost.
Neither adapter tracked which messages were already processed,
causing duplicate bot responses.

Add _seen_messages dedup cache (message ID → timestamp) with 5-min
TTL and 2000-entry cap to both adapters, matching the pattern already
used by Mattermost, Matrix, WeCom, Feishu, DingTalk, and Email.

The check goes at the very top of the message handler, before any
other logic, so replayed events are silently dropped.

* fix: prevent duplicate messages on partial stream delivery

When streaming fails after tokens are already delivered to the platform,
_interruptible_streaming_api_call re-raised the error into the outer
retry loop, which would make a new API call — creating a duplicate
message.

Now checks deltas_were_sent before re-raising: if partial content was
already streamed, returns a stub response instead. The outer loop treats
the turn as complete (no retry, no fallback, no duplicate).

Inspired by PR NousResearch#4871 (@trevorgordon981) which identified the bug.
This implementation avoids monkey-patching exception objects and keeps
the fix within the streaming call boundary.

---------

Co-authored-by: Mibayy <mibayy@users.noreply.github.com>
gweeteve pushed a commit to gweeteve/hermes-agent that referenced this pull request Jun 2, 2026
NousResearch#4878)

* fix(gateway): add message deduplication to Discord and Slack adapters (NousResearch#4777)

Discord RESUME replays events after reconnects (~7/day observed),
and Slack Socket Mode can redeliver events if the ack was lost.
Neither adapter tracked which messages were already processed,
causing duplicate bot responses.

Add _seen_messages dedup cache (message ID → timestamp) with 5-min
TTL and 2000-entry cap to both adapters, matching the pattern already
used by Mattermost, Matrix, WeCom, Feishu, DingTalk, and Email.

The check goes at the very top of the message handler, before any
other logic, so replayed events are silently dropped.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: prevent duplicate messages on partial stream delivery

When streaming fails after tokens are already delivered to the platform,
_interruptible_streaming_api_call re-raised the error into the outer
retry loop, which would make a new API call — creating a duplicate
message.

Now checks deltas_were_sent before re-raising: if partial content was
already streamed, returns a stub response instead. The outer loop treats
the turn as complete (no retry, no fallback, no duplicate).

Inspired by PR NousResearch#4871 (@trevorgordon981) which identified the bug.
This implementation avoids monkey-patching exception objects and keeps
the fix within the streaming call boundary.

---------

Co-authored-by: Mibayy <mibayy@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Egavasyug pushed a commit to Egavasyug/hermes-agent that referenced this pull request Jun 10, 2026
NousResearch#4878)

* fix(gateway): add message deduplication to Discord and Slack adapters (NousResearch#4777)

Discord RESUME replays events after reconnects (~7/day observed),
and Slack Socket Mode can redeliver events if the ack was lost.
Neither adapter tracked which messages were already processed,
causing duplicate bot responses.

Add _seen_messages dedup cache (message ID → timestamp) with 5-min
TTL and 2000-entry cap to both adapters, matching the pattern already
used by Mattermost, Matrix, WeCom, Feishu, DingTalk, and Email.

The check goes at the very top of the message handler, before any
other logic, so replayed events are silently dropped.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: prevent duplicate messages on partial stream delivery

When streaming fails after tokens are already delivered to the platform,
_interruptible_streaming_api_call re-raised the error into the outer
retry loop, which would make a new API call — creating a duplicate
message.

Now checks deltas_were_sent before re-raising: if partial content was
already streamed, returns a stub response instead. The outer loop treats
the turn as complete (no retry, no fallback, no duplicate).

Inspired by PR NousResearch#4871 (@trevorgordon981) which identified the bug.
This implementation avoids monkey-patching exception objects and keeps
the fix within the streaming call boundary.

---------

Co-authored-by: Mibayy <mibayy@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

fix(gateway): Discord and Slack adapters missing message deduplication

3 participants