fix(base): retry transient send failures and notify user on exhaustion #3108
Closed
Mibayy wants to merge 1 commit into
Fixes silent delivery failures where a network error during send()
left the user with no feedback, appearing as a hang or crash.
## Changes
BasePlatformAdapter now has two new helpers:

_is_retryable_error(error: str) -> bool
  Detects transient network errors by matching known substrings
  (ConnectError, timeout, ConnectionReset, BrokenPipe, etc.)

_send_with_retry(chat_id, content, ..., max_retries=2, base_delay=2.0)
  - Success on first attempt: returns immediately (no overhead)
  - Transient error (network): retries up to max_retries times with
    exponential backoff + jitter. On exhaustion, sends the user a
    delivery-failure notice so they know to retry rather than waiting.
  - Permanent error (formatting/permission): falls back to plain-text
    version immediately, without entering the retry loop.
  - SendResult.retryable=True is respected for platform-specific
    retryable errors that don't match string patterns.

handle_message() now calls _send_with_retry() instead of send() directly.
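
The substring check described for _is_retryable_error could look roughly like the sketch below. It is not the code from this PR: the pattern tuple is an assumption built from the error names listed above, and the function is written standalone rather than as a method on BasePlatformAdapter.

```python
from typing import Optional

# Assumed pattern list: the PR names ConnectError, timeout, ConnectionReset
# and BrokenPipe; the exact set used in the codebase may differ.
RETRYABLE_PATTERNS = (
    "connecterror",
    "timeout",
    "connectionreset",
    "broken pipe",
    "brokenpipe",
)


def is_retryable_error(error: Optional[str]) -> bool:
    """Return True if the error text looks like a transient network failure."""
    if not error:
        return False
    lowered = error.lower()  # match case-insensitively
    return any(pattern in lowered for pattern in RETRYABLE_PATTERNS)
```

Matching on lowercase text means a message such as "ReadTimeout" is caught by the "timeout" pattern, while a formatting error such as "Bad Request: can't parse entities" is not.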
## User experience
Before: network blip during send → silent failure, user waits 1+ hour
After: network blip → 2 automatic retries → if still failing, user
receives '⚠️ Message delivery failed after multiple attempts.
Please try again — your request was processed but the response
could not be sent.'
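
A minimal sketch of the flow described above (first attempt, retries on transient errors, plain-text fallback on permanent errors, notice on exhaustion). Everything here is illustrative: the SendResult dataclass is a simplified stand-in, the `send` callable, `plain_text` and `sleep` parameters are hypothetical, the jitter range is a guess, and the notice text is abbreviated. The real _send_with_retry() may be structured differently.

```python
import asyncio
import random
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional


@dataclass
class SendResult:
    # Simplified stand-in for the project's SendResult.
    success: bool
    error: Optional[str] = None
    retryable: bool = False


TRANSIENT = ("connecterror", "timeout", "connectionreset", "broken pipe")
# Abbreviated version of the notice quoted in the PR description.
FAILURE_NOTICE = "⚠️ Message delivery failed after multiple attempts. Please try again."


def _is_transient(result: SendResult) -> bool:
    text = (result.error or "").lower()
    return result.retryable or any(p in text for p in TRANSIENT)


async def send_with_retry(
    send: Callable[[str, str], Awaitable[SendResult]],
    chat_id: str,
    content: str,
    plain_text: Optional[str] = None,
    max_retries: int = 2,
    base_delay: float = 2.0,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> SendResult:
    fallback = plain_text if plain_text is not None else content

    result = await send(chat_id, content)
    if result.success:
        return result  # fast path: no retry overhead

    if not _is_transient(result):
        # Permanent error: skip the retry loop, try the plain-text version.
        return await send(chat_id, fallback)

    for attempt in range(max_retries):
        # Exponential backoff plus a small random jitter (range assumed).
        await sleep(base_delay * 2 ** attempt + random.uniform(0.0, 1.0))
        result = await send(chat_id, content)
        if result.success:
            return result
        if not _is_transient(result):
            return await send(chat_id, fallback)

    # Retries exhausted: best-effort notice so the user knows to retry.
    await send(chat_id, FAILURE_NOTICE)
    return result
```

Injecting `sleep` keeps the sketch testable without real delays. Note that the notice itself goes through the same `send` path, so it can also fail if the network is still down.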
Closes NousResearch#2910
teknium1 pushed a commit that referenced this pull request Mar 26, 2026

…tion

When send() fails due to a network error (ConnectError, ReadTimeout, etc.), the failure was silently logged and the user received no feedback, appearing as a hang. In one reported case, a user waited 1+ hour for a response that had already been generated but failed to deliver (#2910).

Adds _send_with_retry() to BasePlatformAdapter:
- Transient errors: retry up to 2x with exponential backoff + jitter
- On exhaustion: send delivery-failure notice so user knows to retry
- Permanent errors: fall back to plain-text version (preserves existing behavior)
- SendResult.retryable flag for platform-specific transient errors

All adapters benefit automatically via BasePlatformAdapter inheritance.

Cherry-picked from PR #3108 by Mibayy.

teknium1 added a commit that referenced this pull request Mar 27, 2026

…tion (#3288)

When send() fails due to a network error (ConnectError, ReadTimeout, etc.), the failure was silently logged and the user received no feedback, appearing as a hang. In one reported case, a user waited 1+ hour for a response that had already been generated but failed to deliver (#2910).

Adds _send_with_retry() to BasePlatformAdapter:
- Transient errors: retry up to 2x with exponential backoff + jitter
- On exhaustion: send delivery-failure notice so user knows to retry
- Permanent errors: fall back to plain-text version (preserves existing behavior)
- SendResult.retryable flag for platform-specific transient errors

All adapters benefit automatically via BasePlatformAdapter inheritance.

Cherry-picked from PR #3108 by Mibayy.

Co-authored-by: Mibayy <mibayy@users.noreply.github.com>

Contributor
Merged via PR #3288. Your commit was cherry-picked onto current main with authorship preserved. A few improvements on top: removed unused |
Fixes #2910

## Problem
When `send()` failed due to a network error (`ConnectError`, `ReadTimeout`, etc.), the failure was silently logged and the user received no feedback, so it looked like a hang or crash. In the reported case, a user waited 1+ hour for a response that had already been generated but failed to deliver.

## Fix
Two additions to `BasePlatformAdapter`:

- `_is_retryable_error(error)`: detects transient network failures by matching substrings (`connecterror`, `timeout`, `connectionreset`, `broken pipe`, etc.)
- `_send_with_retry()`: wraps `send()` with three code paths:
  - Success on first attempt: returns immediately (no overhead)
  - Transient network error: retries with exponential backoff + jitter, then sends a delivery-failure notice on exhaustion
  - Permanent error (formatting/permission): falls back to the plain-text version immediately, without entering the retry loop

`handle_message()` now calls `_send_with_retry()` instead of `send()` directly. All existing adapters benefit automatically; no per-adapter changes are needed.

Platform adapters can also set `SendResult.retryable=True` for platform-specific transient errors that don't match string patterns.

## User experience
Before: network blip during send → silent failure, user waits indefinitely

After: network blip → 2 automatic retries (2s, 4s backoff) → if still failing, the user receives '⚠️ Message delivery failed after multiple attempts. Please try again — your request was processed but the response could not be sent.'

## Tests
26 new tests in `tests/gateway/test_send_retry.py` covering all paths. 1431 existing tests pass.
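
The "2s, 4s" schedule quoted above is consistent with doubling `base_delay=2.0` on each retry. The sketch below shows that arithmetic under the assumption that the delay is `base_delay * 2**attempt` plus jitter. The function name and the `max_jitter` parameter are hypothetical, and the PR does not state how much jitter is applied.

```python
import random


def backoff_delay(attempt: int, base_delay: float = 2.0, max_jitter: float = 0.0) -> float:
    """Delay before retry number `attempt` (0-based), with optional uniform jitter."""
    return base_delay * (2 ** attempt) + random.uniform(0.0, max_jitter)


# With jitter disabled, two retries wait 2s and then 4s.
schedule = [backoff_delay(attempt) for attempt in range(2)]
print(schedule)  # [2.0, 4.0]
```

Jitter spreads retries from many clients over time so they do not all hit the platform API at the same instant after an outage.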