fix(signal): back off sendTyping spam for unreachable recipients#12118
Merged
Conversation
base.py's _keep_typing refresh loop calls send_typing every ~2s while the agent is processing. If signal-cli returns NETWORK_FAILURE for the recipient (offline, unroutable, group membership lost), the unmitigated path was a WARNING log every 2 seconds for as long as the agent stayed busy — a user report showed 1048 warnings in 41 minutes for one offline contact, plus the matching volume of pointless RPC traffic to signal-cli. - _rpc() accepts log_failures=False so callers can route repeated expected failures (typing) to DEBUG while keeping send/receive at WARNING. - send_typing() tracks consecutive failures per chat. First failure still logs WARNING so transport issues remain visible; subsequent failures log at DEBUG. After three consecutive failures we skip the RPC during an exponential cooldown (16s, 32s, 60s cap) so we stop hammering signal-cli for a recipient it can't deliver to. A successful sendTyping resets the counters. - _stop_typing_indicator() clears the backoff state so the next agent turn starts fresh. E2E simulation against the reported 41-minute window: RPCs drop from 1230 to 45 (-96%), log lines from 1048 WARNINGs to 1 WARNING + 44 DEBUGs. Credits kshitijk4poor (#12056) for the _rpc log_failures kwarg idea; the broader restructure in that PR (nested per-chat loop inside send_typing) is avoided here in favour of stateful backoff that preserves base.py's existing _keep_typing architecture.
Contributor
|
zebster-cmd
added a commit
to zebster-cmd/hermes-agent
that referenced
this pull request
Apr 18, 2026
…#12113, NousResearch#12116, NousResearch#12118, NousResearch#12123) - fix(signal): back off sendTyping spam for unreachable recipients - docs(terminal): warn against stacking watch_patterns + notify_on_complete - feat(steer): /steer <prompt> injects mid-run note after next tool call - docs(browser): improve /browser connect setup guidance Resolved conflicts: - run_agent.py: keep budget pressure injection + tool-repeat hint + /steer
ulasbilgen
pushed a commit
to ulasbilgen/hermes-adhd-agent
that referenced
this pull request
May 1, 2026
…sResearch#12118) base.py's _keep_typing refresh loop calls send_typing every ~2s while the agent is processing. If signal-cli returns NETWORK_FAILURE for the recipient (offline, unroutable, group membership lost), the unmitigated path was a WARNING log every 2 seconds for as long as the agent stayed busy — a user report showed 1048 warnings in 41 minutes for one offline contact, plus the matching volume of pointless RPC traffic to signal-cli. - _rpc() accepts log_failures=False so callers can route repeated expected failures (typing) to DEBUG while keeping send/receive at WARNING. - send_typing() tracks consecutive failures per chat. First failure still logs WARNING so transport issues remain visible; subsequent failures log at DEBUG. After three consecutive failures we skip the RPC during an exponential cooldown (16s, 32s, 60s cap) so we stop hammering signal-cli for a recipient it can't deliver to. A successful sendTyping resets the counters. - _stop_typing_indicator() clears the backoff state so the next agent turn starts fresh. E2E simulation against the reported 41-minute window: RPCs drop from 1230 to 45 (-96%), log lines from 1048 WARNINGs to 1 WARNING + 44 DEBUGs. Credits kshitijk4poor (NousResearch#12056) for the _rpc log_failures kwarg idea; the broader restructure in that PR (nested per-chat loop inside send_typing) is avoided here in favour of stateful backoff that preserves base.py's existing _keep_typing architecture.
aj-nt
pushed a commit
to aj-nt/hermes-agent
that referenced
this pull request
May 1, 2026
…sResearch#12118) base.py's _keep_typing refresh loop calls send_typing every ~2s while the agent is processing. If signal-cli returns NETWORK_FAILURE for the recipient (offline, unroutable, group membership lost), the unmitigated path was a WARNING log every 2 seconds for as long as the agent stayed busy — a user report showed 1048 warnings in 41 minutes for one offline contact, plus the matching volume of pointless RPC traffic to signal-cli. - _rpc() accepts log_failures=False so callers can route repeated expected failures (typing) to DEBUG while keeping send/receive at WARNING. - send_typing() tracks consecutive failures per chat. First failure still logs WARNING so transport issues remain visible; subsequent failures log at DEBUG. After three consecutive failures we skip the RPC during an exponential cooldown (16s, 32s, 60s cap) so we stop hammering signal-cli for a recipient it can't deliver to. A successful sendTyping resets the counters. - _stop_typing_indicator() clears the backoff state so the next agent turn starts fresh. E2E simulation against the reported 41-minute window: RPCs drop from 1230 to 45 (-96%), log lines from 1048 WARNINGs to 1 WARNING + 44 DEBUGs. Credits kshitijk4poor (NousResearch#12056) for the _rpc log_failures kwarg idea; the broader restructure in that PR (nested per-chat loop inside send_typing) is avoided here in favour of stateful backoff that preserves base.py's existing _keep_typing architecture.
02356abc
pushed a commit
to 02356abc/hermes-agent
that referenced
this pull request
May 14, 2026
…sResearch#12118) base.py's _keep_typing refresh loop calls send_typing every ~2s while the agent is processing. If signal-cli returns NETWORK_FAILURE for the recipient (offline, unroutable, group membership lost), the unmitigated path was a WARNING log every 2 seconds for as long as the agent stayed busy — a user report showed 1048 warnings in 41 minutes for one offline contact, plus the matching volume of pointless RPC traffic to signal-cli. - _rpc() accepts log_failures=False so callers can route repeated expected failures (typing) to DEBUG while keeping send/receive at WARNING. - send_typing() tracks consecutive failures per chat. First failure still logs WARNING so transport issues remain visible; subsequent failures log at DEBUG. After three consecutive failures we skip the RPC during an exponential cooldown (16s, 32s, 60s cap) so we stop hammering signal-cli for a recipient it can't deliver to. A successful sendTyping resets the counters. - _stop_typing_indicator() clears the backoff state so the next agent turn starts fresh. E2E simulation against the reported 41-minute window: RPCs drop from 1230 to 45 (-96%), log lines from 1048 WARNINGs to 1 WARNING + 44 DEBUGs. Credits kshitijk4poor (NousResearch#12056) for the _rpc log_failures kwarg idea; the broader restructure in that PR (nested per-chat loop inside send_typing) is avoided here in favour of stateful backoff that preserves base.py's existing _keep_typing architecture.
gweeteve
pushed a commit
to gweeteve/hermes-agent
that referenced
this pull request
Jun 2, 2026
…sResearch#12118) base.py's _keep_typing refresh loop calls send_typing every ~2s while the agent is processing. If signal-cli returns NETWORK_FAILURE for the recipient (offline, unroutable, group membership lost), the unmitigated path was a WARNING log every 2 seconds for as long as the agent stayed busy — a user report showed 1048 warnings in 41 minutes for one offline contact, plus the matching volume of pointless RPC traffic to signal-cli. - _rpc() accepts log_failures=False so callers can route repeated expected failures (typing) to DEBUG while keeping send/receive at WARNING. - send_typing() tracks consecutive failures per chat. First failure still logs WARNING so transport issues remain visible; subsequent failures log at DEBUG. After three consecutive failures we skip the RPC during an exponential cooldown (16s, 32s, 60s cap) so we stop hammering signal-cli for a recipient it can't deliver to. A successful sendTyping resets the counters. - _stop_typing_indicator() clears the backoff state so the next agent turn starts fresh. E2E simulation against the reported 41-minute window: RPCs drop from 1230 to 45 (-96%), log lines from 1048 WARNINGs to 1 WARNING + 44 DEBUGs. Credits kshitijk4poor (NousResearch#12056) for the _rpc log_failures kwarg idea; the broader restructure in that PR (nested per-chat loop inside send_typing) is avoided here in favour of stateful backoff that preserves base.py's existing _keep_typing architecture.
Egavasyug
pushed a commit
to Egavasyug/hermes-agent
that referenced
this pull request
Jun 10, 2026
…sResearch#12118) base.py's _keep_typing refresh loop calls send_typing every ~2s while the agent is processing. If signal-cli returns NETWORK_FAILURE for the recipient (offline, unroutable, group membership lost), the unmitigated path was a WARNING log every 2 seconds for as long as the agent stayed busy — a user report showed 1048 warnings in 41 minutes for one offline contact, plus the matching volume of pointless RPC traffic to signal-cli. - _rpc() accepts log_failures=False so callers can route repeated expected failures (typing) to DEBUG while keeping send/receive at WARNING. - send_typing() tracks consecutive failures per chat. First failure still logs WARNING so transport issues remain visible; subsequent failures log at DEBUG. After three consecutive failures we skip the RPC during an exponential cooldown (16s, 32s, 60s cap) so we stop hammering signal-cli for a recipient it can't deliver to. A successful sendTyping resets the counters. - _stop_typing_indicator() clears the backoff state so the next agent turn starts fresh. E2E simulation against the reported 41-minute window: RPCs drop from 1230 to 45 (-96%), log lines from 1048 WARNINGs to 1 WARNING + 44 DEBUGs. Credits kshitijk4poor (NousResearch#12056) for the _rpc log_failures kwarg idea; the broader restructure in that PR (nested per-chat loop inside send_typing) is avoided here in favour of stateful backoff that preserves base.py's existing _keep_typing architecture.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Signal
send_typingbacks off after repeated NETWORK_FAILUREs so an unreachable recipient stops producing a WARNING log every 2 seconds for as long as the agent is busy.Problem
base.py::_keep_typingrefreshes the typing indicator every ~2s while the agent processes a turn. When signal-cli returns NETWORK_FAILURE (recipient offline, unroutable, group membership lost), the unmitigated_rpcpath logs WARNING on every single refresh. A user report showed 1048 warnings in 41 minutes for one offline contact, with a matching volume of pointless RPC traffic to signal-cli.Changes
gateway/platforms/signal.py_rpc()takeslog_failures: bool = True;send_typing()tracks consecutive failures per chat and short-circuits the RPC during an exponential cooldown (16s → 32s → 60s cap) after 3 failures; first failure still logs WARNING, subsequent ones log DEBUG; success resets counters;_stop_typing_indicator()clears the backoff statetests/gateway/test_signal.pyTestSignalTypingBackoff— 5 tests covering log-level demotion, 3-failure cooldown engagement, per-chat isolation, success reset, stop-typing cleanupValidation
tests/gateway/test_signal.pyE2E simulation replays
base.py::_keep_typingcallingsend_typingevery 2s for the reported 41-minute duration against a stub_rpcthat returns the exact NETWORK_FAILURE shape from the user's log.Notes
Salvages the
_rpc(log_failures=...)kwarg idea from #12056 (credits @kshitijk4poor). The broader restructure in that PR — a second nested per-chat loop insidesend_typinginteracting with base.py's_keep_typingvia asyncio.Task cleanup — is avoided here in favour of stateful backoff that preserves the existing_keep_typingarchitecture. Closing #12056 in favour of this narrower fix; the session_search serialization half of that PR is unrelated to the reported incident (logs show aux timeouts, not 429s) and isn't included here.