Skip to content

fix(signal): back off sendTyping spam for unreachable recipients#12118

Merged
teknium1 merged 1 commit into
mainfrom
hermes/hermes-1a3c1633
Apr 18, 2026
Merged

fix(signal): back off sendTyping spam for unreachable recipients#12118
teknium1 merged 1 commit into
mainfrom
hermes/hermes-1a3c1633

Conversation

@teknium1

Copy link
Copy Markdown
Contributor

Summary

Signal send_typing backs off after repeated NETWORK_FAILUREs so an unreachable recipient stops producing a WARNING log every 2 seconds for as long as the agent is busy.

Problem

base.py::_keep_typing refreshes the typing indicator every ~2s while the agent processes a turn. When signal-cli returns NETWORK_FAILURE (recipient offline, unroutable, group membership lost), the unmitigated _rpc path logs WARNING on every single refresh. A user report showed 1048 warnings in 41 minutes for one offline contact, with a matching volume of pointless RPC traffic to signal-cli.

Changes

File What
gateway/platforms/signal.py _rpc() takes log_failures: bool = True; send_typing() tracks consecutive failures per chat and short-circuits the RPC during an exponential cooldown (16s → 32s → 60s cap) after 3 failures; first failure still logs WARNING, subsequent ones log DEBUG; success resets counters; _stop_typing_indicator() clears the backoff state
tests/gateway/test_signal.py TestSignalTypingBackoff — 5 tests covering log-level demotion, 3-failure cooldown engagement, per-chat isolation, success reset, stop-typing cleanup

Validation

Before After
RPCs in 41-min offline window 1230 45 (-96%)
WARNING log lines 1048 1
DEBUG log lines 0 44
tests/gateway/test_signal.py 57 passed 62 passed

E2E simulation replays base.py::_keep_typing calling send_typing every 2s for the reported 41-minute duration against a stub _rpc that returns the exact NETWORK_FAILURE shape from the user's log.

Notes

Salvages the _rpc(log_failures=...) kwarg idea from #12056 (credits @kshitijk4poor). The broader restructure in that PR — a second nested per-chat loop inside send_typing interacting with base.py's _keep_typing via asyncio.Task cleanup — is avoided here in favour of stateful backoff that preserves the existing _keep_typing architecture. Closing #12056 in favour of this narrower fix; the session_search serialization half of that PR is unrelated to the reported incident (logs show aux timeouts, not 429s) and isn't included here.

base.py's _keep_typing refresh loop calls send_typing every ~2s while
the agent is processing. If signal-cli returns NETWORK_FAILURE for the
recipient (offline, unroutable, group membership lost), the unmitigated
path was a WARNING log every 2 seconds for as long as the agent stayed
busy — a user report showed 1048 warnings in 41 minutes for one
offline contact, plus the matching volume of pointless RPC traffic to
signal-cli.

- _rpc() accepts log_failures=False so callers can route repeated
  expected failures (typing) to DEBUG while keeping send/receive at
  WARNING.
- send_typing() tracks consecutive failures per chat. First failure
  still logs WARNING so transport issues remain visible; subsequent
  failures log at DEBUG. After three consecutive failures we skip the
  RPC during an exponential cooldown (16s, 32s, 60s cap) so we stop
  hammering signal-cli for a recipient it can't deliver to. A
  successful sendTyping resets the counters.
- _stop_typing_indicator() clears the backoff state so the next agent
  turn starts fresh.

E2E simulation against the reported 41-minute window: RPCs drop from
1230 to 45 (-96%), log lines from 1048 WARNINGs to 1 WARNING + 44
DEBUGs.

Credits kshitijk4poor (#12056) for the _rpc log_failures kwarg idea;
the broader restructure in that PR (nested per-chat loop inside
send_typing) is avoided here in favour of stateful backoff that
preserves base.py's existing _keep_typing architecture.
@github-actions

Copy link
Copy Markdown
Contributor

⚠️ Supply Chain Risk Detected

This PR contains patterns commonly associated with supply chain attacks. This does not mean the PR is malicious — but these patterns require careful human review before merging.

⚠️ WARNING: Install hook files modified

These files can execute code during package installation or interpreter startup.

Files:

hermes_cli/setup.py

Automated scan triggered by supply-chain-audit. If this is a false positive, a maintainer can approve after manual review.

@teknium1 teknium1 merged commit 9527707 into main Apr 18, 2026
4 of 5 checks passed
@teknium1 teknium1 deleted the hermes/hermes-1a3c1633 branch April 18, 2026 11:13
zebster-cmd added a commit to zebster-cmd/hermes-agent that referenced this pull request Apr 18, 2026
…#12113, NousResearch#12116, NousResearch#12118, NousResearch#12123)

- fix(signal): back off sendTyping spam for unreachable recipients
- docs(terminal): warn against stacking watch_patterns + notify_on_complete
- feat(steer): /steer <prompt> injects mid-run note after next tool call
- docs(browser): improve /browser connect setup guidance

Resolved conflicts:
- run_agent.py: keep budget pressure injection + tool-repeat hint + /steer
ulasbilgen pushed a commit to ulasbilgen/hermes-adhd-agent that referenced this pull request May 1, 2026
…sResearch#12118)

base.py's _keep_typing refresh loop calls send_typing every ~2s while
the agent is processing. If signal-cli returns NETWORK_FAILURE for the
recipient (offline, unroutable, group membership lost), the unmitigated
path was a WARNING log every 2 seconds for as long as the agent stayed
busy — a user report showed 1048 warnings in 41 minutes for one
offline contact, plus the matching volume of pointless RPC traffic to
signal-cli.

- _rpc() accepts log_failures=False so callers can route repeated
  expected failures (typing) to DEBUG while keeping send/receive at
  WARNING.
- send_typing() tracks consecutive failures per chat. First failure
  still logs WARNING so transport issues remain visible; subsequent
  failures log at DEBUG. After three consecutive failures we skip the
  RPC during an exponential cooldown (16s, 32s, 60s cap) so we stop
  hammering signal-cli for a recipient it can't deliver to. A
  successful sendTyping resets the counters.
- _stop_typing_indicator() clears the backoff state so the next agent
  turn starts fresh.

E2E simulation against the reported 41-minute window: RPCs drop from
1230 to 45 (-96%), log lines from 1048 WARNINGs to 1 WARNING + 44
DEBUGs.

Credits kshitijk4poor (NousResearch#12056) for the _rpc log_failures kwarg idea;
the broader restructure in that PR (nested per-chat loop inside
send_typing) is avoided here in favour of stateful backoff that
preserves base.py's existing _keep_typing architecture.
aj-nt pushed a commit to aj-nt/hermes-agent that referenced this pull request May 1, 2026
…sResearch#12118)

base.py's _keep_typing refresh loop calls send_typing every ~2s while
the agent is processing. If signal-cli returns NETWORK_FAILURE for the
recipient (offline, unroutable, group membership lost), the unmitigated
path was a WARNING log every 2 seconds for as long as the agent stayed
busy — a user report showed 1048 warnings in 41 minutes for one
offline contact, plus the matching volume of pointless RPC traffic to
signal-cli.

- _rpc() accepts log_failures=False so callers can route repeated
  expected failures (typing) to DEBUG while keeping send/receive at
  WARNING.
- send_typing() tracks consecutive failures per chat. First failure
  still logs WARNING so transport issues remain visible; subsequent
  failures log at DEBUG. After three consecutive failures we skip the
  RPC during an exponential cooldown (16s, 32s, 60s cap) so we stop
  hammering signal-cli for a recipient it can't deliver to. A
  successful sendTyping resets the counters.
- _stop_typing_indicator() clears the backoff state so the next agent
  turn starts fresh.

E2E simulation against the reported 41-minute window: RPCs drop from
1230 to 45 (-96%), log lines from 1048 WARNINGs to 1 WARNING + 44
DEBUGs.

Credits kshitijk4poor (NousResearch#12056) for the _rpc log_failures kwarg idea;
the broader restructure in that PR (nested per-chat loop inside
send_typing) is avoided here in favour of stateful backoff that
preserves base.py's existing _keep_typing architecture.
02356abc pushed a commit to 02356abc/hermes-agent that referenced this pull request May 14, 2026
…sResearch#12118)

base.py's _keep_typing refresh loop calls send_typing every ~2s while
the agent is processing. If signal-cli returns NETWORK_FAILURE for the
recipient (offline, unroutable, group membership lost), the unmitigated
path was a WARNING log every 2 seconds for as long as the agent stayed
busy — a user report showed 1048 warnings in 41 minutes for one
offline contact, plus the matching volume of pointless RPC traffic to
signal-cli.

- _rpc() accepts log_failures=False so callers can route repeated
  expected failures (typing) to DEBUG while keeping send/receive at
  WARNING.
- send_typing() tracks consecutive failures per chat. First failure
  still logs WARNING so transport issues remain visible; subsequent
  failures log at DEBUG. After three consecutive failures we skip the
  RPC during an exponential cooldown (16s, 32s, 60s cap) so we stop
  hammering signal-cli for a recipient it can't deliver to. A
  successful sendTyping resets the counters.
- _stop_typing_indicator() clears the backoff state so the next agent
  turn starts fresh.

E2E simulation against the reported 41-minute window: RPCs drop from
1230 to 45 (-96%), log lines from 1048 WARNINGs to 1 WARNING + 44
DEBUGs.

Credits kshitijk4poor (NousResearch#12056) for the _rpc log_failures kwarg idea;
the broader restructure in that PR (nested per-chat loop inside
send_typing) is avoided here in favour of stateful backoff that
preserves base.py's existing _keep_typing architecture.
gweeteve pushed a commit to gweeteve/hermes-agent that referenced this pull request Jun 2, 2026
…sResearch#12118)

base.py's _keep_typing refresh loop calls send_typing every ~2s while
the agent is processing. If signal-cli returns NETWORK_FAILURE for the
recipient (offline, unroutable, group membership lost), the unmitigated
path was a WARNING log every 2 seconds for as long as the agent stayed
busy — a user report showed 1048 warnings in 41 minutes for one
offline contact, plus the matching volume of pointless RPC traffic to
signal-cli.

- _rpc() accepts log_failures=False so callers can route repeated
  expected failures (typing) to DEBUG while keeping send/receive at
  WARNING.
- send_typing() tracks consecutive failures per chat. First failure
  still logs WARNING so transport issues remain visible; subsequent
  failures log at DEBUG. After three consecutive failures we skip the
  RPC during an exponential cooldown (16s, 32s, 60s cap) so we stop
  hammering signal-cli for a recipient it can't deliver to. A
  successful sendTyping resets the counters.
- _stop_typing_indicator() clears the backoff state so the next agent
  turn starts fresh.

E2E simulation against the reported 41-minute window: RPCs drop from
1230 to 45 (-96%), log lines from 1048 WARNINGs to 1 WARNING + 44
DEBUGs.

Credits kshitijk4poor (NousResearch#12056) for the _rpc log_failures kwarg idea;
the broader restructure in that PR (nested per-chat loop inside
send_typing) is avoided here in favour of stateful backoff that
preserves base.py's existing _keep_typing architecture.
Egavasyug pushed a commit to Egavasyug/hermes-agent that referenced this pull request Jun 10, 2026
…sResearch#12118)

base.py's _keep_typing refresh loop calls send_typing every ~2s while
the agent is processing. If signal-cli returns NETWORK_FAILURE for the
recipient (offline, unroutable, group membership lost), the unmitigated
path was a WARNING log every 2 seconds for as long as the agent stayed
busy — a user report showed 1048 warnings in 41 minutes for one
offline contact, plus the matching volume of pointless RPC traffic to
signal-cli.

- _rpc() accepts log_failures=False so callers can route repeated
  expected failures (typing) to DEBUG while keeping send/receive at
  WARNING.
- send_typing() tracks consecutive failures per chat. First failure
  still logs WARNING so transport issues remain visible; subsequent
  failures log at DEBUG. After three consecutive failures we skip the
  RPC during an exponential cooldown (16s, 32s, 60s cap) so we stop
  hammering signal-cli for a recipient it can't deliver to. A
  successful sendTyping resets the counters.
- _stop_typing_indicator() clears the backoff state so the next agent
  turn starts fresh.

E2E simulation against the reported 41-minute window: RPCs drop from
1230 to 45 (-96%), log lines from 1048 WARNINGs to 1 WARNING + 44
DEBUGs.

Credits kshitijk4poor (NousResearch#12056) for the _rpc log_failures kwarg idea;
the broader restructure in that PR (nested per-chat loop inside
send_typing) is avoided here in favour of stateful backoff that
preserves base.py's existing _keep_typing architecture.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant