Two bugs combine to create an infinite loop / resource exhaustion in the gateway when the agent hits repeated errors (e.g., context overflow from #813):
1. Log handler duplication — every log line written N times after N messages
AIAgent.__init__() unconditionally calls logging.getLogger().addHandler(_error_file_handler) (line ~310 in run_agent.py). The gateway creates a new AIAgent for every incoming message. After 20 messages in a session, every single log line gets written 20 times to errors.log.
Evidence from production — the same error line repeated 22+ times simultaneously:
2026-03-10 01:58:17,956 ERROR root: API call failed after 6 retries...
2026-03-10 01:58:17,956 ERROR root: API call failed after 6 retries...
2026-03-10 01:58:17,956 ERROR root: API call failed after 6 retries...
... (22 identical lines at the same timestamp)
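The duplication, and the sentinel-style guard described under the fix, can be sketched in isolation. This is a minimal illustration and not the code from run_agent.py: the function names, the in-memory stream, and the "demo.*" logger names are hypothetical. It also assumes a fresh handler object is created on every agent construction, because Logger.addHandler ignores a handler object that is already attached.

```python
import io
import logging


def attach_buggy(logger, stream):
    # Assumed shape of the bug: a fresh handler object is added on every call.
    logger.addHandler(logging.StreamHandler(stream))


def attach_guarded(logger, stream):
    # Skip if a handler carrying the sentinel attribute is already attached.
    if any(getattr(h, "_hermes_error_log", False) for h in logger.handlers):
        return
    handler = logging.StreamHandler(stream)
    handler._hermes_error_log = True  # sentinel name taken from this report
    logger.addHandler(handler)


def count_lines(attach, n_agents):
    """Simulate n_agents constructions, log one error, count lines written."""
    stream = io.StringIO()
    logger = logging.getLogger(f"demo.{attach.__name__}")  # hypothetical logger name
    logger.handlers.clear()
    logger.propagate = False  # keep the demo isolated from the root logger
    for _ in range(n_agents):
        attach(logger, stream)
    logger.error("API call failed")
    return len(stream.getvalue().splitlines())


buggy_count = count_lines(attach_buggy, 20)
guarded_count = count_lines(attach_guarded, 20)
```

With 20 simulated agents, the unguarded path writes the single error once per attached handler, while the guarded path writes it once.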
2. Unbounded interrupt recursion in _run_agent
When a new message arrives while the agent is processing, the gateway interrupts the current run and _run_agent recursively calls itself (line ~2895 in gateway/run.py) with the pending message. There is no depth limit.
If the agent keeps failing (context too large, API returning 400/502, etc.) and the user sends more messages, each recursive call spawns a new agent that also fails, which can be interrupted by yet another message, recursing indefinitely.
Steps to Reproduce
Have an active Discord/Telegram session with a large conversation history
Added _interrupt_depth parameter to _run_agent, capped at MAX_INTERRUPT_DEPTH = 3:
if _interrupt_depth >= MAX_INTERRUPT_DEPTH:
    return {
        "final_response": "Too many rapid messages while processing. Please wait a moment and try again.",
        ...
    }
The recursive call now passes _interrupt_depth=_interrupt_depth + 1.
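The effect of the cap can be shown with a toy model. This is only a sketch under stated assumptions: run_agent below is a hypothetical stand-in for the gateway's _run_agent, a list of pending messages stands in for real interrupts, and the "depth" key is added purely for illustration. Only MAX_INTERRUPT_DEPTH, the _interrupt_depth parameter, and the response text come from the report.

```python
MAX_INTERRUPT_DEPTH = 3

TOO_MANY = "Too many rapid messages while processing. Please wait a moment and try again."


def run_agent(pending, _interrupt_depth=0):
    """Handle pending[0]; any further queued message counts as an interrupt."""
    if _interrupt_depth >= MAX_INTERRUPT_DEPTH:
        # Stop recursing instead of spawning yet another agent run.
        return {"final_response": TOO_MANY, "depth": _interrupt_depth}
    if len(pending) > 1:
        # Interrupted: drop the current message and recurse on the rest.
        return run_agent(pending[1:], _interrupt_depth=_interrupt_depth + 1)
    return {"final_response": f"handled: {pending[0]}", "depth": _interrupt_depth}


short_burst = run_agent(["a", "b", "c"])
long_burst = run_agent([str(i) for i in range(50)])
```

A burst of three messages is still served by the last recursive call at depth 2, while a burst of fifty stops at depth 3 with the rate-limit style response rather than recursing once per message.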
Steps to Reproduce
[...] _run_agent call
errors.log fills with N*N duplicate lines, CPU spins
Fix Applied
Log handler deduplication (run_agent.py)
Added a sentinel attribute (_hermes_error_log) to the handler and a check before adding.
Interrupt recursion depth cap (gateway/run.py)
Added _interrupt_depth parameter to _run_agent, capped at MAX_INTERRUPT_DEPTH = 3. The recursive call now passes _interrupt_depth=_interrupt_depth + 1.
Related