Bug Description
When /new is issued via Telegram while an agent is actively processing a message (mid-response), the session is reset correctly but the confirmation response is never sent back to the user. The bot shows "typing..." indefinitely with no reply.
Steps to Reproduce
- Send a long-running request to the bot on Telegram (a prompt that takes 10+ seconds)
- While the agent is still processing and showing "typing...", type /new
- Observe: the session resets (logs show Invalidated run generation (new_command) + (session_reset)), but the "✨ Session reset!" confirmation is never sent
- Any subsequent message sent after this works fine — the gateway is alive and responsive
Contrast with Working Case
When no agent is running and /new is issued, it works perfectly:
✅ 00:43:36 — Invalidated run generation (session_reset)
✅ 00:43:37 — Sending response (223 chars) to 95787569
When an agent IS running, the response goes missing:
❌ 01:17:40 — Invalidated run generation (new_command)
❌ 01:17:40 — Invalidated run generation (session_reset)
⏳ (no "Sending response" follows; the gap lasts until the next user message)
Root Cause Analysis
The /new command takes a different code path depending on whether an agent is running:
Code Path A — Agent NOT running (works)
_handle_message() → normal command dispatch at gateway/run.py line 4992: if canonical == "new": return await self._handle_reset_command(event) → response sent via _process_message_background
Code Path B — Agent IS running (broken)
handle_message() → adapter detects active session at base.py line 2553 → routes to _dispatch_active_session_command() (line 2574) for new/reset/stop → inside _handle_message(), early intercept at run.py line 4666 fires → _interrupt_and_clear_session() (line 4668) → _handle_reset_command() (line 4676) → returns EphemeralReply → back in _dispatch_active_session_command, calls cancel_session_processing() (line 2494) → attempts _send_with_retry() (line 2501)
The response is generated (the EphemeralReply from _handle_reset_command) but never reaches Telegram.
Key observations:
- Invisible failure: The Sending response log (base.py line 2800) only fires in _process_message_background, not in the _dispatch_active_session_command path. This makes the failure invisible without deep code inspection.
- Race in _dispatch_active_session_command: After receiving the response at line 2491, it cancels the old session task at line 2494 via cancel_session_processing(). The interaction between this cancellation and the response send at line 2501 is suspect:
  - cancel_session_processing pops _session_tasks[session_key] and calls task.cancel() (line 2420)
  - asyncio.wait_for(asyncio.shield(task), timeout=5.0) awaits the cancelled task
- Interleaved state mutations: _interrupt_and_clear_session() (called from _handle_message at line 4668) already calls _release_running_agent_state(), adapter.interrupt_session_activity() (which sets the interrupt Event and calls stop_typing()), and adapter.get_pending_message(). These overlap with cancel_session_processing() called later by the adapter, creating a window for lost messages.
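The cancellation interaction described above can be reproduced in isolation. The following is a minimal standalone sketch, not the gateway's code: cancel_old_task and dispatch are hypothetical stand-ins for cancel_session_processing() and _dispatch_active_session_command(), and the except Exception handler is an assumption about how the real code might guard the wait. It shows one way a reply could be skipped. Awaiting a shielded task that has already been cancelled raises asyncio.CancelledError in the awaiter, and since Python 3.8 that exception derives from BaseException, so an except Exception handler does not stop it. Whether this is what actually happens in the gateway is not confirmed.

```python
import asyncio


async def long_running_agent() -> None:
    # Stand-in for the agent run that is still in flight when /new arrives.
    await asyncio.sleep(60)


async def cancel_old_task(task: asyncio.Task) -> None:
    # Hypothetical mirror of the pattern described for cancel_session_processing():
    # cancel the task, then wait on a shielded handle with a timeout.
    task.cancel()
    try:
        await asyncio.wait_for(asyncio.shield(task), timeout=5.0)
    except Exception:
        # Assumption: the real code guards the wait with a handler like this.
        # CancelledError is a BaseException, so it is NOT caught here.
        pass


async def dispatch(task: asyncio.Task, outbox: list) -> None:
    # Hypothetical mirror of the dispatch order: cancel first, send second.
    await cancel_old_task(task)
    outbox.append("Session reset!")  # skipped if CancelledError escapes above


async def demo() -> list:
    outbox: list = []
    task = asyncio.create_task(long_running_agent())
    await asyncio.sleep(0)  # let the agent task start running
    try:
        await dispatch(task, outbox)
    except asyncio.CancelledError:
        pass  # the error surfaced inside dispatch(), before the send
    return outbox


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Running the sketch prints an empty list: the reply is never appended because the CancelledError raised while awaiting the cancelled task propagates out of dispatch() before the send step.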
Key Code Locations
| File | Line | Description |
| --- | --- | --- |
| gateway/run.py | 4666-4676 | Early intercept for /new when agent is running |
| gateway/run.py | 6579-6691 | _handle_reset_command(): works correctly in isolation |
| gateway/run.py | 11096-11119 | _interrupt_and_clear_session(): interrupts agent + clears state |
| gateway/platforms/base.py | 2460-2523 | _dispatch_active_session_command(): the race-prone dispatch |
| gateway/platforms/base.py | 2393-2441 | cancel_session_processing(): cancels old task after response received |
Suggested Fix
In _dispatch_active_session_command() (base.py line 2491-2512):
- Add logging: add a logger.info("[%s] Sending response (%d chars) to %s", self.name, len(text_content), event.source.chat_id) matching the one at line 2800 so future failures are visible
- Ensure send is cancellation-safe: wrap _send_with_retry so it cannot be affected by the concurrent cancel_session_processing at line 2494. Consider sending the response before cancelling the old task (swap lines 2491-2494 and 2499-2512)
- Verify task isolation: confirm cancel_session_processing() targets only the background processing task and not the coroutine executing _dispatch_active_session_command()
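The three suggestions above could be combined roughly as follows. This is a sketch under stated assumptions, not a patch: dispatch_reset_reply is a hypothetical function, and its send, cancel_old_task, and old_task parameters stand in for _send_with_retry(), cancel_session_processing(), and the entry in _session_tasks. The real signatures may differ.

```python
import asyncio
import logging

logger = logging.getLogger("gateway.sketch")  # hypothetical logger name


async def dispatch_reset_reply(name, chat_id, text, send, cancel_old_task, old_task):
    # 1. Log before sending, mirroring the existing "Sending response" line.
    logger.info("[%s] Sending response (%d chars) to %s", name, len(text), chat_id)

    # 2. Send first, shielded so that cancelling this coroutine does not
    #    abort the in-flight send. A reference to the task is kept, as the
    #    asyncio.shield() documentation recommends.
    send_task = asyncio.ensure_future(send(chat_id, text))
    await asyncio.shield(send_task)

    # 3. Cancel the old task only if it is not the task running this dispatch.
    if old_task is not None and old_task is not asyncio.current_task():
        await cancel_old_task(old_task)
```

With this ordering the confirmation is handed to the transport before any cancellation runs, so a CancelledError raised while tearing down the old task can no longer skip the send.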
Workaround
Wait for the current response to finish before typing /new, or use /stop first.
Environment
- OS: macOS (Apple Silicon)
- Gateway: launchd service, Telegram polling mode
- Adapter: telegram.py (python-telegram-bot v20+)
- Provider: DeepSeek (deepseek-v4-flash)