Race condition: /new during active agent session never sends response (Telegram gateway) #18912

@ec812

Bug Description

When /new is issued via Telegram while an agent is actively processing a message (mid-response), the session is reset correctly but the confirmation response is never sent back to the user. The bot shows "typing..." indefinitely with no reply.

Steps to Reproduce

  1. Send a long-running request to the bot on Telegram (a prompt that takes 10+ seconds)
  2. While the agent is still processing and showing "typing...", type /new
  3. Observe: session resets (logs show Invalidated run generation (new_command) + (session_reset)) but the "✨ Session reset!" confirmation is never sent
  4. Any subsequent message sent after this works fine — the gateway is alive and responsive

Contrast with Working Case

When no agent is running and /new is issued, it works perfectly:

✅ 00:43:36 — Invalidated run generation (session_reset)
✅ 00:43:37 — Sending response (223 chars) to 95787569

When an agent IS running, the response goes missing:

❌ 01:17:40 — Invalidated run generation (new_command)
❌ 01:17:40 — Invalidated run generation (session_reset)
⏳ (no "Sending response" follows; the log shows a gap until the next user message)

Root Cause Analysis

The /new command takes a different code path depending on whether an agent is running:

Code Path A — Agent NOT running (works)

_handle_message() → normal command dispatch at gateway/run.py line 4992: if canonical == "new": return await self._handle_reset_command(event) → response sent via _process_message_background

Code Path B — Agent IS running (broken)

  1. handle_message(): the adapter detects an active session at base.py line 2553 and routes new/reset/stop to _dispatch_active_session_command() (line 2574).
  2. Inside _handle_message(), the early intercept at run.py line 4666 fires and calls _interrupt_and_clear_session() (line 4668), then _handle_reset_command() (line 4676), which returns an EphemeralReply.
  3. Back in _dispatch_active_session_command(), the adapter calls cancel_session_processing() (line 2494) and then attempts _send_with_retry() (line 2501).

The response is generated (the EphemeralReply from _handle_reset_command) but never reaches Telegram.

Key observations:

  1. Invisible failure — The Sending response log (base.py line 2800) only fires in _process_message_background, not in the _dispatch_active_session_command path. This makes the failure invisible without deep code inspection.

  2. Race in _dispatch_active_session_command — After receiving the response at line 2491, it cancels the old session task at line 2494 via cancel_session_processing(). The interaction between this cancellation and the response send at line 2501 is suspect:

    • cancel_session_processing pops _session_tasks[session_key] and calls task.cancel() (line 2420)
    • asyncio.wait_for(asyncio.shield(task), timeout=5.0) awaits the cancelled task

  3. Interleaved state mutations: _interrupt_and_clear_session() (called from _handle_message at line 4668) already calls _release_running_agent_state(), adapter.interrupt_session_activity() (which sets the interrupt Event and calls stop_typing()), and adapter.get_pending_message(). These overlap with cancel_session_processing() called later by the adapter, creating a window for lost messages.
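One mechanism that would be consistent with observation 2 can be reproduced in isolation. This is a minimal sketch, not the gateway code: old_agent_run, the simplified cancel_session_processing, and the sent list are hypothetical stand-ins, and whether the real cancel_session_processing() catches the exception is not confirmed here. It relies on documented asyncio behavior: asyncio.shield() protects the inner task from the caller's cancellation, but if the inner task is itself cancelled, awaiting the shield raises CancelledError in the caller. Since Python 3.8, CancelledError derives from BaseException, so an `except Exception` handler would not catch it.

```python
import asyncio


async def old_agent_run():
    # Hypothetical stand-in for the long-running agent task.
    await asyncio.sleep(60)


async def cancel_session_processing(task):
    # Same pattern as described above: cancel, then wait through a shield.
    task.cancel()
    # The shield does not hide the inner task's own cancellation: once
    # `task` finishes as cancelled, this await raises CancelledError here.
    await asyncio.wait_for(asyncio.shield(task), timeout=5.0)


async def dispatch_active_session_command(sent):
    task = asyncio.create_task(old_agent_run())
    await asyncio.sleep(0)  # let the agent task start running
    reply = "Session reset!"
    await cancel_session_processing(task)  # CancelledError escapes from here
    sent.append(reply)  # stand-in for the send; not reached in this sketch


async def main():
    sent = []
    try:
        await dispatch_active_session_command(sent)
    except asyncio.CancelledError:
        outcome = "dispatch aborted by CancelledError"
    else:
        outcome = "reply sent"
    return sent, outcome


sent, outcome = asyncio.run(main())
print(sent, outcome)
```

If the real code path behaves like this sketch, the reply would be dropped without any log line, which would match the symptom. Checking whether a CancelledError propagates out of cancel_session_processing() in the live gateway would confirm or rule this out.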

Key Code Locations

| File | Line | Description |
|---|---|---|
| gateway/run.py | 4666-4676 | Early intercept for /new when agent is running |
| gateway/run.py | 6579-6691 | _handle_reset_command(): works correctly in isolation |
| gateway/run.py | 11096-11119 | _interrupt_and_clear_session(): interrupts agent + clears state |
| gateway/platforms/base.py | 2460-2523 | _dispatch_active_session_command(): the race-prone dispatch |
| gateway/platforms/base.py | 2393-2441 | cancel_session_processing(): cancels old task after response received |

Suggested Fix

In _dispatch_active_session_command() (base.py lines 2491-2512):

  1. Add logging — add a logger.info("[%s] Sending response (%d chars) to %s", self.name, len(text_content), event.source.chat_id) matching the one at line 2800 so future failures are visible
  2. Ensure send is cancellation-safe — wrap _send_with_retry so it cannot be affected by the concurrent cancel_session_processing at line 2494. Consider sending the response before cancelling the old task (swap lines 2491-2494 and 2499-2512)
  3. Verify task isolation — confirm cancel_session_processing() targets only the background processing task and not the coroutine executing _dispatch_active_session_command()
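The three suggestions can be sketched together as follows. This is an illustration under assumptions, not a patch: send_with_retry, old_agent_run, the outbox list, the "sketch" adapter name, and the chat id are hypothetical, and the real method signatures in base.py may differ. The sketch logs before sending, sends the reply before cancelling the old task, skips cancellation if the target is the current task, and waits with asyncio.wait(), which returns normally when the awaited task ends as cancelled.

```python
import asyncio
import logging

logger = logging.getLogger("gateway.sketch")


async def old_agent_run():
    # Hypothetical stand-in for the long-running agent task.
    await asyncio.sleep(60)


async def send_with_retry(outbox, chat_id, text):
    # Hypothetical stand-in for the adapter's _send_with_retry().
    await asyncio.sleep(0)
    outbox.append((chat_id, text))


async def cancel_session_processing(task):
    # Suggestion 3: never cancel the coroutine that performs the dispatch.
    if task is asyncio.current_task():
        return
    task.cancel()
    # asyncio.wait() does not re-raise the awaited task's cancellation and
    # does not cancel the task on timeout, so the caller keeps running.
    await asyncio.wait([task], timeout=5.0)


async def dispatch_active_session_command(outbox, chat_id, reply):
    task = asyncio.create_task(old_agent_run())
    await asyncio.sleep(0)  # let the agent task start running
    # Suggestion 1: log the send so a missing reply is visible in the logs.
    logger.info("[%s] Sending response (%d chars) to %s",
                "sketch", len(reply), chat_id)
    # Suggestion 2: deliver the reply before touching the old task.
    await send_with_retry(outbox, chat_id, reply)
    await cancel_session_processing(task)
    return task.cancelled()


async def main():
    outbox = []
    cancelled = await dispatch_active_session_command(
        outbox, "chat-1", "Session reset!")
    return outbox, cancelled


outbox, cancelled = asyncio.run(main())
print(outbox, cancelled)
```

Sending first means the old agent task is still alive for a moment after the confirmation goes out. If that ordering is not acceptable, keeping the original order and only making the wait cancellation-safe is the smaller change.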

Workaround

Wait for the current response to finish before typing /new, or use /stop first.

Environment

  • OS: macOS (Apple Silicon)
  • Gateway: launchd service, Telegram polling mode
  • Adapter: telegram.py (python-telegram-bot v20+)
  • Provider: DeepSeek (deepseek-v4-flash)

Labels

  • P1: High, major feature broken, no workaround
  • comp/gateway: Gateway runner, session dispatch, delivery
  • platform/telegram: Telegram bot adapter
  • type/bug: Something isn't working