Description
When starting the gateway with --replace, a race condition can leave multiple gateway instances running simultaneously. This triggers Telegram (and likely other platform) polling conflicts and causes the bot to become unresponsive.
Steps to Reproduce
- Start the gateway normally (e.g. via launchd/systemd)
- A second instance starts with --replace (e.g. manual restart or service restart overlap)
- Both processes remain alive simultaneously
Actual Behavior
Multiple processes run at once (observed PIDs 548, 4101, and 4188 all alive simultaneously). Repeated errors in logs:
WARNING gateway.platforms.telegram: [Telegram] Telegram polling conflict (1/3), will retry in 10s.
Error: Conflict: terminated by other getUpdates request; make sure that only one bot instance is running
Expected Behavior
The old process should be fully terminated before the new one starts polling.
Root Cause
In start_gateway() (gateway/run.py), the new process writes its PID to the PID file before the old process has exited. A racing second --replace invocation then reads its own PID from the file (instead of the old process PID), so it skips the termination step and both instances run.
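The ordering problem can be illustrated with a small, self-contained sketch. It is not the actual code in gateway/run.py: the function names `write_then_read` and `read_then_write` are hypothetical, and the PIDs are just the values observed above, used as sample data. It only shows why writing the PID file before reading it hides the process that should be terminated.

```python
import tempfile
from pathlib import Path
from typing import Optional


def write_then_read(pid_file: Path, my_pid: int) -> Optional[int]:
    """Problematic order: publish our PID, then look up whom to terminate."""
    pid_file.write_text(str(my_pid))  # the old PID is overwritten here
    recorded = int(pid_file.read_text())
    # The file now names us, so there appears to be nothing to terminate.
    return None if recorded == my_pid else recorded


def read_then_write(pid_file: Path, my_pid: int) -> Optional[int]:
    """Safer order: look up the old PID first, publish ours afterwards."""
    recorded = int(pid_file.read_text()) if pid_file.exists() else None
    # A real fix would write only after the old process is confirmed dead.
    pid_file.write_text(str(my_pid))
    return None if recorded == my_pid else recorded


with tempfile.TemporaryDirectory() as tmp:
    pid_file = Path(tmp) / "gateway.pid"

    pid_file.write_text("548")
    print(write_then_read(pid_file, 4101))  # None: old process is never found

    pid_file.write_text("548")
    print(read_then_write(pid_file, 4101))  # 548: old process can be terminated
```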
Environment
- Platform: macOS (darwin)
- Triggered by launchd auto-restart overlapping with a manual gateway run --replace
Suggested Fix
Write the new PID to the PID file only after the old process has been confirmed dead, or use a separate lock file that is held for the duration of the transition.
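A minimal sketch of how both suggestions could be combined, assuming a POSIX system (it relies on `fcntl.flock`, which is available on macOS and Linux). The names `take_over`, `pid_alive`, and the lock-file path are hypothetical and not taken from the gateway code. The lock is held for the whole transition, and the new PID is written only after the old process has been confirmed gone.

```python
import fcntl
import os
import signal
import time
from pathlib import Path


def pid_alive(pid: int) -> bool:
    """Return True if a process with this PID exists (signal 0 only probes)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def take_over(pid_file: Path, lock_file: Path, timeout: float = 10.0) -> None:
    """Terminate the previous instance, then record our own PID.

    The exclusive lock is held for the whole transition, so a concurrent
    --replace blocks until the PID file names the process it must stop.
    """
    with open(lock_file, "w") as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)  # blocks while another takeover runs
        try:
            old_pid = int(pid_file.read_text().strip())
        except (FileNotFoundError, ValueError):
            old_pid = None  # no PID file, or unreadable contents

        if old_pid is not None and old_pid != os.getpid() and pid_alive(old_pid):
            try:
                os.kill(old_pid, signal.SIGTERM)
            except ProcessLookupError:
                pass  # it exited between the check and the signal
            deadline = time.monotonic() + timeout
            while pid_alive(old_pid):
                if time.monotonic() > deadline:
                    raise TimeoutError(f"PID {old_pid} did not exit within {timeout}s")
                time.sleep(0.1)

        # Only now, with the old process confirmed dead, publish our PID.
        pid_file.write_text(str(os.getpid()))
    # Closing the lock file releases the lock.
```

With this shape, a call such as `take_over(Path("gateway.pid"), Path("gateway.lock"))` (both paths are placeholders) would run before the new instance starts polling. The lock is released automatically if the process dies mid-transition, because the kernel drops `flock` locks when the file descriptor is closed.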