
gateway run --replace race condition: multiple instances run simultaneously #11718

@AJV20

Description

When the gateway is started with --replace, a race condition can leave multiple gateway instances running simultaneously. This triggers Telegram (and likely other platforms') polling conflicts and makes the bot unresponsive.

Steps to Reproduce

  1. Start the gateway normally (e.g. via launchd/systemd)
  2. A second instance starts with --replace (e.g. manual restart or service restart overlap)
  3. Both processes remain alive simultaneously

Actual Behavior

Multiple processes run at once (PIDs 548, 4101, and 4188 were all observed alive at the same time). The logs show repeated errors:

WARNING gateway.platforms.telegram: [Telegram] Telegram polling conflict (1/3), will retry in 10s.
Error: Conflict: terminated by other getUpdates request; make sure that only one bot instance is running

Expected Behavior

The old process should be fully terminated before the new one starts polling.
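A minimal sketch of "fully terminated" on POSIX (macOS/Linux), assuming the new instance knows the old PID: send SIGTERM, poll with signal 0 until the PID disappears, and escalate to SIGKILL only if it does not exit in time. The function names and timeouts are hypothetical, not taken from gateway/run.py. One caveat: a zombie still answers signal 0, so if the old process were a direct child of the caller it would also need to be reaped (e.g. with os.waitpid). A gateway started by launchd or systemd is reaped by that service manager instead.

```python
import os
import signal
import time


def pid_alive(pid: int) -> bool:
    """Return True if a process with this PID currently exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 only checks existence/permission; nothing is sent
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def _wait_gone(pid: int, timeout: float, poll: float) -> bool:
    """Poll until `pid` disappears or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if not pid_alive(pid):
            return True
        time.sleep(poll)
    return not pid_alive(pid)


def terminate_and_wait(pid: int, timeout: float = 10.0, poll: float = 0.1) -> bool:
    """SIGTERM the old gateway, wait for it to exit, escalate to SIGKILL if needed.

    Returns True only once the process is confirmed gone, so the caller
    can safely start polling Telegram afterwards.
    """
    for sig in (signal.SIGTERM, signal.SIGKILL):
        try:
            os.kill(pid, sig)
        except ProcessLookupError:
            return True  # already exited
        if _wait_gone(pid, timeout, poll):
            return True
    return False
```

The new instance would only begin getUpdates polling after terminate_and_wait(old_pid) returns True, and would refuse to start (or retry) if it returns False.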

Root Cause

In start_gateway() (gateway/run.py), the new process writes its PID to the PID file before the old process has exited. A racing second --replace invocation then reads its own PID from the file instead of the old process's PID, so it skips the termination step and both instances keep running.

Environment

  • Platform: macOS (darwin)
  • Triggered by launchd auto-restart overlapping with a manual gateway run --replace

Suggested Fix

Write the new PID to the PID file only after the old process has been confirmed dead, or use a separate lock file that is held for the duration of the transition.
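A minimal sketch combining both suggestions, assuming a POSIX platform where fcntl.flock is available (macOS and Linux, matching the launchd/systemd setups above). The lock and PID file paths and all function names (transition_lock, replace_running_gateway, etc.) are hypothetical, not taken from gateway/run.py; `terminate` stands in for whatever routine stops the old process and must return only after it has exited.

```python
import fcntl
import os
from contextlib import contextmanager
from pathlib import Path
from typing import Callable, Iterator, Optional


@contextmanager
def transition_lock(lock_path: Path) -> Iterator[None]:
    """Hold an exclusive advisory lock for the whole replace transition.

    A concurrent `--replace` invocation blocks in flock() until the
    current one has finished terminating the old instance.
    """
    with open(lock_path, "a") as fh:
        fcntl.flock(fh, fcntl.LOCK_EX)  # blocks until the lock is free
        try:
            yield
        finally:
            fcntl.flock(fh, fcntl.LOCK_UN)


def read_pid(pid_path: Path) -> Optional[int]:
    """Return the PID stored in the PID file, or None if absent or garbled."""
    try:
        return int(pid_path.read_text().strip())
    except (FileNotFoundError, ValueError):
        return None


def write_pid_atomically(pid_path: Path, pid: int) -> None:
    """Write via a temp file + rename so readers never see a partial PID."""
    tmp = pid_path.with_name(pid_path.name + ".tmp")
    tmp.write_text(f"{pid}\n")
    os.replace(tmp, pid_path)


def replace_running_gateway(
    pid_path: Path,
    lock_path: Path,
    terminate: Callable[[int], bool],
) -> None:
    """Terminate the old instance, then (and only then) claim the PID file."""
    with transition_lock(lock_path):
        old_pid = read_pid(pid_path)
        if old_pid is not None and old_pid != os.getpid():
            if not terminate(old_pid):  # must block until old_pid is dead
                raise RuntimeError(f"old gateway (PID {old_pid}) did not exit")
        write_pid_atomically(pid_path, os.getpid())
    # Lock released: only now is it safe to start platform polling.
```

With this ordering, a racing second --replace invocation waits on the lock, then reads the PID of the instance that just completed its transition and terminates that one, so at most one gateway ends up polling. Because flock locks are released when the holding process exits, a crash mid-transition does not leave a stale lock behind.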
