torch.distrubuted: lazy import pdb only when user calls breakpoint() by kelu-wandb · Pull Request #171818 · pytorch/pytorch

kelu-wandb · 2026-01-06T20:58:22Z

Makes torch/distributed/__init__.py only import pdb when needed, because we should avoid debugging-specific dependencies in production code.
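
The deferred-import pattern described above can be sketched roughly as follows. This is a minimal illustration, not the actual code in torch/distributed/__init__.py: `debug_breakpoint`, `should_stop`, and `_set_trace` are hypothetical names, and `_set_trace` exists only so the sketch can run without opening an interactive prompt.

```python
import sys


def debug_breakpoint(should_stop: bool = True, _set_trace=None) -> bool:
    """Hypothetical helper: enter the debugger only when asked to."""
    if not should_stop:
        # Nothing to debug, so pdb is never imported on this path.
        return False

    # Deferred import: the cost and side effects of importing pdb are
    # paid only by callers that actually request a breakpoint.
    import pdb

    # _set_trace is an assumption of this sketch, used to avoid an
    # interactive prompt in automated runs.
    (_set_trace or pdb.set_trace)()
    return True


if __name__ == "__main__":
    debug_breakpoint(should_stop=False)        # does not import pdb
    debug_breakpoint(_set_trace=lambda: None)  # imports pdb, skips the prompt
    print("pdb loaded:", "pdb" in sys.modules)
```

Placing the import inside the function body means that simply importing the enclosing module never touches pdb.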

In Python 3.13.1 through 3.13.7, this also avoids the following chain of imports:
torch -> torch.distributed -> pdb -> rlcompleter -> readline

Importing readline, in turn, attempts to access stdin. This deadlocks when the import runs in a subprocess launched with process_group=0 or preexec_fn=setpgrp, because that subprocess doesn't have access to stdin.

Python 3.13.8 fixed the pdb -> rlcompleter -> readline dependency, but it's still good to import pdb only when necessary.
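
One way to check such an import chain is to import the module in a fresh interpreter and see which names land in `sys.modules`. The sketch below is an illustration under assumptions: `modules_loaded_by` is a hypothetical helper, not part of PyTorch or this PR. It passes `stdin=subprocess.DEVNULL` and a timeout so that the check itself does not depend on terminal access.

```python
import subprocess
import sys


def modules_loaded_by(module: str, watched: list[str], timeout: float = 60.0) -> list[str]:
    """Return the names in `watched` that are present in sys.modules
    after importing `module` in a fresh interpreter."""
    child_code = (
        "import importlib, sys\n"
        f"importlib.import_module({module!r})\n"
        f"print(','.join(m for m in {watched!r} if m in sys.modules))\n"
    )
    result = subprocess.run(
        [sys.executable, "-c", child_code],
        capture_output=True,
        text=True,
        check=True,
        stdin=subprocess.DEVNULL,  # the child never reads from a terminal
        timeout=timeout,
    )
    # Use the last line in case the imported module prints something itself.
    lines = result.stdout.splitlines()
    found = lines[-1].strip() if lines else ""
    return found.split(",") if found else []


if __name__ == "__main__":
    # With torch installed, one could pass "torch" instead of "pdb" here.
    print(modules_loaded_by("pdb", ["pdb", "rlcompleter", "readline"]))
```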

Testing

(All tests below on Mac.)

Test script:

deadlock_minimal.py:

import sys
import subprocess

if __name__ == "__main__":
    code = """
print('importing torch...')
import sys
import torch
print('imported torch.')
if "pdb" in sys.modules:
    print("ERROR: pdb imported")
    exit(1)
"""

    kwargs = dict(process_group=0)
    proc = subprocess.Popen([sys.executable, "-c", code], **kwargs)
    try:
        proc.communicate(timeout=20)
        if proc.returncode == 0:
            print("PASS")
        else:
            print("FAIL")
    except subprocess.TimeoutExpired:
        print("FAIL: Process deadlocked after 20 seconds")
        proc.kill()
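
The description mentions two ways of launching the subprocess in its own process group: `process_group=0` and `preexec_fn=setpgrp`. The sketch below shows both side by side. It is an illustration only: `run_in_new_process_group` and `CHECK_CODE` are hypothetical names, the code is POSIX-only, and the fallback for interpreters older than Python 3.11 (where `process_group` is not available) is an addition of this sketch.

```python
import os
import subprocess
import sys


def run_in_new_process_group(code: str, use_preexec_fn: bool = False,
                             timeout: float = 20.0) -> int:
    """Run `code` in a child interpreter placed in its own process group
    and return the child's exit status."""
    if use_preexec_fn or sys.version_info < (3, 11):
        # Older spelling: call setpgrp() in the child before exec.
        kwargs = {"preexec_fn": os.setpgrp}
    else:
        # Python 3.11+: let subprocess call setpgid(0, 0) in the child.
        kwargs = {"process_group": 0}
    # subprocess.run kills the child and raises TimeoutExpired on timeout.
    completed = subprocess.run([sys.executable, "-c", code],
                               timeout=timeout, **kwargs)
    return completed.returncode


# Child-side check: a process that leads its own group has pgid == pid.
CHECK_CODE = "import os, sys; sys.exit(0 if os.getpgrp() == os.getpid() else 1)"

if __name__ == "__main__":
    print(run_in_new_process_group(CHECK_CODE))
    print(run_in_new_process_group(CHECK_CODE, use_preexec_fn=True))
```

Either option detaches the child from the parent's process group, which is the condition under which the description reports the hang.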

Failure repro: python 3.13.7, old pytorch

Deadlocks:

% conda create -n "pytorch-pdb-3.13.7" python=3.13.7 numpy pytorch -c conda-forge -y
% conda activate pytorch-pdb-3.13.7
% python deadlock_minimal.py
importing torch...
FAIL: Process deadlocked after 10 seconds

Failure repro: python 3.13.8, old pytorch

Does not deadlock, thanks to the underlying Python fix, but still imports pdb:

% conda create -n "pytorch-pdb-3.13.8" python=3.13.8 numpy pytorch -c conda-forge -y
% conda activate pytorch-pdb-3.13.8
% python deadlock_minimal.py
imported torch.
ERROR: pdb imported
FAIL

Fix confirmation: python 3.13.7, new pytorch

No longer deadlocks, does not import pdb.

% conda create -n "pytorch-3.13.7" python=3.13.7
% conda activate pytorch-3.13.7
% pip install --group dev
% conda install pkg-config libuv
% USE_DISTRIBUTED=1 python -m pip install --no-build-isolation -v -e .
% python deadlock_minimal.py
importing torch...
imported torch.
PASS

% conda create -n "pytorch-3.13.11" python=3.13.11
% conda activate pytorch-3.13.11
% pip install --group dev
% conda install pkg-config libuv
% USE_DISTRIBUTED=1 python -m pip install --no-build-isolation -v -e .
% python deadlock_minimal.py
importing torch...
imported torch.
PASS

Test that `torch.distributed.breakpoint()` still works:

torch_breakpoint.py:

import sys
import torch.distributed as dist
print(f"is available: {dist.is_available()}")
dist.init_process_group()
dist.breakpoint(rank=0)
print(f"pdb imported after breakpoint: {'pdb' in sys.modules}")

Then built with distributed on Mac and did a basic test:

% USE_DISTRIBUTED=1 python setup.py build --cmake
% RANK=0 WORLD_SIZE=1 MASTER_ADDR=127.0.0.1 MASTER_PORT=49999 python torch_breakpoint.py
is available: True
# snipped some errors due to not actually setting up a full scenario
> /Users/kelu/kelu-wandb/pytorch/torch/distributed/__init__.py(121)breakpoint()
-> pdb.set_trace()
(Pdb)

pytorch-bot · 2026-01-06T20:58:26Z

✅ No Failures

As of commit 35812da with merge base 7e5e018:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

linux-foundation-easycla · 2026-01-06T20:58:28Z

The committers listed above are authorized under a signed CLA.

✅ login: kelu-wandb / name: kelu-wandb (2928b4a, 35812da)

torch/distributed/__init__.py

kelu-wandb

Fixed based on comment.

torch/distributed/__init__.py

kelu-wandb · 2026-01-06T22:40:06Z

(This is take 2 of PR #163000, which expired because I didn't get to the CLA signing in time. Same code, but re-tested.)

kelu-wandb · 2026-01-06T22:43:10Z

@pytorchbot label "release notes: distributed (c10d)"

ezyang

thanks, appreciated

ezyang · 2026-01-07T05:26:01Z

@pytorchbot merge

pytorchmergebot · 2026-01-07T05:28:00Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

…ytorch#171818)

Fixes pytorch#159645

Makes `torch/distributed/__init__.py` only import `pdb` when needed, because we should avoid debugging-specific dependencies in production code.

In Python 3.13.1 through 3.13.7, this also avoids the following chain of imports:
`torch` -> `torch.distributed` -> `pdb` -> `rlcompleter` -> `readline`

Importing `readline`, in turn, attempts to access stdin, which deadlocks if run from a subprocess launched with `process_group=0` or `preexec_fn=setpgrp` because it doesn't have access to stdin.

Python 3.13.8 [fixed the `pdb` -> `rlcompleter` -> `readline` dependency](python/cpython#139280), but it's still good to import `pdb` only when necessary.

Pull Request resolved: pytorch#171818
Approved by: https://github.com/ezyang

lazy import pdb only when calling breakpoint()

2928b4a

pytorchbot added the open source label Jan 6, 2026

Skylion007 reviewed Jan 6, 2026

kelu-wandb changed the title from "torch.distrubuted: lazy import pdb only when calling breakpoint() (take 2)" to "torch.distrubuted: lazy import pdb only when user calls breakpoint()" Jan 6, 2026

Bring back context manager change

35812da

kelu-wandb commented Jan 6, 2026

pytorch-bot bot added the release notes: distributed (c10d) release notes category label Jan 6, 2026

kelu-wandb marked this pull request as ready for review January 6, 2026 22:44

kelu-wandb requested a review from Skylion007 January 7, 2026 00:02

kelu-wandb mentioned this pull request Jan 7, 2026

torch.distrubuted: lazy import pdb only when calling breakpoint() #163000

Closed

ezyang approved these changes Jan 7, 2026

pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Jan 7, 2026

pytorchmergebot added the merging label Jan 7, 2026

pytorchmergebot added the Merged label Jan 7, 2026

pytorchmergebot closed this in c2e8056 Jan 7, 2026

pytorchmergebot removed the merging label Jan 7, 2026

kelu-wandb wants to merge 2 commits into pytorch:main from kelu-wandb:distributed-lazy-import-pdb-2
