Fix multiprocessing with CUDA_VISIBLE_DEVICES seems to give the wrong device #149248
fzyzcjy wants to merge 1 commit into pytorch:main
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/149248
Note: Links to docs will display an error until the doc builds have been completed. ✅ No Failures as of commit 98aef5c with merge base 1e37e5b. This comment was automatically generated by Dr. CI and updates every 15 minutes.
@pytorchbot label "release notes: distributed (miscellaneous)"
Let's discuss this on the issue.
Looks like this PR hasn't been updated in a while, so we're going to go ahead and mark this as Stale.
Hi, is it possible for this to be merged?
@fzyzcjy I'm afraid it is not, as this very much breaks the current behavior, in particular for the many distributed users that rely on always using …
@albanD I see, thank you! However, I do find it weird: when we see a tensor with "device=0", it does not actually mean the tensor is on the 0th device; it may in fact be on another device :/
(I also replied in sgl-project/sglang#4565) |
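For readers arriving from the linked issue, here is a minimal repro-style sketch of the mismatch being discussed (hypothetical code, assuming a machine with at least two GPUs): a child process restricted to physical GPU 1 still reports its tensors as being on `cuda:0`.

```python
import os

import torch
import torch.multiprocessing as mp


def child():
    # Inside this process CUDA_VISIBLE_DEVICES="1", so the only visible
    # GPU is numbered 0 even though it is physically GPU 1.
    x = torch.zeros(1, device="cuda:0")
    print(x.device)  # prints cuda:0, but the data lives on physical GPU 1


if __name__ == "__main__":
    # Restrict visibility before the child process initializes CUDA;
    # a spawned child inherits the parent's environment.
    os.environ["CUDA_VISIBLE_DEVICES"] = "1"
    p = mp.get_context("spawn").Process(target=child)
    p.start()
    p.join()
```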
REF: pytorch/pytorch#149248

Fix issues:
1. performance degradation when TP > 1 if sync w/o bucketing
2. CudaError when TP > 1 if sync w/ bucketing

* add patch for sglang sync when TP > 1
* fix pylint
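For context, the bucketing mentioned above generally means coalescing many small tensors into one flat buffer so a tensor-parallel group pays a single collective instead of one per tensor. Below is a rough sketch of that general technique, not the actual sglang patch; `bucketed_broadcast` is a hypothetical helper, and it assumes all tensors share one dtype and device.

```python
import torch
import torch.distributed as dist


def bucketed_broadcast(tensors, src=0):
    # Hypothetical helper: assumes dist.init_process_group() has already
    # run and that all tensors share one dtype and device.
    flat = torch.cat([t.reshape(-1) for t in tensors])
    dist.broadcast(flat, src=src)  # one collective for the whole bucket
    # Copy the synchronized values back into the original tensors.
    offset = 0
    for t in tensors:
        n = t.numel()
        t.copy_(flat[offset:offset + n].view_as(t))
        offset += n
```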
Fixes #149196
This is merely a proof-of-concept PR. I would like to hear some feedback (is the direction acceptable?) before working on it further.
Things that will be added if the direction of the PR looks acceptable: unit tests, caching, a C++ implementation (for speed), etc.
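As a rough illustration of the kind of translation such a fix needs (a sketch only, not necessarily what this PR implements; `logical_to_physical` is a hypothetical name, and it assumes `CUDA_VISIBLE_DEVICES` contains integer indices rather than GPU UUIDs), the device index a process sees can be mapped back to a physical index:

```python
import os


def logical_to_physical(logical_index: int) -> int:
    # Hypothetical helper: translate the device index this process sees
    # into the physical GPU index, honoring CUDA_VISIBLE_DEVICES.
    visible = os.environ.get("CUDA_VISIBLE_DEVICES")
    if visible is None:
        return logical_index  # no remapping in effect
    physical = [int(d) for d in visible.split(",") if d.strip()]
    return physical[logical_index]


# With CUDA_VISIBLE_DEVICES="2,3", logical device 0 is physical GPU 2.
```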