[C10D] Separate deviceKey from streamKey in getNCCLComm #129435

Closed
wconstab wants to merge 2 commits into gh/wconstab/313/base from gh/wconstab/313/head


@wconstab
Contributor

@wconstab wconstab commented Jun 25, 2024

Stack from ghstack (oldest at bottom):

This paves the way for a set of changes that would let us use the same
NCCL communicator for collectives and p2p ops (and have it be eagerly
rather than lazily initialized), while always using a dedicated NCCL
stream for p2p operations with different ranks so that they can
overlap.

This PR only adds a new streamKey that defaults to the value of the
deviceKey, so it should not affect runtime behavior.
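
A minimal sketch of the idea, not the actual ProcessGroupNCCL code: the communicator is looked up by deviceKey and the stream by a separate streamKey, with the streamKey falling back to the deviceKey when none is given. `CommCache`, `FakeComm`, `FakeStream`, `getComm`, and the key strings are hypothetical names used only for illustration; the real `getNCCLComm` signature and key format may differ.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>

// Hypothetical stand-ins for an NCCL communicator and a CUDA stream.
struct FakeComm {
  std::string deviceKey;
};
struct FakeStream {
  std::string streamKey;
};

// Hypothetical cache that keys communicators and streams independently.
class CommCache {
 public:
  struct Entry {
    std::shared_ptr<FakeComm> comm;
    std::shared_ptr<FakeStream> stream;
  };

  // Default behavior: streamKey == deviceKey, i.e. one stream per communicator.
  Entry getComm(const std::string& deviceKey) {
    return getComm(deviceKey, deviceKey);
  }

  // Separate keys: the same communicator can be paired with different streams.
  Entry getComm(const std::string& deviceKey, const std::string& streamKey) {
    std::shared_ptr<FakeComm>& comm = comms_[deviceKey];
    if (!comm) {
      comm = std::make_shared<FakeComm>(FakeComm{deviceKey});
    }
    std::shared_ptr<FakeStream>& stream = streams_[streamKey];
    if (!stream) {
      stream = std::make_shared<FakeStream>(FakeStream{streamKey});
    }
    return Entry{comm, stream};
  }

 private:
  std::unordered_map<std::string, std::shared_ptr<FakeComm>> comms_;
  std::unordered_map<std::string, std::shared_ptr<FakeStream>> streams_;
};
```

With only a deviceKey, callers get the same communicator and stream as before. A follow-up change could pass a per-peer streamKey for p2p ops so that sends to different ranks share one communicator but run on different streams.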

cc @XilunWu @H-Huang @awgu @kwen2501 @wanchaol @fegin @fduwjj @wz337 @d4l3k @c-p-i-o @mrshenli @pritamdamania87 @zhaojuanmao @satgera @gqchen @aazzolini @osalpekar @jiayisuse @penguinwu @tianyu-l @yf225 @chauhang

[ghstack-poisoned]
@pytorch-bot

pytorch-bot Bot commented Jun 25, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/129435

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 6909683 with merge base failed to retrieve merge base, please contact dev infra:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot Bot added the oncall: distributed and release notes: distributed (c10d) labels Jun 25, 2024
[ghstack-poisoned]
@github-actions
Contributor

Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.

@github-actions github-actions Bot added the Stale label Aug 24, 2024
@github-actions github-actions Bot closed this Sep 23, 2024
@github-actions github-actions Bot deleted the gh/wconstab/313/head branch October 25, 2024 02:06