Implement autograd functions for c10d communication operations #40702
Closed
Labels: feature, module: bootcamp, oncall: distributed, triaged
This is inspired by #40690 and other requests we have seen before. Applications might want to use c10d operations (e.g., `all_gather`, `all_reduce`) in the forward pass and expect them to be linked into the same autograd graph. This would require implementing autograd functions for c10d operations, as is already done for `scatter` and `gather` in nn/parallel/_functions.py.

cc @pietern @mrshenli @pritamdamania87 @zhaojuanmao @satgera @rohan-varma @gqchen @aazzolini @xush6528 @osalpekar @jiayisuse
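To make the request concrete, here is a minimal sketch of what such a wrapper could look like for `all_reduce` with a sum reduction. The class name `AllReduceSum` is hypothetical and not an existing PyTorch API, and the backward rule assumes the overall objective is the sum of the per-rank losses, so it may not match whatever design is eventually adopted.

```python
import torch
import torch.distributed as dist


class AllReduceSum(torch.autograd.Function):
    """Hypothetical autograd-aware wrapper around dist.all_reduce (SUM)."""

    @staticmethod
    def forward(ctx, tensor):
        # all_reduce works in place, so reduce into a copy and leave the
        # caller's tensor untouched.
        out = tensor.contiguous().clone()
        dist.all_reduce(out, op=dist.ReduceOp.SUM)
        return out

    @staticmethod
    def backward(ctx, grad_output):
        # Each rank's input contributes to the output on every rank, so the
        # gradient for the local input is the sum of the output gradients
        # across ranks (assumption: total loss = sum of per-rank losses).
        grad = grad_output.contiguous().clone()
        dist.all_reduce(grad, op=dist.ReduceOp.SUM)
        return grad
```

With a process group already initialized, `y = AllReduceSum.apply(x)` would then be used in the forward pass like any other differentiable op, and `y.sum().backward()` would trigger a second `all_reduce` on the gradients. Both collectives must be reached by every rank, in forward and in backward, or the job will hang.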