Implement autograd functions for c10d communication operations

This is inspired by #40690 and other requests we have seen before. Applications might want to use c10d operations (e.g., `all_gather`, `all_reduce`) in the forward pass and expect them to be linked into the same autograd graph. This would require implementing autograd functions for c10d operations, as is already done for `scatter` and `gather` in [nn/parallel/_functions.py](https://github.com/pytorch/pytorch/blob/b35cdc5200af963e410c0a25400fd07f30b89bca/torch/nn/parallel/_functions.py).
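To make the request concrete, here is a minimal sketch of what such a function could look like for `all_reduce` with a sum reduction. The names `AllReduceSum` and `all_reduce_sum` are hypothetical and not an existing PyTorch API. The backward rule assumes the overall loss is the sum of the per-rank losses, in which case the gradient for the local input is the sum of the output gradients across ranks. When no process group is initialized, the sketch falls back to the single-process case, where the operation is the identity.

```python
import torch
import torch.distributed as dist


def _group_ready():
    # True only when torch.distributed is built in and a default group exists.
    return dist.is_available() and dist.is_initialized()


class AllReduceSum(torch.autograd.Function):
    """Hypothetical autograd wrapper around dist.all_reduce with SUM."""

    @staticmethod
    def forward(ctx, tensor):
        # Reduce into a copy so the caller's tensor is not modified in place.
        out = tensor.clone()
        if _group_ready():
            dist.all_reduce(out, op=dist.ReduceOp.SUM)
        return out

    @staticmethod
    def backward(ctx, grad_output):
        # Each rank's output is the sum of all ranks' inputs, so (assuming the
        # total loss is the sum of per-rank losses) the local input gradient
        # is the sum of the output gradients from all ranks.
        grad_input = grad_output.clone()
        if _group_ready():
            dist.all_reduce(grad_input, op=dist.ReduceOp.SUM)
        return grad_input


def all_reduce_sum(tensor):
    # Functional entry point: returns a new tensor that stays in the graph.
    return AllReduceSum.apply(tensor)
```

Unlike calling `dist.all_reduce` directly, the result of `all_reduce_sum(x)` carries a `grad_fn`, so a later `backward()` call propagates through the communication step to `x`.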

cc @pietern @mrshenli @pritamdamania87 @zhaojuanmao @satgera @rohan-varma @gqchen @aazzolini @xush6528 @osalpekar @jiayisuse

Implement autograd functions for c10d communication operations #40702