🚀 Feature
The autograd profiler does not cover torch.distributed ops such as allreduce, allgather, etc. Adding support for these would be invaluable for debugging performance issues.
We should cover all of the collective and point-to-point operations listed here: https://pytorch.org/docs/stable/distributed.html.
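As a sketch of how this could look once supported, the snippet below runs an allreduce under torch.autograd.profiler.profile. The worker helper, the gloo backend, and the two-process setup are illustrative choices, not part of any proposed API; today the collective simply does not appear in the resulting table, while with this feature it would show up as a named event with timings.

```python
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
from torch.autograd import profiler

def worker(rank, world_size):
    # Minimal single-machine setup using the gloo backend.
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = "29500"
    dist.init_process_group("gloo", rank=rank, world_size=world_size)

    tensor = torch.ones(1000) * rank
    with profiler.profile() as prof:
        # Today this collective does not appear in the profile; with the
        # proposed support it would show up as a named event with timings.
        dist.all_reduce(tensor, op=dist.ReduceOp.SUM)

    if rank == 0:
        print(prof.key_averages().table(sort_by="cpu_time_total"))
    dist.destroy_process_group()

if __name__ == "__main__":
    world_size = 2
    mp.spawn(worker, args=(world_size,), nprocs=world_size)
```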
A nice extension for the autograd profiler in the distributed setting would be an API like torch.distributed.combine_profiles, where a single rank (e.g., rank 0) pulls the autograd profiles from all other ranks and provides users with a single Chrome trace view that displays the trace across all nodes. A sketch of what this could look like follows.
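A minimal sketch of what combine_profiles could do under the hood, assuming dist.gather_object is available and that export_chrome_trace emits a flat JSON array of events (its historical format): each rank serializes its own trace, rank 0 gathers them, retags the events by rank, and writes one merged trace viewable in chrome://tracing. The combine_profiles name and the out_path parameter are hypothetical.

```python
import json
import os
import tempfile
import torch.distributed as dist

def combine_profiles(prof, rank, world_size, out_path="combined_trace.json"):
    # Each rank dumps its own chrome trace to a temp file and loads it back.
    fd, path = tempfile.mkstemp(suffix=".json")
    os.close(fd)
    prof.export_chrome_trace(path)
    with open(path) as f:
        events = json.load(f)  # assumes a flat JSON array of trace events
    os.remove(path)

    # Rank 0 gathers every rank's event list.
    gathered = [None] * world_size if rank == 0 else None
    dist.gather_object(events, gathered, dst=0)

    if rank == 0:
        merged = []
        for r, evs in enumerate(gathered):
            for ev in evs:
                ev["pid"] = f"rank {r}"  # group events per rank in the viewer
                merged.append(ev)
        with open(out_path, "w") as f:
            json.dump(merged, f)
```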
Motivation
The autograd profiler has been an invaluable tool for debugging performance issues in PyTorch, and extending it to torch.distributed would be beneficial to users.
cc @pietern @mrshenli @pritamdamania87 @zhaojuanmao @satgera @rohan-varma @gqchen @aazzolini @xush6528 @osalpekar @jiayisuse @agolynski