torch.distributed NCCL backend does not support bitwise reduction ops #41362
Closed
Labels
- better-engineering: Relatively self-contained tasks for better engineering contributors
- module: bootcamp: We plan to do a full writeup on the issue, and then get someone to do it for onboarding
- module: nccl: Problems related to nccl support
- oncall: distributed: Add this issue/PR to distributed oncall triage queue
- triaged: This issue has been looked at by a team member, and triaged and prioritized into an appropriate module
🐛 Bug
The documentation at https://pytorch.org/docs/stable/distributed.html specifies that BAND, BOR, and BXOR are supported reduction operators; however, they do not work with all_reduce when using the NCCL backend.

We can see in the code that there is no mapping for the bitwise operators in the map used to look up which NCCL operation to run. When the mapping is missing, the map default-constructs a ncclRedOp_t (https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/api/types.html#ncclredop-t), which ends up incorrectly mapping these reduction types to ncclSum. This means that requesting any of these bitwise reduction ops silently performs a sum instead.

cc @pietern @mrshenli @pritamdamania87 @zhaojuanmao @satgera @rohan-varma @gqchen @aazzolini @xush6528 @osalpekar @jiayisuse @agolynski
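To make the discrepancy concrete, here is a minimal pure-Python sketch (no GPUs or NCCL required) of what ReduceOp.BOR should compute across ranks versus the sum that the defaulted mapping actually produces. The rank values are made up for illustration:

```python
from functools import reduce
import operator

# Hypothetical per-rank integer values (one value per rank, for simplicity).
rank_values = [0b0011, 0b0101, 0b1000]

# What ReduceOp.BOR should compute across ranks: a bitwise OR.
expected_bor = reduce(operator.or_, rank_values)  # 0b1111 == 15

# What the NCCL backend does when the op is missing from the op map:
# the default-constructed ncclRedOp_t is value 0, which is ncclSum,
# so every rank instead receives the sum of the inputs.
actual_sum = sum(rank_values)  # 16

print(f"BOR expected: {expected_bor}, NCCL returns: {actual_sum}")
```

Because 15 != 16 only by coincidence of these inputs being nearly disjoint bit patterns, real workloads can produce results that look plausible, which makes the silent fallback to sum especially hard to notice.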