
Problems in TensorPipeRpcBackendOptions device mapping documentation? #53501

@rafi-cohen

Description

📚 Documentation

The new PyTorch 1.8 release introduces CUDA support in RPC.
I've gone through the RPC documentation, and the only references to the CUDA support I could find are under TensorPipeRpcBackendOptions and set_device_map.
It seems that setting up CUDA support is simply done by supplying a device mapping in TensorPipeRpcBackendOptions, pretty cool.

However, I find the documentation for device_maps/device_map unclear. It seems that TensorPipeRpcBackendOptions's device_maps is a dictionary whose keys are worker names, but I'm not exactly sure what the structure of the dictionary's values should look like. Supposedly each value should be some sort of dictionary (as indicated by the parameter's type, Dict[str, Dict]), yet the example code provides a set: device_maps={"worker1": {0, 1}}. I don't understand how this "maps worker0's cuda:0 to worker1's cuda:1".
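For what it's worth, my best guess at the intended structure (an assumption based on the Dict[str, Dict] type hint and the prose, not on anything the docs actually show) is a dict of dicts, where the inner dict pairs each local device with a remote device:

```python
# Guessed structure for device_maps: the outer key is the callee worker name;
# the inner dict maps a local device index to a remote device index.
# This would make the prose "map worker0's cuda:0 to worker1's cuda:1" work:
device_maps = {"worker1": {0: 1}}  # worker0's cuda:0 -> worker1's cuda:1

# Compare with the set that the current example shows, which carries no
# source->target pairing at all:
set_from_docs = {"worker1": {0, 1}}
```

If this guess is right, the docs' example is not just unclear but syntactically a different type (a set literal instead of a dict literal).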

The same goes for set_device_map's device_map: the parameter's type also indicates it is a dictionary ((Dict of python:int, str, or torch.device)), but its structure isn't explained either. And again, the example code provides a set: options.set_device_map("worker1", {1, 2}).
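Analogously, my guess (again an assumption, not something the docs state) is that the second argument should be a dict pairing local devices with remote ones:

```python
# Guessed shape of set_device_map's second argument, replacing the set {1, 2}
# from the docs' example with a dict that actually pairs devices:
# local cuda:1 -> remote cuda:2 on "worker1".
device_map = {1: 2}

# Hypothetical corrected call (commented out; requires an initialized
# TensorPipeRpcBackendOptions instance named `options`):
# options.set_device_map("worker1", device_map)
```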

It is also not explained how to define a GPU->CPU mapping (or vice versa).
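Since the type hint mentions str and torch.device as well as int, one might guess that a cross-device-type mapping would be spelled like the following, but nothing in the docs confirms it; this is exactly the gap I'd like clarified:

```python
# Purely speculative sketch of a GPU->CPU mapping; whether string device
# names like "cpu" are accepted here is the open question.
device_map = {"cuda:0": "cpu"}  # caller's cuda:0 -> callee's CPU (unverified)
```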

Apart from this, there are two obvious errors in the example code provided in that documentation:

  1. There is a missing comma in the following part:
>>> rpc.init_rpc(
>>>     "worker0",
>>>     rank=0,
>>>     world_size=2  # <-- missing comma
>>>     backend=rpc.BackendType.TENSORPIPE,
>>>     rpc_backend_options=options
>>> )
  2. I don't see how those two print calls could produce different results. I'm guessing that the second line should read print(rets[1])?
>>> print(rets[0])  # tensor([2., 2.], device='cuda:0')
>>> print(rets[0])  # tensor([2., 2.], device='cuda:1')

cc @pietern @mrshenli @pritamdamania87 @zhaojuanmao @satgera @gqchen @aazzolini @rohan-varma @jjlilley @osalpekar @jiayisuse @mrzzd @agolynski @SciPioneer @H-Huang @cbalioglu

Labels

module: rpc · oncall: distributed · triaged
