$ python3 -m torch.distributed.launch --nproc_per_node=2 --master_port=1234 abcd.py
Traceback (most recent call last):
  File "abcd.py", line 18, in <module>
    torch.save(model, 'model.pt')
  File "/home/michaelp/.local/lib/python3.6/site-packages/torch/serialization.py", line 209, in save
    return _with_file_like(f, "wb", lambda f: _save(obj, f, pickle_module, pickle_protocol))
  File "/home/michaelp/.local/lib/python3.6/site-packages/torch/serialization.py", line 134, in _with_file_like
    return body(f)
  File "/home/michaelp/.local/lib/python3.6/site-packages/torch/serialization.py", line 209, in <lambda>
    return _with_file_like(f, "wb", lambda f: _save(obj, f, pickle_module, pickle_protocol))
  File "/home/michaelp/.local/lib/python3.6/site-packages/torch/serialization.py", line 282, in _save
    pickler.dump(obj)
AttributeError: Can't pickle local object 'DistributedDataParallel._register_nccl_grad_hook.<locals>.allreduce_hook'
(the second worker process prints an identical traceback)
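The AttributeError above is ordinary pickle behavior, not specific to PyTorch: pickle serializes functions by qualified name, and a function defined inside another function has no importable name. A minimal stdlib demonstration (the names `register_hook`/`allreduce_hook` here are hypothetical stand-ins for DDP's internal hook):

```python
import pickle

def register_hook():
    # stand-in for DistributedDataParallel._register_nccl_grad_hook,
    # which defines its gradient hook as a local function
    def allreduce_hook():
        pass
    return allreduce_hook

try:
    pickle.dumps(register_hook())
except (AttributeError, pickle.PicklingError) as err:
    print(err)  # e.g. "Can't pickle local object 'register_hook.<locals>.allreduce_hook'"
```

Since the DDP wrapper stores such a hook on the module object, pickling the wrapper as a whole fails.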
Code example

import argparse

import torch

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument("--local_rank", type=int)
    args = parser.parse_args()

    device = torch.device('cuda', args.local_rank)
    torch.distributed.init_process_group(backend='nccl')

    model = torch.nn.LSTM(10, 10).to(device)
    model = torch.nn.parallel.DistributedDataParallel(
        model, device_ids=[args.local_rank], output_device=args.local_rank, dim=1)
    torch.save(model, 'model.pt')
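A common workaround (not part of the original report, so treat it as a suggestion) is to save only the parameters via state_dict() instead of the whole DDP wrapper: the state dict contains only tensors, so the unpicklable local hook never enters the checkpoint. A minimal sketch, with a bare LSTM standing in for the wrapped model's underlying module:

```python
import torch

# In the script above the save line would become:
#   torch.save(model.module.state_dict(), 'model.pt')
# Here a plain LSTM stands in for model.module so the sketch runs
# without a distributed setup.
lstm = torch.nn.LSTM(10, 10)
torch.save(lstm.state_dict(), 'model.pt')   # picklable: tensors only

# To restore, build the same architecture and load the weights into it.
restored = torch.nn.LSTM(10, 10)
restored.load_state_dict(torch.load('model.pt'))
```

Using model.module.state_dict() (rather than model.state_dict()) also keeps the "module." prefix out of the saved keys, so the checkpoint loads into an unwrapped model.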
Issue description

Saving a model wrapped in torch.nn.parallel.DistributedDataParallel with torch.save(model, 'model.pt') fails with AttributeError: Can't pickle local object 'DistributedDataParallel._register_nccl_grad_hook.<locals>.allreduce_hook'. The wrapper registers its NCCL all-reduce gradient hook as a function defined inside a method; pickle cannot serialize local functions, so the whole module object cannot be pickled.
System Info
PyTorch version: 0.4.1
Is debug build: No
CUDA used to build PyTorch: 9.0.176
OS: Ubuntu 18.04.1 LTS
GCC version: (Ubuntu 7.3.0-16ubuntu3) 7.3.0
CMake version: version 3.10.2
Python version: 3.6
Is CUDA available: Yes
CUDA runtime version: Could not collect
GPU models and configuration:
GPU 0: Tesla P100-PCIE-16GB
GPU 1: Tesla P100-PCIE-16GB
GPU 2: Tesla P100-PCIE-16GB
GPU 3: Tesla P100-PCIE-16GB
Nvidia driver version: 390.30
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.7.1.3