This is related to:
- undefined symbol curandCreateGenerator for torch extensions pytorch/pytorch#69666
- [BUG] torch-nightly: linker issue with cpu_adam.so #1625
- AttributeError: 'DeepSpeedCPUAdam' object has no attribute 'ds_opt_adam' #1846
When prebuilding the transformer-inference op, it is not properly linked against libcurand. This was fixed for JIT as part of #1688, but it doesn't seem to work for prebuilding:
```
/tmp/DeepSpeed/deepspeed/ops/transformer/inference$ ldd transformer_inference_op.cpython-39-x86_64-linux-gnu.so | grep curand
[no output]
```
This manifests like so:
```
ImportError: /tmp/DeepSpeed/deepspeed/ops/transformer/inference/transformer_inference_op.cpython-39-x86_64-linux-gnu.so: undefined symbol: curandCreateGenerator
```
It seems no changes were made outside of the JIT path in either pytorch/pytorch#69666 or #1625. @stas00 reported that prebuilding suddenly started to work, but that does not seem to be the case in our environment.
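For reference, the prebuilt path goes through setuptools rather than torch's JIT loader, so the curand link flag has to reach the extension's link step there as well. Below is a minimal, hypothetical sketch (not DeepSpeed's actual setup.py; the extension name and source path are placeholders) of how a prebuilt CUDA extension would end up linked against libcurand:

```python
# Hypothetical sketch only, not DeepSpeed's actual build files; the extension
# name and source path below are placeholders.
from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CUDAExtension

setup(
    name="transformer_inference_op",
    ext_modules=[
        CUDAExtension(
            name="transformer_inference_op",
            sources=["csrc/transformer/inference/pt_binding.cpp"],  # placeholder path
            # 'libraries' adds -lcurand to the link line; that is what produces a
            # libcurand.so.10 entry in ldd and versioned undefined symbols in nm.
            libraries=["curand"],
        )
    ],
    cmdclass={"build_ext": BuildExtension},
)
```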
Prebuilt Op
```
/tmp/DeepSpeed/deepspeed/ops/transformer/inference$ LD_LIBRARY_PATH=/usr/local/lib/python3.9/site-packages/torch/lib ldd transformer_inference_op.cpython-39-x86_64-linux-gnu.so
linux-vdso.so.1 (0x00007ffde01cf000)
libc10.so => /usr/local/lib/python3.9/site-packages/torch/lib/libc10.so (0x00007fd10840a000)
libtorch_cpu.so => /usr/local/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so (0x00007fd0ee399000)
libtorch_python.so => /usr/local/lib/python3.9/site-packages/torch/lib/libtorch_python.so (0x00007fd0ed45e000)
libcudart.so.11.0 => /usr/local/cuda/targets/x86_64-linux/lib/libcudart.so.11.0 (0x00007fd0ed1ae000)
libc10_cuda.so => /usr/local/lib/python3.9/site-packages/torch/lib/libc10_cuda.so (0x00007fd0ed0b0000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fd0ecee3000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fd0ecec7000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fd0ecd02000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fd0ecbbe000)
libgomp-a34b3233.so.1 => /usr/local/lib/python3.9/site-packages/torch/lib/libgomp-a34b3233.so.1 (0x00007fd0ec994000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fd0ec972000)
/lib64/ld-linux-x86-64.so.2 (0x00007fd1085aa000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fd0ec967000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fd0ec95f000)
libcudart-45da57e3.so.11.0 => /usr/local/lib/python3.9/site-packages/torch/lib/libcudart-45da57e3.so.11.0 (0x00007fd0ec6b7000)
libshm.so => /usr/local/lib/python3.9/site-packages/torch/lib/libshm.so (0x00007fd0ec6ad000)
libtorch.so => /usr/local/lib/python3.9/site-packages/torch/lib/libtorch.so (0x00007fd0ec6a8000)
libtorch_cuda.so => /usr/local/lib/python3.9/site-packages/torch/lib/libtorch_cuda.so (0x00007fd0ec689000)
libtorch_cuda_cpp.so => /usr/local/lib/python3.9/site-packages/torch/lib/libtorch_cuda_cpp.so (0x00007fd0df3e9000)
libnvToolsExt-847d78f2.so.1 => /usr/local/lib/python3.9/site-packages/torch/lib/libnvToolsExt-847d78f2.so.1 (0x00007fd0df1de000)
libcudnn.so.8 => /usr/local/lib/python3.9/site-packages/torch/lib/libcudnn.so.8 (0x00007fd0defb6000)
libtorch_cuda_cu.so => /usr/local/lib/python3.9/site-packages/torch/lib/libtorch_cuda_cu.so (0x00007fd0b756f000)
libcublas.so.11 => /usr/local/lib/python3.9/site-packages/torch/lib/libcublas.so.11 (0x00007fd0addf1000)
libcublasLt.so.11 => /usr/local/lib/python3.9/site-packages/torch/lib/libcublasLt.so.11 (0x00007fd098d89000)
```

```
/tmp/DeepSpeed/deepspeed/ops/transformer/inference$ nm transformer_inference_op.cpython-39-x86_64-linux-gnu.so | grep curand
U curandCreateGenerator
U curandSetPseudoRandomGeneratorSeed
```
JIT Op
```
~/.cache/torch_extensions/py39_cu116/transformer_inference$ LD_LIBRARY_PATH=/usr/local/lib/python3.9/site-packages/torch/lib ldd transformer_inference.so
linux-vdso.so.1 (0x00007ffdf6ecc000)
libcurand.so.10 => /usr/local/cuda/targets/x86_64-linux/lib/libcurand.so.10 (0x00007f02ce0f2000)
libc10.so => /usr/local/lib/python3.9/site-packages/torch/lib/libc10.so (0x00007f02ce059000)
libc10_cuda.so => /usr/local/lib/python3.9/site-packages/torch/lib/libc10_cuda.so (0x00007f02cdf5b000)
libtorch_cpu.so => /usr/local/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so (0x00007f02b3eea000)
libtorch_python.so => /usr/local/lib/python3.9/site-packages/torch/lib/libtorch_python.so (0x00007f02b2faf000)
libcudart.so.11.0 => /usr/local/cuda/targets/x86_64-linux/lib/libcudart.so.11.0 (0x00007f02b2d0b000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f02b2b3c000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f02b2b22000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f02b295d000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f02b2952000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f02b2930000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f02b292a000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f02b27e4000)
/lib64/ld-linux-x86-64.so.2 (0x00007f02d3d6b000)
libgomp-a34b3233.so.1 => /usr/local/lib/python3.9/site-packages/torch/lib/libgomp-a34b3233.so.1 (0x00007f02b25ba000)
libcudart-45da57e3.so.11.0 => /usr/local/lib/python3.9/site-packages/torch/lib/libcudart-45da57e3.so.11.0 (0x00007f02b2312000)
libshm.so => /usr/local/lib/python3.9/site-packages/torch/lib/libshm.so (0x00007f02b2308000)
libtorch.so => /usr/local/lib/python3.9/site-packages/torch/lib/libtorch.so (0x00007f02b2303000)
libtorch_cuda.so => /usr/local/lib/python3.9/site-packages/torch/lib/libtorch_cuda.so (0x00007f02b22e2000)
libtorch_cuda_cpp.so => /usr/local/lib/python3.9/site-packages/torch/lib/libtorch_cuda_cpp.so (0x00007f02a5044000)
libnvToolsExt-847d78f2.so.1 => /usr/local/lib/python3.9/site-packages/torch/lib/libnvToolsExt-847d78f2.so.1 (0x00007f02a4e39000)
libcudnn.so.8 => /usr/local/lib/python3.9/site-packages/torch/lib/libcudnn.so.8 (0x00007f02a4c11000)
libtorch_cuda_cu.so => /usr/local/lib/python3.9/site-packages/torch/lib/libtorch_cuda_cu.so (0x00007f027d1ca000)
libcublas.so.11 => /usr/local/lib/python3.9/site-packages/torch/lib/libcublas.so.11 (0x00007f0273a4a000)
libcublasLt.so.11 => /usr/local/lib/python3.9/site-packages/torch/lib/libcublasLt.so.11 (0x00007f025e9e4000)
```

```
~/.cache/torch_extensions/py39_cu116/transformer_inference$ nm transformer_inference.so | grep curand
U curandCreateGenerator@libcurand.so.10
U curandSetPseudoRandomGeneratorSeed@libcurand.so.10
```
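For comparison, the JIT build (the path fixed in #1688) presumably gets the link flag through torch's extension loader, which is what produces the libcurand.so.10 entry above. A minimal sketch of that mechanism, with an illustrative source path rather than DeepSpeed's real file layout:

```python
# Sketch of linking a JIT-compiled extension against libcurand via torch's loader;
# the source path is a placeholder, not DeepSpeed's actual builder code.
from torch.utils.cpp_extension import load

transformer_inference = load(
    name="transformer_inference",
    sources=["csrc/transformer/inference/pt_binding.cpp"],  # placeholder path
    extra_ldflags=["-lcurand"],  # adds the libcurand.so.10 DT_NEEDED entry seen in ldd
    verbose=True,
)
```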
Environment
Torch
```
$ python -m torch.utils.collect_env
Collecting environment information...
PyTorch version: 1.12.1+cu116
Is debug build: False
CUDA used to build PyTorch: 11.6
ROCM used to build PyTorch: N/A
OS: Debian GNU/Linux 11 (bullseye) (x86_64)
GCC version: (Debian 10.2.1-6) 10.2.1 20210110
Clang version: Could not collect
CMake version: version 3.18.4
Libc version: glibc-2.31
Python version: 3.9.10 (main, Mar 2 2022, 04:23:34) [GCC 10.2.1 20210110] (64-bit runtime)
Python platform: Linux-4.14.281-212.502.amzn2.x86_64-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: 11.6.124
GPU models and configuration: GPU 0: NVIDIA A10G
Nvidia driver version: 470.57.02
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
Versions of relevant libraries:
[pip3] numpy==1.23.2
[pip3] torch==1.12.1+cu116
[pip3] torchinfo==1.7.0
[pip3] torchvision==0.13.1+cu116
[conda] Could not collect
```
DeepSpeed
```
$ ds_report
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
runtime if needed. Op compatibility means that your system
meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [NO] ....... [OKAY]
cpu_adagrad ............ [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
async_io ............... [NO] ....... [OKAY]
utils .................. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
transformer_inference .. [YES] ...... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/usr/local/lib/python3.9/site-packages/torch']
torch version .................... 1.12.1+cu116
torch cuda version ............... 11.6
torch hip version ................ None
nvcc version ..................... 11.6
deepspeed install path ........... ['/tmp/DeepSpeed/deepspeed']
deepspeed info ................... 0.7.1+9b418c1e, 9b418c1e, master
deepspeed wheel compiled w. ...... torch 1.12, cuda 11.6
```
Reproduction
- Install torch:
  ```
  pip install torch torchvision --extra-index-url https://download.pytorch.org/whl/cu116
  ```
- Install DeepSpeed (devel install):
  ```
  git clone https://github.com/microsoft/DeepSpeed
  cd DeepSpeed
  DS_BUILD_TRANSFORMER_INFERENCE=1 pip install -e . --global-option="build_ext" --global-option="-j4" --no-cache -v --disable-pip-version-check
  ```
- Try to use inference:
  ```python
  import deepspeed
  import torch
  from transformers import AutoModel

  model = AutoModel.from_pretrained("bert-base-cased")

  # Initialize the DeepSpeed-Inference engine
  ds_engine = deepspeed.init_inference(model,
                                       mp_size=1,
                                       dtype=torch.half,
                                       replace_method='auto',
                                       replace_with_kernel_inject=True)
  model = ds_engine.module
  ```
Error
```
Traceback (most recent call last):
File "/tmp/test2.py", line 8, in <module>
ds_engine = deepspeed.init_inference(model,
File "/tmp/DeepSpeed/deepspeed/__init__.py", line 292, in init_inference
engine = InferenceEngine(model,
File "/tmp/DeepSpeed/deepspeed/inference/engine.py", line 140, in __init__
self._apply_injection_policy(
File "/tmp/DeepSpeed/deepspeed/inference/engine.py", line 333, in _apply_injection_policy
replace_transformer_layer(
File "/tmp/DeepSpeed/deepspeed/module_inject/replace_module.py", line 771, in replace_transformer_layer
replaced_module = replace_module(model=model,
File "/tmp/DeepSpeed/deepspeed/module_inject/replace_module.py", line 954, in replace_module
replaced_module, _ = _replace_module(model, policy)
File "/tmp/DeepSpeed/deepspeed/module_inject/replace_module.py", line 981, in _replace_module
_, layer_id = _replace_module(child, policies, layer_id=layer_id)
File "/tmp/DeepSpeed/deepspeed/module_inject/replace_module.py", line 981, in _replace_module
_, layer_id = _replace_module(child, policies, layer_id=layer_id)
File "/tmp/DeepSpeed/deepspeed/module_inject/replace_module.py", line 971, in _replace_module
replaced_module = policies[child.__class__][0](child,
File "/tmp/DeepSpeed/deepspeed/module_inject/replace_module.py", line 761, in replace_fn
new_module = replace_with_policy(child,
File "/tmp/DeepSpeed/deepspeed/module_inject/replace_module.py", line 359, in replace_with_policy
new_module = transformer_inference.DeepSpeedTransformerInference(
File "/tmp/DeepSpeed/deepspeed/ops/transformer/inference/transformer_inference.py", line 774, in __init__
inference_cuda_module = builder.load()
File "/tmp/DeepSpeed/deepspeed/ops/op_builder/builder.py", line 468, in load
return importlib.import_module(self.absolute_name())
File "/usr/local/lib/python3.9/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 666, in _load_unlocked
File "<frozen importlib._bootstrap>", line 565, in module_from_spec
File "<frozen importlib._bootstrap_external>", line 1173, in create_module
File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
ImportError: /tmp/DeepSpeed/deepspeed/ops/transformer/inference/transformer_inference_op.cpython-39-x86_64-linux-gnu.so: undefined symbol: curandCreateGenerator
```
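A possible stopgap, untested and not a fix for the build itself: since the prebuilt .so merely lacks a DT_NEEDED entry for libcurand, loading the library into the process with global symbol visibility before the op is imported should let the dynamic linker resolve curandCreateGenerator. Minimal sketch, assuming libcurand.so.10 is on the loader search path:

```python
# Untested workaround sketch: preload libcurand with RTLD_GLOBAL so that the
# prebuilt op's undefined curand* symbols can be resolved when it is dlopen'd.
import ctypes

ctypes.CDLL("libcurand.so.10", mode=ctypes.RTLD_GLOBAL)

# ... then run the reproduction snippet above (deepspeed.init_inference etc.).
```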