I'm running the inference script `bloom-ds-inference.py` by invoking:

```
deepspeed --num_gpus 1 ~/Megatron-DeepSpeed/scripts/inference/bloom-ds-inference.py --name bigscience/bloom-1b3 --benchmark
```

but with the generation arguments changed to `generate_kwargs = dict(max_new_tokens=num_tokens, do_sample=False, use_cache=False)` (i.e. adding the `use_cache` option).
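For context, the only change to the stock script is the `generate_kwargs` assignment; a minimal sketch of the modified line (with `num_tokens` fixed at the benchmark value of 100 used in the runs below):

```python
num_tokens = 100  # matches "Starting to generate 100 tokens" in the log below

# Stock script: dict(max_new_tokens=num_tokens, do_sample=False)
# Only change: disable the KV cache via use_cache=False.
generate_kwargs = dict(max_new_tokens=num_tokens, do_sample=False, use_cache=False)
```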
Error:

```
*** Starting to generate 100 tokens with bs=1
Generate args {'max_new_tokens': 100, 'do_sample': False, 'use_cache': False}
!!!! kernel execution error. (m: 8192, n: 74, k: 2048, error: 13)
!!!! kernel execution error. (m: 2048, n: 74, k: 8192, error: 13)
!!!! kernel execution error. (m: 6144, n: 74, k: 2048, error: 13)
Traceback (most recent call last):
  File "/secondary/thies/Megatron-DeepSpeed/scripts/inference/bloom-ds-inference.py", line 257, in <module>
    _ = generate()
  File "/secondary/thies/Megatron-DeepSpeed/scripts/inference/bloom-ds-inference.py", line 244, in generate
    outputs = model.generate(**input_tokens, **generate_kwargs)
  File "/secondary/thies/.virtualenvs/bloom/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/secondary/thies/.virtualenvs/bloom/lib/python3.8/site-packages/transformers/generation_utils.py", line 1294, in generate
    return self.greedy_search(
  File "/secondary/thies/.virtualenvs/bloom/lib/python3.8/site-packages/transformers/generation_utils.py", line 1689, in greedy_search
    outputs = self(
  File "/secondary/thies/.virtualenvs/bloom/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1128, in _call_impl
    result = forward_call(*input, **kwargs)
  File "/secondary/thies/.virtualenvs/bloom/lib/python3.8/site-packages/transformers/models/bloom/modeling_bloom.py", line 821, in forward
    transformer_outputs = self.transformer(
  File "/secondary/thies/.virtualenvs/bloom/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/secondary/thies/.virtualenvs/bloom/lib/python3.8/site-packages/transformers/models/bloom/modeling_bloom.py", line 709, in forward
    outputs = block(
  File "/secondary/thies/.virtualenvs/bloom/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/secondary/thies/.virtualenvs/bloom/lib/python3.8/site-packages/deepspeed/ops/transformer/inference/transformer_inference.py", line 829, in forward
    self.attention(input,
  File "/secondary/thies/.virtualenvs/bloom/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/secondary/thies/.virtualenvs/bloom/lib/python3.8/site-packages/deepspeed/ops/transformer/inference/transformer_inference.py", line 541, in forward
    output = DeepSpeedSelfAttentionFunction.apply(
  File "/secondary/thies/.virtualenvs/bloom/lib/python3.8/site-packages/deepspeed/ops/transformer/inference/transformer_inference.py", line 461, in forward
    output, key_layer, value_layer, context_layer, inp_norm = selfAttention_fp()
  File "/secondary/thies/.virtualenvs/bloom/lib/python3.8/site-packages/deepspeed/ops/transformer/inference/transformer_inference.py", line 425, in selfAttention_fp
    context_layer, key_layer, value_layer = compute_attention(qkv_out[0] if isinstance(qkv_out, list) else qkv_out, input_mask)
  File "/secondary/thies/.virtualenvs/bloom/lib/python3.8/site-packages/deepspeed/ops/transformer/inference/transformer_inference.py", line 373, in compute_attention
    context_layer, presents = backup_attention(qkv_out, layer_past, alibi, input_mask, norm_factor)
  File "/secondary/thies/.virtualenvs/bloom/lib/python3.8/site-packages/deepspeed/ops/transformer/inference/transformer_inference.py", line 203, in backup_attention
    value_layer) = split_tensor_along_last_dim(mixed_x_layer,
  File "/secondary/thies/.virtualenvs/bloom/lib/python3.8/site-packages/deepspeed/ops/transformer/inference/transformer_inference.py", line 189, in split_tensor_along_last_dim
    return tuple(chunk.contiguous() for chunk in tensor_list)
  File "/secondary/thies/.virtualenvs/bloom/lib/python3.8/site-packages/deepspeed/ops/transformer/inference/transformer_inference.py", line 189, in <genexpr>
    return tuple(chunk.contiguous() for chunk in tensor_list)
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
terminate called after throwing an instance of 'c10::Error'
  what(): NCCL error in: ../torch/csrc/distributed/c10d/NCCLUtils.hpp:172, unhandled cuda error, NCCL version 21.0.3
Process Group destroyed on rank 0
Exception raised from ncclCommAbort at ../torch/csrc/distributed/c10d/NCCLUtils.hpp:172 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x6c (0x7fe702e251dc in /secondary/thies/.virtualenvs/bloom/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0xfa (0x7fe702e02c96 in /secondary/thies/.virtualenvs/bloom/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #2: <unknown function> + 0x2d00603 (0x7fe627b47603 in /secondary/thies/.virtualenvs/bloom/lib/python3.8/site-packages/torch/lib/libtorch_cuda.so)
frame #3: c10d::ProcessGroupNCCL::~ProcessGroupNCCL() + 0x1d1 (0x7fe627b29a01 in /secondary/thies/.virtualenvs/bloom/lib/python3.8/site-packages/torch/lib/libtorch_cuda.so)
frame #4: c10d::ProcessGroupNCCL::~ProcessGroupNCCL() + 0xd (0x7fe627b29ebd in /secondary/thies/.virtualenvs/bloom/lib/python3.8/site-packages/torch/lib/libtorch_cuda.so)
frame #5: <unknown function> + 0x115a211 (0x7fe63e830211 in /secondary/thies/.virtualenvs/bloom/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #6: <unknown function> + 0x11362eb (0x7fe63e80c2eb in /secondary/thies/.virtualenvs/bloom/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #7: <unknown function> + 0xa030e2 (0x7fe63e0d90e2 in /secondary/thies/.virtualenvs/bloom/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #8: <unknown function> + 0xa040a3 (0x7fe63e0da0a3 in /secondary/thies/.virtualenvs/bloom/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #9: /secondary/thies/.virtualenvs/bloom/bin/python() [0x5cedf8]
frame #10: /secondary/thies/.virtualenvs/bloom/bin/python() [0x5d1cdc]
frame #11: PyDict_Clear + 0xeb (0x5cef3b in /secondary/thies/.virtualenvs/bloom/bin/python)
frame #12: /secondary/thies/.virtualenvs/bloom/bin/python() [0x6aa1ba]
frame #13: /secondary/thies/.virtualenvs/bloom/bin/python() [0x4ef8d8]
frame #14: _PyGC_CollectNoFail + 0x2f (0x672bcf in /secondary/thies/.virtualenvs/bloom/bin/python)
frame #15: PyImport_Cleanup + 0x314 (0x685414 in /secondary/thies/.virtualenvs/bloom/bin/python)
frame #16: Py_FinalizeEx + 0x7f (0x68040f in /secondary/thies/.virtualenvs/bloom/bin/python)
frame #17: Py_RunMain + 0x32d (0x6b7a1d in /secondary/thies/.virtualenvs/bloom/bin/python)
frame #18: Py_BytesMain + 0x2d (0x6b7c8d in /secondary/thies/.virtualenvs/bloom/bin/python)
frame #19: __libc_start_main + 0xf3 (0x7fe716fff0b3 in /lib/x86_64-linux-gnu/libc.so.6)
frame #20: _start + 0x2e (0x5fb12e in /secondary/thies/.virtualenvs/bloom/bin/python)
[2022-08-03 15:03:43,770] [INFO] [launch.py:286:sigkill_handler] Killing subprocess 140325
[2022-08-03 15:03:43,770] [ERROR] [launch.py:292:sigkill_handler] ['/secondary/thies/.virtualenvs/bloom/bin/python', '-u', '/secondary/thies/Megatron-DeepSpeed/scripts/inference/bloom-ds-inference.py', '--local_rank=0', '--name', 'bigscience/bloom-1b3', '--benchmark'] exits with return code = -6
```
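As the log itself suggests, CUDA kernel errors are reported asynchronously, so the traceback above may point at the wrong op. A sketch of the rerun with synchronous kernel launches (same command as above) to localize the faulting kernel:

```shell
# With synchronous launches, the Python traceback should point at the
# kernel that actually raised the illegal memory access.
export CUDA_LAUNCH_BLOCKING=1
deepspeed --num_gpus 1 ~/Megatron-DeepSpeed/scripts/inference/bloom-ds-inference.py \
    --name bigscience/bloom-1b3 --benchmark
```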
When using `generate_kwargs = dict(max_new_tokens=num_tokens, do_sample=True, use_cache=False)` I get a different error:

```
*** Starting to generate 100 tokens with bs=1
Generate args {'max_new_tokens': 100, 'do_sample': True, 'use_cache': False}
Traceback (most recent call last):
  File "/secondary/thies/Megatron-DeepSpeed/scripts/inference/bloom-ds-inference.py", line 257, in <module>
    _ = generate()
  File "/secondary/thies/Megatron-DeepSpeed/scripts/inference/bloom-ds-inference.py", line 244, in generate
    outputs = model.generate(**input_tokens, **generate_kwargs)
  File "/secondary/thies/.virtualenvs/bloom/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/secondary/thies/.virtualenvs/bloom/lib/python3.8/site-packages/transformers/generation_utils.py", line 1326, in generate
    return self.sample(
  File "/secondary/thies/.virtualenvs/bloom/lib/python3.8/site-packages/transformers/generation_utils.py", line 1981, in sample
    next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
[2022-08-03 15:06:16,298] [INFO] [launch.py:286:sigkill_handler] Killing subprocess 140658
[2022-08-03 15:06:16,298] [ERROR] [launch.py:292:sigkill_handler] ['/secondary/thies/.virtualenvs/bloom/bin/python', '-u', '/secondary/thies/Megatron-DeepSpeed/scripts/inference/bloom-ds-inference.py', '--local_rank=0', '--name', 'bigscience/bloom-1b3', '--benchmark'] exits with return code = 1
```
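This second failure looks consistent with the first: if the forward pass emits even one NaN logit (e.g. from a corrupted attention output like the one above), softmax propagates it into every probability, and `torch.multinomial` then rejects the tensor. A pure-Python illustration of the propagation (the NaN logit here is synthetic, not taken from the actual model):

```python
import math

def softmax(logits):
    """Numerically stable softmax; a single NaN logit poisons every entry."""
    finite = [x for x in logits if x == x]      # drop NaNs for the max trick
    m = max(finite) if finite else 0.0
    exps = [math.exp(x - m) for x in logits]    # exp(nan - m) is still nan
    total = sum(exps)                           # nan contaminates the sum
    return [e / total for e in exps]            # ...and every probability

# One corrupted logit, as a bad upstream kernel could produce:
probs = softmax([1.0, 2.0, float("nan")])
assert all(p != p for p in probs)  # every probability is NaN
```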