
Errors in generation (Bloom) when changing sampling/use_cache options #324

@thies1006

Description

I'm running the inference script bloom-ds-inference.py by invoking:

deepspeed --num_gpus 1 ~/Megatron-DeepSpeed/scripts/inference/bloom-ds-inference.py --name bigscience/bloom-1b3 --benchmark

but with the generation arguments changed to generate_kwargs = dict(max_new_tokens=num_tokens, do_sample=False, use_cache=False), i.e. adding the use_cache option.
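For context, a minimal sketch of the change inside bloom-ds-inference.py (the surrounding lines are paraphrased from the script; only the generate_kwargs assignment is the actual modification):

num_tokens = 100  # matches the 100 tokens generated in the benchmark run below

# The script's baseline settings use greedy decoding with the KV cache enabled;
# the only change here is adding use_cache=False.
generate_kwargs = dict(max_new_tokens=num_tokens, do_sample=False, use_cache=False)

outputs = model.generate(**input_tokens, **generate_kwargs)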

Error:

*** Starting to generate 100 tokens with bs=1
Generate args {'max_new_tokens': 100, 'do_sample': False, 'use_cache': False}
!!!! kernel execution error. (m: 8192, n: 74, k: 2048, error: 13) 
!!!! kernel execution error. (m: 2048, n: 74, k: 8192, error: 13) 
!!!! kernel execution error. (m: 6144, n: 74, k: 2048, error: 13) 
Traceback (most recent call last):
  File "/secondary/thies/Megatron-DeepSpeed/scripts/inference/bloom-ds-inference.py", line 257, in <module>
    _ = generate()
  File "/secondary/thies/Megatron-DeepSpeed/scripts/inference/bloom-ds-inference.py", line 244, in generate
    outputs = model.generate(**input_tokens, **generate_kwargs)
  File "/secondary/thies/.virtualenvs/bloom/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/secondary/thies/.virtualenvs/bloom/lib/python3.8/site-packages/transformers/generation_utils.py", line 1294, in generate
    return self.greedy_search(
  File "/secondary/thies/.virtualenvs/bloom/lib/python3.8/site-packages/transformers/generation_utils.py", line 1689, in greedy_search
    outputs = self(
  File "/secondary/thies/.virtualenvs/bloom/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1128, in _call_impl
    result = forward_call(*input, **kwargs)
  File "/secondary/thies/.virtualenvs/bloom/lib/python3.8/site-packages/transformers/models/bloom/modeling_bloom.py", line 821, in forward
    transformer_outputs = self.transformer(
  File "/secondary/thies/.virtualenvs/bloom/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/secondary/thies/.virtualenvs/bloom/lib/python3.8/site-packages/transformers/models/bloom/modeling_bloom.py", line 709, in forward
    outputs = block(
  File "/secondary/thies/.virtualenvs/bloom/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/secondary/thies/.virtualenvs/bloom/lib/python3.8/site-packages/deepspeed/ops/transformer/inference/transformer_inference.py", line 829, in forward
    self.attention(input,
  File "/secondary/thies/.virtualenvs/bloom/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/secondary/thies/.virtualenvs/bloom/lib/python3.8/site-packages/deepspeed/ops/transformer/inference/transformer_inference.py", line 541, in forward
    output = DeepSpeedSelfAttentionFunction.apply(
  File "/secondary/thies/.virtualenvs/bloom/lib/python3.8/site-packages/deepspeed/ops/transformer/inference/transformer_inference.py", line 461, in forward
    output, key_layer, value_layer, context_layer, inp_norm = selfAttention_fp()
  File "/secondary/thies/.virtualenvs/bloom/lib/python3.8/site-packages/deepspeed/ops/transformer/inference/transformer_inference.py", line 425, in selfAttention_fp
    context_layer, key_layer, value_layer = compute_attention(qkv_out[0] if isinstance(qkv_out, list) else qkv_out, input_mask)
  File "/secondary/thies/.virtualenvs/bloom/lib/python3.8/site-packages/deepspeed/ops/transformer/inference/transformer_inference.py", line 373, in compute_attention
    context_layer, presents = backup_attention(qkv_out, layer_past, alibi, input_mask, norm_factor)
  File "/secondary/thies/.virtualenvs/bloom/lib/python3.8/site-packages/deepspeed/ops/transformer/inference/transformer_inference.py", line 203, in backup_attention
    value_layer) = split_tensor_along_last_dim(mixed_x_layer,
  File "/secondary/thies/.virtualenvs/bloom/lib/python3.8/site-packages/deepspeed/ops/transformer/inference/transformer_inference.py", line 189, in split_tensor_along_last_dim
    return tuple(chunk.contiguous() for chunk in tensor_list)
  File "/secondary/thies/.virtualenvs/bloom/lib/python3.8/site-packages/deepspeed/ops/transformer/inference/transformer_inference.py", line 189, in <genexpr>
    return tuple(chunk.contiguous() for chunk in tensor_list)
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
terminate called after throwing an instance of 'c10::Error'
  what():  NCCL error in: ../torch/csrc/distributed/c10d/NCCLUtils.hpp:172, unhandled cuda error, NCCL version 21.0.3
Process Group destroyed on rank 0
Exception raised from ncclCommAbort at ../torch/csrc/distributed/c10d/NCCLUtils.hpp:172 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x6c (0x7fe702e251dc in /secondary/thies/.virtualenvs/bloom/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0xfa (0x7fe702e02c96 in /secondary/thies/.virtualenvs/bloom/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #2: <unknown function> + 0x2d00603 (0x7fe627b47603 in /secondary/thies/.virtualenvs/bloom/lib/python3.8/site-packages/torch/lib/libtorch_cuda.so)
frame #3: c10d::ProcessGroupNCCL::~ProcessGroupNCCL() + 0x1d1 (0x7fe627b29a01 in /secondary/thies/.virtualenvs/bloom/lib/python3.8/site-packages/torch/lib/libtorch_cuda.so)
frame #4: c10d::ProcessGroupNCCL::~ProcessGroupNCCL() + 0xd (0x7fe627b29ebd in /secondary/thies/.virtualenvs/bloom/lib/python3.8/site-packages/torch/lib/libtorch_cuda.so)
frame #5: <unknown function> + 0x115a211 (0x7fe63e830211 in /secondary/thies/.virtualenvs/bloom/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #6: <unknown function> + 0x11362eb (0x7fe63e80c2eb in /secondary/thies/.virtualenvs/bloom/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #7: <unknown function> + 0xa030e2 (0x7fe63e0d90e2 in /secondary/thies/.virtualenvs/bloom/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #8: <unknown function> + 0xa040a3 (0x7fe63e0da0a3 in /secondary/thies/.virtualenvs/bloom/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #9: /secondary/thies/.virtualenvs/bloom/bin/python() [0x5cedf8]
frame #10: /secondary/thies/.virtualenvs/bloom/bin/python() [0x5d1cdc]
frame #11: PyDict_Clear + 0xeb (0x5cef3b in /secondary/thies/.virtualenvs/bloom/bin/python)
frame #12: /secondary/thies/.virtualenvs/bloom/bin/python() [0x6aa1ba]
frame #13: /secondary/thies/.virtualenvs/bloom/bin/python() [0x4ef8d8]
frame #14: _PyGC_CollectNoFail + 0x2f (0x672bcf in /secondary/thies/.virtualenvs/bloom/bin/python)
frame #15: PyImport_Cleanup + 0x314 (0x685414 in /secondary/thies/.virtualenvs/bloom/bin/python)
frame #16: Py_FinalizeEx + 0x7f (0x68040f in /secondary/thies/.virtualenvs/bloom/bin/python)
frame #17: Py_RunMain + 0x32d (0x6b7a1d in /secondary/thies/.virtualenvs/bloom/bin/python)
frame #18: Py_BytesMain + 0x2d (0x6b7c8d in /secondary/thies/.virtualenvs/bloom/bin/python)
frame #19: __libc_start_main + 0xf3 (0x7fe716fff0b3 in /lib/x86_64-linux-gnu/libc.so.6)
frame #20: _start + 0x2e (0x5fb12e in /secondary/thies/.virtualenvs/bloom/bin/python)

[2022-08-03 15:03:43,770] [INFO] [launch.py:286:sigkill_handler] Killing subprocess 140325
[2022-08-03 15:03:43,770] [ERROR] [launch.py:292:sigkill_handler] ['/secondary/thies/.virtualenvs/bloom/bin/python', '-u', '/secondary/thies/Megatron-DeepSpeed/scripts/inference/bloom-ds-inference.py', '--local_rank=0', '--name', 'bigscience/bloom-1b3', '--benchmark'] exits with return code = -6

When using generate_kwargs = dict(max_new_tokens=num_tokens, do_sample=True, use_cache=False), I get a different error:

*** Starting to generate 100 tokens with bs=1
Generate args {'max_new_tokens': 100, 'do_sample': True, 'use_cache': False}
Traceback (most recent call last):
  File "/secondary/thies/Megatron-DeepSpeed/scripts/inference/bloom-ds-inference.py", line 257, in <module>
    _ = generate()
  File "/secondary/thies/Megatron-DeepSpeed/scripts/inference/bloom-ds-inference.py", line 244, in generate
    outputs = model.generate(**input_tokens, **generate_kwargs)
  File "/secondary/thies/.virtualenvs/bloom/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/secondary/thies/.virtualenvs/bloom/lib/python3.8/site-packages/transformers/generation_utils.py", line 1326, in generate
    return self.sample(
  File "/secondary/thies/.virtualenvs/bloom/lib/python3.8/site-packages/transformers/generation_utils.py", line 1981, in sample
    next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
[2022-08-03 15:06:16,298] [INFO] [launch.py:286:sigkill_handler] Killing subprocess 140658
[2022-08-03 15:06:16,298] [ERROR] [launch.py:292:sigkill_handler] ['/secondary/thies/.virtualenvs/bloom/bin/python', '-u', '/secondary/thies/Megatron-DeepSpeed/scripts/inference/bloom-ds-inference.py', '--local_rank=0', '--name', 'bigscience/bloom-1b3', '--benchmark'] exits with return code = 1
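To help narrow this down, here is a minimal sketch that runs the same generation settings through plain transformers without deepspeed.init_inference (the prompt is a placeholder). If this succeeds, the failure presumably sits in the DeepSpeed inference kernels' no-cache attention path (backup_attention in transformer_inference.py) rather than in the model itself:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Same generation settings as above, but with no DeepSpeed kernel injection.
name = "bigscience/bloom-1b3"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.float16).to("cuda")

inputs = tokenizer("DeepSpeed is", return_tensors="pt").to("cuda")  # placeholder prompt
out = model.generate(**inputs, max_new_tokens=100, do_sample=False, use_cache=False)
print(tokenizer.decode(out[0]))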
