
[Bug] Exception: cublasLt ran into an error! during fine-tuning LLM in 8bit mode #538

@NanoCode012

Description

Problem

Hello, I'm getting this weird cublasLt error on a Lambda Labs H100 with CUDA 11.8, PyTorch 2.0.1, and Python 3.10 (Miniconda) while trying to fine-tune a 3B-parameter OpenLLaMA model using LoRA with 8-bit loading. The error only occurs when 8-bit loading is enabled; LoRA alone or 4-bit loading (QLoRA) works fine.

The same commands worked two weeks ago and stopped working about a week ago.

I've tried bitsandbytes versions 0.39.0 and 0.39.1, since earlier versions don't work on the H100. Building from source gives me a different issue, as mentioned in the Env section.

Expected

No error

Reproduce

Set up Miniconda, then follow the Lambda Labs instructions in the README at https://github.com/OpenAccess-AI-Collective/axolotl and run the default OpenLLaMA LoRA config.
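For reference, the quantization toggle described above looks roughly like this in an axolotl-style YAML config. This is a hedged sketch from memory of axolotl's example configs, not the exact default open-llama config; key names and the model path may differ in the current repo.

```yaml
# Sketch of the relevant axolotl config keys (illustrative, not the real
# default config file -- check examples/ in the repo).
base_model: openlm-research/open_llama_3b
load_in_8bit: true    # enabling this triggers the cublasLt error
load_in_4bit: false   # switching to 4-bit (QLoRA) works
adapter: lora
lora_r: 8
lora_alpha: 16
lora_dropout: 0.05
```

Flipping `load_in_8bit` to `false` (with either plain LoRA or `load_in_4bit: true`) is the toggle that makes the error appear or disappear.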

Trace

bitsandbytes 0.39.0:

File "/home/ubuntu/axolotl/scripts/finetune.py", line 352, in <module>
    fire.Fire(train)
  File "/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)   
  File "/home/ubuntu/axolotl/scripts/finetune.py", line 337, in train
    trainer.train(resume_from_checkpoint=resume_from_checkpoint)
  File "/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/transformers/trainer.py", line 1531, in train
    return inner_training_loop(
  File "/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/transformers/trainer.py", line 1795, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/transformers/trainer.py", line 2640, in training_step
    loss = self.compute_loss(model, inputs)
  File "/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/transformers/trainer.py", line 2665, in compute_loss
    outputs = model(**inputs)
  File "/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs) 
  File "/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/accelerate/utils/operations.py", line 553, in forward
    return model_forward(*args, **kwargs)
  File "/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/accelerate/utils/operations.py", line 541, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
  File "/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 14, in decorate_autocast
    return func(*args, **kwargs)
  File "/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/peft/peft_model.py", line 827, in forward
    return self.base_model(
  File "/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs) 
  File "/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 691, in forward
    outputs = self.model(
  File "/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs) 
  File "/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 579, in forward
    layer_outputs = decoder_layer(
  File "/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs) 
  File "/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 293, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs) 
  File "/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 195, in forward
    query_states = self.q_proj(hidden_states).view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)
  File "/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs) 
  File "/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/peft/tuners/lora.py", line 942, in forward
    result = super().forward(x)
  File "/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/bitsandbytes/nn/modules.py", line 402, in forward
    out = bnb.matmul(x, self.weight, bias=self.bias, state=self.state)
  File "/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/bitsandbytes/autograd/_functions.py", line 562, in matmul
    return MatMul8bitLt.apply(A, B, out, bias, state)
  File "/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/torch/autograd/function.py", line 506, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/bitsandbytes/autograd/_functions.py", line 400, in forward
    out32, Sout32 = F.igemmlt(C32A, state.CxB, SA, state.SB)
  File "/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/bitsandbytes/functional.py", line 1781, in igemmlt
    raise Exception('cublasLt ran into an error!')
Exception: cublasLt ran into an error!

Env

python -m bitsandbytes

bin /home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda118.so
/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: Found duplicate ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] files: {PosixPath('/home/ubuntu/miniconda3/envs/py310/lib/libcudart.so.11.0'), PosixPath('/home/ubuntu/miniconda3/envs/py310/lib/libcudart.so')}.. We'll flip a coin and try one of these, in order to fail forward.
Either way, this might cause trouble in the future:
If you get `CUDA error: invalid device function` errors, the above might be the cause and the solution is to make sure only one ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] in the paths that we search based on your env.
  warn(msg)
CUDA SETUP: CUDA runtime path found: /home/ubuntu/miniconda3/envs/py310/lib/libcudart.so.11.0
CUDA SETUP: Highest compute capability among GPUs detected: 9.0
CUDA SETUP: Detected CUDA version 118
CUDA SETUP: Loading binary /home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda118.so...
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++ BUG REPORT INFORMATION ++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

+++++++++++++++++++ ANACONDA CUDA PATHS ++++++++++++++++++++
/home/ubuntu/miniconda3/envs/py310/lib/libcudart.so
/home/ubuntu/miniconda3/envs/py310/lib/stubs/libcuda.so
/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so
/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/torch/lib/libtorch_cuda_linalg.so
/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/torch/lib/libc10_cuda.so
/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda110.so
/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda118.so
/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda114.so
/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda120.so
/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda111_nocublaslt.so
/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda112_nocublaslt.so
/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda117_nocublaslt.so
/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda116.so
/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda117.so
/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda120_nocublaslt.so
/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda118_nocublaslt.so
/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda114_nocublaslt.so
/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda112.so
/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda111.so
/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda110_nocublaslt.so
/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda115_nocublaslt.so
/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda116_nocublaslt.so
/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda121_nocublaslt.so
/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda121.so
/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda115.so
/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda113_nocublaslt.so
/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda113.so
/home/ubuntu/miniconda3/envs/py310/nsight-compute/2023.1.1/target/linux-desktop-glibc_2_19_0-ppc64le/libcuda-injection.so
/home/ubuntu/miniconda3/envs/py310/nsight-compute/2023.1.1/target/linux-desktop-glibc_2_11_3-x64/libcuda-injection.so
/home/ubuntu/miniconda3/envs/py310/nsight-compute/2023.1.1/target/linux-desktop-t210-a64/libcuda-injection.so

++++++++++++++++++ /usr/local CUDA PATHS +++++++++++++++++++


+++++++++++++++ WORKING DIRECTORY CUDA PATHS +++++++++++++++


++++++++++++++++++ LD_LIBRARY CUDA PATHS +++++++++++++++++++
+++++++++++ /usr/lib/x86_64-linux-gnu CUDA PATHS +++++++++++
/usr/lib/x86_64-linux-gnu/libcudart.so
/usr/lib/x86_64-linux-gnu/stubs/libcuda.so
/usr/lib/x86_64-linux-gnu/libcuda.so

++++++++++++++++++++++++++ OTHER +++++++++++++++++++++++++++
COMPILED_WITH_CUDA = True
COMPUTE_CAPABILITIES_PER_GPU = ['9.0']
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++ DEBUG INFO END ++++++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Running a quick check that:
    + library is importable
    + CUDA function is callable


WARNING: Please be sure to sanitize sensible info from any such env vars!

SUCCESS!
Installation was successful!
$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Jun__8_16:49:14_PDT_2022
Cuda compilation tools, release 11.7, V11.7.99
Build cuda_11.7.r11.7/compiler.31442593_0
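The bitsandbytes output above warns about duplicate `libcudart` copies in the conda env, which is one suspect worth ruling out. The following stdlib-only sketch lists the CUDA runtime libraries visible in a given lib directory; the function name and the `CONDA_PREFIX` fallback are my own choices, not part of bitsandbytes.

```python
# Sketch: list libcudart copies under a lib directory, since the
# bitsandbytes warning flags duplicates as a possible failure cause.
import os
from pathlib import Path

def find_cudart_copies(lib_dir: str) -> list[str]:
    """Return sorted names of libcudart* shared objects under lib_dir."""
    root = Path(lib_dir)
    if not root.is_dir():
        return []
    return sorted(p.name for p in root.glob("libcudart.so*"))

if __name__ == "__main__":
    # Point at the active conda env's lib directory, falling back to /usr/lib.
    env_lib = os.path.join(os.environ.get("CONDA_PREFIX", "/usr"), "lib")
    print(find_cudart_copies(env_lib))
```

If this reports more than one `libcudart.so*` entry (as the warning above does), removing or shadowing the stale copy is the fix the bitsandbytes message suggests.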

Misc

All related issues:

Also tried installing cudatoolkit via conda.

Metadata

Assignees

No one assigned

    Labels

    Enhancement (new feature or request), Medium Priority (will be worked on after all high priority issues)
