I'm trying to run inference on a model that doesn't fit on my GPU, using this code:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
device_map = {'transformer.wte': 0,
'transformer.drop': 0,
'transformer.h.0': 0,
'transformer.h.1': 0,
'transformer.h.2': 0,
'transformer.h.3': 0,
'transformer.h.4': 0,
'transformer.h.5': 0,
'transformer.h.6': 0,
'transformer.h.7': 0,
'transformer.h.8': 0,
'transformer.h.9': 0,
'transformer.h.10': 0,
'transformer.h.11': 0,
'transformer.h.12': 0,
'transformer.h.13': 0,
'transformer.h.14': 0,
'transformer.h.15': 0,
'transformer.h.16': 0,
'transformer.h.17': 0,
'transformer.h.18': 0,
'transformer.h.19': 0,
'transformer.h.20': 0,
'transformer.h.21': 0,
'transformer.h.22': 0,
'transformer.h.23': 'cpu',
'transformer.h.24': 'cpu',
'transformer.h.25': 'cpu',
'transformer.h.26': 'cpu',
'transformer.h.27': 'cpu',
'transformer.ln_f': 'cpu',
'lm_head': 'cpu'}
tokenizer = AutoTokenizer.from_pretrained("tomaxe/fr-boris-sharded")
model = AutoModelForCausalLM.from_pretrained(
    "tomaxe/fr-boris-sharded",
    load_in_8bit=True,
    load_in_8bit_skip_modules=['lm_head',
                               'transformer.ln_f',
                               'transformer.h.27',
                               'transformer.h.26',
                               'transformer.h.25',
                               'transformer.h.24',
                               'transformer.h.23'],
    device_map=device_map)
input_text = "salut"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")
outputs = model.generate(input_ids, max_length=20)
print(tokenizer.decode(outputs[0]))
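
As an aside, the 30-entry device_map literal above can also be generated programmatically. A minimal sketch that builds the exact same dict (nothing here is required by the API; it is just more compact):

# Blocks 0-22, the token embedding, and the dropout go to GPU 0;
# blocks 23-27, the final layer norm, and the LM head go to the CPU.
device_map = {'transformer.wte': 0, 'transformer.drop': 0}
device_map.update({f'transformer.h.{i}': 0 for i in range(23)})
device_map.update({f'transformer.h.{i}': 'cpu' for i in range(23, 28)})
device_map.update({'transformer.ln_f': 'cpu', 'lm_head': 'cpu'})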
And I'm running into this error:
@younesbelkada Do you know what I could do? Thanks
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
For effortless bug reporting copy-paste your error into this form: https://docs.google.com/forms/d/e/1FAIpQLScPB8emS3Thkp66nvqwmjTEgxp8Y9ufuWTzFyr9kJ5AoI47dQ/viewform?usp=sf_link
================================================================================
CUDA SETUP: CUDA runtime path found: /home/thomas/anaconda3/lib/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 8.6
CUDA SETUP: Detected CUDA version 117
CUDA SETUP: Loading binary /home/thomas/anaconda3/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda117.so...
Loading checkpoint shards: 0%| | 0/30 [00:00<?, ?it/s]
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
A: torch.Size([2, 4096]), B: torch.Size([4096, 4096]), C: (2, 4096); (lda, ldb, ldc): (c_int(64), c_int(131072), c_int(64)); (m, n, k): (c_int(2), c_int(4096), c_int(4096))
Traceback (most recent call last):
File "/home/thomas/anaconda3/lib/python3.9/site-packages/spyder_kernels/py3compat.py", line 356, in compat_exec
exec(code, globals, locals)
File "/home/thomas/Downloads/infersharded.py", line 46, in <module>
outputs = model.generate(input_ids, max_length = 20)
File "/home/thomas/anaconda3/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/home/thomas/anaconda3/lib/python3.9/site-packages/transformers/generation/utils.py", line 1391, in generate
return self.greedy_search(
File "/home/thomas/anaconda3/lib/python3.9/site-packages/transformers/generation/utils.py", line 2179, in greedy_search
outputs = self(
File "/home/thomas/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/thomas/anaconda3/lib/python3.9/site-packages/accelerate/hooks.py", line 156, in new_forward
output = old_forward(*args, **kwargs)
File "/home/thomas/anaconda3/lib/python3.9/site-packages/transformers/models/gptj/modeling_gptj.py", line 813, in forward
transformer_outputs = self.transformer(
File "/home/thomas/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/thomas/anaconda3/lib/python3.9/site-packages/transformers/models/gptj/modeling_gptj.py", line 668, in forward
outputs = block(
File "/home/thomas/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/thomas/anaconda3/lib/python3.9/site-packages/accelerate/hooks.py", line 156, in new_forward
output = old_forward(*args, **kwargs)
File "/home/thomas/anaconda3/lib/python3.9/site-packages/transformers/models/gptj/modeling_gptj.py", line 302, in forward
attn_outputs = self.attn(
File "/home/thomas/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/thomas/anaconda3/lib/python3.9/site-packages/accelerate/hooks.py", line 156, in new_forward
output = old_forward(*args, **kwargs)
File "/home/thomas/anaconda3/lib/python3.9/site-packages/transformers/models/gptj/modeling_gptj.py", line 203, in forward
query = self.q_proj(hidden_states)
File "/home/thomas/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/thomas/anaconda3/lib/python3.9/site-packages/accelerate/hooks.py", line 156, in new_forward
output = old_forward(*args, **kwargs)
File "/home/thomas/anaconda3/lib/python3.9/site-packages/bitsandbytes/nn/modules.py", line 254, in forward
out = bnb.matmul(x, self.weight, bias=self.bias, state=self.state)
File "/home/thomas/anaconda3/lib/python3.9/site-packages/bitsandbytes/autograd/_functions.py", line 405, in matmul
return MatMul8bitLt.apply(A, B, out, bias, state)
File "/home/thomas/anaconda3/lib/python3.9/site-packages/bitsandbytes/autograd/_functions.py", line 311, in forward
out32, Sout32 = F.igemmlt(C32A, state.CxB, SA, state.SB)
File "/home/thomas/anaconda3/lib/python3.9/site-packages/bitsandbytes/functional.py", line 1410, in igemmlt
raise Exception('cublasLt ran into an error!')
Exception: cublasLt ran into an error!
cuBLAS API failed with status 15
error detected
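
For reference, later transformers releases expose these 8-bit options through BitsAndBytesConfig, which also has an explicit llm_int8_enable_fp32_cpu_offload switch for keeping the CPU-placed modules in fp32, and the warning in the log asks for the input's attention_mask to be passed along. A minimal sketch combining both, assuming a transformers version that ships BitsAndBytesConfig; this is untested against this exact setup, not a confirmed fix:

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Assumption: BitsAndBytesConfig is available in the installed
# transformers version. Same skip list and device_map as above, but
# with fp32 CPU offload requested explicitly for the CPU modules.
quant_config = BitsAndBytesConfig(
    load_in_8bit=True,
    llm_int8_skip_modules=['lm_head', 'transformer.ln_f',
                           'transformer.h.27', 'transformer.h.26',
                           'transformer.h.25', 'transformer.h.24',
                           'transformer.h.23'],
    llm_int8_enable_fp32_cpu_offload=True,
)
tokenizer = AutoTokenizer.from_pretrained("tomaxe/fr-boris-sharded")
model = AutoModelForCausalLM.from_pretrained(
    "tomaxe/fr-boris-sharded",
    quantization_config=quant_config,
    device_map=device_map,  # the device_map defined in the snippet above
)

# Pass the attention mask along with the input ids, as the warning asks.
inputs = tokenizer("salut", return_tensors="pt").to(0)
outputs = model.generate(**inputs, max_length=20)
print(tokenizer.decode(outputs[0]))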