Hi, I'm trying to fine-tune Llama-3 8B on a V100 GPU. As Unsloth requires, I upgraded torch to 2.1 and followed the recommended installation steps for Google Colab from this tutorial. However, fine-tuning fails because xFormers appears to require a compute capability of 8.0, while my GPU has 7.0. Yet Unsloth is able to fine-tune Llama-3 8B on a T4, which has a compute capability of 7.5. What am I missing? Is there a version of xFormers that is compatible with both my hardware and Unsloth's requirements?
My torch version is: 2.1.0+cu121
This is my GPU setup:
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.171.04             Driver Version: 535.171.04   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla V100S-PCIE-32GB          Off | 00000000:3B:00.0 Off |                    0 |
| N/A   46C    P0              28W / 250W |      5MiB / 32768MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
This is my code:
import torch
print(torch.__version__)
major_version, minor_version = torch.cuda.get_device_capability()
# Must install separately since Colab has torch 2.2.1, which breaks packages
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
if major_version >= 8:
    # Use this for new GPUs like Ampere, Hopper GPUs (RTX 30xx, RTX 40xx, A100, H100, L40)
    !pip install --no-deps packaging ninja einops flash-attn "xformers<0.0.26" trl peft accelerate bitsandbytes
else:
    # Use this for older GPUs (V100, Tesla T4, RTX 20xx)
    !pip install --no-deps "xformers<0.0.26" trl peft accelerate bitsandbytes
pass
This is the error I get:
NotImplementedError: No operator found for `memory_efficient_attention_forward` with inputs:
query : shape=(2, 555, 8, 4, 128) (torch.float16)
key : shape=(2, 555, 8, 4, 128) (torch.float16)
value : shape=(2, 555, 8, 4, 128) (torch.float16)
attn_bias : <class 'xformers.ops.fmha.attn_bias.LowerTriangularMask'>
p : 0.0
`flshattF@0.0.0` is not supported because:
xFormers wasn't build with CUDA support
requires device with capability > (8, 0) but your GPU has capability (7, 0) (too old)
operator wasn't built - see `python -m xformers.info` for more info
`cutlassF` is not supported because:
xFormers wasn't build with CUDA support
operator wasn't built - see `python -m xformers.info` for more info
`smallkF` is not supported because:
max(query.shape[-1] != value.shape[-1]) > 32
xFormers wasn't build with CUDA support
dtype=torch.float16 (supported: {torch.float32})
attn_bias type is <class 'xformers.ops.fmha.attn_bias.LowerTriangularMask'>
operator wasn't built - see `python -m xformers.info` for more info
operator does not support BMGHK format
unsupported embed per head: 128
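Every operator in the trace also reports `xFormers wasn't build with CUDA support`, so the installed xformers wheel may not match the torch/CUDA build, independently of the compute capability. Below is a minimal sketch of an environment check that could be attached to a report like this one. The helper names `supports_flash_attn` and `describe_environment` are hypothetical, and the 8.0 threshold simply mirrors the `major_version >= 8` check in the install snippet above; it is an assumption, not a statement about what xformers itself enforces.

```python
import importlib.util


def supports_flash_attn(capability):
    """Return True if the (major, minor) compute capability is 8.0 or newer.

    Assumption: same threshold as the `major_version >= 8` install check.
    """
    major, _minor = capability
    return major >= 8


def describe_environment():
    """Collect the version facts relevant to an xformers/torch mismatch."""
    info = {
        "torch_version": None,
        "torch_cuda": None,
        "capability": None,
        "xformers_installed": importlib.util.find_spec("xformers") is not None,
    }
    if importlib.util.find_spec("torch") is None:
        return info  # torch is not installed, nothing more to report

    import torch

    info["torch_version"] = torch.__version__
    info["torch_cuda"] = torch.version.cuda  # None for CPU-only builds
    if torch.cuda.is_available():
        info["capability"] = torch.cuda.get_device_capability()
    return info


if __name__ == "__main__":
    env = describe_environment()
    for key, value in env.items():
        print(f"{key}: {value}")
    if env["capability"] is not None:
        print("flash-attn eligible:", supports_flash_attn(env["capability"]))
```

Running `python -m xformers.info`, as the error message itself suggests, alongside this output would show which operators the installed wheel was actually built with.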