Fine-tuning on a V100 GPU  #496

@AnnaVitali

Hi, I'm trying to fine-tune Llama-3 8B on a V100 GPU. As required by Unsloth, I upgraded torch to 2.1 and followed the recommended installation steps for Google Colab from this tutorial. However, fine-tuning fails because xFormers appears to require a compute capability of 8.0 and my GPU has 7.0. Yet Unsloth is able to fine-tune Llama-3 8B on a T4, which has compute capability 7.5. What am I missing? Is there a version of xFormers that is compatible with both my hardware and Unsloth's requirements?

My torch version is: 2.1.0+cu121

This is my GPU setup:

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.171.04             Driver Version: 535.171.04   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla V100S-PCIE-32GB          Off | 00000000:3B:00.0 Off |                    0 |
| N/A   46C    P0              28W / 250W |      5MiB / 32768MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
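
The same details can be collected in one step. The snippet below is a minimal sketch, not part of the tutorial: `collect_env` is a hypothetical helper name, and it assumes only that `torch` and `xformers` expose `__version__` and that `torch.version.cuda` and `torch.cuda.get_device_capability()` behave as in the torch 2.1 build above. Packages that are missing or fail to import are reported as `None` instead of raising.

```python
import importlib


def collect_env():
    """Return torch/xformers versions and the GPU compute capability.

    Hypothetical helper for bug reports; anything unavailable stays None.
    """
    info = {"torch": None, "torch_cuda": None, "capability": None, "xformers": None}
    try:
        torch = importlib.import_module("torch")
    except Exception:  # not installed, or the import itself failed
        torch = None
    if torch is not None:
        info["torch"] = torch.__version__
        info["torch_cuda"] = torch.version.cuda  # CUDA version torch was built with
        if torch.cuda.is_available():
            # (major, minor) tuple, e.g. (7, 0) on a V100
            info["capability"] = torch.cuda.get_device_capability()
    try:
        info["xformers"] = importlib.import_module("xformers").__version__
    except Exception:  # not installed, or the import itself failed
        pass
    return info


if __name__ == "__main__":
    for key, value in collect_env().items():
        print(f"{key}: {value}")
```

Running `python -m xformers.info`, which the error message itself points to, should additionally show which xFormers operators were actually built.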

This is my code:

import torch
print(torch.__version__)
major_version, minor_version = torch.cuda.get_device_capability()
# Must install separately since Colab has torch 2.2.1, which breaks packages
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
if major_version >= 8:
    # Use this for new GPUs like Ampere, Hopper GPUs (RTX 30xx, RTX 40xx, A100, H100, L40)
    !pip install --no-deps packaging ninja einops flash-attn "xformers<0.0.26" trl peft accelerate bitsandbytes
else:
    # Use this for older GPUs (V100, Tesla T4, RTX 20xx)
    !pip install --no-deps "xformers<0.0.26" trl peft accelerate bitsandbytes
pass

This is the error I get:

NotImplementedError: No operator found for `memory_efficient_attention_forward` with inputs:
     query       : shape=(2, 555, 8, 4, 128) (torch.float16)
     key         : shape=(2, 555, 8, 4, 128) (torch.float16)
     value       : shape=(2, 555, 8, 4, 128) (torch.float16)
     attn_bias   : <class 'xformers.ops.fmha.attn_bias.LowerTriangularMask'>
     p           : 0.0
`flshattF@0.0.0` is not supported because:
    xFormers wasn't build with CUDA support
    requires device with capability > (8, 0) but your GPU has capability (7, 0) (too old)
    operator wasn't built - see `python -m xformers.info` for more info
`cutlassF` is not supported because:
    xFormers wasn't build with CUDA support
    operator wasn't built - see `python -m xformers.info` for more info
`smallkF` is not supported because:
    max(query.shape[-1] != value.shape[-1]) > 32
    xFormers wasn't build with CUDA support
    dtype=torch.float16 (supported: {torch.float32})
    attn_bias type is <class 'xformers.ops.fmha.attn_bias.LowerTriangularMask'>
    operator wasn't built - see `python -m xformers.info` for more info
    operator does not support BMGHK format
    unsupported embed per head: 128
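
One detail in this log: the compute capability complaint appears only under `flshattF`, while "xFormers wasn't build with CUDA support" appears under all three operators. That may point to the installed xFormers build rather than to the GPU alone, though this is only a reading of the log and not a confirmed diagnosis. The sketch below groups the reasons per operator and prints the ones they all share. The function names are hypothetical, and the `ERROR` string is an abbreviated copy of the message above.

```python
def reasons_by_operator(message):
    """Group reason lines under each "`op` is not supported because:" header."""
    reasons = {}
    current = None
    for line in message.splitlines():
        stripped = line.strip()
        if stripped.endswith("is not supported because:"):
            current = stripped.split("`")[1]  # operator name between backticks
            reasons[current] = []
        elif current is not None and stripped:
            reasons[current].append(stripped)
    return reasons


def shared_reasons(reasons):
    """Return the reasons listed by every operator, sorted for stable output."""
    sets = [set(r) for r in reasons.values()]
    return sorted(set.intersection(*sets)) if sets else []


# Abbreviated copy of the error above (not every reason is reproduced).
ERROR = """\
`flshattF@0.0.0` is not supported because:
    xFormers wasn't build with CUDA support
    requires device with capability > (8, 0) but your GPU has capability (7, 0) (too old)
    operator wasn't built - see `python -m xformers.info` for more info
`cutlassF` is not supported because:
    xFormers wasn't build with CUDA support
    operator wasn't built - see `python -m xformers.info` for more info
`smallkF` is not supported because:
    xFormers wasn't build with CUDA support
    dtype=torch.float16 (supported: {torch.float32})
    operator wasn't built - see `python -m xformers.info` for more info
"""

for reason in shared_reasons(reasons_by_operator(ERROR)):
    print(reason)
```

For this log, the shared reasons are the two build-related lines; the capability line is not among them.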
