ROCM support by electron271 · Pull Request #3279 · unslothai/unsloth

electron271 · 2025-09-05T22:18:48Z

closes #37

electron271 · 2025-09-05T22:23:39Z

windows support may also be possible but i would need some help testing this as i do not have a windows machine

electron271 · 2025-09-06T01:38:24Z

docs changes:

diff --git a/get-started/installing-+-updating/pip-install.md b/get-started/installing-+-updating/pip-install.md
index c1f0975..5f66dbf 100644
--- a/get-started/installing-+-updating/pip-install.md
+++ b/get-started/installing-+-updating/pip-install.md
@@ -24,6 +24,16 @@ pip uninstall unsloth unsloth_zoo -y && pip install --no-deps git+https://github
 
 If you're installing Unsloth in Jupyter, Colab, or other notebooks, be sure to prefix the command with `!`. This isn't necessary when using a terminal
 
+**To install Unsloth on AMD GPUs:**
+
+{% hint style="info" %}
+You can safely ignore errors about CUDA not being linked properly if you are installing Unsloth on AMD GPUs.
+{% endhint %}
+
+```bash
+pip install "unsloth[rocm64-torch280]"
+```
+
 ## Uninstall + Reinstall
 
 If you're still encountering dependency issues with Unsloth, many users have resolved them by forcing uninstalling and reinstalling Unsloth:

diff --git a/get-started/beginner-start-here/unsloth-requirements.md b/get-started/beginner-start-here/unsloth-requirements.md
index 793bd63..b5f5429 100644
--- a/get-started/beginner-start-here/unsloth-requirements.md
+++ b/get-started/beginner-start-here/unsloth-requirements.md
@@ -8,7 +8,7 @@ description: Here are Unsloth's requirements including system and GPU VRAM requi
 
 * **Operating System**: Works on Linux and Windows.
 * Supports NVIDIA GPUs since 2018+ including [Blackwell RTX 50](../../basics/training-llms-with-blackwell-rtx-50-series-and-unsloth) series. Minimum CUDA Capability 7.0 (V100, T4, Titan V, RTX 20, 30, 40, A100, H100, L40 etc) [Check your GPU!](https://developer.nvidia.com/cuda-gpus) GTX 1070, 1080 works, but is slow.
-* Unsloth should work on [AMD](https://github.com/unslothai/unsloth/pull/2520) and [Intel](https://github.com/unslothai/unsloth/pull/2621) GPUs! Apple/Silicon/MLX is in the works.
+* Unsloth should work on [AMD](../installing-+-updating/pip-install#amd-installation) and [Intel](https://github.com/unslothai/unsloth/pull/2621) GPUs! Apple/Silicon/MLX is in the works.
 * If you have different versions of torch, transformers etc., `pip install unsloth` will automatically install all the latest versions of those libraries so you don't need to worry about version compatibility.
 * Your device must have `xformers`, `torch`, `BitsandBytes` and `triton` support.
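
For anyone trying the AMD install command from the docs change above, a quick way to confirm that the installed PyTorch is a ROCm build might look like the sketch below. `detect_backend` is a hypothetical helper, not part of Unsloth. It only assumes that `torch.version.hip` is a version string on ROCm builds of PyTorch (and `None` otherwise), and that `torch.version.cuda` behaves the same way for CUDA builds.

```python
from typing import Optional


def detect_backend(hip: Optional[str], cuda: Optional[str]) -> str:
    """Classify a PyTorch build from its version strings.

    hip  -- value of torch.version.hip  (version string on ROCm builds, else None)
    cuda -- value of torch.version.cuda (version string on CUDA builds, else None)
    """
    if hip:
        return "rocm"
    if cuda:
        return "cuda"
    return "cpu"
```

To use it, run `import torch` and call `detect_backend(torch.version.hip, torch.version.cuda)`; on a working AMD install it should report `rocm`.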

electron271 · 2025-09-06T04:41:19Z

seems like 4bit exporting has some issues, as a blocksize of 64 is not supported with rocm (ROCm/bitsandbytes#10). a blocksize of 64 should be possible depending on warp size though, so i will look into submitting a pr to bitsandbytes

electron271 · 2025-09-06T05:12:17Z

i have found a likely solution. if it works, maybe i can switch the builds over to my fork until it's merged, so that 4bit works

electron271 · 2025-09-06T05:56:25Z

marking as draft until i get this issue fixed as it is fairly major

electron271 · 2025-09-06T21:07:27Z

pr created: bitsandbytes-foundation/bitsandbytes#1748

electron271 · 2025-09-06T22:41:36Z

should work now, testing changes

electron271 · 2025-09-07T01:32:27Z

works

emuchogu · 2025-09-09T10:18:39Z

Works great on AMD MI100.

I added this to my vllm Dockerfile and it just worked.

RUN git clone --recurse https://github.com/ROCm/bitsandbytes && cd bitsandbytes && git checkout rocm_enabled_multi_backend && pip install -r requirements-dev.txt && cmake -DCOMPUTE_BACKEND=hip -S . && make -j  && pip install .
RUN git clone https://github.com/electron271/unsloth-rocm.git && cd unsloth-rocm && pip install .
RUN pip install unsloth_zoo

Thanks

electron271 · 2025-09-09T10:53:42Z

great to hear! you also shouldn't need the rocm fork of bitsandbytes (afaik): this branch installs a rocm-enabled bitsandbytes as a dependency, and if you want to install it manually, the rocm support was merged into main, so you can use main bitsandbytes

nole70 · 2025-09-09T22:27:59Z

I ran git clone https://github.com/electron271/unsloth-rocm.git && cd unsloth-rocm && pip install .[rocm-torch280] on an MI300X, tried to run DPO, and got this error:

Traceback (most recent call last):
  File "/workspace/script.py", line 193, in <module>
    dpo_trainer.train()
  File "/workspace/venv312/lib/python3.12/site-packages/transformers/trainer.py", line 2328, in train
    return inner_training_loop(
           ^^^^^^^^^^^^^^^^^^^^
  File "<string>", line 323, in _fast_inner_training_loop
  File "<string>", line 40, in _unsloth_training_step
  File "/tmp/unsloth_compiled_cache/UnslothDPOTrainer.py", line 2065, in compute_loss
    loss, metrics = self.get_batch_loss_metrics(model, inputs, train_eval="train")
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/unsloth_compiled_cache/UnslothDPOTrainer.py", line 1981, in get_batch_loss_metrics
    model_output = self.concatenated_forward(model, batch)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/unsloth_compiled_cache/UnslothDPOTrainer.py", line 1855, in concatenated_forward
    outputs = model(input_ids, **model_kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/venv312/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/venv312/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/venv312/lib/python3.12/site-packages/accelerate/utils/operations.py", line 818, in forward
    return model_forward(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/venv312/lib/python3.12/site-packages/accelerate/utils/operations.py", line 806, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/venv312/lib/python3.12/site-packages/torch/amp/autocast_mode.py", line 44, in decorate_autocast
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/venv312/lib/python3.12/site-packages/peft/peft_model.py", line 1850, in forward
    return self.base_model(
           ^^^^^^^^^^^^^^^^
  File "/workspace/venv312/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/venv312/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/venv312/lib/python3.12/site-packages/peft/tuners/tuners_utils.py", line 222, in forward
    return self.model.forward(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/unsloth_compiled_cache/unsloth_compiled_module_gemma3.py", line 880, in forward
    return Gemma3ForConditionalGeneration_forward(self, input_ids, pixel_values, attention_mask, position_ids, past_key_values, token_type_ids, cache_position, inputs_embeds, labels, use_cache, output_attentions, output_hidden_states, return_dict, logits_to_keep, **lm_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/venv312/lib/python3.12/site-packages/torch/_dynamo/external_utils.py", line 198, in nonrecursive_disable_wrapper
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/tmp/unsloth_compiled_cache/unsloth_compiled_module_gemma3.py", line 696, in Gemma3ForConditionalGeneration_forward
    outputs = self.model(
              ^^^^^^^^^^^
  File "/workspace/venv312/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/venv312/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/venv312/lib/python3.12/site-packages/transformers/utils/generic.py", line 940, in wrapper
    output = func(self, *args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/venv312/lib/python3.12/site-packages/transformers/models/gemma3/modeling_gemma3.py", line 937, in forward
    outputs = self.language_model(
              ^^^^^^^^^^^^^^^^^^^^
  File "/workspace/venv312/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/venv312/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/venv312/lib/python3.12/site-packages/transformers/utils/generic.py", line 1064, in wrapper
    outputs = func(self, *args, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/venv312/lib/python3.12/site-packages/transformers/models/gemma3/modeling_gemma3.py", line 555, in forward
    layer_outputs = decoder_layer(
                    ^^^^^^^^^^^^^^
  File "/workspace/venv312/lib/python3.12/site-packages/transformers/modeling_layers.py", line 93, in __call__
    return self._gradient_checkpointing_func(partial(super().__call__, **kwargs), *args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/venv312/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner
    return disable_fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/venv312/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 929, in _fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/workspace/venv312/lib/python3.12/site-packages/torch/utils/checkpoint.py", line 488, in checkpoint
    return CheckpointFunction.apply(function, preserve, *args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/venv312/lib/python3.12/site-packages/torch/autograd/function.py", line 576, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/venv312/lib/python3.12/site-packages/unsloth_zoo/gradient_checkpointing.py", line 475, in forward
    outputs = run_function(*args)
              ^^^^^^^^^^^^^^^^^^^
  File "/workspace/venv312/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/venv312/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/venv312/lib/python3.12/site-packages/transformers/utils/generic.py", line 1024, in wrapped_forward
    output = orig_forward(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/venv312/lib/python3.12/site-packages/transformers/utils/deprecation.py", line 172, in wrapped_func
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/venv312/lib/python3.12/site-packages/transformers/models/gemma3/modeling_gemma3.py", line 389, in forward
    hidden_states, self_attn_weights = self.self_attn(
                                       ^^^^^^^^^^^^^^^
  File "/workspace/venv312/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/venv312/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/venv312/lib/python3.12/site-packages/unsloth_zoo/temporary_patches/gemma.py", line 762, in forward
    return forward_function(self, hidden_states, position_embeddings, attention_mask, past_key_values, cache_position, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/venv312/lib/python3.12/site-packages/unsloth_zoo/temporary_patches/gemma.py", line 643, in forward_function
    query_states_fp16 = self.q_proj(hidden_states) # output fp16
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/venv312/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/venv312/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/unsloth_compiled_cache/Linear4bit_peft_forward.py", line 56, in unsloth_forward
    result = self.base_layer(x, *args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/venv312/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/venv312/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/venv312/lib/python3.12/site-packages/bitsandbytes/nn/modules.py", line 565, in forward
    return bnb.matmul_4bit(x, weight, bias=bias, quant_state=self.weight.quant_state).to(inp_dtype)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/venv312/lib/python3.12/site-packages/bitsandbytes/autograd/_functions.py", line 466, in matmul_4bit
    return MatMul4Bit.apply(A, B, out, bias, quant_state)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/venv312/lib/python3.12/site-packages/torch/autograd/function.py", line 576, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/venv312/lib/python3.12/site-packages/bitsandbytes/autograd/_functions.py", line 380, in forward
    output = torch.nn.functional.linear(A, F.dequantize_4bit(B, quant_state).to(A.dtype).t(), bias)
                                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/venv312/lib/python3.12/site-packages/bitsandbytes/functional.py", line 1002, in dequantize_4bit
    out = torch.ops.bitsandbytes.dequantize_4bit.default(
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/venv312/lib/python3.12/site-packages/torch/_ops.py", line 829, in __call__
    return self._op(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/venv312/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner
    return disable_fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/venv312/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 929, in _fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/workspace/venv312/lib/python3.12/site-packages/torch/library.py", line 752, in func_no_dynamo
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/venv312/lib/python3.12/site-packages/bitsandbytes/backends/cuda/ops.py", line 361, in _
    _dequantize_4bit_impl(A, absmax, blocksize, quant_type, dtype, out=out)
  File "/workspace/venv312/lib/python3.12/site-packages/bitsandbytes/backends/cuda/ops.py", line 389, in _dequantize_4bit_impl
    torch._check(blocksize in [4096, 2048, 1024, 512, 256, 128])
  File "/workspace/venv312/lib/python3.12/site-packages/torch/__init__.py", line 1684, in _check
    _check_with(RuntimeError, cond, message)
  File "/workspace/venv312/lib/python3.12/site-packages/torch/__init__.py", line 1666, in _check_with
    raise error_type(message_evaluated)
RuntimeError: Expected cond to be True, but got False. (Could this error message be improved? If so, please report an enhancement request to PyTorch.)
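
The traceback ends in a bare `torch._check(...)` with no message, so the actual cause (a blocksize outside the list this build accepts) is easy to miss. A minimal sketch of the same check with a descriptive error is below. The supported values are copied from the traceback; `check_blocksize` is a hypothetical helper for illustration, not bitsandbytes API, and the guess that the failing value is 64 comes from the rest of this thread rather than from the traceback itself.

```python
# Blocksizes accepted by the failing check, copied from the traceback above.
SUPPORTED_BLOCKSIZES = (4096, 2048, 1024, 512, 256, 128)


def check_blocksize(blocksize: int, supported=SUPPORTED_BLOCKSIZES) -> None:
    """Raise a readable error when a 4-bit blocksize is not supported.

    Hypothetical helper: it mirrors the membership test in the traceback
    but reports the offending value instead of a generic message.
    """
    if blocksize not in supported:
        raise RuntimeError(
            f"4-bit blocksize {blocksize} is not supported by this build; "
            f"supported values are {sorted(supported)}"
        )
```

Calling `check_blocksize(64)` with the list above raises an error that names the value, which is the information the original message leaves out.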

electron271 · 2025-09-09T23:54:50Z

4bit is broken on CDNA gpus as they do not support a blocksize of 64. i am not aware of a solution yet
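
As a rough illustration of the warp-size dependence mentioned earlier in the thread, the sketch below picks the smallest usable 4-bit blocksize for a given wavefront width. It rests on assumptions that are not confirmed by the source: that 32-wide devices can use blocksize 64 while 64-wide devices (such as CDNA) need at least 128, and that the warp size is obtained by the caller. Both function names are hypothetical.

```python
def min_4bit_blocksize(warp_size: int) -> int:
    """Smallest 4-bit blocksize assumed usable for a given warp size.

    Assumption (not from bitsandbytes): 32-wide warps allow blocksize 64,
    64-wide wavefronts need at least 128.
    """
    if warp_size == 32:
        return 64
    if warp_size == 64:
        return 128
    raise ValueError(f"unexpected warp size: {warp_size}")


def pick_blocksize(requested: int, warp_size: int) -> int:
    """Return the requested blocksize, raised to the device minimum if needed."""
    return max(requested, min_4bit_blocksize(warp_size))
```

This only helps when quantizing from scratch; a checkpoint already quantized with blocksize 64 would presumably still need to be re-quantized with the larger value.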

billishyahao · 2025-09-10T07:13:56Z

Hi @electron271 , glad to see this fabulous contribution for amd GPU. Let me help on verifying on more kinds of devices and hope to collaborate on this.

billishyahao · 2025-09-10T07:46:26Z

I like the approach of giving end users a fresh prebuilt bnb binary directly in the patch. However, it does not work in some environments.

That's one of the reasons I installed bnb from source in my previous patch, #2520.
I suggest providing a ROCm Dockerfile so that end users have a setup that is sure to work. What do you think? cc @danielhanchen @shimmyshimmer

electron271 · 2025-09-10T15:34:42Z

i think a dockerfile would be beneficial for systems that don't support this. this error is caused by having an out-of-date system: the minimum usable version of gcc is GCC 13.2, released July 27, 2023. i will note that i had a lot of issues with dockerized rocm when i was initially trying to get unsloth working on rocm, so i'm not sure i will be able to help with it.

electron271 · 2025-10-24T01:34:50Z

the upstream bitsandbytes pr should hopefully be merged soon

matthewdouglas · 2025-10-28T16:45:47Z

Hi @electron271
You'll want to try to build on Ubuntu 22.04 instead of Ubuntu 24.04 to have better compatibility - your repo is producing wheels with a glibc 2.39 requirement.

With that said, the official bitsandbytes wheels we build and will eventually publish are compatible with Ubuntu 22.04 (and other supported systems with glibc>=2.24).

I am going to go ahead and merge that PR on bitsandbytes soon; we'll drop the ROCm 6.1 build and keep 6.2/6.3/6.4/7.0. We still need to add the RDNA4/CDNA4 build targets (RX 9070/9060, MI350X/MI355X), and need to keep in mind that while this can enable blocksize 64 on RDNA (consumer) it won't for CDNA (datacenter).

cc @billishyahao @danielhanchen
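To see whether a given machine would accept a wheel with one of the glibc requirements mentioned above (2.39 for the Ubuntu 24.04 builds, 2.24 or newer for the official wheels), a rough check along these lines may help. This is only a sketch: `parse_version` and `glibc_satisfies` are hypothetical helper names, not part of bitsandbytes or Unsloth, and the check relies on what Python's standard `platform.libc_ver()` reports for the running interpreter.

```python
import platform


def parse_version(text):
    """Turn a dotted version such as '2.39' into a tuple of ints: (2, 39)."""
    return tuple(int(part) for part in text.split("."))


def glibc_satisfies(required, detected=None):
    """Return True if the detected glibc version is at least `required`.

    `detected` defaults to what platform.libc_ver() reports. On systems that
    do not use glibc the reported library name or version is empty, so the
    function returns False there.
    """
    if detected is None:
        lib, detected = platform.libc_ver()
        if lib != "glibc" or not detected:
            return False
    return parse_version(detected) >= parse_version(required)


if __name__ == "__main__":
    # 2.24 and 2.39 are the two requirements quoted in the comment above.
    for required in ("2.24", "2.39"):
        print(f"glibc >= {required}: {glibc_satisfies(required)}")
```

Versions are compared as integer tuples rather than strings, so that 2.9 would sort below 2.24 correctly.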

electron271 · 2025-11-05T23:06:50Z


done, my bitsandbytes builds are temporarily broken though as i reached maximum git lfs bandwidth and the limit resets in ~30 days. will think of a potential solution

electron271 · 2025-12-01T03:58:24Z


the limit ended up resetting, so it works now. i may look into hosting it myself, but it's probably a bad idea to have sources from unreliable urls in unsloth, so it may be best to wait until bitsandbytes has updated

…nslothai#3859)

* add int8 weight-only QAT scheme, add test, fix tests for current torchao version
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* change quantization to PerAxis
* lambda =/
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* add torchao messages, remove group_size from int8
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* raise exception on missing torchao
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* touch up the torchao imports
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Implement vLLM patch for notebook detection
  Add patch for vLLM compatibility in notebook environments.
* Fix sys.stdout.fileno for vLLM compatibility
  Patch sys.stdout.fileno for vLLM compatibility in notebooks.
* Add patch_vllm_for_notebooks to initialization
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* Harden vLLM notebook stdout patch
* Use logger for vLLM notebook patch
* Clarify vLLM notebook patch log message

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: danielhanchen <danielhanchen@users.noreply.github.com>

… models (unslothai#3719)

* add FastSentenceTransformer
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* Gemini code review suggestions
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* unsloth-zoo patch only fixed usage for XLMRobertaForMaskedLM, this is a fix for XLMRobertaModel
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* refactor do_lower_case
* add some comments
* force disable FP8 loading
* refactor pooling detection, add missing pooling types
* add save_pretrained_merged method which gets modules and config
* fix _save_pretrained_merged
* rename read_pooling_mode, load modules instead of hard-coding em
* comment
* revert save_pretrained_merged change
* propagate trust_remote_code properly
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* add super hacky mpnet patch from hell
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* refactor _load_modules, add for_inference to from_pretrained, add transformers 5 code for mpnet, add distilbert patches
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* add ModernBert
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* deberta-v2 support (provisional), fix remote_code
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* add generic add_pooling_layer logic
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* fix for missing config
* add push_to_hub_merged
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* edit messages, throw exception if no HF token
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* fix device_map mismatch
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* add comments, move import, other suggestions by Datta0
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* re-add adapter removal to save_pretrained_merged, but if saving to folder which had adapters before, leave them
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* add unsloth branding to save_pretrained_merged
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* propagate dtype to internal module when loading for inference
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* fix mpnet gradient checkpointing for torch >= 2.9
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* same thing for transformers 5, oops =)
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* Fix FastSentenceTransformer performance: 6x speedup via torch.compile + SDPA

  The original implementation was 31% slower than naive SentenceTransformer due to conflicting decorators from Unsloth's auto-compiler (@torch.compile on attention modules but @torch.compiler.disable on sub-modules).

  Changes:
  - Add fast encoder path that bypasses Unsloth patching for encoder models
  - Use native torch.compile with mode="reduce-overhead" for 6x speedup
  - Auto-detect and enable SDPA for models that support it (BERT, RoBERTa, etc.)
  - Change defaults: load_in_16bit=True, load_in_4bit=False (16-bit is optimal)
  - Change default: use_gradient_checkpointing=False (conflicts with torch.compile)
  - Add UNSLOTH_COMPILE_DISABLE=1 env var to fall back to old path if needed

  Supported encoder types: mpnet, bert, distilbert, roberta, xlm-roberta, albert, electra

  Benchmark results (BS=32, seq_len=128):
  - Naive 16-bit LoRA: 13-50ms per iter
  - Unsloth 16-bit LoRA: 2-9ms per iter (5.4x-6.7x faster)
  - Memory usage: 61MB-1.3GB (even largest model fits easily)

  Note: 4-bit + torch.compile has a PyTorch bug (pytorch/pytorch#90665). 4-bit is also 1.7-1.9x slower than 16-bit due to dequantization overhead, so 16-bit is recommended for these small encoder models anyway.
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* Use Unsloth's prepare_model_for_kbit_training for consistency

  Changed from peft.prepare_model_for_kbit_training to unsloth.models._utils.prepare_model_for_kbit_training.

  Unsloth's version provides:
  - Float32 mixed precision upcasting for LoRA layers
  - Better numerical stability
  - Consistency with rest of Unsloth codebase
* Use relative imports and add float16 machine support
  - Changed absolute import to relative: from ._utils import prepare_model_for_kbit_training
  - Added SUPPORTS_BFLOAT16 import for proper dtype detection
  - Handle devices that don't support bfloat16 by falling back to float16
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* add save_pretrained_torchao
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* Add auto-compile for torch.compile based on training step breakeven analysis

  Changes:
  - Change default compile_mode from "reduce-overhead" to "default" since CUDA Graphs (used by reduce-overhead) is incompatible with PEFT/LoRA
  - Add _estimate_compile_threshold() to calculate minimum steps needed for torch.compile to be beneficial based on model parameter count
  - Add _apply_torch_compile() helper with accelerate unwrap_model bug workaround
  - Defer torch.compile application to trainer initialization time so we can check max_steps against the breakeven threshold
  - Patch SentenceTransformerTrainer to auto-apply compile when max_steps exceeds the calculated threshold

  Breakeven thresholds (with 1.2x safety margin):
  - 22M params (MiniLM): ~1388 steps
  - 110M params (mpnet): ~242 steps
  - 335M params (snowflake): ~203 steps

  This ensures torch.compile warmup cost is only paid when training is long enough to benefit from the speedup.
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* do QAT preparation for fast path
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* fix double loading model, thanks Etherl
* do mpnet gradient checkpoint patch if gc is enabled
* remove distilbert patches from mpnet fix
* sanity check on model params, thanks Etherl
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* add save_pretrained_gguf, thanks Etherl
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* Refine compile threshold estimation for sentence transformers
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
Co-authored-by: Daniel Han <danielhanchen@users.noreply.github.com>
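The breakeven idea in the commit message above (only pay the torch.compile warmup cost when training runs long enough) can be illustrated with a generic calculation. This is a sketch under assumptions, not the commit's `_estimate_compile_threshold()`: that function is described as working from the parameter count, and its formula is not shown here. The helper names and the timing inputs below are hypothetical; only the idea of a safety margin and of comparing `max_steps` against a threshold comes from the message.

```python
import math


def breakeven_steps(compile_seconds, eager_step_seconds,
                    compiled_step_seconds, safety_margin=1.2):
    """Steps needed before a one-off compile cost is repaid by faster steps.

    Returns None when the compiled step is not faster than the eager step,
    because compilation then never pays off.
    """
    saved_per_step = eager_step_seconds - compiled_step_seconds
    if saved_per_step <= 0:
        return None
    return math.ceil(safety_margin * compile_seconds / saved_per_step)


def should_compile(max_steps, threshold):
    """Compile only if training runs longer than the breakeven threshold."""
    return threshold is not None and max_steps > threshold


if __name__ == "__main__":
    # Hypothetical timings, chosen purely for illustration.
    threshold = breakeven_steps(compile_seconds=60.0,
                                eager_step_seconds=0.5,
                                compiled_step_seconds=0.25)
    print(threshold, should_compile(1000, threshold))
```

In this simple model a smaller per-step saving gives a higher threshold, which would be consistent with the small MiniLM model having the highest threshold in the list above.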

* Guard torch.compile on ROCm when triton_key missing
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* Update unsloth/import_fixes.py
  Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* Tighten ROCm Triton import handling
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci

---------

Co-authored-by: Rachel Li <rachelliqx07@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
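A guard of the kind named in the commit title above could look roughly like the following. This is not the code in `unsloth/import_fixes.py`; it is a minimal sketch that assumes the missing symbol is `triton_key` in the module `triton.compiler.compiler`, and both helper names are hypothetical. It uses only `importlib` from the standard library, so it also runs where Triton is not installed.

```python
import importlib


def module_has_attr(module_name, attr_name):
    """Return True only if `module_name` imports cleanly and exposes `attr_name`."""
    try:
        module = importlib.import_module(module_name)
    except Exception:
        # ImportError is the usual case, but a broken build can raise
        # other exceptions at import time; treat all of them as "missing".
        return False
    return hasattr(module, attr_name)


def should_use_torch_compile(is_rocm):
    """Hypothetical policy: on ROCm, compile only if triton_key is importable."""
    if not is_rocm:
        return True
    # Assumed location of the symbol; adjust if your Triton build differs.
    return module_has_attr("triton.compiler.compiler", "triton_key")


if __name__ == "__main__":
    print("compile on ROCm:", should_use_torch_compile(is_rocm=True))
```

Checking before calling `torch.compile` turns a hard import failure into a quiet fallback to eager execution.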

electron271 · 2026-03-03T15:53:54Z

@electron271 Sorry for the late response, I was out for some time. Will be looking into this when I get the bandwidth.

any updates at all?

sstamenk · 2026-03-06T16:43:15Z

@electron271 Testing various ROCm versions atm, will open a PR soon, the unsloth_zoo issues seem to be resolved.

sstamenk · 2026-03-09T14:54:44Z

@electron271 I opened a PR to your fork, can you take a look?

electron271 · 2026-03-09T18:15:33Z

looking right now

ROCm/PyTorch install combinations

sstamenk · 2026-03-12T20:24:39Z

@danielhanchen was this closed by mistake or is there another PR up that incorporates these changes?

danielhanchen · 2026-03-13T06:54:22Z

@electron271 Sorry, Unsloth had a rebase, so I had to recover your PR at #4271.

danielhanchen · 2026-03-13T07:18:40Z

@electron271 I also recovered the broader ROCm install-matrix portion of your PR at #4272.

electron271 marked this pull request as draft September 5, 2025 22:18

electron271 mentioned this pull request Sep 5, 2025

[Feature] enable unsloth on amd gpu #2520

Merged

electron271 marked this pull request as ready for review September 6, 2025 00:29

electron271 marked this pull request as draft September 6, 2025 01:28

electron271 marked this pull request as ready for review September 6, 2025 01:39

electron271 marked this pull request as draft September 6, 2025 05:56

electron271 mentioned this pull request Sep 6, 2025

[Feature Request] AMD GPU #37

Closed

electron271 marked this pull request as ready for review September 7, 2025 00:15

billishyahao reviewed Sep 10, 2025


Comment thread pyproject.toml Outdated

danielhanchen and others added 2 commits December 9, 2025 17:37

Merge branch 'main' into nightly

6b908cf

Fix RawTextDataLoader import issue

112a893

electroglyph and others added 11 commits January 16, 2026 09:32

[pre-commit.ci] pre-commit autoupdate (unslothai#3905)

72100f4

updates: - [github.com/astral-sh/ruff-pre-commit: v0.14.11 → v0.14.13](astral-sh/ruff-pre-commit@v0.14.11...v0.14.13) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

Handle Transformers 5 vLLM import errors (unslothai#3908)

d59ee86

* Handle Transformers 5 vLLM import errors * Deduplicate vLLM transformers mismatch handling --------- Co-authored-by: danielhanchen <danielhanchen@users.noreply.github.com>

Versioning

063a02d

Update vision.py

e0bca16

Embedding model support

4f8a8c0

Embedding model fine-tuning support

66ec249

Merge branch 'main' into main

d277fd2

GoldenGrapeGentleman mentioned this pull request Feb 10, 2026

[ROCm] add rocm dockerfile #3324

Closed

GoldenGrapeGentleman mentioned this pull request Feb 25, 2026

fix(ROCm): Comprehensive RDNA GPU support - fix Gemma3 NaN & add is_rdna() #4109

Merged

Add more ROCm/PyTorch versions

ed6877f

Add more ROCm/PyTorch combinations

d02aa7f

Merge pull request #1 from sstamenk/rocm_install

3d8f12b

ROCm/PyTorch install combinations

electron271 requested a review from danielhanchen as a code owner March 9, 2026 18:17

sstamenk reviewed Mar 10, 2026


Comment thread unsloth/device_type.py Outdated

remove duplicate import

d1f4fb5

danielhanchen closed this Mar 12, 2026

danielhanchen force-pushed the main branch from 997f1a7 to 0099fff Compare March 12, 2026 05:34

danielhanchen mentioned this pull request Mar 13, 2026

ROCM support #4271

Merged

danielhanchen mentioned this pull request Mar 13, 2026

ROCM support #4272

Merged


20 participants
