The Unsloth provider enables 2x faster training with 20% less memory through optimized CUDA kernels and efficient memory management.
Unsloth is a specialized library that patches HuggingFace Transformers to use optimized implementations for:
- Flash Attention 2
- Fused optimizer kernels
- Efficient gradient checkpointing
- Optimized LoRA implementations
✅ 2x faster training compared to standard HuggingFace
✅ 20% memory reduction for the same batch size
✅ Zero code changes - same API as HuggingFace
✅ Supports popular architectures: Llama, Mistral, Qwen, Gemma, Phi
✅ Compatible with all strategies: SFT, QLoRA, RLHF, DPO
| Platform | Supported | Notes |
|---|---|---|
| Linux (Native) | ✅ | Recommended |
| WSL 2 | ✅ | Full support |
| Docker | ✅ | With NVIDIA runtime |
| Windows (Native) | ❌ | Use WSL or Docker for Unsloth |
| macOS (Apple Silicon) | ❌ | Not supported - Unsloth requires NVIDIA CUDA GPUs. Use HuggingFace provider on macOS |
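The table above reduces to two requirements: a Linux environment (native Linux, WSL 2, and Docker all report themselves as Linux) and an NVIDIA CUDA GPU. A minimal sketch of that check; `unsloth_supported` is a hypothetical helper, not part of any library:

```python
import platform


def unsloth_supported(os_name: str, has_cuda: bool) -> bool:
    """Mirror the support table: Unsloth needs Linux plus NVIDIA CUDA."""
    if not has_cuda:
        return False  # Unsloth requires NVIDIA CUDA GPUs
    # platform.system() returns "Linux" under native Linux, WSL 2, and Docker;
    # native Windows ("Windows") and macOS ("Darwin") are unsupported.
    return os_name == "Linux"


print(unsloth_supported(platform.system(), has_cuda=True))
```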
```bash
pip install unsloth
```

For Windows setup, see the Windows Installation Guide.
For Docker, use a CUDA base image:

```dockerfile
FROM nvidia/cuda:12.6.0-devel-ubuntu22.04
RUN pip install unsloth
```

Verify the installation:

```bash
python -c "import unsloth; print('Unsloth version:', unsloth.__version__)"
```

Example configuration:

```json
{
  "provider": "unsloth",
  "model_name": "meta-llama/Llama-3.2-3B",
  "max_seq_length": 2048,
  "task": "text-generation",
  "strategy": "sft",
  "num_train_epochs": 3,
  "lora_r": 16,
  "lora_alpha": 32
}
```

Unsloth requires an explicit `max_seq_length`. Auto-inference (`-1`) is NOT supported.
Valid:

```json
{
  "provider": "unsloth",
  "max_seq_length": 2048  // ✅ Fixed value
}
```

Invalid:

```json
{
  "provider": "unsloth",
  "max_seq_length": -1  // ❌ NOT supported
}
```

Common values:

- `512` - Short sequences, lower memory
- `1024` - Medium sequences
- `2048` - Standard (recommended)
- `4096` - Long contexts, more memory
- `8192` - Very long contexts, high memory
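Because auto-inference is rejected, it can be worth validating the setting before launching a run. The `check_max_seq_length` helper below is a hypothetical sketch, not part of the tool:

```python
def check_max_seq_length(config: dict) -> int:
    """Validate max_seq_length for the Unsloth provider (hypothetical helper)."""
    value = config.get("max_seq_length")
    if value is None:
        raise ValueError("Unsloth requires an explicit max_seq_length")
    if value == -1:
        raise ValueError("auto-inference (-1) is not supported; "
                         "use a fixed value such as 2048")
    if value <= 0:
        raise ValueError(f"max_seq_length must be positive, got {value}")
    return value


print(check_max_seq_length({"provider": "unsloth", "max_seq_length": 2048}))  # 2048
```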
- Go to the Training tab
- Select Provider: `unsloth`
- Set Max Sequence Length: `2048` (or your preferred value)
- Configure other settings
- Start training
```bash
curl -X POST http://localhost:8000/api/start_training \
  -H "Content-Type: application/json" \
  -d '{
    "provider": "unsloth",
    "model_name": "meta-llama/Llama-3.2-3B",
    "max_seq_length": 2048,
    "task": "text-generation",
    "strategy": "sft",
    "dataset": "/path/to/dataset.jsonl",
    "num_train_epochs": 3
  }'
```

Supported model families:

- **Llama** (1, 2, 3, 3.1, 3.2): `meta-llama/Llama-3.2-1B`, `meta-llama/Llama-3.2-3B`, `meta-llama/Llama-3.1-8B`
- **Mistral**: `mistralai/Mistral-7B-v0.1`, `mistralai/Mistral-7B-Instruct-v0.3`
- **Qwen**: `Qwen/Qwen2-1.5B`, `Qwen/Qwen2-7B`
- **Gemma**: `google/gemma-2b`, `google/gemma-7b`
- **Phi**: `microsoft/phi-2`, `microsoft/phi-3-mini`

Partially supported:

- **BART** - Some optimizations not available
- **T5** - Not recommended with Unsloth
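As an alternative to curl, the same `start_training` request can be issued from Python with only the standard library. The endpoint and payload fields below are copied from the curl example above; `build_training_request` is a hypothetical helper:

```python
import json
import urllib.request


def build_training_request(model_name: str, dataset: str,
                           max_seq_length: int = 2048,
                           num_train_epochs: int = 3) -> dict:
    """Assemble the JSON payload for /api/start_training."""
    return {
        "provider": "unsloth",
        "model_name": model_name,
        "max_seq_length": max_seq_length,  # required: Unsloth has no auto-inference
        "task": "text-generation",
        "strategy": "sft",
        "dataset": dataset,
        "num_train_epochs": num_train_epochs,
    }


payload = build_training_request("meta-llama/Llama-3.2-3B", "/path/to/dataset.jsonl")
req = urllib.request.Request(
    "http://localhost:8000/api/start_training",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
print(req.get_method(), req.full_url)  # → POST http://localhost:8000/api/start_training
# urllib.request.urlopen(req) would send it once the server is running
```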
Setup: Llama-3.2-3B, 1000 examples, NVIDIA RTX 3090
| Provider | Time | Speedup |
|---|---|---|
| HuggingFace | 45 min | 1.0x |
| Unsloth | 22 min | 2.0x |
Setup: Llama-3.2-7B, batch_size=4, seq_length=2048
| Provider | VRAM | Reduction |
|---|---|---|
| HuggingFace | 16.2 GB | - |
| Unsloth | 12.8 GB | 21% |
Setup: Llama-3.2-3B, batch_size=8
| Provider | Tokens/sec | Improvement |
|---|---|---|
| HuggingFace | 2,400 | - |
| Unsloth | 4,800 | 2x |
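Throughput figures like these translate into rough wall-clock estimates. The sketch below is back-of-envelope arithmetic only (real runs add optimizer and data-loading overhead), and the 10M-token corpus size is an assumed example:

```python
def estimated_minutes(total_tokens: int, tokens_per_sec: float) -> float:
    """Rough time for one pass over the data at a measured throughput."""
    return total_tokens / tokens_per_sec / 60


corpus = 10_000_000  # assumed corpus size, ~10M tokens
print(round(estimated_minutes(corpus, 2_400), 1))  # HuggingFace: ~69.4 min
print(round(estimated_minutes(corpus, 4_800), 1))  # Unsloth: ~34.7 min
```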
```json
{
  "provider": "unsloth",
  "model_name": "meta-llama/Llama-3.2-3B",
  "max_seq_length": 2048,
  "strategy": "qlora",
  "use_4bit": true,
  "bf16": true,
  "gradient_checkpointing": true,
  "per_device_train_batch_size": 8,
  "gradient_accumulation_steps": 4,
  "lora_r": 64,
  "lora_alpha": 16,
  "lora_dropout": 0.1
}
```

For limited VRAM:
```json
{
  "provider": "unsloth",
  "max_seq_length": 1024,  // Reduce sequence length
  "per_device_train_batch_size": 2,
  "gradient_accumulation_steps": 8,
  "gradient_checkpointing": true,
  "use_4bit": true
}
```

For maximum speed:
```json
{
  "provider": "unsloth",
  "max_seq_length": 2048,
  "per_device_train_batch_size": 16,
  "gradient_accumulation_steps": 1,
  "bf16": true,
  "optim": "adamw_8bit"
}
```

Unsloth auto-detects optimal LoRA target modules, but you can override them:
```json
{
  "provider": "unsloth",
  "target_modules": [
    "q_proj",
    "k_proj",
    "v_proj",
    "o_proj",
    "gate_proj",
    "up_proj",
    "down_proj"
  ]
}
```

Unsloth uses optimized gradient checkpointing:
```json
{
  "provider": "unsloth",
  "gradient_checkpointing": true  // Automatically optimized
}
```

Problem: Provider error when selecting Unsloth

Solution: Install Unsloth:

```bash
pip install unsloth
```

Problem: Running on native Windows

Solution: Use WSL or Docker. See Windows Installation.
Problem: Auto-inference not supported
Solution: Set a fixed value:
```json
{
  "max_seq_length": 2048
}
```

Problem: OOM errors during training

Solutions:

- Reduce `max_seq_length`: `2048` → `1024`
- Reduce `per_device_train_batch_size`: `8` → `4`
- Enable `gradient_checkpointing: true`
- Use 4-bit quantization: `use_4bit: true`
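When applying the batch-size reduction above, you can raise `gradient_accumulation_steps` to compensate: the optimizer sees the product of the two, so the effective batch size (and training dynamics) stays the same. A quick sketch:

```python
def effective_batch_size(per_device: int, accumulation_steps: int,
                         num_devices: int = 1) -> int:
    """Examples accumulated per optimizer step."""
    return per_device * accumulation_steps * num_devices


print(effective_batch_size(8, 4))  # before the OOM fix: 32
print(effective_batch_size(4, 8))  # after halving batch, doubling accumulation: 32
```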
Problem: Specific model doesn't work with Unsloth
Solution: Fall back to HuggingFace provider:
```json
{
  "provider": "huggingface"
}
```

Problem: Flash Attention 2 compatibility issues

Solution: Disable Flash Attention:

```bash
export UNSLOTH_DISABLE_FLASH_ATTN=1
modelforge
```

| Feature | HuggingFace | Unsloth |
|---|---|---|
| Training Speed | 1x | 2x |
| Memory Usage | Baseline | -20% |
| Platform Support | All (including macOS MPS) | Linux/WSL/Docker only (CUDA required) |
| Model Support | All | Llama, Mistral, Qwen, Gemma, Phi |
| Complexity | Simple | Simple |
| Stability | Stable | Stable |
| Documentation | Extensive | Growing |
Use Unsloth when:

- Training on Linux or WSL with an NVIDIA GPU
- Using supported models (Llama, Mistral, etc.)
- You need faster training times
- You have limited VRAM
- Training large models (7B+)

Use HuggingFace when:

- Running on native Windows
- Running on macOS with Apple Silicon
- Using unsupported models (BART, T5)
- Debugging issues (HuggingFace has better error messages)
- You need maximum compatibility
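The decision points above can be condensed into a small helper; `pick_provider` is a hypothetical sketch, not part of the tool:

```python
UNSLOTH_FAMILIES = {"llama", "mistral", "qwen", "gemma", "phi"}


def pick_provider(os_name: str, has_cuda: bool, model_family: str) -> str:
    """Choose a provider using the rules listed above."""
    if os_name != "Linux" or not has_cuda:
        return "huggingface"  # native Windows, macOS, or no NVIDIA GPU
    if model_family.lower() not in UNSLOTH_FAMILIES:
        return "huggingface"  # unsupported models such as BART or T5
    return "unsloth"          # supported model on Linux/WSL/Docker with CUDA


print(pick_provider("Linux", True, "llama"))    # unsloth
print(pick_provider("Darwin", False, "llama"))  # huggingface
print(pick_provider("Linux", True, "t5"))       # huggingface
```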
Switching is simple - just change the provider:
Before:

```json
{
  "provider": "huggingface",
  "model_name": "meta-llama/Llama-3.2-3B",
  ...
}
```

After:

```json
{
  "provider": "unsloth",
  "model_name": "meta-llama/Llama-3.2-3B",
  "max_seq_length": 2048,  // Add this!
  ...
}
```

All other settings remain the same!
- Provider Overview - Compare all providers
- HuggingFace Provider - Standard provider docs
- Configuration Guide - All config options
- Performance Optimization - Get the best results
Unsloth: Train faster, use less memory! 🚀