The SFT strategy is the standard, general-purpose fine-tuning approach using LoRA adapters.
Supervised Fine-Tuning trains the model on labeled input-output examples to teach it new behaviors or knowledge through standard supervised learning.
✅ Simple and effective - Works for most use cases
✅ Well-tested - Industry standard approach
✅ Fast training - Faster than RLHF/DPO
✅ Low complexity - Easy to understand and debug
✅ Good quality - High-quality results for most tasks
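To make the LoRA-adapter idea concrete before the configuration options below, here is an illustrative sketch (not this project's actual code): LoRA freezes the base weight matrix `W` and learns two small matrices `B` and `A`, applying `W_eff = W + (alpha / r) * B @ A`, so far fewer parameters are trained. All names and sizes here are toy values for illustration.

```python
# Illustrative LoRA sketch: the frozen base matrix W is never updated;
# only the low-rank factors B (d_out x r) and A (r x d_in) are trained.

def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

d_out, d_in, r, alpha = 64, 64, 2, 4      # toy sizes; alpha is 2x rank
W = [[0.0] * d_in for _ in range(d_out)]  # frozen base weights
B = [[0.0] * r for _ in range(d_out)]     # B starts at zero, so the adapter
A = [[0.1 * (i + j) for j in range(d_in)] for i in range(r)]  # is a no-op at init

delta = matmul(B, A)
scale = alpha / r                          # lora_alpha / lora_r scaling factor
W_eff = [[W[i][j] + scale * delta[i][j] for j in range(d_in)]
         for i in range(d_out)]

full = d_out * d_in          # parameters if fine-tuning W directly
lora = r * (d_out + d_in)    # parameters LoRA actually trains
print(f"full: {full}, lora: {lora}")  # → full: 4096, lora: 256
```

Because `B` is initialized to zero, the adapted model is identical to the base model at the start of training; the rank `r` controls the trade-off between adapter capacity and parameter count.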
A minimal SFT configuration:

```json
{
  "strategy": "sft",
  "task": "text-generation",
  "model_name": "meta-llama/Llama-3.2-3B",
  "lora_r": 16,
  "lora_alpha": 32,
  "lora_dropout": 0.1
}
```

SFT uses LoRA (Low-Rank Adaptation) for efficient fine-tuning:
```json
{
  "lora_r": 16,          // Rank (8, 16, 32, or 64)
  "lora_alpha": 32,      // Alpha (usually 2x rank)
  "lora_dropout": 0.1,   // Dropout rate
  "target_modules": "all-linear"
}
```

Training data uses standard input-output pairs:

```json
{"input": "Question or instruction", "output": "Expected response"}
```

See Dataset Formats for details.
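For example, a small training file in this pair format can be written and sanity-checked with Python's standard `json` module. The records and the `train.jsonl` filename here are hypothetical; JSONL (one JSON object per line) is a common layout for SFT datasets.

```python
import json

# Hypothetical example records in the input/output pair format shown above.
examples = [
    {"input": "What is LoRA?", "output": "A parameter-efficient fine-tuning method."},
    {"input": "Summarize: SFT uses labeled pairs.", "output": "SFT trains on input-output pairs."},
]

# Write one JSON object per line (JSONL).
with open("train.jsonl", "w", encoding="utf-8") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

# Read it back and check every record has the two required keys.
with open("train.jsonl", encoding="utf-8") as f:
    records = [json.loads(line) for line in f]

assert all({"input", "output"} <= rec.keys() for rec in records)
print(f"{len(records)} valid examples")  # → 2 valid examples
```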
- QLoRA Strategy - Memory-efficient variant
- Strategy Overview - Compare all strategies
- Configuration Guide - All options
SFT: The reliable standard for fine-tuning! ✨