Understanding training strategies in ModelForge.
Strategies define how models are trained; each covers four responsibilities (see the sketch after this list):
- Model preparation (adapters, PEFT configuration)
- Dataset formatting
- Trainer setup
- Training algorithm
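Conceptually, each strategy implements one hook per responsibility above. Here is a minimal sketch of what such an interface could look like, purely for illustration; every name in it is hypothetical and not ModelForge's actual API:

```python
from abc import ABC, abstractmethod

class TrainingStrategy(ABC):
    """Illustrative sketch of a strategy's four responsibilities (hypothetical names)."""

    @abstractmethod
    def prepare_model(self, model):
        """Attach adapters / apply PEFT configuration before training."""

    @abstractmethod
    def format_dataset(self, dataset):
        """Convert raw records into the shape the trainer expects."""

    @abstractmethod
    def build_trainer(self, model, dataset):
        """Construct and configure the trainer."""

    @abstractmethod
    def train(self, trainer):
        """Run the training algorithm."""
```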
Different strategies offer different trade-offs in terms of memory, speed, and quality.
| Strategy | Memory | Speed | Quality | Use Case |
|---|---|---|---|---|
| SFT | Baseline | 1x | High | General-purpose fine-tuning |
| QLoRA | 30-50% less | 0.9x | High | Limited VRAM |
| RLHF | Medium | Medium | Very High | Alignment with human preferences |
| DPO | Medium | Medium | Very High | Simpler alternative to RLHF |
**Choose SFT when:**
✅ First time fine-tuning
✅ Have sufficient VRAM
✅ Standard supervised learning task
✅ Want simplest setup
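SFT consumes plain input/output pairs. An illustrative training record is shown below; the field names "prompt" and "completion" are assumptions for illustration, so check ModelForge's dataset docs for the exact schema:

```json
{
  "prompt": "Summarize: The quick brown fox jumps over the lazy dog.",
  "completion": "A fox jumps over a dog."
}
```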
**Choose QLoRA when:**
✅ Limited VRAM (< 12GB for 7B models)
✅ Want to train larger models
✅ Memory is the bottleneck
✅ Can accept slightly slower training
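Under the hood, a QLoRA-style setup loads the base model with 4-bit quantized weights and trains only small LoRA adapters on top, which is where the memory savings in the table come from. A minimal sketch using the Hugging Face transformers and peft libraries (the model id and hyperparameters are placeholders, and ModelForge may wire this up differently):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit NF4 quantization shrinks the frozen base weights; compute runs in bf16.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # placeholder model id
    quantization_config=bnb_config,
)

# Only the low-rank adapter matrices receive gradients; the 4-bit base stays frozen.
lora_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of total parameters
```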
**Choose RLHF when:**
✅ Aligning model with human preferences
✅ Have preference pairs (prompt/chosen/rejected; see the example record below)
✅ Quality is critical
✅ Want conservative training defaults
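A preference pair ties one prompt to a preferred and a dispreferred response. An illustrative record following the prompt/chosen/rejected convention above (contents invented for the example; ModelForge's exact schema may differ):

```json
{
  "prompt": "Explain gradient checkpointing in one sentence.",
  "chosen": "Gradient checkpointing saves memory by discarding intermediate activations and recomputing them during the backward pass.",
  "rejected": "It makes training faster."
}
```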
**Choose DPO when:**
✅ Have preference pairs (chosen/rejected)
✅ Want simpler alternative to RLHF
✅ Alignment without reward model
✅ More stable training than RLHF
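For reference, DPO needs no reward model because it folds the preference signal into a single loss over those pairs. This is the published DPO objective, independent of ModelForge: with policy $\pi_\theta$, frozen reference model $\pi_{\mathrm{ref}}$, and temperature $\beta$,

$$
\mathcal{L}_{\mathrm{DPO}} = -\,\mathbb{E}_{(x,\,y_w,\,y_l)}\!\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right]
$$

where $y_w$ is the chosen response and $y_l$ the rejected one.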
Specify the strategy in your training config:

```json
{
  "strategy": "sft"
}
```

Valid values are `"sft"`, `"qlora"`, `"rlhf"`, and `"dpo"`.

Note: The DPO and RLHF strategies require `"task": "text-generation"`. This is enforced by schema validation.
See also:
- SFT Strategy - Standard supervised fine-tuning
- QLoRA Strategy - Memory-efficient training
- Configuration Guide - All options
Choose the right strategy for your needs! 🎯