
Strategy Overview

Understanding training strategies in ModelForge.

What Are Strategies?

Strategies define how models are trained:

  • Model preparation (adapters, PEFT configuration)
  • Dataset formatting
  • Trainer setup
  • Training algorithm

Different strategies offer different trade-offs in terms of memory, speed, and quality.
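The four responsibilities above can be pictured as a common interface that every strategy implements. A minimal sketch; the class and method names here are illustrative, not ModelForge's actual API:

```python
from abc import ABC, abstractmethod


class TrainingStrategy(ABC):
    """Illustrative interface covering the four responsibilities of a strategy."""

    @abstractmethod
    def prepare_model(self, model):
        """Attach adapters / apply PEFT configuration."""

    @abstractmethod
    def format_dataset(self, dataset):
        """Convert raw examples into the layout the trainer expects."""

    @abstractmethod
    def build_trainer(self, model, dataset):
        """Wire up the trainer with the training algorithm's settings."""


class SFTStrategy(TrainingStrategy):
    """Hypothetical SFT implementation: plain supervised fine-tuning."""

    def prepare_model(self, model):
        return model  # full fine-tuning: no adapters to attach

    def format_dataset(self, dataset):
        # Concatenate prompt and completion into a single training text.
        return [{"text": ex["prompt"] + ex["completion"]} for ex in dataset]

    def build_trainer(self, model, dataset):
        return {"model": model, "data": dataset, "algorithm": "sft"}
```

Each concrete strategy (QLoRA, RLHF, DPO) would override these hooks differently, which is where the trade-offs below come from.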

Available Strategies

| Strategy | Memory (vs SFT) | Speed (vs SFT) | Quality   | Use Case                         |
|----------|-----------------|----------------|-----------|----------------------------------|
| SFT      | Baseline        | 1x             | High      | General-purpose fine-tuning      |
| QLoRA    | 30–50% less     | 0.9x           | High      | Limited VRAM                     |
| RLHF     | Medium          | Medium         | Very High | Alignment with human preferences |
| DPO      | Medium          | Medium         | Very High | Simpler alternative to RLHF      |
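The guidance in the table and the checklists below can be condensed into a small selection helper. A sketch under simple assumptions; the thresholds and rules encode this page's rules of thumb, not any actual ModelForge logic:

```python
def choose_strategy(has_preference_pairs: bool,
                    vram_gb: float,
                    model_size_b: float,
                    want_reward_model: bool = False) -> str:
    """Pick a training strategy from this doc's rules of thumb."""
    if has_preference_pairs:
        # DPO skips the reward model and trains more stably;
        # RLHF when you explicitly want a reward model in the loop.
        return "rlhf" if want_reward_model else "dpo"
    # Rule of thumb from this doc: under ~12 GB VRAM for a 7B model,
    # use QLoRA (scaled linearly with model size here for illustration).
    if vram_gb < 12 * (model_size_b / 7):
        return "qlora"
    return "sft"


print(choose_strategy(False, 24, 7))  # plenty of VRAM, supervised data -> sft
print(choose_strategy(False, 8, 7))   # memory-constrained -> qlora
print(choose_strategy(True, 24, 7))   # preference pairs, no reward model -> dpo
```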

Choosing a Strategy

Use SFT When:

✅ First time fine-tuning
✅ Have sufficient VRAM
✅ Standard supervised learning task
✅ Want simplest setup

Use QLoRA When:

✅ Limited VRAM (< 12GB for 7B models)
✅ Want to train larger models
✅ Memory is the bottleneck
✅ Can accept slightly slower training
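The memory saving is easy to sanity-check with back-of-the-envelope weight math: a 7B-parameter model in fp16 needs about 14 GB for weights alone, while 4-bit quantization cuts that to roughly 3.5 GB (adapters, activations, and optimizer state add overhead on top). A quick sketch:

```python
def weight_memory_gb(params_billion: float, bits_per_param: int) -> float:
    """Approximate memory for model weights only.

    Ignores activations, gradients, and optimizer state; uses 1 GB = 1e9 bytes
    for a rough estimate.
    """
    bytes_total = params_billion * 1e9 * bits_per_param / 8
    return bytes_total / 1e9


print(weight_memory_gb(7, 16))  # fp16 baseline: 14.0
print(weight_memory_gb(7, 4))   # 4-bit quantized: 3.5
```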

Use RLHF When:

✅ Aligning model with human preferences
✅ Have preference pairs (prompt/chosen/rejected)
✅ Quality is critical
✅ Want conservative training defaults

Use DPO When:

✅ Have preference pairs (chosen/rejected)
✅ Want simpler alternative to RLHF
✅ Alignment without reward model
✅ More stable training than RLHF
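DPO's objective is what lets it skip the reward model: it directly pushes up the policy's log-probability margin on chosen over rejected responses, regularized against a reference model. A minimal sketch of the standard DPO loss in pure Python (the log-probabilities in the example are toy numbers, not real model outputs):

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss: -log(sigmoid(beta * (policy margin - ref margin)))."""
    policy_margin = policy_chosen_logp - policy_rejected_logp
    ref_margin = ref_chosen_logp - ref_rejected_logp
    logits = beta * (policy_margin - ref_margin)
    return -math.log(1 / (1 + math.exp(-logits)))  # -log(sigmoid(logits))


# When the policy prefers the chosen response more than the reference does,
# the loss drops below -log(0.5) ≈ 0.693; with no improvement it equals it.
print(dpo_loss(-1.0, -3.0, -2.0, -2.5, beta=0.1))
```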

Configuration

Specify strategy in training config:

{
  "strategy": "sft"  // or "qlora", "rlhf", "dpo"
}

Note: DPO and RLHF strategies require "task": "text-generation". This is enforced by schema validation.
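The constraint in the note can be expressed as a small validation step. The check below is an illustrative sketch of what the schema validation enforces, not ModelForge's actual validator:

```python
PREFERENCE_STRATEGIES = {"rlhf", "dpo"}


def validate_config(config: dict) -> None:
    """Reject preference-based strategies outside text-generation tasks."""
    strategy = config.get("strategy")
    if strategy in PREFERENCE_STRATEGIES and config.get("task") != "text-generation":
        raise ValueError(
            f'strategy "{strategy}" requires "task": "text-generation"'
        )


validate_config({"strategy": "dpo", "task": "text-generation"})  # passes
# validate_config({"strategy": "dpo", "task": "classification"})  # raises ValueError
```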

Next Steps


Choose the right strategy for your needs! 🎯