
Strategy Overview

Understanding training strategies in ModelForge.

What Are Strategies?

Strategies define how models are trained:

  • Model preparation (adapters, PEFT configuration)
  • Dataset formatting
  • Trainer setup
  • Training algorithm

Different strategies offer different trade-offs in terms of memory, speed, and quality.
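The four responsibilities above can be pictured as a common interface that every strategy implements. A minimal sketch; the class and method names here are illustrative, not ModelForge's actual API:

```python
from abc import ABC, abstractmethod


class TrainingStrategy(ABC):
    """Illustrative interface covering the four responsibilities of a strategy."""

    @abstractmethod
    def prepare_model(self, model):
        """Attach adapters / apply PEFT configuration."""

    @abstractmethod
    def format_dataset(self, dataset):
        """Convert raw examples into the layout the trainer expects."""

    @abstractmethod
    def build_trainer(self, model, dataset):
        """Wire up the trainer with the training algorithm's settings."""


class SFTStrategy(TrainingStrategy):
    """Hypothetical SFT implementation: plain supervised fine-tuning."""

    def prepare_model(self, model):
        return model  # full fine-tuning: no adapters to attach

    def format_dataset(self, dataset):
        # Concatenate prompt and completion into a single training text.
        return [{"text": ex["prompt"] + ex["completion"]} for ex in dataset]

    def build_trainer(self, model, dataset):
        return {"model": model, "data": dataset, "algorithm": "sft"}
```

Each concrete strategy (QLoRA, RLHF, DPO) would override these hooks differently, which is where the trade-offs below come from.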

Available Strategies

| Strategy | Memory (vs SFT) | Speed (vs SFT) | Quality   | Use Case                         |
|----------|-----------------|----------------|-----------|----------------------------------|
| SFT      | Baseline        | 1x             | High      | General-purpose fine-tuning      |
| QLoRA    | 30–50% less     | 0.9x           | High      | Limited VRAM                     |
| RLHF     | Medium          | Medium         | Very High | Alignment with human preferences |
| DPO      | Medium          | Medium         | Very High | Simpler alternative to RLHF      |
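The guidance in the table and the checklists below can be condensed into a small selection helper. A sketch under simple assumptions; the thresholds and rules encode this page's rules of thumb, not any actual ModelForge logic:

```python
def choose_strategy(has_preference_pairs: bool,
                    vram_gb: float,
                    model_size_b: float,
                    want_reward_model: bool = False) -> str:
    """Pick a training strategy from this doc's rules of thumb."""
    if has_preference_pairs:
        # DPO skips the reward model and trains more stably;
        # RLHF when you explicitly want a reward model in the loop.
        return "rlhf" if want_reward_model else "dpo"
    # Rule of thumb from this doc: under ~12 GB VRAM for a 7B model,
    # use QLoRA (scaled linearly with model size here for illustration).
    if vram_gb < 12 * (model_size_b / 7):
        return "qlora"
    return "sft"


print(choose_strategy(False, 24, 7))  # plenty of VRAM, supervised data -> sft
print(choose_strategy(False, 8, 7))   # memory-constrained -> qlora
print(choose_strategy(True, 24, 7))   # preference pairs, no reward model -> dpo
```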

Choosing a Strategy

Use SFT When:

✅ First time fine-tuning
✅ Have sufficient VRAM
✅ Standard supervised learning task
✅ Want simplest setup

Use QLoRA When:

✅ Limited VRAM (< 12GB for 7B models)
✅ Want to train larger models
✅ Memory is the bottleneck
✅ Can accept slightly slower training
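The memory saving is easy to sanity-check with back-of-the-envelope weight math: a 7B-parameter model in fp16 needs about 14 GB for weights alone, while 4-bit quantization cuts that to roughly 3.5 GB (adapters, activations, and optimizer state add overhead on top). A quick sketch:

```python
def weight_memory_gb(params_billion: float, bits_per_param: int) -> float:
    """Approximate memory for model weights only.

    Ignores activations, gradients, and optimizer state; uses 1 GB = 1e9 bytes
    for a rough estimate.
    """
    bytes_total = params_billion * 1e9 * bits_per_param / 8
    return bytes_total / 1e9


print(weight_memory_gb(7, 16))  # fp16 baseline: 14.0
print(weight_memory_gb(7, 4))   # 4-bit quantized: 3.5
```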

Use RLHF When:

✅ Aligning model with human preferences
✅ Have preference pairs (prompt/chosen/rejected)
✅ Quality is critical
✅ Want conservative training defaults

Use DPO When:

✅ Have preference pairs (chosen/rejected)
✅ Want simpler alternative to RLHF
✅ Alignment without reward model
✅ More stable training than RLHF
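DPO's objective is what lets it skip the reward model: it directly pushes up the policy's log-probability margin on chosen over rejected responses, regularized against a reference model. A minimal sketch of the standard DPO loss in pure Python (the log-probabilities in the example are toy numbers, not real model outputs):

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss: -log(sigmoid(beta * (policy margin - ref margin)))."""
    policy_margin = policy_chosen_logp - policy_rejected_logp
    ref_margin = ref_chosen_logp - ref_rejected_logp
    logits = beta * (policy_margin - ref_margin)
    return -math.log(1 / (1 + math.exp(-logits)))  # -log(sigmoid(logits))


# When the policy prefers the chosen response more than the reference does,
# the loss drops below -log(0.5) ≈ 0.693; with no improvement it equals it.
print(dpo_loss(-1.0, -3.0, -2.0, -2.5, beta=0.1))
```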

Configuration

Specify strategy in training config:

{
  "strategy": "sft"  // or "qlora", "rlhf", "dpo"
}

Note: DPO and RLHF strategies require "task": "text-generation". This is enforced by schema validation.
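The constraint in the note can be expressed as a small validation step. The check below is an illustrative sketch of what the schema validation enforces, not ModelForge's actual validator:

```python
PREFERENCE_STRATEGIES = {"rlhf", "dpo"}


def validate_config(config: dict) -> None:
    """Reject preference-based strategies outside text-generation tasks."""
    strategy = config.get("strategy")
    if strategy in PREFERENCE_STRATEGIES and config.get("task") != "text-generation":
        raise ValueError(
            f'strategy "{strategy}" requires "task": "text-generation"'
        )


validate_config({"strategy": "dpo", "task": "text-generation"})  # passes
# validate_config({"strategy": "dpo", "task": "classification"})  # raises ValueError
```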

Next Steps


Choose the right strategy for your needs! 🎯