Complete API reference for training configuration options.
This document describes all available configuration options for fine-tuning models in ModelForge. The configuration is passed as JSON to the training API.
## TrainingConfig Schema

The complete training configuration schema:
```python
from typing import Optional

from pydantic import BaseModel


class TrainingConfig(BaseModel):
    """Complete training configuration."""

    # Model and task settings
    task: str
    model_name: str
    provider: str = "huggingface"
    strategy: str = "sft"
    dataset: str
    compute_specs: str

    # LoRA settings
    lora_r: int = 16
    lora_alpha: int = 32
    lora_dropout: float = 0.1

    # Quantization settings
    use_4bit: bool = True
    use_8bit: bool = False
    bnb_4bit_compute_dtype: str = "float16"
    bnb_4bit_quant_type: str = "nf4"
    use_nested_quant: bool = False

    # Training precision
    fp16: bool = False
    bf16: bool = False

    # Training hyperparameters
    num_train_epochs: int = 1
    per_device_train_batch_size: int = 1
    per_device_eval_batch_size: int = 1
    gradient_accumulation_steps: int = 4
    gradient_checkpointing: bool = True
    max_grad_norm: float = 0.3
    learning_rate: float = 2e-4
    weight_decay: float = 0.001
    optim: str = "paged_adamw_32bit"
    lr_scheduler_type: str = "cosine"
    max_steps: int = -1
    warmup_ratio: float = 0.03
    group_by_length: bool = True
    packing: bool = False

    # Sequence settings
    max_seq_length: Optional[int] = None

    # Evaluation settings
    eval_split: float = 0.2
    eval_steps: int = 100
```
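The schema maps one-to-one onto the JSON payload sent to the training API. A minimal sketch of building and serializing a config with only the standard library (field names come from the schema above; the values shown are illustrative):

```python
import json

# Minimal config: only the required fields plus a few overrides.
# Every field omitted here falls back to its schema default.
config = {
    "task": "text-generation",
    "model_name": "meta-llama/Llama-3.1-8B-Instruct",
    "dataset": "/path/to/dataset.jsonl",
    "compute_specs": "mid_range",
    "strategy": "qlora",
    "learning_rate": 2e-4,
}

payload = json.dumps(config, indent=2)
print(payload)

# Round-trip check: the payload parses back to the same mapping.
assert json.loads(payload) == config
```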
## Parameter Reference

### `task`
- Type: string
- Required: Yes
- Valid Values: `"text-generation"`, `"summarization"`, `"extractive-question-answering"`
- Description: The training task type

Example:

```json
{
  "task": "text-generation"
}
```

### `model_name`
- Type: string
- Required: Yes
- Description: HuggingFace model ID or local path to a model
- Validation: Cannot be empty

Examples:

```json
{
  "model_name": "meta-llama/Llama-3.1-8B-Instruct"
}
```

```json
{
  "model_name": "/path/to/local/model"
}
```

### `provider`
- Type: string
- Required: No
- Default: `"huggingface"`
- Valid Values: `"huggingface"`, `"unsloth"`
- Description: Model loading provider

Example:

```json
{
  "provider": "unsloth"
}
```

### `strategy`
- Type: string
- Required: No
- Default: `"sft"`
- Valid Values: `"sft"`, `"qlora"`, `"rlhf"`, `"dpo"`
- Description: Training strategy

Example:

```json
{
  "strategy": "qlora"
}
```

### `dataset`
- Type: string
- Required: Yes
- Description: Path to a JSONL dataset file
- Validation: Cannot be empty

Example:

```json
{
  "dataset": "/path/to/dataset.jsonl"
}
```

### `compute_specs`
- Type: string
- Required: Yes
- Valid Values: `"low_end"`, `"mid_range"`, `"high_end"`
- Description: Hardware profile used for optimization

Example:

```json
{
  "compute_specs": "mid_range"
}
```
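The three hardware profiles tie together recommendations that appear throughout this reference (LoRA rank, batch size, sequence length). A hypothetical helper that collects them as starting-point overrides; the values are chosen from within the recommended ranges, and this is not an official mapping exposed by the API:

```python
# Rough starting points per hardware profile, assembled from the
# per-parameter recommendations in this reference (illustrative only).
PROFILE_HINTS = {
    "low_end":   {"lora_r": 16, "per_device_train_batch_size": 1, "max_seq_length": 1024},
    "mid_range": {"lora_r": 32, "per_device_train_batch_size": 4, "max_seq_length": 2048},
    "high_end":  {"lora_r": 64, "per_device_train_batch_size": 8, "max_seq_length": 4096},
}

def suggest_overrides(compute_specs: str) -> dict:
    """Return suggested config overrides for a hardware profile."""
    if compute_specs not in PROFILE_HINTS:
        raise ValueError(f"unknown profile: {compute_specs!r}")
    return dict(PROFILE_HINTS[compute_specs])

print(suggest_overrides("mid_range"))
```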
### `lora_r`
- Type: integer
- Default: `16`
- Range: 1-256
- Description: LoRA rank (dimensionality of the adapter)
- Recommendations:
  - Low VRAM: 8-16
  - Mid VRAM: 16-32
  - High VRAM: 32-64

Example:

```json
{
  "lora_r": 64
}
```

### `lora_alpha`
- Type: integer
- Default: `32`
- Description: LoRA alpha scaling parameter
- Recommendation: Usually `2 × lora_r`

Example:

```json
{
  "lora_alpha": 128
}
```

### `lora_dropout`
- Type: float
- Default: `0.1`
- Range: 0.0-1.0
- Description: Dropout probability for LoRA layers

Example:

```json
{
  "lora_dropout": 0.05
}
```

### `use_4bit`
- Type: boolean
- Default: `true`
- Description: Enable 4-bit quantization (BitsAndBytes)

Example:

```json
{
  "use_4bit": true
}
```

### `use_8bit`
- Type: boolean
- Default: `false`
- Description: Enable 8-bit quantization

Example:

```json
{
  "use_8bit": true
}
```

### `bnb_4bit_compute_dtype`
- Type: string
- Default: `"float16"`
- Valid Values: `"float16"`, `"bfloat16"`, `"float32"`
- Description: Compute dtype for 4-bit quantization

Example:

```json
{
  "bnb_4bit_compute_dtype": "bfloat16"
}
```

### `bnb_4bit_quant_type`
- Type: string
- Default: `"nf4"`
- Valid Values: `"nf4"`, `"fp4"`
- Description: Quantization type (NormalFloat4 or Float4)

Example:

```json
{
  "bnb_4bit_quant_type": "nf4"
}
```

### `use_nested_quant`
- Type: boolean
- Default: `false`
- Description: Enable nested (double) quantization for additional memory savings

Example:

```json
{
  "use_nested_quant": true
}
```
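To see why the quantization settings above matter, the raw weight footprint can be estimated from parameter count and bits per parameter. A back-of-the-envelope sketch that ignores activations, gradients, optimizer state, and quantization overhead:

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for the model weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 1024**3

# Back-of-the-envelope numbers for an 8B-parameter model:
params_8b = 8e9
for bits, label in [(16, "fp16/bf16"), (8, "8-bit"), (4, "4-bit")]:
    print(f"{label}: ~{weight_memory_gb(params_8b, bits):.1f} GiB")
```

For an 8B model this works out to roughly 15 GiB of weights in 16-bit versus under 4 GiB in 4-bit, which is what makes QLoRA-style fine-tuning feasible on consumer GPUs.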
### `fp16`
- Type: boolean
- Default: `false`
- Description: Use 16-bit floating-point precision
- Use: Older GPUs (GTX 10xx, RTX 20xx)

Example:

```json
{
  "fp16": true
}
```

### `bf16`
- Type: boolean
- Default: `false`
- Description: Use bfloat16 precision
- Use: Ampere and newer GPUs (RTX 30xx, 40xx, A100)
- Recommendation: Preferred over `fp16` on supported hardware

Example:

```json
{
  "bf16": true
}
```
### `num_train_epochs`
- Type: integer
- Default: `1`
- Minimum: 1
- Description: Number of training epochs

Example:

```json
{
  "num_train_epochs": 3
}
```

### `per_device_train_batch_size`
- Type: integer
- Default: `1`
- Minimum: 1
- Description: Training batch size per GPU
- Recommendations:
  - Low End: 1
  - Mid Range: 2-4
  - High End: 4-8

Example:

```json
{
  "per_device_train_batch_size": 4
}
```

### `per_device_eval_batch_size`
- Type: integer
- Default: `1`
- Minimum: 1
- Description: Evaluation batch size per GPU

Example:

```json
{
  "per_device_eval_batch_size": 4
}
```

### `gradient_accumulation_steps`
- Type: integer
- Default: `4`
- Description: Number of gradient accumulation steps
- Purpose: Simulate larger batch sizes without increasing memory

Effective Batch Size = `per_device_train_batch_size × gradient_accumulation_steps`

Example:

```json
{
  "gradient_accumulation_steps": 8
}
```
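The effective-batch-size relationship can be made concrete with a one-line helper (the `num_devices` factor is a general multi-GPU consideration, not a field in this schema):

```python
def effective_batch_size(per_device_train_batch_size: int,
                         gradient_accumulation_steps: int,
                         num_devices: int = 1) -> int:
    """Per-device batch size x accumulation steps x number of devices."""
    return per_device_train_batch_size * gradient_accumulation_steps * num_devices

# A low-end setup (batch size 1, 16 accumulation steps) reaches an
# effective batch size of 16 on a single GPU:
print(effective_batch_size(1, 16))  # 16
```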
### `gradient_checkpointing`
- Type: boolean
- Default: `true`
- Description: Enable gradient checkpointing to save memory
- Trade-off: Lower memory usage, slightly slower training

Example:

```json
{
  "gradient_checkpointing": true
}
```

### `max_grad_norm`
- Type: float
- Default: `0.3`
- Description: Maximum gradient norm for clipping

Example:

```json
{
  "max_grad_norm": 1.0
}
```

### `learning_rate`
- Type: float
- Default: `2e-4`
- Range: > 0 and ≤ 1
- Description: Initial learning rate
- Recommendations:
  - SFT: `2e-4`
  - QLoRA: `2e-4`
  - RLHF: `1.41e-5`
  - DPO: `5e-7`

Example:

```json
{
  "learning_rate": 2e-4
}
```
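The per-strategy learning-rate recommendations can be captured in a small lookup table; a sketch (the values come from the recommendations in this reference, not from an official API):

```python
# Recommended starting learning rates per training strategy.
RECOMMENDED_LR = {
    "sft": 2e-4,
    "qlora": 2e-4,
    "rlhf": 1.41e-5,
    "dpo": 5e-7,
}

def recommended_learning_rate(strategy: str) -> float:
    """Return a suggested initial learning rate for a training strategy."""
    try:
        return RECOMMENDED_LR[strategy]
    except KeyError:
        raise ValueError(f"unknown strategy: {strategy!r}") from None

print(recommended_learning_rate("dpo"))  # 5e-07
```

Preference-based strategies (RLHF, DPO) use far smaller learning rates than supervised fine-tuning, so starting from the SFT default when switching strategies is a common source of divergence.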
### `weight_decay`
- Type: float
- Default: `0.001`
- Description: Weight decay for regularization

Example:

```json
{
  "weight_decay": 0.01
}
```

### `optim`
- Type: string
- Default: `"paged_adamw_32bit"`
- Valid Values: `"adamw_torch"`, `"adamw_8bit"`, `"paged_adamw_32bit"`, `"sgd"`
- Description: Optimizer type
- Recommendations:
  - Standard: `"adamw_torch"`
  - Memory-efficient: `"paged_adamw_32bit"` or `"adamw_8bit"`

Example:

```json
{
  "optim": "paged_adamw_32bit"
}
```

### `lr_scheduler_type`
- Type: string
- Default: `"cosine"`
- Valid Values: `"linear"`, `"cosine"`, `"constant"`, `"polynomial"`
- Description: Learning rate scheduler type

Example:

```json
{
  "lr_scheduler_type": "cosine"
}
```

### `max_steps`
- Type: integer
- Default: `-1` (disabled)
- Description: Maximum number of training steps (overrides epochs if set)

Example:

```json
{
  "max_steps": 1000
}
```
### `warmup_ratio`
- Type: float
- Default: `0.03`
- Range: 0.0-1.0
- Description: Proportion of training used for learning rate warmup

Example:

```json
{
  "warmup_ratio": 0.1
}
```

### `group_by_length`
- Type: boolean
- Default: `true`
- Description: Group sequences of similar length for efficiency

Example:

```json
{
  "group_by_length": true
}
```

### `packing`
- Type: boolean
- Default: `false`
- Description: Pack multiple sequences into one to maximize GPU utilization

Example:

```json
{
  "packing": true
}
```

### `max_seq_length`
- Type: integer or null
- Default: `null` (auto-detect)
- Description: Maximum sequence length in tokens
- Recommendations:
  - Low End: 512-1024
  - Mid Range: 1024-2048
  - High End: 2048-4096
- Note: When using the Unsloth provider, this cannot be `-1` or `null`; an explicit value is required.

Example:

```json
{
  "max_seq_length": 2048
}
```

### `eval_split`
- Type: float
- Default: `0.2`
- Range: 0.0-1.0
- Description: Proportion of the data to use for evaluation

Example:

```json
{
  "eval_split": 0.1
}
```

### `eval_steps`
- Type: integer
- Default: `100`
- Description: Run evaluation every N steps

Example:

```json
{
  "eval_steps": 50
}
```
"task": "text-generation",
"model_name": "qwen/Qwen2.5-3B",
"provider": "huggingface",
"strategy": "qlora",
"dataset": "/path/to/dataset.jsonl",
"compute_specs": "low_end",
"lora_r": 16,
"lora_alpha": 32,
"lora_dropout": 0.1,
"use_4bit": true,
"bf16": false,
"fp16": true,
"num_train_epochs": 3,
"per_device_train_batch_size": 1,
"gradient_accumulation_steps": 16,
"learning_rate": 2e-4,
"max_seq_length": 512,
"eval_split": 0.2,
"eval_steps": 100
}{
"task": "text-generation",
"model_name": "meta-llama/Llama-3.1-8B-Instruct",
"provider": "unsloth",
"strategy": "qlora",
"dataset": "/path/to/dataset.jsonl",
"compute_specs": "mid_range",
"lora_r": 64,
"lora_alpha": 128,
"lora_dropout": 0.1,
"use_4bit": true,
"bf16": true,
"num_train_epochs": 3,
"per_device_train_batch_size": 2,
"gradient_accumulation_steps": 4,
"learning_rate": 2e-4,
"max_seq_length": 2048,
"eval_split": 0.2,
"eval_steps": 100
}{
"task": "text-generation",
"model_name": "meta-llama/Llama-4-Maverick-17B-128E-Instruct",
"provider": "unsloth",
"strategy": "sft",
"dataset": "/path/to/dataset.jsonl",
"compute_specs": "high_end",
"lora_r": 64,
"lora_alpha": 128,
"lora_dropout": 0.05,
"use_4bit": false,
"bf16": true,
"num_train_epochs": 3,
"per_device_train_batch_size": 4,
"gradient_accumulation_steps": 2,
"learning_rate": 2e-4,
"max_seq_length": 4096,
"eval_split": 0.1,
"eval_steps": 50
}The configuration schema enforces these validation rules:
- `task`: Must be one of `["text-generation", "summarization", "extractive-question-answering"]`
- `strategy`: Must be one of `["sft", "qlora", "rlhf", "dpo"]`
- `provider`: Must be one of `["huggingface", "unsloth"]`
- `model_name`: Cannot be empty
- `dataset`: Cannot be empty
- `num_train_epochs`: Must be ≥ 1
- `per_device_train_batch_size`: Must be ≥ 1
- `learning_rate`: Must be > 0 and ≤ 1
- `lora_r`: Must be between 1 and 256
- `eval_split`: Must be between 0 and 1
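These rules can be checked client-side before submitting a job. A plain-Python sketch that mirrors the list above (defaults taken from the schema); this is illustrative, not the service's actual validator:

```python
VALID_TASKS = {"text-generation", "summarization", "extractive-question-answering"}
VALID_STRATEGIES = {"sft", "qlora", "rlhf", "dpo"}
VALID_PROVIDERS = {"huggingface", "unsloth"}

def validate_config(cfg: dict) -> list[str]:
    """Return a list of human-readable rule violations (empty if valid)."""
    errors = []
    if cfg.get("task") not in VALID_TASKS:
        errors.append("task: invalid or missing")
    if cfg.get("strategy", "sft") not in VALID_STRATEGIES:
        errors.append("strategy: invalid")
    if cfg.get("provider", "huggingface") not in VALID_PROVIDERS:
        errors.append("provider: invalid")
    if not cfg.get("model_name"):
        errors.append("model_name: cannot be empty")
    if not cfg.get("dataset"):
        errors.append("dataset: cannot be empty")
    if cfg.get("num_train_epochs", 1) < 1:
        errors.append("num_train_epochs: must be >= 1")
    if cfg.get("per_device_train_batch_size", 1) < 1:
        errors.append("per_device_train_batch_size: must be >= 1")
    if not 0 < cfg.get("learning_rate", 2e-4) <= 1:
        errors.append("learning_rate: must be > 0 and <= 1")
    if not 1 <= cfg.get("lora_r", 16) <= 256:
        errors.append("lora_r: must be between 1 and 256")
    if not 0 <= cfg.get("eval_split", 0.2) <= 1:
        errors.append("eval_split: must be between 0 and 1")
    return errors

print(validate_config({"task": "text-generation",
                       "model_name": "qwen/Qwen2.5-3B",
                       "dataset": "/path/to/dataset.jsonl"}))  # []
```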
## Related Documentation

- REST API Documentation: API endpoints
- Response Formats: API response structures
- Configuration Guide: User-friendly configuration guide
Complete schema reference for ModelForge training configuration.