Common questions and answers about ModelForge.
ModelForge is a no-code toolkit for fine-tuning Large Language Models on your local GPU. It provides a web-based UI for training custom models without writing code.
Yes! ModelForge is open-source under the MIT license. You can use it freely for personal or commercial projects.
v3 builds on the v2 architecture with:
- Apple Silicon (MPS) support — train on M1/M2/M3/M4/M5 Macs natively
- Interactive CLI wizard (`modelforge cli`) for headless/SSH environments
- Optional quantization — bitsandbytes moved to the `[quantization]` extra for a lighter base install
- Schema validation — catches incompatible config combinations at startup
All existing v2 workflows remain compatible.
NVIDIA GPU (Windows/Linux):
- Python 3.11.x
- NVIDIA GPU with 4GB+ VRAM
- CUDA 11.8 or 12.x
- 8GB RAM
- 10GB free disk space
Apple Silicon Mac (macOS):
- Python 3.11.x
- Apple Silicon Mac (M1 or later) with 8GB+ unified memory
- macOS 12.3 or later
- 10GB free disk space
Yes, but training will be extremely slow (10-100x slower). Only recommended for testing with very small models.
For discrete GPUs, only NVIDIA is currently supported due to CUDA requirements. Apple Silicon Macs are the exception, supported via the MPS backend.
Yes — Apple Silicon Macs are supported via PyTorch's MPS backend (added in v3).
Supported chips: M1, M2, M3, M4, M5. Intel Macs are not supported (no MPS).
Limitations on MPS:
- HuggingFace provider only (Unsloth requires NVIDIA CUDA)
- No 4-bit/8-bit quantization (bitsandbytes is CUDA-only)
- Smaller models recommended (1–7B depending on unified memory)
See the macOS Installation Guide for full setup instructions.
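Before launching a run on a Mac, it can help to confirm that PyTorch actually sees the MPS backend. A minimal sketch, assuming only that PyTorch may or may not be installed (the helper name is ours, not part of ModelForge):

```python
# Sketch: check which accelerator PyTorch can see before training.
def training_device() -> str:
    """Return "cuda", "mps", or "cpu" depending on what's available."""
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch not installed; nothing to accelerate
    if torch.cuda.is_available():
        return "cuda"
    mps = getattr(torch.backends, "mps", None)  # absent on old torch builds
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"

print(training_device())
```

On an M-series Mac with a recent PyTorch, this should print `mps`; if it prints `cpu`, check your macOS and PyTorch versions against the requirements above.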
Some dependencies don't yet support Python 3.12. We'll add support when the ecosystem catches up.
You can check your installed CUDA version with `nvcc --version`.

Yes! Create an environment with Python 3.11:

```shell
conda create -n modelforge python=3.11
conda activate modelforge
pip install modelforge-finetuning
```

Make sure you're using Python 3.11:

```shell
python --version  # Should show 3.11.x
```

Yes! The HuggingFace provider works perfectly on native Windows. For Unsloth support, use WSL or Docker.
Unsloth requires Linux-specific libraries and compilation. Use WSL 2 or Docker for Unsloth support.
Open PowerShell as Administrator:

```shell
wsl --install -d Ubuntu-22.04
```

See the Windows Installation Guide for details.
No! WSL uses your Windows NVIDIA drivers automatically. Install only the CUDA Toolkit inside WSL, not the GPU drivers.
Depends on model size, dataset size, and hardware:
| Model Size | Dataset | GPU | Provider | Time |
|---|---|---|---|---|
| 1B | 500 examples | RTX 3060 | HuggingFace | 10 min |
| 1B | 500 examples | RTX 3060 | Unsloth | 5 min |
| 7B | 1000 examples | RTX 3090 | HuggingFace | 90 min |
| 7B | 1000 examples | RTX 3090 | Unsloth | 45 min |
| Model Size | Minimum VRAM | Recommended VRAM |
|---|---|---|
| < 1B | 4GB | 6GB |
| 1-3B | 6GB | 8GB |
| 3-7B | 8GB | 12GB |
| 7-13B | 12GB | 16GB |
| 13B+ | 16GB | 24GB+ |
Use QLoRA strategy to reduce memory usage by 30-50%.
Not recommended. Training uses all available VRAM. Train one model at a time.
Currently not supported. Training must complete in one session. Use checkpointing to save progress.
Monitor in the UI:
- Loss should decrease over time
- Accuracy/metrics should improve
- No errors in console
JSONL (JSON Lines). Each line is one JSON object.
See Dataset Formats for details.
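For illustration, a minimal JSONL file with two records might look like this (the `input`/`output` field names match the conversion example elsewhere in this FAQ; check Dataset Formats for the exact schema):

```
{"input": "Summarize: The cat sat on the mat.", "output": "A cat sat on a mat."}
{"input": "Translate to French: Hello", "output": "Bonjour"}
```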
- Minimum: 100 examples
- Recommended: 1,000+ examples
- Optimal: 5,000+ examples
Quality > quantity!
Yes! Convert them to JSONL format:

```python
import json
from datasets import load_dataset

dataset = load_dataset("your-dataset", split="train")
with open("dataset.jsonl", "w") as f:
    for item in dataset:
        f.write(json.dumps({"input": item["text"], "output": item["label"]}) + "\n")
```

Check the license of your data. Fine-tuning on copyrighted material may violate copyright law.
All HuggingFace models compatible with Transformers:
- Llama (1, 2, 3)
- Mistral
- Qwen
- Gemma
- Phi
- BART
- T5
- And many more!
Yes! You need to:
- Accept the model's license on HuggingFace
- Use an access token with the proper permissions
Yes! Provide a local path instead of a HuggingFace ID:

```json
{
  "model_name": "/path/to/local/model"
}
```

Navigate to the Models tab in the UI and click Download, or find checkpoints in:
- Linux: `~/.local/share/modelforge/model_checkpoints/`
- Windows: `C:\Users\<user>\AppData\Local\modelforge\model_checkpoints\`
| Feature | HuggingFace | Unsloth |
|---|---|---|
| Speed | 1x | 2x |
| Memory | Baseline | -20% |
| Platform | All | Linux/WSL/Docker |
| Compatibility | All models | Llama, Mistral, Qwen, Gemma, Phi |
Use Unsloth if:
- Running on Linux or WSL
- Using supported models
- Need faster training
- Have limited VRAM
Use HuggingFace if:
- On native Windows
- Using unsupported models
- Maximum compatibility needed
Unsloth pre-allocates memory for performance. Auto-inference (-1) is not supported.
- SFT: standard supervised fine-tuning with LoRA
- QLoRA: quantized LoRA, using 4-bit quantization for lower memory
QLoRA uses 30-50% less memory with minimal quality loss.
In ModelForge, both RLHF and DPO use TRL's DPOTrainer internally. The difference is in default hyperparameters:
- RLHF: More conservative defaults (lr=1.41e-5, 1 epoch) — suited for careful alignment
- DPO: Standard defaults (lr=5e-7, 3 epochs) — good general-purpose preference learning
Both require preference data (prompt/chosen/rejected) and "task": "text-generation". Start with SFT for most use cases.
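For illustration, one line of a preference dataset could look like the following (the values are invented; the prompt/chosen/rejected keys come from the answer above):

```
{"prompt": "Explain overfitting in one sentence.", "chosen": "Overfitting is when a model memorizes training data instead of learning patterns that generalize.", "rejected": "It's when training goes wrong."}
```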
`modelforge cli` launches an 8-step interactive wizard in the terminal. It's ideal for headless servers, SSH sessions, or Jupyter notebooks where a browser isn't available. Install the CLI extra first:

```shell
pip install modelforge-finetuning[cli]
```

bitsandbytes is now an optional dependency. Install it with:

```shell
pip install modelforge-finetuning[quantization]
```

Legacy tuners were removed in v2.1. Use the provider/strategy pattern instead:

```json
{
  "provider": "huggingface",
  "strategy": "sft"
}
```

Both use TRL's DPOTrainer internally — no PPO, no reward model needed. The only difference is default hyperparameters: RLHF uses more conservative settings (lower learning rate, fewer epochs) while DPO uses standard settings.
No. Choose one strategy per training run.
Solutions:
- Reduce `per_device_train_batch_size`
- Use the QLoRA strategy
- Reduce `max_seq_length`
- Enable `gradient_checkpointing`
- Use a smaller model
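As a sketch, a config combining several of these mitigations might look like this (the field names follow the Transformers-style options named above, but verify them against ModelForge's config schema):

```json
{
  "strategy": "qlora",
  "per_device_train_batch_size": 1,
  "max_seq_length": 512,
  "gradient_checkpointing": true
}
```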
Check:
- Model ID is correct
- HuggingFace token is set
- You have access to gated models
- Internet connection is working
Check:
- File is valid JSONL
- Required fields are present
- All values are strings
- At least 10 examples
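A quick sanity-check script along these lines can catch most of the issues above before you start a run (the `input`/`output` field names match this FAQ's examples; adjust `REQUIRED_FIELDS` to your schema):

```python
import json

REQUIRED_FIELDS = ("input", "output")  # adjust to your dataset schema

def validate_jsonl(path: str, min_examples: int = 10) -> list[str]:
    """Return a list of problems found in a JSONL dataset file."""
    problems = []
    count = 0
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue  # ignore blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                problems.append(f"line {lineno}: not valid JSON")
                continue
            for field in REQUIRED_FIELDS:
                if field not in record:
                    problems.append(f"line {lineno}: missing field {field!r}")
                elif not isinstance(record[field], str):
                    problems.append(f"line {lineno}: {field!r} is not a string")
            count += 1
    if count < min_examples:
        problems.append(f"only {count} examples (need at least {min_examples})")
    return problems
```

Run it on your file and fix anything it reports; an empty list means the file passes these basic checks.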
Check:
- Provider is installed (`pip install unsloth`)
- Provider name is correct (`huggingface` or `unsloth`)
- You're on the correct platform (Unsloth needs Linux/WSL)
- Use Unsloth provider (2x faster)
- Use QLoRA strategy
- Increase batch size if you have VRAM
- Use fp16 or bf16
- Enable gradient checkpointing
Best practices:
- Use Unsloth provider
- Larger batch size (if VRAM allows)
- Use bf16 on Ampere+ GPUs (RTX 30xx/40xx)
- Reduce gradient accumulation steps
- Use NVMe SSD for dataset storage
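As a sketch, a config applying several of these tips might look like the following (the `bf16` and `gradient_accumulation_steps` field names follow Transformers conventions and are assumptions here; verify against ModelForge's config schema):

```json
{
  "provider": "unsloth",
  "strategy": "sft",
  "bf16": true,
  "per_device_train_batch_size": 8,
  "gradient_accumulation_steps": 1
}
```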
Best practices:
- Use QLoRA strategy
- Enable 4-bit quantization
- Reduce batch size
- Reduce max_seq_length
- Enable gradient checkpointing
Yes! ModelForge provides a REST API. See API Documentation.
Yes! Use the API or import ModelForge as a library.
Not yet, but you can use the REST API with requests.
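A hypothetical sketch of driving the REST API from Python follows. The endpoint path and payload field names are assumptions (only `model_name`, `provider`, and `strategy` appear elsewhere in this FAQ); consult the API Documentation for the real schema:

```python
# Hypothetical sketch of calling the ModelForge REST API from Python.
def build_training_request(model_name: str, dataset_path: str) -> dict:
    """Assemble a training-job payload (field names are illustrative)."""
    return {
        "model_name": model_name,
        "dataset": dataset_path,
        "provider": "huggingface",
        "strategy": "sft",
    }

payload = build_training_request("meta-llama/Llama-3.2-1B", "dataset.jsonl")

# To submit it (server must be running; `pip install requests` first):
#   import requests
#   resp = requests.post("http://localhost:8000/api/training", json=payload, timeout=30)
#   resp.raise_for_status()
```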
Yes! ModelForge is production-ready. Deploy with:
- Docker containers
- Systemd services
- Cloud platforms (AWS, GCP, Azure)
Use a reverse proxy (nginx/Apache) with SSL:

```nginx
server {
    listen 443 ssl;
    location / {
        proxy_pass http://localhost:8000;
    }
}
```

Yes! ModelForge uses SQLAlchemy, so it's easy to switch to PostgreSQL.
See Contributing Guide for:
- Reporting bugs
- Suggesting features
- Submitting PRs
- Adding model configurations
Create provider class and register in factory. See Custom Providers.
Add JSON config file to model_configs/. See Model Configurations.
- Check Troubleshooting Guide
- Search GitHub Issues
- Ask in GitHub Discussions
- Create new issue if bug
Can't find your answer? Ask in GitHub Discussions!