Learn how to contribute model recommendations to ModelForge.
ModelForge uses a modular configuration system for model recommendations. Each hardware profile has its own JSON configuration file specifying recommended models for different tasks.
```
ModelForge/model_configs/
├── low_end.json     # 4-8GB VRAM
├── mid_range.json   # 8-16GB VRAM
└── high_end.json    # 16GB+ VRAM
```
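The VRAM boundaries in the tree above can be expressed as a small helper that selects the right config file for the detected hardware. This is an illustrative sketch, not ModelForge's actual API: the function names and the `config_dir` parameter are hypothetical.

```python
import json
from pathlib import Path


def profile_for_vram(vram_gb: float) -> str:
    """Map available VRAM (in GB) to a hardware profile name (thresholds from the tree above)."""
    if vram_gb < 8:
        return "low_end"
    if vram_gb < 16:
        return "mid_range"
    return "high_end"


def load_profile(config_dir: str, vram_gb: float) -> dict:
    """Load the JSON config matching the detected VRAM (hypothetical helper)."""
    path = Path(config_dir) / f"{profile_for_vram(vram_gb)}.json"
    return json.loads(path.read_text())


print(profile_for_vram(6))   # low_end
print(profile_for_vram(12))  # mid_range
print(profile_for_vram(24))  # high_end
```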
Each configuration file follows this structure:
```json
{
  "profile": "profile_name",
  "tasks": {
    "task_name": {
      "primary": "best_model_id",
      "alternatives": ["model1", "model2", "model3"]
    }
  }
}
```

- `profile` (string): Hardware profile name (must match the filename without `.json`)
- `tasks` (object): Task configurations, keyed by task name
  - `task_name` (string): One of `text-generation`, `summarization`, or `extractive-question-answering`
  - `primary` (string): Default recommended model for this task/profile
  - `alternatives` (array): List of additional recommended models
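The rules above (profile matches filename, known task names, correct value types) can be sanity-checked with a short script. This is an illustrative helper, not part of ModelForge itself:

```python
import json
from pathlib import Path

VALID_TASKS = {"text-generation", "summarization", "extractive-question-answering"}


def validate_config(path) -> list:
    """Return a list of problems found in a profile config file (empty list = OK)."""
    path = Path(path)
    config = json.loads(path.read_text())
    errors = []
    # "profile" must match the filename without .json
    if config.get("profile") != path.stem:
        errors.append(f"profile {config.get('profile')!r} != filename {path.stem!r}")
    for task, spec in config.get("tasks", {}).items():
        if task not in VALID_TASKS:
            errors.append(f"unknown task: {task}")
        if not isinstance(spec.get("primary"), str):
            errors.append(f"{task}: 'primary' must be a string")
        if not isinstance(spec.get("alternatives"), list):
            errors.append(f"{task}: 'alternatives' must be an array")
    return errors
```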
### Low-End Profile (`low_end.json`)

Target Hardware:
- NVIDIA GTX 1060 (6GB)
- NVIDIA GTX 1070 (8GB)
- NVIDIA RTX 3050 (8GB)
Model Criteria:
- Parameter count: < 3B
- VRAM usage with 4-bit: < 6GB
- Fast inference
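A rough rule of thumb behind these criteria: 4-bit quantization stores about 0.5 bytes per parameter, plus some fixed overhead for activations and the KV cache. The back-of-the-envelope estimator below is illustrative; the 1.5 GB overhead figure is an assumption, not a measured constant.

```python
def estimate_4bit_vram_gb(n_params_billion: float, overhead_gb: float = 1.5) -> float:
    """Rough 4-bit VRAM estimate: ~0.5 bytes/parameter plus fixed overhead (assumed)."""
    weights_gb = n_params_billion * 1e9 * 0.5 / 1024**3
    return weights_gb + overhead_gb


# A 3B model: ~1.4 GB of weights plus overhead, comfortably under the 6 GB low-end budget
print(f"{estimate_4bit_vram_gb(3):.1f} GB")
```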
Example:
```json
{
  "profile": "low_end",
  "tasks": {
    "text-generation": {
      "primary": "meta-llama/Llama-3.2-1B",
      "alternatives": [
        "microsoft/phi-2",
        "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
      ]
    },
    "summarization": {
      "primary": "facebook/bart-base",
      "alternatives": [
        "google-t5/t5-small",
        "sshleifer/distilbart-cnn-12-6"
      ]
    }
  }
}
```

### Mid-Range Profile (`mid_range.json`)

Target Hardware:
- NVIDIA RTX 3060 (12GB)
- NVIDIA RTX 2080 Ti (11GB)
- NVIDIA RTX 3080 (10-12GB)
Model Criteria:
- Parameter count: 3-7B
- VRAM usage with 4-bit: 6-12GB
- Good quality/performance balance
Example:
```json
{
  "profile": "mid_range",
  "tasks": {
    "text-generation": {
      "primary": "meta-llama/Llama-3.2-3B",
      "alternatives": [
        "mistralai/Mistral-7B-Instruct-v0.3",
        "microsoft/phi-3-mini-4k-instruct"
      ]
    },
    "summarization": {
      "primary": "facebook/bart-large",
      "alternatives": [
        "google-t5/t5-base",
        "philschmid/bart-large-cnn-samsum"
      ]
    }
  }
}
```

### High-End Profile (`high_end.json`)

Target Hardware:
- NVIDIA RTX 3090 (24GB)
- NVIDIA RTX 4090 (24GB)
- NVIDIA A100 (40-80GB)
Model Criteria:
- Parameter count: 7B+
- Highest quality
- State-of-the-art performance
Example:
```json
{
  "profile": "high_end",
  "tasks": {
    "text-generation": {
      "primary": "meta-llama/Llama-3.1-8B-Instruct",
      "alternatives": [
        "mistralai/Mistral-7B-Instruct-v0.3",
        "Qwen/Qwen2-7B-Instruct"
      ]
    },
    "summarization": {
      "primary": "facebook/bart-large-cnn",
      "alternatives": [
        "google-t5/t5-large",
        "google/pegasus-xsum"
      ]
    }
  }
}
```

## Adding a Model Recommendation

Determine which profile(s) the model fits:
- Test VRAM usage with 4-bit quantization
- Consider inference speed
- Evaluate output quality
Open the appropriate JSON file:
```bash
cd ModelForge/model_configs/
nano mid_range.json  # or low_end.json, high_end.json
```

As Primary (replaces the current default):
```json
{
  "text-generation": {
    "primary": "new-org/new-model-7b",  // Changed
    "alternatives": ["old-primary-model", "other-model"]
  }
}
```

As Alternative (adds to the list):
```json
{
  "text-generation": {
    "primary": "current-primary-model",
    "alternatives": [
      "existing-model-1",
      "existing-model-2",
      "new-org/new-model-7b"  // Added
    ]
  }
}
```

Ensure valid JSON syntax:
```bash
python -m json.tool mid_range.json
```

Test the change locally:

```bash
# Run ModelForge
modelforge
# Check that the model appears in recommendations
# Try training with the new model
```

Commit and open a pull request:

```bash
git checkout -b add-model-recommendations
git add ModelForge/model_configs/
git commit -m "feat: add new-model-7b to mid_range recommendations"
git push origin add-model-recommendations
```

Create a PR on GitHub with a description of:
- Model name and organization
- Why it's a good fit for this profile
- Test results (VRAM usage, quality, speed)
✅ DO include models that:
- Are publicly accessible on HuggingFace
- Have appropriate licenses (MIT, Apache 2.0, etc.)
- Perform well on relevant benchmarks
- Are actively maintained
- Have good documentation
- Work with standard Transformers library
❌ DON'T include models that:
- Are gated without clear access process
- Have restrictive licenses
- Are deprecated or unmaintained
- Require special dependencies
- Have known critical issues
- Are inappropriate for general use
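Parts of this checklist can be automated. The sketch below encodes the license and availability checks as a filter over a model-metadata dict; the dict shape and the helper are hypothetical (in practice you might populate the fields from `huggingface_hub.model_info`):

```python
# Licenses considered acceptable here are an illustrative allowlist, not an official ModelForge policy
ALLOWED_LICENSES = {"mit", "apache-2.0", "bsd-3-clause", "cc-by-4.0"}


def passes_inclusion_criteria(meta: dict) -> bool:
    """Apply the DO/DON'T checklist to a model-metadata dict.

    Expected keys (hypothetical shape): 'license', 'gated', 'deprecated'.
    """
    if meta.get("gated"):        # gated without a clear access process
        return False
    if meta.get("deprecated"):   # deprecated or unmaintained
        return False
    return meta.get("license", "").lower() in ALLOWED_LICENSES


print(passes_inclusion_criteria({"license": "apache-2.0", "gated": False, "deprecated": False}))  # True
print(passes_inclusion_criteria({"license": "proprietary", "gated": False, "deprecated": False}))  # False
```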
Verify VRAM usage:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model_id = "your-model/model-name"

# 4-bit quantization config
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

# Load model
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",
)

# Check VRAM usage (the nvidia_smi module ships with the nvidia-ml-py3 package)
import nvidia_smi

nvidia_smi.nvmlInit()
handle = nvidia_smi.nvmlDeviceGetHandleByIndex(0)
info = nvidia_smi.nvmlDeviceGetMemoryInfo(handle)
print(f"VRAM used: {info.used / 1024**3:.2f} GB")
```

Test training speed:
```python
from time import time

start = time()
# Run sample training here
duration = time() - start
print(f"Training time: {duration:.2f} seconds")
```

### Text Generation (`text-generation`)

- Focus on instruction-tuned models
- Prefer models with chat templates
- Consider context window size
- Test prompt following ability
Good examples:
- `meta-llama/Llama-3.2-3B-Instruct`
- `mistralai/Mistral-7B-Instruct-v0.3`
- `microsoft/phi-3-mini-4k-instruct`
### Summarization (`summarization`)

- Prefer models trained on summarization tasks
- Check ROUGE scores on standard benchmarks
- Consider domain (news, legal, medical, etc.)
Good examples:
- `facebook/bart-large-cnn`
- `google-t5/t5-base`
- `philschmid/bart-large-cnn-samsum`
### Question Answering (`extractive-question-answering`)

- Prefer models trained on QA datasets
- Check F1 and EM scores on SQuAD
- Consider retrieval-augmented use cases
Good examples:
- `deepset/roberta-base-squad2`
- `bert-large-uncased-whole-word-masking-finetuned-squad`
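The F1 and EM scores mentioned above are straightforward to compute for a single prediction. Below is a minimal token-level sketch; it is simplified relative to the official SQuAD evaluation script (no article stripping or punctuation normalization):

```python
from collections import Counter


def exact_match(pred: str, gold: str) -> bool:
    """EM: prediction matches the gold answer exactly (case-insensitive)."""
    return pred.strip().lower() == gold.strip().lower()


def f1_score(pred: str, gold: str) -> float:
    """Token-overlap F1 between prediction and gold answer."""
    pred_tokens = pred.lower().split()
    gold_tokens = gold.lower().split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


print(exact_match("Paris", "paris"))                   # True
print(round(f1_score("in Paris France", "Paris"), 2))  # 0.5
```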
List best alternatives first:
```json
{
  "alternatives": [
    "highest-quality-model",
    "good-quality-model",
    "acceptable-model"
  ]
}
```

Provide options with different trade-offs:
- Speed vs quality
- Size vs performance
- General vs specialized
Keep recommendations current:
- Remove deprecated models
- Add new state-of-the-art models
- Update based on community feedback
In the PR description, include:
- Benchmark results
- VRAM measurements
- Training speed tests
- Quality comparisons
Example PR description:

## Add Qwen2-7B to mid_range recommendations
**Model**: Qwen/Qwen2-7B-Instruct
**Profile**: mid_range (8-16GB VRAM)
**Task**: text-generation
**Tests**:
- VRAM usage (4-bit): 8.2 GB ✅
- Training speed: ~25 tokens/sec on RTX 3060 ✅
- Quality: Excellent instruction following
- License: Apache 2.0 ✅
**Benchmarks**:
- MMLU: 68.2
- GSM8K: 76.5
- HumanEval: 52.1
**Why add**:
- Performs better than current alternatives
- Efficient memory usage
- Strong multilingual support
- Active community support
**Changes**:
- Added to mid_range.json as alternative
- Tested successful fine-tuning

If the model doesn't appear in recommendations, check:
- JSON syntax is valid
- Profile name matches filename
- Model ID is correct
- Restart ModelForge
If the model fails to load, check:
- Model is publicly accessible
- HuggingFace token has permissions
- Model is compatible with Transformers
- No gating issues
- See Contributing Guide
- Ask in GitHub Discussions
- Check FAQ
Thank you for improving ModelForge's model recommendations! 🤖