Learn how to contribute model recommendations to ModelForge.
ModelForge uses a modular configuration system for model recommendations. Each hardware profile has its own JSON configuration file specifying recommended models for different tasks.
```
ModelForge/model_configs/
├── low_end.json     # 4-8GB VRAM
├── mid_range.json   # 8-16GB VRAM
└── high_end.json    # 16GB+ VRAM
```
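The VRAM boundaries in the tree above can be expressed as a small helper that selects the right config file for the detected hardware. This is an illustrative sketch, not ModelForge's actual API: the function names and the `config_dir` parameter are hypothetical.

```python
import json
from pathlib import Path


def profile_for_vram(vram_gb: float) -> str:
    """Map available VRAM (in GB) to a hardware profile name (thresholds from the tree above)."""
    if vram_gb < 8:
        return "low_end"
    if vram_gb < 16:
        return "mid_range"
    return "high_end"


def load_profile(config_dir: str, vram_gb: float) -> dict:
    """Load the JSON config matching the detected VRAM (hypothetical helper)."""
    path = Path(config_dir) / f"{profile_for_vram(vram_gb)}.json"
    return json.loads(path.read_text())


print(profile_for_vram(6))   # low_end
print(profile_for_vram(12))  # mid_range
print(profile_for_vram(24))  # high_end
```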
Each configuration file follows this structure:
```json
{
  "profile": "profile_name",
  "tasks": {
    "task_name": {
      "primary": "best_model_id",
      "alternatives": ["model1", "model2", "model3"]
    }
  }
}
```

- `profile` (string): Hardware profile name (must match the filename without `.json`)
- `tasks` (object): Task configurations, keyed by task name
  - `task_name` (string): One of `text-generation`, `summarization`, or `extractive-question-answering`
  - `primary` (string): Default recommended model for this task/profile
  - `alternatives` (array): List of additional recommended models
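The rules above (profile matches filename, known task names, correct value types) can be sanity-checked with a short script. This is an illustrative helper, not part of ModelForge itself:

```python
import json
from pathlib import Path

VALID_TASKS = {"text-generation", "summarization", "extractive-question-answering"}


def validate_config(path) -> list:
    """Return a list of problems found in a profile config file (empty list = OK)."""
    path = Path(path)
    config = json.loads(path.read_text())
    errors = []
    # "profile" must match the filename without .json
    if config.get("profile") != path.stem:
        errors.append(f"profile {config.get('profile')!r} != filename {path.stem!r}")
    for task, spec in config.get("tasks", {}).items():
        if task not in VALID_TASKS:
            errors.append(f"unknown task: {task}")
        if not isinstance(spec.get("primary"), str):
            errors.append(f"{task}: 'primary' must be a string")
        if not isinstance(spec.get("alternatives"), list):
            errors.append(f"{task}: 'alternatives' must be an array")
    return errors
```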
### Low-End Profile (`low_end.json`)

Target Hardware:
- NVIDIA GTX 1060 (6GB)
- NVIDIA GTX 1070 (8GB)
- NVIDIA RTX 3050 (8GB)
Model Criteria:
- Parameter count: < 3B
- VRAM usage with 4-bit: < 6GB
- Fast inference
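A rough rule of thumb behind these criteria: 4-bit quantization stores about 0.5 bytes per parameter, plus some fixed overhead for activations and the KV cache. The back-of-the-envelope estimator below is illustrative; the 1.5 GB overhead figure is an assumption, not a measured constant.

```python
def estimate_4bit_vram_gb(n_params_billion: float, overhead_gb: float = 1.5) -> float:
    """Rough 4-bit VRAM estimate: ~0.5 bytes/parameter plus fixed overhead (assumed)."""
    weights_gb = n_params_billion * 1e9 * 0.5 / 1024**3
    return weights_gb + overhead_gb


# A 3B model: ~1.4 GB of weights plus overhead, comfortably under the 6 GB low-end budget
print(f"{estimate_4bit_vram_gb(3):.1f} GB")
```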
Example:
```json
{
  "profile": "low_end",
  "tasks": {
    "text-generation": {
      "primary": "meta-llama/Llama-3.2-1B",
      "alternatives": [
        "microsoft/phi-2",
        "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
      ]
    },
    "summarization": {
      "primary": "facebook/bart-base",
      "alternatives": [
        "google-t5/t5-small",
        "sshleifer/distilbart-cnn-12-6"
      ]
    }
  }
}
```

### Mid-Range Profile (`mid_range.json`)

Target Hardware:
- NVIDIA RTX 3060 (12GB)
- NVIDIA RTX 2080 Ti (11GB)
- NVIDIA RTX 3080 (10-12GB)
Model Criteria:
- Parameter count: 3-7B
- VRAM usage with 4-bit: 6-12GB
- Good quality/performance balance
Example:
```json
{
  "profile": "mid_range",
  "tasks": {
    "text-generation": {
      "primary": "meta-llama/Llama-3.2-3B",
      "alternatives": [
        "mistralai/Mistral-7B-Instruct-v0.3",
        "microsoft/phi-3-mini-4k-instruct"
      ]
    },
    "summarization": {
      "primary": "facebook/bart-large",
      "alternatives": [
        "google-t5/t5-base",
        "philschmid/bart-large-cnn-samsum"
      ]
    }
  }
}
```

### High-End Profile (`high_end.json`)

Target Hardware:
- NVIDIA RTX 3090 (24GB)
- NVIDIA RTX 4090 (24GB)
- NVIDIA A100 (40-80GB)
Model Criteria:
- Parameter count: 7B+
- Highest quality
- State-of-the-art performance
Example:
```json
{
  "profile": "high_end",
  "tasks": {
    "text-generation": {
      "primary": "meta-llama/Llama-3.1-8B-Instruct",
      "alternatives": [
        "mistralai/Mistral-7B-Instruct-v0.3",
        "Qwen/Qwen2-7B-Instruct"
      ]
    },
    "summarization": {
      "primary": "facebook/bart-large-cnn",
      "alternatives": [
        "google-t5/t5-large",
        "google/pegasus-xsum"
      ]
    }
  }
}
```

## Adding a Model Recommendation

Determine which profile(s) the model fits:
- Test VRAM usage with 4-bit quantization
- Consider inference speed
- Evaluate output quality
Open the appropriate JSON file:
```bash
cd ModelForge/model_configs/
nano mid_range.json  # or low_end.json, high_end.json
```

As Primary (replaces the current default):
```json
{
  "text-generation": {
    "primary": "new-org/new-model-7b",  // Changed
    "alternatives": ["old-primary-model", "other-model"]
  }
}
```

As Alternative (adds to the list):
```json
{
  "text-generation": {
    "primary": "current-primary-model",
    "alternatives": [
      "existing-model-1",
      "existing-model-2",
      "new-org/new-model-7b"  // Added
    ]
  }
}
```

Ensure valid JSON syntax:
```bash
python -m json.tool mid_range.json
```

Test the change locally:

```bash
# Run ModelForge
modelforge
# Check that the model appears in recommendations
# Try training with the new model
```

Commit and open a pull request:

```bash
git checkout -b add-model-recommendations
git add ModelForge/model_configs/
git commit -m "feat: add new-model-7b to mid_range recommendations"
git push origin add-model-recommendations
```

Create a PR on GitHub with a description of:
- Model name and organization
- Why it's a good fit for this profile
- Test results (VRAM usage, quality, speed)
✅ DO include models that:
- Are publicly accessible on HuggingFace
- Have appropriate licenses (MIT, Apache 2.0, etc.)
- Perform well on relevant benchmarks
- Are actively maintained
- Have good documentation
- Work with standard Transformers library
❌ DON'T include models that:
- Are gated without clear access process
- Have restrictive licenses
- Are deprecated or unmaintained
- Require special dependencies
- Have known critical issues
- Are inappropriate for general use
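Parts of this checklist can be automated. The sketch below encodes the license and availability checks as a filter over a model-metadata dict; the dict shape and the helper are hypothetical (in practice you might populate the fields from `huggingface_hub.model_info`):

```python
# Licenses considered acceptable here are an illustrative allowlist, not an official ModelForge policy
ALLOWED_LICENSES = {"mit", "apache-2.0", "bsd-3-clause", "cc-by-4.0"}


def passes_inclusion_criteria(meta: dict) -> bool:
    """Apply the DO/DON'T checklist to a model-metadata dict.

    Expected keys (hypothetical shape): 'license', 'gated', 'deprecated'.
    """
    if meta.get("gated"):        # gated without a clear access process
        return False
    if meta.get("deprecated"):   # deprecated or unmaintained
        return False
    return meta.get("license", "").lower() in ALLOWED_LICENSES


print(passes_inclusion_criteria({"license": "apache-2.0", "gated": False, "deprecated": False}))  # True
print(passes_inclusion_criteria({"license": "proprietary", "gated": False, "deprecated": False}))  # False
```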
Verify VRAM usage:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model_id = "your-model/model-name"

# 4-bit quantization config
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

# Load model
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",
)

# Check VRAM usage (the nvidia_smi module ships with the nvidia-ml-py3 package)
import nvidia_smi

nvidia_smi.nvmlInit()
handle = nvidia_smi.nvmlDeviceGetHandleByIndex(0)
info = nvidia_smi.nvmlDeviceGetMemoryInfo(handle)
print(f"VRAM used: {info.used / 1024**3:.2f} GB")
```

Test training speed:
```python
from time import time

start = time()
# Run sample training here
duration = time() - start
print(f"Training time: {duration:.2f} seconds")
```

### Text Generation (`text-generation`)

- Focus on instruction-tuned models
- Prefer models with chat templates
- Consider context window size
- Test prompt following ability
Good examples:
- `meta-llama/Llama-3.2-3B-Instruct`
- `mistralai/Mistral-7B-Instruct-v0.3`
- `microsoft/phi-3-mini-4k-instruct`
### Summarization (`summarization`)

- Prefer models trained on summarization tasks
- Check ROUGE scores on standard benchmarks
- Consider domain (news, legal, medical, etc.)
Good examples:
- `facebook/bart-large-cnn`
- `google-t5/t5-base`
- `philschmid/bart-large-cnn-samsum`
### Question Answering (`extractive-question-answering`)

- Prefer models trained on QA datasets
- Check F1 and EM scores on SQuAD
- Consider retrieval-augmented use cases
Good examples:
- `deepset/roberta-base-squad2`
- `bert-large-uncased-whole-word-masking-finetuned-squad`
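The F1 and EM scores mentioned above are straightforward to compute for a single prediction. Below is a minimal token-level sketch; it is simplified relative to the official SQuAD evaluation script (no article stripping or punctuation normalization):

```python
from collections import Counter


def exact_match(pred: str, gold: str) -> bool:
    """EM: prediction matches the gold answer exactly (case-insensitive)."""
    return pred.strip().lower() == gold.strip().lower()


def f1_score(pred: str, gold: str) -> float:
    """Token-overlap F1 between prediction and gold answer."""
    pred_tokens = pred.lower().split()
    gold_tokens = gold.lower().split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


print(exact_match("Paris", "paris"))                   # True
print(round(f1_score("in Paris France", "Paris"), 2))  # 0.5
```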
List best alternatives first:
```json
{
  "alternatives": [
    "highest-quality-model",
    "good-quality-model",
    "acceptable-model"
  ]
}
```

Provide options with different trade-offs:
- Speed vs quality
- Size vs performance
- General vs specialized
Keep recommendations current:
- Remove deprecated models
- Add new state-of-the-art models
- Update based on community feedback
In the PR description, include:
- Benchmark results
- VRAM measurements
- Training speed tests
- Quality comparisons
Example PR description:

## Add Qwen2-7B to mid_range recommendations
**Model**: Qwen/Qwen2-7B-Instruct
**Profile**: mid_range (8-16GB VRAM)
**Task**: text-generation
**Tests**:
- VRAM usage (4-bit): 8.2 GB ✅
- Training speed: ~25 tokens/sec on RTX 3060 ✅
- Quality: Excellent instruction following
- License: Apache 2.0 ✅
**Benchmarks**:
- MMLU: 68.2
- GSM8K: 76.5
- HumanEval: 52.1
**Why add**:
- Performs better than current alternatives
- Efficient memory usage
- Strong multilingual support
- Active community support
**Changes**:
- Added to mid_range.json as alternative
- Tested successful fine-tuning

If the model doesn't appear in recommendations, check:
- JSON syntax is valid
- Profile name matches filename
- Model ID is correct
- Restart ModelForge
If the model fails to load, check:
- Model is publicly accessible
- HuggingFace token has permissions
- Model is compatible with Transformers
- No gating issues
- See Contributing Guide
- Ask in GitHub Discussions
- Check FAQ
Thank you for improving ModelForge's model recommendations! 🤖