Adding Model Configurations

Learn how to contribute model recommendations to ModelForge.

Overview

ModelForge uses a modular configuration system for model recommendations. Each hardware profile has its own JSON configuration file specifying recommended models for different tasks.

Directory Structure

ModelForge/model_configs/
├── low_end.json         # 4-8GB VRAM
├── mid_range.json       # 8-16GB VRAM
└── high_end.json        # 16GB+ VRAM
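To illustrate how the VRAM ranges map to these files, here is a minimal sketch that picks a profile name from a VRAM figure. The function name `profile_for_vram` is hypothetical, and the boundary handling is an assumption: 8 GB cards count as low_end (matching the target hardware listed for that profile), 16 GB counts as high_end, and anything under 4 GB has no profile. ModelForge's own detection logic may differ.

```python
from typing import Optional


def profile_for_vram(vram_gb: float) -> Optional[str]:
    """Map available VRAM (in GB) to a profile name, or None if below 4 GB."""
    if vram_gb < 4:
        return None          # assumption: no profile covers this hardware
    if vram_gb <= 8:
        return "low_end"     # 4-8GB VRAM
    if vram_gb < 16:
        return "mid_range"   # 8-16GB VRAM
    return "high_end"        # 16GB+ VRAM


for gb in (6, 12, 24):
    print(gb, "->", profile_for_vram(gb))
```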

Configuration Schema

Each configuration file follows this structure:

{
  "profile": "profile_name",
  "tasks": {
    "task_name": {
      "primary": "best_model_id",
      "alternatives": ["model1", "model2", "model3"]
    }
  }
}

Fields

  • profile (string): Hardware profile name (must match filename without .json)
  • tasks (object): Task configurations
    • task_name (string): One of: text-generation, summarization, extractive-question-answering
    • primary (string): Default recommended model for this task/profile
    • alternatives (array): List of additional recommended models
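The field rules above can be checked mechanically. The following is a minimal sketch of such a check, not ModelForge's own validation code: the function name `validate_config` and the `KNOWN_TASKS` constant are hypothetical, and the rules are taken only from the field descriptions in this section.

```python
import json
from pathlib import Path

# Task names listed in the schema description above.
KNOWN_TASKS = {"text-generation", "summarization", "extractive-question-answering"}


def validate_config(config: dict, filename: str) -> list[str]:
    """Return a list of human-readable problems; an empty list means no problems found."""
    problems = []
    expected_profile = Path(filename).stem  # filename without .json
    if config.get("profile") != expected_profile:
        problems.append(f"profile should be {expected_profile!r}")
    tasks = config.get("tasks")
    if not isinstance(tasks, dict):
        return problems + ["tasks must be an object"]
    for name, entry in tasks.items():
        if name not in KNOWN_TASKS:
            problems.append(f"unknown task {name!r}")
        if not isinstance(entry, dict):
            problems.append(f"{name}: entry must be an object")
            continue
        if not isinstance(entry.get("primary"), str):
            problems.append(f"{name}: primary must be a string")
        alts = entry.get("alternatives")
        if not (isinstance(alts, list) and all(isinstance(a, str) for a in alts)):
            problems.append(f"{name}: alternatives must be an array of strings")
    return problems


sample = json.loads(
    '{"profile": "low_end", "tasks": {"summarization": '
    '{"primary": "facebook/bart-base", "alternatives": []}}}'
)
print(validate_config(sample, "low_end.json"))  # prints []
```

Returning a list of problems instead of raising on the first one lets a contributor fix everything in a single pass.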

Hardware Profiles

low_end.json (4-8GB VRAM)

Target Hardware:

  • NVIDIA GTX 1060 (6GB)
  • NVIDIA GTX 1070 (8GB)
  • NVIDIA RTX 3050 (8GB)

Model Criteria:

  • Parameter count: < 3B
  • VRAM usage with 4-bit: < 6GB
  • Fast inference

Example:

{
  "profile": "low_end",
  "tasks": {
    "text-generation": {
      "primary": "meta-llama/Llama-3.2-1B",
      "alternatives": [
        "microsoft/phi-2",
        "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
      ]
    },
    "summarization": {
      "primary": "facebook/bart-base",
      "alternatives": [
        "google-t5/t5-small",
        "sshleifer/distilbart-cnn-12-6"
      ]
    }
  }
}

mid_range.json (8-16GB VRAM)

Target Hardware:

  • NVIDIA RTX 3060 (12GB)
  • NVIDIA RTX 2080 Ti (11GB)
  • NVIDIA RTX 3080 (10-12GB)

Model Criteria:

  • Parameter count: 3-7B
  • VRAM usage with 4-bit: 6-12GB
  • Good quality/performance balance

Example:

{
  "profile": "mid_range",
  "tasks": {
    "text-generation": {
      "primary": "meta-llama/Llama-3.2-3B",
      "alternatives": [
        "mistralai/Mistral-7B-Instruct-v0.3",
        "microsoft/phi-3-mini-4k-instruct"
      ]
    },
    "summarization": {
      "primary": "facebook/bart-large",
      "alternatives": [
        "google-t5/t5-base",
        "philschmid/bart-large-cnn-samsum"
      ]
    }
  }
}

high_end.json (16GB+ VRAM)

Target Hardware:

  • NVIDIA RTX 3090 (24GB)
  • NVIDIA RTX 4090 (24GB)
  • NVIDIA A100 (40-80GB)

Model Criteria:

  • Parameter count: 7B+
  • Highest quality
  • State-of-the-art performance

Example:

{
  "profile": "high_end",
  "tasks": {
    "text-generation": {
      "primary": "meta-llama/Llama-3.1-8B-Instruct",
      "alternatives": [
        "mistralai/Mistral-7B-Instruct-v0.3",
        "Qwen/Qwen2-7B-Instruct"
      ]
    },
    "summarization": {
      "primary": "facebook/bart-large-cnn",
      "alternatives": [
        "google-t5/t5-large",
        "google/pegasus-xsum"
      ]
    }
  }
}

Adding Models

1. Identify Hardware Profile

Determine which profile(s) the model fits:

  • Test VRAM usage with 4-bit quantization
  • Consider inference speed
  • Evaluate output quality
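As a first pass before measuring anything, the parameter-count criteria from the Hardware Profiles section can be turned into a quick lookup. This is a sketch under assumptions: `profiles_for_params` is a hypothetical helper, it uses the nominal size in billions of parameters, and it treats the 3B and 7B boundaries as inclusive, so a 7B model is offered to both mid_range and high_end. Measured VRAM usage should still have the final say.

```python
def profiles_for_params(params_billion: float) -> list[str]:
    """Return candidate profiles for a model of the given nominal size."""
    profiles = []
    if params_billion < 3:
        profiles.append("low_end")    # criteria: < 3B
    if 3 <= params_billion <= 7:
        profiles.append("mid_range")  # criteria: 3-7B
    if params_billion >= 7:
        profiles.append("high_end")   # criteria: 7B+
    return profiles


for size in (1, 3, 7, 8):
    print(f"{size}B ->", profiles_for_params(size))
```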

2. Edit Configuration File

Open the appropriate JSON file:

cd ModelForge/model_configs/
nano mid_range.json  # or low_end.json, high_end.json

3. Add Model

As Primary (new-org/new-model-7b replaces the current default):

{
  "text-generation": {
    "primary": "new-org/new-model-7b",
    "alternatives": ["old-primary-model", "other-model"]
  }
}

As Alternative (new-org/new-model-7b is added to the list):

{
  "text-generation": {
    "primary": "current-primary-model",
    "alternatives": [
      "existing-model-1",
      "existing-model-2",
      "new-org/new-model-7b"
    ]
  }
}
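The same two edits can be scripted, which avoids hand-editing mistakes such as stray commas or duplicate entries. This is only a sketch: `add_alternative` and `promote_to_primary` are hypothetical helpers, not part of ModelForge, and they operate on a config that has already been loaded into a dict.

```python
import json


def add_alternative(config: dict, task: str, model_id: str) -> bool:
    """Append model_id to the task's alternatives; return False if it is already listed."""
    entry = config["tasks"][task]
    if model_id == entry["primary"] or model_id in entry["alternatives"]:
        return False
    entry["alternatives"].append(model_id)
    return True


def promote_to_primary(config: dict, task: str, model_id: str) -> None:
    """Make model_id the primary and keep the old primary as the first alternative."""
    entry = config["tasks"][task]
    old_primary = entry["primary"]
    if model_id == old_primary:
        return
    remaining = [m for m in entry["alternatives"] if m != model_id]
    entry["primary"] = model_id
    entry["alternatives"] = [old_primary] + remaining


config = {
    "profile": "mid_range",
    "tasks": {
        "text-generation": {
            "primary": "current-primary-model",
            "alternatives": ["existing-model-1"],
        }
    },
}
add_alternative(config, "text-generation", "new-org/new-model-7b")
print(json.dumps(config, indent=2))
```

To write the result back, `json.dump(config, f, indent=2)` on the opened profile file would be enough.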

4. Validate JSON

Ensure valid JSON syntax:

python -m json.tool mid_range.json
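To check every profile file in one go, a short script along these lines may help. It is a sketch, not a ModelForge tool: `find_invalid_json` is a hypothetical name, and the demonstration runs on a throwaway directory rather than on ModelForge/model_configs/.

```python
import json
import tempfile
from pathlib import Path


def find_invalid_json(config_dir: Path) -> dict[str, str]:
    """Map each unparsable *.json filename in config_dir to a short error message."""
    errors = {}
    for path in sorted(config_dir.glob("*.json")):
        try:
            json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            errors[path.name] = f"line {exc.lineno}, column {exc.colno}: {exc.msg}"
    return errors


# Demonstration on temporary files; point the function at the real config directory in practice.
with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = Path(tmp)
    (tmp_dir / "good.json").write_text('{"profile": "good", "tasks": {}}', encoding="utf-8")
    (tmp_dir / "bad.json").write_text('{"profile": "bad", }', encoding="utf-8")  # trailing comma
    demo_errors = find_invalid_json(tmp_dir)
    print(demo_errors)
```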

5. Test Locally

# Run ModelForge
modelforge

# Check that model appears in recommendations
# Try training with the new model

6. Submit Pull Request

git checkout -b add-model-recommendations
git add ModelForge/model_configs/
git commit -m "feat: add new-model-7b to mid_range recommendations"
git push origin add-model-recommendations

Open a pull request on GitHub and describe:

  • Model name and organization
  • Why it's a good fit for this profile
  • Test results (VRAM usage, quality, speed)

Model Selection Criteria

Quality Criteria

DO include models that:

  • Are publicly accessible on Hugging Face
  • Have appropriate licenses (MIT, Apache 2.0, etc.)
  • Perform well on relevant benchmarks
  • Are actively maintained
  • Have good documentation
  • Work with the standard Transformers library

DON'T include models that:

  • Are gated without a clear access process
  • Have restrictive licenses
  • Are deprecated or unmaintained
  • Require special dependencies
  • Have known critical issues
  • Are inappropriate for general use

Hardware Compatibility

Verify VRAM usage:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model_id = "your-model/model-name"

# 4-bit quantization config
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16
)

# Load model
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto"
)

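# Check VRAM usage on GPU 0. The figure is device-wide, so it also counts
# memory held by other processes; run this on an otherwise idle GPU.
import pynvml  # provided by the nvidia-ml-py package
pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
info = pynvml.nvmlDeviceGetMemoryInfo(handle)
print(f"VRAM used: {info.used / 1024**3:.2f} GB")
pynvml.nvmlShutdown()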
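Before loading a model at all, a back-of-the-envelope figure can indicate whether a measurement is worth running. The sketch below computes only the weight storage at 4 bits (0.5 bytes) per parameter. The helper name `estimate_4bit_weight_gb` is hypothetical, and the result is a lower bound: it ignores quantization metadata, activations, the CUDA context, and anything needed for training, so measured usage will be higher.

```python
def estimate_4bit_weight_gb(params_billion: float) -> float:
    """Weights-only footprint at 4 bits per parameter, in units of 1024**3 bytes."""
    bytes_total = params_billion * 1e9 * 0.5  # 4 bits = 0.5 bytes per parameter
    return bytes_total / 1024**3


for size in (1, 3, 7):
    print(f"{size}B params -> ~{estimate_4bit_weight_gb(size):.2f} GB of weights")
```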

Performance Testing

Test training speed:

from time import perf_counter

start = perf_counter()  # monotonic clock, suited to measuring elapsed time
# Run sample training here
duration = perf_counter() - start

print(f"Training time: {duration:.2f} seconds")
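If a tokens-per-second figure is wanted for the PR description, the timing above can be wrapped in a small helper. This is a sketch with assumed names: `time_call` is hypothetical, and `fake_training_step` is a stand-in workload that should be replaced by a real training step returning the number of tokens it processed.

```python
from time import perf_counter
from typing import Callable


def time_call(fn: Callable[[], int]) -> tuple[float, float]:
    """Run fn once; fn must return the number of tokens it processed.

    Returns (elapsed_seconds, tokens_per_second).
    """
    start = perf_counter()
    tokens = fn()
    elapsed = perf_counter() - start
    rate = tokens / elapsed if elapsed > 0 else float("inf")
    return elapsed, rate


def fake_training_step() -> int:
    """Placeholder workload, not real training."""
    sum(range(100_000))
    return 512  # pretend 512 tokens were processed


elapsed, rate = time_call(fake_training_step)
print(f"Training time: {elapsed:.4f} seconds ({rate:.0f} tokens/sec)")
```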

Task-Specific Considerations

Text Generation

  • Focus on instruction-tuned models
  • Prefer models with chat templates
  • Consider context window size
  • Test prompt following ability

Good examples:

  • meta-llama/Llama-3.2-3B-Instruct
  • mistralai/Mistral-7B-Instruct-v0.3
  • microsoft/phi-3-mini-4k-instruct

Summarization

  • Prefer models trained on summarization tasks
  • Check ROUGE scores on standard benchmarks
  • Consider domain (news, legal, medical, etc.)

Good examples:

  • facebook/bart-large-cnn
  • google-t5/t5-base
  • philschmid/bart-large-cnn-samsum

Question Answering

  • Prefer models trained on QA datasets
  • Check F1 and EM scores on SQuAD
  • Consider retrieval-augmented use cases

Good examples:

  • deepset/roberta-base-squad2
  • bert-large-uncased-whole-word-masking-finetuned-squad

Best Practices

1. Order Alternatives by Quality

List best alternatives first:

{
  "alternatives": [
    "highest-quality-model",
    "good-quality-model",
    "acceptable-model"
  ]
}
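When quality has been reduced to a single number per model, the ordering can be produced rather than maintained by hand. The sketch below assumes such scores exist: `order_alternatives` is a hypothetical helper and the scores shown are made-up placeholders, not benchmark results.

```python
def order_alternatives(scores: dict[str, float]) -> list[str]:
    """Return model IDs sorted from highest to lowest score."""
    return sorted(scores, key=lambda model_id: scores[model_id], reverse=True)


# Placeholder scores for illustration only.
example_scores = {
    "acceptable-model": 0.61,
    "highest-quality-model": 0.83,
    "good-quality-model": 0.74,
}
print(order_alternatives(example_scores))
```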

2. Include Diversity

Provide options with different trade-offs:

  • Speed vs quality
  • Size vs performance
  • General vs specialized

3. Keep Updated

  • Remove deprecated models
  • Add new state-of-the-art models
  • Update based on community feedback

4. Document Changes

In PR description, include:

  • Benchmark results
  • VRAM measurements
  • Training speed tests
  • Quality comparisons

Example PR Description

## Add Qwen2-7B to mid_range recommendations

**Model**: Qwen/Qwen2-7B-Instruct

**Profile**: mid_range (8-16GB VRAM)

**Task**: text-generation

**Tests**:
- VRAM usage (4-bit): 8.2 GB ✅
- Training speed: ~25 tokens/sec on RTX 3060 ✅
- Quality: Excellent instruction following
- License: Apache 2.0 ✅

**Benchmarks**:
- MMLU: 68.2
- GSM8K: 76.5
- HumanEval: 52.1

**Why add**:
- Performs better than current alternatives
- Efficient memory usage
- Strong multilingual support
- Active community support

**Changes**:
- Added to mid_range.json as alternative
- Tested successful fine-tuning

Troubleshooting

Model Not Appearing

Check:

  1. The JSON syntax is valid
  2. The profile field matches the filename (without .json)
  3. The model ID is spelled exactly as on Hugging Face
  4. ModelForge has been restarted since the change

Model Fails to Load

Check:

  1. The model is publicly accessible
  2. Your Hugging Face token has permission to access the model
  3. The model is compatible with Transformers
  4. Access has been granted if the model is gated

Questions?


Thank you for improving ModelForge's model recommendations! 🤖