Add model quality scoring system with improvement suggestions #104

@noahgift

Summary

Implement a model quality scoring system for trained ML models, inspired by the dataset quality scoring in alimentar (GH-6) and rust-project-score in paiml-mcp-agent-toolkit.

Toyota Way Alignment

  • Jidoka: Build quality in at every step
  • Andon: Visual management for immediate status understanding
  • Standard Work: Define "what good looks like" for model quality

Features

100-Point Weighted Scoring

Severity   Weight   Examples
Critical   2.0x     Model loads successfully, no NaN weights
High       1.5x     Reasonable parameter count, no zero-variance layers
Medium     1.0x     Balanced layer sizes, no dead neurons
Low        0.5x     Documentation present, metadata complete
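The weighting could be sketched roughly as follows. This is only an illustration: the `Severity`, `CheckResult`, and `weighted_score` names are hypothetical and not part of aprender, and it assumes the final score is earned weight divided by total weight, scaled to 100.

```rust
/// Severity levels from the table, each with its weight multiplier.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Severity {
    Critical,
    High,
    Medium,
    Low,
}

impl Severity {
    /// Weight multiplier as listed in this issue.
    pub fn weight(self) -> f64 {
        match self {
            Severity::Critical => 2.0,
            Severity::High => 1.5,
            Severity::Medium => 1.0,
            Severity::Low => 0.5,
        }
    }
}

/// Outcome of one checklist item (hypothetical type for illustration).
pub struct CheckResult {
    pub severity: Severity,
    pub passed: bool,
}

/// Returns (earned points, maximum points, score on a 0-100 scale).
/// Assumption: score = 100 * earned weight / total weight.
pub fn weighted_score(results: &[CheckResult]) -> (f64, f64, f64) {
    let max: f64 = results.iter().map(|r| r.severity.weight()).sum();
    let earned: f64 = results
        .iter()
        .filter(|r| r.passed)
        .map(|r| r.severity.weight())
        .sum();
    // An empty checklist scores 0 rather than dividing by zero.
    let score = if max > 0.0 { 100.0 * earned / max } else { 0.0 };
    (earned, max, score)
}
```

Under this assumption, a failed Critical check costs four times as much as a failed Low check, so a model can miss some documentation items and still land in the top grade band.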

Letter Grades

  • A (95-100): Deploy immediately
  • B (85-94): Deploy with monitoring
  • C (70-84): Review before deployment
  • D (50-69): Significant issues
  • F (<50): Do not deploy
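A minimal sketch of the grade mapping, assuming each band's lower bound is inclusive. The bands above are written as integers, so a fractional score such as 94.5 is not covered explicitly; this sketch treats it as a B. The `Grade` type and its methods are hypothetical names.

```rust
/// Letter grades as defined in this issue.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Grade {
    A,
    B,
    C,
    D,
    F,
}

impl Grade {
    /// Maps a 0-100 score to a grade.
    /// Assumption: lower bounds are inclusive, so 95.0 is an A and 94.9 is a B.
    /// A NaN score fails every comparison and therefore falls through to F.
    pub fn from_score(score: f64) -> Grade {
        if score >= 95.0 {
            Grade::A
        } else if score >= 85.0 {
            Grade::B
        } else if score >= 70.0 {
            Grade::C
        } else if score >= 50.0 {
            Grade::D
        } else {
            Grade::F
        }
    }

    /// Deployment decision text taken from the grade list.
    pub fn decision(self) -> &'static str {
        match self {
            Grade::A => "Deploy immediately",
            Grade::B => "Deploy with monitoring",
            Grade::C => "Review before deployment",
            Grade::D => "Significant issues",
            Grade::F => "Do not deploy",
        }
    }
}
```

For example, `Grade::from_score(97.5)` yields `Grade::A`, whose `decision()` is "Deploy immediately".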

CLI Interface

# Basic quality score
aprender model score model.apr

# With improvement suggestions
aprender model score model.apr --suggest

# JSON output for CI/CD
aprender model score model.apr --json

# Badge URL for README
aprender model score model.apr --badge

Checklist Items (Draft)

Critical:

  1. Model file loads without errors
  2. No NaN/Inf values in weights
  3. Architecture is valid (layers connect properly)

High:

  4. Parameter count is reasonable (<1B for most use cases)
  5. No zero-variance layers (dead layers)
  6. Input/output dimensions documented

Medium:

  7. Layer sizes are balanced (no extreme bottlenecks)
  8. Activation functions are appropriate
  9. Model size is reasonable for deployment target
  10. Training metadata present (epochs, loss, etc.)

Low:

  11. Model description/documentation present
  12. Version information included
  13. Compatible format version
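As a rough illustration of how items 2 and 5 might be checked on a single layer's weights, here is a sketch that assumes weights are available as a flat `f32` slice. The function names and the `eps` tolerance are hypothetical, and treating an empty layer as zero-variance is an assumption of this sketch.

```rust
/// Item 2: true if every weight is finite (no NaN and no +/- Inf).
pub fn all_finite(weights: &[f32]) -> bool {
    weights.iter().all(|w| w.is_finite())
}

/// Population variance of a layer's weights, or None for an empty slice.
/// Accumulates in f64 to limit rounding error on large layers.
pub fn variance(weights: &[f32]) -> Option<f64> {
    if weights.is_empty() {
        return None;
    }
    let n = weights.len() as f64;
    let mean = weights.iter().map(|&w| w as f64).sum::<f64>() / n;
    let sum_sq = weights
        .iter()
        .map(|&w| {
            let d = w as f64 - mean;
            d * d
        })
        .sum::<f64>();
    Some(sum_sq / n)
}

/// Item 5: flags a layer whose weights are constant up to the tolerance `eps`.
/// Assumption: an empty layer is also reported as zero-variance.
pub fn is_zero_variance(weights: &[f32], eps: f64) -> bool {
    variance(weights).map_or(true, |v| v <= eps)
}
```

Running the finiteness check first is a reasonable ordering, because a NaN weight makes the variance NaN as well, and a NaN variance never compares as less than or equal to `eps`.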

Output Format

Text (Andon-style)

═══════════════════════════════════════════════════════════════
  Model Quality Score: ✓ A (97.5%)  
  Decision: Deploy immediately  
═══════════════════════════════════════════════════════════════

File: model.apr
Points: 19.5 / 20.0

Severity Breakdown:
  ✓ Critical: 3/3 passed (6.0/6.0 pts)
  ✓ High    : 3/3 passed (4.5/4.5 pts)
  ✓ Medium  : 4/4 passed (4.0/4.0 pts)
  ✗ Low     : 10/11 passed (5.0/5.5 pts)

JSON

{
  "score": 97.50,
  "grade": "A",
  "is_deployable": true,
  "decision": "Deploy immediately",
  "badge_url": "https://img.shields.io/badge/model_quality-A_(98%25)-brightgreen"
}
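The `badge_url` value could be built along these lines. In shields.io static badge paths an underscore renders as a space and a literal percent sign has to be written as `%25`. The function name and the color chosen for every grade other than A are assumptions; only the `brightgreen` A badge appears in this issue.

```rust
/// Builds a shields.io static badge URL for a grade and a 0-100 score.
/// Hypothetical helper: colors for B, C, D, and F are illustrative choices.
pub fn badge_url(grade: char, score: f64) -> String {
    let color = match grade {
        'A' => "brightgreen",
        'B' => "green",
        'C' => "yellow",
        'D' => "orange",
        _ => "red",
    };
    // Round half away from zero so that 97.5 is shown as 98.
    let pct = score.round() as i64;
    // Path format is label-message-color; "%25" is the URL-encoded "%".
    format!(
        "https://img.shields.io/badge/model_quality-{}_({}%25)-{}",
        grade, pct, color
    )
}
```

Calling `badge_url('A', 97.5)` reproduces the URL shown in the JSON example.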

Acceptance Criteria

  • Score calculation matches 100-point checklist weights
  • JSON output format validated
  • CLI commands implemented (aprender model score)
  • Badge URL generation (shields.io compatible)
  • Works with .apr, SafeTensors, and GGUF formats
  • Integration with make quality target

🤖 Generated with Claude Code

Labels: enhancement (New feature or request)
