Summary
Implement a model quality scoring system for trained ML models, inspired by the dataset quality scoring in alimentar (GH-6) and rust-project-score in paiml-mcp-agent-toolkit.
Toyota Way Alignment
- Jidoka: Build quality in at every step
- Andon: Visual management for immediate status understanding
- Standard Work: Define "what good looks like" for model quality
Features
100-Point Weighted Scoring
| Severity | Weight | Examples |
|----------|--------|----------|
| Critical | 2.0x | Model loads successfully, no NaN weights |
| High | 1.5x | Reasonable parameter count, no zero-variance layers |
| Medium | 1.0x | Balanced layer sizes, no dead neurons |
| Low | 0.5x | Documentation present, metadata complete |
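The weighting above can be illustrated with a minimal sketch. It assumes the score is the sum of the weights of the passed checks divided by the sum of the weights of all checks, scaled to 100. The names `CheckResult`, `SEVERITY_WEIGHTS`, and `weighted_score` are hypothetical and only serve the illustration; they are not part of aprender.

```python
from dataclasses import dataclass

# Severity weights taken from the table above.
SEVERITY_WEIGHTS = {"critical": 2.0, "high": 1.5, "medium": 1.0, "low": 0.5}


@dataclass
class CheckResult:
    """Outcome of one checklist item (hypothetical structure)."""
    name: str
    severity: str  # one of the keys in SEVERITY_WEIGHTS
    passed: bool


def weighted_score(results):
    """Return (earned_points, max_points, score_out_of_100)."""
    max_points = sum(SEVERITY_WEIGHTS[r.severity] for r in results)
    if max_points == 0:
        return 0.0, 0.0, 0.0  # no checks ran, so nothing can be scored
    earned = sum(SEVERITY_WEIGHTS[r.severity] for r in results if r.passed)
    return earned, max_points, 100.0 * earned / max_points


# Example: 3 critical, 3 high, 4 medium, and 3 low checks, with one low check failing.
example = (
    [CheckResult(f"critical-{i}", "critical", True) for i in range(3)]
    + [CheckResult(f"high-{i}", "high", True) for i in range(3)]
    + [CheckResult(f"medium-{i}", "medium", True) for i in range(4)]
    + [CheckResult("low-0", "low", True),
       CheckResult("low-1", "low", True),
       CheckResult("low-2", "low", False)]
)
earned, max_points, score = weighted_score(example)
print(f"Points: {earned} / {max_points} ({score:.1f}%)")
```

Normalizing by the maximum achievable points keeps the result on a 100-point scale regardless of how many checklist items exist at each severity.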
Letter Grades
- A (95-100): Deploy immediately
- B (85-94): Deploy with monitoring
- C (70-84): Review before deployment
- D (50-69): Significant issues
- F (<50): Do not deploy
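A possible mapping from score to grade, as a sketch. The bands above are given for whole numbers, so this version assumes each band's lower bound is a threshold and a fractional score such as 94.5 falls into the lower band (B). The function name `letter_grade` is hypothetical.

```python
# (lower bound, grade, decision), checked from the highest band downwards.
GRADE_BANDS = [
    (95.0, "A", "Deploy immediately"),
    (85.0, "B", "Deploy with monitoring"),
    (70.0, "C", "Review before deployment"),
    (50.0, "D", "Significant issues"),
]


def letter_grade(score):
    """Map a 0-100 score to (grade, decision)."""
    for lower_bound, grade, decision in GRADE_BANDS:
        if score >= lower_bound:
            return grade, decision
    return "F", "Do not deploy"


print(letter_grade(97.5))
```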
CLI Interface
# Basic quality score
aprender model score model.apr
# With improvement suggestions
aprender model score model.apr --suggest
# JSON output for CI/CD
aprender model score model.apr --json
# Badge URL for README
aprender model score model.apr --badge
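One way a CI job might consume the `--json` output is sketched below. It assumes the report carries the `grade` and `is_deployable` fields shown in the JSON example under Output Format; the function `gate_passes` and its `min_grade` parameter are hypothetical and not part of the proposed CLI.

```python
import json

GRADE_ORDER = "ABCDF"  # best to worst


def gate_passes(report_json, min_grade="B"):
    """Return True if the report is deployable and at least as good as min_grade."""
    report = json.loads(report_json)
    good_enough = GRADE_ORDER.index(report["grade"]) <= GRADE_ORDER.index(min_grade)
    return bool(report["is_deployable"]) and good_enough


# In CI, this text would be the stdout of `aprender model score model.apr --json`.
sample = '{"score": 97.50, "grade": "A", "is_deployable": true, "decision": "Deploy immediately"}'
print(gate_passes(sample))
```

A CI step could exit with a non-zero status when `gate_passes` returns False, which would block the pipeline.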
Checklist Items (Draft)
Critical:
1. Model file loads without errors
2. No NaN/Inf values in weights
3. Architecture is valid (layers connect properly)
High:
4. Parameter count is reasonable (<1B for most use cases)
5. No zero-variance layers (dead layers)
6. Input/output dimensions documented
Medium:
7. Layer sizes are balanced (no extreme bottlenecks)
8. Activation functions are appropriate
9. Model size is reasonable for deployment target
10. Training metadata present (epochs, loss, etc.)
Low:
11. Model description/documentation present
12. Version information included
13. Compatible format version
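Two of the draft items (2 and 5) lend themselves to a short sketch. It assumes each layer's weights are available as a flat list of floats keyed by layer name, which is a hypothetical representation and not the `.apr` layout. A layer has zero variance exactly when all of its values are equal, so the check compares the minimum and maximum.

```python
import math


def has_non_finite(weights):
    """True if any weight is NaN or +/-Inf (draft item 2)."""
    return any(not math.isfinite(w) for w in weights)


def is_zero_variance(weights):
    """True if every weight in a non-empty layer has the same value (draft item 5)."""
    return len(weights) > 0 and min(weights) == max(weights)


# Hypothetical model: layer name -> flat list of weights.
layers = {
    "dense_1": [0.12, -0.40, 0.07],
    "dense_2": [0.0, 0.0, 0.0],           # constant layer
    "output": [1.5, float("nan"), 0.3],   # contains NaN
}

bad_values = [name for name, w in layers.items() if has_non_finite(w)]
# Skip layers with NaN/Inf here, since min/max comparisons involving NaN are unreliable.
constant = [name for name, w in layers.items()
            if not has_non_finite(w) and is_zero_variance(w)]
print(bad_values, constant)
```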
Output Format
Text (Andon-style)
═══════════════════════════════════════════════════════════════
Model Quality Score: ✓ A (97.5%)
Decision: Deploy immediately
═══════════════════════════════════════════════════════════════
File: model.apr
Points: 19.5 / 20.0
Severity Breakdown:
✓ Critical: 3/3 passed (6.0/6.0 pts)
✓ High : 3/3 passed (4.5/4.5 pts)
✓ Medium : 4/4 passed (4.0/4.0 pts)
✗ Low : 10/11 passed (5.0/5.5 pts)
JSON
{
  "score": 97.50,
  "grade": "A",
  "is_deployable": true,
  "decision": "Deploy immediately",
  "badge_url": "https://img.shields.io/badge/model_quality-A_(98%25)-brightgreen"
}
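The `badge_url` value follows the shields.io static badge pattern (label, message, color, with `%25` encoding a percent sign). A sketch of how such a URL could be assembled is shown below. Only the `brightgreen` color for grade A comes from the example above; the colors for the other grades, the round-half-up choice, and the function name `badge_url` are assumptions.

```python
# Only "A" -> "brightgreen" is taken from the example; the rest are assumed.
GRADE_COLORS = {
    "A": "brightgreen",
    "B": "green",
    "C": "yellow",
    "D": "orange",
    "F": "red",
}


def badge_url(grade, score):
    """Build a shields.io static badge URL such as model_quality-A_(98%25)-brightgreen."""
    percent = int(score + 0.5)  # round half up, so 97.5 becomes 98
    color = GRADE_COLORS[grade]
    return f"https://img.shields.io/badge/model_quality-{grade}_({percent}%25)-{color}"


print(badge_url("A", 97.5))
```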
Acceptance Criteria
- CLI command (aprender model score)
- .apr, SafeTensors, and GGUF formats
- make quality target
🤖 Generated with Claude Code