Skip to content

MauroDruwel/NIMStats

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

1,001 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

NIMStats Banner

CI Live Dashboard Models License: MIT PRs Welcome Stars


Community-driven benchmarking of 20 NVIDIA NIM models β€” fully automated, zero infra cost, self-hostable in minutes.


πŸš€ View Live Dashboard Β· πŸ“– Docs Β· 🀝 Contribute Β· πŸ’¬ Discussions


✨ What is NIMStats?

NIMStats automatically benchmarks 20 NVIDIA NIM models every hour using GitHub Actions and publishes the results to a beautiful, interactive dashboard. No servers, no subscriptions β€” just fork, add your API key, and go.

🏎️ Hourly Benchmarks πŸ“Š Interactive Charts πŸ” Zero Infrastructure 🌍 Fully Open-Source
Automatic via GitHub Actions Response time, throughput & trends Static site + free CI/CD Fork and self-host in minutes

⚑ Quick Start

Get your own benchmarking dashboard running in under 5 minutes.

1. Fork & Clone

git clone https://github.com/MauroDruwel/NIMStats.git
cd NIMStats

2. Get a Free API Key

Visit build.nvidia.com β†’ Create a free account β†’ Copy your API key.

3. Add the Secret

In your forked repo: Settings β†’ Secrets and variables β†’ Actions β†’ New repository secret

Name Value
NIM_API_KEY Your NVIDIA NIM API key

4. Deploy the Dashboard

Platform Steps
Cloudflare Pages Connect repo in Cloudflare Pages
GitHub Pages Settings β†’ Pages β†’ Deploy from main
Netlify / Vercel Connect repo for instant auto-deploy

5. Run Your First Benchmark

Actions β†’ Benchmark NVIDIA NIM Models β†’ Run workflow

That's it β€” your dashboard auto-refreshes every hour. ✨


πŸ“Š Dashboard Features

Tab What you get
πŸ“Š Overview 5 animated KPI cards Β· success trend charts Β· top-10 speed & throughput bars Β· model reliability pills
πŸ† Leaderboard Composite score rankings Β· sortable columns Β· SVG sparklines Β· trend indicators (↑↓→) Β· provider chips
πŸ”¬ Explorer Per-model deep dive Β· response time history chart Β· error breakdown donut Β· availability heatmap
⏱ Timeline Filterable run history (All / 24h / 48h / 7d) · expandable run cards with full per-model detail
βš”οΈ Compare Head-to-head overlay chart Β· win-rate stats Β· side-by-side metric comparison

πŸ€– Benchmarked Models

20 models across 11 providers β€” click to expand
Provider Model Highlight
DeepSeek deepseek-ai/deepseek-v4-flash Fast MoE, optimized for speed
DeepSeek deepseek-ai/deepseek-v4-pro Professional-grade reasoning
DeepSeek deepseek-ai/deepseek-v3.2 Latest with improved reasoning
Z-AI z-ai/glm-5.1 Superior code understanding
Z-AI z-ai/glm-4.7 Strong mathematical capabilities
MiniMax minimaxai/minimax-m2.7 Efficient inference model
MiniMax minimaxai/minimax-m2.5 Previous generation MiniMax
NVIDIA nvidia/nemotron-3-super-120b-a12b NVIDIA's 120B flagship
NVIDIA nvidia/nemotron-3-nano-omni-30b-a3b-reasoning Compact omni reasoning model
Moonshot moonshotai/kimi-k2.6 Context-optimized model
Moonshot moonshotai/kimi-k2-instruct Instruction-tuned Kimi
OpenAI openai/gpt-oss-120b Open-source 120B
Google google/gemma-4-31b-it Lightweight edge inference
Qwen qwen/qwen3-coder-480b-a35b-instruct Specialized coding (480B MoE)
Qwen qwen/qwen2.5-coder-32b-instruct Lightweight Qwen coder
Qwen qwen/qwen3.5-397b-a17b Flagship Qwen (397B)
Qwen qwen/qwen3.5-122b-a10b Mid-range Qwen 3.5 MoE
Mistral mistralai/devstral-2-123b-instruct-2512 Developer-focused (123B)
Mistral mistralai/mistral-large-3-675b-instruct-2512 Largest Mistral (675B)
Mistral mistralai/mistral-medium-3.5-128b Efficient medium-scale Mistral
Meta meta/llama-3_3-70b-instruct Llama 3.3 70B
Meta meta/llama-4-maverick-17b-128e-instruct Llama 4 Maverick (128 experts)
Meta meta/llama-3.2-90b-vision-instruct Multimodal 90B vision model
StepFun stepfun-ai/step-3.5-flash Ultra-fast flash model
StepFun stepfun-ai/step-3.7-flash Latest high-performance flash

πŸ—οΈ How It Works

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€ GitHub Actions (every hour) ──────────────────────┐
β”‚                                                                               β”‚
β”‚   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”        β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                    β”‚
β”‚   β”‚  Job 1 β€” Group A    β”‚        β”‚  Job 2 β€” Group B    β”‚  (run in parallel) β”‚
β”‚   β”‚  10 NIM models      β”‚        β”‚  10 NIM models      β”‚                    β”‚
β”‚   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜        β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                    β”‚
β”‚              β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                               β”‚
β”‚                    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”                                       β”‚
β”‚                    β”‚  Merge + commit β”‚  β†’ history.db updated in repo         β”‚
β”‚                    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                                       β”‚
└───────────────────────────────────────────────────────────────────────────── β”˜
                                     β”‚
                          β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                          β”‚  Static Dashboard   β”‚  rebuilds on each push
                          β”‚  (Pages / Netlify)  β”‚
                          β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Parallel jobs = ~50% faster benchmarks ⚑


πŸ› οΈ Customization

Change the benchmark prompt

Edit PROMPT in scripts/test_models.py:

PROMPT = "Your custom prompt here"
Add or remove models

Edit ALL_MODELS in scripts/test_models.py:

ALL_MODELS = [
    "your/custom-model",
    # ...
]
Change the schedule

Edit .github/workflows/benchmark.yml:

- cron: '0 */6 * * *'  # Every 6 hours instead of every hour
Run locally
# Serve the dashboard
python3 -m http.server 8000
# Open http://localhost:8000

# Run benchmarks manually (requires NIM_API_KEY env var)
export NIM_API_KEY=your_key_here
python3 scripts/test_models.py

πŸ“¦ Data Storage

history.db is a SQLite database persisted in the repo β€” the single source of truth. The browser loads it via sql.js (WebAssembly) and queries it entirely client-side. scripts/results.json is a temporary per-job artifact that is never committed.

Schema:

runs          (id, timestamp, prompt, success_count, total_models, fastest_model, fastest_time)
model_results (run_id, model, success, error, response_time, tokens_generated, total_tokens, response)

Benchmark parameters: temperature: 0.7 Β· top_p: 0.9 Β· max_tokens: 500 Β· OpenAI-compatible API


🀝 Contributing

Contributions are what make the open-source community amazing. Any contribution you make is greatly appreciated!

  1. Fork the repository
  2. Create your feature branch: git checkout -b feat/amazing-feature
  3. Commit your changes: git commit -m 'feat: add amazing feature'
  4. Push to the branch: git push origin feat/amazing-feature
  5. Open a Pull Request

Ideas for contributions:

  • πŸ†• Add new NIM models to the benchmark list
  • πŸ“Š New chart types or dashboard widgets
  • 🌐 Internationalization / translations
  • πŸ› Bug fixes and performance improvements
  • πŸ“– Improve documentation

Please read through open Issues before starting β€” someone might already be working on it!


πŸ”— Resources


πŸ“„ License

Distributed under the MIT License. See LICENSE for details.


Made with ❀️ for the ML community · ⭐ Star this repo if you find it useful!

footer

About

πŸ“Š Automated hourly benchmarks for 20+ NVIDIA NIM models β€” interactive dashboard, zero infra, self-hostable. Open-source & community-driven.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors