docs: add benchmark reproduction tooling #138

Merged
SantiagoDePolonia merged 2 commits into main from feat/benchmark-reproduction on Mar 12, 2026
Conversation


@SantiagoDePolonia commented Mar 12, 2026

Summary

  • Add benchmark scripts (compare.sh, bench_main.go, plot_benchmark_charts.py) to docs/about/benchmark-tools/
  • Update docs/about/benchmarks.mdx Mintlify page with a "Reproduce it yourself" section including prerequisites, quick start, script reference table, and tuning instructions

Test plan

  • Verify compare.sh runs successfully with a valid GROQ_API_KEY
  • Verify plot_benchmark_charts.py generates charts from JSON results
  • Verify the Mintlify docs page renders correctly with the new section and file links

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Documentation

    • Added a "Reproduce it yourself" section with prerequisites, quick start, and tuning guidance for running local benchmarks.
  • New Features

    • Introduced end-to-end benchmarking tools: high-concurrency gateway comparisons, optional process sampling, JSON result export, and automated chart generation summarizing throughput, latency percentiles, memory, CPU, and error metrics.

Add the benchmark comparison scripts (compare.sh, bench CLI source,
chart generator) to docs/about/benchmark-tools/ so anyone can reproduce
the GoModel vs LiteLLM benchmark. Update the benchmarks.mdx Mintlify
page with a "Reproduce it yourself" section including prerequisites,
quick start, and tuning instructions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
coderabbitai bot commented Mar 12, 2026

Caution: Review failed — the pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 92660c76-d461-4530-a369-5f82e9f72b3f

📥 Commits

Reviewing files that changed from the base of the PR and between 8c3371e and ef1e311.

📒 Files selected for processing (1)
  • docs/about/benchmarks.mdx

📝 Walkthrough


Adds a reproducible benchmarking toolkit: a Go CLI for high-concurrency chat-completion load testing, a Bash orchestration script to run comparisons across gateways, a Python plotting utility to generate charts from JSON results, and documentation to run and tune the benchmarks.

Changes

  • Benchmark CLI — docs/about/benchmark-tools/bench_main.go
    New Go main that issues concurrent HTTP POSTs to chat-completion endpoints, collects per-request latencies, token usage, and error details, optionally samples CPU/RSS by PID, aggregates metrics (latency percentiles, throughput, token totals), and emits JSON summaries (stdout/file).
  • Orchestration Script — docs/about/benchmark-tools/compare.sh
    New Bash script that builds binaries, discovers and selects models from the Groq API, starts/stops the GoModel and LiteLLM gateways, runs warmups and concurrency sweeps (including an optional direct Groq baseline), saves per-run JSON results, and generates a Markdown report.
  • Visualization & Docs — docs/about/benchmark-tools/plot_benchmark_charts.py, docs/about/benchmarks.mdx
    New Python plotting tool that loads _c.json results, groups them by gateway, and plots throughput/latency/memory/CPU charts plus a 2x2 dashboard; updated docs with a "Reproduce it yourself" section (prerequisites, quick start, tuning).

Sequence Diagram

sequenceDiagram
    participant Orchestrator as Orchestrator (compare.sh)
    participant Bench as Bench CLI
    participant GoModel as GoModel Gateway
    participant LiteLLM as LiteLLM Gateway
    participant Groq as Groq API
    participant Monitor as Process Monitor

    Orchestrator->>Orchestrator: validate env, select model, build binaries
    Orchestrator->>GoModel: start
    Orchestrator->>LiteLLM: start
    GoModel-->>Orchestrator: ready
    LiteLLM-->>Orchestrator: ready

    loop For each concurrency level
        Orchestrator->>Bench: invoke with concurrency, target gateway
        Bench->>Monitor: optional start sampling (PID, interval)
        par Concurrent requests
            Bench->>GoModel: POST chat completion
            GoModel->>Groq: forward request
            Groq-->>GoModel: response + usage
            GoModel-->>Bench: response
        and
            Bench->>LiteLLM: POST chat completion
            LiteLLM->>Groq: forward request
            Groq-->>LiteLLM: response + usage
            LiteLLM-->>Bench: response
        end
        Bench->>Monitor: collect CPU/RSS samples
        Bench->>Bench: record latency, tokens, errors
    end

    Bench->>Bench: aggregate metrics (p50,p95,p99,throughput,errors)
    Bench-->>Orchestrator: write per-run JSON
    Orchestrator->>Orchestrator: generate REPORT.md and charts

Estimated Code Review Effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Poem

🐇🚀 I hopped through endpoints, timers in paw,
Goroutines raced while I scribbled the law,
Metrics and charts like carrots displayed,
Gateways competed in benchmarks we made,
Crunch, plot, and share — a rabbit-approved cause!

🚥 Pre-merge checks — ✅ 3 passed
  • Description Check — ✅ Passed — Check skipped: CodeRabbit's high-level summary is enabled.
  • Title Check — ✅ Passed — The title "docs: add benchmark reproduction tooling" accurately and concisely summarizes the main change: adding new benchmark tools (shell script, Go CLI, and Python charting script) along with documentation updates that let users reproduce the benchmarks.
  • Docstring Coverage — ✅ Passed — No functions found in the changed files; docstring coverage check skipped.



@SantiagoDePolonia SantiagoDePolonia merged commit 16dfece into main Mar 12, 2026
12 checks passed