docs: add benchmark reproduction tooling #138

Merged
SantiagoDePolonia merged 2 commits into main from feat/benchmark-reproduction on Mar 12, 2026
Conversation


@SantiagoDePolonia commented Mar 12, 2026

Summary

  • Add benchmark scripts (compare.sh, bench_main.go, plot_benchmark_charts.py) to docs/about/benchmark-tools/
  • Update docs/about/benchmarks.mdx Mintlify page with a "Reproduce it yourself" section including prerequisites, quick start, script reference table, and tuning instructions

Test plan

  • Verify compare.sh runs successfully with a valid GROQ_API_KEY
  • Verify plot_benchmark_charts.py generates charts from JSON results
  • Verify the Mintlify docs page renders correctly with the new section and file links

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Documentation

    • Added a "Reproduce it yourself" section with prerequisites, quick start, and tuning guidance for running local benchmarks.
  • New Features

    • Introduced end-to-end benchmarking tools: high-concurrency gateway comparisons, optional process sampling, JSON result export, and automated chart generation summarizing throughput, latency percentiles, memory, CPU, and error metrics.

Add the benchmark comparison scripts (compare.sh, bench CLI source,
chart generator) to docs/about/benchmark-tools/ so anyone can reproduce
the GoModel vs LiteLLM benchmark. Update the benchmarks.mdx Mintlify
page with a "Reproduce it yourself" section including prerequisites,
quick start, and tuning instructions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
coderabbitai bot commented Mar 12, 2026

Caution: Review failed — the pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 92660c76-d461-4530-a369-5f82e9f72b3f

📥 Commits

Reviewing files that changed from the base of the PR and between 8c3371e and ef1e311.

📒 Files selected for processing (1)
  • docs/about/benchmarks.mdx

📝 Walkthrough


Adds a reproducible benchmarking toolkit: a Go CLI for high-concurrency chat-completion load testing, a Bash orchestration script to run comparisons across gateways, a Python plotting utility to generate charts from JSON results, and documentation to run and tune the benchmarks.

Changes

  • Benchmark CLI — docs/about/benchmark-tools/bench_main.go
    New Go main that issues concurrent HTTP POSTs to chat-completion endpoints, collects per-request latencies, token usage, and error details, optionally samples CPU/RSS by PID, aggregates metrics (latency percentiles, throughput, token totals), and emits JSON summaries (stdout/file).
  • Orchestration Script — docs/about/benchmark-tools/compare.sh
    New Bash script that builds binaries, discovers and selects models from the Groq API, starts/stops the GoModel and LiteLLM gateways, runs warmups and concurrency sweeps (including an optional direct Groq baseline), saves per-run JSON results, and generates a Markdown report.
  • Visualization & Docs — docs/about/benchmark-tools/plot_benchmark_charts.py, docs/about/benchmarks.mdx
    New Python plotting tool that loads _c.json results, groups them by gateway, and plots throughput/latency/memory/CPU charts plus a 2x2 dashboard; updated docs with a "Reproduce it yourself" section (prerequisites, quick start, tuning).

Sequence Diagram

sequenceDiagram
    participant Orchestrator as Orchestrator (compare.sh)
    participant Bench as Bench CLI
    participant GoModel as GoModel Gateway
    participant LiteLLM as LiteLLM Gateway
    participant Groq as Groq API
    participant Monitor as Process Monitor

    Orchestrator->>Orchestrator: validate env, select model, build binaries
    Orchestrator->>GoModel: start
    Orchestrator->>LiteLLM: start
    GoModel-->>Orchestrator: ready
    LiteLLM-->>Orchestrator: ready

    loop For each concurrency level
        Orchestrator->>Bench: invoke with concurrency, target gateway
        Bench->>Monitor: optional start sampling (PID, interval)
        par Concurrent requests
            Bench->>GoModel: POST chat completion
            GoModel->>Groq: forward request
            Groq-->>GoModel: response + usage
            GoModel-->>Bench: response
        and
            Bench->>LiteLLM: POST chat completion
            LiteLLM->>Groq: forward request
            Groq-->>LiteLLM: response + usage
            LiteLLM-->>Bench: response
        end
        Bench->>Monitor: collect CPU/RSS samples
        Bench->>Bench: record latency, tokens, errors
    end

    Bench->>Bench: aggregate metrics (p50,p95,p99,throughput,errors)
    Bench-->>Orchestrator: write per-run JSON
    Orchestrator->>Orchestrator: generate REPORT.md and charts

Estimated Code Review Effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Poem

🐇🚀 I hopped through endpoints, timers in paw,
Goroutines raced while I scribbled the law,
Metrics and charts like carrots displayed,
Gateways competed in benchmarks we made,
Crunch, plot, and share — a rabbit-approved cause!

🚥 Pre-merge checks — ✅ 3 passed
  • Description Check — ✅ Passed — Check skipped: CodeRabbit's high-level summary is enabled.
  • Title Check — ✅ Passed — The title "docs: add benchmark reproduction tooling" accurately and concisely summarizes the main change: adding new benchmark tools (shell script, Go CLI, and Python charting script) along with documentation updates that let users reproduce the benchmarks.
  • Docstring Coverage — ✅ Passed — No functions found in the changed files; docstring coverage check skipped.



@SantiagoDePolonia SantiagoDePolonia merged commit 16dfece into main Mar 12, 2026
12 checks passed