docs: add benchmark reproduction tooling #138
Conversation
Add the benchmark comparison scripts (`compare.sh`, bench CLI source, chart generator) to `docs/about/benchmark-tools/` so anyone can reproduce the GoModel vs LiteLLM benchmark. Update the `benchmarks.mdx` Mintlify page with a "Reproduce it yourself" section including prerequisites, quick start, and tuning instructions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
📝 Walkthrough

Adds a reproducible benchmarking toolkit: a Go CLI for high-concurrency chat-completion load testing, a Bash orchestration script to run comparisons across gateways, a Python plotting utility to generate charts from JSON results, and documentation on how to run and tune the benchmarks.
Sequence Diagram

```mermaid
sequenceDiagram
    participant Orchestrator as Orchestrator (compare.sh)
    participant Bench as Bench CLI
    participant GoModel as GoModel Gateway
    participant LiteLLM as LiteLLM Gateway
    participant Groq as Groq API
    participant Monitor as Process Monitor
    Orchestrator->>Orchestrator: validate env, select model, build binaries
    Orchestrator->>GoModel: start
    Orchestrator->>LiteLLM: start
    GoModel-->>Orchestrator: ready
    LiteLLM-->>Orchestrator: ready
    loop For each concurrency level
        Orchestrator->>Bench: invoke with concurrency, target gateway
        Bench->>Monitor: optional start sampling (PID, interval)
        par Concurrent requests
            Bench->>GoModel: POST chat completion
            GoModel->>Groq: forward request
            Groq-->>GoModel: response + usage
            GoModel-->>Bench: response
        and
            Bench->>LiteLLM: POST chat completion
            LiteLLM->>Groq: forward request
            Groq-->>LiteLLM: response + usage
            LiteLLM-->>Bench: response
        end
        Bench->>Monitor: collect CPU/RSS samples
        Bench->>Bench: record latency, tokens, errors
    end
    Bench->>Bench: aggregate metrics (p50, p95, p99, throughput, errors)
    Bench-->>Orchestrator: write per-run JSON
    Orchestrator->>Orchestrator: generate REPORT.md and charts
```
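The aggregation step at the end of the diagram can be sketched as follows. The real bench CLI is written in Go; this Python sketch only illustrates the computation, and the field names (`p50_ms`, `throughput_rps`, `error_rate`, etc.) are assumptions rather than the actual JSON schema the tool writes:

```python
import json

def aggregate(latencies_ms, errors, duration_s):
    """Summarize one benchmark run.

    latencies_ms: per-request latencies for successful requests (ms)
    errors:       number of failed requests
    duration_s:   wall-clock duration of the run (s)
    """
    xs = sorted(latencies_ms)

    def pct(p):
        # Nearest-rank percentile; a production tool may interpolate instead.
        idx = min(len(xs) - 1, int(p / 100 * len(xs)))
        return xs[idx]

    total = len(xs) + errors
    return {
        "p50_ms": pct(50),
        "p95_ms": pct(95),
        "p99_ms": pct(99),
        "throughput_rps": len(xs) / duration_s,
        "error_rate": errors / total if total else 0.0,
    }

# Example: 100 successful requests (1..100 ms), 2 failures, over 10 seconds.
summary = aggregate(list(range(1, 101)), errors=2, duration_s=10.0)
print(json.dumps(summary, indent=2))
```

Writing one such summary per (gateway, concurrency) pair is what makes the later report and chart generation a pure post-processing step over JSON files.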
Summary

- Add the benchmark comparison scripts (`compare.sh`, `bench_main.go`, `plot_benchmark_charts.py`) to `docs/about/benchmark-tools/`
- Update the `docs/about/benchmarks.mdx` Mintlify page with a "Reproduce it yourself" section including prerequisites, quick start, a script reference table, and tuning instructions

Test plan

- Verified `compare.sh` runs successfully with a valid `GROQ_API_KEY`
- Verified `plot_benchmark_charts.py` generates charts from JSON results

🤖 Generated with Claude Code
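To illustrate the hand-off between `compare.sh` and the chart generator, here is a sketch of how per-run JSON results might be loaded and grouped per gateway before plotting. The directory layout and keys (`gateway`, `concurrency`, `p95_ms`) are assumptions for the example, not the real schema the scripts use:

```python
import json
import tempfile
from collections import defaultdict
from pathlib import Path

def load_results(results_dir):
    """Group per-run JSON results by gateway, sorted by concurrency level."""
    by_gateway = defaultdict(list)
    for path in sorted(Path(results_dir).glob("*.json")):
        run = json.loads(path.read_text())
        by_gateway[run["gateway"]].append((run["concurrency"], run["p95_ms"]))
    for runs in by_gateway.values():
        runs.sort()  # ascending concurrency, ready for an x/y plot
    return dict(by_gateway)

# Build a tiny fake results directory to demonstrate the grouping.
tmp = Path(tempfile.mkdtemp())
for gw, c, p95 in [("gomodel", 10, 120.0), ("litellm", 10, 180.0),
                   ("gomodel", 50, 140.0), ("litellm", 50, 260.0)]:
    (tmp / f"{gw}_{c}.json").write_text(
        json.dumps({"gateway": gw, "concurrency": c, "p95_ms": p95}))

results = load_results(tmp)
print(results["gomodel"])  # [(10, 120.0), (50, 140.0)]
```

Each gateway's list is then a ready-made series of (concurrency, latency) points, so the charting step reduces to one plot call per gateway.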