
Commit a0a1d06

docs(benchmarks): add reproducible benchmark scripts (#168)
* docs(benchmarks): add reproducible benchmark scripts
* docs(benchmarks): trim reproduction package
1 parent 48f2b43 commit a0a1d06

11 files changed

Lines changed: 1507 additions & 0 deletions

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
+__pycache__/
+output/
Lines changed: 82 additions & 0 deletions
@@ -0,0 +1,82 @@
# March 23, 2026 benchmark scripts

This directory is the reproducible entry point for the March 23, 2026
GoModel vs LiteLLM benchmark refresh.

It is built around the benchmark workspace in
`docs/2026-03-23_benchmark_scripts/gateway-comparison/`, then adds:

- a tested normalization step for the raw `hey` and streaming outputs,
- chart generation for the blog assets,
- a stable wrapper command for rerunning the benchmark and rebuilding the
  article artifacts.

## What this benchmark measures

This run uses the same localhost mock backend for both gateways, so the numbers
measure gateway overhead rather than upstream model latency.

Workloads covered:

- `/v1/chat/completions` non-streaming
- `/v1/chat/completions` streaming
- `/v1/responses` non-streaming
- `/v1/responses` streaming

The raw benchmark runner also records a direct baseline for chat traffic with no
gateway in the middle.
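The mock backend is what lets both gateways be measured against identical upstream behavior. For illustration only, a minimal OpenAI-compatible stub of that general shape can be sketched with Python's standard library; this is not the project's actual mock backend (which is a compiled binary under `gateway-comparison/`), and every name here is an assumption:

```python
# Hypothetical sketch of a localhost mock backend: returns a canned
# chat completion for any POST, so measured latency is gateway overhead.
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

CANNED = {
    "id": "chatcmpl-mock",
    "object": "chat.completion",
    "choices": [{
        "index": 0,
        "message": {"role": "assistant", "content": "ok"},
        "finish_reason": "stop",
    }],
}

class MockHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Drain the request body so the connection stays usable.
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)
        body = json.dumps(CANNED).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        # Keep benchmark output quiet.
        pass

def start_mock(port=0):
    """Start the stub on 127.0.0.1 (port 0 = pick a free port)."""
    server = HTTPServer(("127.0.0.1", port), MockHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Serving a constant response body is the point: with upstream time effectively zero, any difference between the two gateways is attributable to the gateways themselves.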
## Prerequisites

- Go 1.26+
- Python 3.10+
- `hey`
- `litellm`
- Python packages: `matplotlib`, `numpy`

Install the Python packages if needed:

```bash
python3 -m pip install matplotlib numpy
```

## Quick start

Run the raw benchmark and generate normalized artifacts:

```bash
RUN_BENCHMARK=1 bash docs/2026-03-23_benchmark_scripts/run.sh
```

If you already have a benchmark result directory, point the wrapper at it:

```bash
RESULTS_DIR=/path/to/results bash docs/2026-03-23_benchmark_scripts/run.sh
```

Copy the generated chart assets into the sibling Enterpilot blog repo:

```bash
BLOG_PUBLIC_DIR=../enterpilot.io/blog/public/charts \
  bash docs/2026-03-23_benchmark_scripts/run.sh
```
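A wrapper driven by environment variables like this typically resolves its input directory first. The sketch below is hypothetical: `resolve_results_dir` and the directory names are illustrative, not taken from `run.sh`, which may order its checks differently:

```shell
# Hypothetical dispatch: an explicit RESULTS_DIR wins, RUN_BENCHMARK=1
# triggers a fresh run, otherwise the latest results are reused.
resolve_results_dir() {
  if [ -n "${RESULTS_DIR:-}" ]; then
    # Reuse an existing benchmark result directory as-is.
    echo "$RESULTS_DIR"
  elif [ "${RUN_BENCHMARK:-0}" = "1" ]; then
    # A fresh run writes into a new directory (illustrative name).
    echo "output/results-fresh"
  else
    # Fall back to the last generated results (illustrative name).
    echo "output/results-latest"
  fi
}
```

Resolving the directory up front keeps the later steps (normalization, chart generation, blog copy) identical regardless of whether the data came from a fresh run or a reused result set.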

## Outputs

By default, generated artifacts land in `docs/2026-03-23_benchmark_scripts/output/`:

- `benchmark_summary.json`: normalized machine-readable metrics
- `charts/gomodel-vs-litellm-march-2026-dashboard.png`
- `charts/gomodel-vs-litellm-march-2026-throughput.png`
- `charts/gomodel-vs-litellm-march-2026-latency.png`
- `charts/gomodel-vs-litellm-march-2026-memory.png`
- `charts/gomodel-vs-litellm-march-2026-speedup.png`

## Notes

- The raw benchmark runner lives in `docs/2026-03-23_benchmark_scripts/gateway-comparison/run-benchmark.sh`.
- The normalization step exists because raw shell summaries tend to drift and
  are easy to misparse; the parser in this directory is covered by unit tests
  with inline sample fixtures, so the repo does not need to carry benchmark
  result dumps.
- These results are a point-in-time localhost benchmark, not a universal claim
  about every deployment shape.
Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
+gomodel-bin
+mock-backend/mock-server
+stream-bench/stream-bench
+results/
Lines changed: 47 additions & 0 deletions
@@ -0,0 +1,47 @@
+server:
+  port: "8081"
+  master_key: ""
+  body_size_limit: "10M"
+  swagger_enabled: false
+  pprof_enabled: false
+  enable_passthrough_routes: false
+
+cache:
+  model:
+    refresh_interval: 86400
+  local:
+    cache_dir: "/tmp/gomodel-bench-cache"
+
+storage:
+  type: "sqlite"
+  sqlite:
+    path: "/tmp/gomodel-bench.db"
+
+logging:
+  enabled: false
+
+usage:
+  enabled: false
+
+metrics:
+  enabled: false
+
+admin:
+  endpoints_enabled: false
+  ui_enabled: false
+
+http:
+  timeout: 60
+  response_header_timeout: 60
+
+resilience:
+  retry:
+    max_retries: 0
+  circuit_breaker:
+    failure_threshold: 999
+
+providers:
+  openai:
+    type: openai
+    api_key: "sk-bench-test-key"
+    base_url: "http://localhost:9999/v1"
Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
+model_list:
+  - model_name: "gpt-4o-mini"
+    litellm_params:
+      model: "openai/gpt-4o-mini"
+      api_key: "sk-bench-test-key"
+      api_base: "http://localhost:9999/v1"
+
+general_settings:
+  master_key: null
+  disable_spend_logs: true
+
+litellm_settings:
+  num_retries: 0
+  request_timeout: 60
+  drop_params: true
