# March 23, 2026 benchmark scripts

This directory is the reproducible entry point for the March 23, 2026
GoModel vs LiteLLM benchmark refresh.

It is built around the benchmark workspace in
`docs/2026-03-23_benchmark_scripts/gateway-comparison/`, then adds:

- a tested normalization step for the raw `hey` and streaming outputs,
- chart generation for the blog assets,
- a stable wrapper command for rerunning the benchmark and rebuilding the
  article artifacts.

## What this benchmark measures

This run uses the same localhost mock backend for both gateways, so the numbers
measure gateway overhead rather than upstream model latency.

Workloads covered:

- `/v1/chat/completions` non-streaming
- `/v1/chat/completions` streaming
- `/v1/responses` non-streaming
- `/v1/responses` streaming

The raw benchmark runner also records a direct baseline for chat traffic with no
gateway in the middle.

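For a concrete sense of the load shape, the sketch below hand-rolls a `hey` call
against the non-streaming chat endpoint. The port, request count, concurrency,
and payload are illustrative assumptions, not the values the runner uses; see
`run-benchmark.sh` for the real parameters.

```bash
# Illustrative only: the port and load shape are assumptions, not the
# settings run-benchmark.sh actually uses.
hey -n 2000 -c 50 -m POST \
  -T "application/json" \
  -d '{"model": "mock", "messages": [{"role": "user", "content": "ping"}]}' \
  http://localhost:8080/v1/chat/completions
```
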
## Prerequisites

- Go 1.26+
- Python 3.10+
- `hey`
- `litellm`
- Python packages: `matplotlib`, `numpy`

Install Python packages if needed:

```bash
python3 -m pip install matplotlib numpy
```

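A quick sanity check that everything is on `PATH` before running anything (not
required by the wrapper, just a convenience):

```bash
# Report tool versions and confirm the Python dependencies import cleanly.
go version
python3 --version
command -v hey || echo "hey not found"
command -v litellm || echo "litellm not found"
python3 -c "import matplotlib, numpy" && echo "python deps OK"
```
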
## Quick start

Run the raw benchmark and generate normalized artifacts:

```bash
RUN_BENCHMARK=1 bash docs/2026-03-23_benchmark_scripts/run.sh
```

If you already have a benchmark result directory, point the wrapper at it:

```bash
RESULTS_DIR=/path/to/results bash docs/2026-03-23_benchmark_scripts/run.sh
```

Copy the generated chart assets into the sibling Enterpilot blog repo:

```bash
BLOG_PUBLIC_DIR=../enterpilot.io/blog/public/charts \
  bash docs/2026-03-23_benchmark_scripts/run.sh
```

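A fresh run that also refreshes the blog assets can plausibly be done in one
invocation (assuming `run.sh` reads both variables together; check the script
if unsure):

```bash
RUN_BENCHMARK=1 \
  BLOG_PUBLIC_DIR=../enterpilot.io/blog/public/charts \
  bash docs/2026-03-23_benchmark_scripts/run.sh
```
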
## Outputs

By default, generated artifacts land in `docs/2026-03-23_benchmark_scripts/output/`:

- `benchmark_summary.json`: normalized machine-readable metrics
- `charts/gomodel-vs-litellm-march-2026-dashboard.png`
- `charts/gomodel-vs-litellm-march-2026-throughput.png`
- `charts/gomodel-vs-litellm-march-2026-latency.png`
- `charts/gomodel-vs-litellm-march-2026-memory.png`
- `charts/gomodel-vs-litellm-march-2026-speedup.png`

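To spot-check the normalized metrics without opening the file, list the
top-level keys with `jq` (the summary schema is not documented here, so the
keys are whatever the normalizer emits):

```bash
jq 'keys' docs/2026-03-23_benchmark_scripts/output/benchmark_summary.json
```
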
## Notes

- The raw benchmark runner lives in `docs/2026-03-23_benchmark_scripts/gateway-comparison/run-benchmark.sh`.
- The normalization step exists because raw shell summaries are prone to drift
  and easy to misparse; the parser in this directory is covered by unit tests
  with inline sample fixtures, so the repo does not need to carry benchmark
  result dumps.
- These results are a point-in-time localhost benchmark, not a universal claim
  about every deployment shape.

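For a sense of what the tested parser replaces, ad-hoc extraction from `hey`'s
text summary is typically a one-liner like the sketch below, which breaks
silently the moment the output format shifts (the file name is hypothetical):

```bash
# Fragile ad-hoc extraction: grab the throughput figure from hey's
# default text summary.
awk '$1 == "Requests/sec:" { print $2 }' hey_chat_nonstreaming.txt
```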