[Chore][CI]: K3 MP output token quantity tolerance#3030
Conversation
Code Review
This pull request introduces a 1% token tolerance in the vLLM benchmark verification scripts to handle minor drifts in token counts and refactors the verification logic into a helper function. A critical issue was identified in pyproject.toml where the torch version was updated to 2.11.0, a version that does not exist on PyPI, which will lead to build failures.
| "setuptools>=77.0.3,<81.0.0", | ||
| "setuptools_scm>=8", | ||
| "torch==2.10.0", | ||
| "torch==2.11.0", |
The version 2.11.0 of torch does not exist on PyPI (current stable releases are, e.g., 2.5.1 and 2.6.0). This appears to be a typo, likely intended to be 2.1.1, given that the previous value 2.10.0 was itself likely a typo for 2.1.0. Pinning a non-existent version will cause build failures when resolving the build system's dependencies.
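For reference, one way to confirm which torch versions are actually published on PyPI before pinning one (these commands are not part of the PR; `pip index versions` requires pip 21.2 or newer and is still marked experimental):

```bash
# List the torch releases published on PyPI.
pip index versions torch

# Alternative: query the PyPI JSON API directly and print the release numbers.
curl -s https://pypi.org/pypi/torch/json | python3 -c \
  "import json, sys; print(sorted(json.load(sys.stdin)['releases']))"
```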
Force-pushed from d8d6c5a to d421250.
vLLM's RandomDataset decodes and re-encodes generated token sequences (vllm/benchmarks/datasets.py) to avoid string-level drift, but the roundtrip is not guaranteed to preserve exact token counts — the benchmark itself only warns when token_mismatch != 0. The strict -eq assertion against NUM_PROMPTS * RANDOM_INPUT_LEN was failing with a 0.08% overage (500400 vs 500000) on Qwen3-14B after a vLLM upgrade. Switch to a ±1% tolerance check, which matches the benchmark's own semantics while still catching real workload-size regressions. Signed-off-by: Samuel Shen <slshen@uchciago.edu>
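To illustrate the change described above, here is a minimal sketch of a ±1% tolerance check in bash. The variable names (`NUM_PROMPTS`, `RANDOM_INPUT_LEN`, `total_input_tokens`) follow the commit message; the concrete values are only illustrative, chosen so the expected product matches the 500000 figure mentioned there, and the actual script in this PR may be structured differently.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Illustrative values: 500 prompts x 1000 input tokens = 500000 expected.
NUM_PROMPTS=500
RANDOM_INPUT_LEN=1000
total_input_tokens=500400   # e.g. parsed from the vllm bench output (0.08% overage)

expected=$(( NUM_PROMPTS * RANDOM_INPUT_LEN ))   # 500000
tol=$(( expected / 100 ))                        # 1% -> 5000
lower=$(( expected - tol ))                      # 495000
upper=$(( expected + tol ))                      # 505000

# The old strict check, [ "$total_input_tokens" -eq "$expected" ], fails on the overage.
if (( total_input_tokens < lower || total_input_tokens > upper )); then
  echo "total_input_tokens=${total_input_tokens} outside [${lower}, ${upper}]" >&2
  exit 1
fi
echo "total_input_tokens=${total_input_tokens} is within 1% of ${expected}"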
Force-pushed from d421250 to a9f407e.
[CI] Allow 1% tolerance on vllm_bench total_input_tokens check vLLM's RandomDataset decodes and re-encodes generated token sequences (vllm/benchmarks/datasets.py) to avoid string-level drift, but the roundtrip is not guaranteed to preserve exact token counts — the benchmark itself only warns when token_mismatch != 0. The strict -eq assertion against NUM_PROMPTS * RANDOM_INPUT_LEN was failing with a 0.08% overage (500400 vs 500000) on Qwen3-14B after a vLLM upgrade. Switch to a ±1% tolerance check, which matches the benchmark's own semantics while still catching real workload-size regressions. Signed-off-by: Samuel Shen <slshen@uchciago.edu> Co-authored-by: Samuel Shen <slshen@uchciago.edu>
Note: Low risk. CI-only bash script changes that relax an assertion to account for known vLLM random-dataset token-length drift; no production code or security-sensitive logic is touched.
Overview
Updates the Buildkite vLLM bench verification scripts to treat `total_input_tokens` as approximate rather than requiring an exact match. Both `run-vllm-bench.sh` variants now compute a 1% tolerance and use a shared `check_input_tokens` helper for the LMCache and baseline runs, reducing spurious CI failures caused by vLLM's random-dataset re-tokenization drift.
Reviewed by Cursor Bugbot for commit a9f407e.
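A rough sketch of what such a shared helper could look like. The function name `check_input_tokens` comes from the overview above, but its argument order, the run labels, and the variables holding the measured counts (`lmcache_input_tokens`, `baseline_input_tokens`) are assumptions, not the actual implementation in `run-vllm-bench.sh`.

```bash
# Hypothetical shape of the shared helper: verify a measured token count is
# within 1% of the expected total, labelled per benchmark run.
check_input_tokens() {
  local label="$1" measured="$2" expected="$3"
  local tol=$(( expected / 100 ))   # 1% tolerance
  if (( measured < expected - tol || measured > expected + tol )); then
    echo "[${label}] total_input_tokens=${measured} not within 1% of ${expected}" >&2
    return 1
  fi
  echo "[${label}] total_input_tokens=${measured} ok (expected ~${expected})"
}

expected_tokens=$(( NUM_PROMPTS * RANDOM_INPUT_LEN ))
# Invoked once per run, e.g. for the LMCache-enabled and baseline configurations.
check_input_tokens "lmcache"  "${lmcache_input_tokens}"  "${expected_tokens}"
check_input_tokens "baseline" "${baseline_input_tokens}" "${expected_tokens}"
```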