[Bench]: Add Blend V2 Stress Test script #2885
Conversation
Add long_doc_permutator.py and README for stress testing the blend v2 server across 5 axes: context boundaries, eviction, chunk homogeneity, prefix domination, and concurrency.
Code Review
This pull request introduces a new benchmark tool, long_doc_permutator.py, and its accompanying documentation to stress test the Blend Server V2 implementation. The tool evaluates performance across several axes, including context boundaries, eviction, chunk homogeneity, prefix domination, and concurrency. The review feedback identifies several violations of the repository's style guide, specifically regarding missing type hints and docstrings for new functions, as well as improper import practices and path construction logic.
```python
# ---------------------------------------------------------------------------
def write_resp(text: str):
```
The function write_resp is missing a return type hint and a docstring. According to the repository style guide (lines 24 and 25), all new public functions must have type hints and docstrings. Please add the -> None return type and a docstring explaining the function's purpose.
```diff
-def write_resp(text: str):
+def write_resp(text: str) -> None:
```
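For reference, a version that satisfies both the type-hint and docstring rules at once. The body below is reconstructed from the diff context quoted elsewhere in this review, so treat it as a sketch rather than the script's exact code:

```python
import sys

OUTPUT_FILE = None  # set from CLI args elsewhere in the script

def write_resp(text: str) -> None:
    """Append `text` to OUTPUT_FILE when one is configured, else write to stdout."""
    if OUTPUT_FILE:
        with open(OUTPUT_FILE, "a") as f:
            f.write(text)
    else:
        sys.stdout.write(text)
```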
```python
# ---------------------------------------------------------------------------
def relative_time(df: pd.DataFrame, start_time: float):
```
The function relative_time is missing a return type hint and a docstring, violating the repository style guide (lines 24 and 25). Please add the -> None return type and a docstring.
```diff
-def relative_time(df: pd.DataFrame, start_time: float):
+def relative_time(df: pd.DataFrame, start_time: float) -> None:
```
```python
def print_results(df: pd.DataFrame, wall_time: float, label: str):
```
The function print_results is missing a return type hint and a docstring, violating the repository style guide (lines 24 and 25). Please add the -> None return type and a docstring.
```diff
-def print_results(df: pd.DataFrame, wall_time: float, label: str):
+def print_results(df: pd.DataFrame, wall_time: float, label: str) -> None:
```
```python
def plot_ttft_distribution(df: pd.DataFrame, filename: str = "ttft_distribution.png"):
```
The function plot_ttft_distribution is missing a return type hint, violating the repository style guide (line 24). Please add the -> None return type.
```diff
-def plot_ttft_distribution(df: pd.DataFrame, filename: str = "ttft_distribution.png"):
+def plot_ttft_distribution(df: pd.DataFrame, filename: str = "ttft_distribution.png") -> None:
```
References
- All new functions must have type hints for arguments and return values. (link)
```python
# ---------------------------------------------------------------------------
async def main(args):
```
The main function is missing type hints for its args parameter and its return value, as well as a docstring. This violates the repository style guide (lines 24 and 25). Please type args as argparse.Namespace, add the -> None return type, and include a docstring.
```diff
-async def main(args):
+async def main(args: argparse.Namespace) -> None:
```
```python
parser.add_argument(
    "--lmcache-workers",
    type=int,
    default=4,
```
The function create_argument_parser is missing a return type hint and a docstring, violating the repository style guide (lines 24 and 25). Please add the -> argparse.ArgumentParser return type and a docstring.
Suggested signature:

```python
def create_argument_parser() -> argparse.ArgumentParser:
```

```python
script_dir = os.path.dirname(os.path.abspath(__file__))
sys.path.insert(0, script_dir)
# Third Party
from parse_lmcache_log import parse_log
from parse_lmcache_log import report as cache_report
```
Modifying sys.path at runtime to handle imports is fragile and goes against best practices. It also violates the project's import ordering conventions (style guide line 28), which require imports to be at the top of the file. The comment # Third Party is also incorrect for this local import.
Please refactor this to use a standard import mechanism. If parse_lmcache_log is a sibling script, consider making this directory a package or adjusting PYTHONPATH externally when running the benchmark.
References
- Imports should be ordered: Standard / Third Party / First Party / Local, and placed at the top of the file. (link)
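One sys.path-free alternative, shown here only as a sketch (the `load_local_module` helper and its placement are illustrative, not a pattern the repository mandates), resolves the sibling file explicitly:

```python
import importlib.util
import pathlib
import sys

def load_local_module(name: str, directory: pathlib.Path):
    """Load `name`.py from `directory` without mutating sys.path."""
    spec = importlib.util.spec_from_file_location(name, directory / f"{name}.py")
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module  # register before exec so self-imports resolve
    spec.loader.exec_module(module)
    return module

# Hypothetical usage at the top of the benchmark script:
# helpers = load_local_module("parse_lmcache_log",
#                             pathlib.Path(__file__).resolve().parent)
# parse_log, cache_report = helpers.parse_log, helpers.report
```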
```python
if args.output:
    if args.output_dir and args.output_dir != ".":
        OUTPUT_FILE = os.path.join(args.output_dir, args.output)
    else:
        OUTPUT_FILE = args.output
```
The logic for constructing the OUTPUT_FILE path can be simplified. os.path.join handles the case where args.output_dir is . correctly. You can reduce these lines to a single os.path.join call inside the if args.output: block for better readability.
```python
if args.output:
    OUTPUT_FILE = os.path.join(args.output_dir, args.output)
```
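For illustration (POSIX paths assumed), joining with a `.` directory still names the same file as the bare filename, so the special-case branch buys nothing:

```python
import os.path

# "." as the directory yields "./results.csv", which refers to the same
# file as the bare "results.csv" the special-case branch would produce.
assert os.path.join(".", "results.csv") == "./results.csv"
assert os.path.join("runs/today", "results.csv") == "runs/today/results.csv"
```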
JiwaniZakir left a comment
In generate_vocab_pool, the pool variable is typed as set[str] to avoid duplicates, but the uniqueness is already guaranteed structurally: the suffix f"{word}{len(pool)}" uses the current pool size as a counter, so every generated word is inherently unique regardless of the random base. Using a set here adds unnecessary overhead for large --vocab-size values; a list with a simple counter would be both cleaner and more efficient.
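A counter-indexed list makes that uniqueness invariant explicit. A minimal sketch — the base-word generation here is invented for illustration and will differ from the script's:

```python
import random

def generate_vocab_pool(vocab_size: int, seed: int = 0) -> list[str]:
    """Return `vocab_size` unique synthetic words.

    The positional index suffix alone guarantees uniqueness, so no set
    membership checks are needed; the random base word only adds variety.
    """
    rng = random.Random(seed)
    bases = ["alpha", "beta", "gamma", "delta", "epsilon", "zeta"]
    return [f"{rng.choice(bases)}{i}" for i in range(vocab_size)]
```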
The sentinel value --max-inflight-requests 0 meaning "flood all requests" is a subtle footgun — zero conventionally reads as "no concurrency allowed" rather than "unlimited." A value of -1 (or a dedicated --flood flag) would align better with common CLI conventions and avoid confusion when users scan the argument help text.
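A sketch of the `-1` convention; only the flag name comes from the script, and the semaphore wiring is illustrative:

```python
import argparse
import asyncio

parser = argparse.ArgumentParser()
parser.add_argument(
    "--max-inflight-requests",
    type=int,
    default=32,
    help="Max concurrent requests; -1 floods all requests at once.",
)

def make_limiter(max_inflight: int) -> asyncio.Semaphore:
    """Return a concurrency limiter; -1 means effectively unbounded."""
    if max_inflight == -1:
        return asyncio.Semaphore(1_000_000)  # practically no limit
    if max_inflight <= 0:
        raise ValueError("use -1 for unlimited, or a positive limit")
    return asyncio.Semaphore(max_inflight)
```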
One missing stress axis worth considering: the README documents five axes but there's no test scenario combining a very small --vocab-size (e.g., 6) with a high --num-permutations to simultaneously stress both chunk collision and eviction, which seems like the most adversarial real-world case for the rolling-hash logic.
Cursor Bugbot has reviewed your changes and found 4 potential issues.
Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
```python
with open(OUTPUT_FILE, "a") as f:
    f.write(text)
else:
    sys.stdout.write(text)
```
Multiple public functions missing return type hints
Low Severity
Several new public functions lack return type hints: write_resp, relative_time, print_results, main, and create_argument_parser. The project's coding conventions require all functions to have type hints for arguments and return values. main also lacks a type hint for its args parameter.
Additional Locations (2)
Triggered by project rule: LMCache Code Review Style Guide
```python
if len(ok) > 0:
    total_tokens = ok["prompt_tokens"].sum() + ok["completion_tokens"].sum()
    print(f" Throughput : {len(ok) / wall_time:.2f} req/s")
    print(f" Throughput : {total_tokens / wall_time:.2f} tok/s")
```
Multiple public functions missing docstrings
Low Severity
Several new public functions lack docstrings: write_resp, relative_time, print_results, main, and create_argument_parser. The project's coding conventions require all public functions to have docstrings covering what the function does, its arguments, and return values.
Additional Locations (2)
Triggered by project rule: LMCache Code Review Style Guide
```python
sys.path.insert(0, script_dir)
# Third Party
from parse_lmcache_log import parse_log
from parse_lmcache_log import report as cache_report
```
Local import mislabeled as third-party
Low Severity
The parse_lmcache_log imports are labeled with a # Third Party section comment, but this is a local module loaded via sys.path.insert. Per the project's import ordering convention (Standard / Third Party / First Party / Local), this should use a # Local comment instead.
Triggered by project rule: LMCache Code Review Style Guide
```python
"max": float(s.max()),
"p95": float(s.quantile(0.95)),
"p99": float(s.quantile(0.99)),
"std": float(s.std()),
```
NaN from single-element std produces invalid JSON output
Low Severity
When exactly one request succeeds (e.g. --num-permutations 1), s.std() with default ddof=1 returns NaN. This NaN propagates through ttft_stats into the summary dict, and json.dumps(summary) emits a NaN literal, which is not valid JSON per the spec. Strict JSON parsers (e.g. jq) will reject the output.
Additional Locations (1)
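One way to guard the stats dict is shown below without pandas — a sketch only; the real fix could equally pass `ddof=0` or clamp NaN just before serialization:

```python
import json
import math

def safe_std(values: list[float], ddof: int = 1) -> float:
    """Sample standard deviation that returns 0.0 instead of NaN when
    there are too few values (mirrors pandas' default ddof=1 behavior)."""
    n = len(values)
    if n <= ddof:
        return 0.0
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - ddof)
    return math.sqrt(var)

# json.dumps emits a bare NaN literal for float("nan"), which jq and other
# strict parsers reject; allow_nan=False raises instead of emitting it.
assert json.dumps(float("nan")) == "NaN"
json.dumps({"std": safe_std([5.0])}, allow_nan=False)  # safe: std is 0.0
```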
now in #2937


Note
Medium Risk
Adds a new async benchmark script that can generate high request volume/concurrency against a Blend v2 server and write artifacts; misuse or missing optional dependencies (e.g., LMCache log parser/module) could cause runtime failures or heavy load.
Overview
Adds a new `benchmarks/blend_v2` permutation-based stress test to exercise Blend v2 KV reuse across context boundary orderings, eviction pressure, chunk-hash collision risk, prefix-dominated prompts, and concurrency.

Introduces `long_doc_permutator.py`, which generates synthetic system prompts/contexts, enumerates or samples context permutations, drives async streaming chat-completion requests with configurable concurrency, and writes results/plots plus a combined `summary.txt` (optionally attempting to parse an `--lmcache-log`). A new README documents the 5 stress axes and provides runnable example configurations.

Written by Cursor Bugbot for commit 968a94e. This will update automatically on new commits. Configure here.