
[Bench]: Add Blend V2 Stress Test script #2885

Closed
sammshen wants to merge 2 commits into LMCache:dev from sammshen:blend-stress-tests

Conversation

@sammshen (Contributor) commented Mar 26, 2026

Add long_doc_permutator.py and README for stress testing the blend v2 server across 5 axes: context boundaries, eviction, chunk homogeneity, prefix domination, and concurrency.


Note

Medium Risk
Adds a new async benchmark script that can generate high request volume/concurrency against a Blend v2 server and write artifacts; misuse or missing optional dependencies (e.g., LMCache log parser/module) could cause runtime failures or heavy load.

Overview
Adds a new benchmarks/blend_v2 permutation-based stress test to exercise Blend v2 KV reuse across context boundary orderings, eviction pressure, chunk-hash collision risk, prefix-dominated prompts, and concurrency.

Introduces long_doc_permutator.py, which generates synthetic system prompts/contexts, enumerates or samples context permutations, drives async streaming chat-completion requests with configurable concurrency, and writes results/plots plus a combined summary.txt (optionally attempting to parse an --lmcache-log). A new README documents the 5 stress axes and provides runnable example configurations.
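The enumerate-or-sample step described above could be sketched roughly as follows; the function name, signature, and the choice of random.sample are illustrative assumptions, not the script's actual API:

```python
import itertools
import math
import random


def iter_permutations(contexts: list[str], limit: int):
    """Yield context orderings: enumerate all of them when the total
    count fits under the limit, otherwise sample random orderings."""
    total = math.factorial(len(contexts))
    if total <= limit:
        # Small context sets: exhaustive enumeration is feasible.
        yield from itertools.permutations(contexts)
    else:
        # Large context sets: draw `limit` random orderings instead.
        for _ in range(limit):
            yield tuple(random.sample(contexts, len(contexts)))
```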

Written by Cursor Bugbot for commit 968a94e.

Add long_doc_permutator.py and README for stress testing the blend v2
server across 5 axes: context boundaries, eviction, chunk homogeneity,
prefix domination, and concurrency.
@gemini-code-assist (Bot) left a comment

Code Review

This pull request introduces a new benchmark tool, long_doc_permutator.py, and its accompanying documentation to stress test the Blend Server V2 implementation. The tool evaluates performance across several axes, including context boundaries, eviction, chunk homogeneity, prefix domination, and concurrency. The review feedback identifies several violations of the repository's style guide, specifically regarding missing type hints and docstrings for new functions, as well as improper import practices and path construction logic.

# ---------------------------------------------------------------------------


def write_resp(text: str):

critical

The function write_resp is missing a return type hint and a docstring. According to the repository style guide (lines 24 and 25), all new public functions must have type hints and docstrings. Please add the -> None return type and a docstring explaining the function's purpose.

Suggested change
def write_resp(text: str):
def write_resp(text: str) -> None:
References
  1. All new functions must have type hints for arguments and return values. (link)
  2. All new public functions must have docstrings. (link)

# ---------------------------------------------------------------------------


def relative_time(df: pd.DataFrame, start_time: float):

critical

The function relative_time is missing a return type hint and a docstring, violating the repository style guide (lines 24 and 25). Please add the -> None return type and a docstring.

Suggested change
def relative_time(df: pd.DataFrame, start_time: float):
def relative_time(df: pd.DataFrame, start_time: float) -> None:
References
  1. All new functions must have type hints for arguments and return values. (link)
  2. All new public functions must have docstrings. (link)

}


def print_results(df: pd.DataFrame, wall_time: float, label: str):

critical

The function print_results is missing a return type hint and a docstring, violating the repository style guide (lines 24 and 25). Please add the -> None return type and a docstring.

Suggested change
def print_results(df: pd.DataFrame, wall_time: float, label: str):
def print_results(df: pd.DataFrame, wall_time: float, label: str) -> None:
References
  1. All new functions must have type hints for arguments and return values. (link)
  2. All new public functions must have docstrings. (link)

print(f" Throughput : {total_tokens / wall_time:.2f} tok/s")


def plot_ttft_distribution(df: pd.DataFrame, filename: str = "ttft_distribution.png"):

critical

The function plot_ttft_distribution is missing a return type hint, violating the repository style guide (line 24). Please add the -> None return type.

Suggested change
def plot_ttft_distribution(df: pd.DataFrame, filename: str = "ttft_distribution.png"):
def plot_ttft_distribution(df: pd.DataFrame, filename: str = "ttft_distribution.png") -> None:
References
  1. All new functions must have type hints for arguments and return values. (link)

# ---------------------------------------------------------------------------


async def main(args):

critical

The main function is missing type hints for its args parameter and its return value, as well as a docstring. This violates the repository style guide (lines 24 and 25). Please type args as argparse.Namespace, add the -> None return type, and include a docstring.

Suggested change
async def main(args):
async def main(args: argparse.Namespace) -> None:
References
  1. All new functions must have type hints for arguments and return values. (link)
  2. All new public functions must have docstrings. (link)

parser.add_argument(
"--lmcache-workers",
type=int,
default=4,

critical

The function create_argument_parser is missing a return type hint and a docstring, violating the repository style guide (lines 24 and 25). Please add the -> argparse.ArgumentParser return type and a docstring.

def create_argument_parser() -> argparse.ArgumentParser:
References
  1. All new functions must have type hints for arguments and return values. (link)
  2. All new public functions must have docstrings. (link)

Comment on lines +520 to +524
script_dir = os.path.dirname(os.path.abspath(__file__))
sys.path.insert(0, script_dir)
# Third Party
from parse_lmcache_log import parse_log
from parse_lmcache_log import report as cache_report

medium

Modifying sys.path at runtime to handle imports is fragile and goes against best practices. It also violates the project's import ordering conventions (style guide line 28), which require imports to be at the top of the file. The comment # Third Party is also incorrect for this local import.

Please refactor this to use a standard import mechanism. If parse_lmcache_log is a sibling script, consider making this directory a package or adjusting PYTHONPATH externally when running the benchmark.

References
  1. Imports should be ordered: Standard / Third Party / First Party / Local, and placed at the top of the file. (link)
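One way to follow this suggestion without mutating sys.path is to load the sibling module from an explicit file path. This is a sketch, not the project's actual fix; the helper name is invented:

```python
import importlib.util


def load_module_from_path(name: str, path: str):
    """Load a Python module from an explicit file path without touching sys.path."""
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


# Local (sibling script, loaded explicitly rather than via sys.path mutation):
# script_dir = os.path.dirname(os.path.abspath(__file__))
# log_mod = load_module_from_path(
#     "parse_lmcache_log", os.path.join(script_dir, "parse_lmcache_log.py"))
# parse_log, cache_report = log_mod.parse_log, log_mod.report
```

Making the benchmarks directory a package (with an `__init__.py`) and using a relative import would be the cleaner long-term option, as the reviewer notes.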

Comment on lines +659 to +663
if args.output:
if args.output_dir and args.output_dir != ".":
OUTPUT_FILE = os.path.join(args.output_dir, args.output)
else:
OUTPUT_FILE = args.output

medium

The logic for constructing the OUTPUT_FILE path can be simplified: os.path.join already handles the case where args.output_dir is ".", so these lines can be reduced to a single os.path.join call inside the if args.output: block for better readability.

    if args.output:
        OUTPUT_FILE = os.path.join(args.output_dir, args.output)
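A quick check of the claim that the "." special case is unnecessary (filenames here are illustrative):

```python
import os

# os.path.join treats "." like any other directory component, so both
# spellings resolve to the same file after normalization.
joined = os.path.join(".", "results.txt")
assert os.path.normpath(joined) == "results.txt"

# Non-trivial directories behave as expected too.
assert os.path.normpath(os.path.join("out", "results.txt")) == os.path.join("out", "results.txt")
```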

@JiwaniZakir left a comment

In generate_vocab_pool, the pool variable is typed as set[str] to avoid duplicates, but the uniqueness is already guaranteed structurally: the suffix f"{word}{len(pool)}" uses the current pool size as a counter, so every generated word is inherently unique regardless of the random base. Using a set here adds unnecessary overhead for large --vocab-size values; a list with a simple counter would be both cleaner and more efficient.
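The suggested list-with-counter variant could look like this sketch; the function name comes from the review, but the word-generation details are assumptions:

```python
import random
import string


def generate_vocab_pool(vocab_size: int, word_len: int = 6) -> list[str]:
    """List-based variant: the index suffix alone guarantees uniqueness,
    so no set membership checks or rejection loops are needed."""
    pool: list[str] = []
    for i in range(vocab_size):
        base = "".join(random.choices(string.ascii_lowercase, k=word_len))
        pool.append(f"{base}{i}")  # suffix i makes every entry distinct
    return pool
```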

The sentinel value --max-inflight-requests 0 meaning "flood all requests" is a subtle footgun — zero conventionally reads as "no concurrency allowed" rather than "unlimited." A value of -1 (or a dedicated --flood flag) would align better with common CLI conventions and avoid confusion when users scan the argument help text.

One missing stress axis worth considering: the README documents five axes but there's no test scenario combining a very small --vocab-size (e.g., 6) with a high --num-permutations to simultaneously stress both chunk collision and eviction, which seems like the most adversarial real-world case for the rolling-hash logic.

@sammshen sammshen mentioned this pull request Mar 31, 2026
1 task
@deng451e deng451e added the full Run comprehensive tests on this PR label Apr 1, 2026
@cursor (Bot) left a comment

Cursor Bugbot has reviewed your changes and found 4 potential issues.


with open(OUTPUT_FILE, "a") as f:
f.write(text)
else:
sys.stdout.write(text)

Multiple public functions missing return type hints

Low Severity

Several new public functions lack return type hints: write_resp, relative_time, print_results, main, and create_argument_parser. The project's coding conventions require all functions to have type hints for arguments and return values. main also lacks a type hint for its args parameter.


Triggered by project rule: LMCache Code Review Style Guide

if len(ok) > 0:
total_tokens = ok["prompt_tokens"].sum() + ok["completion_tokens"].sum()
print(f" Throughput : {len(ok) / wall_time:.2f} req/s")
print(f" Throughput : {total_tokens / wall_time:.2f} tok/s")

Multiple public functions missing docstrings

Low Severity

Several new public functions lack docstrings: write_resp, relative_time, print_results, main, and create_argument_parser. The project's coding conventions require all public functions to have docstrings covering what the function does, its arguments, and return values.


Triggered by project rule: LMCache Code Review Style Guide

sys.path.insert(0, script_dir)
# Third Party
from parse_lmcache_log import parse_log
from parse_lmcache_log import report as cache_report

Local import mislabeled as third-party

Low Severity

The parse_lmcache_log imports are labeled with a # Third Party section comment, but this is a local module loaded via sys.path.insert. Per the project's import ordering convention (Standard / Third Party / First Party / Local), this should use a # Local comment instead.


Triggered by project rule: LMCache Code Review Style Guide

"max": float(s.max()),
"p95": float(s.quantile(0.95)),
"p99": float(s.quantile(0.99)),
"std": float(s.std()),

NaN from single-element std produces invalid JSON output

Low Severity

When exactly one request succeeds (e.g. --num-permutations 1), s.std() with default ddof=1 returns NaN. This NaN propagates through ttft_stats into the summary dict, and json.dumps(summary) emits a NaN literal, which is not valid JSON per the spec. Strict JSON parsers (e.g. jq) will reject the output.
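A minimal reproduction of the issue, along with one possible guard (mapping NaN to None so it serializes as valid JSON null; the fix shown is a suggestion, not the PR's):

```python
import json
import math

import pandas as pd

s = pd.Series([1.23])          # exactly one successful request
assert math.isnan(s.std())     # default ddof=1 gives NaN for n == 1

# json.dumps emits the non-standard NaN literal by default (allow_nan=True)...
assert json.dumps({"std": float(s.std())}) == '{"std": NaN}'

# ...so strict parsers like jq reject it. One guard: replace NaN with None.
std = float(s.std())
summary = {"std": None if math.isnan(std) else std}
assert json.dumps(summary) == '{"std": null}'
```

Alternatively, passing `allow_nan=False` to json.dumps would turn the silent bad output into a loud ValueError at write time.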


@sammshen (Contributor, Author) commented Apr 2, 2026

now in #2937

@sammshen sammshen closed this Apr 2, 2026

Labels

full Run comprehensive tests on this PR

3 participants