[1/2] L2 CI: End to End Performance #2884
Signed-off-by: Oasis-Git <ayw.sirius19@gmail.com>
Code Review
This pull request introduces a new performance test script, run-long-doc-qa-l2.sh, which evaluates LMCache's L2 caching capabilities by restarting the server with specific L2 configurations and verifying speedup thresholds. The main test runner, run-mp-test.sh, has been updated to include this new test step. Feedback focuses on improving script consistency by using python3, enhancing modularity by moving a large embedded Python heredoc into a standalone file, and ensuring the test suite continues executing subsequent tests even if the L2 test fails.
```bash
GPU_DEVICE="${GPU_FOR_VLLM:-0}"

CUDA_VISIBLE_DEVICES="${GPU_DEVICE}" \
    python -m lmcache.v1.multiprocess.server \
```
For consistency and to avoid potential issues in environments where python might point to Python 2, it's better to use python3 here. The rest of the script already uses python3 (e.g., lines 129, 150, 195).
```diff
-    python -m lmcache.v1.multiprocess.server \
+    python3 -m lmcache.v1.multiprocess.server \
```
```bash
python3 << EOF
import sys

def sf(val):
    try: return float(val)
    except: return None

bqt = sf("$baseline_query_ttft")
bqrt = sf("$baseline_query_round_time")
bwrt = sf("$baseline_warmup_round_time")
lqt = sf("$l2_query_ttft")
lqrt = sf("$l2_query_round_time")
lwrt = sf("$l2_warmup_round_time")

min_spd = float("$MIN_L2_SPEEDUP")
min_ttft = float("$MIN_L2_TTFT_SPEEDUP")
max_oh = float("$MAX_WARMUP_OVERHEAD")

failed = False

print("=" * 60)
print("L2 Performance Summary")
print("=" * 60)
print(f"{'Metric':<35} {'Baseline':>12} {'L2':>12}")
print("-" * 60)
for name, bv, lv in [
    ("query_ttft_per_prompt (s)", bqt, lqt),
    ("query_round_time_per_prompt (s)", bqrt, lqrt),
    ("warmup_round_time_per_prompt (s)", bwrt, lwrt),
]:
    bs = f"{bv:.4f}" if bv else "N/A"
    ls = f"{lv:.4f}" if lv else "N/A"
    print(f"{name:<35} {bs:>12} {ls:>12}")

print()
print("=" * 60)
print("Threshold Verification")
print("=" * 60)

# 1. L2 query round-time speedup
if lqrt and bqrt and lqrt > 0:
    s = bqrt / lqrt
    ok = s >= min_spd
    print(f"[{'PASS' if ok else 'FAIL'}] L2 query speedup: {s:.2f}x (need >= {min_spd}x)")
    if not ok: failed = True
else:
    print("[FAIL] Cannot compute L2 query speedup"); failed = True

# 2. L2 TTFT speedup
if lqt and bqt and lqt > 0:
    s = bqt / lqt
    ok = s >= min_ttft
    print(f"[{'PASS' if ok else 'FAIL'}] L2 TTFT speedup: {s:.2f}x (need >= {min_ttft}x)")
    if not ok: failed = True
else:
    print("[FAIL] Cannot compute L2 TTFT speedup"); failed = True

# 3. Warmup overhead
if lwrt and bwrt and bwrt > 0:
    o = lwrt / bwrt
    ok = o <= max_oh
    print(f"[{'PASS' if ok else 'FAIL'}] Warmup overhead: {o:.2f}x (need <= {max_oh}x)")
    if not ok: failed = True
else:
    print("[FAIL] Cannot compute warmup overhead"); failed = True

print()
if failed:
    print("[FAIL] L2 performance verification FAILED")
    sys.exit(1)
else:
    print("[PASS] All L2 performance thresholds passed")
EOF
```
This large Python script embedded using a heredoc is difficult to maintain, lint, and debug. For better code cleanliness and modularity, as per the repository's style guide, this logic should be moved into a separate Python script. The new script could then be called with the performance metrics and thresholds as command-line arguments.
References
- The style guide (lines 10-11, 61) emphasizes code cleanliness, modularity, and maintainability. Embedding a large script within another script goes against these principles, and the guide suggests that code that could be more modular should be fixed. (link)
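To make the suggestion concrete, here is a minimal sketch of what such a standalone script could look like. The file name (verify_l2_perf.py), the flag names, and the check_ratio helper are all hypothetical illustrations, not part of this PR, and the summary-table printing from the heredoc is elided for brevity:

```python
#!/usr/bin/env python3
# Hypothetical sketch: verify_l2_perf.py (name and flags are assumptions,
# not from the PR). Reproduces the threshold checks from the heredoc above.
import argparse
import sys

METRICS = [
    "baseline-query-ttft",
    "baseline-query-round-time",
    "baseline-warmup-round-time",
    "l2-query-ttft",
    "l2-query-round-time",
    "l2-warmup-round-time",
]


def sf(val):
    """Safe float: returns None for empty or unparsable input."""
    try:
        return float(val)
    except (TypeError, ValueError):
        return None


def check_ratio(label, num, den, threshold, at_most=False):
    """Print PASS/FAIL for num/den against threshold; return True on failure."""
    if num is None or den is None or den <= 0:
        print(f"[FAIL] Cannot compute {label}")
        return True
    ratio = num / den
    ok = ratio <= threshold if at_most else ratio >= threshold
    op = "<=" if at_most else ">="
    print(f"[{'PASS' if ok else 'FAIL'}] {label}: {ratio:.2f}x (need {op} {threshold}x)")
    return not ok


def main():
    p = argparse.ArgumentParser(description="Verify L2 performance thresholds")
    for name in METRICS:
        # Metrics may be missing in the shell, so accept raw strings and coerce later.
        p.add_argument(f"--{name}", default="")
    p.add_argument("--min-speedup", type=float, required=True)
    p.add_argument("--min-ttft-speedup", type=float, required=True)
    p.add_argument("--max-warmup-overhead", type=float, required=True)
    args = p.parse_args()

    failed = False
    # Speedups divide baseline by L2; warmup overhead divides L2 by baseline.
    failed |= check_ratio("L2 query speedup",
                          sf(args.baseline_query_round_time),
                          sf(args.l2_query_round_time),
                          args.min_speedup)
    failed |= check_ratio("L2 TTFT speedup",
                          sf(args.baseline_query_ttft),
                          sf(args.l2_query_ttft),
                          args.min_ttft_speedup)
    failed |= check_ratio("Warmup overhead",
                          sf(args.l2_warmup_round_time),
                          sf(args.baseline_warmup_round_time),
                          args.max_warmup_overhead,
                          at_most=True)
    sys.exit(1 if failed else 0)


if __name__ == "__main__":
    main()
```

The call site in run-long-doc-qa-l2.sh would then collapse to a single python3 invocation that passes the shell variables as flags (e.g. --baseline-query-ttft "$baseline_query_ttft" --min-speedup "$MIN_L2_SPEEDUP"), which is straightforward to lint and unit-test in isolation.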
```bash
if ! "${SCRIPT_DIR}/run-long-doc-qa-l2.sh"; then
    echo "long doc QA L2 test failed"
    TEST_RESULT=1
    exit 1
fi
```
The exit 1 on line 105 causes the entire test suite to terminate prematurely if this test fails. This is inconsistent with the error handling in other test steps in this script (e.g., for run-long-doc-qa.sh), which only set TEST_RESULT=1 and allow subsequent tests to run. To ensure all tests are executed and all failures are reported, the exit 1 should be removed.
```diff
 if ! "${SCRIPT_DIR}/run-long-doc-qa-l2.sh"; then
     echo "long doc QA L2 test failed"
     TEST_RESULT=1
-    exit 1
 fi
```
unblocked by: #2886
Signed-off-by: Oasis-Git <ayw.sirius19@gmail.com> Co-authored-by: Samuel Shen <slshen@tensormesh.ai>