[CI] Add CI test for CB#2900
Conversation
Code Review
This pull request introduces a new integration test suite for CacheBlend, featuring Buildkite pipeline definitions, environment setup scripts, a disaggregated prefill/decode proxy server, and a multi-document QA benchmark. The reviewer identified several improvement opportunities, including replacing a text-based path reference with a proper symbolic link, avoiding the ':latest' tag in CI images for reproducibility, and adhering to the repository's style guide regarding import placement. Further suggestions include refactoring the proxy server to avoid global variables and magic numbers, and making hardcoded tool paths in shell scripts configurable to reduce brittleness.
Force-pushed from 93ef4a1 to 7b4044c.
Signed-off-by: deng451e <838677410@qq.com>
```diff
@@ -0,0 +1,289 @@
# SPDX-License-Identifier: Apache-2.0
```
Cursor Bugbot has reviewed your changes and found 3 potential issues.
```shell
"${TEST_PYTHON}" benchmarks/multi_doc_qa/shuffle_doc_qa.py \
    --num-documents "${SHUFFLE_NUM_DOCUMENTS}" \
    --document-length "${SHUFFLE_DOCUMENT_LENGTH}" \
    --output-len "${SHUFFLE_OUTPUT_LEN}"; then
```
Missing wait for proxy server before benchmark starts
High Severity
The CacheBlend proxy is started in the background (step 5) but the benchmark (step 6) runs immediately without waiting for the proxy to become ready. The benchmark's first action is client.models.list() which hits the proxy's /v1/models endpoint — if the proxy hasn't finished starting (uvicorn + lifespan), this fails with a connection error. A wait_for_server "$SERVICE_PORT" call is needed between steps 5 and 6, consistent with how all vLLM instances are awaited in step 4.
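The review notes that the harness already awaits the vLLM instances with `wait_for_server` in step 4; that helper's implementation is not shown in this excerpt. A minimal sketch of such a readiness poll against the proxy's `/v1/models` endpoint could look like this (`wait_for_server`'s body and `WAIT_TIMEOUT_SEC` are assumptions, not code from the PR):

```shell
# Sketch of a readiness poll (assumed names: wait_for_server, WAIT_TIMEOUT_SEC).
# Polls the server's /v1/models endpoint until it answers or the deadline passes.
wait_for_server() {
  local port="$1"
  local timeout_sec="${WAIT_TIMEOUT_SEC:-120}"
  local deadline=$((SECONDS + timeout_sec))
  until curl -sf "http://localhost:${port}/v1/models" >/dev/null 2>&1; do
    if (( SECONDS >= deadline )); then
      echo "[FAIL] server on port ${port} not ready after ${timeout_sec}s" >&2
      return 1
    fi
    sleep 2
  done
}
```

With something like this in place, inserting `wait_for_server "$SERVICE_PORT"` between steps 5 and 6 closes the race.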
```shell
    echo "[FAIL] shuffle_doc_qa exceeded BENCHMARK_TIMEOUT_SEC=${BENCHMARK_TIMEOUT_SEC}s"
else
    echo "[FAIL] shuffle_doc_qa exited with code ${rc}"
fi
```
Timeout exit code never captured due to bash $? semantics
Medium Severity
rc=$? on line 296 is inside the then block of if ! timeout ...; then. In bash, after if ! cmd, $? reflects the negated result (always 0 when the body is entered), not the original exit code from timeout. So rc is always 0, the [[ "$rc" -eq 124 ]] check on line 297 can never be true, and the timeout-specific diagnostic message is dead code. On a real timeout, the misleading message "exited with code 0" is shown instead.
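One way to preserve the real exit code is to capture it on the `timeout` command itself (via `|| rc=$?`) instead of inside the negated `if`. A hedged sketch of the pattern (`run_with_timeout` is an illustrative helper, not code from the PR):

```shell
# Capture timeout's real exit code before any other command clobbers $?.
# run_with_timeout stands in for the real `timeout ... shuffle_doc_qa.py` call.
run_with_timeout() {
  local timeout_sec="$1"; shift
  local rc=0
  timeout "${timeout_sec}" "$@" || rc=$?   # rc now holds timeout's exit code
  if (( rc == 124 )); then
    # GNU timeout exits 124 when the command was killed for running too long.
    echo "[FAIL] command exceeded BENCHMARK_TIMEOUT_SEC=${timeout_sec}s"
  elif (( rc != 0 )); then
    echo "[FAIL] command exited with code ${rc}"
  fi
  return "${rc}"
}
```

Because `rc` is assigned on the same command, the 124 branch actually fires on a timeout instead of being dead code.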
```python
for chunk in chat_completion:
    chunk_message = parse_chunk_output(chunk.choices[0])
```
Missing empty chunk.choices guard causes potential IndexError
Medium Severity
chunk.choices[0] is accessed without first checking that choices is non-empty. Streaming chat completions can yield chunks with an empty choices list (e.g., usage-reporting chunks), which would cause an IndexError. The sibling multi_doc_qa.py in the same directory explicitly guards against this with if not chunk.choices: continue.
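The guard is a one-line fix. A self-contained sketch of the pattern (`Choice`/`Chunk` are illustrative stand-ins for the OpenAI client's chunk objects, and `collect_stream_text` is a hypothetical helper, not code from the PR):

```python
from dataclasses import dataclass, field

# Minimal stand-ins for streaming chat-completion chunks (illustrative only).
@dataclass
class Choice:
    delta_content: str

@dataclass
class Chunk:
    choices: list = field(default_factory=list)

def collect_stream_text(chunks) -> str:
    """Accumulate streamed text, skipping chunks whose choices list is empty
    (e.g. usage-reporting chunks), mirroring the guard in multi_doc_qa.py."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:
            continue  # avoids IndexError on choices[0] for usage-only chunks
        parts.append(chunk.choices[0].delta_content)
    return "".join(parts)
```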
* [CI] ci blend test
* correct path
* fix gpt oss encoder issue
* update model path

Signed-off-by: deng451e <838677410@qq.com>


What this PR does / why we need it:
• Adds CI coverage for CacheBlend to prevent regressions.
• Adds a shuffled document benchmark to test non-prefix chunk matching and recomputation.
Note
Medium Risk
Medium risk because it introduces new CI orchestration (GPU pods, dynamic ports, background processes) and new proxy/benchmark code that could make CI flaky, but it doesn’t change production runtime paths.
Overview
- Adds a new Buildkite Blend (CacheBlend) CI job that runs on a 2×GPU K8s pod (`tensormesh/cacheblend:latest`) and uploads artifacts (`build_*.log`).
- Introduces `setup-blend-env.sh` for per-job setup: a GPU health precheck, creating or reusing `/workspace/.venv`, installing the latest vLLM nightly wheel (with a stable fallback), installing LMCache editable into both the image venv and the test venv, and pinning `tiktoken` encodings via a local `TIKTOKEN_ENCODINGS_BASE`.
- Adds a Blend run harness (`run.sh` + `scripts/run-blend-test.sh`) that starts the LMCache blend server, launches configurable pools of prefiller/decoder vLLM instances with dynamic free-port selection and GPU assignment, runs a new FastAPI disagg proxy (`proxy.py`) that waits on KV-cache telemetry before forwarding to decoders, then executes a new shuffled multi-doc QA benchmark (`benchmarks/multi_doc_qa/shuffle_doc_qa.py`) and fails the job if logs contain error/traceback/fatal patterns.

Written by Cursor Bugbot for commit b64bc64. This will update automatically on new commits.
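The harness's exact free-port mechanism is not shown in this excerpt; a common portable sketch is to bind to port 0 and let the kernel assign an unused ephemeral port (`find_free_port` is a hypothetical helper, not code from the PR):

```shell
# Ask the OS for a currently-free TCP port by binding to port 0 (sketch only;
# the harness's actual free-port logic is not shown in this PR excerpt).
find_free_port() {
  python3 - <<'EOF'
import socket
s = socket.socket()
s.bind(("127.0.0.1", 0))   # port 0: kernel assigns an unused port
print(s.getsockname()[1])
s.close()
EOF
}
```

Note the usual caveat: the port is only guaranteed free at bind time, so a small race remains between selection and the vLLM instance actually binding it.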