
K3 CI Refactor #2663

Merged
sammshen merged 30 commits into LMCache:dev from sammshen:ci-refactor
Mar 3, 2026

Conversation

@sammshen
Contributor

@sammshen sammshen commented Mar 1, 2026

Rewrites four test suites (.buildkite/k3_tests/) on top of a new K3s-based infrastructure (.buildkite/k3_harness/).

Benefits:

  1. Better resource scheduling: multiple concurrent tasks in the comprehensive suite can run at once.
  2. Unified environment setup: nightly vLLM is installed from wheels and LMCache from source, with no Docker builds.
  3. K3s pods make resource cleanup guaranteed.
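
The third point is easiest to see by contrast. Before pod-scoped jobs, guaranteed cleanup had to be hand-rolled in every script; the sketch below (hypothetical, not code from this PR) shows the trap-based pattern that the pod lifecycle now subsumes:

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the per-script cleanup that pod-scoped CI jobs
# make unnecessary: every background PID and temp dir must be torn down
# by hand, and any missed path leaks GPUs or disk.
set -euo pipefail

PIDS=()
WORKDIR="$(mktemp -d)"

cleanup() {
  for pid in "${PIDS[@]}"; do
    kill "$pid" 2>/dev/null || true
    wait "$pid" 2>/dev/null || true   # reap so the PID fully disappears
  done
  rm -rf "$WORKDIR"
}
trap cleanup EXIT   # best effort only: SIGKILL or node loss skips it

sleep 300 &          # stand-in for a long-running vLLM server
PIDS+=("$!")

# ... test workload would run here ...

cleanup              # with K3s, deleting the pod does all of this for us
trap - EXIT
echo "cleaned"
```

Deleting a pod tears down its process tree and ephemeral volumes unconditionally, which is exactly the guarantee this trap pattern can only approximate.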

Samuel Shen added 18 commits March 1, 2026 02:50
K8s-based CI infrastructure using K3s + NVIDIA GPU Operator + agent-stack-k8s:
- k3_harness/: cluster setup, env setup, base image, teardown scripts
- k3_tests/: comprehensive, correctness, integration, multiprocess pipelines
Signed-off-by: Samuel Shen <slshen@uchciago.edu>
… socket

- Rolling baselines: nightly writes date-stamped <feature>-YYYYMMDD.json,
  PR builds compare against worst-case (max) across 5-day window
- upload-baselines.sh finalize step collects artifacts, prunes old files,
  single commit to benchmarks-main
- Switch from SSH key to GITHUB_TOKEN (HTTPS) for repo checkout and push
- Priority 1 for 2-GPU steps (pd, p2p, multiprocess) so they schedule first
- Fix memory leak check: override LMCACHE_INTERNAL_API_SERVER_SOCKET_PATH_PREFIX
  to include port (replicates old Docker volume mount path mapping)
- Fix correctness: replace col -b with sed for man page formatting
Signed-off-by: Samuel Shen <slshen@uchciago.edu>
@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly refactors the continuous integration system by migrating existing tests to a new K3s-based infrastructure. This change aims to enhance the efficiency and reliability of CI pipelines through improved resource management, standardized environment provisioning, and robust cleanup mechanisms. The new setup provides a more scalable and maintainable foundation for running various test types, from correctness checks to performance benchmarks, by leveraging Kubernetes capabilities for isolated and ephemeral test environments.

Highlights

  • New K3s CI Infrastructure: Introduced a new K3s-based CI harness (.buildkite/k3_harness/) to replace existing test infrastructure, enabling more efficient and isolated test execution.
  • Improved Resource Scheduling: The new infrastructure allows for better resource scheduling, enabling multiple concurrent tasks to run efficiently within comprehensive test suites.
  • Unified Environment Setup: Implemented a unified environment setup that installs nightly vLLM and LMCache from source using wheels, eliminating the need for Docker builds within test jobs.
  • Guaranteed Resource Cleanup: Leveraged K3s pods to ensure guaranteed resource cleanup after each test job, preventing resource leaks and improving CI stability.
  • Refactored Test Pipelines: Rewrote four existing test suites (comprehensive, correctness, integration, multiprocess) to utilize the new K3s harness, including detailed Buildkite pipeline definitions and setup instructions.


Changelog
  • .buildkite/k3_harness/README.md
    • Added a comprehensive README detailing the K3s CI Harness, including prerequisites, setup, Buildkite integration, per-job environment, shared volumes, GPU allocation, CI base image, and teardown procedures.
  • .buildkite/k3_harness/ci-base.Dockerfile
    • Added a Dockerfile for the CI base image, configured with CUDA, Python, uv, and build dependencies, and dynamically detects GPU compute capability for TORCH_CUDA_ARCH_LIST.
  • .buildkite/k3_harness/install-agent-stack.sh
    • Added a script to install Buildkite agent-stack-k8s, including the creation of Kubernetes secrets for GitHub credentials to enable HTTPS repository access.
  • .buildkite/k3_harness/setup-cluster.sh
    • Added an idempotent script to set up K3s, the NVIDIA GPU Operator, and build/import the CI base image into K3s containerd, along with creating necessary host volume directories.
  • .buildkite/k3_harness/setup-env.sh
    • Added a script for per-job environment setup within CI pods, installing vLLM nightly from wheels and LMCache from source.
  • .buildkite/k3_harness/smoke-test.sh
    • Added a script to verify the correct functioning of K3s and the GPU Operator by scheduling and running a pod that executes nvidia-smi.
  • .buildkite/k3_harness/teardown.sh
    • Added a script to gracefully remove the K3s stack components, including agent-stack-k8s, GPU Operator, and K3s itself, while preserving host data volumes.
  • .buildkite/k3_harness/values.yaml
    • Added a reference Helm values file for agent-stack-k8s, serving as documentation for configuration parameters.
  • .buildkite/k3_tests/README.md
    • Added a README explaining the structure of K8s test pipelines, detailing required files, Buildkite UI setup, trigger strategies, and guidelines for adding new tests.
  • .buildkite/k3_tests/common_scripts/helpers.sh
    • Added shared Bash helper functions for K3s test scripts, including utilities for finding free TCP ports, waiting for server readiness, and cleaning up background processes.
  • .buildkite/k3_tests/comprehensive/BK_WEB_SETUP.md
    • Added Buildkite Web UI setup instructions for comprehensive tests, specifying GitHub trigger filters and details for nightly scheduled baseline uploads.
  • .buildkite/k3_tests/comprehensive/buildkite-pipeline.yml
    • Added a Buildkite pipeline definition for comprehensive tests, configured to upload the full pipeline definition from the repository.
  • .buildkite/k3_tests/comprehensive/pipeline.yml
    • Added the main pipeline definition for comprehensive tests, organizing steps into 1-GPU and 2-GPU groups, and including a step for uploading rolling baselines.
  • .buildkite/k3_tests/comprehensive/run.sh
    • Added an entrypoint script for comprehensive tests, responsible for environment setup and delegating to the script that runs a single test configuration.
  • .buildkite/k3_tests/comprehensive/scripts/run-single-config.sh
    • Added a script to execute a single comprehensive test configuration natively within a K8s pod, managing server startup, workload execution, and memory leak checks.
  • .buildkite/k3_tests/comprehensive/scripts/upload-baselines.sh
    • Added a script to finalize nightly baseline uploads, downloading artifacts, pruning old files, and pushing a single commit to the benchmarks-main branch.
  • .buildkite/k3_tests/correctness/BK_WEB_SETUP.md
    • Added Buildkite Web UI setup instructions for correctness tests, indicating they run on every push/PR.
  • .buildkite/k3_tests/correctness/buildkite-pipeline.yml
    • Added a Buildkite pipeline definition for correctness tests, configured to upload the full pipeline definition from the repository.
  • .buildkite/k3_tests/correctness/pipeline.yml
    • Added the main pipeline definition for correctness tests, designed to verify LMCache produces identical output to base vLLM.
  • .buildkite/k3_tests/correctness/run.sh
    • Added an entrypoint script for correctness tests, handling environment setup and delegating to the core correctness test script.
  • .buildkite/k3_tests/correctness/scripts/run-correctness.sh
    • Added a self-contained script for correctness tests, launching base vLLM and LMCache vLLM servers, running ShareGPT and 'man bash' tests, and comparing outputs.
  • .buildkite/k3_tests/integration/BK_WEB_SETUP.md
    • Added Buildkite Web UI setup instructions for integration tests, indicating they run on every push/PR.
  • .buildkite/k3_tests/integration/buildkite-pipeline.yml
    • Added a Buildkite pipeline definition for integration tests, configured to upload the full pipeline definition from the repository.
  • .buildkite/k3_tests/integration/pipeline.yml
    • Added the main pipeline definition for integration tests, focusing on starting vLLM with LMCache directly and testing CPU and disk backends via the OpenAI API.
  • .buildkite/k3_tests/integration/run.sh
    • Added an entrypoint script for integration tests, setting up the environment and delegating to the core integration test script.
  • .buildkite/k3_tests/integration/scripts/run-integration.sh
    • Added a script for integration tests, launching vLLM with LMCache for CPU and disk backends, and sending requests to verify caching behavior.
  • .buildkite/k3_tests/multiprocess/BK_WEB_SETUP.md
    • Added Buildkite Web UI setup instructions for multiprocess tests, including specific trigger labels.
  • .buildkite/k3_tests/multiprocess/buildkite-pipeline.yml
    • Added a Buildkite pipeline definition for multiprocess tests, configured to upload the full pipeline definition from the repository.
  • .buildkite/k3_tests/multiprocess/pipeline.yml
    • Added the main pipeline definition for multiprocess tests, outlining steps for LMCache MP server, vLLM, and vLLM baseline, requiring 2 GPUs.
  • .buildkite/k3_tests/multiprocess/run.sh
    • Added an entrypoint script for multiprocess tests, setting up the environment and delegating to the main multiprocess test orchestrator.
  • .buildkite/k3_tests/multiprocess/scripts/cleanup.sh
    • Added a script to clean up background processes launched during multiprocess tests, ensuring all PIDs are terminated and logs are collected.
  • .buildkite/k3_tests/multiprocess/scripts/launch-processes.sh
    • Added a script to launch LMCache MP server, vLLM with LMCache, and a vLLM baseline as native background processes, allocating GPUs and managing PIDs.
  • .buildkite/k3_tests/multiprocess/scripts/run-lm-eval.sh
    • Added a script to run lm_eval workload tests against vLLM servers, performing two runs to verify LMCache caching behavior and output consistency.
  • .buildkite/k3_tests/multiprocess/scripts/run-long-doc-qa.sh
    • Added a script to run long_doc_qa workload tests, comparing performance between LMCache-enabled and baseline vLLM, and verifying against defined thresholds.
  • .buildkite/k3_tests/multiprocess/scripts/run-mp-test.sh
    • Added the main orchestrator script for multiprocess tests, coordinating the launch of servers, waiting for readiness, and executing lm_eval, vllm bench, and long_doc_qa workloads.
  • .buildkite/k3_tests/multiprocess/scripts/wait-for-servers.sh
    • Added a script to wait for both LMCache-enabled vLLM and baseline vLLM servers to become ready, with a configurable timeout.
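
The arch-detection step described for ci-base.Dockerfile can be pictured with a small sketch. The helper name `caps_to_arch_list` and the canned capability values below are invented for illustration; on a real build host the input would come from `nvidia-smi --query-gpu=compute_cap`:

```shell
#!/usr/bin/env bash
# Hypothetical sketch of deriving TORCH_CUDA_ARCH_LIST from detected
# compute capabilities; helper name and sample values are invented.
set -euo pipefail

# Deduplicate capability lines and join with ';', the list format
# torch expects, e.g. "8.9\n8.9\n9.0" -> "8.9;9.0".
caps_to_arch_list() {
  printf '%s\n' "$1" | sort -u | paste -sd';' -
}

# On a real GPU node this would be:
#   caps="$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader)"
caps="$(printf '8.9\n8.9')"

export TORCH_CUDA_ARCH_LIST="$(caps_to_arch_list "$caps")"
echo "TORCH_CUDA_ARCH_LIST=$TORCH_CUDA_ARCH_LIST"
```

Baking the detected value into the image keeps source builds of LMCache from compiling kernels for every architecture.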

@sammshen sammshen requested a review from ApostaC March 1, 2026 11:13
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request refactors the CI infrastructure to a K3s-based system, aiming for improved resource scheduling, a unified environment, and more reliable cleanup. However, critical security vulnerabilities were identified in the newly added shell scripts, including shell command injection via unsanitized variables in yq and bash -c commands, and potential secret exposure by passing tokens through command-line arguments and URLs. Addressing these security concerns is paramount, especially since these scripts process configuration files that could be manipulated in a pull request. Additionally, general issues were found, such as missing system dependencies (yq, jq) in the base Docker image and an incorrect Helm release name in the teardown script.

source .buildkite/k3_harness/setup-env.sh

# Install test utilities (yq for YAML parsing, jq for JSON, openai/pandas/matplotlib for benchmarks)
uv pip install yq jq openai pandas matplotlib 2>/dev/null || true
Contributor


critical

This command attempts to install yq and jq using uv pip, but they are not Python packages and cannot be installed this way. This will fail (though the error is suppressed by || true), and the subsequent test script will fail because yq and jq are not found. These dependencies should be installed in the base Docker image via apt-get.

Suggested change
uv pip install yq jq openai pandas matplotlib 2>/dev/null || true
uv pip install openai pandas matplotlib 2>/dev/null || true

alloc=$(yq -er '.["docker-decoder"]["alloc-port"]' "$cfg_file" 2>/dev/null || echo "7400")

# Inject PD-specific env vars into docker sections
prefiller_docker=$(echo "$prefiller_docker" | yq -y ". + {\"env\": (.env + [\"LMCACHE_PD_PROXY_PORT=$proxy\"])}")
Contributor


security-high high

The proxy variable, which is extracted from a configuration file, is interpolated directly into a shell command string for yq. This allows for shell command injection if the configuration file contains malicious values. An attacker could exploit this by submitting a pull request with a modified configuration file.

Suggested change
prefiller_docker=$(echo "$prefiller_docker" | yq -y ". + {\"env\": (.env + [\"LMCACHE_PD_PROXY_PORT=$proxy\"])}")
prefiller_docker=$(echo "$prefiller_docker" | yq -y --arg proxy "$proxy" '. + {"env": (.env + ["LMCACHE_PD_PROXY_PORT=" + $proxy])}')


# Inject PD-specific env vars into docker sections
prefiller_docker=$(echo "$prefiller_docker" | yq -y ". + {\"env\": (.env + [\"LMCACHE_PD_PROXY_PORT=$proxy\"])}")
decoder_docker=$(echo "$decoder_docker" | yq -y ". + {\"env\": (.env + [\"LMCACHE_PD_PEER_INIT_PORT=$init\", \"LMCACHE_PD_PEER_ALLOC_PORT=$alloc\"])}")
Contributor


security-high high

The init and alloc variables are interpolated directly into a shell command string for yq, leading to a potential shell command injection vulnerability similar to the one found on line 131.

Suggested change
decoder_docker=$(echo "$decoder_docker" | yq -y ". + {\"env\": (.env + [\"LMCACHE_PD_PEER_INIT_PORT=$init\", \"LMCACHE_PD_PEER_ALLOC_PORT=$alloc\"])}")
decoder_docker=$(echo "$decoder_docker" | yq -y --arg init "$init" --arg alloc "$alloc" '. + {"env": (.env + ["LMCACHE_PD_PEER_INIT_PORT=" + $init, "LMCACHE_PD_PEER_ALLOC_PORT=" + $alloc])}')

reply=$(yq -er '.docker1["reply-port"]' "$cfg_file" 2>/dev/null || echo "8400")

# Inject controller URLs
docker1=$(echo "$docker1" | yq -y ". + {\"env\": (.env + [\"LMCACHE_CONTROLLER_PULL_URL=localhost:$pull\", \"LMCACHE_CONTROLLER_REPLY_URL=localhost:$reply\", \"UCX_TLS=tcp\"])}")
Contributor


security-high high

The pull and reply variables are interpolated into a yq command string, creating a shell command injection vulnerability.

Suggested change
docker1=$(echo "$docker1" | yq -y ". + {\"env\": (.env + [\"LMCACHE_CONTROLLER_PULL_URL=localhost:$pull\", \"LMCACHE_CONTROLLER_REPLY_URL=localhost:$reply\", \"UCX_TLS=tcp\"])}")
docker1=$(echo "$docker1" | yq -y --arg pull "$pull" --arg reply "$reply" '. + {"env": (.env + ["LMCACHE_CONTROLLER_PULL_URL=localhost:" + $pull, "LMCACHE_CONTROLLER_REPLY_URL=localhost:" + $reply, "UCX_TLS=tcp"])}')

# Build workload JSON (merge workload section with model, strip non-CLI fields)
# Fields like expected-latency-gain are used by the checking logic, not long_doc_qa.py.
# "completion" -> "completions" rename to match the argparse flag.
workload_yaml="$(yq "(.workload * {\"model\": \"$model\"}) | del(.type) | del(.[\"expected-latency-gain\"]) | if .completion then .completions = .completion | del(.completion) else . end" "$cfg_file")"
Contributor


security-high high

The model variable is interpolated into a yq command string, which can lead to command injection if the model name in the configuration file contains shell metacharacters or yq filter delimiters.

Suggested change
workload_yaml="$(yq "(.workload * {\"model\": \"$model\"}) | del(.type) | del(.[\"expected-latency-gain\"]) | if .completion then .completions = .completion | del(.completion) else . end" "$cfg_file")"
workload_yaml="$(yq --arg model "$model" '(.workload * {"model": $model}) | del(.type) | del(.["expected-latency-gain"]) | if .completion then .completions = .completion | del(.completion) else . end' "$cfg_file")"
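
The difference the `--arg` form makes can be shown with a throwaway bash sketch (hypothetical, not code from the PR): the same attacker-controlled string is inert when passed as an argument, but executes when spliced into program text.

```shell
#!/usr/bin/env bash
# Hypothetical demo of the injection class flagged above: data spliced
# into a command string gets re-parsed as code; data passed as a single
# argument (the yq --arg pattern) never is. All names here are invented.
set -u

marker="$(mktemp -u)"                           # file the "attack" tries to create
untrusted="8100\"; touch $marker; echo \""      # attacker-controlled config value

# UNSAFE: the value becomes part of the shell program text.
eval "line=\"port=$untrusted\""                 # side effect: creates $marker
unsafe_ran=0
[ -e "$marker" ] && unsafe_ran=1
rm -f "$marker"

# SAFE: the value travels as one argument and stays inert data.
emit_env() { printf 'LMCACHE_PD_PROXY_PORT=%s\n' "$1"; }
safe_out="$(emit_env "$untrusted")"

echo "unsafe_ran=$unsafe_ran"
echo "$safe_out"
```

The `yq --arg model "$model" '… {"model": $model} …'` suggestion applies the safe pattern: the value is bound to a jq/yq variable rather than concatenated into the filter string.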

Comment on lines +15 to +17
&& apt-get install -y --no-install-recommends \
ccache software-properties-common git curl sudo \
python3 python3-dev python3-venv python3-pip tzdata libxcb1-dev \
Contributor


high

The comprehensive tests rely on yq and jq for parsing YAML and JSON files. These tools are not installed in this base image, which will cause test failures. They should be added to the apt-get install command.

    && apt-get install -y --no-install-recommends \
        ccache software-properties-common git curl sudo yq jq \
        python3 python3-dev python3-venv python3-pip tzdata libxcb1-dev \

Comment thread .buildkite/k3_harness/teardown.sh Outdated
Comment on lines +11 to +14
if helm status buildkite-agent -n buildkite &>/dev/null; then
echo "→ Removing agent-stack-k8s..."
helm uninstall buildkite-agent -n buildkite --wait
fi
Contributor


high

The Helm release for the Buildkite agent is named agent-stack-k8s in install-agent-stack.sh, but this script uses buildkite-agent to check the status and uninstall it. This will cause the teardown for the agent to fail. The release name should be consistent.

Suggested change
if helm status buildkite-agent -n buildkite &>/dev/null; then
echo "→ Removing agent-stack-k8s..."
helm uninstall buildkite-agent -n buildkite --wait
fi
if helm status agent-stack-k8s -n buildkite &>/dev/null; then
echo "→ Removing agent-stack-k8s..."
helm uninstall agent-stack-k8s -n buildkite --wait
fi

# - pod-spec-patch: injects GITHUB_TOKEN into job containers for push operations
helm upgrade --install agent-stack-k8s oci://ghcr.io/buildkite/helm/agent-stack-k8s \
--namespace buildkite --create-namespace \
--set agentToken="${TOKEN}" \
Contributor


security-medium medium

The Buildkite agent token is passed to helm upgrade using the --set flag, which can expose the token in the process list (e.g., via ps aux) to other users on the system. It is more secure to pass sensitive values using environment variables, secret files, or by referencing an existing Kubernetes secret. Additionally, for better readability and maintainability, consider moving the JSON configuration passed to --set-json into a temporary YAML file and using helm upgrade --values <file>.
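
One way to act on this suggestion is sketched below, assuming the chart reads `agentToken` from values (as the `--set` call above implies). The `RUN_HELM` guard and the fallback token are placeholders; the helm invocation only makes sense on the cluster host.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: deliver the agent token to Helm via a 600-mode
# values file so it never appears on the command line or in `ps` output.
set -euo pipefail

TOKEN="${BUILDKITE_AGENT_TOKEN:-example-token}"

values="$(mktemp)"
chmod 600 "$values"
cat > "$values" <<EOF
agentToken: "${TOKEN}"
EOF

# Guarded placeholder: run only on the K3s host, with helm installed.
if [ "${RUN_HELM:-0}" = 1 ] && command -v helm >/dev/null 2>&1; then
  helm upgrade --install agent-stack-k8s oci://ghcr.io/buildkite/helm/agent-stack-k8s \
    --namespace buildkite --create-namespace \
    --values "$values"
fi

echo "values file: $values"
```

Unlike `--set agentToken=…`, the secret never becomes a process argument, and the same file can absorb the `--set-json` configuration for readability.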

local port="${1:-8000}"
while [ "$port" -lt 65536 ]; do
if ! lsof -iTCP:"$port" -sTCP:LISTEN >/dev/null 2>&1 &&
! timeout 1 bash -c "</dev/tcp/127.0.0.1/${port}" 2>/dev/null; then
Contributor


security-medium medium

The find_free_port function is vulnerable to command injection because the port variable is interpolated directly into a bash -c command string. Although currently called with hardcoded values, this utility function is inherently unsafe if used with any external input.

Suggested change
! timeout 1 bash -c "</dev/tcp/127.0.0.1/${port}" 2>/dev/null; then
if ! lsof -iTCP:"$port" -sTCP:LISTEN >/dev/null 2>&1 &&
! timeout 1 bash -c "</dev/tcp/127.0.0.1/$((port))" 2>/dev/null; then
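
A hardened variant along these lines might validate the input up front and probe with a plain `/dev/tcp` redirection in the current shell, so the port value is only ever data and never part of a `bash -c` program string. This is a sketch of the idea, not the harness code:

```shell
#!/usr/bin/env bash
# Hypothetical hardened find_free_port: reject non-numeric input before
# any use, and probe with /dev/tcp directly rather than via `bash -c`.
find_free_port() {
  local port="${1:-8000}"
  [[ "$port" =~ ^[0-9]+$ ]] || { echo "invalid port: $port" >&2; return 2; }
  while [ "$port" -lt 65536 ]; do
    # Connect attempt in a subshell: success means something is listening,
    # failure means the port is free.
    if ! (exec 3<>"/dev/tcp/127.0.0.1/$port") 2>/dev/null; then
      echo "$port"
      return 0
    fi
    port=$((port + 1))
  done
  return 1
}

find_free_port 8000
```

The regex check also documents the function's contract, so future callers passing external input fail loudly instead of silently constructing a shell command.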

if [[ -n "${GITHUB_TOKEN:-}" ]]; then
# Extract owner/repo from any URL format (SSH or HTTPS)
REPO_PATH="$(echo "$ORIGIN_URL" | sed -E 's|.*github\.com[:/]||' | sed 's/\.git$//')"
PUSH_URL="https://x-access-token:${GITHUB_TOKEN}@github.com/${REPO_PATH}.git"
Contributor


security-medium medium

The GITHUB_TOKEN is embedded directly into the PUSH_URL. This can lead to the token being leaked if the URL is logged or if the script is run with shell tracing enabled (set -x). It is safer to use a credential helper or a .netrc file to provide credentials to git.
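
One credential-helper approach uses git's standard `GIT_ASKPASS` hook, which git invokes with the prompt text as its first argument. The sketch below is illustrative; the fallback token value is a placeholder:

```shell
#!/usr/bin/env bash
# Hypothetical sketch: feed GITHUB_TOKEN to git via GIT_ASKPASS instead
# of embedding it in the push URL, so `set -x` traces, logs, and the
# remote config stay token-free.
set -euo pipefail

export GITHUB_TOKEN="${GITHUB_TOKEN:-example-token}"

askpass="$(mktemp)"
chmod 700 "$askpass"
cat > "$askpass" <<'EOF'
#!/bin/sh
# git invokes this with the prompt text as $1.
case "$1" in
  Username*) echo "x-access-token" ;;
  Password*) echo "$GITHUB_TOKEN" ;;
esac
EOF

# The remote URL stays clean; the token never touches the command line:
#   GIT_ASKPASS="$askpass" git push "https://github.com/${REPO_PATH}.git" HEAD
echo "askpass helper: $askpass"
```

Since the token only flows through the helper's stdout into git, nothing sensitive is stored in `.git/config` or echoed by shell tracing.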

GPU_MEMORY_GB=$((GPU_MEMORY_MB / 1024))
echo "Detected GPU memory: ${GPU_MEMORY_GB}GB (${GPU_MEMORY_MB}MB)"

if [ "$GPU_MEMORY_GB" -gt 100 ]; then
Contributor


change this to 90

Contributor

@ApostaC ApostaC left a comment


LGTM! Let's put it online for a few days and see what happens.

Signed-off-by: Samuel Shen <slshen@uchciago.edu>
echo "Detected GPU memory: ${GPU_MEMORY_GB}GB (${GPU_MEMORY_MB}MB)"

if [ "$GPU_MEMORY_GB" -gt 100 ]; then
if [ "$GPU_MEMORY_GB" -gt 90 ]; then
Contributor Author


to stay consistent with the new RTX 6000 96 GB in the k3_tests

@KuntaiDu
Contributor

KuntaiDu commented Mar 3, 2026

I understand the high-level design (migrating to k3s) but I am not sure about the concrete details. Do we have any designs for CI that I can refer to?

Signed-off-by: Samuel Shen <slshen@uchciago.edu>
@sammshen
Contributor Author

sammshen commented Mar 3, 2026

@KuntaiDu thanks for the suggestion! Just added a pretty concise ARCHITECTURE.md

Signed-off-by: Samuel Shen <slshen@uchciago.edu>
@sammshen sammshen enabled auto-merge (squash) March 3, 2026 04:58
@github-actions github-actions Bot added the full Run comprehensive tests on this PR label Mar 3, 2026
@sammshen sammshen merged commit e133932 into LMCache:dev Mar 3, 2026
45 of 48 checks passed
oferki pushed a commit to oferki/LMCache that referenced this pull request Mar 3, 2026
* Add smoke test for new yotta-lab queues

* Add K3s CI harness and test pipelines

K8s-based CI infrastructure using K3s + NVIDIA GPU Operator + agent-stack-k8s:
- k3_harness/: cluster setup, env setup, base image, teardown scripts
- k3_tests/: comprehensive, correctness, integration, multiprocess pipelines

* forgot to add target queue

Signed-off-by: Samuel Shen <slshen@uchciago.edu>

* fix README

Signed-off-by: Samuel Shen <slshen@uchciago.edu>

* fix container name

Signed-off-by: Samuel Shen <slshen@uchciago.edu>

* change image pull policy

Signed-off-by: Samuel Shen <slshen@uchciago.edu>

* fix relative paths

Signed-off-by: Samuel Shen <slshen@uchciago.edu>

* non-container integration run script

Signed-off-by: Samuel Shen <slshen@uchciago.edu>

* rewrite scripts

Signed-off-by: Samuel Shen <slshen@uchciago.edu>

* fix correctness

Signed-off-by: Samuel Shen <slshen@uchciago.edu>

* rolling 5-day baseline, HTTPS auth, priority scheduling, fix mem leak socket

- Rolling baselines: nightly writes date-stamped <feature>-YYYYMMDD.json,
  PR builds compare against worst-case (max) across 5-day window
- upload-baselines.sh finalize step collects artifacts, prunes old files,
  single commit to benchmarks-main
- Switch from SSH key to GITHUB_TOKEN (HTTPS) for repo checkout and push
- Priority 1 for 2-GPU steps (pd, p2p, multiprocess) so they schedule first
- Fix memory leak check: override LMCACHE_INTERNAL_API_SERVER_SOCKET_PATH_PREFIX
  to include port (replicates old Docker volume mount path mapping)
- Fix correctness: replace col -b with sed for man page formatting

* fix mp ci

Signed-off-by: Samuel Shen <slshen@uchciago.edu>

* change priority

Signed-off-by: Samuel Shen <slshen@uchciago.edu>

* change installation back to editable

Signed-off-by: Samuel Shen <slshen@uchciago.edu>

* fix p2p mem check

Signed-off-by: Samuel Shen <slshen@uchciago.edu>

* speed up mp

Signed-off-by: Samuel Shen <slshen@uchciago.edu>

* relative thresholds for mp long_doc_qa, parallel vllm startup

* set both thresholds to 10%

* skip mem leak check for p2p

* revert to sequential vllm startup, --master-port doesnt help

* fix parallel vllm startup by unsetting VLLM_PORT env var

* add integration nightly docs, fix teardown helm name, use yq --arg

* Remove health checks

Signed-off-by: Samuel Shen <slshen@uchciago.edu>

* parallellize integration tests

Signed-off-by: Samuel Shen <slshen@uchciago.edu>

* loosen MP test

Signed-off-by: Samuel Shen <slshen@uchciago.edu>

* reduce gpu util

Signed-off-by: Samuel Shen <slshen@uchciago.edu>

* fix correctness in MP

Signed-off-by: Samuel Shen <slshen@uchciago.edu>

* add ARCHITECTURE.md

Signed-off-by: Samuel Shen <slshen@uchciago.edu>

* add local_cpu_mla.yaml test back

Signed-off-by: Samuel Shen <slshen@uchciago.edu>

---------

Signed-off-by: Samuel Shen <slshen@uchciago.edu>
Co-authored-by: Samuel Shen <slshen@uchciago.edu>
Signed-off-by: Ofer Kiselov Nahman <ofer.kiselovnahman@weka.io>
oferki pushed a commit to oferki/LMCache that referenced this pull request Mar 3, 2026
hlin99 pushed a commit to hlin99/LMCache that referenced this pull request Mar 4, 2026
mauryaavinash95 pushed a commit to mauryaavinash95/LMCache that referenced this pull request Mar 7, 2026
shaoxiawjc pushed a commit to shaoxiawjc/LMCache that referenced this pull request Mar 11, 2026
realAaronWu pushed a commit to realAaronWu/LMCache that referenced this pull request Mar 20, 2026
jooho-XCENA pushed a commit to xcena-dev/LMCache that referenced this pull request Apr 2, 2026
jooho-XCENA pushed a commit to xcena-dev/LMCache that referenced this pull request Apr 2, 2026
Labels

full: Run comprehensive tests on this PR

4 participants