
K3 CI Refactor #2663

Merged
sammshen merged 30 commits into LMCache:dev from sammshen:ci-refactor
Mar 3, 2026

Conversation

@sammshen
Contributor

@sammshen sammshen commented Mar 1, 2026

Rewrites four test suites (.buildkite/k3_tests/) on top of a new K3s-based infrastructure (.buildkite/k3_harness/).

Benefits:

  1. Better resource scheduling: multiple concurrent tasks in the comprehensive suite can run at once.
  2. Unified environment setup: nightly vLLM is installed from wheels and LMCache from source, with no Docker builds.
  3. K3s pods make resource cleanup guaranteed.
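
The third point is easiest to see by contrast. Before pod-scoped jobs, guaranteed cleanup had to be hand-rolled in every script; the sketch below (hypothetical, not code from this PR) shows the trap-based pattern that the pod lifecycle now subsumes:

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the per-script cleanup that pod-scoped CI jobs
# make unnecessary: every background PID and temp dir must be torn down
# by hand, and any missed path leaks GPUs or disk.
set -euo pipefail

PIDS=()
WORKDIR="$(mktemp -d)"

cleanup() {
  for pid in "${PIDS[@]}"; do
    kill "$pid" 2>/dev/null || true
    wait "$pid" 2>/dev/null || true   # reap so the PID fully disappears
  done
  rm -rf "$WORKDIR"
}
trap cleanup EXIT   # best effort only: SIGKILL or node loss skips it

sleep 300 &          # stand-in for a long-running vLLM server
PIDS+=("$!")

# ... test workload would run here ...

cleanup              # with K3s, deleting the pod does all of this for us
trap - EXIT
echo "cleaned"
```

Deleting a pod tears down its process tree and ephemeral volumes unconditionally, which is exactly the guarantee this trap pattern can only approximate.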

Samuel Shen added 18 commits March 1, 2026 02:50
K8s-based CI infrastructure using K3s + NVIDIA GPU Operator + agent-stack-k8s:
- k3_harness/: cluster setup, env setup, base image, teardown scripts
- k3_tests/: comprehensive, correctness, integration, multiprocess pipelines
Signed-off-by: Samuel Shen <slshen@uchciago.edu>
… socket

- Rolling baselines: nightly writes date-stamped <feature>-YYYYMMDD.json,
  PR builds compare against worst-case (max) across 5-day window
- upload-baselines.sh finalize step collects artifacts, prunes old files,
  single commit to benchmarks-main
- Switch from SSH key to GITHUB_TOKEN (HTTPS) for repo checkout and push
- Priority 1 for 2-GPU steps (pd, p2p, multiprocess) so they schedule first
- Fix memory leak check: override LMCACHE_INTERNAL_API_SERVER_SOCKET_PATH_PREFIX
  to include port (replicates old Docker volume mount path mapping)
- Fix correctness: replace col -b with sed for man page formatting
Signed-off-by: Samuel Shen <slshen@uchciago.edu>
@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly refactors the continuous integration system by migrating existing tests to a new K3s-based infrastructure. This change aims to enhance the efficiency and reliability of CI pipelines through improved resource management, standardized environment provisioning, and robust cleanup mechanisms. The new setup provides a more scalable and maintainable foundation for running various test types, from correctness checks to performance benchmarks, by leveraging Kubernetes capabilities for isolated and ephemeral test environments.

Highlights

  • New K3s CI Infrastructure: Introduced a new K3s-based CI harness (.buildkite/k3_harness/) to replace existing test infrastructure, enabling more efficient and isolated test execution.
  • Improved Resource Scheduling: The new infrastructure allows for better resource scheduling, enabling multiple concurrent tasks to run efficiently within comprehensive test suites.
  • Unified Environment Setup: Implemented a unified environment setup that installs nightly vLLM and LMCache from source using wheels, eliminating the need for Docker builds within test jobs.
  • Guaranteed Resource Cleanup: Leveraged K3s pods to ensure guaranteed resource cleanup after each test job, preventing resource leaks and improving CI stability.
  • Refactored Test Pipelines: Rewrote four existing test suites (comprehensive, correctness, integration, multiprocess) to utilize the new K3s harness, including detailed Buildkite pipeline definitions and setup instructions.


Changelog
  • .buildkite/k3_harness/README.md
    • Added a comprehensive README detailing the K3s CI Harness, including prerequisites, setup, Buildkite integration, per-job environment, shared volumes, GPU allocation, CI base image, and teardown procedures.
  • .buildkite/k3_harness/ci-base.Dockerfile
    • Added a Dockerfile for the CI base image, configured with CUDA, Python, uv, and build dependencies, and dynamically detects GPU compute capability for TORCH_CUDA_ARCH_LIST.
  • .buildkite/k3_harness/install-agent-stack.sh
    • Added a script to install Buildkite agent-stack-k8s, including the creation of Kubernetes secrets for GitHub credentials to enable HTTPS repository access.
  • .buildkite/k3_harness/setup-cluster.sh
    • Added an idempotent script to set up K3s, the NVIDIA GPU Operator, and build/import the CI base image into K3s containerd, along with creating necessary host volume directories.
  • .buildkite/k3_harness/setup-env.sh
    • Added a script for per-job environment setup within CI pods, installing vLLM nightly from wheels and LMCache from source.
  • .buildkite/k3_harness/smoke-test.sh
    • Added a script to verify the correct functioning of K3s and the GPU Operator by scheduling and running a pod that executes nvidia-smi.
  • .buildkite/k3_harness/teardown.sh
    • Added a script to gracefully remove the K3s stack components, including agent-stack-k8s, GPU Operator, and K3s itself, while preserving host data volumes.
  • .buildkite/k3_harness/values.yaml
    • Added a reference Helm values file for agent-stack-k8s, serving as documentation for configuration parameters.
  • .buildkite/k3_tests/README.md
    • Added a README explaining the structure of K8s test pipelines, detailing required files, Buildkite UI setup, trigger strategies, and guidelines for adding new tests.
  • .buildkite/k3_tests/common_scripts/helpers.sh
    • Added shared Bash helper functions for K3s test scripts, including utilities for finding free TCP ports, waiting for server readiness, and cleaning up background processes.
  • .buildkite/k3_tests/comprehensive/BK_WEB_SETUP.md
    • Added Buildkite Web UI setup instructions for comprehensive tests, specifying GitHub trigger filters and details for nightly scheduled baseline uploads.
  • .buildkite/k3_tests/comprehensive/buildkite-pipeline.yml
    • Added a Buildkite pipeline definition for comprehensive tests, configured to upload the full pipeline definition from the repository.
  • .buildkite/k3_tests/comprehensive/pipeline.yml
    • Added the main pipeline definition for comprehensive tests, organizing steps into 1-GPU and 2-GPU groups, and including a step for uploading rolling baselines.
  • .buildkite/k3_tests/comprehensive/run.sh
    • Added an entrypoint script for comprehensive tests, responsible for environment setup and delegating to the script that runs a single test configuration.
  • .buildkite/k3_tests/comprehensive/scripts/run-single-config.sh
    • Added a script to execute a single comprehensive test configuration natively within a K8s pod, managing server startup, workload execution, and memory leak checks.
  • .buildkite/k3_tests/comprehensive/scripts/upload-baselines.sh
    • Added a script to finalize nightly baseline uploads, downloading artifacts, pruning old files, and pushing a single commit to the benchmarks-main branch.
  • .buildkite/k3_tests/correctness/BK_WEB_SETUP.md
    • Added Buildkite Web UI setup instructions for correctness tests, indicating they run on every push/PR.
  • .buildkite/k3_tests/correctness/buildkite-pipeline.yml
    • Added a Buildkite pipeline definition for correctness tests, configured to upload the full pipeline definition from the repository.
  • .buildkite/k3_tests/correctness/pipeline.yml
    • Added the main pipeline definition for correctness tests, designed to verify LMCache produces identical output to base vLLM.
  • .buildkite/k3_tests/correctness/run.sh
    • Added an entrypoint script for correctness tests, handling environment setup and delegating to the core correctness test script.
  • .buildkite/k3_tests/correctness/scripts/run-correctness.sh
    • Added a self-contained script for correctness tests, launching base vLLM and LMCache vLLM servers, running ShareGPT and 'man bash' tests, and comparing outputs.
  • .buildkite/k3_tests/integration/BK_WEB_SETUP.md
    • Added Buildkite Web UI setup instructions for integration tests, indicating they run on every push/PR.
  • .buildkite/k3_tests/integration/buildkite-pipeline.yml
    • Added a Buildkite pipeline definition for integration tests, configured to upload the full pipeline definition from the repository.
  • .buildkite/k3_tests/integration/pipeline.yml
    • Added the main pipeline definition for integration tests, focusing on starting vLLM with LMCache directly and testing CPU and disk backends via the OpenAI API.
  • .buildkite/k3_tests/integration/run.sh
    • Added an entrypoint script for integration tests, setting up the environment and delegating to the core integration test script.
  • .buildkite/k3_tests/integration/scripts/run-integration.sh
    • Added a script for integration tests, launching vLLM with LMCache for CPU and disk backends, and sending requests to verify caching behavior.
  • .buildkite/k3_tests/multiprocess/BK_WEB_SETUP.md
    • Added Buildkite Web UI setup instructions for multiprocess tests, including specific trigger labels.
  • .buildkite/k3_tests/multiprocess/buildkite-pipeline.yml
    • Added a Buildkite pipeline definition for multiprocess tests, configured to upload the full pipeline definition from the repository.
  • .buildkite/k3_tests/multiprocess/pipeline.yml
    • Added the main pipeline definition for multiprocess tests, outlining steps for LMCache MP server, vLLM, and vLLM baseline, requiring 2 GPUs.
  • .buildkite/k3_tests/multiprocess/run.sh
    • Added an entrypoint script for multiprocess tests, setting up the environment and delegating to the main multiprocess test orchestrator.
  • .buildkite/k3_tests/multiprocess/scripts/cleanup.sh
    • Added a script to clean up background processes launched during multiprocess tests, ensuring all PIDs are terminated and logs are collected.
  • .buildkite/k3_tests/multiprocess/scripts/launch-processes.sh
    • Added a script to launch LMCache MP server, vLLM with LMCache, and a vLLM baseline as native background processes, allocating GPUs and managing PIDs.
  • .buildkite/k3_tests/multiprocess/scripts/run-lm-eval.sh
    • Added a script to run lm_eval workload tests against vLLM servers, performing two runs to verify LMCache caching behavior and output consistency.
  • .buildkite/k3_tests/multiprocess/scripts/run-long-doc-qa.sh
    • Added a script to run long_doc_qa workload tests, comparing performance between LMCache-enabled and baseline vLLM, and verifying against defined thresholds.
  • .buildkite/k3_tests/multiprocess/scripts/run-mp-test.sh
    • Added the main orchestrator script for multiprocess tests, coordinating the launch of servers, waiting for readiness, and executing lm_eval, vllm bench, and long_doc_qa workloads.
  • .buildkite/k3_tests/multiprocess/scripts/wait-for-servers.sh
    • Added a script to wait for both LMCache-enabled vLLM and baseline vLLM servers to become ready, with a configurable timeout.
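
The arch-detection step described for ci-base.Dockerfile can be pictured with a small sketch. The helper name `caps_to_arch_list` and the canned capability values below are invented for illustration; on a real build host the input would come from `nvidia-smi --query-gpu=compute_cap`:

```shell
#!/usr/bin/env bash
# Hypothetical sketch of deriving TORCH_CUDA_ARCH_LIST from detected
# compute capabilities; helper name and sample values are invented.
set -euo pipefail

# Deduplicate capability lines and join with ';', the list format
# torch expects, e.g. "8.9\n8.9\n9.0" -> "8.9;9.0".
caps_to_arch_list() {
  printf '%s\n' "$1" | sort -u | paste -sd';' -
}

# On a real GPU node this would be:
#   caps="$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader)"
caps="$(printf '8.9\n8.9')"

export TORCH_CUDA_ARCH_LIST="$(caps_to_arch_list "$caps")"
echo "TORCH_CUDA_ARCH_LIST=$TORCH_CUDA_ARCH_LIST"
```

Baking the detected value into the image keeps source builds of LMCache from compiling kernels for every architecture.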

@sammshen sammshen requested a review from ApostaC March 1, 2026 11:13
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request refactors the CI infrastructure to a K3s-based system, aiming for improved resource scheduling, a unified environment, and more reliable cleanup. However, critical security vulnerabilities were identified in the newly added shell scripts, including shell command injection via unsanitized variables in yq and bash -c commands, and potential secret exposure by passing tokens through command-line arguments and URLs. Addressing these security concerns is paramount, especially since these scripts process configuration files that could be manipulated in a pull request. Additionally, general issues were found, such as missing system dependencies (yq, jq) in the base Docker image and an incorrect Helm release name in the teardown script.

source .buildkite/k3_harness/setup-env.sh

# Install test utilities (yq for YAML parsing, jq for JSON, openai/pandas/matplotlib for benchmarks)
uv pip install yq jq openai pandas matplotlib 2>/dev/null || true
Contributor


critical

This command attempts to install yq and jq using uv pip, but they are not Python packages and cannot be installed this way. This will fail (though the error is suppressed by || true), and the subsequent test script will fail because yq and jq are not found. These dependencies should be installed in the base Docker image via apt-get.

Suggested change
uv pip install yq jq openai pandas matplotlib 2>/dev/null || true
uv pip install openai pandas matplotlib 2>/dev/null || true

alloc=$(yq -er '.["docker-decoder"]["alloc-port"]' "$cfg_file" 2>/dev/null || echo "7400")

# Inject PD-specific env vars into docker sections
prefiller_docker=$(echo "$prefiller_docker" | yq -y ". + {\"env\": (.env + [\"LMCACHE_PD_PROXY_PORT=$proxy\"])}")
Contributor


security-high high

The proxy variable, which is extracted from a configuration file, is interpolated directly into a shell command string for yq. This allows for shell command injection if the configuration file contains malicious values. An attacker could exploit this by submitting a pull request with a modified configuration file.

Suggested change
prefiller_docker=$(echo "$prefiller_docker" | yq -y ". + {\"env\": (.env + [\"LMCACHE_PD_PROXY_PORT=$proxy\"])}")
prefiller_docker=$(echo "$prefiller_docker" | yq -y --arg proxy "$proxy" '. + {"env": (.env + ["LMCACHE_PD_PROXY_PORT=" + $proxy])}')


# Inject PD-specific env vars into docker sections
prefiller_docker=$(echo "$prefiller_docker" | yq -y ". + {\"env\": (.env + [\"LMCACHE_PD_PROXY_PORT=$proxy\"])}")
decoder_docker=$(echo "$decoder_docker" | yq -y ". + {\"env\": (.env + [\"LMCACHE_PD_PEER_INIT_PORT=$init\", \"LMCACHE_PD_PEER_ALLOC_PORT=$alloc\"])}")
Contributor


security-high high

The init and alloc variables are interpolated directly into a shell command string for yq, leading to a potential shell command injection vulnerability similar to the one found on line 131.

Suggested change
decoder_docker=$(echo "$decoder_docker" | yq -y ". + {\"env\": (.env + [\"LMCACHE_PD_PEER_INIT_PORT=$init\", \"LMCACHE_PD_PEER_ALLOC_PORT=$alloc\"])}")
decoder_docker=$(echo "$decoder_docker" | yq -y --arg init "$init" --arg alloc "$alloc" '. + {"env": (.env + ["LMCACHE_PD_PEER_INIT_PORT=" + $init, "LMCACHE_PD_PEER_ALLOC_PORT=" + $alloc])}')

reply=$(yq -er '.docker1["reply-port"]' "$cfg_file" 2>/dev/null || echo "8400")

# Inject controller URLs
docker1=$(echo "$docker1" | yq -y ". + {\"env\": (.env + [\"LMCACHE_CONTROLLER_PULL_URL=localhost:$pull\", \"LMCACHE_CONTROLLER_REPLY_URL=localhost:$reply\", \"UCX_TLS=tcp\"])}")
Contributor


security-high high

The pull and reply variables are interpolated into a yq command string, creating a shell command injection vulnerability.

Suggested change
docker1=$(echo "$docker1" | yq -y ". + {\"env\": (.env + [\"LMCACHE_CONTROLLER_PULL_URL=localhost:$pull\", \"LMCACHE_CONTROLLER_REPLY_URL=localhost:$reply\", \"UCX_TLS=tcp\"])}")
docker1=$(echo "$docker1" | yq -y --arg pull "$pull" --arg reply "$reply" '. + {"env": (.env + ["LMCACHE_CONTROLLER_PULL_URL=localhost:" + $pull, "LMCACHE_CONTROLLER_REPLY_URL=localhost:" + $reply, "UCX_TLS=tcp"])}')

# Build workload JSON (merge workload section with model, strip non-CLI fields)
# Fields like expected-latency-gain are used by the checking logic, not long_doc_qa.py.
# "completion" -> "completions" rename to match the argparse flag.
workload_yaml="$(yq "(.workload * {\"model\": \"$model\"}) | del(.type) | del(.[\"expected-latency-gain\"]) | if .completion then .completions = .completion | del(.completion) else . end" "$cfg_file")"
Contributor


security-high high

The model variable is interpolated into a yq command string, which can lead to command injection if the model name in the configuration file contains shell metacharacters or yq filter delimiters.

Suggested change
workload_yaml="$(yq "(.workload * {\"model\": \"$model\"}) | del(.type) | del(.[\"expected-latency-gain\"]) | if .completion then .completions = .completion | del(.completion) else . end" "$cfg_file")"
workload_yaml="$(yq --arg model "$model" '(.workload * {"model": $model}) | del(.type) | del(.["expected-latency-gain"]) | if .completion then .completions = .completion | del(.completion) else . end' "$cfg_file")"
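
The difference the `--arg` form makes can be shown with a throwaway bash sketch (hypothetical, not code from the PR): the same attacker-controlled string is inert when passed as an argument, but executes when spliced into program text.

```shell
#!/usr/bin/env bash
# Hypothetical demo of the injection class flagged above: data spliced
# into a command string gets re-parsed as code; data passed as a single
# argument (the yq --arg pattern) never is. All names here are invented.
set -u

marker="$(mktemp -u)"                           # file the "attack" tries to create
untrusted="8100\"; touch $marker; echo \""      # attacker-controlled config value

# UNSAFE: the value becomes part of the shell program text.
eval "line=\"port=$untrusted\""                 # side effect: creates $marker
unsafe_ran=0
[ -e "$marker" ] && unsafe_ran=1
rm -f "$marker"

# SAFE: the value travels as one argument and stays inert data.
emit_env() { printf 'LMCACHE_PD_PROXY_PORT=%s\n' "$1"; }
safe_out="$(emit_env "$untrusted")"

echo "unsafe_ran=$unsafe_ran"
echo "$safe_out"
```

The `yq --arg model "$model" '… {"model": $model} …'` suggestion applies the safe pattern: the value is bound to a jq/yq variable rather than concatenated into the filter string.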

Comment on lines +15 to +17
&& apt-get install -y --no-install-recommends \
ccache software-properties-common git curl sudo \
python3 python3-dev python3-venv python3-pip tzdata libxcb1-dev \
Contributor


high

The comprehensive tests rely on yq and jq for parsing YAML and JSON files. These tools are not installed in this base image, which will cause test failures. They should be added to the apt-get install command.

    && apt-get install -y --no-install-recommends \
        ccache software-properties-common git curl sudo yq jq \
        python3 python3-dev python3-venv python3-pip tzdata libxcb1-dev \

Comment thread .buildkite/k3_harness/teardown.sh Outdated
Comment on lines +11 to +14
if helm status buildkite-agent -n buildkite &>/dev/null; then
echo "→ Removing agent-stack-k8s..."
helm uninstall buildkite-agent -n buildkite --wait
fi
Contributor


high

The Helm release for the Buildkite agent is named agent-stack-k8s in install-agent-stack.sh, but this script uses buildkite-agent to check the status and uninstall it. This will cause the teardown for the agent to fail. The release name should be consistent.

Suggested change
if helm status buildkite-agent -n buildkite &>/dev/null; then
echo "→ Removing agent-stack-k8s..."
helm uninstall buildkite-agent -n buildkite --wait
fi
if helm status agent-stack-k8s -n buildkite &>/dev/null; then
echo "→ Removing agent-stack-k8s..."
helm uninstall agent-stack-k8s -n buildkite --wait
fi

# - pod-spec-patch: injects GITHUB_TOKEN into job containers for push operations
helm upgrade --install agent-stack-k8s oci://ghcr.io/buildkite/helm/agent-stack-k8s \
--namespace buildkite --create-namespace \
--set agentToken="${TOKEN}" \
Contributor


security-medium medium

The Buildkite agent token is passed to helm upgrade using the --set flag, which can expose the token in the process list (e.g., via ps aux) to other users on the system. It is more secure to pass sensitive values using environment variables, secret files, or by referencing an existing Kubernetes secret. Additionally, for better readability and maintainability, consider moving the JSON configuration passed to --set-json into a temporary YAML file and using helm upgrade --values <file>.
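
One way to act on this suggestion is sketched below, assuming the chart reads `agentToken` from values (as the `--set` call above implies). The `RUN_HELM` guard and the fallback token are placeholders; the helm invocation only makes sense on the cluster host.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: deliver the agent token to Helm via a 600-mode
# values file so it never appears on the command line or in `ps` output.
set -euo pipefail

TOKEN="${BUILDKITE_AGENT_TOKEN:-example-token}"

values="$(mktemp)"
chmod 600 "$values"
cat > "$values" <<EOF
agentToken: "${TOKEN}"
EOF

# Guarded placeholder: run only on the K3s host, with helm installed.
if [ "${RUN_HELM:-0}" = 1 ] && command -v helm >/dev/null 2>&1; then
  helm upgrade --install agent-stack-k8s oci://ghcr.io/buildkite/helm/agent-stack-k8s \
    --namespace buildkite --create-namespace \
    --values "$values"
fi

echo "values file: $values"
```

Unlike `--set agentToken=…`, the secret never becomes a process argument, and the same file can absorb the `--set-json` configuration for readability.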

local port="${1:-8000}"
while [ "$port" -lt 65536 ]; do
if ! lsof -iTCP:"$port" -sTCP:LISTEN >/dev/null 2>&1 &&
! timeout 1 bash -c "</dev/tcp/127.0.0.1/${port}" 2>/dev/null; then
Contributor


security-medium medium

The find_free_port function is vulnerable to command injection because the port variable is interpolated directly into a bash -c command string. Although currently called with hardcoded values, this utility function is inherently unsafe if used with any external input.

Suggested change
! timeout 1 bash -c "</dev/tcp/127.0.0.1/${port}" 2>/dev/null; then
if ! lsof -iTCP:"$port" -sTCP:LISTEN >/dev/null 2>&1 &&
! timeout 1 bash -c "</dev/tcp/127.0.0.1/$((port))" 2>/dev/null; then
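
A hardened variant along these lines might validate the input up front and probe with a plain `/dev/tcp` redirection in the current shell, so the port value is only ever data and never part of a `bash -c` program string. This is a sketch of the idea, not the harness code:

```shell
#!/usr/bin/env bash
# Hypothetical hardened find_free_port: reject non-numeric input before
# any use, and probe with /dev/tcp directly rather than via `bash -c`.
find_free_port() {
  local port="${1:-8000}"
  [[ "$port" =~ ^[0-9]+$ ]] || { echo "invalid port: $port" >&2; return 2; }
  while [ "$port" -lt 65536 ]; do
    # Connect attempt in a subshell: success means something is listening,
    # failure means the port is free.
    if ! (exec 3<>"/dev/tcp/127.0.0.1/$port") 2>/dev/null; then
      echo "$port"
      return 0
    fi
    port=$((port + 1))
  done
  return 1
}

find_free_port 8000
```

The regex check also documents the function's contract, so future callers passing external input fail loudly instead of silently constructing a shell command.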

if [[ -n "${GITHUB_TOKEN:-}" ]]; then
# Extract owner/repo from any URL format (SSH or HTTPS)
REPO_PATH="$(echo "$ORIGIN_URL" | sed -E 's|.*github\.com[:/]||' | sed 's/\.git$//')"
PUSH_URL="https://x-access-token:${GITHUB_TOKEN}@github.com/${REPO_PATH}.git"
Contributor


security-medium medium

The GITHUB_TOKEN is embedded directly into the PUSH_URL. This can lead to the token being leaked if the URL is logged or if the script is run with shell tracing enabled (set -x). It is safer to use a credential helper or a .netrc file to provide credentials to git.
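
One credential-helper approach uses git's standard `GIT_ASKPASS` hook, which git invokes with the prompt text as its first argument. The sketch below is illustrative; the fallback token value is a placeholder:

```shell
#!/usr/bin/env bash
# Hypothetical sketch: feed GITHUB_TOKEN to git via GIT_ASKPASS instead
# of embedding it in the push URL, so `set -x` traces, logs, and the
# remote config stay token-free.
set -euo pipefail

export GITHUB_TOKEN="${GITHUB_TOKEN:-example-token}"

askpass="$(mktemp)"
chmod 700 "$askpass"
cat > "$askpass" <<'EOF'
#!/bin/sh
# git invokes this with the prompt text as $1.
case "$1" in
  Username*) echo "x-access-token" ;;
  Password*) echo "$GITHUB_TOKEN" ;;
esac
EOF

# The remote URL stays clean; the token never touches the command line:
#   GIT_ASKPASS="$askpass" git push "https://github.com/${REPO_PATH}.git" HEAD
echo "askpass helper: $askpass"
```

Since the token only flows through the helper's stdout into git, nothing sensitive is stored in `.git/config` or echoed by shell tracing.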

GPU_MEMORY_GB=$((GPU_MEMORY_MB / 1024))
echo "Detected GPU memory: ${GPU_MEMORY_GB}GB (${GPU_MEMORY_MB}MB)"

if [ "$GPU_MEMORY_GB" -gt 100 ]; then
Contributor


change this to 90

Contributor

@ApostaC ApostaC left a comment


LGTM! Let's put it online for a few days and see what happens.

Signed-off-by: Samuel Shen <slshen@uchciago.edu>
echo "Detected GPU memory: ${GPU_MEMORY_GB}GB (${GPU_MEMORY_MB}MB)"

if [ "$GPU_MEMORY_GB" -gt 100 ]; then
if [ "$GPU_MEMORY_GB" -gt 90 ]; then
Contributor Author


to stay consistent with the new RTX 6000 96 GB in the k3_tests

@KuntaiDu
Contributor

KuntaiDu commented Mar 3, 2026

I understand the high-level design (migrating to k3s) but I am not sure about the concrete details. Do we have any designs for CI that I can refer to?

Signed-off-by: Samuel Shen <slshen@uchciago.edu>
@sammshen
Contributor Author

sammshen commented Mar 3, 2026

@KuntaiDu thanks for the suggestion! Just added a pretty concise ARCHITECTURE.md

Signed-off-by: Samuel Shen <slshen@uchciago.edu>
@sammshen sammshen enabled auto-merge (squash) March 3, 2026 04:58
@github-actions github-actions Bot added the full Run comprehensive tests on this PR label Mar 3, 2026
@sammshen sammshen merged commit e133932 into LMCache:dev Mar 3, 2026
45 of 48 checks passed
oferki pushed a commit to oferki/LMCache that referenced this pull request Mar 3, 2026
* Add smoke test for new yotta-lab queues

* Add K3s CI harness and test pipelines

K8s-based CI infrastructure using K3s + NVIDIA GPU Operator + agent-stack-k8s:
- k3_harness/: cluster setup, env setup, base image, teardown scripts
- k3_tests/: comprehensive, correctness, integration, multiprocess pipelines

* forgot to add target queue

Signed-off-by: Samuel Shen <slshen@uchciago.edu>

* fix README

Signed-off-by: Samuel Shen <slshen@uchciago.edu>

* fix container name

Signed-off-by: Samuel Shen <slshen@uchciago.edu>

* change image pull policy

Signed-off-by: Samuel Shen <slshen@uchciago.edu>

* fix relative paths

Signed-off-by: Samuel Shen <slshen@uchciago.edu>

* non-container integration run script

Signed-off-by: Samuel Shen <slshen@uchciago.edu>

* rewrite scripts

Signed-off-by: Samuel Shen <slshen@uchciago.edu>

* fix correctness

Signed-off-by: Samuel Shen <slshen@uchciago.edu>

* rolling 5-day baseline, HTTPS auth, priority scheduling, fix mem leak socket

- Rolling baselines: nightly writes date-stamped <feature>-YYYYMMDD.json,
  PR builds compare against worst-case (max) across 5-day window
- upload-baselines.sh finalize step collects artifacts, prunes old files,
  single commit to benchmarks-main
- Switch from SSH key to GITHUB_TOKEN (HTTPS) for repo checkout and push
- Priority 1 for 2-GPU steps (pd, p2p, multiprocess) so they schedule first
- Fix memory leak check: override LMCACHE_INTERNAL_API_SERVER_SOCKET_PATH_PREFIX
  to include port (replicates old Docker volume mount path mapping)
- Fix correctness: replace col -b with sed for man page formatting

* fix mp ci

Signed-off-by: Samuel Shen <slshen@uchciago.edu>

* change priority

Signed-off-by: Samuel Shen <slshen@uchciago.edu>

* change installation back to editable

Signed-off-by: Samuel Shen <slshen@uchciago.edu>

* fix p2p mem check

Signed-off-by: Samuel Shen <slshen@uchciago.edu>

* speed up mp

Signed-off-by: Samuel Shen <slshen@uchciago.edu>

* relative thresholds for mp long_doc_qa, parallel vllm startup

* set both thresholds to 10%

* skip mem leak check for p2p

* revert to sequential vllm startup, --master-port doesnt help

* fix parallel vllm startup by unsetting VLLM_PORT env var

* add integration nightly docs, fix teardown helm name, use yq --arg

* Remove health checks

Signed-off-by: Samuel Shen <slshen@uchciago.edu>

* parallellize integration tests

Signed-off-by: Samuel Shen <slshen@uchciago.edu>

* loosen MP test

Signed-off-by: Samuel Shen <slshen@uchciago.edu>

* reduce gpu util

Signed-off-by: Samuel Shen <slshen@uchciago.edu>

* fix correctness in MP

Signed-off-by: Samuel Shen <slshen@uchciago.edu>

* add ARCHITECTURE.md

Signed-off-by: Samuel Shen <slshen@uchciago.edu>

* add local_cpu_mla.yaml test back

Signed-off-by: Samuel Shen <slshen@uchciago.edu>

---------

Signed-off-by: Samuel Shen <slshen@uchciago.edu>
Co-authored-by: Samuel Shen <slshen@uchciago.edu>
Signed-off-by: Ofer Kiselov Nahman <ofer.kiselovnahman@weka.io>
oferki pushed a commit to oferki/LMCache that referenced this pull request Mar 3, 2026
hlin99 pushed a commit to hlin99/LMCache that referenced this pull request Mar 4, 2026
mauryaavinash95 pushed a commit to mauryaavinash95/LMCache that referenced this pull request Mar 7, 2026
shaoxiawjc pushed a commit to shaoxiawjc/LMCache that referenced this pull request Mar 11, 2026
realAaronWu pushed a commit to realAaronWu/LMCache that referenced this pull request Mar 20, 2026
jooho-XCENA pushed a commit to xcena-dev/LMCache that referenced this pull request Apr 2, 2026
jooho-XCENA pushed a commit to xcena-dev/LMCache that referenced this pull request Apr 2, 2026
Labels

full: Run comprehensive tests on this PR

4 participants