
[Hotfix][CI] Fail-fast when vLLM CLI import chain is broken post-install#3093

Merged
sammshen merged 5 commits into LMCache:dev from sammshen:hotfix/ci-robust-env-probes
Apr 21, 2026

Conversation

@sammshen
Contributor

@sammshen sammshen commented Apr 21, 2026

The k3 integration tests have been red since 2026-04-21 ~04:00 UTC with:

ImportError: cannot import name 'GenerationConfig' from 'transformers'
(/opt/venv/lib/python3.12/site-packages/transformers/__init__.py)

at vllm/transformers_utils/config.py line 18. The failure surfaces 180s after the test starts as a generic "vLLM failed to start on port 8000 within 180s" in wait_for_server, and only then does the harness tail vllm.log to show the real traceback.

Root cause is that setup-env.sh declared the environment "ready" without exercising the CLI import chain that vllm serve runs at startup. The existing sequence was:

  1. Install vLLM nightly
  2. Probe from vllm.entrypoints.cli.main import main (auto-heal)
  3. uv pip install -e . --no-build-isolation (LMCache install)
  4. python -c "import vllm; import lmcache" (final probe)

Step 3 silently downgrades 9 transitive packages (opentelemetry-* 1.41->1.40, prometheus-client 0.25->0.24.1) to honor the caps in requirements/common.txt. Step 4 is the only post-install check, but plain import vllm doesn't pull vllm.entrypoints.cli.main -> vllm.config -> vllm.transformers_utils.config, so any CLI-chain breakage introduced by the downgrades slips through until the first vllm serve subprocess fails 180s later.

Fixes:

  • Extract the CLI import probe into a probe_vllm_cli function so the same check runs both during the auto-heal loop (pre-install) and as a hard probe after the LMCache install.
  • Add a post-install CLI probe that fails fast with the actual traceback and a full uv pip freeze if the env is broken, instead of letting the 180s test-harness timeout hide the real failure.
  • Snapshot uv pip freeze before and after uv pip install -e . and diff them, so the silent downgrades done by LMCache's pins are visible in the build log instead of having to be reconstructed from package-install stderr.

With this change, the current k3 failure mode surfaces in ~10s at setup time with a clear ImportError traceback and the exact package versions at fault, instead of a 180s port-wait timeout.
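The fail-fast probe pattern described above can be sketched generically as follows. This is an illustrative sketch, not the actual setup-env.sh code: `probe_import` and the `json` / `no_such_module_xyz` modules are stand-ins for the real vLLM CLI probe.

```shell
# Generic fail-fast import probe: run the import in a subprocess and,
# on failure, print the real traceback immediately instead of letting
# a later server-start timeout hide it.
probe_import() {
  local err
  if err=$(python3 -c "$1" 2>&1); then
    echo "import chain OK"
    return 0
  fi
  echo "import chain BROKEN:" >&2
  echo "$err" >&2
  return 1
}

probe_import "import json"                                  # healthy chain
probe_import "import no_such_module_xyz" 2>/dev/null || echo "failed fast"
```

The same function can then be called both in the pre-install auto-heal loop and as a hard post-install gate, so the two checks cannot drift apart.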

If applicable:

  • this PR contains user facing changes - docs added
  • this PR contains unit tests

Note

Medium Risk
CI-only changes but they directly mutate the installed vllm package and alter dependency installation behavior, which could introduce new breakages or mask upstream issues if the patch/reinstall assumptions drift.

Overview
Improves K3 CI environment setup to fail fast when the vllm serve import chain is broken, instead of surfacing later as a 180s server-start timeout.

setup-env.sh now clears Python/uv caches before installs, forces a reinstall of key vllm/transformers-stack packages, patches the installed vllm CLI entrypoint to pre-import transformers (avoiding a background-thread race), and replaces the previous light import check with a vllm --help probe used both during auto-heal and after the LMCache editable install. It also snapshots/diffs uv pip freeze around the LMCache install and emits a diagnostic dump / full package list on failures to aid debugging.

Reviewed by Cursor Bugbot for commit b700632.

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request enhances the CI environment setup script by introducing a probe_vllm_cli function to detect broken vLLM import chains early. It also adds environment snapshotting before and after the LMCache installation to track package changes and includes a post-install probe to ensure environment integrity. Feedback suggests avoiding fixed filenames in /tmp to prevent race conditions in shared CI environments and reusing existing environment snapshots for error reporting instead of re-running uv pip freeze on a potentially broken environment.

Comment on lines +103 to +109
uv pip freeze | sort > /tmp/env-before-lmcache.txt
uv pip install -e . --no-build-isolation
uv pip freeze | sort > /tmp/env-after-lmcache.txt
if ! diff -q /tmp/env-before-lmcache.txt /tmp/env-after-lmcache.txt >/dev/null; then
echo "--- :warning: Packages changed during LMCache install"
diff /tmp/env-before-lmcache.txt /tmp/env-after-lmcache.txt || true
fi
Contributor


medium

Using fixed filenames in /tmp can lead to race conditions or permission conflicts if multiple CI jobs run concurrently on the same host (e.g., on a shared Buildkite agent). It is safer to use local files within the job's workspace (which is typically isolated per job in Buildkite) or use mktemp to ensure isolation.

Suggested change
uv pip freeze | sort > /tmp/env-before-lmcache.txt
uv pip install -e . --no-build-isolation
uv pip freeze | sort > /tmp/env-after-lmcache.txt
if ! diff -q /tmp/env-before-lmcache.txt /tmp/env-after-lmcache.txt >/dev/null; then
echo "--- :warning: Packages changed during LMCache install"
diff /tmp/env-before-lmcache.txt /tmp/env-after-lmcache.txt || true
fi
uv pip freeze | sort > env-before-lmcache.txt
uv pip install -e . --no-build-isolation
uv pip freeze | sort > env-after-lmcache.txt
if ! diff -q env-before-lmcache.txt env-after-lmcache.txt >/dev/null; then
echo "--- :warning: Packages changed during LMCache install"
diff env-before-lmcache.txt env-after-lmcache.txt || true
fi

echo "--- Traceback ---" >&2
echo "$err" >&2
echo "--- Installed packages ---" >&2
uv pip freeze >&2 || true
Contributor


medium

Instead of running uv pip freeze again on a potentially broken environment, it is more reliable and efficient to display the snapshot already captured during the installation step. This avoids redundant processing and potential failures if the environment is in a severely degraded state.

Suggested change
uv pip freeze >&2 || true
cat env-after-lmcache.txt >&2

@chunxiaozheng
Collaborator

@maobaolong could you help take a look?

Collaborator

@maobaolong maobaolong left a comment


lgtm

@sammshen
Contributor Author

once this PR passes CI, it will be fixed

…e CLI

Two bugs in the last fix, both now addressed:

1. The probe did not exercise the failing import chain. `from
   vllm.entrypoints.cli.main import main` only resolves the `main`
   symbol; the problematic `import vllm.entrypoints.cli.benchmark.main`
   lives *inside* main()'s body and is only reached when the CLI is
   actually invoked. Build LMCache#2599 confirmed this: the post-install
   probe printed "vLLM CLI import chain OK post-install" and then
   `vllm serve` immediately failed with the same
   `ImportError: cannot import name 'GenerationConfig' from
   'transformers'` that started this whole thread.

   Switch the probe to `vllm --help`, which runs main() as a
   subprocess end-to-end and walks the full vllm.entrypoints.cli.main
   -> vllm.entrypoints.cli.benchmark.main -> vllm.config ->
   vllm.transformers_utils.config chain.

2. Root cause of the env breakage: stale bytecode from base-image
   layers. The CI base image pre-installs packages from
   requirements/*.txt at image build time, which populates
   /opt/venv/.../<pkg>/__pycache__/*.pyc with mtimes from the image
   build. When setup-env.sh later runs `uv pip install -U vllm ...`,
   uv extracts the new wheel using the mtimes recorded in the wheel
   itself -- often *older* than the pre-existing .pyc. Python's
   import system compares .py vs .pyc mtimes and keeps using the
   older .pyc, so Python executes 5.5.0's bytecode for
   transformers/__init__.py even though the .py on disk is 5.5.4 --
   and 5.5.0's _import_structure differs enough from 5.5.4's that
   GenerationConfig doesn't get exposed at the top level. The result
   is the ImportError observed only on the CI pods (base image
   cached), not on any fresh venv.

   Wipe /opt/venv/**/__pycache__ after all upgrades so Python is
   forced to re-byte-compile from the current .py sources on first
   import. This is mechanically idempotent and cheap (a few seconds
   on first-use recompile, no network).

This combination fixes the observed CI failure and, more
importantly, closes the class of failure: any future base-image ->
per-job upgrade that would otherwise leave stale bytecode behind
now self-heals, and any future import-chain break that wouldn't
have tripped the old probe now fails fast with the real traceback.
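The wipe itself is essentially a one-liner. Here is a sketch exercised against a throwaway directory rather than /opt/venv; the paths and package name are illustrative only.

```shell
# Simulate a venv with stale bytecode left over from an image layer
venv=$(mktemp -d)
mkdir -p "$venv/site-packages/pkg/__pycache__"
touch "$venv/site-packages/pkg/__pycache__/mod.cpython-312.pyc"

# Remove every __pycache__ directory so Python is forced to
# re-byte-compile from the current .py sources on first import
find "$venv" -type d -name __pycache__ -prune -exec rm -rf {} +

find "$venv" -name '*.pyc' | wc -l    # prints 0: no stale bytecode left
```

The `-prune` keeps `find` from descending into directories it is about to delete, which avoids spurious "No such file or directory" noise during the traversal.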

Signed-off-by: Samuel Shen <slshen@tensormesh.ai>
…state

Build LMCache#2599 with the `vllm --help` probe in place proved the env is
already broken immediately after `uv pip install -U vllm ...`, before
LMCache install and before any post-install eviction: the auto-heal
loop trips the "non-ModuleNotFoundError" branch with the exact
ImportError traceback from vllm/transformers_utils/config.py:18.

The same install recipe replayed in a fresh local venv (including a
full requirements/cuda.txt-based base-image emulation) always
succeeds. The divergence is therefore filesystem state on the K3s
pod coming out of the cached base image, not something we can fix
by regenerating bytecode after the fact.

Apply the minimum-blast-radius fix: tell uv to uninstall-and-
reinstall the full vllm serve import chain (transformers, tokenizers,
huggingface-hub, safetensors, vllm) even when it thinks the existing
install is already up to date. `--reinstall-package` implies
`--refresh-package`, so the wheels come down fresh and are extracted
over freshly cleared paths. Combined with a pre-install
`uv cache clean` + `__pycache__` wipe and the existing post-install
eviction, this puts the import chain on guaranteed-clean ground
regardless of what the base image had.

Cost is a few extra seconds of re-download; the base image stays
the same. If a future job hits the same failure, the setup still
fails fast with the full traceback (via the pre-install auto-heal
loop), pointing at whatever upstream break is actually at fault.

Signed-off-by: Samuel Shen <slshen@tensormesh.ai>
…gging

Build LMCache#2652 with --reinstall-package on the whole import chain still
fails with the same ImportError: freshly extracted transformers 5.5.4
wheel, GenerationConfig still missing from the top-level namespace
according to Python, while an identical recipe in any fresh local
venv produces a working transformers import.

I'm out of remote-debuggable hypotheses for why this is CI-specific.
Add a diagnostic block that the auto-heal loop runs when the probe
hits the "non-ModuleNotFoundError" branch. It dumps:

- `uv pip list` for the transformers chain
- ls+stat of transformers/__init__.py and its .pyc
- the dist-info METADATA Version
- the __version__ and _import_structure["generation"] block from the
  actual __init__.py on disk
- what Python itself sees: sys.executable, sys.path,
  transformers.__file__, whether GenerationConfig is in dir() and in
  _class_to_module / _import_structure, and the traceback of an
  isolated `from transformers import GenerationConfig` attempt

Three outcomes, each unblocks the next step:

1. The file-on-disk _import_structure does *not* contain
   GenerationConfig -> the wheel or its extraction is corrupt; pin
   transformers or change the index.
2. Python loads a different transformers.__file__ than we expect, or
   _import_structure is absent -> shadowing/.pth/PYTHONPATH issue;
   inspect sys.path.
3. Isolated `from transformers import GenerationConfig` WORKS in
   the diagnostic block -> the failure depends on vllm's prior
   imports; we can then bisect the vllm import chain.

This commit just adds the dump. Once a build runs with this script
the real fix will be obvious from the diagnostic output.
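A stripped-down shape of such a diagnostic dump, with `json` standing in for transformers and `loads` standing in for the `GenerationConfig` attribute check, since this is only a sketch:

```shell
# Capture what the interpreter itself sees, independent of what is on disk
out=$(python3 - <<'EOF'
import importlib, sys
print("executable:", sys.executable)
mod = importlib.import_module("json")        # stand-in for transformers
print("file:", mod.__file__)
print("has GenerationConfig-analog:", hasattr(mod, "loads"))
EOF
)
echo "$out"
```

Comparing the interpreter's view (`mod.__file__`, `hasattr`) against the on-disk dist-info and `__init__.py` is what separates the three hypotheses above.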

Signed-off-by: Samuel Shen <slshen@tensormesh.ai>
Build LMCache#2653's diagnostic dump proved the transformers install is
correct and that isolated `from transformers import GenerationConfig`
works fine inside the failing pod. The failure only manifests through
vllm's CLI entry point.

Root cause is in vllm/entrypoints/cli/main.py itself: the module spawns
a daemon thread (`_bg_preload_torch`) that calls `import torch` and
then `import transformers` at module-scope, racing the main thread
which proceeds into main() -> vllm.entrypoints.cli.benchmark.main ->
... -> vllm.transformers_utils.config:18 ->
`from transformers import GenerationConfig, PretrainedConfig`.

On the K3s pods the race lands deterministically in a state where
transformers' _LazyModule._class_to_module cannot resolve
'GenerationConfig' (even though, as the diagnostic confirms, the
fully-initialized module contains it). A fresh local venv with
identical versions cannot reproduce it, consistent with a
timing-sensitive race. The diagnostic ran `import transformers` on
the main thread as its first action, which is exactly why it didn't
trip the race.

Fix: after `uv pip install -U vllm ...`, patch
vllm/entrypoints/cli/main.py to add `import transformers` at module
top, before the BG thread is spawned. Once transformers is already
in sys.modules with _LazyModule fully initialized, the BG thread's
`import transformers` becomes a no-op and the later
`from transformers import ...` on the main thread is just an
attribute lookup against a fully-ready module.

The patch is idempotent (marker comment prevents double-application)
and fails loudly if vllm restructures the file. Once upstream vllm
fixes this on their side, this patch block can be removed.
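The marker-guarded patch idiom looks roughly like this, shown against a scratch file with `json` standing in for transformers; the marker text and helper name are illustrative, not the actual script's.

```shell
target=$(mktemp)
printf 'def main():\n    pass\n' > "$target"

marker='# ci-hotfix: pre-import before the BG thread spawns'
apply_patch() {
  # Idempotent: the marker comment prevents double application
  if ! grep -qF "$marker" "$target"; then
    body=$(cat "$target")
    printf '%s\nimport json\n%s\n' "$marker" "$body" > "$target"
  fi
}

apply_patch
apply_patch                                  # second run is a no-op
grep -c '^import json' "$target"             # prints 1
```

A real version would also fail loudly (nonzero exit) if the expected anchor text in the target file is missing, so a vllm restructure breaks setup visibly rather than silently skipping the patch.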

Signed-off-by: Samuel Shen <slshen@tensormesh.ai>
@sammshen sammshen enabled auto-merge (squash) April 21, 2026 06:45
@sammshen sammshen added the full Run comprehensive tests on this PR label Apr 21, 2026
@sammshen sammshen merged commit f9034fe into LMCache:dev Apr 21, 2026
31 of 34 checks passed
sammshen added a commit to sammshen/LMCache that referenced this pull request Apr 22, 2026
…CM describe

Two changes to setup-env.sh:

1. Replace the vllm/entrypoints/cli/main.py text-patch block from
   LMCache#3093 with a sitecustomize.py write. Python auto-runs
   sitecustomize on interpreter startup, before vllm loads, so
   transformers' _LazyModule is fully initialized on the main thread
   before any BG-thread preload can race it. Works regardless of
   how vllm structures its CLI module; the previous text-match
   approach broke the moment vllm restructured that file.

2. Set SETUPTOOLS_SCM_PRETEND_VERSION_FOR_LMCACHE before the
   editable install. The repo has non-PEP-440 tags (nightly,
   nightly-cu13) that crash setuptools_scm / vcs_versioning during
   git describe.

Signed-off-by: Samuel Shen <slshen@tensormesh.ai>
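The sitecustomize mechanism can be seen in isolation with a scratch directory on PYTHONPATH (`json` again stands in for transformers; the real script writes into the venv's site-packages instead):

```shell
d=$(mktemp -d)
cat > "$d/sitecustomize.py" <<'EOF'
# Auto-imported by Python's site machinery at interpreter startup,
# before any user code (and before any background preload thread) runs.
import json
EOF

PYTHONPATH="$d" python3 -c \
  'import sys; print("json" in sys.modules and "sitecustomize" in sys.modules)'
# prints: True
```

Because the pre-import happens before vllm's CLI module is even found, this approach is immune to vllm restructuring its entrypoint files.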
sammshen added a commit to sammshen/LMCache that referenced this pull request Apr 22, 2026
…CM describe

Three changes to the setup harness:

1. Replace the vllm/entrypoints/cli/main.py text-patch block from
   LMCache#3093 with a sitecustomize.py write in setup-env.sh. Python
   auto-runs sitecustomize on interpreter startup, before vllm
   loads, so transformers' _LazyModule is fully initialized on the
   main thread before any BG-thread preload can race it. Works
   regardless of how vllm structures its CLI module; the previous
   text-match approach broke the moment vllm restructured that file.

2. Set SETUPTOOLS_SCM_PRETEND_VERSION_FOR_LMCACHE before the
   editable install in setup-env.sh. The repo has non-PEP-440 tags
   (nightly, nightly-cu13) that crash setuptools_scm /
   vcs_versioning during git describe.

3. Same SCM pretend-version export in setup-blend-env.sh, which
   has its own `uv pip install -e . --no-build-isolation` calls
   (one per venv) that hit the identical `nightly-cu13` assertion.

Signed-off-by: Samuel Shen <slshen@tensormesh.ai>
sammshen added a commit that referenced this pull request Apr 22, 2026
* [Hotfix][CI] Use sitecustomize.py for transformers pre-import; skip SCM describe

Two changes to setup-env.sh:

1. Replace the vllm/entrypoints/cli/main.py text-patch block from
   #3093 with a sitecustomize.py write. Python auto-runs
   sitecustomize on interpreter startup, before vllm loads, so
   transformers' _LazyModule is fully initialized on the main thread
   before any BG-thread preload can race it. Works regardless of
   how vllm structures its CLI module; the previous text-match
   approach broke the moment vllm restructured that file.

2. Set SETUPTOOLS_SCM_PRETEND_VERSION_FOR_LMCACHE before the
   editable install. The repo has non-PEP-440 tags (nightly,
   nightly-cu13) that crash setuptools_scm / vcs_versioning during
   git describe.

Signed-off-by: Samuel Shen <slshen@tensormesh.ai>

* ci: trigger rerun on correctness to check determinism

Signed-off-by: Samuel Shen <slshen@uchciago.edu>

---------

Signed-off-by: Samuel Shen <slshen@tensormesh.ai>
Signed-off-by: Samuel Shen <slshen@uchciago.edu>
Co-authored-by: Samuel Shen <slshen@uchciago.edu>
