[Hotfix][CI] Fail-fast when vLLM CLI import chain is broken post-install#3093
sammshen merged 5 commits into LMCache:dev from
Conversation
The k3 integration tests have been red since 2026-04-21 ~04:00 UTC with:
ImportError: cannot import name 'GenerationConfig' from 'transformers'
(/opt/venv/lib/python3.12/site-packages/transformers/__init__.py)
at vllm/transformers_utils/config.py line 18. The failure surfaces
180s after the test starts as a generic "vLLM failed to start on
port 8000 within 180s" in wait_for_server, and only then does the
harness tail vllm.log to show the real traceback.
Root cause is that setup-env.sh declared the environment "ready"
without exercising the CLI import chain that `vllm serve` runs at
startup. The existing sequence was:
1. Install vLLM nightly
2. Probe `from vllm.entrypoints.cli.main import main` (auto-heal)
3. `uv pip install -e . --no-build-isolation` (LMCache install)
4. `python -c "import vllm; import lmcache"` (final probe)
Step 3 silently downgrades 9 transitive packages (opentelemetry-*
1.41->1.40, prometheus-client 0.25->0.24.1) to honor the caps in
requirements/common.txt. Step 4 is the only post-install check, but
plain `import vllm` doesn't pull vllm.entrypoints.cli.main ->
vllm.config -> vllm.transformers_utils.config, so any CLI-chain
breakage introduced by the downgrades slips through until the first
`vllm serve` subprocess fails 180s later.
Fixes:
- Extract the CLI import probe into a `probe_vllm_cli` function so
the same check runs both during the auto-heal loop (pre-install)
and as a hard probe after the LMCache install.
- Add a post-install CLI probe that fails fast with the actual
traceback and a full `uv pip freeze` if the env is broken, instead
of letting the 180s test-harness timeout hide the real failure.
- Snapshot `uv pip freeze` before and after `uv pip install -e .`
and diff them, so the silent downgrades done by LMCache's pins
are visible in the build log instead of having to be reconstructed
from package-install stderr.
With this change, the current k3 failure mode surfaces in ~10s at
setup time with a clear ImportError traceback and the exact package
versions at fault, instead of a 180s port-wait timeout.
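The fixes above could be sketched roughly as follows. This is a hypothetical reconstruction, not the actual setup-env.sh: the `PROBE_CMD` parameterization is illustrative (the real script hard-codes the vLLM CLI import probe, e.g. `python -c "from vllm.entrypoints.cli.main import main"`), and the message strings are guesses at the log format.

```shell
# Hypothetical sketch of the extracted probe_vllm_cli. PROBE_CMD is an
# illustrative parameterization; the real script probes the vLLM CLI
# import chain directly.
probe_vllm_cli() {
  local err
  if ! err=$("${PROBE_CMD[@]}" 2>&1); then
    # Fail fast with the actual traceback and the full package list,
    # instead of letting a 180s port-wait timeout hide the real failure.
    echo "--- Traceback ---" >&2
    echo "$err" >&2
    echo "--- Installed packages ---" >&2
    uv pip freeze >&2 || true
    return 1
  fi
  echo "vLLM CLI import chain OK"
}
```

The same function can then be called from the auto-heal loop before the LMCache install and again as a hard probe afterwards.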
Signed-off-by: Samuel Shen <slshen@tensormesh.ai>
Code Review
This pull request enhances the CI environment setup script by introducing a probe_vllm_cli function to detect broken vLLM import chains early. It also adds environment snapshotting before and after the LMCache installation to track package changes and includes a post-install probe to ensure environment integrity. Feedback suggests avoiding fixed filenames in /tmp to prevent race conditions in shared CI environments and reusing existing environment snapshots for error reporting instead of re-running uv pip freeze on a potentially broken environment.
```shell
uv pip freeze | sort > /tmp/env-before-lmcache.txt
uv pip install -e . --no-build-isolation
uv pip freeze | sort > /tmp/env-after-lmcache.txt
if ! diff -q /tmp/env-before-lmcache.txt /tmp/env-after-lmcache.txt >/dev/null; then
  echo "--- :warning: Packages changed during LMCache install"
  diff /tmp/env-before-lmcache.txt /tmp/env-after-lmcache.txt || true
fi
```
Using fixed filenames in /tmp can lead to race conditions or permission conflicts if multiple CI jobs run concurrently on the same host (e.g., on a shared Buildkite agent). It is safer to use local files within the job's workspace (which is typically isolated per job in Buildkite) or use mktemp to ensure isolation.
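A minimal sketch of the `mktemp` variant the reviewer mentions (the `snapshot_diff` helper and its parameterized install command are hypothetical; the real change below simply moves the files into the workspace):

```shell
# Hypothetical helper: snapshot the environment around an arbitrary
# install command, using a mktemp directory so concurrent jobs on the
# same host never collide on fixed /tmp filenames.
snapshot_diff() {
  local dir before after
  dir=$(mktemp -d)                       # per-invocation isolation
  before="$dir/env-before-lmcache.txt"
  after="$dir/env-after-lmcache.txt"
  uv pip freeze | sort > "$before"
  "$@"                                   # the install command under observation
  uv pip freeze | sort > "$after"
  if ! diff -q "$before" "$after" >/dev/null; then
    echo "--- :warning: Packages changed during LMCache install"
    diff "$before" "$after" || true
  fi
}
```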
Suggested change:

```shell
uv pip freeze | sort > env-before-lmcache.txt
uv pip install -e . --no-build-isolation
uv pip freeze | sort > env-after-lmcache.txt
if ! diff -q env-before-lmcache.txt env-after-lmcache.txt >/dev/null; then
  echo "--- :warning: Packages changed during LMCache install"
  diff env-before-lmcache.txt env-after-lmcache.txt || true
fi
```
```shell
echo "--- Traceback ---" >&2
echo "$err" >&2
echo "--- Installed packages ---" >&2
uv pip freeze >&2 || true
```
Instead of running uv pip freeze again on a potentially broken environment, it is more reliable and efficient to display the snapshot already captured during the installation step. This avoids redundant processing and potential failures if the environment is in a severely degraded state.
Suggested change:

```shell
cat env-after-lmcache.txt >&2
```
@maobaolong could you help take a look?
Once this PR passes CI, it will be fixed.
…e CLI

Two bugs in the last fix, both now addressed:

1. The probe did not exercise the failing import chain. `from vllm.entrypoints.cli.main import main` only resolves the `main` symbol; the problematic `import vllm.entrypoints.cli.benchmark.main` lives *inside* main()'s body and is only reached when the CLI is actually invoked. Build LMCache#2599 confirmed this: the post-install probe printed "vLLM CLI import chain OK post-install" and then `vllm serve` immediately failed with the same `ImportError: cannot import name 'GenerationConfig' from 'transformers'` that started this whole thread. Switch the probe to `vllm --help`, which runs main() as a subprocess end-to-end and walks the full vllm.entrypoints.cli.main -> vllm.entrypoints.cli.benchmark.main -> vllm.config -> vllm.transformers_utils.config chain.

2. Root cause of the env breakage: stale bytecode from base-image layers. The CI base image pre-installs packages from requirements/*.txt at image build time, which populates /opt/venv/.../<pkg>/__pycache__/*.pyc with mtimes from the image build. When setup-env.sh later runs `uv pip install -U vllm ...`, uv extracts the new wheel using the mtimes recorded in the wheel itself -- often *older* than the pre-existing .pyc. Python's import system compares .py vs .pyc mtimes and keeps using the older .pyc, so Python executes 5.5.0's bytecode for transformers/__init__.py even though the .py on disk is 5.5.4 -- and 5.5.0's _import_structure differs enough from 5.5.4's that GenerationConfig doesn't get exposed at the top level. The result is the ImportError observed only on the CI pods (base image cached), not on any fresh venv. Wipe /opt/venv/**/__pycache__ after all upgrades so Python is forced to re-byte-compile from the current .py sources on first import. This is mechanically idempotent and cheap (a few seconds on first-use recompile, no network).

This combination fixes the observed CI failure and, more importantly, closes the class of failure: any future base-image -> per-job upgrade that would otherwise leave stale bytecode behind now self-heals, and any future import-chain break that wouldn't have tripped the old probe now fails fast with the real traceback.

Signed-off-by: Samuel Shen <slshen@tensormesh.ai>
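The two fixes could be sketched as below. The function names and the default venv path are assumptions taken from the commit message, not the actual script:

```shell
# Probe the CLI end-to-end: unlike importing main, `vllm --help` actually
# runs main() and therefore walks the full benchmark/config import chain.
probe_vllm_cli() {
  vllm --help >/dev/null
}

# Wipe cached bytecode so Python re-byte-compiles from the current .py
# sources on first import; stale .pyc files win the mtime comparison
# when the freshly extracted wheel's mtimes are older.
wipe_stale_bytecode() {
  find "${1:-/opt/venv}" -type d -name __pycache__ -prune -exec rm -rf {} +
}
```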
…state

Build LMCache#2599 with the `vllm --help` probe in place proved the env is already broken immediately after `uv pip install -U vllm ...`, before LMCache install and before any post-install eviction: the auto-heal loop trips the "non-ModuleNotFoundError" branch with the exact ImportError traceback from vllm/transformers_utils/config.py:18. The same install recipe replayed in a fresh local venv (including a full requirements/cuda.txt-based base-image emulation) always succeeds. The divergence is therefore filesystem state on the K3s pod coming out of the cached base image, not something we can fix by regenerating bytecode after the fact.

Apply the minimum-blast-radius fix: tell uv to uninstall-and-reinstall the full vllm serve import chain (transformers, tokenizers, huggingface-hub, safetensors, vllm) even when it thinks the existing install is already up to date. `--reinstall-package` implies `--refresh-package`, so the wheels come down fresh and are extracted over freshly cleared paths. Combined with a pre-install `uv cache clean` + `__pycache__` wipe and the existing post-install eviction, this puts the import chain on guaranteed-clean ground regardless of what the base image had.

Cost is a few extra seconds of re-download; the base image stays the same. If a future job hits the same failure, the setup still fails fast with the full traceback (via the pre-install auto-heal loop), pointing at whatever upstream break is actually at fault.

Signed-off-by: Samuel Shen <slshen@tensormesh.ai>
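The reinstall step might look like this (package list per the commit message; the trailing `vllm` requirement spec and the wrapper function are illustrative):

```shell
# Force uv to re-download and re-extract the whole `vllm serve` import
# chain, even when the resolver considers the installed versions current.
reinstall_vllm_chain() {
  uv pip install -U \
    --reinstall-package transformers \
    --reinstall-package tokenizers \
    --reinstall-package huggingface-hub \
    --reinstall-package safetensors \
    --reinstall-package vllm \
    vllm
}
```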
…gging

Build LMCache#2652 with --reinstall-package on the whole import chain still fails with the same ImportError: freshly extracted transformers 5.5.4 wheel, GenerationConfig still missing from the top-level namespace according to Python, while an identical recipe in any fresh local venv produces a working transformers import. I'm out of remote-debuggable hypotheses for why this is CI-specific.

Add a diagnostic block that the auto-heal loop runs when the probe hits the "non-ModuleNotFoundError" branch. It dumps:

- `uv pip list` for the transformers chain
- ls+stat of transformers/__init__.py and its .pyc
- the dist-info METADATA Version
- the __version__ and _import_structure["generation"] block from the actual __init__.py on disk
- what Python itself sees: sys.executable, sys.path, transformers.__file__, whether GenerationConfig is in dir() and in _class_to_module / _import_structure, and the traceback of an isolated `from transformers import GenerationConfig` attempt

Three outcomes, each unblocks the next step:

1. The file-on-disk _import_structure does *not* contain GenerationConfig -> the wheel or its extraction is corrupt; pin transformers or change the index.
2. Python loads a different transformers.__file__ than we expect, or _import_structure is absent -> shadowing/.pth/PYTHONPATH issue; inspect sys.path.
3. Isolated `from transformers import GenerationConfig` WORKS in the diagnostic block -> the failure depends on vllm's prior imports; we can then bisect the vllm import chain.

This commit just adds the dump. Once a build runs with this script the real fix will be obvious from the diagnostic output.

Signed-off-by: Samuel Shen <slshen@tensormesh.ai>
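An abbreviated sketch of such a dump (the site-packages path is a parameter here, and this cut keeps only a subset of the listed probes; the real block also stats the .pyc and prints the `_import_structure` slice):

```shell
# Hypothetical, abbreviated diagnostic dump for the transformers chain.
# Every step is best-effort (|| true): the env under inspection is broken.
dump_transformers_diag() {
  local site="$1"    # e.g. the venv's site-packages directory
  uv pip list 2>/dev/null | grep -Ei 'transformers|tokenizers|huggingface|safetensors' || true
  ls -l "$site/transformers/__init__.py" 2>/dev/null || true
  grep -m1 '^Version:' "$site"/transformers-*.dist-info/METADATA 2>/dev/null || true
  # Isolated import attempt on a clean interpreter, main thread only.
  python3 - <<'EOF' || true
import sys, traceback
print("executable:", sys.executable)
try:
    from transformers import GenerationConfig  # noqa: F401
    print("isolated import: OK")
except Exception:
    traceback.print_exc()
EOF
}
```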
Build LMCache#2653's diagnostic dump proved the transformers install is correct and that isolated `from transformers import GenerationConfig` works fine inside the failing pod. The failure only manifests through vllm's CLI entry point.

Root cause is in vllm/entrypoints/cli/main.py itself: the module spawns a daemon thread (`_bg_preload_torch`) that calls `import torch` and then `import transformers` at module-scope, racing the main thread which proceeds into main() -> vllm.entrypoints.cli.benchmark.main -> ... -> vllm.transformers_utils.config:18 -> `from transformers import GenerationConfig, PretrainedConfig`. On the K3s pods the race lands deterministically in a state where transformers' _LazyModule._class_to_module cannot resolve 'GenerationConfig' (even though, as the diagnostic confirms, the fully-initialized module contains it). A fresh local venv with identical versions cannot reproduce it, consistent with a timing-sensitive race. The diagnostic ran `import transformers` on the main thread as its first action, which is exactly why it didn't trip the race.

Fix: after `uv pip install -U vllm ...`, patch vllm/entrypoints/cli/main.py to add `import transformers` at module top, before the BG thread is spawned. Once transformers is already in sys.modules with _LazyModule fully initialized, the BG thread's `import transformers` becomes a no-op and the later `from transformers import ...` on the main thread is just an attribute lookup against a fully-ready module. The patch is idempotent (marker comment prevents double-application) and fails loudly if vllm restructures the file. Once upstream vllm fixes this on their side, this patch block can be removed.

Signed-off-by: Samuel Shen <slshen@tensormesh.ai>
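One way to structure such an idempotent, fail-loud text patch (the marker string and layout check are hypothetical, and prepending at the top of the file is a simplification -- a real patch must respect shebangs, docstrings, and `from __future__` imports):

```shell
# Hypothetical patcher: prepend a pre-import, guarded by a marker comment
# so re-runs are no-ops, and bail loudly if the file looks unexpected.
patch_cli_preimport() {
  local f="$1" marker="# lmcache-ci transformers preimport"
  grep -qF "$marker" "$f" && return 0              # already applied
  grep -Eq '^(import|from) ' "$f" || {
    echo "patch_cli_preimport: unexpected layout in $f" >&2
    return 1
  }
  { printf '%s\nimport transformers  # noqa: F401\n' "$marker"; cat "$f"; } > "$f.tmp"
  mv "$f.tmp" "$f"
}
```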
…CM describe

Two changes to setup-env.sh:

1. Replace the vllm/entrypoints/cli/main.py text-patch block from LMCache#3093 with a sitecustomize.py write. Python auto-runs sitecustomize on interpreter startup, before vllm loads, so transformers' _LazyModule is fully initialized on the main thread before any BG-thread preload can race it. Works regardless of how vllm structures its CLI module; the previous text-match approach broke the moment vllm restructured that file.

2. Set SETUPTOOLS_SCM_PRETEND_VERSION_FOR_LMCACHE before the editable install. The repo has non-PEP-440 tags (nightly, nightly-cu13) that crash setuptools_scm / vcs_versioning during git describe.

Signed-off-by: Samuel Shen <slshen@tensormesh.ai>
…CM describe

Three changes to the setup harness:

1. Replace the vllm/entrypoints/cli/main.py text-patch block from LMCache#3093 with a sitecustomize.py write in setup-env.sh. Python auto-runs sitecustomize on interpreter startup, before vllm loads, so transformers' _LazyModule is fully initialized on the main thread before any BG-thread preload can race it. Works regardless of how vllm structures its CLI module; the previous text-match approach broke the moment vllm restructured that file.

2. Set SETUPTOOLS_SCM_PRETEND_VERSION_FOR_LMCACHE before the editable install in setup-env.sh. The repo has non-PEP-440 tags (nightly, nightly-cu13) that crash setuptools_scm / vcs_versioning during git describe.

3. Same SCM pretend-version export in setup-blend-env.sh, which has its own `uv pip install -e . --no-build-isolation` calls (one per venv) that hit the identical `nightly-cu13` assertion.

Signed-off-by: Samuel Shen <slshen@tensormesh.ai>
* [Hotfix][CI] Use sitecustomize.py for transformers pre-import; skip SCM describe

Two changes to setup-env.sh:

1. Replace the vllm/entrypoints/cli/main.py text-patch block from #3093 with a sitecustomize.py write. Python auto-runs sitecustomize on interpreter startup, before vllm loads, so transformers' _LazyModule is fully initialized on the main thread before any BG-thread preload can race it. Works regardless of how vllm structures its CLI module; the previous text-match approach broke the moment vllm restructured that file.

2. Set SETUPTOOLS_SCM_PRETEND_VERSION_FOR_LMCACHE before the editable install. The repo has non-PEP-440 tags (nightly, nightly-cu13) that crash setuptools_scm / vcs_versioning during git describe.

Signed-off-by: Samuel Shen <slshen@tensormesh.ai>

* ci: trigger rerun on correctness to check determinism

Signed-off-by: Samuel Shen <slshen@uchciago.edu>

---------

Signed-off-by: Samuel Shen <slshen@tensormesh.ai>
Signed-off-by: Samuel Shen <slshen@uchciago.edu>
Co-authored-by: Samuel Shen <slshen@uchciago.edu>
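The sitecustomize write and the SCM workaround could be sketched as below. The target directory is passed in as a parameter and the pretend-version value is illustrative; neither is taken from the actual script:

```shell
# Write a sitecustomize.py that Python auto-imports at interpreter startup,
# so transformers is fully initialized on the main thread before vllm's
# background preload thread can race its lazy-module machinery.
write_sitecustomize() {
  local site_dir="$1"    # the venv's site-packages directory
  cat > "$site_dir/sitecustomize.py" <<'EOF'
# Pre-import transformers on the main thread to defuse the lazy-import race.
import transformers  # noqa: F401
EOF
}

# Non-PEP-440 tags (nightly, nightly-cu13) crash setuptools_scm's describe;
# pretend a fixed version for the editable install (value is illustrative).
export SETUPTOOLS_SCM_PRETEND_VERSION_FOR_LMCACHE=0.0.0.dev0
```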
Note

Medium Risk

CI-only changes, but they directly mutate the installed `vllm` package and alter dependency installation behavior, which could introduce new breakages or mask upstream issues if the patch/reinstall assumptions drift.

Overview

Improves K3 CI environment setup to fail fast when the `vllm serve` import chain is broken, instead of surfacing later as a 180s server-start timeout. `setup-env.sh` now clears Python/uv caches before installs, forces a reinstall of key `vllm`/`transformers`-stack packages, patches the installed `vllm` CLI entrypoint to pre-import `transformers` (avoiding a background-thread race), and replaces the previous light import check with a `vllm --help` probe used both during auto-heal and after the LMCache editable install. It also snapshots/diffs `uv pip freeze` around the LMCache install and emits a diagnostic dump / full package list on failures to aid debugging.

Reviewed by Cursor Bugbot for commit b700632.