
[Hotfix][CI] Fail-fast when vLLM CLI import chain is broken post-install#3093

Merged
sammshen merged 5 commits into LMCache:dev from sammshen:hotfix/ci-robust-env-probes
Apr 21, 2026

Conversation

@sammshen
Contributor

@sammshen sammshen commented Apr 21, 2026

The k3 integration tests have been red since 2026-04-21 ~04:00 UTC with:

ImportError: cannot import name 'GenerationConfig' from 'transformers'
(/opt/venv/lib/python3.12/site-packages/transformers/__init__.py)

at vllm/transformers_utils/config.py line 18. The failure surfaces 180s after the test starts as a generic "vLLM failed to start on port 8000 within 180s" in wait_for_server, and only then does the harness tail vllm.log to show the real traceback.

Root cause is that setup-env.sh declared the environment "ready" without exercising the CLI import chain that vllm serve runs at startup. The existing sequence was:

  1. Install vLLM nightly
  2. Probe from vllm.entrypoints.cli.main import main (auto-heal)
  3. uv pip install -e . --no-build-isolation (LMCache install)
  4. python -c "import vllm; import lmcache" (final probe)

Step 3 silently downgrades 9 transitive packages (opentelemetry-* 1.41->1.40, prometheus-client 0.25->0.24.1) to honor the caps in requirements/common.txt. Step 4 is the only post-install check, but plain import vllm doesn't pull vllm.entrypoints.cli.main -> vllm.config -> vllm.transformers_utils.config, so any CLI-chain breakage introduced by the downgrades slips through until the first vllm serve subprocess fails 180s later.

Fixes:

  • Extract the CLI import probe into a probe_vllm_cli function so the same check runs both during the auto-heal loop (pre-install) and as a hard probe after the LMCache install.
  • Add a post-install CLI probe that fails fast with the actual traceback and a full uv pip freeze if the env is broken, instead of letting the 180s test-harness timeout hide the real failure.
  • Snapshot uv pip freeze before and after uv pip install -e . and diff them, so the silent downgrades done by LMCache's pins are visible in the build log instead of having to be reconstructed from package-install stderr.

With this change, the current k3 failure mode surfaces in ~10s at setup time with a clear ImportError traceback and the exact package versions at fault, instead of a 180s port-wait timeout.
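The fail-fast probe pattern described above can be sketched generically as follows. This is an illustrative sketch, not the actual setup-env.sh code: `probe_import` and the `json` / `no_such_module_xyz` modules are stand-ins for the real vLLM CLI probe.

```shell
# Generic fail-fast import probe: run the import in a subprocess and,
# on failure, print the real traceback immediately instead of letting
# a later server-start timeout hide it.
probe_import() {
  local err
  if err=$(python3 -c "$1" 2>&1); then
    echo "import chain OK"
    return 0
  fi
  echo "import chain BROKEN:" >&2
  echo "$err" >&2
  return 1
}

probe_import "import json"                                  # healthy chain
probe_import "import no_such_module_xyz" 2>/dev/null || echo "failed fast"
```

The same function can then be called both in the pre-install auto-heal loop and as a hard post-install gate, so the two checks cannot drift apart.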

If applicable:

  • this PR contains user facing changes - docs added
  • this PR contains unit tests

Note

Medium Risk
CI-only changes but they directly mutate the installed vllm package and alter dependency installation behavior, which could introduce new breakages or mask upstream issues if the patch/reinstall assumptions drift.

Overview
Improves K3 CI environment setup to fail fast when the vllm serve import chain is broken, instead of surfacing later as a 180s server-start timeout.

setup-env.sh now clears Python/uv caches before installs, forces a reinstall of key vllm/transformers-stack packages, patches the installed vllm CLI entrypoint to pre-import transformers (avoiding a background-thread race), and replaces the previous light import check with a vllm --help probe used both during auto-heal and after the LMCache editable install. It also snapshots/diffs uv pip freeze around the LMCache install and emits a diagnostic dump / full package list on failures to aid debugging.

Reviewed by Cursor Bugbot for commit b700632.

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request enhances the CI environment setup script by introducing a probe_vllm_cli function to detect broken vLLM import chains early. It also adds environment snapshotting before and after the LMCache installation to track package changes and includes a post-install probe to ensure environment integrity. Feedback suggests avoiding fixed filenames in /tmp to prevent race conditions in shared CI environments and reusing existing environment snapshots for error reporting instead of re-running uv pip freeze on a potentially broken environment.

Comment on lines +103 to +109
uv pip freeze | sort > /tmp/env-before-lmcache.txt
uv pip install -e . --no-build-isolation
uv pip freeze | sort > /tmp/env-after-lmcache.txt
if ! diff -q /tmp/env-before-lmcache.txt /tmp/env-after-lmcache.txt >/dev/null; then
echo "--- :warning: Packages changed during LMCache install"
diff /tmp/env-before-lmcache.txt /tmp/env-after-lmcache.txt || true
fi
Contributor


medium

Using fixed filenames in /tmp can lead to race conditions or permission conflicts if multiple CI jobs run concurrently on the same host (e.g., on a shared Buildkite agent). It is safer to use local files within the job's workspace (which is typically isolated per job in Buildkite) or use mktemp to ensure isolation.

Suggested change
uv pip freeze | sort > /tmp/env-before-lmcache.txt
uv pip install -e . --no-build-isolation
uv pip freeze | sort > /tmp/env-after-lmcache.txt
if ! diff -q /tmp/env-before-lmcache.txt /tmp/env-after-lmcache.txt >/dev/null; then
echo "--- :warning: Packages changed during LMCache install"
diff /tmp/env-before-lmcache.txt /tmp/env-after-lmcache.txt || true
fi
uv pip freeze | sort > env-before-lmcache.txt
uv pip install -e . --no-build-isolation
uv pip freeze | sort > env-after-lmcache.txt
if ! diff -q env-before-lmcache.txt env-after-lmcache.txt >/dev/null; then
echo "--- :warning: Packages changed during LMCache install"
diff env-before-lmcache.txt env-after-lmcache.txt || true
fi

echo "--- Traceback ---" >&2
echo "$err" >&2
echo "--- Installed packages ---" >&2
uv pip freeze >&2 || true
Contributor


medium

Instead of running uv pip freeze again on a potentially broken environment, it is more reliable and efficient to display the snapshot already captured during the installation step. This avoids redundant processing and potential failures if the environment is in a severely degraded state.

Suggested change
uv pip freeze >&2 || true
cat env-after-lmcache.txt >&2

@chunxiaozheng
Collaborator

@maobaolong could you help take a look?

Collaborator

@maobaolong maobaolong left a comment


lgtm

@sammshen
Contributor Author

once this PR passes CI, it will be fixed

…e CLI

Two bugs in the last fix, both now addressed:

1. The probe did not exercise the failing import chain. `from
   vllm.entrypoints.cli.main import main` only resolves the `main`
   symbol; the problematic `import vllm.entrypoints.cli.benchmark.main`
   lives *inside* main()'s body and is only reached when the CLI is
   actually invoked. Build LMCache#2599 confirmed this: the post-install
   probe printed "vLLM CLI import chain OK post-install" and then
   `vllm serve` immediately failed with the same
   `ImportError: cannot import name 'GenerationConfig' from
   'transformers'` that started this whole thread.

   Switch the probe to `vllm --help`, which runs main() as a
   subprocess end-to-end and walks the full vllm.entrypoints.cli.main
   -> vllm.entrypoints.cli.benchmark.main -> vllm.config ->
   vllm.transformers_utils.config chain.

2. Root cause of the env breakage: stale bytecode from base-image
   layers. The CI base image pre-installs packages from
   requirements/*.txt at image build time, which populates
   /opt/venv/.../<pkg>/__pycache__/*.pyc with mtimes from the image
   build. When setup-env.sh later runs `uv pip install -U vllm ...`,
   uv extracts the new wheel using the mtimes recorded in the wheel
   itself -- often *older* than the pre-existing .pyc. Python's
   import system compares .py vs .pyc mtimes and keeps using the
   older .pyc, so Python executes 5.5.0's bytecode for
   transformers/__init__.py even though the .py on disk is 5.5.4 --
   and 5.5.0's _import_structure differs enough from 5.5.4's that
   GenerationConfig doesn't get exposed at the top level. The result
   is the ImportError observed only on the CI pods (base image
   cached), not on any fresh venv.

   Wipe /opt/venv/**/__pycache__ after all upgrades so Python is
   forced to re-byte-compile from the current .py sources on first
   import. This is mechanically idempotent and cheap (a few seconds
   on first-use recompile, no network).

This combination fixes the observed CI failure and, more
importantly, closes the class of failure: any future base-image ->
per-job upgrade that would otherwise leave stale bytecode behind
now self-heals, and any future import-chain break that wouldn't
have tripped the old probe now fails fast with the real traceback.
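The wipe itself is essentially a one-liner. Here is a sketch exercised against a throwaway directory rather than /opt/venv; the paths and package name are illustrative only.

```shell
# Simulate a venv with stale bytecode left over from an image layer
venv=$(mktemp -d)
mkdir -p "$venv/site-packages/pkg/__pycache__"
touch "$venv/site-packages/pkg/__pycache__/mod.cpython-312.pyc"

# Remove every __pycache__ directory so Python is forced to
# re-byte-compile from the current .py sources on first import
find "$venv" -type d -name __pycache__ -prune -exec rm -rf {} +

find "$venv" -name '*.pyc' | wc -l    # prints 0: no stale bytecode left
```

The `-prune` keeps `find` from descending into directories it is about to delete, which avoids spurious "No such file or directory" noise during the traversal.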

Signed-off-by: Samuel Shen <slshen@tensormesh.ai>
…state

Build LMCache#2599 with the `vllm --help` probe in place proved the env is
already broken immediately after `uv pip install -U vllm ...`, before
LMCache install and before any post-install eviction: the auto-heal
loop trips the "non-ModuleNotFoundError" branch with the exact
ImportError traceback from vllm/transformers_utils/config.py:18.

The same install recipe replayed in a fresh local venv (including a
full requirements/cuda.txt-based base-image emulation) always
succeeds. The divergence is therefore filesystem state on the K3s
pod coming out of the cached base image, not something we can fix
by regenerating bytecode after the fact.

Apply the minimum-blast-radius fix: tell uv to uninstall-and-
reinstall the full vllm serve import chain (transformers, tokenizers,
huggingface-hub, safetensors, vllm) even when it thinks the existing
install is already up to date. `--reinstall-package` implies
`--refresh-package`, so the wheels come down fresh and are extracted
over freshly cleared paths. Combined with a pre-install
`uv cache clean` + `__pycache__` wipe and the existing post-install
eviction, this puts the import chain on guaranteed-clean ground
regardless of what the base image had.

Cost is a few extra seconds of re-download; the base image stays
the same. If a future job hits the same failure, the setup still
fails fast with the full traceback (via the pre-install auto-heal
loop), pointing at whatever upstream break is actually at fault.

Signed-off-by: Samuel Shen <slshen@tensormesh.ai>
…gging

Build LMCache#2652 with --reinstall-package on the whole import chain still
fails with the same ImportError: freshly extracted transformers 5.5.4
wheel, GenerationConfig still missing from the top-level namespace
according to Python, while an identical recipe in any fresh local
venv produces a working transformers import.

I'm out of remote-debuggable hypotheses for why this is CI-specific.
Add a diagnostic block that the auto-heal loop runs when the probe
hits the "non-ModuleNotFoundError" branch. It dumps:

- `uv pip list` for the transformers chain
- ls+stat of transformers/__init__.py and its .pyc
- the dist-info METADATA Version
- the __version__ and _import_structure["generation"] block from the
  actual __init__.py on disk
- what Python itself sees: sys.executable, sys.path,
  transformers.__file__, whether GenerationConfig is in dir() and in
  _class_to_module / _import_structure, and the traceback of an
  isolated `from transformers import GenerationConfig` attempt

Three outcomes, each unblocks the next step:

1. The file-on-disk _import_structure does *not* contain
   GenerationConfig -> the wheel or its extraction is corrupt; pin
   transformers or change the index.
2. Python loads a different transformers.__file__ than we expect, or
   _import_structure is absent -> shadowing/.pth/PYTHONPATH issue;
   inspect sys.path.
3. Isolated `from transformers import GenerationConfig` WORKS in
   the diagnostic block -> the failure depends on vllm's prior
   imports; we can then bisect the vllm import chain.

This commit just adds the dump. Once a build runs with this script
the real fix will be obvious from the diagnostic output.
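A stripped-down shape of such a diagnostic dump, with `json` standing in for transformers and `loads` standing in for the `GenerationConfig` attribute check, since this is only a sketch:

```shell
# Capture what the interpreter itself sees, independent of what is on disk
out=$(python3 - <<'EOF'
import importlib, sys
print("executable:", sys.executable)
mod = importlib.import_module("json")        # stand-in for transformers
print("file:", mod.__file__)
print("has GenerationConfig-analog:", hasattr(mod, "loads"))
EOF
)
echo "$out"
```

Comparing the interpreter's view (`mod.__file__`, `hasattr`) against the on-disk dist-info and `__init__.py` is what separates the three hypotheses above.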

Signed-off-by: Samuel Shen <slshen@tensormesh.ai>
Build LMCache#2653's diagnostic dump proved the transformers install is
correct and that isolated `from transformers import GenerationConfig`
works fine inside the failing pod. The failure only manifests through
vllm's CLI entry point.

Root cause is in vllm/entrypoints/cli/main.py itself: the module spawns
a daemon thread (`_bg_preload_torch`) that calls `import torch` and
then `import transformers` at module-scope, racing the main thread
which proceeds into main() -> vllm.entrypoints.cli.benchmark.main ->
... -> vllm.transformers_utils.config:18 ->
`from transformers import GenerationConfig, PretrainedConfig`.

On the K3s pods the race lands deterministically in a state where
transformers' _LazyModule._class_to_module cannot resolve
'GenerationConfig' (even though, as the diagnostic confirms, the
fully-initialized module contains it). A fresh local venv with
identical versions cannot reproduce it, consistent with a
timing-sensitive race. The diagnostic ran `import transformers` on
the main thread as its first action, which is exactly why it didn't
trip the race.

Fix: after `uv pip install -U vllm ...`, patch
vllm/entrypoints/cli/main.py to add `import transformers` at module
top, before the BG thread is spawned. Once transformers is already
in sys.modules with _LazyModule fully initialized, the BG thread's
`import transformers` becomes a no-op and the later
`from transformers import ...` on the main thread is just an
attribute lookup against a fully-ready module.

The patch is idempotent (marker comment prevents double-application)
and fails loudly if vllm restructures the file. Once upstream vllm
fixes this on their side, this patch block can be removed.
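The marker-guarded patch idiom looks roughly like this, shown against a scratch file with `json` standing in for transformers; the marker text and helper name are illustrative, not the actual script's.

```shell
target=$(mktemp)
printf 'def main():\n    pass\n' > "$target"

marker='# ci-hotfix: pre-import before the BG thread spawns'
apply_patch() {
  # Idempotent: the marker comment prevents double application
  if ! grep -qF "$marker" "$target"; then
    body=$(cat "$target")
    printf '%s\nimport json\n%s\n' "$marker" "$body" > "$target"
  fi
}

apply_patch
apply_patch                                  # second run is a no-op
grep -c '^import json' "$target"             # prints 1
```

A real version would also fail loudly (nonzero exit) if the expected anchor text in the target file is missing, so a vllm restructure breaks setup visibly rather than silently skipping the patch.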

Signed-off-by: Samuel Shen <slshen@tensormesh.ai>
@sammshen sammshen enabled auto-merge (squash) April 21, 2026 06:45
@sammshen sammshen added the full Run comprehensive tests on this PR label Apr 21, 2026
@sammshen sammshen merged commit f9034fe into LMCache:dev Apr 21, 2026
31 of 34 checks passed
sammshen added a commit to sammshen/LMCache that referenced this pull request Apr 22, 2026
…CM describe

Two changes to setup-env.sh:

1. Replace the vllm/entrypoints/cli/main.py text-patch block from
   LMCache#3093 with a sitecustomize.py write. Python auto-runs
   sitecustomize on interpreter startup, before vllm loads, so
   transformers' _LazyModule is fully initialized on the main thread
   before any BG-thread preload can race it. Works regardless of
   how vllm structures its CLI module; the previous text-match
   approach broke the moment vllm restructured that file.

2. Set SETUPTOOLS_SCM_PRETEND_VERSION_FOR_LMCACHE before the
   editable install. The repo has non-PEP-440 tags (nightly,
   nightly-cu13) that crash setuptools_scm / vcs_versioning during
   git describe.

Signed-off-by: Samuel Shen <slshen@tensormesh.ai>
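The sitecustomize mechanism can be seen in isolation with a scratch directory on PYTHONPATH (`json` again stands in for transformers; the real script writes into the venv's site-packages instead):

```shell
d=$(mktemp -d)
cat > "$d/sitecustomize.py" <<'EOF'
# Auto-imported by Python's site machinery at interpreter startup,
# before any user code (and before any background preload thread) runs.
import json
EOF

PYTHONPATH="$d" python3 -c \
  'import sys; print("json" in sys.modules and "sitecustomize" in sys.modules)'
# prints: True
```

Because the pre-import happens before vllm's CLI module is even found, this approach is immune to vllm restructuring its entrypoint files.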
sammshen added a commit to sammshen/LMCache that referenced this pull request Apr 22, 2026
…CM describe

Three changes to the setup harness:

1. Replace the vllm/entrypoints/cli/main.py text-patch block from
   LMCache#3093 with a sitecustomize.py write in setup-env.sh. Python
   auto-runs sitecustomize on interpreter startup, before vllm
   loads, so transformers' _LazyModule is fully initialized on the
   main thread before any BG-thread preload can race it. Works
   regardless of how vllm structures its CLI module; the previous
   text-match approach broke the moment vllm restructured that file.

2. Set SETUPTOOLS_SCM_PRETEND_VERSION_FOR_LMCACHE before the
   editable install in setup-env.sh. The repo has non-PEP-440 tags
   (nightly, nightly-cu13) that crash setuptools_scm /
   vcs_versioning during git describe.

3. Same SCM pretend-version export in setup-blend-env.sh, which
   has its own `uv pip install -e . --no-build-isolation` calls
   (one per venv) that hit the identical `nightly-cu13` assertion.

Signed-off-by: Samuel Shen <slshen@tensormesh.ai>
sammshen added a commit that referenced this pull request Apr 22, 2026
* [Hotfix][CI] Use sitecustomize.py for transformers pre-import; skip SCM describe

Two changes to setup-env.sh:

1. Replace the vllm/entrypoints/cli/main.py text-patch block from
   #3093 with a sitecustomize.py write. Python auto-runs
   sitecustomize on interpreter startup, before vllm loads, so
   transformers' _LazyModule is fully initialized on the main thread
   before any BG-thread preload can race it. Works regardless of
   how vllm structures its CLI module; the previous text-match
   approach broke the moment vllm restructured that file.

2. Set SETUPTOOLS_SCM_PRETEND_VERSION_FOR_LMCACHE before the
   editable install. The repo has non-PEP-440 tags (nightly,
   nightly-cu13) that crash setuptools_scm / vcs_versioning during
   git describe.

Signed-off-by: Samuel Shen <slshen@tensormesh.ai>

* ci: trigger rerun on correctness to check determinism

Signed-off-by: Samuel Shen <slshen@uchciago.edu>

---------

Signed-off-by: Samuel Shen <slshen@tensormesh.ai>
Signed-off-by: Samuel Shen <slshen@uchciago.edu>
Co-authored-by: Samuel Shen <slshen@uchciago.edu>
