tests/studio: lock in Windows GPU detection fix (#5106) with a synthetic CI test#5376
Conversation
Locks in the combined fix from #5322 + #5324 with a synthetic Windows scenario that CI runners without GPUs can execute. The test packs the real PyPI win_amd64 wheel layouts (cu12 modular and the new unsuffixed cu13 nvidia/cu13/bin/x86_64 layout) plus the exact filename set of the upstream b9103 cudart-llama-bin-win-cuda bundles, then mocks nvidia-smi output and asserts that: * Studio's nvidia-smi probe parses the CSV and reports the GPU. * After PR #5322 the install_dir/build/bin/Release/ tree contains all three cudart bundle DLLs alongside llama-server.exe. * After PR #5324 the PATH built by start_llama_server's win32 branch lists pip nvidia + torch/lib dirs in addition to the binary_dir. * cudart64_X.dll, cublas64_X.dll, and cublasLt64_X.dll are each reachable from at least one PATH entry, with cudart specifically reachable from BOTH the install dir and a pip nvidia dir (defence in depth). * Bare venvs without pip nvidia wheels still work via #5322's binary_dir drop; pre-#5322 installs still work via #5324's PATH augmentation. * A reconstructed pre-PR scenario (cudart absent from binary_dir and pip dirs not on PATH) leaves cudart unreachable, confirming the test would catch a future regression. Bonus housekeeping in studio/install_llama_prebuilt.py: drop the pointless f-prefix on the literal "llama-" in the windows_cuda_attempts pairing guard (no behaviour change; lint nit flagged in the post-merge review). The mocks model real artifact contents I verified empirically: * pip download nvidia-cuda-runtime --platform win_amd64 produces nvidia/cu13/bin/x86_64/cudart64_13.dll. * unzip on the b9103 cudart-llama-bin-win-cuda-13.1-x64.zip produces exactly cudart64_13.dll + cublas64_13.dll + cublasLt64_13.dll, no executables. * objdump -p on the b9103 ggml-cuda.dll shows a static PE import on cublas64_13.dll (the root cause of #5106 when cublas64_13.dll is unreachable). Refs #5106 #5322 #5324
for more information, see https://pre-commit.ci
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5d14f36613
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| "__exit__": lambda self, *a: None, | ||
| }, | ||
| ) | ||
| sys.modules.setdefault("httpx", _httpx_stub) |
There was a problem hiding this comment.
Keep the httpx stub from leaking across tests
When this new test file is collected before the rest of studio/backend/tests (its test_5106... name makes that the normal alphabetical order), this setdefault installs the minimal fake httpx module into global sys.modules before any test has imported the real library. Later tests and backend modules then receive this fake instead of real httpx, but the fake lacks APIs such as Request, RequestError, AsyncClient, and Limits, so suites that include passthrough/inference tests can fail or exercise the wrong module. Please mirror the existing guarded pattern that first tries to import real httpx, or scope/remove the stub after importing LlamaCppBackend.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Code Review
This pull request introduces a comprehensive end-to-end regression test for Windows GPU detection issues, specifically addressing cases where CUDA DLLs were not correctly resolved. It also includes a minor cleanup in the installation script to remove an unnecessary f-string. The reviewer pointed out that the test duplicates PATH-building logic from the production code, which creates a maintenance risk; they recommend refactoring this logic into a shared helper method within the LlamaCppBackend class to ensure consistency.
| def _build_path_dirs_like_start_llama_server( | ||
| binary_dir: Path, prefix: Path, cuda_path: str = "" | ||
| ) -> list[str]: | ||
| """Faithful reproduction of the win32 branch in | ||
| LlamaCppBackend.start_llama_server. Returns the ordered list of | ||
| PATH entries we prepend to the inherited env. Production code: | ||
| studio/backend/core/inference/llama_cpp.py:2340-2363. | ||
| """ | ||
| pip_dirs = LlamaCppBackend._windows_pip_nvidia_dll_dirs(str(prefix)) | ||
| path_dirs = [str(binary_dir)] | ||
| path_dirs.extend(pip_dirs) | ||
| if cuda_path: | ||
| cuda_bin = os.path.join(cuda_path, "bin") | ||
| if os.path.isdir(cuda_bin): | ||
| path_dirs.append(cuda_bin) | ||
| cuda_bin_x64 = os.path.join(cuda_path, "bin", "x64") | ||
| if os.path.isdir(cuda_bin_x64): | ||
| path_dirs.append(cuda_bin_x64) | ||
| return path_dirs |
There was a problem hiding this comment.
This function is a faithful reproduction of logic in LlamaCppBackend.start_llama_server, which introduces code duplication. This creates a maintenance risk, as changes in the production code might not be reflected here, leading to the test becoming outdated. To improve maintainability and adhere to the DRY principle, consider refactoring the PATH-building logic from LlamaCppBackend.start_llama_server into its own helper method within the LlamaCppBackend class. This new method could then be called from both start_llama_server and this test file, ensuring consistency and eliminating the duplicated code.
References
- Centralize recurring or complex logical checks (like determining if a host is external) into a single helper function and reuse it across the codebase to ensure consistency and simplify maintenance.
This file's name sorts before every other file in studio/backend/tests/
(starts with the digit '5'), so pytest collects it first. The previous
``sys.modules.setdefault("httpx", _httpx_stub)`` ran before any other
test imported real httpx, which meant the stub permanently shadowed
the real module for the rest of the collection. Tests that did
``from httpx import HTTPError, Response`` (test_anthropic_messages,
test_browse_folders_route, test_training_*, etc) then failed at
collection with ``ImportError: cannot import name 'HTTPError'``
because the stub did not define those names. The existing
test_llama_cpp_windows_nvidia_path.py did not trigger the same issue
because it sorts after test_a* / test_b* / etc, by which point the
real httpx has already been imported and setdefault is a no-op.
Switch the stub installation to ``importlib.util.find_spec(name) is
None`` so we only fall back to the stub when the real module truly is
not installed. Backend CI installs httpx, structlog, and the
studio/backend/loggers package is reachable via the sys.path
augmentation a few lines above, so on CI all three find_spec calls
succeed and no stubs are installed at all.
Also add HTTPError and Response to the stub module for the offline
case, so anyone running this test outside CI with httpx absent still
gets a stub that satisfies the broader test suite's imports.
Refs #5106
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7437e73508
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| def _maybe_stub(name: str, builder): | ||
| if _importlib_util.find_spec(name) is None: | ||
| sys.modules[name] = builder() |
There was a problem hiding this comment.
Stub local loggers before importing llama_cpp
In lightweight test runs that do not install FastAPI, this guard still lets the real studio/backend/loggers package load because _BACKEND_DIR is already on sys.path, so find_spec("loggers") succeeds even though loggers/handlers.py immediately imports fastapi. The new test then fails during collection before any case runs; neighboring llama_cpp tests avoid this by stubbing the local loggers module unconditionally while only being careful not to shadow real third-party modules like httpx.
Useful? React with 👍 / 👎.
| pip_dirs = LlamaCppBackend._windows_pip_nvidia_dll_dirs(str(prefix)) | ||
| path_dirs = [str(binary_dir)] | ||
| path_dirs.extend(pip_dirs) |
There was a problem hiding this comment.
Exercise the real PATH assembly instead
Because this helper hand-copies the start_llama_server Windows PATH logic, the regression test will still pass if the production win32 branch later drops or reorders self._windows_pip_nvidia_dll_dirs(sys.prefix): the assertions only validate this duplicate implementation plus the resolver. For the #5324 regression this PR is meant to lock down, the test needs to observe the actual environment passed by start_llama_server (for example by mocking subprocess.Popen) rather than rebuilding path_dirs here.
Useful? React with 👍 / 👎.
|
CI green: 33 success, 0 failures, 0 pending on |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8c6de7e0ab
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| with _mock_nvidia_smi_run(fake_csv): | ||
| gpus = LlamaCppBackend._get_gpu_free_memory() |
There was a problem hiding this comment.
Pin CUDA_VISIBLE_DEVICES for the mocked nvidia-smi probe
In test environments that inherit CUDA_VISIBLE_DEVICES (for example CUDA_VISIBLE_DEVICES=1 on a shared GPU runner, or an empty mask), _get_gpu_free_memory() filters the mocked nvidia-smi rows before returning them, so this test can return [] or fall through to the torch fallback instead of [(0, 22805)]. Please clear the variable or set it to 0 for this probe so the synthetic CSV is independent of the parent environment.
Useful? React with 👍 / 👎.
…ssion test Follow-up to PR #5376's review feedback. Three real findings from the bot reviewers, plus one stale one. 1. (codex P2 line 201, gemini medium line 209) The regression test's _build_path_dirs_like_start_llama_server hand-copied the win32 branch of LlamaCppBackend.start_llama_server, so a future drop or reorder of _windows_pip_nvidia_dll_dirs(sys.prefix) in production would have passed the test silently. Extract a new staticmethod LlamaCppBackend._build_windows_path_dirs (binary_dir, prefix, cuda_path). Production start_llama_server now calls this helper. The test's wrapper is reduced to a one-line delegate that forwards to the staticmethod, so the regression asserts against the exact production logic instead of a parallel copy of it. 2. (codex P2 line 245) test_nvidia_smi_probe_reports_synthetic_gpu did not clear CUDA_VISIBLE_DEVICES. On a shared GPU runner with the variable set in the parent shell, _get_gpu_free_memory() filters the mocked CSV and returns [] or falls through to the torch fallback. Cleared CUDA_VISIBLE_DEVICES and NVIDIA_VISIBLE_DEVICES via monkeypatch.delenv(..., raising=False). 3. (codex P2 line 66) _maybe_stub gated on importlib.util.find_spec ("loggers"), which returns a spec because studio/backend/loggers/ is on sys.path. But the actual import chain loads loggers/handlers.py which does `from fastapi import Request, Response` at module load. In a lightweight env without fastapi installed, the stub never lands and `from core.inference.llama_cpp import LlamaCppBackend` raises during collection. Switched _maybe_stub to a real import attempt under try / except ImportError so the stub falls into place when the package is discoverable but not importable. CI has fastapi so this is purely a developer- machine ergonomics fix. The fourth comment (codex P1 line 85 "Keep the httpx stub from leaking across tests") was already addressed by 7437e73, which replaced the unconditional sys.modules.setdefault with the find_spec-gated _maybe_stub. No code change needed. Production behaviour is unchanged: _build_windows_path_dirs returns exactly the same ordering start_llama_server used inline ([binary_dir, *pip_dirs, cuda_bin?, cuda_bin_x64?]). Verification (run inside studio/backend): pytest tests/test_5106_windows_gpu_detection_mock.py -v -> 10 passed pytest tests/test_llama_cpp_*.py tests/test_llama_server_args.py tests/test_5106_windows_gpu_detection_mock.py -q -> 171 passed CUDA_VISIBLE_DEVICES=1 pytest tests/test_5106_windows_gpu_detection_mock.py::TestWindowsGpuDetectionAfter5106Fix::test_nvidia_smi_probe_reports_synthetic_gpu -> 1 passed
for more information, see https://pre-commit.ci
|
Pushed 1. codex P2 line 201 + gemini medium line 209 (same concern, dedup'd). The local
Same observable PATH ordering as before ( 2. codex P2 line 245.
3. codex P2 line 66.
4. codex P1 line 85 (stale). Already addressed by Verification, run inside |
…ai#5423) PR unslothai#5414's drift detectors and the corresponding import_fixes helpers were written against transformers 4.x. The Repo tests (CPU) step (which installs transformers>=4.51,<5.5 and currently resolves to 5.4.0) surfaces three real predicate gaps: * test_pretrained_model_enable_input_require_grads_uses_old_pattern fires DRIFT DETECTED whenever the source contains "for module in self.modules()". But the unsloth replacement that patch_enable_input_require_grads installs ALSO uses that pattern -- deliberately, just wrapped in try / except NotImplementedError. So the predicate cannot distinguish broken upstream from the working patch. Accept either pre-HF#41993 shape (no self.modules() loop) or the post-patch shape (loop + NotImplementedError handler). * test_transformers_torchcodec_available_flag_is_present asserts the pre-5.x module-level _torchcodec_available flag. transformers 5 replaced it with an lru_cache'd is_torchcodec_available() callable. Accept either symbol. Also update disable_torchcodec_if_broken to actually disable on 5.x: clear the cache and rebind the function to return False. Local verification: * transformers 4.57.6 + trl 0.25.1 + peft 0.19.1 + triton 3.5.1 + vllm 0.15.1+cu130 (the Core HF=4.57.6 cell shape): 18 passed. * transformers 5.8.1 + peft 0.19.1 + torch 2.9 CPU (the Repo tests (CPU) shape, drop trl / vllm / datasets / xformers): 12 passed, 6 skipped on missing optional libs, 0 failed. PR unslothai#5376's Repo tests (CPU) failure was a triple: - triton + enable_input_require_grads: fixed by merging current main (PR unslothai#5421's relaxed triton predicate + conftest 'import unsloth'). - torchcodec: fixed by THIS PR.
for more information, see https://pre-commit.ci
Resolve conflict in studio/backend/core/inference/llama_cpp.py: take main's load_model body wrapped in the serial_load_lock (#5401 / #5161) and the #5347 mmproj guard, layered with this PR's two intended edits: - New @staticmethod _build_windows_path_dirs(binary_dir, prefix, cuda_path) next to _windows_pip_nvidia_dll_dirs, returning the win32 PATH ordering binary_dir, pip-nvidia wheels, CUDA_PATH/bin, CUDA_PATH/bin/x64. - start_llama_server's win32 branch now calls the helper instead of re-implementing the same ordering inline. The PR's own install_llama_prebuilt.py nit (f"llama-" -> "llama-") and the new test file under studio/backend/tests/ auto-merged cleanly. Picks up PR #5423's transformers 5.x drift-detector predicate fixes via the merge, which is what was failing on the three Core CI jobs.
…ents - studio/backend/tests/test_5106_windows_gpu_detection_mock.py -> studio/backend/tests/test_windows_gpu_detection_mock.py The file is the generic regression suite for Windows GPU detection; encoding the issue number in the filename is noise. - Shorten module docstring, helper docstrings, per-test docstrings and inline comments in the renamed test file. No behaviour change, all 10 cases still pass. - Shorten the _build_windows_path_dirs docstring in studio/backend/core/inference/llama_cpp.py and update the test-path reference; trim the win32 call-site comment to one line. Local verification: - pytest studio/backend/tests/test_windows_gpu_detection_mock.py -- 10 passed. - pytest studio/backend/tests/test_llama_cpp_windows_nvidia_path.py studio/backend/tests/test_llama_server_args.py studio/backend/tests/test_windows_gpu_detection_mock.py -- 110 passed.
The probe loop in LlamaCppBackend._wait_for_health only caught
ConnectError and TimeoutException. On Windows, when llama-server.exe
accepts the TCP probe and then dies before sending HTTP headers, the
peer process RST closes the socket. httpx maps this to ReadError
("WinError 10054 -- An existing connection was forcibly closed by the
remote host"), which fell through the except clause and bubbled out of
_wait_for_health, the routes/inference.py load_model handler, and back
to /api/inference/load as an opaque 500.
The crash diagnostic Studio actually wants to surface lives on the
self._process.poll() branch at the top of the loop body: "llama-server
exited with code X. Output: ...". We never reached that branch on the
WinError 10054 path because the very first probe blew up.
Expand the except to also swallow ReadError and RemoteProtocolError so
the next 0.5-second iteration runs the poll() branch. Outcomes:
* Process really died: structured exit-code + last-stdout log line.
* Single transient probe blip: silently retried; load succeeds.
Adds studio/backend/tests/test_llama_cpp_wait_for_health.py with five
cases covering happy-path 200, transient ReadError + dead process,
RemoteProtocolError + dead process, ConnectError cycling until success,
and dead process before the first probe. The new cases would have
failed against the old except clause -- ReadError / RemoteProtocolError
would have propagated instead of returning False.
Found while triaging the Windows Studio GGUF CI flake on this PR's
5a6ddc3 push: llama-server.exe (b9203 prebuilt) crashed within 2.2 s of
launch on the GPU-less runner, and Studio reported "WinError 10054"
instead of an upstream-tag-attributable exit-code line.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 56537b6978
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| sys.modules.setdefault("loggers", _loggers_stub) | ||
| sys.modules.setdefault("structlog", _types.ModuleType("structlog")) | ||
|
|
||
| import httpx # noqa: E402 |
There was a problem hiding this comment.
Guard the httpx import in the health tests
When this new test file is run by itself in a lightweight backend test environment without httpx installed, collection fails here before the fallback code below can add the exception attributes; I reproduced this with python -m pytest studio/backend/tests/test_llama_cpp_wait_for_health.py ... in the current checkout, which raises ModuleNotFoundError: No module named 'httpx'. Neighboring llama_cpp tests stub httpx before importing the production module for this scenario, so this file should use the same guarded/stubbed import path instead of requiring the full dependency just to collect.
Useful? React with 👍 / 👎.
Resolve conflict in studio/backend/core/inference/llama_cpp.py: _wait_for_health's except clause. Main's #5505 landed essentially the same fix this branch shipped in 56537b6 -- catching httpx.ReadError and RemoteProtocolError to keep a transient TCP RST from masking a real subprocess crash -- and added httpx.WriteError on top. Took main's version (strict superset) and dropped this branch's duplicate. Extend tests/test_llama_cpp_wait_for_health.py: - Add WriteError to the stub-completion list so the file imports cleanly against either real httpx or a sibling test's stub. - Add test_write_error_also_swallowed alongside the existing ReadError and RemoteProtocolError cases. All other files in this merge auto-resolved cleanly (install_llama_prebuilt.py, workflows, etc.). Local verification: - pytest test_llama_cpp_wait_for_health.py + windows_nvidia_path + windows_gpu_detection_mock + llama_server_args -- 116 passed. - pytest test_import_fixes_drift.py -- 18 passed. - pytest test_offline_gguf_cache_fallback.py (main's #5505 addition) -- 44 passed.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 06fb256e0c
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| sys.path.insert(0, _BACKEND_DIR) | ||
|
|
||
| # Match the stubbing pattern in sibling tests so the module imports in | ||
| # a lightweight env without fastapi. |
There was a problem hiding this comment.
Avoid leaking the fake loggers module
When pytest collects this file before tests that import loggers.handlers (for example studio/backend/tests/test_log_filter_no_truncation.py, which sorts after test_llama_cpp_wait_for_health.py), this unconditional setdefault leaves a plain ModuleType in sys.modules instead of the real loggers package. The later import then fails with 'loggers' is not a package even in environments where the real package is available, because this file never tries to import it before stubbing.
Useful? React with 👍 / 👎.
Summary
End-to-end regression test that locks in the Windows GPU detection fix shipped by #5322 + #5324. Runs as a normal pytest in
studio/backend/tests/(Ubuntu pytest matrix instudio-backend-ci.yml); also touchesstudio/install_llama_prebuilt.pyso the Windows update / inference smoke CI runs on this PR as a post-merge sanity check.Why we need a synthetic test even after the two fixes landed: the failure mode is on Windows hosts without a system CUDA toolkit. CI runners do not have GPUs and
nvidia-smiis absent. This test mocksnvidia-smioutput and packs a synthetic Windows layout that mirrors real artefacts, so any future regression in the win32 PATH builder or the paired cudart download is caught without needing a real GPU.What is asserted, against the realistic synthetic layout:
nvidia-smiprobe parses the mocked CSV and reports the GPU.install_dir/build/bin/Release/contains all three DLLs the upstream cudart bundle ships (cudart64_X.dll,cublas64_X.dll,cublasLt64_X.dll).start_llama_server's win32 branch builds lists the pip nvidia + torch/lib directories in front of the inherited PATH.binary_dir(Studio: download paired cudart bundle on Windows CUDA installs #5322's contribution) AND a pip nvidia dir (Studio: add torch's pip nvidia DLL dirs to PATH on Windows #5324's contribution) — defence in depth.The mocks reflect real artefact contents verified empirically:
pip download nvidia-cuda-runtime --platform win_amd64producesnvidia/cu13/bin/x86_64/cudart64_13.dll.unzipon the upstreamb9103/cudart-llama-bin-win-cuda-13.1-x64.zipproduces exactlycudart64_13.dll+cublas64_13.dll+cublasLt64_13.dlland nothing else.objdump -pon the upstreamb9103/ggml-cuda.dllshows a static PE import oncublas64_13.dll, which is the root cause of [Bug] Not Unsloth Studio detects my GPU But not use but only uses CPU/RAM #5106 whencublas64_13.dllis unreachable.Housekeeping: one lint nit dropped from #5322's
windows_cuda_attempts(f"llama-"->"llama-"; no behaviour change).Test plan
pytest studio/backend/tests/test_5106_windows_gpu_detection_mock.py-- 10 new cases pass.pytest studio/backend/tests/test_llama_cpp_*.py studio/backend/tests/test_llama_server_args.py tests/studio/install/{test_install_llama_prebuilt_logic,test_selection_logic,test_pr4562_bugfixes,test_llama_pr_force_and_source}.py-- 424 passed, no regressions.studio/install_llama_prebuilt.pytouch and the new file understudio/backend/**.Refs #5106 #5322 #5324