Studio: add torch's pip nvidia DLL dirs to PATH on Windows #5324

Merged
danielhanchen merged 6 commits into main from dh/fix-5106-windows-nvidia-pip-path on May 11, 2026

@danielhanchen

Member

Summary

Most direct fix for #5106 ("GPU detected but model loaded entirely on RAM/CPU" on Windows). This is the canonical fix per Studio's install design: install_python_stack already bundles torch with matching CUDA wheels (nvidia-cuda-runtime-cu13, nvidia-cublas-cu13, etc.) which ship cudart64_X.dll, cublas64_X.dll, and cublasLt64_X.dll under <prefix>/Lib/site-packages/nvidia/<pkg>/(bin|Library/bin)/. The Linux runtime env block in start_llama_server already pulls the equivalent nvidia/cu*/lib paths into LD_LIBRARY_PATH, but the Windows block did not. Without this, the prebuilt llama-server.exe could not resolve cudart64_X.dll at runtime unless the user had a matching system CUDA toolkit on PATH, which is exactly the workaround Roland keeps recommending in #5106.

What changed

  • New LlamaCppBackend._windows_pip_nvidia_dll_dirs(prefix) resolver globs <prefix>/Lib/site-packages/nvidia/<pkg>/bin and <prefix>/Lib/site-packages/nvidia/<pkg>/Library/bin. Both layouts are seen in the wild across cuda_runtime / cublas / cudnn / nvjitlink wheels.
  • The Windows env block in start_llama_server extends path_dirs with the resolver output before falling back to CUDA_PATH/bin. Pip-installed wheels are the canonical source (mirrors the Linux LD_LIBRARY_PATH ordering); system CUDA toolkit remains a valid fallback.
  • Independent of the upstream cudart asset naming, so it stays robust if cudart-llama-bin-win-cuda-X.Y-x64.zip ever gets renamed.

How this relates to #5322 and #5323

  • This PR (dh/fix-5106-windows-nvidia-pip-path): aligns Windows with Linux's existing pattern. This is the canonical mechanism per the install design.
  • Studio: download paired cudart bundle on Windows CUDA installs #5322 (dh/fix-5106-windows-cudart-pair): belt-and-suspenders -- downloads upstream's cudart bundle next to llama-server.exe. Useful when torch's nvidia wheels are absent (CPU-only torch, unsloth run standalone, custom torch installs). Optional given this PR.
  • Studio: pin GPU at 95% headroom and warn on silent CPU fallback #5323 (dh/fix-5106-gpu-pin-threshold-and-cpu-fallback-warn): orthogonal runtime fix for the close-fit case + CPU-fallback diagnostic. Still needed even when CUDA loads fine.

If you want a single canonical fix, this PR is sufficient for the Windows symptom. #5322 can be closed in favor of this one. #5323 is independent.

Test plan

  • python -m pytest studio/backend/tests/test_llama_cpp_windows_nvidia_path.py -- 7 new cases
  • python -m pytest studio/backend/tests/test_llama_cpp_*.py studio/backend/tests/test_llama_server_args.py -- 110 passed, no regressions
  • Validate on a Windows host without a system CUDA toolkit but with torch's nvidia wheels installed: nvidia-smi shows VRAM usage during inference (no winget install Nvidia.CUDA needed)

New tests in studio/backend/tests/test_llama_cpp_windows_nvidia_path.py:

  • test_returns_empty_when_no_nvidia_wheels
  • test_picks_up_bin_layout
  • test_picks_up_library_bin_layout
  • test_mixed_layouts_all_resolved
  • test_does_not_walk_outside_nvidia
  • test_skips_non_directories
  • test_missing_prefix_does_not_raise

Refs #5106

Studio's install_python_stack bundles torch with matching CUDA
wheels (nvidia-cuda-runtime-cu13, nvidia-cublas-cu13, etc.) which
ship cudart64_X.dll, cublas64_X.dll, and cublasLt64_X.dll under
the prefix's Lib/site-packages/nvidia/<pkg>/(bin|Library/bin)/
tree. The Linux runtime env block in start_llama_server already
pulls the equivalent nvidia/cu*/lib paths into LD_LIBRARY_PATH,
but the Windows block did not do this, so the prebuilt
llama-server.exe could not resolve cudart64_X.dll at runtime
unless the user had a matching system CUDA toolkit on PATH. That
is the root cause of the Windows reports in
#5106 ("GPU detected but model loaded entirely
on RAM/CPU"), and matches Roland's repeated workaround in that
issue: install matching CUDA toolkit version.

Brings the Windows env block in line with the Linux pattern:

* New LlamaCppBackend._windows_pip_nvidia_dll_dirs resolver
  globs <prefix>/Lib/site-packages/nvidia/<pkg>/bin and
  <prefix>/Lib/site-packages/nvidia/<pkg>/Library/bin. Both
  layouts are seen in the wild across cuda_runtime / cublas /
  cudnn / nvjitlink wheels.

* The Windows env block now extends path_dirs with the
  resolver's output before falling back to CUDA_PATH/bin, so
  pip-installed wheels are the canonical source (mirroring the
  Linux LD_LIBRARY_PATH ordering). System CUDA toolkit remains a
  valid fallback.

Tests: 7 new cases in
studio/backend/tests/test_llama_cpp_windows_nvidia_path.py:

* empty resolver when no nvidia wheels installed
* nvidia/<pkg>/bin layout resolved
* nvidia/<pkg>/Library/bin layout resolved
* mixed bin and Library/bin layouts both resolved
* unrelated site-packages contents not walked
* non-directory entries skipped
* missing prefix does not raise

110 backend tests pass. No regressions.

Refs #5106

@gemini-code-assist (bot) left a comment

Contributor


Code Review

This pull request introduces a mechanism to locate and include CUDA DLL directories from pip-installed NVIDIA wheels on Windows, ensuring llama-server.exe can function without a system-wide CUDA installation. It adds a new static method _windows_pip_nvidia_dll_dirs to the LlamaCppBackend class and updates the load_model function to incorporate these paths. Additionally, a comprehensive test suite has been added to verify the directory resolution logic. The review feedback suggests refactoring the new method to use pathlib.Path instead of os.path to align with modern path handling practices in the codebase.

Comment on lines +967 to +978
        import glob as _glob

        nvidia_root = os.path.join(prefix, "Lib", "site-packages", "nvidia")
        out: list[str] = []
        for pattern in (
            os.path.join(nvidia_root, "*", "bin"),
            os.path.join(nvidia_root, "*", "Library", "bin"),
        ):
            for nv_dir in _glob.glob(pattern):
                if os.path.isdir(nv_dir):
                    out.append(nv_dir)
        return out

Contributor


medium

Refactoring to pathlib.Path is encouraged to align with the codebase's move toward modern path handling. Please ensure this change is consistent with the module's existing import patterns, particularly regarding the handling of local imports, as per the repository's file-level conventions.

        nvidia_root = Path(prefix) / "Lib" / "site-packages" / "nvidia"
        if not nvidia_root.is_dir():
            return []

        return [
            str(p)
            for pattern in ("*/bin", "*/Library/bin")
            for p in nvidia_root.glob(pattern)
            if p.is_dir()
        ]
References
  1. Follow existing file-level conventions for imports, such as keeping certain imports inline if that is the established pattern in the file.

@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: afb1f7cf59


Comment on lines +969 to +973
        nvidia_root = os.path.join(prefix, "Lib", "site-packages", "nvidia")
        out: list[str] = []
        for pattern in (
            os.path.join(nvidia_root, "*", "bin"),
            os.path.join(nvidia_root, "*", "Library", "bin"),


P2: Include torch/lib in Windows DLL search

On Windows installs where PyTorch's CUDA wheel bundles the runtime DLLs under Lib/site-packages/torch/lib rather than separate nvidia/*/bin wheels, this resolver returns no pip-provided DLL directories, so llama-server.exe still falls back to requiring a system CUDA toolkit. The installer already treats torch/lib as a Python runtime DLL source (install_llama_prebuilt.py::python_runtime_dirs), and setup.ps1 installs PyTorch CUDA wheels specifically because they bundle CUDA, so the runtime PATH construction should include that same location when the nvidia namespace dirs are absent.


PyTorch's Windows CUDA wheels frequently bundle cudart64_X.dll and
cublas64_X.dll directly under Lib/site-packages/torch/lib/ instead of
shipping separate nvidia-cuda-runtime-cuXX / nvidia-cublas-cuXX wheels.
On those installs _windows_pip_nvidia_dll_dirs previously returned
nothing useful, and llama-server.exe fell back to needing a system CUDA
toolkit on PATH -- the original #5106 failure mode.

The install-side equivalent python_runtime_dirs in
install_llama_prebuilt.py already treats torch/lib as a Python runtime
DLL source for the same reason. Bring the runtime resolver into parity
so torch-bundled-CUDA installs find their cudart at llama-server start.

Updates the existing test that codified the bug (asserted torch/lib was
excluded), and adds three new cases: pickup, combined-with-nvidia, and
the must-be-a-directory guard.
@danielhanchen

Member Author

Pushed c1c8a074 to address the torch/lib gap (matches the codex bot P2 comment at llama_cpp.py:973).

The mechanism in the PR was correct -- _windows_pip_nvidia_dll_dirs covers both nvidia/PKG/bin and nvidia/PKG/Library/bin, the Windows env block prepends them before CUDA_PATH/bin, llama-server is launched via subprocess.Popen(env=env, ...) so the child actually sees the modified PATH, and binary_dir is first in path_dirs so anything dropped there by #5322 wins over pip nvidia version skew.

The hole was that PyTorch's Windows CUDA wheels frequently bundle cudart64_X.dll / cublas64_X.dll directly under Lib/site-packages/torch/lib/ instead of shipping separate nvidia-cuda-runtime-cuXX wheels. On those installs the resolver returned [] and llama-server.exe was back to relying on a system CUDA toolkit -- the original #5106 failure mode. The install-side equivalent python_runtime_dirs in install_llama_prebuilt.py:4318-4320 already scans nvidia/*/lib, nvidia/*/bin, and torch/lib for the same reason, so the asymmetry was unintentional.

What the commit changes:

  1. _windows_pip_nvidia_dll_dirs now also appends PREFIX/Lib/site-packages/torch/lib when present. Same os.path.isdir guard as the nvidia entries.
  2. Docstring updated to list all three patterns and tie torch/lib back to the installer's python_runtime_dirs.
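
A minimal sketch of the torch/lib pickup from item 1, assuming the resolver simply appends the directory when it exists. The function name `torch_lib_dirs` and the `demo` helper are hypothetical; only the path layout and the os.path.isdir guard come from the description above.

```python
import os
import tempfile


def torch_lib_dirs(prefix: str) -> list[str]:
    """Return [<prefix>/Lib/site-packages/torch/lib] if it is a directory, else []."""
    candidate = os.path.join(prefix, "Lib", "site-packages", "torch", "lib")
    # isdir() is False for a missing path and for a regular file named "lib",
    # so a broken install never contributes a non-directory PATH entry.
    return [candidate] if os.path.isdir(candidate) else []


def demo() -> tuple[list[str], list[str]]:
    """Build one healthy and one broken fake prefix and resolve both."""
    with tempfile.TemporaryDirectory() as good, tempfile.TemporaryDirectory() as broken:
        os.makedirs(os.path.join(good, "Lib", "site-packages", "torch", "lib"))
        # Broken install: torch/lib exists, but as a file rather than a directory.
        torch_dir = os.path.join(broken, "Lib", "site-packages", "torch")
        os.makedirs(torch_dir)
        with open(os.path.join(torch_dir, "lib"), "w"):
            pass
        return torch_lib_dirs(good), torch_lib_dirs(broken)
```

The broken-install case corresponds to what test_torch_lib_must_be_a_directory is described as guarding against.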

Tests in studio/backend/tests/test_llama_cpp_windows_nvidia_path.py:

  • Renamed test_does_not_walk_outside_nvidia to test_does_not_walk_outside_known_paths and replaced the torch/lib exclusion assertion (which actively codified the bug) with a check that unrelated packages like numpy / scipy are still ignored.
  • New test_picks_up_torch_lib -- positive coverage for the torch case.
  • New test_torch_lib_combined_with_nvidia_wheels -- both pickup paths together.
  • New test_torch_lib_must_be_a_directory -- guard against a broken install where torch/lib exists as a file.

Full suite stays green: 10 cases here, 146 across test_llama_cpp_*.py + test_llama_server_args.py.

Minor open items (not addressed):

  • The gemini-code-assist pathlib.Path refactor at :978 is purely cosmetic. Worth doing as a follow-up if you want the file in the new style.
  • The resolver and its install-side cousin python_runtime_dirs are now near-duplicates with slightly different scopes. Consolidating them would be a nice future cleanup; out of scope for #5106.
  • No test covers the actual start_llama_server PATH assembly. The new tests only exercise the helper. Worth adding a test that asserts env["PATH"] is built as binary_dir;PIP_DIRS;CUDA_PATH\bin;EXISTING_PATH so a future reorder is caught.

danielhanchen and others added 2 commits May 11, 2026 12:02
Three follow-ups from a 12-reviewer batch over c1c8a07 (PR #5324):

1. The current nvidia-cuda-runtime (unsuffixed) 13.2.75 and
   nvidia-cublas 13.4.0.1 Windows wheels on PyPI ship their DLLs
   under nvidia/cu13/bin/x86_64/ (cudart64_13.dll etc), not under
   nvidia/PKG/bin/. The previous resolver matched only one
   directory level past nvidia/PKG/ and silently missed the
   actual cu13 DLL location, leaving CUDA 13 users on the same
   failure mode reported in #5106. Verified against:
       pip download nvidia-cuda-runtime --platform win_amd64
   which produces nvidia/cu13/bin/x86_64/cudart64_13.dll.

2. glob.glob over sys.prefix interprets [ and ] as a
   character class. Valid Windows usernames / install paths can
   contain those characters (for example C:\Users\alice[work]\studio),
   so the previous resolver silently returned an empty list for such
   prefixes even when DLL dirs were present.

3. The resolver only ever returned nvidia/PKG/bin -- if both
   bin and bin/x86_64 exist (current wheels do), Windows
   DLL search should land on the arch-specific subdir first so the
   explicit cudart64_X.dll location wins.
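
The bracket problem in item 2 can be reproduced with the standard library alone. The sketch below is illustrative: `find_bin_dirs` and `demo` are hypothetical helpers, and `glob.escape` is shown only for contrast. The commit itself avoids glob entirely rather than escaping the prefix.

```python
import glob
import os
import tempfile


def find_bin_dirs(prefix: str, escape: bool) -> list[str]:
    """Glob for <prefix>/Lib/site-packages/nvidia/*/bin, optionally escaping the prefix."""
    root = glob.escape(prefix) if escape else prefix
    return glob.glob(os.path.join(root, "Lib", "site-packages", "nvidia", "*", "bin"))


def demo() -> tuple[list[str], list[str]]:
    """Create a prefix containing brackets and glob it both ways."""
    with tempfile.TemporaryDirectory() as tmp:
        prefix = os.path.join(tmp, "alice[work]", "studio")
        os.makedirs(os.path.join(prefix, "Lib", "site-packages", "nvidia", "cudnn", "bin"))
        # Unescaped, "[work]" is read as "one character out of w, o, r, k",
        # so the literal directory name "alice[work]" never matches.
        naive = find_bin_dirs(prefix, escape=False)
        escaped = find_bin_dirs(prefix, escape=True)
        return naive, escaped
```

With the unescaped prefix the DLL directory is missed even though it exists on disk, which is the silent-empty-result behaviour described in item 2.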

Rewritten as a pathlib.Path.iterdir walk to fix all three:
no glob escaping needed, arch-specific subdirs added explicitly,
and ordering puts bin/x86_64 before bin. Conda-style
Library/bin/x86_64 and Library/bin/x64 are also covered for
parity. A seen set dedupes when wheels happen to expose the
same directory through multiple layouts.
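
A rough sketch of what such an iterdir-based walk could look like, under stated assumptions: the function name, the `_SUBDIR_LAYOUTS` tuple, and the sorted package order are illustrative choices, not the merged implementation. The torch/lib entry is left out to keep the sketch focused on the nvidia tree.

```python
from pathlib import Path

# Assumed candidate layouts under nvidia/<pkg>/, arch-specific subdirs first.
_SUBDIR_LAYOUTS = (
    ("bin", "x86_64"),
    ("bin", "x64"),
    ("bin",),
    ("Library", "bin", "x86_64"),
    ("Library", "bin", "x64"),
    ("Library", "bin"),
)


def windows_pip_nvidia_dll_dirs(prefix: str) -> list[str]:
    """Return existing DLL directories under <prefix>/Lib/site-packages/nvidia."""
    nvidia_root = Path(prefix) / "Lib" / "site-packages" / "nvidia"
    if not nvidia_root.is_dir():
        return []

    out: list[str] = []
    seen: set[str] = set()
    # iterdir() lists real directory entries, so brackets in the prefix
    # are plain characters and nothing needs escaping.
    for pkg in sorted(nvidia_root.iterdir()):
        if not pkg.is_dir():
            continue
        for parts in _SUBDIR_LAYOUTS:
            candidate = pkg.joinpath(*parts)
            key = str(candidate)
            # The seen set is a safety net against listing a directory twice.
            if candidate.is_dir() and key not in seen:
                seen.add(key)
                out.append(key)
    return out
```

Because bin/x86_64 precedes bin in the layout tuple, a wheel that ships both yields the arch-specific directory first, which matches the ordering goal in item 3.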

New tests:
 - test_picks_up_cu13_bin_x86_64_layout (the actual real-world cu13 case)
 - test_picks_up_bin_x64_layout
 - test_mixed_cu12_and_cu13_layouts
 - test_glob_meta_in_prefix_is_safe (bracket repro)
 - test_arch_subdir_listed_before_parent_bin (ordering)

Verified empirically against PyPI:
       nvidia-cuda-runtime 13.2.75 -> nvidia/cu13/bin/x86_64/cudart64_13.dll
       nvidia-cublas       13.4.0.1 -> nvidia/cu13/bin/x86_64/cublas64_13.dll
                                       nvidia/cu13/bin/x86_64/cublasLt64_13.dll
       nvidia-cudnn-cu13   9.22.0.52 -> nvidia/cudnn/bin/cudnn64_9.dll (already covered)

Refs #5106
@danielhanchen

Member Author

Pushed ee8cd941 to address the three real findings from a 12-reviewer batch over c1c8a074:

  1. Resolver misses the current CUDA 13 Windows wheel layout (3/12 reviewers). Verified empirically by pip download on PyPI:

     • nvidia-cuda-runtime (unsuffixed) 13.2.75 ships nvidia/cu13/bin/x86_64/cudart64_13.dll.
     • nvidia-cublas 13.4.0.1 ships nvidia/cu13/bin/x86_64/cublas64_13.dll and cublasLt64_13.dll.

     The previous glob set (nvidia/*/bin, nvidia/*/Library/bin) silently missed these, so CUDA 13 users would have stayed on the original #5106 failure mode. The resolver now walks bin/x86_64, bin/x64, and bin, plus the Library/bin variants. Note: the older modular nvidia-cuda-runtime-cu12 12.9.79 still ships under nvidia/cuda_runtime/bin/cudart64_12.dll (verified) and stays covered.

  2. Glob metacharacters in sys.prefix (2/12). glob.glob treats [ and ] as character-class delimiters, so a prefix under a Windows username like alice[work] would silently produce an empty result. Rewritten as a Path.iterdir walk, so no escaping is needed.

  3. Arch-subdir ordering. When both nvidia/<pkg>/bin and nvidia/<pkg>/bin/x86_64 exist (the current cu13 wheels do), the explicit arch subdir is listed first so Windows DLL search lands on the actual cudart64_X.dll location even if the parent bin is empty.

Reviewer-flagged but assessed and deferred:

  • nvidia/*/lib on the backend (P1, 1/12) -- false positive on Windows. I inspected the actual cu12 and cu13 win_amd64 wheels (cuda-runtime, cublas, cudnn) and none ship DLLs under lib/. The Linux convention is lib, Windows is bin. Adding it would be harmless but the scenario doesn't exist in the wild.
  • Installer-side python_runtime_dirs asymmetry (P1, 7/12) -- real and worth fixing, but lives in install_llama_prebuilt.py which is owned by #5322. Mirrored fix pushed there in 8b1288f7.
  • sys.prefix vs sys.path coverage (P2, 1/12) -- only matters for --system-site-packages venvs, which Studio's install_python_stack does not create.

New tests:

  • test_picks_up_cu13_bin_x86_64_layout (the actual cu13 case).
  • test_picks_up_bin_x64_layout.
  • test_mixed_cu12_and_cu13_layouts.
  • test_glob_meta_in_prefix_is_safe (bracket repro).
  • test_arch_subdir_listed_before_parent_bin (ordering guarantee).

Regression: 15 cases in this file pass, 151 across test_llama_cpp_*.py + test_llama_server_args.py. The 35-scenario behavioural harness in temp/pr_simulation/ covers fresh-install, existing-install, PATH assembly, glob-meta safety, and cu13 layout end-to-end.

@danielhanchen danielhanchen merged commit 379f5a5 into main May 11, 2026
30 of 31 checks passed
@danielhanchen danielhanchen deleted the dh/fix-5106-windows-nvidia-pip-path branch May 11, 2026 12:42
danielhanchen added a commit that referenced this pull request May 18, 2026
…tic CI test (#5376)

* tests/studio: end-to-end Windows GPU detection mock test (#5106)

Locks in the combined fix from #5322 + #5324 with a synthetic
Windows scenario that CI runners without GPUs can execute. The
test packs the real PyPI win_amd64 wheel layouts (cu12 modular and
the new unsuffixed cu13 nvidia/cu13/bin/x86_64 layout) plus the
exact filename set of the upstream b9103 cudart-llama-bin-win-cuda
bundles, then mocks nvidia-smi output and asserts that:

 * Studio's nvidia-smi probe parses the CSV and reports the GPU.
 * After PR #5322 the install_dir/build/bin/Release/ tree contains
   all three cudart bundle DLLs alongside llama-server.exe.
 * After PR #5324 the PATH built by start_llama_server's win32
   branch lists pip nvidia + torch/lib dirs in addition to the
   binary_dir.
 * cudart64_X.dll, cublas64_X.dll, and cublasLt64_X.dll are
   each reachable from at least one PATH entry, with cudart
   specifically reachable from BOTH the install dir and a pip
   nvidia dir (defence in depth).
 * Bare venvs without pip nvidia wheels still work via #5322's
   binary_dir drop; pre-#5322 installs still work via #5324's
   PATH augmentation.
 * A reconstructed pre-PR scenario (cudart absent from binary_dir
   and pip dirs not on PATH) leaves cudart unreachable, confirming
   the test would catch a future regression.
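
The "reachable from at least one PATH entry" assertions above can be expressed with a small helper. This is a sketch under assumptions: `dirs_providing` and `unreachable` are hypothetical names, and a plain file-existence check stands in for the Windows loader's DLL search.

```python
from pathlib import Path


def dirs_providing(dll_name: str, path_dirs: list[str]) -> list[str]:
    """Return the entries of path_dirs that contain dll_name, in search order."""
    return [d for d in path_dirs if (Path(d) / dll_name).is_file()]


def unreachable(dll_names: list[str], path_dirs: list[str]) -> list[str]:
    """Return the DLL names that no entry of path_dirs provides."""
    return [name for name in dll_names if not dirs_providing(name, path_dirs)]
```

A defence-in-depth check then amounts to asserting that `dirs_providing("cudart64_13.dll", path_dirs)` has at least two entries, while the reconstructed pre-PR scenario should report cudart in `unreachable(...)`.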

Bonus housekeeping in studio/install_llama_prebuilt.py: drop the
pointless f-prefix on the literal "llama-" in the
windows_cuda_attempts pairing guard (no behaviour change; lint
nit flagged in the post-merge review).

The mocks model real artifact contents I verified empirically:
 * pip download nvidia-cuda-runtime --platform win_amd64
   produces nvidia/cu13/bin/x86_64/cudart64_13.dll.
 * unzip on the b9103 cudart-llama-bin-win-cuda-13.1-x64.zip
   produces exactly cudart64_13.dll + cublas64_13.dll +
   cublasLt64_13.dll, no executables.
 * objdump -p on the b9103 ggml-cuda.dll shows a static PE
   import on cublas64_13.dll (the root cause of #5106 when
   cublas64_13.dll is unreachable).

Refs #5106 #5322 #5324

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* test_5106_windows_gpu_detection_mock: don't shadow real httpx

This file's name sorts before every other file in studio/backend/tests/
(starts with the digit '5'), so pytest collects it first. The previous
``sys.modules.setdefault("httpx", _httpx_stub)`` ran before any other
test imported real httpx, which meant the stub permanently shadowed
the real module for the rest of the collection. Tests that did
``from httpx import HTTPError, Response`` (test_anthropic_messages,
test_browse_folders_route, test_training_*, etc) then failed at
collection with ``ImportError: cannot import name 'HTTPError'``
because the stub did not define those names. The existing
test_llama_cpp_windows_nvidia_path.py did not trigger the same issue
because it sorts after test_a* / test_b* / etc, by which point the
real httpx has already been imported and setdefault is a no-op.

Switch the stub installation to ``importlib.util.find_spec(name) is
None`` so we only fall back to the stub when the real module truly is
not installed. Backend CI installs httpx, structlog, and the
studio/backend/loggers package is reachable via the sys.path
augmentation a few lines above, so on CI all three find_spec calls
succeed and no stubs are installed at all.

Also add HTTPError and Response to the stub module for the offline
case, so anyone running this test outside CI with httpx absent still
gets a stub that satisfies the broader test suite's imports.
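
A minimal sketch of the find_spec gating described above, assuming the stub is a bare module object with a few attributes attached. The function name `maybe_stub` and its keyword-argument interface are illustrative, not the test file's actual helper.

```python
import importlib.util
import sys
import types


def maybe_stub(name: str, **attrs: object) -> bool:
    """Install a stub module only when `name` is not installed. Returns True if stubbed."""
    # Checking sys.modules first avoids calling find_spec on an already
    # imported module, and never overwrites the real module.
    if name in sys.modules or importlib.util.find_spec(name) is not None:
        return False
    stub = types.ModuleType(name)
    for key, value in attrs.items():
        setattr(stub, key, value)
    sys.modules[name] = stub
    return True
```

Unlike an unconditional sys.modules.setdefault, this leaves sys.modules untouched whenever the real package can be found, so collection order no longer decides which module later tests see.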

Refs #5106

* test_5106 + llama_cpp: extract win32 PATH helper and harden the regression test

Follow-up to PR #5376's review feedback. Three real findings from the
bot reviewers, plus one stale one.

1. (codex P2 line 201, gemini medium line 209) The regression test's
   _build_path_dirs_like_start_llama_server hand-copied the win32
   branch of LlamaCppBackend.start_llama_server, so a future drop or
   reorder of _windows_pip_nvidia_dll_dirs(sys.prefix) in production
   would have passed the test silently.

   Extract a new staticmethod LlamaCppBackend._build_windows_path_dirs
   (binary_dir, prefix, cuda_path). Production start_llama_server now
   calls this helper. The test's wrapper is reduced to a one-line
   delegate that forwards to the staticmethod, so the regression
   asserts against the exact production logic instead of a parallel
   copy of it.

2. (codex P2 line 245) test_nvidia_smi_probe_reports_synthetic_gpu did
   not clear CUDA_VISIBLE_DEVICES. On a shared GPU runner with the
   variable set in the parent shell, _get_gpu_free_memory() filters
   the mocked CSV and returns [] or falls through to the torch
   fallback. Cleared CUDA_VISIBLE_DEVICES and NVIDIA_VISIBLE_DEVICES
   via monkeypatch.delenv(..., raising=False).

3. (codex P2 line 66) _maybe_stub gated on importlib.util.find_spec
   ("loggers"), which returns a spec because studio/backend/loggers/
   is on sys.path. But the actual import chain loads
   loggers/handlers.py which does `from fastapi import Request,
   Response` at module load. In a lightweight env without fastapi
   installed, the stub never lands and `from core.inference.llama_cpp
   import LlamaCppBackend` raises during collection. Switched
   _maybe_stub to a real import attempt under try / except ImportError
   so the stub falls into place when the package is discoverable but
   not importable. CI has fastapi so this is purely a developer-
   machine ergonomics fix.
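
The difference between "discoverable" and "importable" in item 3 can be sketched as follows. The name `maybe_stub` and its signature are assumptions for illustration; the point is only that the gate is a real import attempt rather than a find_spec lookup.

```python
import importlib
import sys
import types


def maybe_stub(name: str, **attrs: object) -> bool:
    """Stub `name` only if actually importing it fails. Returns True when stubbed."""
    try:
        importlib.import_module(name)
        return False
    except ImportError:
        # Covers both "not installed" and "found on sys.path, but one of
        # its own imports is missing" (ModuleNotFoundError is an ImportError).
        stub = types.ModuleType(name)
        for key, value in attrs.items():
            setattr(stub, key, value)
        sys.modules[name] = stub
        return True
```

A package whose __init__ imports a missing dependency has a valid spec, so a find_spec gate would skip the stub, whereas this version installs it.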

The fourth comment (codex P1 line 85 "Keep the httpx stub from leaking
across tests") was already addressed by 7437e73, which replaced the
unconditional sys.modules.setdefault with the find_spec-gated
_maybe_stub. No code change needed.

Production behaviour is unchanged: _build_windows_path_dirs returns
exactly the same ordering start_llama_server used inline
([binary_dir, *pip_dirs, cuda_bin?, cuda_bin_x64?]).
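
The ordering above can be sketched as a small pure function. Several details here are assumptions: the sketch takes the already-resolved pip_dirs instead of a prefix, it reads the "?" entries as "append only if the directory exists", it takes cuda_bin_x64 to mean CUDA_PATH/bin/x64, and `prepend_to_path` is a hypothetical helper for the final PATH string.

```python
import os
from typing import Optional


def build_windows_path_dirs(
    binary_dir: str, pip_dirs: list[str], cuda_path: Optional[str] = None
) -> list[str]:
    """Order: binary_dir, pip-provided DLL dirs, then existing CUDA_PATH bin dirs."""
    path_dirs = [binary_dir, *pip_dirs]
    if cuda_path:
        for candidate in (
            os.path.join(cuda_path, "bin"),
            os.path.join(cuda_path, "bin", "x64"),
        ):
            # System toolkit directories are a fallback, added only when present.
            if os.path.isdir(candidate):
                path_dirs.append(candidate)
    return path_dirs


def prepend_to_path(path_dirs: list[str], existing_path: str) -> str:
    """Join path_dirs ahead of the inherited PATH using the platform separator."""
    parts = [*path_dirs, existing_path] if existing_path else list(path_dirs)
    return os.pathsep.join(parts)
```

Keeping the ordering in one function means a test can call it directly, so a dropped or reordered entry shows up as a failing assertion rather than passing against a hand-copied duplicate.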

Verification (run inside studio/backend):
  pytest tests/test_5106_windows_gpu_detection_mock.py -v
    -> 10 passed
  pytest tests/test_llama_cpp_*.py tests/test_llama_server_args.py
       tests/test_5106_windows_gpu_detection_mock.py -q
    -> 171 passed
  CUDA_VISIBLE_DEVICES=1 pytest tests/test_5106_windows_gpu_detection_mock.py::TestWindowsGpuDetectionAfter5106Fix::test_nvidia_smi_probe_reports_synthetic_gpu
    -> 1 passed

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Rename Windows GPU detection test to a generic filename and trim comments

- studio/backend/tests/test_5106_windows_gpu_detection_mock.py
  -> studio/backend/tests/test_windows_gpu_detection_mock.py
  The file is the generic regression suite for Windows GPU detection;
  encoding the issue number in the filename is noise.
- Shorten module docstring, helper docstrings, per-test docstrings and
  inline comments in the renamed test file. No behaviour change,
  all 10 cases still pass.
- Shorten the _build_windows_path_dirs docstring in
  studio/backend/core/inference/llama_cpp.py and update the test-path
  reference; trim the win32 call-site comment to one line.

Local verification:
- pytest studio/backend/tests/test_windows_gpu_detection_mock.py -- 10 passed.
- pytest studio/backend/tests/test_llama_cpp_windows_nvidia_path.py
  studio/backend/tests/test_llama_server_args.py
  studio/backend/tests/test_windows_gpu_detection_mock.py -- 110 passed.

* Studio: harden _wait_for_health against transient httpx ReadError

The probe loop in LlamaCppBackend._wait_for_health only caught
ConnectError and TimeoutException. On Windows, when llama-server.exe
accepts the TCP probe and then dies before sending HTTP headers, the
dying peer resets the connection (TCP RST). httpx maps this to ReadError
("WinError 10054 -- An existing connection was forcibly closed by the
remote host"), which the except clause did not catch, so it propagated
out of _wait_for_health, through the routes/inference.py load_model
handler, and back to /api/inference/load as an opaque 500.

The crash diagnostic Studio actually wants to surface lives on the
self._process.poll() branch at the top of the loop body: "llama-server
exited with code X. Output: ...". We never reached that branch on the
WinError 10054 path because the very first probe blew up.

Expand the except to also swallow ReadError and RemoteProtocolError so
the next 0.5-second iteration runs the poll() branch. Outcomes:
  * Process really died: structured exit-code + last-stdout log line.
  * Single transient probe blip: silently retried; load succeeds.
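
The control flow can be sketched without httpx installed. Everything here is illustrative: the four exception classes are local stand-ins named after the httpx exceptions mentioned above, and `probe`, `poll`, and the injectable `sleep` are hypothetical parameters, not the signature of _wait_for_health.

```python
import time
from typing import Callable, Optional


# Local stand-ins so the sketch runs without httpx; the real loop would
# catch the httpx exceptions of the same names.
class ConnectError(Exception): ...
class TimeoutException(Exception): ...
class ReadError(Exception): ...
class RemoteProtocolError(Exception): ...


TRANSIENT = (ConnectError, TimeoutException, ReadError, RemoteProtocolError)


def wait_for_health(
    probe: Callable[[], int],
    poll: Callable[[], Optional[int]],
    timeout_s: float = 5.0,
    interval_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True once probe() reports 200, False if the process dies or time runs out."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        # Check for a dead process first so its exit code is what gets reported.
        code = poll()
        if code is not None:
            print(f"llama-server exited with code {code}")
            return False
        try:
            if probe() == 200:
                return True
        except TRANSIENT:
            pass  # swallow the blip; the next iteration runs poll() again
        sleep(interval_s)
    return False
```

With ReadError in the caught tuple, a reset connection costs one retry interval, after which either the exit-code branch fires or the probe succeeds.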

Adds studio/backend/tests/test_llama_cpp_wait_for_health.py with five
cases covering happy-path 200, transient ReadError + dead process,
RemoteProtocolError + dead process, ConnectError cycling until success,
and dead process before the first probe. The new cases would have
failed against the old except clause -- ReadError / RemoteProtocolError
would have propagated instead of returning False.

Found while triaging the Windows Studio GGUF CI flake on this PR's
5a6ddc3 push: llama-server.exe (b9203 prebuilt) crashed within 2.2 s of
launch on the GPU-less runner, and Studio reported "WinError 10054"
instead of an upstream-tag-attributable exit-code line.

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: danielhanchen <michaelhan2050@gmail.com>