Studio: add torch's pip nvidia DLL dirs to PATH on Windows by danielhanchen · Pull Request #5324 · unslothai/unsloth

danielhanchen · 2026-05-07T11:34:15Z

Summary

Most direct fix for #5106 ("GPU detected but model loaded entirely on RAM/CPU" on Windows). This is the canonical fix per Studio's install design: install_python_stack already bundles torch with matching CUDA wheels (nvidia-cuda-runtime-cu13, nvidia-cublas-cu13, etc.) which ship cudart64_X.dll, cublas64_X.dll, and cublasLt64_X.dll under <prefix>/Lib/site-packages/nvidia/<pkg>/(bin|Library/bin)/. The Linux runtime env block in start_llama_server already pulls the equivalent nvidia/cu*/lib paths into LD_LIBRARY_PATH, but the Windows block did not. Without this, the prebuilt llama-server.exe could not resolve cudart64_X.dll at runtime unless the user had a matching system CUDA toolkit on PATH, which is exactly the workaround Roland keeps recommending in #5106.

What changed

New LlamaCppBackend._windows_pip_nvidia_dll_dirs(prefix) resolver globs <prefix>/Lib/site-packages/nvidia/<pkg>/bin and <prefix>/Lib/site-packages/nvidia/<pkg>/Library/bin. Both layouts are seen in the wild across cuda_runtime / cublas / cudnn / nvjitlink wheels.
The Windows env block in start_llama_server extends path_dirs with the resolver output before falling back to CUDA_PATH/bin. Pip-installed wheels are the canonical source (mirrors the Linux LD_LIBRARY_PATH ordering); system CUDA toolkit remains a valid fallback.
Independent of the upstream cudart asset naming, so it stays robust if cudart-llama-bin-win-cuda-X.Y-x64.zip ever gets renamed.

How this relates to #5322 and #5323

This PR (dh/fix-5106-windows-nvidia-pip-path): aligns Windows with Linux's existing pattern. This is the canonical mechanism per the install design.
Studio: download paired cudart bundle on Windows CUDA installs #5322 (dh/fix-5106-windows-cudart-pair): belt-and-suspenders -- downloads upstream's cudart bundle next to llama-server.exe. Useful when torch's nvidia wheels are absent (CPU-only torch, unsloth run standalone, custom torch installs). Optional given this PR.
Studio: pin GPU at 95% headroom and warn on silent CPU fallback #5323 (dh/fix-5106-gpu-pin-threshold-and-cpu-fallback-warn): orthogonal runtime fix for the close-fit case + CPU-fallback diagnostic. Still needed even when CUDA loads fine.

If you want a single canonical fix, this PR is sufficient for the Windows symptom. #5322 can be closed in favor of this one. #5323 is independent.

Test plan

python -m pytest studio/backend/tests/test_llama_cpp_windows_nvidia_path.py -- 7 new cases
python -m pytest studio/backend/tests/test_llama_cpp_*.py studio/backend/tests/test_llama_server_args.py -- 110 passed, no regressions
Validate on a Windows host without a system CUDA toolkit but with torch's nvidia wheels installed: nvidia-smi shows VRAM usage during inference (no winget install Nvidia.CUDA needed)

New tests in studio/backend/tests/test_llama_cpp_windows_nvidia_path.py:

test_returns_empty_when_no_nvidia_wheels
test_picks_up_bin_layout
test_picks_up_library_bin_layout
test_mixed_layouts_all_resolved
test_does_not_walk_outside_nvidia
test_skips_non_directories
test_missing_prefix_does_not_raise

Refs #5106

Studio's install_python_stack bundles torch with matching CUDA wheels (nvidia-cuda-runtime-cu13, nvidia-cublas-cu13, etc.) which ship cudart64_X.dll, cublas64_X.dll, and cublasLt64_X.dll under the prefix's Lib/site-packages/nvidia/<pkg>/(bin|Library/bin)/ tree. The Linux runtime env block in start_llama_server already pulls the equivalent nvidia/cu*/lib paths into LD_LIBRARY_PATH, but the Windows block did not do this, so the prebuilt llama-server.exe could not resolve cudart64_X.dll at runtime unless the user had a matching system CUDA toolkit on PATH. That is the root cause of the Windows reports in #5106 ("GPU detected but model loaded entirely on RAM/CPU"), and matches Roland's repeated workaround in that issue: install matching CUDA toolkit version. Brings the Windows env block in line with the Linux pattern: * New LlamaCppBackend._windows_pip_nvidia_dll_dirs resolver globs <prefix>/Lib/site-packages/nvidia/<pkg>/bin and <prefix>/Lib/site-packages/nvidia/<pkg>/Library/bin. Both layouts are seen in the wild across cuda_runtime / cublas / cudnn / nvjitlink wheels. * The Windows env block now extends path_dirs with the resolver's output before falling back to CUDA_PATH/bin, so pip-installed wheels are the canonical source (mirroring the Linux LD_LIBRARY_PATH ordering). System CUDA toolkit remains a valid fallback. Tests: 7 new cases in studio/backend/tests/test_llama_cpp_windows_nvidia_path.py: * empty resolver when no nvidia wheels installed * nvidia/<pkg>/bin layout resolved * nvidia/<pkg>/Library/bin layout resolved * mixed bin and Library/bin layouts both resolved * unrelated site-packages contents not walked * non-directory entries skipped * missing prefix does not raise 110 backend tests pass. No regressions. Refs #5106

for more information, see https://pre-commit.ci

gemini-code-assist

Code Review

This pull request introduces a mechanism to locate and include CUDA DLL directories from pip-installed NVIDIA wheels on Windows, ensuring llama-server.exe can function without a system-wide CUDA installation. It adds a new static method _windows_pip_nvidia_dll_dirs to the LlamaCppBackend class and updates the load_model function to incorporate these paths. Additionally, a comprehensive test suite has been added to verify the directory resolution logic. The review feedback suggests refactoring the new method to use pathlib.Path instead of os.path to align with modern path handling practices in the codebase.

gemini-code-assist · 2026-05-07T11:48:09Z

+        import glob as _glob
+
+        nvidia_root = os.path.join(prefix, "Lib", "site-packages", "nvidia")
+        out: list[str] = []
+        for pattern in (
+            os.path.join(nvidia_root, "*", "bin"),
+            os.path.join(nvidia_root, "*", "Library", "bin"),
+        ):
+            for nv_dir in _glob.glob(pattern):
+                if os.path.isdir(nv_dir):
+                    out.append(nv_dir)
+        return out


Refactoring to pathlib.Path is encouraged to align with the codebase's move toward modern path handling. Please ensure this change is consistent with the module's existing import patterns, particularly regarding the handling of local imports, as per the repository's file-level conventions.

nvidia_root = Path(prefix) / "Lib" / "site-packages" / "nvidia" if not nvidia_root.is_dir(): return [] return [ str(p) for pattern in ("*/bin", "*/Library/bin") for p in nvidia_root.glob(pattern) if p.is_dir() ]

References

Follow existing file-level conventions for imports, such as keeping certain imports inline if that is the established pattern in the file. ^(link)

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: afb1f7cf59

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-11T10:23:30Z

+        nvidia_root = os.path.join(prefix, "Lib", "site-packages", "nvidia")
+        out: list[str] = []
+        for pattern in (
+            os.path.join(nvidia_root, "*", "bin"),
+            os.path.join(nvidia_root, "*", "Library", "bin"),


Include torch/lib in Windows DLL search

On Windows installs where PyTorch's CUDA wheel bundles the runtime DLLs under Lib/site-packages/torch/lib rather than separate nvidia/*/bin wheels, this resolver returns no pip-provided DLL directories, so llama-server.exe still falls back to requiring a system CUDA toolkit. The installer already treats torch/lib as a Python runtime DLL source (install_llama_prebuilt.py::python_runtime_dirs), and setup.ps1 installs PyTorch CUDA wheels specifically because they bundle CUDA, so the runtime PATH construction should include that same location when the nvidia namespace dirs are absent.

Useful? React with 👍 / 👎.

PyTorch's Windows CUDA wheels frequently bundle cudart64_X.dll and cublas64_X.dll directly under Lib/site-packages/torch/lib/ instead of shipping separate nvidia-cuda-runtime-cuXX / nvidia-cublas-cuXX wheels. On those installs _windows_pip_nvidia_dll_dirs previously returned nothing useful, and llama-server.exe fell back to needing a system CUDA toolkit on PATH -- the original #5106 failure mode. The install-side equivalent python_runtime_dirs in install_llama_prebuilt.py already treats torch/lib as a Python runtime DLL source for the same reason. Bring the runtime resolver in parity so torch-bundled-CUDA installs find their cudart at llama-server start. Updates the existing test that codified the bug (asserted torch/lib was excluded), and adds three new cases: pickup, combined-with-nvidia, and the must-be-a-directory guard.

danielhanchen · 2026-05-11T10:56:52Z

Pushed c1c8a074 to address the torch/lib gap (matches the codex bot P2 comment at llama_cpp.py:973).

The mechanism in the PR was correct -- _windows_pip_nvidia_dll_dirs covers both nvidia/PKG/bin and nvidia/PKG/Library/bin, the Windows env block prepends them before CUDA_PATH/bin, llama-server is launched via subprocess.Popen(env=env, ...) so the child actually sees the modified PATH, and binary_dir is first in path_dirs so anything dropped there by #5322 wins over pip nvidia version skew.

The hole was that PyTorch's Windows CUDA wheels frequently bundle cudart64_X.dll / cublas64_X.dll directly under Lib/site-packages/torch/lib/ instead of shipping separate nvidia-cuda-runtime-cuXX wheels. On those installs the resolver returned [] and llama-server.exe was back to relying on a system CUDA toolkit -- the original #5106 failure mode. The install-side equivalent python_runtime_dirs in install_llama_prebuilt.py:4318-4320 already scans nvidia/*/lib, nvidia/*/bin, and torch/lib for the same reason, so the asymmetry was unintentional.

What the commit changes:

_windows_pip_nvidia_dll_dirs now also appends PREFIX/Lib/site-packages/torch/lib when present. Same os.path.isdir guard as the nvidia entries.
Docstring updated to list all three patterns and tie torch/lib back to the installer's python_runtime_dirs.

Tests in studio/backend/tests/test_llama_cpp_windows_nvidia_path.py:

Renamed test_does_not_walk_outside_nvidia to test_does_not_walk_outside_known_paths and replaced the torch/lib exclusion assertion (which actively codified the bug) with a check that unrelated packages like numpy / scipy are still ignored.
New test_picks_up_torch_lib -- positive coverage for the torch case.
New test_torch_lib_combined_with_nvidia_wheels -- both pickup paths together.
New test_torch_lib_must_be_a_directory -- guard against a broken install where torch/lib exists as a file.

Full suite stays green: 10 cases here, 146 across test_llama_cpp_*.py + test_llama_server_args.py.

Minor open items (not addressed):

The gemini-code-assist pathlib.Path refactor at :978 is purely cosmetic. Worth doing as a follow-up if you want the file in the new style.
The resolver and its install-side cousin python_runtime_dirs are now near-duplicates with slightly different scopes. Consolidating them would be a nice future cleanup; out of scope for [Bug] Not Unsloth Studio detects my GPU But not use but only uses CPU/RAM #5106.
No test covers the actual start_llama_server PATH assembly. The new tests only exercise the helper. Worth adding a test that asserts env["PATH"] is built as binary_dir;PIP_DIRS;CUDA_PATH\bin;EXISTING_PATH so a future reorder is caught.

Three follow-ups from a 12-reviewer batch over c1c8a07 (PR #5324): 1. The current nvidia-cuda-runtime (unsuffixed) 13.2.75 and nvidia-cublas 13.4.0.1 Windows wheels on PyPI ship under nvidia/cu13/bin/x86_64/cudart64_13.dll etc, not under nvidia/PKG/bin/. The previous resolver matched only one directory level past nvidia/PKG/ and silently missed the actual cu13 DLL location, leaving CUDA 13 users on the same failure mode as before #5106. Verified against: pip download nvidia-cuda-runtime --platform win_amd64 which produces nvidia/cu13/bin/x86_64/cudart64_13.dll. 2. glob.glob over sys.prefix interprets [ and ] as a character class. Valid Windows usernames / install paths can contain those characters (for example C:\Users\alice[work]\studio), so the previous resolver silently returned an empty list for such prefixes even when DLL dirs were present. 3. The resolver only ever returned nvidia/PKG/bin -- if both bin and bin/x86_64 exist (current wheels do), Windows DLL search should land on the arch-specific subdir first so the explicit cudart64_X.dll location wins. Rewritten as a pathlib.Path.iterdir walk to fix all three: no glob escaping needed, arch-specific subdirs added explicitly, and ordering puts bin/x86_64 before bin. Conda-style Library/bin/x86_64 and Library/bin/x64 are also covered for parity. A seen set dedupes when wheels happen to expose the same directory through multiple layouts. New tests: - test_picks_up_cu13_bin_x86_64_layout (the actual real-world cu13 case) - test_picks_up_bin_x64_layout - test_mixed_cu12_and_cu13_layouts - test_glob_meta_in_prefix_is_safe (bracket repro) - test_arch_subdir_listed_before_parent_bin (ordering) Verified empirically against PyPI: nvidia-cuda-runtime 13.2.75 -> nvidia/cu13/bin/x86_64/cudart64_13.dll nvidia-cublas 13.4.0.1 -> nvidia/cu13/bin/x86_64/cublas64_13.dll nvidia/cu13/bin/x86_64/cublasLt64_13.dll nvidia-cudnn-cu13 9.22.0.52 -> nvidia/cudnn/bin/cudnn64_9.dll (already covered) Refs #5106

for more information, see https://pre-commit.ci

danielhanchen · 2026-05-11T12:09:21Z

Pushed ee8cd941 to address the three real findings from a 12-reviewer batch over c1c8a074:

Resolver misses the current CUDA 13 Windows wheel layout (3/12 reviewers). Verified empirically by pip download on PyPI:
- nvidia-cuda-runtime (unsuffixed) 13.2.75 ships nvidia/cu13/bin/x86_64/cudart64_13.dll.
- nvidia-cublas 13.4.0.1 ships nvidia/cu13/bin/x86_64/cublas64_13.dll and cublasLt64_13.dll.
  The previous glob set (nvidia/*/bin, nvidia/*/Library/bin) silently returned [] for these. CUDA 13 users would have stayed on the original [Bug] Not Unsloth Studio detects my GPU But not use but only uses CPU/RAM #5106 failure mode. The resolver now walks bin/x86_64, bin/x64, bin, plus the Library/bin variants. Note: the older modular nvidia-cuda-runtime-cu12 12.9.79 still ships under nvidia/cuda_runtime/bin/cudart64_12.dll (verified) and stays covered.
Glob metacharacters in sys.prefix (2/12). glob.glob treats [/] as a character class, so a Windows username like alice[work] would silently produce an empty result. Rewritten as a Path.iterdir walk -- no escaping needed.
Arch-subdir ordering. When both nvidia/<pkg>/bin and nvidia/<pkg>/bin/x86_64 exist (the current cu13 wheels do), the explicit arch subdir is listed first so Windows DLL search lands on the actual cudart64_X.dll location even if the parent bin is empty.

Reviewer-flagged but assessed and deferred:

nvidia/*/lib on the backend (P1, 1/12) -- false positive on Windows. I inspected the actual cu12 and cu13 win_amd64 wheels (cuda-runtime, cublas, cudnn) and none ship DLLs under lib/. The Linux convention is lib, Windows is bin. Adding it would be harmless but the scenario doesn't exist in the wild.
Installer-side python_runtime_dirs asymmetry (P1, 7/12) -- real and worth fixing, but lives in install_llama_prebuilt.py which is owned by Studio: download paired cudart bundle on Windows CUDA installs #5322. Mirrored fix pushed there in 8b1288f7.
Sys.prefix vs sys.path coverage (P2, 1/12) -- only matters for --system-site-packages venvs, which Studio's install_python_stack does not create.

New tests:

test_picks_up_cu13_bin_x86_64_layout (the actual cu13 case).
test_picks_up_bin_x64_layout.
test_mixed_cu12_and_cu13_layouts.
test_glob_meta_in_prefix_is_safe (bracket repro).
test_arch_subdir_listed_before_parent_bin (ordering guarantee).

Regression: 15 cases in this file pass, 151 across test_llama_cpp_*.py + test_llama_server_args.py. The 35-scenario behavioural harness in temp/pr_simulation/ covers fresh-install, existing-install, PATH assembly, glob-meta safety, and cu13 layout end-to-end.

…tic CI test (#5376) * tests/studio: end-to-end Windows GPU detection mock test (#5106) Locks in the combined fix from #5322 + #5324 with a synthetic Windows scenario that CI runners without GPUs can execute. The test packs the real PyPI win_amd64 wheel layouts (cu12 modular and the new unsuffixed cu13 nvidia/cu13/bin/x86_64 layout) plus the exact filename set of the upstream b9103 cudart-llama-bin-win-cuda bundles, then mocks nvidia-smi output and asserts that: * Studio's nvidia-smi probe parses the CSV and reports the GPU. * After PR #5322 the install_dir/build/bin/Release/ tree contains all three cudart bundle DLLs alongside llama-server.exe. * After PR #5324 the PATH built by start_llama_server's win32 branch lists pip nvidia + torch/lib dirs in addition to the binary_dir. * cudart64_X.dll, cublas64_X.dll, and cublasLt64_X.dll are each reachable from at least one PATH entry, with cudart specifically reachable from BOTH the install dir and a pip nvidia dir (defence in depth). * Bare venvs without pip nvidia wheels still work via #5322's binary_dir drop; pre-#5322 installs still work via #5324's PATH augmentation. * A reconstructed pre-PR scenario (cudart absent from binary_dir and pip dirs not on PATH) leaves cudart unreachable, confirming the test would catch a future regression. Bonus housekeeping in studio/install_llama_prebuilt.py: drop the pointless f-prefix on the literal "llama-" in the windows_cuda_attempts pairing guard (no behaviour change; lint nit flagged in the post-merge review). The mocks model real artifact contents I verified empirically: * pip download nvidia-cuda-runtime --platform win_amd64 produces nvidia/cu13/bin/x86_64/cudart64_13.dll. * unzip on the b9103 cudart-llama-bin-win-cuda-13.1-x64.zip produces exactly cudart64_13.dll + cublas64_13.dll + cublasLt64_13.dll, no executables. * objdump -p on the b9103 ggml-cuda.dll shows a static PE import on cublas64_13.dll (the root cause of #5106 when cublas64_13.dll is unreachable). Refs #5106 #5322 #5324 * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * test_5106_windows_gpu_detection_mock: don't shadow real httpx This file's name sorts before every other file in studio/backend/tests/ (starts with the digit '5'), so pytest collects it first. The previous ``sys.modules.setdefault("httpx", _httpx_stub)`` ran before any other test imported real httpx, which meant the stub permanently shadowed the real module for the rest of the collection. Tests that did ``from httpx import HTTPError, Response`` (test_anthropic_messages, test_browse_folders_route, test_training_*, etc) then failed at collection with ``ImportError: cannot import name 'HTTPError'`` because the stub did not define those names. The existing test_llama_cpp_windows_nvidia_path.py did not trigger the same issue because it sorts after test_a* / test_b* / etc, by which point the real httpx has already been imported and setdefault is a no-op. Switch the stub installation to ``importlib.util.find_spec(name) is None`` so we only fall back to the stub when the real module truly is not installed. Backend CI installs httpx, structlog, and the studio/backend/loggers package is reachable via the sys.path augmentation a few lines above, so on CI all three find_spec calls succeed and no stubs are installed at all. Also add HTTPError and Response to the stub module for the offline case, so anyone running this test outside CI with httpx absent still gets a stub that satisfies the broader test suite's imports. Refs #5106 * test_5106 + llama_cpp: extract win32 PATH helper and harden the regression test Follow-up to PR #5376's review feedback. Three real findings from the bot reviewers, plus one stale one. 1. (codex P2 line 201, gemini medium line 209) The regression test's _build_path_dirs_like_start_llama_server hand-copied the win32 branch of LlamaCppBackend.start_llama_server, so a future drop or reorder of _windows_pip_nvidia_dll_dirs(sys.prefix) in production would have passed the test silently. Extract a new staticmethod LlamaCppBackend._build_windows_path_dirs (binary_dir, prefix, cuda_path). Production start_llama_server now calls this helper. The test's wrapper is reduced to a one-line delegate that forwards to the staticmethod, so the regression asserts against the exact production logic instead of a parallel copy of it. 2. (codex P2 line 245) test_nvidia_smi_probe_reports_synthetic_gpu did not clear CUDA_VISIBLE_DEVICES. On a shared GPU runner with the variable set in the parent shell, _get_gpu_free_memory() filters the mocked CSV and returns [] or falls through to the torch fallback. Cleared CUDA_VISIBLE_DEVICES and NVIDIA_VISIBLE_DEVICES via monkeypatch.delenv(..., raising=False). 3. (codex P2 line 66) _maybe_stub gated on importlib.util.find_spec ("loggers"), which returns a spec because studio/backend/loggers/ is on sys.path. But the actual import chain loads loggers/handlers.py which does `from fastapi import Request, Response` at module load. In a lightweight env without fastapi installed, the stub never lands and `from core.inference.llama_cpp import LlamaCppBackend` raises during collection. Switched _maybe_stub to a real import attempt under try / except ImportError so the stub falls into place when the package is discoverable but not importable. CI has fastapi so this is purely a developer- machine ergonomics fix. The fourth comment (codex P1 line 85 "Keep the httpx stub from leaking across tests") was already addressed by 7437e73, which replaced the unconditional sys.modules.setdefault with the find_spec-gated _maybe_stub. No code change needed. Production behaviour is unchanged: _build_windows_path_dirs returns exactly the same ordering start_llama_server used inline ([binary_dir, *pip_dirs, cuda_bin?, cuda_bin_x64?]). Verification (run inside studio/backend): pytest tests/test_5106_windows_gpu_detection_mock.py -v -> 10 passed pytest tests/test_llama_cpp_*.py tests/test_llama_server_args.py tests/test_5106_windows_gpu_detection_mock.py -q -> 171 passed CUDA_VISIBLE_DEVICES=1 pytest tests/test_5106_windows_gpu_detection_mock.py::TestWindowsGpuDetectionAfter5106Fix::test_nvidia_smi_probe_reports_synthetic_gpu -> 1 passed * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Rename Windows GPU detection test to a generic filename and trim comments - studio/backend/tests/test_5106_windows_gpu_detection_mock.py -> studio/backend/tests/test_windows_gpu_detection_mock.py The file is the generic regression suite for Windows GPU detection; encoding the issue number in the filename is noise. - Shorten module docstring, helper docstrings, per-test docstrings and inline comments in the renamed test file. No behaviour change, all 10 cases still pass. - Shorten the _build_windows_path_dirs docstring in studio/backend/core/inference/llama_cpp.py and update the test-path reference; trim the win32 call-site comment to one line. Local verification: - pytest studio/backend/tests/test_windows_gpu_detection_mock.py -- 10 passed. - pytest studio/backend/tests/test_llama_cpp_windows_nvidia_path.py studio/backend/tests/test_llama_server_args.py studio/backend/tests/test_windows_gpu_detection_mock.py -- 110 passed. * Studio: harden _wait_for_health against transient httpx ReadError The probe loop in LlamaCppBackend._wait_for_health only caught ConnectError and TimeoutException. On Windows, when llama-server.exe accepts the TCP probe and then dies before sending HTTP headers, the peer process RST closes the socket. httpx maps this to ReadError ("WinError 10054 -- An existing connection was forcibly closed by the remote host"), which fell through the except clause and bubbled out of _wait_for_health, the routes/inference.py load_model handler, and back to /api/inference/load as an opaque 500. The crash diagnostic Studio actually wants to surface lives on the self._process.poll() branch at the top of the loop body: "llama-server exited with code X. Output: ...". We never reached that branch on the WinError 10054 path because the very first probe blew up. Expand the except to also swallow ReadError and RemoteProtocolError so the next 0.5-second iteration runs the poll() branch. Outcomes: * Process really died: structured exit-code + last-stdout log line. * Single transient probe blip: silently retried; load succeeds. Adds studio/backend/tests/test_llama_cpp_wait_for_health.py with five cases covering happy-path 200, transient ReadError + dead process, RemoteProtocolError + dead process, ConnectError cycling until success, and dead process before the first probe. The new cases would have failed against the old except clause -- ReadError / RemoteProtocolError would have propagated instead of returning False. Found while triaging the Windows Studio GGUF CI flake on this PR's 5a6ddc3 push: llama-server.exe (b9203 prebuilt) crashed within 2.2 s of launch on the GPU-less runner, and Studio reported "WinError 10054" instead of an upstream-tag-attributable exit-code line. --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: danielhanchen <michaelhan2050@gmail.com>

danielhanchen requested a review from rolandtannous as a code owner May 7, 2026 11:34

[pre-commit.ci] auto fixes from pre-commit.com hooks

b012b9f

for more information, see https://pre-commit.ci

gemini-code-assist Bot reviewed May 7, 2026

View reviewed changes

Merge branch 'main' into dh/fix-5106-windows-nvidia-pip-path

afb1f7c

chatgpt-codex-connector Bot reviewed May 11, 2026

View reviewed changes

danielhanchen and others added 2 commits May 11, 2026 12:02

[pre-commit.ci] auto fixes from pre-commit.com hooks

ccd83a9

for more information, see https://pre-commit.ci

danielhanchen mentioned this pull request May 11, 2026

Studio: download paired cudart bundle on Windows CUDA installs #5322

Merged

4 tasks

danielhanchen merged commit 379f5a5 into main May 11, 2026
30 of 31 checks passed

danielhanchen deleted the dh/fix-5106-windows-nvidia-pip-path branch May 11, 2026 12:42

danielhanchen mentioned this pull request May 11, 2026

tests/studio: lock in Windows GPU detection fix (#5106) with a synthetic CI test #5376

Merged

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Studio: add torch's pip nvidia DLL dirs to PATH on Windows#5324

Studio: add torch's pip nvidia DLL dirs to PATH on Windows#5324
danielhanchen merged 6 commits into
mainfrom
dh/fix-5106-windows-nvidia-pip-path

danielhanchen commented May 7, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot May 7, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot May 11, 2026

Uh oh!

danielhanchen commented May 11, 2026

Uh oh!

danielhanchen commented May 11, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

danielhanchen commented May 7, 2026

Summary

What changed

How this relates to #5322 and #5323

Test plan

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot May 7, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 11, 2026

Choose a reason for hiding this comment

Uh oh!

danielhanchen commented May 11, 2026

Uh oh!

danielhanchen commented May 11, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant