tests/studio: lock in Windows GPU detection fix (#5106) with a synthetic CI test by danielhanchen · Pull Request #5376 · unslothai/unsloth

danielhanchen · 2026-05-11T13:31:05Z

Summary

End-to-end regression test that locks in the Windows GPU detection fix shipped by #5322 + #5324. Runs as a normal pytest in studio/backend/tests/ (Ubuntu pytest matrix in studio-backend-ci.yml); also touches studio/install_llama_prebuilt.py so the Windows update / inference smoke CI runs on this PR as a post-merge sanity check.

Why we need a synthetic test even after the two fixes landed: the failure mode is on Windows hosts without a system CUDA toolkit. CI runners do not have GPUs and nvidia-smi is absent. This test mocks nvidia-smi output and packs a synthetic Windows layout that mirrors real artefacts, so any future regression in the win32 PATH builder or the paired cudart download is caught without needing a real GPU.

What is asserted, against the realistic synthetic layout:

Studio's nvidia-smi probe parses the mocked CSV and reports the GPU.
After Studio: download paired cudart bundle on Windows CUDA installs #5322's install path, install_dir/build/bin/Release/ contains all three DLLs the upstream cudart bundle ships (cudart64_X.dll, cublas64_X.dll, cublasLt64_X.dll).
After Studio: add torch's pip nvidia DLL dirs to PATH on Windows #5324's launch path, the PATH that start_llama_server's win32 branch builds lists the pip nvidia + torch/lib directories in front of the inherited PATH.
Each of the three cudart bundle DLLs is reachable from at least one PATH entry.
cudart is reachable from BOTH binary_dir (Studio: download paired cudart bundle on Windows CUDA installs #5322's contribution) AND a pip nvidia dir (Studio: add torch's pip nvidia DLL dirs to PATH on Windows #5324's contribution) — defence in depth.
Bare venvs without pip nvidia wheels still work via Studio: download paired cudart bundle on Windows CUDA installs #5322's binary_dir drop.
Pre-Studio: download paired cudart bundle on Windows CUDA installs #5322 installs still work via Studio: add torch's pip nvidia DLL dirs to PATH on Windows #5324's PATH augmentation.
A reconstructed pre-PR scenario (cudart absent from binary_dir and pip dirs not on PATH) leaves cudart unreachable — confirms the test catches a future regression.

The mocks reflect real artefact contents verified empirically:

pip download nvidia-cuda-runtime --platform win_amd64 produces nvidia/cu13/bin/x86_64/cudart64_13.dll.
unzip on the upstream b9103/cudart-llama-bin-win-cuda-13.1-x64.zip produces exactly cudart64_13.dll + cublas64_13.dll + cublasLt64_13.dll and nothing else.
objdump -p on the upstream b9103/ggml-cuda.dll shows a static PE import on cublas64_13.dll, which is the root cause of [Bug] Not Unsloth Studio detects my GPU But not use but only uses CPU/RAM #5106 when cublas64_13.dll is unreachable.

Housekeeping: one lint nit dropped from #5322's windows_cuda_attempts (f"llama-" -> "llama-"; no behaviour change).

Test plan

pytest studio/backend/tests/test_5106_windows_gpu_detection_mock.py -- 10 new cases pass.
pytest studio/backend/tests/test_llama_cpp_*.py studio/backend/tests/test_llama_server_args.py tests/studio/install/{test_install_llama_prebuilt_logic,test_selection_logic,test_pr4562_bugfixes,test_llama_pr_force_and_source}.py -- 424 passed, no regressions.
Windows smoke CI (update + inference) on this PR -- triggered by the studio/install_llama_prebuilt.py touch and the new file under studio/backend/**.

Refs #5106 #5322 #5324

Locks in the combined fix from #5322 + #5324 with a synthetic Windows scenario that CI runners without GPUs can execute. The test packs the real PyPI win_amd64 wheel layouts (cu12 modular and the new unsuffixed cu13 nvidia/cu13/bin/x86_64 layout) plus the exact filename set of the upstream b9103 cudart-llama-bin-win-cuda bundles, then mocks nvidia-smi output and asserts that: * Studio's nvidia-smi probe parses the CSV and reports the GPU. * After PR #5322 the install_dir/build/bin/Release/ tree contains all three cudart bundle DLLs alongside llama-server.exe. * After PR #5324 the PATH built by start_llama_server's win32 branch lists pip nvidia + torch/lib dirs in addition to the binary_dir. * cudart64_X.dll, cublas64_X.dll, and cublasLt64_X.dll are each reachable from at least one PATH entry, with cudart specifically reachable from BOTH the install dir and a pip nvidia dir (defence in depth). * Bare venvs without pip nvidia wheels still work via #5322's binary_dir drop; pre-#5322 installs still work via #5324's PATH augmentation. * A reconstructed pre-PR scenario (cudart absent from binary_dir and pip dirs not on PATH) leaves cudart unreachable, confirming the test would catch a future regression. Bonus housekeeping in studio/install_llama_prebuilt.py: drop the pointless f-prefix on the literal "llama-" in the windows_cuda_attempts pairing guard (no behaviour change; lint nit flagged in the post-merge review). The mocks model real artifact contents I verified empirically: * pip download nvidia-cuda-runtime --platform win_amd64 produces nvidia/cu13/bin/x86_64/cudart64_13.dll. * unzip on the b9103 cudart-llama-bin-win-cuda-13.1-x64.zip produces exactly cudart64_13.dll + cublas64_13.dll + cublasLt64_13.dll, no executables. * objdump -p on the b9103 ggml-cuda.dll shows a static PE import on cublas64_13.dll (the root cause of #5106 when cublas64_13.dll is unreachable). Refs #5106 #5322 #5324

for more information, see https://pre-commit.ci

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5d14f36613

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-11T13:33:19Z

+        "__exit__": lambda self, *a: None,
+    },
+)
+sys.modules.setdefault("httpx", _httpx_stub)


Keep the httpx stub from leaking across tests

When this new test file is collected before the rest of studio/backend/tests (its test_5106... name makes that the normal alphabetical order), this setdefault installs the minimal fake httpx module into global sys.modules before any test has imported the real library. Later tests and backend modules then receive this fake instead of real httpx, but the fake lacks APIs such as Request, RequestError, AsyncClient, and Limits, so suites that include passthrough/inference tests can fail or exercise the wrong module. Please mirror the existing guarded pattern that first tries to import real httpx, or scope/remove the stub after importing LlamaCppBackend.

Useful? React with 👍 / 👎.

gemini-code-assist

Code Review

This pull request introduces a comprehensive end-to-end regression test for Windows GPU detection issues, specifically addressing cases where CUDA DLLs were not correctly resolved. It also includes a minor cleanup in the installation script to remove an unnecessary f-string. The reviewer pointed out that the test duplicates PATH-building logic from the production code, which creates a maintenance risk; they recommend refactoring this logic into a shared helper method within the LlamaCppBackend class to ensure consistency.

gemini-code-assist · 2026-05-11T13:40:30Z

+def _build_path_dirs_like_start_llama_server(
+    binary_dir: Path, prefix: Path, cuda_path: str = ""
+) -> list[str]:
+    """Faithful reproduction of the win32 branch in
+    LlamaCppBackend.start_llama_server. Returns the ordered list of
+    PATH entries we prepend to the inherited env. Production code:
+    studio/backend/core/inference/llama_cpp.py:2340-2363.
+    """
+    pip_dirs = LlamaCppBackend._windows_pip_nvidia_dll_dirs(str(prefix))
+    path_dirs = [str(binary_dir)]
+    path_dirs.extend(pip_dirs)
+    if cuda_path:
+        cuda_bin = os.path.join(cuda_path, "bin")
+        if os.path.isdir(cuda_bin):
+            path_dirs.append(cuda_bin)
+        cuda_bin_x64 = os.path.join(cuda_path, "bin", "x64")
+        if os.path.isdir(cuda_bin_x64):
+            path_dirs.append(cuda_bin_x64)
+    return path_dirs


This function is a faithful reproduction of logic in LlamaCppBackend.start_llama_server, which introduces code duplication. This creates a maintenance risk, as changes in the production code might not be reflected here, leading to the test becoming outdated. To improve maintainability and adhere to the DRY principle, consider refactoring the PATH-building logic from LlamaCppBackend.start_llama_server into its own helper method within the LlamaCppBackend class. This new method could then be called from both start_llama_server and this test file, ensuring consistency and eliminating the duplicated code.

References

Centralize recurring or complex logical checks (like determining if a host is external) into a single helper function and reuse it across the codebase to ensure consistency and simplify maintenance.

This file's name sorts before every other file in studio/backend/tests/ (starts with the digit '5'), so pytest collects it first. The previous ``sys.modules.setdefault("httpx", _httpx_stub)`` ran before any other test imported real httpx, which meant the stub permanently shadowed the real module for the rest of the collection. Tests that did ``from httpx import HTTPError, Response`` (test_anthropic_messages, test_browse_folders_route, test_training_*, etc) then failed at collection with ``ImportError: cannot import name 'HTTPError'`` because the stub did not define those names. The existing test_llama_cpp_windows_nvidia_path.py did not trigger the same issue because it sorts after test_a* / test_b* / etc, by which point the real httpx has already been imported and setdefault is a no-op. Switch the stub installation to ``importlib.util.find_spec(name) is None`` so we only fall back to the stub when the real module truly is not installed. Backend CI installs httpx, structlog, and the studio/backend/loggers package is reachable via the sys.path augmentation a few lines above, so on CI all three find_spec calls succeed and no stubs are installed at all. Also add HTTPError and Response to the stub module for the offline case, so anyone running this test outside CI with httpx absent still gets a stub that satisfies the broader test suite's imports. Refs #5106

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7437e73508

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-11T14:30:04Z

+def _maybe_stub(name: str, builder):
+    if _importlib_util.find_spec(name) is None:
+        sys.modules[name] = builder()


Stub local loggers before importing llama_cpp

In lightweight test runs that do not install FastAPI, this guard still lets the real studio/backend/loggers package load because _BACKEND_DIR is already on sys.path, so find_spec("loggers") succeeds even though loggers/handlers.py immediately imports fastapi. The new test then fails during collection before any case runs; neighboring llama_cpp tests avoid this by stubbing the local loggers module unconditionally while only being careful not to shadow real third-party modules like httpx.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-05-11T14:30:04Z

+    pip_dirs = LlamaCppBackend._windows_pip_nvidia_dll_dirs(str(prefix))
+    path_dirs = [str(binary_dir)]
+    path_dirs.extend(pip_dirs)


Exercise the real PATH assembly instead

Because this helper hand-copies the start_llama_server Windows PATH logic, the regression test will still pass if the production win32 branch later drops or reorders self._windows_pip_nvidia_dll_dirs(sys.prefix): the assertions only validate this duplicate implementation plus the resolver. For the #5324 regression this PR is meant to lock down, the test needs to observe the actual environment passed by start_llama_server (for example by mocking subprocess.Popen) rather than rebuilding path_dirs here.

Useful? React with 👍 / 👎.

danielhanchen · 2026-05-11T15:56:24Z

CI green: 33 success, 0 failures, 0 pending on 7437e735. Ready for review.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8c6de7e0ab

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-14T11:37:21Z

+        with _mock_nvidia_smi_run(fake_csv):
+            gpus = LlamaCppBackend._get_gpu_free_memory()


Pin CUDA_VISIBLE_DEVICES for the mocked nvidia-smi probe

In test environments that inherit CUDA_VISIBLE_DEVICES (for example CUDA_VISIBLE_DEVICES=1 on a shared GPU runner, or an empty mask), _get_gpu_free_memory() filters the mocked nvidia-smi rows before returning them, so this test can return [] or fall through to the torch fallback instead of [(0, 22805)]. Please clear the variable or set it to 0 for this probe so the synthetic CSV is independent of the parent environment.

Useful? React with 👍 / 👎.

…ssion test Follow-up to PR #5376's review feedback. Three real findings from the bot reviewers, plus one stale one. 1. (codex P2 line 201, gemini medium line 209) The regression test's _build_path_dirs_like_start_llama_server hand-copied the win32 branch of LlamaCppBackend.start_llama_server, so a future drop or reorder of _windows_pip_nvidia_dll_dirs(sys.prefix) in production would have passed the test silently. Extract a new staticmethod LlamaCppBackend._build_windows_path_dirs (binary_dir, prefix, cuda_path). Production start_llama_server now calls this helper. The test's wrapper is reduced to a one-line delegate that forwards to the staticmethod, so the regression asserts against the exact production logic instead of a parallel copy of it. 2. (codex P2 line 245) test_nvidia_smi_probe_reports_synthetic_gpu did not clear CUDA_VISIBLE_DEVICES. On a shared GPU runner with the variable set in the parent shell, _get_gpu_free_memory() filters the mocked CSV and returns [] or falls through to the torch fallback. Cleared CUDA_VISIBLE_DEVICES and NVIDIA_VISIBLE_DEVICES via monkeypatch.delenv(..., raising=False). 3. (codex P2 line 66) _maybe_stub gated on importlib.util.find_spec ("loggers"), which returns a spec because studio/backend/loggers/ is on sys.path. But the actual import chain loads loggers/handlers.py which does `from fastapi import Request, Response` at module load. In a lightweight env without fastapi installed, the stub never lands and `from core.inference.llama_cpp import LlamaCppBackend` raises during collection. Switched _maybe_stub to a real import attempt under try / except ImportError so the stub falls into place when the package is discoverable but not importable. CI has fastapi so this is purely a developer- machine ergonomics fix. The fourth comment (codex P1 line 85 "Keep the httpx stub from leaking across tests") was already addressed by 7437e73, which replaced the unconditional sys.modules.setdefault with the find_spec-gated _maybe_stub. No code change needed. Production behaviour is unchanged: _build_windows_path_dirs returns exactly the same ordering start_llama_server used inline ([binary_dir, *pip_dirs, cuda_bin?, cuda_bin_x64?]). Verification (run inside studio/backend): pytest tests/test_5106_windows_gpu_detection_mock.py -v -> 10 passed pytest tests/test_llama_cpp_*.py tests/test_llama_server_args.py tests/test_5106_windows_gpu_detection_mock.py -q -> 171 passed CUDA_VISIBLE_DEVICES=1 pytest tests/test_5106_windows_gpu_detection_mock.py::TestWindowsGpuDetectionAfter5106Fix::test_nvidia_smi_probe_reports_synthetic_gpu -> 1 passed

for more information, see https://pre-commit.ci

danielhanchen · 2026-05-14T11:46:00Z

Pushed 7a2140d4 to address the three real findings from the five bot review threads on this PR.

1. codex P2 line 201 + gemini medium line 209 (same concern, dedup'd).

The local _build_path_dirs_like_start_llama_server hand-copied the win32 branch of LlamaCppBackend.start_llama_server, so a future drop or reorder of _windows_pip_nvidia_dll_dirs(sys.prefix) would have passed this regression silently. Extracted a new staticmethod LlamaCppBackend._build_windows_path_dirs(binary_dir, prefix, cuda_path) and rewired both sides:

Production start_llama_server at llama_cpp.py:2436 now calls the helper.
The test's _build_path_dirs_like_start_llama_server is reduced to a one-line delegate that forwards to the same staticmethod, so all 10 cases now assert against the actual production logic.

Same observable PATH ordering as before ([binary_dir, *pip_dirs, cuda_bin?, cuda_bin_x64?]).

2. codex P2 line 245.

test_nvidia_smi_probe_reports_synthetic_gpu did not clear CUDA_VISIBLE_DEVICES. On a shared GPU runner with the variable set in the parent shell, _get_gpu_free_memory() filtered the mocked CSV and returned [] or fell through to the torch fallback. Cleared both CUDA_VISIBLE_DEVICES and NVIDIA_VISIBLE_DEVICES via monkeypatch.delenv(..., raising=False). Verified the test still passes with CUDA_VISIBLE_DEVICES=1 set in the parent shell.

3. codex P2 line 66.

_maybe_stub gated on importlib.util.find_spec("loggers"), which returns a spec because studio/backend/loggers/ is on sys.path. But the actual import chain loads loggers/handlers.py which does from fastapi import Request, Response at module load. In a lightweight env without fastapi the stub never landed and from core.inference.llama_cpp import LlamaCppBackend raised during collection. Switched _maybe_stub to a real import_module attempt under try / except ImportError so the stub falls into place when the package is discoverable but not importable. CI has fastapi so this never fired there; this is purely a developer-machine ergonomics fix.

4. codex P1 line 85 (stale).

Already addressed by 7437e735 which replaced the unconditional sys.modules.setdefault("httpx", stub) with the find_spec-gated _maybe_stub. The bot was looking at the pre-fix snapshot.

Verification, run inside studio/backend:

pytest tests/test_5106_windows_gpu_detection_mock.py -v
  -> 10 passed
pytest tests/test_llama_cpp_*.py tests/test_llama_server_args.py tests/test_5106_windows_gpu_detection_mock.py -q
  -> 171 passed
CUDA_VISIBLE_DEVICES=1 pytest tests/test_5106_windows_gpu_detection_mock.py::TestWindowsGpuDetectionAfter5106Fix::test_nvidia_smi_probe_reports_synthetic_gpu
  -> 1 passed

…ai#5423) PR unslothai#5414's drift detectors and the corresponding import_fixes helpers were written against transformers 4.x. The Repo tests (CPU) step (which installs transformers>=4.51,<5.5 and currently resolves to 5.4.0) surfaces three real predicate gaps: * test_pretrained_model_enable_input_require_grads_uses_old_pattern fires DRIFT DETECTED whenever the source contains "for module in self.modules()". But the unsloth replacement that patch_enable_input_require_grads installs ALSO uses that pattern -- deliberately, just wrapped in try / except NotImplementedError. So the predicate cannot distinguish broken upstream from the working patch. Accept either pre-HF#41993 shape (no self.modules() loop) or the post-patch shape (loop + NotImplementedError handler). * test_transformers_torchcodec_available_flag_is_present asserts the pre-5.x module-level _torchcodec_available flag. transformers 5 replaced it with an lru_cache'd is_torchcodec_available() callable. Accept either symbol. Also update disable_torchcodec_if_broken to actually disable on 5.x: clear the cache and rebind the function to return False. Local verification: * transformers 4.57.6 + trl 0.25.1 + peft 0.19.1 + triton 3.5.1 + vllm 0.15.1+cu130 (the Core HF=4.57.6 cell shape): 18 passed. * transformers 5.8.1 + peft 0.19.1 + torch 2.9 CPU (the Repo tests (CPU) shape, drop trl / vllm / datasets / xformers): 12 passed, 6 skipped on missing optional libs, 0 failed. PR unslothai#5376's Repo tests (CPU) failure was a triple: - triton + enable_input_require_grads: fixed by merging current main (PR unslothai#5421's relaxed triton predicate + conftest 'import unsloth'). - torchcodec: fixed by THIS PR.

Brings in PR #5421 (drift detector parity: relaxed triton predicate + conftest 'import unsloth') and PR #5423 (transformers 5.x coverage: enable_input_require_grads + torchcodec) which together close the three DRIFT DETECTED failures on this branch's Repo tests (CPU) cell.

for more information, see https://pre-commit.ci

@staticmethod

Resolve conflict in studio/backend/core/inference/llama_cpp.py: take main's load_model body wrapped in the serial_load_lock (#5401 / #5161) and the #5347 mmproj guard, layered with this PR's two intended edits: - New @staticmethod _build_windows_path_dirs(binary_dir, prefix, cuda_path) next to _windows_pip_nvidia_dll_dirs, returning the win32 PATH ordering binary_dir, pip-nvidia wheels, CUDA_PATH/bin, CUDA_PATH/bin/x64. - start_llama_server's win32 branch now calls the helper instead of re-implementing the same ordering inline. The PR's own install_llama_prebuilt.py nit (f"llama-" -> "llama-") and the new test file under studio/backend/tests/ auto-merged cleanly. Picks up PR #5423's transformers 5.x drift-detector predicate fixes via the merge, which is what was failing on the three Core CI jobs.

…ents - studio/backend/tests/test_5106_windows_gpu_detection_mock.py -> studio/backend/tests/test_windows_gpu_detection_mock.py The file is the generic regression suite for Windows GPU detection; encoding the issue number in the filename is noise. - Shorten module docstring, helper docstrings, per-test docstrings and inline comments in the renamed test file. No behaviour change, all 10 cases still pass. - Shorten the _build_windows_path_dirs docstring in studio/backend/core/inference/llama_cpp.py and update the test-path reference; trim the win32 call-site comment to one line. Local verification: - pytest studio/backend/tests/test_windows_gpu_detection_mock.py -- 10 passed. - pytest studio/backend/tests/test_llama_cpp_windows_nvidia_path.py studio/backend/tests/test_llama_server_args.py studio/backend/tests/test_windows_gpu_detection_mock.py -- 110 passed.

The probe loop in LlamaCppBackend._wait_for_health only caught ConnectError and TimeoutException. On Windows, when llama-server.exe accepts the TCP probe and then dies before sending HTTP headers, the peer process RST closes the socket. httpx maps this to ReadError ("WinError 10054 -- An existing connection was forcibly closed by the remote host"), which fell through the except clause and bubbled out of _wait_for_health, the routes/inference.py load_model handler, and back to /api/inference/load as an opaque 500. The crash diagnostic Studio actually wants to surface lives on the self._process.poll() branch at the top of the loop body: "llama-server exited with code X. Output: ...". We never reached that branch on the WinError 10054 path because the very first probe blew up. Expand the except to also swallow ReadError and RemoteProtocolError so the next 0.5-second iteration runs the poll() branch. Outcomes: * Process really died: structured exit-code + last-stdout log line. * Single transient probe blip: silently retried; load succeeds. Adds studio/backend/tests/test_llama_cpp_wait_for_health.py with five cases covering happy-path 200, transient ReadError + dead process, RemoteProtocolError + dead process, ConnectError cycling until success, and dead process before the first probe. The new cases would have failed against the old except clause -- ReadError / RemoteProtocolError would have propagated instead of returning False. Found while triaging the Windows Studio GGUF CI flake on this PR's 5a6ddc3 push: llama-server.exe (b9203 prebuilt) crashed within 2.2 s of launch on the GPU-less runner, and Studio reported "WinError 10054" instead of an upstream-tag-attributable exit-code line.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 56537b6978

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-18T04:25:29Z

+sys.modules.setdefault("loggers", _loggers_stub)
+sys.modules.setdefault("structlog", _types.ModuleType("structlog"))
+
+import httpx  # noqa: E402


Guard the httpx import in the health tests

When this new test file is run by itself in a lightweight backend test environment without httpx installed, collection fails here before the fallback code below can add the exception attributes; I reproduced this with python -m pytest studio/backend/tests/test_llama_cpp_wait_for_health.py ... in the current checkout, which raises ModuleNotFoundError: No module named 'httpx'. Neighboring llama_cpp tests stub httpx before importing the production module for this scenario, so this file should use the same guarded/stubbed import path instead of requiring the full dependency just to collect.

Useful? React with 👍 / 👎.

Resolve conflict in studio/backend/core/inference/llama_cpp.py: _wait_for_health's except clause. Main's #5505 landed essentially the same fix this branch shipped in 56537b6 -- catching httpx.ReadError and RemoteProtocolError to keep a transient TCP RST from masking a real subprocess crash -- and added httpx.WriteError on top. Took main's version (strict superset) and dropped this branch's duplicate. Extend tests/test_llama_cpp_wait_for_health.py: - Add WriteError to the stub-completion list so the file imports cleanly against either real httpx or a sibling test's stub. - Add test_write_error_also_swallowed alongside the existing ReadError and RemoteProtocolError cases. All other files in this merge auto-resolved cleanly (install_llama_prebuilt.py, workflows, etc.). Local verification: - pytest test_llama_cpp_wait_for_health.py + windows_nvidia_path + windows_gpu_detection_mock + llama_server_args -- 116 passed. - pytest test_import_fixes_drift.py -- 18 passed. - pytest test_offline_gguf_cache_fallback.py (main's #5505 addition) -- 44 passed.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 06fb256e0c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-18T05:59:28Z

+    sys.path.insert(0, _BACKEND_DIR)
+
+# Match the stubbing pattern in sibling tests so the module imports in
+# a lightweight env without fastapi.


Avoid leaking the fake loggers module

When pytest collects this file before tests that import loggers.handlers (for example studio/backend/tests/test_log_filter_no_truncation.py, which sorts after test_llama_cpp_wait_for_health.py), this unconditional setdefault leaves a plain ModuleType in sys.modules instead of the real loggers package. The later import then fails with 'loggers' is not a package even in environments where the real package is available, because this file never tries to import it before stubbing.

Useful? React with 👍 / 👎.

danielhanchen requested a review from rolandtannous as a code owner May 11, 2026 13:31

[pre-commit.ci] auto fixes from pre-commit.com hooks

5d14f36

for more information, see https://pre-commit.ci

chatgpt-codex-connector Bot reviewed May 11, 2026

View reviewed changes

gemini-code-assist Bot reviewed May 11, 2026

View reviewed changes

chatgpt-codex-connector Bot reviewed May 11, 2026

View reviewed changes

Merge branch 'main' into dh/test-5106-windows-gpu-ci-mock

8c6de7e

chatgpt-codex-connector Bot reviewed May 14, 2026

View reviewed changes

danielhanchen and others added 2 commits May 14, 2026 11:45

[pre-commit.ci] auto fixes from pre-commit.com hooks

4bc0aa4

for more information, see https://pre-commit.ci

danielhanchen mentioned this pull request May 14, 2026

import_fixes + drift detectors: cover transformers 5.x drift (unblocks PR #5376) #5423

Merged

4 tasks

danielhanchen and others added 5 commits May 14, 2026 12:14

[pre-commit.ci] auto fixes from pre-commit.com hooks

7e1ea19

for more information, see https://pre-commit.ci

chatgpt-codex-connector Bot reviewed May 18, 2026

View reviewed changes

danielhanchen merged commit a09e70e into main May 18, 2026
35 checks passed

danielhanchen deleted the dh/test-5106-windows-gpu-ci-mock branch May 18, 2026 07:06

		with _mock_nvidia_smi_run(fake_csv):
		gpus = LlamaCppBackend._get_gpu_free_memory()

Uh oh!

Conversation

danielhanchen commented May 11, 2026

Summary

Test plan

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 11, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot May 11, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 11, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot May 11, 2026

Choose a reason for hiding this comment

Uh oh!

danielhanchen commented May 11, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 14, 2026

Choose a reason for hiding this comment

Uh oh!

danielhanchen commented May 14, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 18, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 18, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants