Skip to content

tests/studio: lock in Windows GPU detection fix (#5106) with a synthetic CI test#5376

Merged
danielhanchen merged 12 commits into
mainfrom
dh/test-5106-windows-gpu-ci-mock
May 18, 2026
Merged

tests/studio: lock in Windows GPU detection fix (#5106) with a synthetic CI test#5376
danielhanchen merged 12 commits into
mainfrom
dh/test-5106-windows-gpu-ci-mock

Conversation

@danielhanchen

Copy link
Copy Markdown
Member

Summary

End-to-end regression test that locks in the Windows GPU detection fix shipped by #5322 + #5324. Runs as a normal pytest in studio/backend/tests/ (Ubuntu pytest matrix in studio-backend-ci.yml); also touches studio/install_llama_prebuilt.py so the Windows update / inference smoke CI runs on this PR as a post-merge sanity check.

Why we need a synthetic test even after the two fixes landed: the failure mode is on Windows hosts without a system CUDA toolkit. CI runners do not have GPUs and nvidia-smi is absent. This test mocks nvidia-smi output and packs a synthetic Windows layout that mirrors real artefacts, so any future regression in the win32 PATH builder or the paired cudart download is caught without needing a real GPU.

What is asserted, against the realistic synthetic layout:

The mocks reflect real artefact contents verified empirically:

  • pip download nvidia-cuda-runtime --platform win_amd64 produces nvidia/cu13/bin/x86_64/cudart64_13.dll.
  • unzip on the upstream b9103/cudart-llama-bin-win-cuda-13.1-x64.zip produces exactly cudart64_13.dll + cublas64_13.dll + cublasLt64_13.dll and nothing else.
  • objdump -p on the upstream b9103/ggml-cuda.dll shows a static PE import on cublas64_13.dll, which is the root cause of [Bug] Not Unsloth Studio detects my GPU But not use but only uses CPU/RAM #5106 when cublas64_13.dll is unreachable.

Housekeeping: one lint nit dropped from #5322's windows_cuda_attempts (f"llama-" -> "llama-"; no behaviour change).

Test plan

  • pytest studio/backend/tests/test_5106_windows_gpu_detection_mock.py -- 10 new cases pass.
  • pytest studio/backend/tests/test_llama_cpp_*.py studio/backend/tests/test_llama_server_args.py tests/studio/install/{test_install_llama_prebuilt_logic,test_selection_logic,test_pr4562_bugfixes,test_llama_pr_force_and_source}.py -- 424 passed, no regressions.
  • Windows smoke CI (update + inference) on this PR -- triggered by the studio/install_llama_prebuilt.py touch and the new file under studio/backend/**.

Refs #5106 #5322 #5324

Locks in the combined fix from #5322 + #5324 with a synthetic
Windows scenario that CI runners without GPUs can execute. The
test packs the real PyPI win_amd64 wheel layouts (cu12 modular and
the new unsuffixed cu13 nvidia/cu13/bin/x86_64 layout) plus the
exact filename set of the upstream b9103 cudart-llama-bin-win-cuda
bundles, then mocks nvidia-smi output and asserts that:

 * Studio's nvidia-smi probe parses the CSV and reports the GPU.
 * After PR #5322 the install_dir/build/bin/Release/ tree contains
   all three cudart bundle DLLs alongside llama-server.exe.
 * After PR #5324 the PATH built by start_llama_server's win32
   branch lists pip nvidia + torch/lib dirs in addition to the
   binary_dir.
 * cudart64_X.dll, cublas64_X.dll, and cublasLt64_X.dll are
   each reachable from at least one PATH entry, with cudart
   specifically reachable from BOTH the install dir and a pip
   nvidia dir (defence in depth).
 * Bare venvs without pip nvidia wheels still work via #5322's
   binary_dir drop; pre-#5322 installs still work via #5324's
   PATH augmentation.
 * A reconstructed pre-PR scenario (cudart absent from binary_dir
   and pip dirs not on PATH) leaves cudart unreachable, confirming
   the test would catch a future regression.

Bonus housekeeping in studio/install_llama_prebuilt.py: drop the
pointless f-prefix on the literal "llama-" in the
windows_cuda_attempts pairing guard (no behaviour change; lint
nit flagged in the post-merge review).

The mocks model real artifact contents I verified empirically:
 * pip download nvidia-cuda-runtime --platform win_amd64
   produces nvidia/cu13/bin/x86_64/cudart64_13.dll.
 * unzip on the b9103 cudart-llama-bin-win-cuda-13.1-x64.zip
   produces exactly cudart64_13.dll + cublas64_13.dll +
   cublasLt64_13.dll, no executables.
 * objdump -p on the b9103 ggml-cuda.dll shows a static PE
   import on cublas64_13.dll (the root cause of #5106 when
   cublas64_13.dll is unreachable).

Refs #5106 #5322 #5324

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5d14f36613

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

"__exit__": lambda self, *a: None,
},
)
sys.modules.setdefault("httpx", _httpx_stub)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Keep the httpx stub from leaking across tests

When this new test file is collected before the rest of studio/backend/tests (its test_5106... name makes that the normal alphabetical order), this setdefault installs the minimal fake httpx module into global sys.modules before any test has imported the real library. Later tests and backend modules then receive this fake instead of real httpx, but the fake lacks APIs such as Request, RequestError, AsyncClient, and Limits, so suites that include passthrough/inference tests can fail or exercise the wrong module. Please mirror the existing guarded pattern that first tries to import real httpx, or scope/remove the stub after importing LlamaCppBackend.

Useful? React with 👍 / 👎.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a comprehensive end-to-end regression test for Windows GPU detection issues, specifically addressing cases where CUDA DLLs were not correctly resolved. It also includes a minor cleanup in the installation script to remove an unnecessary f-string. The reviewer pointed out that the test duplicates PATH-building logic from the production code, which creates a maintenance risk; they recommend refactoring this logic into a shared helper method within the LlamaCppBackend class to ensure consistency.

Comment on lines +164 to +182
def _build_path_dirs_like_start_llama_server(
binary_dir: Path, prefix: Path, cuda_path: str = ""
) -> list[str]:
"""Faithful reproduction of the win32 branch in
LlamaCppBackend.start_llama_server. Returns the ordered list of
PATH entries we prepend to the inherited env. Production code:
studio/backend/core/inference/llama_cpp.py:2340-2363.
"""
pip_dirs = LlamaCppBackend._windows_pip_nvidia_dll_dirs(str(prefix))
path_dirs = [str(binary_dir)]
path_dirs.extend(pip_dirs)
if cuda_path:
cuda_bin = os.path.join(cuda_path, "bin")
if os.path.isdir(cuda_bin):
path_dirs.append(cuda_bin)
cuda_bin_x64 = os.path.join(cuda_path, "bin", "x64")
if os.path.isdir(cuda_bin_x64):
path_dirs.append(cuda_bin_x64)
return path_dirs

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This function is a faithful reproduction of logic in LlamaCppBackend.start_llama_server, which introduces code duplication. This creates a maintenance risk, as changes in the production code might not be reflected here, leading to the test becoming outdated. To improve maintainability and adhere to the DRY principle, consider refactoring the PATH-building logic from LlamaCppBackend.start_llama_server into its own helper method within the LlamaCppBackend class. This new method could then be called from both start_llama_server and this test file, ensuring consistency and eliminating the duplicated code.

References
  1. Centralize recurring or complex logical checks (like determining if a host is external) into a single helper function and reuse it across the codebase to ensure consistency and simplify maintenance.

This file's name sorts before every other file in studio/backend/tests/
(starts with the digit '5'), so pytest collects it first. The previous
``sys.modules.setdefault("httpx", _httpx_stub)`` ran before any other
test imported real httpx, which meant the stub permanently shadowed
the real module for the rest of the collection. Tests that did
``from httpx import HTTPError, Response`` (test_anthropic_messages,
test_browse_folders_route, test_training_*, etc) then failed at
collection with ``ImportError: cannot import name 'HTTPError'``
because the stub did not define those names. The existing
test_llama_cpp_windows_nvidia_path.py did not trigger the same issue
because it sorts after test_a* / test_b* / etc, by which point the
real httpx has already been imported and setdefault is a no-op.

Switch the stub installation to ``importlib.util.find_spec(name) is
None`` so we only fall back to the stub when the real module truly is
not installed. Backend CI installs httpx, structlog, and the
studio/backend/loggers package is reachable via the sys.path
augmentation a few lines above, so on CI all three find_spec calls
succeed and no stubs are installed at all.

Also add HTTPError and Response to the stub module for the offline
case, so anyone running this test outside CI with httpx absent still
gets a stub that satisfies the broader test suite's imports.

Refs #5106

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7437e73508

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +64 to +66
def _maybe_stub(name: str, builder):
if _importlib_util.find_spec(name) is None:
sys.modules[name] = builder()

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Stub local loggers before importing llama_cpp

In lightweight test runs that do not install FastAPI, this guard still lets the real studio/backend/loggers package load because _BACKEND_DIR is already on sys.path, so find_spec("loggers") succeeds even though loggers/handlers.py immediately imports fastapi. The new test then fails during collection before any case runs; neighboring llama_cpp tests avoid this by stubbing the local loggers module unconditionally while only being careful not to shadow real third-party modules like httpx.

Useful? React with 👍 / 👎.

Comment on lines +199 to +201
pip_dirs = LlamaCppBackend._windows_pip_nvidia_dll_dirs(str(prefix))
path_dirs = [str(binary_dir)]
path_dirs.extend(pip_dirs)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Exercise the real PATH assembly instead

Because this helper hand-copies the start_llama_server Windows PATH logic, the regression test will still pass if the production win32 branch later drops or reorders self._windows_pip_nvidia_dll_dirs(sys.prefix): the assertions only validate this duplicate implementation plus the resolver. For the #5324 regression this PR is meant to lock down, the test needs to observe the actual environment passed by start_llama_server (for example by mocking subprocess.Popen) rather than rebuilding path_dirs here.

Useful? React with 👍 / 👎.

@danielhanchen

Copy link
Copy Markdown
Member Author

CI green: 33 success, 0 failures, 0 pending on 7437e735. Ready for review.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8c6de7e0ab

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +244 to +245
with _mock_nvidia_smi_run(fake_csv):
gpus = LlamaCppBackend._get_gpu_free_memory()

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Pin CUDA_VISIBLE_DEVICES for the mocked nvidia-smi probe

In test environments that inherit CUDA_VISIBLE_DEVICES (for example CUDA_VISIBLE_DEVICES=1 on a shared GPU runner, or an empty mask), _get_gpu_free_memory() filters the mocked nvidia-smi rows before returning them, so this test can return [] or fall through to the torch fallback instead of [(0, 22805)]. Please clear the variable or set it to 0 for this probe so the synthetic CSV is independent of the parent environment.

Useful? React with 👍 / 👎.

danielhanchen and others added 2 commits May 14, 2026 11:45
…ssion test

Follow-up to PR #5376's review feedback. Three real findings from the
bot reviewers, plus one stale one.

1. (codex P2 line 201, gemini medium line 209) The regression test's
   _build_path_dirs_like_start_llama_server hand-copied the win32
   branch of LlamaCppBackend.start_llama_server, so a future drop or
   reorder of _windows_pip_nvidia_dll_dirs(sys.prefix) in production
   would have passed the test silently.

   Extract a new staticmethod LlamaCppBackend._build_windows_path_dirs
   (binary_dir, prefix, cuda_path). Production start_llama_server now
   calls this helper. The test's wrapper is reduced to a one-line
   delegate that forwards to the staticmethod, so the regression
   asserts against the exact production logic instead of a parallel
   copy of it.

2. (codex P2 line 245) test_nvidia_smi_probe_reports_synthetic_gpu did
   not clear CUDA_VISIBLE_DEVICES. On a shared GPU runner with the
   variable set in the parent shell, _get_gpu_free_memory() filters
   the mocked CSV and returns [] or falls through to the torch
   fallback. Cleared CUDA_VISIBLE_DEVICES and NVIDIA_VISIBLE_DEVICES
   via monkeypatch.delenv(..., raising=False).

3. (codex P2 line 66) _maybe_stub gated on importlib.util.find_spec
   ("loggers"), which returns a spec because studio/backend/loggers/
   is on sys.path. But the actual import chain loads
   loggers/handlers.py which does `from fastapi import Request,
   Response` at module load. In a lightweight env without fastapi
   installed, the stub never lands and `from core.inference.llama_cpp
   import LlamaCppBackend` raises during collection. Switched
   _maybe_stub to a real import attempt under try / except ImportError
   so the stub falls into place when the package is discoverable but
   not importable. CI has fastapi so this is purely a developer-
   machine ergonomics fix.

The fourth comment (codex P1 line 85 "Keep the httpx stub from leaking
across tests") was already addressed by 7437e73, which replaced the
unconditional sys.modules.setdefault with the find_spec-gated
_maybe_stub. No code change needed.

Production behaviour is unchanged: _build_windows_path_dirs returns
exactly the same ordering start_llama_server used inline
([binary_dir, *pip_dirs, cuda_bin?, cuda_bin_x64?]).

Verification (run inside studio/backend):
  pytest tests/test_5106_windows_gpu_detection_mock.py -v
    -> 10 passed
  pytest tests/test_llama_cpp_*.py tests/test_llama_server_args.py
       tests/test_5106_windows_gpu_detection_mock.py -q
    -> 171 passed
  CUDA_VISIBLE_DEVICES=1 pytest tests/test_5106_windows_gpu_detection_mock.py::TestWindowsGpuDetectionAfter5106Fix::test_nvidia_smi_probe_reports_synthetic_gpu
    -> 1 passed
@danielhanchen

Copy link
Copy Markdown
Member Author

Pushed 7a2140d4 to address the three real findings from the five bot review threads on this PR.

1. codex P2 line 201 + gemini medium line 209 (same concern, dedup'd).

The local _build_path_dirs_like_start_llama_server hand-copied the win32 branch of LlamaCppBackend.start_llama_server, so a future drop or reorder of _windows_pip_nvidia_dll_dirs(sys.prefix) would have passed this regression silently. Extracted a new staticmethod LlamaCppBackend._build_windows_path_dirs(binary_dir, prefix, cuda_path) and rewired both sides:

  • Production start_llama_server at llama_cpp.py:2436 now calls the helper.
  • The test's _build_path_dirs_like_start_llama_server is reduced to a one-line delegate that forwards to the same staticmethod, so all 10 cases now assert against the actual production logic.

Same observable PATH ordering as before ([binary_dir, *pip_dirs, cuda_bin?, cuda_bin_x64?]).

2. codex P2 line 245.

test_nvidia_smi_probe_reports_synthetic_gpu did not clear CUDA_VISIBLE_DEVICES. On a shared GPU runner with the variable set in the parent shell, _get_gpu_free_memory() filtered the mocked CSV and returned [] or fell through to the torch fallback. Cleared both CUDA_VISIBLE_DEVICES and NVIDIA_VISIBLE_DEVICES via monkeypatch.delenv(..., raising=False). Verified the test still passes with CUDA_VISIBLE_DEVICES=1 set in the parent shell.

3. codex P2 line 66.

_maybe_stub gated on importlib.util.find_spec("loggers"), which returns a spec because studio/backend/loggers/ is on sys.path. But the actual import chain loads loggers/handlers.py which does from fastapi import Request, Response at module load. In a lightweight env without fastapi the stub never landed and from core.inference.llama_cpp import LlamaCppBackend raised during collection. Switched _maybe_stub to a real import_module attempt under try / except ImportError so the stub falls into place when the package is discoverable but not importable. CI has fastapi so this never fired there; this is purely a developer-machine ergonomics fix.

4. codex P1 line 85 (stale).

Already addressed by 7437e735 which replaced the unconditional sys.modules.setdefault("httpx", stub) with the find_spec-gated _maybe_stub. The bot was looking at the pre-fix snapshot.

Verification, run inside studio/backend:

pytest tests/test_5106_windows_gpu_detection_mock.py -v
  -> 10 passed
pytest tests/test_llama_cpp_*.py tests/test_llama_server_args.py tests/test_5106_windows_gpu_detection_mock.py -q
  -> 171 passed
CUDA_VISIBLE_DEVICES=1 pytest tests/test_5106_windows_gpu_detection_mock.py::TestWindowsGpuDetectionAfter5106Fix::test_nvidia_smi_probe_reports_synthetic_gpu
  -> 1 passed

pull Bot pushed a commit to steelburn/unsloth that referenced this pull request May 14, 2026
…ai#5423)

PR unslothai#5414's drift detectors and the corresponding import_fixes helpers
were written against transformers 4.x. The Repo tests (CPU) step (which
installs transformers>=4.51,<5.5 and currently resolves to 5.4.0)
surfaces three real predicate gaps:

  * test_pretrained_model_enable_input_require_grads_uses_old_pattern
    fires DRIFT DETECTED whenever the source contains
    "for module in self.modules()". But the unsloth replacement that
    patch_enable_input_require_grads installs ALSO uses that pattern --
    deliberately, just wrapped in try / except NotImplementedError. So
    the predicate cannot distinguish broken upstream from the working
    patch. Accept either pre-HF#41993 shape (no self.modules() loop) or
    the post-patch shape (loop + NotImplementedError handler).

  * test_transformers_torchcodec_available_flag_is_present asserts the
    pre-5.x module-level _torchcodec_available flag. transformers 5
    replaced it with an lru_cache'd is_torchcodec_available() callable.
    Accept either symbol. Also update disable_torchcodec_if_broken to
    actually disable on 5.x: clear the cache and rebind the function to
    return False.

Local verification:

  * transformers 4.57.6 + trl 0.25.1 + peft 0.19.1 + triton 3.5.1 +
    vllm 0.15.1+cu130 (the Core HF=4.57.6 cell shape): 18 passed.
  * transformers 5.8.1 + peft 0.19.1 + torch 2.9 CPU (the Repo tests
    (CPU) shape, drop trl / vllm / datasets / xformers): 12 passed,
    6 skipped on missing optional libs, 0 failed.

PR unslothai#5376's Repo tests (CPU) failure was a triple:
  - triton + enable_input_require_grads: fixed by merging current main
    (PR unslothai#5421's relaxed triton predicate + conftest 'import unsloth').
  - torchcodec: fixed by THIS PR.
danielhanchen and others added 5 commits May 14, 2026 12:14
Brings in PR #5421 (drift detector parity: relaxed triton predicate +
conftest 'import unsloth') and PR #5423 (transformers 5.x coverage:
enable_input_require_grads + torchcodec) which together close the three
DRIFT DETECTED failures on this branch's Repo tests (CPU) cell.
Resolve conflict in studio/backend/core/inference/llama_cpp.py: take main's
load_model body wrapped in the serial_load_lock (#5401 / #5161) and the
#5347 mmproj guard, layered with this PR's two intended edits:

- New @staticmethod _build_windows_path_dirs(binary_dir, prefix, cuda_path)
  next to _windows_pip_nvidia_dll_dirs, returning the win32 PATH ordering
  binary_dir, pip-nvidia wheels, CUDA_PATH/bin, CUDA_PATH/bin/x64.
- start_llama_server's win32 branch now calls the helper instead of
  re-implementing the same ordering inline.

The PR's own install_llama_prebuilt.py nit (f"llama-" -> "llama-") and the
new test file under studio/backend/tests/ auto-merged cleanly.

Picks up PR #5423's transformers 5.x drift-detector predicate fixes via the
merge, which is what was failing on the three Core CI jobs.
…ents

- studio/backend/tests/test_5106_windows_gpu_detection_mock.py
  -> studio/backend/tests/test_windows_gpu_detection_mock.py
  The file is the generic regression suite for Windows GPU detection;
  encoding the issue number in the filename is noise.
- Shorten module docstring, helper docstrings, per-test docstrings and
  inline comments in the renamed test file. No behaviour change,
  all 10 cases still pass.
- Shorten the _build_windows_path_dirs docstring in
  studio/backend/core/inference/llama_cpp.py and update the test-path
  reference; trim the win32 call-site comment to one line.

Local verification:
- pytest studio/backend/tests/test_windows_gpu_detection_mock.py -- 10 passed.
- pytest studio/backend/tests/test_llama_cpp_windows_nvidia_path.py
  studio/backend/tests/test_llama_server_args.py
  studio/backend/tests/test_windows_gpu_detection_mock.py -- 110 passed.
The probe loop in LlamaCppBackend._wait_for_health only caught
ConnectError and TimeoutException. On Windows, when llama-server.exe
accepts the TCP probe and then dies before sending HTTP headers, the
peer process RST closes the socket. httpx maps this to ReadError
("WinError 10054 -- An existing connection was forcibly closed by the
remote host"), which fell through the except clause and bubbled out of
_wait_for_health, the routes/inference.py load_model handler, and back
to /api/inference/load as an opaque 500.

The crash diagnostic Studio actually wants to surface lives on the
self._process.poll() branch at the top of the loop body: "llama-server
exited with code X. Output: ...". We never reached that branch on the
WinError 10054 path because the very first probe blew up.

Expand the except to also swallow ReadError and RemoteProtocolError so
the next 0.5-second iteration runs the poll() branch. Outcomes:
  * Process really died: structured exit-code + last-stdout log line.
  * Single transient probe blip: silently retried; load succeeds.

Adds studio/backend/tests/test_llama_cpp_wait_for_health.py with five
cases covering happy-path 200, transient ReadError + dead process,
RemoteProtocolError + dead process, ConnectError cycling until success,
and dead process before the first probe. The new cases would have
failed against the old except clause -- ReadError / RemoteProtocolError
would have propagated instead of returning False.

Found while triaging the Windows Studio GGUF CI flake on this PR's
5a6ddc3 push: llama-server.exe (b9203 prebuilt) crashed within 2.2 s of
launch on the GPU-less runner, and Studio reported "WinError 10054"
instead of an upstream-tag-attributable exit-code line.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 56537b6978

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

sys.modules.setdefault("loggers", _loggers_stub)
sys.modules.setdefault("structlog", _types.ModuleType("structlog"))

import httpx # noqa: E402

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Guard the httpx import in the health tests

When this new test file is run by itself in a lightweight backend test environment without httpx installed, collection fails here before the fallback code below can add the exception attributes; I reproduced this with python -m pytest studio/backend/tests/test_llama_cpp_wait_for_health.py ... in the current checkout, which raises ModuleNotFoundError: No module named 'httpx'. Neighboring llama_cpp tests stub httpx before importing the production module for this scenario, so this file should use the same guarded/stubbed import path instead of requiring the full dependency just to collect.

Useful? React with 👍 / 👎.

Resolve conflict in studio/backend/core/inference/llama_cpp.py:
_wait_for_health's except clause. Main's #5505 landed essentially the
same fix this branch shipped in 56537b6 -- catching httpx.ReadError
and RemoteProtocolError to keep a transient TCP RST from masking a
real subprocess crash -- and added httpx.WriteError on top. Took main's
version (strict superset) and dropped this branch's duplicate.

Extend tests/test_llama_cpp_wait_for_health.py:
- Add WriteError to the stub-completion list so the file imports
  cleanly against either real httpx or a sibling test's stub.
- Add test_write_error_also_swallowed alongside the existing ReadError
  and RemoteProtocolError cases.

All other files in this merge auto-resolved cleanly (install_llama_prebuilt.py,
workflows, etc.).

Local verification:
- pytest test_llama_cpp_wait_for_health.py + windows_nvidia_path +
  windows_gpu_detection_mock + llama_server_args -- 116 passed.
- pytest test_import_fixes_drift.py -- 18 passed.
- pytest test_offline_gguf_cache_fallback.py (main's #5505 addition) -- 44 passed.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 06fb256e0c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

sys.path.insert(0, _BACKEND_DIR)

# Match the stubbing pattern in sibling tests so the module imports in
# a lightweight env without fastapi.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Avoid leaking the fake loggers module

When pytest collects this file before tests that import loggers.handlers (for example studio/backend/tests/test_log_filter_no_truncation.py, which sorts after test_llama_cpp_wait_for_health.py), this unconditional setdefault leaves a plain ModuleType in sys.modules instead of the real loggers package. The later import then fails with 'loggers' is not a package even in environments where the real package is available, because this file never tries to import it before stubbing.

Useful? React with 👍 / 👎.

@danielhanchen danielhanchen merged commit a09e70e into main May 18, 2026
35 checks passed
@danielhanchen danielhanchen deleted the dh/test-5106-windows-gpu-ci-mock branch May 18, 2026 07:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants