fix(llamacpp-upstream): bump Windows CUDA-13 backend 13.1 → 13.3 (fixes "Failed to download GPU backend")#42

Merged
Vect0rM merged 1 commit into AtomicBot-ai:main from danyurkin:fix/cuda-13-3-backend-download
Jun 3, 2026
@danyurkin

Problem

Since 1.1.95, Windows users on CUDA 13 hosts get "Failed to download GPU backend". The "Better Configuration Available → CUDA 13" prompt offers a backend that 404s on download.

Root cause: ggml-org/llama.cpp renamed the Windows CUDA-13 release asset from llama-{tag}-bin-win-cuda-13.1-x64.zip to ...-cuda-13.3-x64.zip (visible on release b9495). The llamacpp-upstream provider hardcodes the 13.1 minor as the canonical backend id, so it keeps recommending win-cuda-13.1-x64 to CUDA-13 hosts and resolves the download to an asset name that no longer exists.

Observed download attempt from a reporter (1.1.95):

Downloading backend b9495/win-cuda-13.1-x64 from
https://github.com/ggml-org/llama.cpp/releases/download/b9495/llama-b9495-bin-win-cuda-13.1-x64.zip
Error: Failed to get file size: HTTP status 404 Not Found

The release only ships llama-b9495-bin-win-cuda-13.3-x64.zip.
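
To make the failure concrete, here is a minimal sketch of how a hardcoded backend id turns into the 404 URL. The URL template is copied from the log above; `buildAssetUrl` is a hypothetical helper written for illustration and is not the provider's actual getBackendDownloadUrl implementation.

```typescript
// Hypothetical helper: the template mirrors the URL in the reporter's log.
function buildAssetUrl(tag: string, backend: string): string {
  const base = "https://github.com/ggml-org/llama.cpp/releases/download";
  return `${base}/${tag}/llama-${tag}-bin-${backend}.zip`;
}

// Backend id hardcoded before this PR: points at an asset the release no longer ships.
const staleUrl = buildAssetUrl("b9495", "win-cuda-13.1-x64");

// Backend id after this PR: matches the asset name present on release b9495.
const fixedUrl = buildAssetUrl("b9495", "win-cuda-13.3-x64");

console.log(staleUrl);
console.log(fixedUrl);
```

Because the minor version is part of the backend id itself, every place that treats the id as an identity string has to change together.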

Fix (hotfix)

Bump the canonical Windows CUDA-13 minor 13.1 → 13.3 everywhere it's treated as an identity string, across the TS extension and the Rust plugin:

  • recommendation tier (detectIdealBackendType), static variant list, remote-asset whitelist
  • cudart sidecar filename + toolkit-version maps + backend regex
  • category matcher + priority list + is_cuda_installed cudart lib lookup
  • map_old_backend_to_new canonical target

Back-compat: legacy cuda-13.1 is still recognized by the category matcher and is migrated forward to 13.3 by map_old_backend_to_new, so an already-installed win-cuda-13.1-x64 folder is not orphaned. Updated the affected Rust unit tests accordingly.
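
The back-compat behavior can be sketched roughly as follows. This is an illustrative TypeScript mirror only: the real map_old_backend_to_new lives in the Rust plugin, and the names `LEGACY_TO_CANONICAL`, `mapOldBackendToNew`, and `isWinCuda13` are assumptions made for this example.

```typescript
// Assumed table: legacy backend ids and the canonical id they migrate to.
const LEGACY_TO_CANONICAL: Record<string, string> = {
  "win-cuda-13.1-x64": "win-cuda-13.3-x64",
};

// Sketch of the forward migration: legacy ids are rewritten, all others pass through.
function mapOldBackendToNew(backend: string): string {
  return LEGACY_TO_CANONICAL[backend] ?? backend;
}

// Sketch of a category matcher that still recognizes the legacy 13.1 minor.
function isWinCuda13(backend: string): boolean {
  return /^win-cuda-13\.(1|3)-x64$/.test(backend);
}

const migrated = mapOldBackendToNew("win-cuda-13.1-x64");
const untouched = mapOldBackendToNew("win-cuda-13.3-x64");
console.log(migrated, untouched, isWinCuda13("win-cuda-13.1-x64"));
```

Keeping the legacy id in the matcher while changing only the migration target is what lets an existing win-cuda-13.1-x64 install be picked up rather than ignored.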

Out of scope / follow-ups

  • Driver gate unchanged. The CUDA-13 gate (>= 581.15) is the documented minimum for Toolkit 13.1; CUDA 13.3's minimum Windows driver should be verified separately. Low risk here because the runtime device probe already degrades to 12.4 / Vulkan / CPU when a recommended tier can't enumerate a GPU.
  • Root-cause hardening (so this never recurs on the next upstream minor bump) is tracked separately — see the companion issue: resolve the CUDA-13.x asset name dynamically from the release instead of hardcoding the minor.
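
One possible shape for the hardening follow-up, as a sketch under assumptions: given the asset names of a release (however they are fetched), pick the Windows CUDA-13 asset with the highest minor instead of hardcoding it. `resolveWinCuda13Asset` and the sample asset list are hypothetical and are not taken from the companion issue.

```typescript
// Hypothetical resolver: returns the win-cuda-13.x asset with the highest minor
// for the given tag, or undefined if the release ships none.
function resolveWinCuda13Asset(tag: string, assetNames: string[]): string | undefined {
  const pattern = /^llama-(.+)-bin-win-cuda-13\.(\d+)-x64\.zip$/;
  let best: { name: string; minor: number } | undefined;
  for (const name of assetNames) {
    const m = pattern.exec(name);
    // Skip names that are not CUDA-13 Windows assets or belong to another tag.
    if (!m || m[1] !== tag) continue;
    const minor = Number(m[2]);
    if (!best || minor > best.minor) best = { name, minor };
  }
  return best?.name;
}

// Illustrative asset list; only the 13.3 name is confirmed by this PR.
const sampleAssets = [
  "llama-b9495-bin-win-cuda-13.3-x64.zip",
  "llama-b9495-bin-win-vulkan-x64.zip",
];

const resolved = resolveWinCuda13Asset("b9495", sampleAssets);
console.log(resolved);
```

With this approach a future upstream minor bump would change the resolved name automatically, while a release with no CUDA-13 asset yields undefined so the caller can fall back to another tier.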

Testing

  • CI (TS typecheck/vitest + cargo test for tauri-plugin-llamacpp-upstream)
  • Manual: on a Windows CUDA-13 host, accept the "CUDA 13" upgrade prompt and confirm the backend downloads (no 404).

🤖 Generated with Claude Code

ggml-org/llama.cpp renamed the Windows CUDA-13 release asset from
`...-bin-win-cuda-13.1-x64.zip` to `...-bin-win-cuda-13.3-x64.zip`
(observed on release b9495). The upstream provider hardcoded the
`13.1` minor as the canonical backend id, so on >= 1.1.95:

  - detectIdealBackendType() recommends `win-cuda-13.1-x64` to every
    CUDA-13 host ("Better Configuration Available → CUDA 13"), and
  - getBackendDownloadUrl() resolves it to
    `llama-b9495-bin-win-cuda-13.1-x64.zip` → HTTP 404
    ("Failed to download GPU backend").

Bump the canonical Windows CUDA-13 minor to 13.3 across the TS
extension and the Rust plugin (recommendation, static variant list,
remote whitelist, cudart sidecar filename, toolkit-version map,
category matcher, priority list, cudart lib lookup). Legacy
`cuda-13.1` is still recognized by the category matcher and migrated
forward by map_old_backend_to_new(), so already-installed
`win-cuda-13.1-x64` folders are not orphaned.

Note: the CUDA-13 driver gate (>= 581.15) is left unchanged — it is
the documented minimum for Toolkit 13.1; CUDA 13.3's minimum driver
should be verified as a follow-up. The runtime device probe already
degrades to 12.4 / Vulkan / CPU if a recommended tier cannot
enumerate a GPU.

Fixes the "Failed to download GPU backend" report on Windows CUDA 13.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@danyurkin

Root-cause hardening follow-up tracked in #43 (resolve the CUDA-13 asset minor dynamically so this can't recur on the next ggml-org bump).