ggml-webgpu: Check earlier for WebGPU required features #23879

Merged: reeselevine merged 1 commit into ggml-org:master from reeselevine:no-abort on May 29, 2026

@reeselevine

Contributor

Overview

Check for required features earlier, during ggml_backend_webgpu_reg, so that 0 devices are reported and aborts during create_webgpu_device() are avoided. Should fix #23844.

Also make create_webgpu_device() a void function, since we weren't checking anything on the result anyway.
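
As a rough illustration of the idea described above (not the code in this PR), the early check can be sketched as follows. The `mock_adapter` type, the `k_required_features` list, and `count_usable_devices` are all hypothetical names invented for this sketch; treating `shader-f16` as a required feature is an assumption based on the linked issue. The sketch assumes C++17.

```cpp
#include <cstddef>
#include <optional>
#include <set>
#include <string>

// Hypothetical stand-in for a WebGPU adapter; not the real adapter type.
struct mock_adapter {
    std::set<std::string> features;
    bool has_feature(const std::string & name) const { return features.count(name) > 0; }
};

// Hypothetical list of features the backend cannot run without (assumption).
static const char * const k_required_features[] = { "shader-f16" };

// Returns the number of devices the registry should report.
// A missing adapter or a missing required feature yields 0 instead of an abort.
static size_t count_usable_devices(const std::optional<mock_adapter> & adapter) {
    if (!adapter) {
        return 0;
    }
    for (const char * feature : k_required_features) {
        if (!adapter->has_feature(feature)) {
            return 0;
        }
    }
    return 1;
}
```

With a check of this shape at registration time, a caller that sees 0 devices never reaches device creation, so the later step has nothing to abort on.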

Requirements

@reeselevine reeselevine requested a review from a team as a code owner May 29, 2026 17:45
@github-actions github-actions Bot added the ggml and WebGPU labels May 29, 2026

@ngxson ngxson left a comment

Collaborator

Btw, there is another GGML_ASSERT that I think may be worth refactoring:

GGML_ASSERT(ctx->webgpu_global_ctx->adapter != nullptr);

I think if the adapter is null, that means the browser doesn't support WebGPU, or that it's only half-working somehow.

Do you think that in such a case we could still initialize the WebGPU backend but return 0 devices?

@reeselevine

Contributor Author

@ngxson the check in ggml_backend_webgpu_reg ensures the adapter is not null too.

https://github.com/reeselevine/llama.cpp/blob/40036ed480d12c90fe4e4c9e1805126c9b1adaa6/ggml/src/ggml-webgpu/ggml-webgpu.cpp#L4509-L4515

The idea being that if that check passes, then the adapter shouldn't be null later on, otherwise there are probably larger issues.

@ngxson

ngxson commented May 29, 2026

Collaborator

hmm ok then, let's see if any users report problems with this

if necessary, I can intercept the browser's API instead, e.g. return an empty adapter instead of a null value

@reeselevine

Contributor Author

What I'm saying is that we already query for the adapter, so that assertion is just making sure that two consecutive queries don't return conflicting results.

Although yes, I suppose there is a case where a user doesn't care about WebGPU, but still ends up crashing because the first request for the adapter succeeds and the second doesn't. I think I would prefer to wait and see if that actually happens though rather than adding the defensive code first, since it's not totally clear what state we'd end up in (my understanding is that llama.cpp assumes a call to get_device succeeds).
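
To make the "two consecutive queries" point concrete, here is a minimal sketch of that invariant. Everything in it is hypothetical (`adapter_handle`, `adapter_query`, `reg_device_count`, `create_device`); it only mirrors the shape of the argument, not the actual ggml-webgpu code.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Hypothetical adapter handle; a nullptr result means the query failed.
struct adapter_handle { int id; };
using adapter_query = std::function<const adapter_handle *()>;

// First query, at registration: decides how many devices are reported.
static size_t reg_device_count(const adapter_query & query) {
    return query() != nullptr ? 1 : 0;
}

// Second query, at device creation: only reached when a device was reported,
// so a null result here would contradict the first query.
static const adapter_handle * create_device(const adapter_query & query) {
    const adapter_handle * adapter = query();
    assert(adapter != nullptr && "adapter query disagreed with registration");
    return adapter;
}
```

The assertion in `create_device` can only fire if the same query returns different answers on two consecutive calls, which is the corner case being discussed here.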

@BluuHuup

BluuHuup commented May 29, 2026


Thanks for working on this fix, it means a lot.

My plan now is to do a test build with this change and see whether the original Chromium setup avoids the abort and reaches model load/preflight with n_gpu_layers: 0.

I'll check whether the WebGPU backend reports 0 usable devices when the required features are missing, instead of aborting in create_webgpu_device(), and whether the CPU path then continues.


Chromium/Linux setup:

  • Chromium 148.0.0.0
  • WebGPU present
  • adapter available
  • shader-f16: false
  • wllama: 3.1.1
  • n_gpu_layers: 0
  • small GGUF: Qwen2.5-0.5B-Instruct-Q4_K_M.gguf

@reeselevine reeselevine merged commit 151f3a9 into ggml-org:master May 29, 2026
16 of 31 checks passed
gabe-l-hart added a commit to gabe-l-hart/llama.cpp that referenced this pull request May 29, 2026
fewtarius pushed a commit to fewtarius/llama.cpp that referenced this pull request May 30, 2026
o7si added a commit to o7si/llama.cpp that referenced this pull request May 31, 2026
turbo-tan pushed a commit to turbo-tan/llama.cpp-tq3 that referenced this pull request Jun 2, 2026
Labels

ggml, WebGPU

Development

Successfully merging this pull request may close these issues.

ggml Chromium WebGPU ShaderF16 error/assertion stops CPU/WASM fallback when n_gpu_layers=0

5 participants