ggml-webgpu: Check earlier for WebGPU required features #23879
ngxson
left a comment
Btw, there might be another GGML_ASSERT that I think may be worth refactoring:
`GGML_ASSERT(ctx->webgpu_global_ctx->adapter != nullptr);`
I think if the adapter is null, that means the browser doesn't support WebGPU, or it's maybe half-working somehow.
Do you think in such a case we can still initialize the WebGPU backend but return 0 devices?
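To make the question concrete, here is a minimal sketch of the pattern under discussion: probe the adapter once at registration time and report 0 devices when it is missing or lacks a required feature, so that the later assertion only guards an invariant. Everything in it is hypothetical and for illustration only: `FakeAdapter`, `kRequiredFeatures`, the feature names, `probe_device_count` and `create_device` are stand-ins, not the actual ggml-webgpu or Dawn types and functions. It assumes C++17 for `std::optional`.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <set>
#include <string>

// Hypothetical stand-in for a WebGPU adapter (not the real ggml or Dawn type).
struct FakeAdapter {
    std::set<std::string> features;
};

// Hypothetical placeholder names for the features the backend cannot run without.
static const std::set<std::string> kRequiredFeatures = {"feature-a", "feature-b"};

// Registration-time probe: returns how many devices the backend should report.
// A missing adapter or a missing required feature yields 0 instead of an abort.
static size_t probe_device_count(const std::optional<FakeAdapter> & adapter) {
    if (!adapter.has_value()) {
        return 0;
    }
    for (const auto & feature : kRequiredFeatures) {
        if (adapter->features.count(feature) == 0) {
            return 0;
        }
    }
    return 1;
}

// Device creation: only meant to be called when the probe reported a device,
// so a missing adapter here is an invariant violation rather than a user error.
static void create_device(const std::optional<FakeAdapter> & adapter) {
    assert(adapter.has_value());
}
```

A caller would skip `create_device` entirely when `probe_device_count` returns 0, which is the "backend initializes but exposes no devices" outcome asked about above.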
@ngxson the check in [...] The idea being that if that check passes, then the adapter shouldn't be null later on; otherwise there are probably larger issues.
Hmm, ok then, let's see if any users report problems about this. If necessary, I can intercept the browser's API instead, e.g. return an empty adapter instead of a null value.
What I'm saying is that we already query for the adapter, so that assertion is just making sure that two consecutive queries don't return conflicting results. Although yes, I suppose there is a case where a user doesn't care about WebGPU, but still ends up crashing because the first request for the adapter succeeds and the second doesn't. I think I would prefer to wait and see if that actually happens though rather than adding the defensive code first, since it's not totally clear what state we'd end up in (my understanding is that llama.cpp assumes a call to
Thanks for working on this fix, it means a lot. My plan now is to do a test build with this change and see if the original Chromium setup avoids the 'abort' and reaches model load/preflight with [...] I'll check if the WebGPU backend reports 0 usable devices when the required features are missing, instead of aborting in the Chromium/Linux setup:
Overview
Check for required features earlier during `ggml_backend_webgpu_reg` so that 0 devices are reported, avoiding aborts during `create_webgpu_device()`. Should fix #23844.

Also have `create_webgpu_device()` be a `void` function, since we weren't checking anything on the result anyways.

Requirements