ggml Chromium WebGPU ShaderF16 error/assertion stops CPU/WASM fallback when n_gpu_layers=0 #23844

@BluuHuup

Hey,


I hit this snag through wllama, but the native abort seems to originate from ggml-webgpu.cpp, so I am reporting it here after ngxson suggested taking it upstream.

CC @reeselevine


I'm working (with the help of codex) on a "simple", offline-first, CPU-only, educational PWA, and I've run into a bit of a doozy with the way wllama loads the model in Chromium: Chromium fails during model setup, before preflight/generation.

Chromium logs show a 'ShaderF16 WebGPU' assertion during wllamaStart(), even though the app is set up with n_gpu_layers: 0.
With GPU layers disabled, I would expect either 'CPU/WASM' operation or a 'no WebGPU device' message, right?

Instead, Chromium throws this error:

/source/llama.cpp/ggml/src/ggml-webgpu/ggml-webgpu.cpp:3699:
GGML_ASSERT(ctx->webgpu_global_ctx->adapter.HasFeature(wgpu::FeatureName::ShaderF16)) failed

Then it shows:

Received abort signal from llama.cpp; Message: (empty)
Aborted()
Cannot find waiting task with callbackId = ...

Btw, the same app/model path works in Firefox in both the online mode and the offline cached Model mode, so I think the model, app shell, service worker, and cache path are not the root issue.

Here's a rundown of a codex-assisted session that traced the timing of WebGPU/backend init relative to model load params:

- `wllamaStart()` is called before the normal load action.
- Native backend init happens through `wllama_start()` / `llama_backend_init()`.
- Load params are converted for the load action after startup.
- `n_gpu_layers` does not appear to gate early backend init.
- JS source does not explicitly select WebGPU before load.
- No public supported option was found to disable WebGPU/backend registration before `wllamaStart()`.
- The default v3.1 wasm appears to be built with WebGPU enabled.
- An internal-looking build-wasm/wllama.wasm artifact appeared to be CPU-only (GGML_WEBGPU=OFF) judging by its metadata, strings, and size, but it failed with a JS-wrapper/import-object incompatibility and is not a supported runtime path.
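To make the ordering above concrete, here is a toy model of it. This is not wllama or llama.cpp code; `ToyRuntime`, `ToyAdapter`, `ToyLoadParams`, and `runToy` are all hypothetical names, and the only thing it is meant to show is that a check made during startup cannot see `n_gpu_layers`, because that value only arrives with the later load call.

```typescript
// Toy model of the reported ordering; NOT wllama or llama.cpp code.
// Every name in this block is hypothetical.

interface ToyAdapter {
  features: ReadonlySet<string>;
}

interface ToyLoadParams {
  n_gpu_layers: number;
}

class ToyRuntime {
  readonly events: string[] = [];
  private readonly adapter: ToyAdapter | null;

  constructor(adapter: ToyAdapter | null) {
    this.adapter = adapter;
  }

  // Stands in for the startup step: backend init runs here,
  // before any load params have been passed in.
  start(): void {
    this.events.push("start");
    if (this.adapter !== null && !this.adapter.features.has("shader-f16")) {
      this.events.push("abort: ShaderF16 assertion");
      throw new Error("abort during backend init");
    }
    this.events.push("backend init ok");
  }

  // Stands in for the later load action, the first place n_gpu_layers is seen.
  load(params: ToyLoadParams): void {
    this.events.push(`load n_gpu_layers=${params.n_gpu_layers}`);
  }
}

// Runs start() then load() and returns the event log, even if start() aborts.
function runToy(adapter: ToyAdapter | null, params: ToyLoadParams): string[] {
  const rt = new ToyRuntime(adapter);
  try {
    rt.start();
    rt.load(params);
  } catch {
    // Swallow the simulated abort so the event log can be inspected.
  }
  return rt.events;
}
```

With an adapter that lacks `shader-f16` the log stops at the simulated assertion and never records the load step, whatever `n_gpu_layers` is. With no adapter at all (roughly what my Firefox diagnostics saw), startup and load both go through.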

Questions

  • is shader-f16 currently a hard requirement for the ggml WebGPU backend?
  • where should CPU fallback for this case be handled: llama.cpp/ggml, wllama, or both?
  • should the WebGPU backend assert when the adapter lacks shader-f16, or should it return a recoverable 'backend-unavailable' error?
  • when no GPU layers are requested through n_gpu_layers: 0, should WebGPU backend/device initialization still happen?
  • is there a supported way for browser/WASM callers to disable WebGPU backend registration entirely and force CPU/WASM-only behavior in Chromium?
  • or is this expected behavior, a docs ambiguity, some wllama bug, a ggml/llama.cpp WebGPU backend issue, or a browser thing only?
  • should I just look into doing my own local, CPU-only wllama/llama.cpp build, or is some kind of runtime switch the better route?
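For what it's worth, the behaviour I was expecting can be sketched as a small decision function. This is purely hypothetical: `chooseBackend`, `BackendInputs`, and `BackendChoice` are names I made up for illustration, not an existing wllama or ggml API, and where such a check should actually live is exactly what the questions above are asking.

```typescript
// Hypothetical decision helper; not an existing wllama or ggml API.

type BackendChoice =
  | { backend: "webgpu" }
  | { backend: "cpu"; reason: string };

interface BackendInputs {
  // Mirrors n_gpu_layers from the load params.
  nGpuLayers: number;
  // Feature names reported by the adapter, or null when no adapter is available.
  adapterFeatures: ReadonlySet<string> | null;
}

// Returns a recoverable choice instead of aborting when WebGPU is not usable.
function chooseBackend(inputs: BackendInputs): BackendChoice {
  if (inputs.nGpuLayers === 0) {
    return { backend: "cpu", reason: "no GPU layers requested" };
  }
  if (inputs.adapterFeatures === null) {
    return { backend: "cpu", reason: "no WebGPU adapter" };
  }
  if (!inputs.adapterFeatures.has("shader-f16")) {
    return { backend: "cpu", reason: "adapter lacks shader-f16" };
  }
  return { backend: "webgpu" };
}
```

In my Chromium case the first branch would already apply (`n_gpu_layers: 0`), so the adapter's missing `shader-f16` would never be consulted.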

Thanks in advance for any pointers or help.


Environment

  • OS: EndeavourOS / Arch Linux, BUILD_ID=2025.03.19
  • Kernel: Linux 6.18.33-1-lts x86_64
  • Desktop/session: KDE Plasma / Wayland
  • Device class: Lenovo Legion Y530 laptop
  • CPU: Intel i7-8750H
  • RAM: 32 GB
  • GPUs:
    • Intel UHD Graphics 630 / Mesa 26.1.1-arch1.2
    • NVIDIA GeForce GTX 1060 / NVIDIA driver 580.126.09
  • Chromium: 148.0.7778.178 Arch Linux
  • Firefox: 151.0.2
  • @wllama/wllama: dependency ^3.1.1, locked/installed 3.1.1
  • App stack: Vite PWA / TypeScript
  • Model: Qwen2.5-0.5B-Instruct-Q4_K_M.gguf

Current wllama setup

Current imports/module map/constructor/load params:

import { LogLevel, Wllama, type Model } from "@wllama/wllama";
import wllamaWasmUrl from "@wllama/wllama/esm/wasm/wllama.wasm?url";

const wllama = new Wllama(
  { default: wllamaWasmUrl },
  { allowOffline: true, suppressNativeLog: false },
);

const loadModelParams = {
  n_ctx: 256,
  n_batch: 64,
  n_gpu_layers: 0,
  log_level: LogLevel.DEBUG,
  progressCallback: (...),
};

The online URL path calls:

await wllama.loadModelFromUrl(localAiAbsoluteModelUrl, {
  ...loadModelParams,
  useCache: true,
});

The offline cached Model path calls:

await wllama.loadModel(cachedModel, loadModelParams);

Both paths use n_gpu_layers: 0.


Observed behavior

Chromium diagnostics before model load:

  • WebGPU: present
  • wllama.isSupportWebGPU(): yes
  • GPU adapter: available
  • adapter.features.has("shader-f16"): no
  • SharedArrayBuffer: unavailable
  • crossOriginIsolated: no
  • hardwareConcurrency: 12
  • deviceMemory: 32 GB
  • GGUF HEAD: yes / 200
  • GGUF GET: yes / 200
  • WASM loads
  • Final load stage: failed after Model URL load started
  • Does not reach URL load succeeded
  • Does not reach preflight started
  • No model metadata/tensor loading appears before abort
  • Firefox comparison: Firefox succeeds; previous app diagnostics observed WebGPU missing / wllama WebGPU support no / adapter not requested / shader-f16 not requested


Chromium WebGPU adapter check

{
  "userAgent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/148.0.0.0 Safari/537.36",
  "platform": "Linux x86_64",
  "hardwareConcurrency": 12,
  "deviceMemory": 32,
  "crossOriginIsolated": false,
  "sharedArrayBuffer": false,
  "webgpu": true,
  "adapterInfo": {},
  "features": [
    "bgra8unorm-storage",
    "clip-distances",
    "core-features-and-limits",
    "depth-clip-control",
    "depth32float-stencil8",
    "dual-source-blending",
    "float32-blendable",
    "float32-filterable",
    "indirect-first-instance",
    "primitive-index",
    "rg11b10ufloat-renderable",
    "subgroups",
    "texture-component-swizzle",
    "texture-compression-bc",
    "texture-compression-bc-sliced-3d",
    "texture-formats-tier1",
    "texture-formats-tier2",
    "timestamp-query"
  ],
  "shaderF16": false,
  "error": ""
}
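For anyone who wants to reproduce a report like the one above, here is a minimal sketch of how such an object can be assembled. It is not the exact code from my app: `buildAdapterReport`, `NavigatorLike`, `FeatureSource`, and `IsolationInfo` are hypothetical names, the inputs are passed in rather than read from globals so the function can also run outside a browser, and it covers only a subset of the fields (no `adapterInfo` or `error`).

```typescript
// Minimal structural types so the sketch compiles without @webgpu/types.
// All type and function names here are hypothetical.
interface NavigatorLike {
  userAgent: string;
  platform: string;
  hardwareConcurrency: number;
  deviceMemory?: number; // not exposed by every browser
  gpu?: unknown;         // present when the browser exposes WebGPU
}

interface FeatureSource {
  features: Iterable<string>; // e.g. the features set of a WebGPU adapter
}

interface IsolationInfo {
  crossOriginIsolated: boolean;
  sharedArrayBuffer: boolean;
}

interface AdapterReport {
  userAgent: string;
  platform: string;
  hardwareConcurrency: number;
  deviceMemory: number | null;
  crossOriginIsolated: boolean;
  sharedArrayBuffer: boolean;
  webgpu: boolean;
  features: string[];
  shaderF16: boolean;
}

// Builds a plain, JSON-serialisable report from already-collected inputs.
function buildAdapterReport(
  nav: NavigatorLike,
  adapter: FeatureSource | null,
  isolation: IsolationInfo,
): AdapterReport {
  // Sort feature names so two reports are easy to compare by eye.
  const features = adapter === null ? [] : Array.from(adapter.features).sort();
  return {
    userAgent: nav.userAgent,
    platform: nav.platform,
    hardwareConcurrency: nav.hardwareConcurrency,
    deviceMemory: nav.deviceMemory ?? null,
    crossOriginIsolated: isolation.crossOriginIsolated,
    sharedArrayBuffer: isolation.sharedArrayBuffer,
    webgpu: nav.gpu !== undefined,
    features,
    shaderF16: features.includes("shader-f16"),
  };
}
```

In a browser, the adapter argument would come from `await navigator.gpu.requestAdapter()` (which can resolve to `null`), and the isolation flags from `globalThis.crossOriginIsolated` and `typeof SharedArrayBuffer !== "undefined"`.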
