Misc. bug: ggml\src\ggml-cuda\fattn.cu:453: fatal error #19096

@supercilious

Name and Version

version: 7836 (0c21677)
built with MSVC 19.50.35721.0 for Windows AMD64

Operating systems

Windows

Which llama.cpp modules do you know to be affected?

llama-server

Command line

llama-server --no-mmap --host ... --port ... --jinja --temp 0.7 --top-p 1.0 --min-p 0.01 -nkvo GLM-4.7-Flash-UD-Q8_K_XL.gguf

Problem description & steps to reproduce

Running the GLM-4.7-Flash model (GLM-4.7-Flash-UD-Q8_K_XL.gguf) with -nkvo hits an assert in ggml\src\ggml-cuda\fattn.cu during the warmup run:

sched_reserve: reserve took 43.27 ms, sched copies = 1
common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)
C:\B\llama.cpp\ggml\src\ggml-cuda\fattn.cu:453: fatal error

First Bad Commit

Bug is recent (last day-ish), but I haven't bisected it.

Relevant log output

-fa is not specified on the command line, so I assume it defaults to auto. I'm not sure why it's hitting the BEST_FATTN_KERNEL_NONE case.

void ggml_cuda_flash_attn_ext(ggml_backend_cuda_context & ctx, ggml_tensor * dst) {
    ggml_cuda_set_device(ctx.device);
    switch (ggml_cuda_get_best_fattn_kernel(ggml_cuda_get_device(), dst)) {
        case BEST_FATTN_KERNEL_NONE:
            GGML_ABORT("fatal error");
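The failure mode in the snippet above can be illustrated with a small standalone sketch. This is not the llama.cpp code: the enum values other than NONE, the `pick_kernel` selector, and its head-size rule are all hypothetical, and an exception stands in for `GGML_ABORT` so the behaviour can be observed. It only shows the general pattern, where a dispatcher aborts if the selector finds no suitable kernel for a configuration the caller did not filter out beforehand.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical kernel choices; only NONE mirrors the snippet from the issue.
enum class fattn_kernel { NONE, VEC, TILE };

// Hypothetical selector: the supported head sizes here are invented for
// illustration and are NOT the real selection rules.
fattn_kernel pick_kernel(int head_size) {
    if (head_size == 64)  return fattn_kernel::VEC;
    if (head_size == 128) return fattn_kernel::TILE;
    return fattn_kernel::NONE; // no kernel supports this configuration
}

// Dispatcher that assumes an earlier "is this op supported?" check already
// ruled out the NONE case. If that check and the selector disagree, the
// NONE branch is reached and execution stops.
std::string run_flash_attn(int head_size) {
    switch (pick_kernel(head_size)) {
        case fattn_kernel::NONE:
            throw std::runtime_error("fatal error"); // stands in for GGML_ABORT
        case fattn_kernel::VEC:
            return "vec";
        case fattn_kernel::TILE:
            return "tile";
    }
    return ""; // unreachable; keeps compilers quiet
}
```

Under these assumptions, any configuration that reaches the dispatcher without a matching kernel produces the same bare "fatal error" seen in the log, which is why the message alone does not say which property of the tensor was rejected.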
