Misc. bug: Qwen3-Next PP performance loss with larger ubatch-size (Strix Halo, Vulkan) #18725

@lemmi

Name and Version

build/bin/llama-cli --version
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
version: 7684 (53eb9435d)
built with GNU 14.2.1 for Linux x86_64

Operating systems

Linux

Which llama.cpp modules do you know to be affected?

llama-bench, llama-server

Command line

build/bin/llama-bench -m ../models/unsloth/Qwen3-Next-80B-A3B-Instruct-GGUF/Qwen3-Next-80B-A3B-Instruct-UD-Q8_K_XL-00001-of-00002.gguf -p 4096 -ub 256,512,1024,2048,4096 -n 0 -fa 1 -r 3 --mmap 0 -ngl 0,999

Problem description & steps to reproduce

Qwen3-Next behaves a little strangely with respect to ubatch size. Normally I expect PP performance to rise up to a point, then plateau (or decrease slightly). With this model (and hardware), throughput peaks at -ub 512, is already noticeably lower at 1024, and almost halves (roughly 40 to 45% below the peak) at ubatch sizes of 2048 and above.

| model | size | params | backend | ngl | n_ubatch | fa | mmap | test | t/s |
| --- | ---: | ---: | --- | ---: | ---: | ---: | ---: | ---: | ---: |
| qwen3next 80B.A3B Q8_0 | 79.57 GiB | 79.67 B | Vulkan | 0 | 256 | 1 | 0 | pp4096 | 479.58 ± 0.92 |
| qwen3next 80B.A3B Q8_0 | 79.57 GiB | 79.67 B | Vulkan | 0 | 512 | 1 | 0 | pp4096 | 499.74 ± 1.77 |
| qwen3next 80B.A3B Q8_0 | 79.57 GiB | 79.67 B | Vulkan | 0 | 1024 | 1 | 0 | pp4096 | 404.50 ± 3.72 |
| qwen3next 80B.A3B Q8_0 | 79.57 GiB | 79.67 B | Vulkan | 0 | 2048 | 1 | 0 | pp4096 | 283.88 ± 5.11 |
| qwen3next 80B.A3B Q8_0 | 79.57 GiB | 79.67 B | Vulkan | 0 | 4096 | 1 | 0 | pp4096 | 280.88 ± 4.69 |
| qwen3next 80B.A3B Q8_0 | 79.57 GiB | 79.67 B | Vulkan | 999 | 256 | 1 | 0 | pp4096 | 388.09 ± 2.15 |
| qwen3next 80B.A3B Q8_0 | 79.57 GiB | 79.67 B | Vulkan | 999 | 512 | 1 | 0 | pp4096 | 444.50 ± 0.70 |
| qwen3next 80B.A3B Q8_0 | 79.57 GiB | 79.67 B | Vulkan | 999 | 1024 | 1 | 0 | pp4096 | 305.61 ± 8.87 |
| qwen3next 80B.A3B Q8_0 | 79.57 GiB | 79.67 B | Vulkan | 999 | 2048 | 1 | 0 | pp4096 | 263.40 ± 13.52 |
| qwen3next 80B.A3B Q8_0 | 79.57 GiB | 79.67 B | Vulkan | 999 | 4096 | 1 | 0 | pp4096 | 261.35 ± 7.57 |

Also, PP on Vulkan (-ngl 999) is slower than on CPU (-ngl 0) at every ubatch size tested. The AVX-512 path with repacking seems to work very well.
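
To put numbers on the drop, here is a small sketch that normalizes each run against the best ubatch size for the same -ngl setting. The t/s means are copied from the llama-bench results above; the script and the helper name `relative_to_peak` are only illustrative and not part of llama.cpp.

```python
# Mean t/s from the llama-bench results above, keyed by (ngl, n_ubatch).
results = {
    (0, 256): 479.58, (0, 512): 499.74, (0, 1024): 404.50,
    (0, 2048): 283.88, (0, 4096): 280.88,
    (999, 256): 388.09, (999, 512): 444.50, (999, 1024): 305.61,
    (999, 2048): 263.40, (999, 4096): 261.35,
}


def relative_to_peak(results, ngl):
    """Return (best n_ubatch, {n_ubatch: fraction of best t/s}) for one ngl value.

    Hypothetical helper for this illustration only.
    """
    rows = {ub: tps for (n, ub), tps in results.items() if n == ngl}
    peak_ub = max(rows, key=rows.get)
    return peak_ub, {ub: tps / rows[peak_ub] for ub, tps in sorted(rows.items())}


for ngl in (0, 999):
    peak_ub, rel = relative_to_peak(results, ngl)
    print(f"ngl={ngl}: peak at n_ubatch={peak_ub}")
    for ub, frac in rel.items():
        print(f"  n_ubatch={ub:>4}: {frac:6.1%} of peak")
```

For both -ngl values the peak is at n_ubatch=512, and the 4096 runs land at a bit under 60% of that peak.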

First Bad Commit

No response

Relevant log output

No response
