
graph : fix nkvo offload with FA #19105

ggerganov merged 1 commit into master from gg/graph-fix-nkvo on Jan 26, 2026

Conversation

@ggerganov

Member

Fixes #19096

The `ggml_flash_attn_ext` op was not being offloaded to the CPU when `-nkvo` is specified.

Also remove the obsolete `strcmp(name, "kqv_merged_cont")` check in the graph callback.
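
For readers unfamiliar with the option, here is a rough, self-contained sketch of the intended placement behavior. It is a toy model, not the patch itself: `Backend`, `Node`, `reads_kv`, `assign_backends` and `make_attention_graph` are hypothetical names invented for illustration and do not exist in llama.cpp or ggml. It assumes that `-nkvo` maps to an `offload_kqv == false` setting and that, in that mode, the attention op that reads the KV cache is expected to run on the CPU.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration only; not llama.cpp or ggml types.
enum class Backend { GPU, CPU };

struct Node {
    std::string name;     // label given to the op while the graph is built
    bool        reads_kv; // assumed flag: the op consumes the KV cache directly
    Backend     backend;  // where the op is scheduled to run
};

// Toy placement pass: when KV offload is disabled (the -nkvo case),
// every op that reads the KV cache is pinned to the CPU.
void assign_backends(std::vector<Node> & graph, bool offload_kqv) {
    for (Node & n : graph) {
        if (!offload_kqv && n.reads_kv) {
            n.backend = Backend::CPU;
        }
    }
}

// Three-op attention fragment; everything starts on the GPU.
std::vector<Node> make_attention_graph() {
    return {
        {"q_proj",     false, Backend::GPU},
        {"flash_attn", true,  Backend::GPU},
        {"attn_out",   false, Backend::GPU},
    };
}
```

In this sketch, calling `assign_backends(graph, false)` moves only the `flash_attn` node to the CPU and leaves the other ops where they were. The bug described above corresponds to the flash-attention node being skipped by that kind of placement step.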

ggerganov merged commit 8f80d1b into master on Jan 26, 2026
73 of 78 checks passed
ggerganov deleted the gg/graph-fix-nkvo branch on January 26, 2026 18:18
shaofeiqi pushed a commit to qualcomm/llama.cpp that referenced this pull request Feb 6, 2026
Seunghhon pushed a commit to Seunghhon/llama.cpp that referenced this pull request Apr 26, 2026
ljubomirj pushed a commit to ljubomirj/llama.cpp that referenced this pull request May 6, 2026
my-other-github-account pushed a commit to my-other-github-account/llama.cpp that referenced this pull request May 15, 2026
my-other-github-account pushed a commit to my-other-github-account/llama.cpp that referenced this pull request May 15, 2026
fewtarius pushed a commit to fewtarius/llama.cpp that referenced this pull request May 30, 2026

Development

Successfully merging this pull request may close these issues.

Misc. bug: ggml\src\ggml-cuda\fattn.cu:453: fatal error

2 participants