
llama : separate compute buffer reserve from fattn check #15696

Merged
slaren merged 1 commit into master from sl/fix-fattn-reserve on Aug 31, 2025


Conversation

@slaren (Member) commented Aug 31, 2025

Exposes ggml_backend_sched_split_graph() to allow splitting the graph without allocating compute buffers and uses it to split the graph for the automatic Flash Attention check.
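The change described above can be sketched as follows. This is a minimal illustration, not the PR's actual code: the helper `graph_is_single_split()` is hypothetical, and the exact declarations exposed by this PR should be checked against the ggml-backend headers. The key point is that splitting the graph no longer requires `ggml_backend_sched_reserve()`, which would also allocate compute buffers.

```c
// Sketch only (assumed API per the PR description; verify against
// ggml-backend.h). Requires the ggml library to build.
#include "ggml-backend.h"

// Hypothetical helper: decide how the graph splits across backends
// without allocating any compute buffers, e.g. as part of the
// automatic Flash Attention check.
static bool graph_is_single_split(ggml_backend_sched_t sched,
                                  struct ggml_cgraph * graph) {
    ggml_backend_sched_reset(sched);
    // Exposed by this PR: split the graph across backends only;
    // unlike ggml_backend_sched_reserve(), no buffers are allocated.
    ggml_backend_sched_split_graph(sched, graph);
    return ggml_backend_sched_get_n_splits(sched) == 1;
}
```

Before this PR, inspecting the split result from outside the scheduler implied reserving compute buffers as a side effect; separating the two lets the fattn check run the cheaper split-only path.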

slaren merged commit 9777032 into master on Aug 31, 2025
42 of 48 checks passed
slaren deleted the sl/fix-fattn-reserve branch on August 31, 2025 at 13:49
github-actions bot added the ggml label (changes relating to the ggml tensor library for machine learning) on Aug 31, 2025
walidbr pushed a commit to walidbr/llama.cpp that referenced this pull request Sep 7, 2025
Nexesenex added a commit to Nexesenex/croco.cpp that referenced this pull request Oct 7, 2025
Nexesenex added a commit to Nexesenex/croco.cpp that referenced this pull request Oct 26, 2025
blime4 referenced this pull request in blime4/llama.cpp Feb 5, 2026

Labels

ggml: changes relating to the ggml tensor library for machine learning

Projects

None yet

Development


2 participants