feat(parakeet-cpp): enable GGML_CUDA_GRAPHS in the cublas build #10273

Merged: mudler merged 1 commit into master from feat/parakeet-cuda-graphs on Jun 12, 2026


Conversation

@localai-bot (Collaborator)

What

Pass -DGGML_CUDA_GRAPHS=ON alongside -DPARAKEET_GGML_CUDA=ON in the parakeet-cpp backend's cublas build.
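As a rough illustration of the change, the sketch below shows one way a build script could select the two flags for a cublas build. Only the two `-D` options come from this PR; the function name `parakeet_cmake_args` and the build-type strings are hypothetical, and the backend's actual Makefile may be structured differently.

```shell
#!/bin/sh
# Hypothetical helper (not the backend's real build script): print the
# CMake arguments to use for a given build type.
parakeet_cmake_args() {
    build_type="$1"
    args=""
    if [ "$build_type" = "cublas" ]; then
        # Both options are named in the PR text. The first enables the CUDA
        # backend through parakeet.cpp; the second is a ggml option that is
        # off by default and turns on CUDA graph capture and replay.
        args="-DPARAKEET_GGML_CUDA=ON -DGGML_CUDA_GRAPHS=ON"
    fi
    printf '%s\n' "$args"
}

# Print the flags that a cublas build would receive.
parakeet_cmake_args cublas
```

Non-cublas build types get an empty argument string in this sketch, which mirrors the PR's scope: the graphs flag is only added next to the CUDA flag.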

Why

ggml leaves GGML_CUDA_GRAPHS off by default. With it on, the CUDA backend captures and replays the compute graph for a small but free speedup. Measured on a GB10 (interleaved, best-of, same 180s clip):

| model | graphs ON | graphs OFF | gain |
| --- | --- | --- | --- |
| tdt-1.1b | ~1477 ms | ~1498 ms | +1.4% |
| tdt-0.6b-v3 | ~970 ms | ~974 ms | +0.4% |
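The gain figures can be cross-checked from the timings. The sketch below assumes the gain is the time saved relative to the graphs-OFF run, which the PR does not state explicitly; `gain_percent` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: percentage of time saved with graphs ON, relative
# to the graphs-OFF time (assumed definition of "gain").
# $1 = time with graphs ON in ms, $2 = time with graphs OFF in ms
gain_percent() {
    awk -v t_on="$1" -v t_off="$2" \
        'BEGIN { printf "%.1f\n", (t_off - t_on) / t_off * 100 }'
}

gain_percent 1477 1498   # tdt-1.1b, prints 1.4
gain_percent 970 974     # tdt-0.6b-v3, prints 0.4
```

Dividing by the graphs-ON time instead gives the same values at one decimal place for these timings, so the reported percentages are consistent under either definition.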

The gain was never negative across runs. GGML_CUDA_GRAPHS is not gated by parakeet.cpp's CMake options, so it passes straight through to ggml and takes effect regardless of the pinned parakeet.cpp commit.

Notes

🤖 Generated with Claude Code

ggml leaves GGML_CUDA_GRAPHS off by default. Passing -DGGML_CUDA_GRAPHS=ON
for cublas builds lets the CUDA backend capture and replay the compute
graph for a small free speedup (about 1% measured on a GB10, never
negative). It is not gated by parakeet.cpp's CMake options, so it passes
straight through to ggml.

Assisted-by: Claude Opus 4.8 <noreply@anthropic.com>
@mudler mudler merged commit 8c8204d into master Jun 12, 2026
66 of 67 checks passed
@mudler mudler deleted the feat/parakeet-cuda-graphs branch June 12, 2026 16:47
2 participants