build(cuda): enable GGML_CUDA_GRAPHS on CUDA builds by localai-bot · Pull Request #26 · mudler/parakeet.cpp

localai-bot · 2026-06-12T16:35:17Z

What

Enable GGML_CUDA_GRAPHS whenever parakeet.cpp forwards CUDA (PARAKEET_GGML_CUDA=ON). ggml leaves this off by default.

Why

With CUDA graphs on, the CUDA backend captures and replays the compute graph, a small but free speedup. Measured on a GB10 (interleaved, best-of, same 180s clip):

| model | graphs ON | graphs OFF | gain |
|---|---|---|---|
| tdt-1.1b | ~1477 ms | ~1498 ms | +1.4% |
| tdt-0.6b-v3 | ~970 ms | ~974 ms | +0.4% |

Never negative across runs. The gain is capped because parakeet rebuilds each graph in a fresh ggml_context per call, which defeats ggml's cross-call graph replay (keyed on the first node pointer); the encoder also runs once per request. Lifting that is a separate, larger change. This PR just takes the free win.
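To illustrate why a pointer-keyed replay cache misses when the graph is rebuilt on every call, here is a toy model. It is not ggml code: `Node`, `Graph`, `build_graph`, and `ReplayCache` are hypothetical stand-ins, and the only property borrowed from the description above is that the cache is keyed on the address of the first node.

```cpp
#include <cstddef>
#include <memory>
#include <unordered_map>
#include <vector>

// Toy stand-ins for illustration only; these are NOT ggml types.
struct Node { int op; };

struct Graph {
    std::vector<std::unique_ptr<Node>> nodes;
};

// Builds a graph of n freshly allocated nodes, mimicking a rebuild
// in a new context on each call.
Graph build_graph(std::size_t n) {
    Graph g;
    for (std::size_t i = 0; i < n; ++i) {
        g.nodes.push_back(std::make_unique<Node>(Node{static_cast<int>(i)}));
    }
    return g;
}

// Replay cache keyed on the address of the first node.
struct ReplayCache {
    std::unordered_map<const Node*, std::size_t> captured; // key -> node count
    int captures = 0;
    int replays = 0;

    void run(const Graph& g) {
        const Node* key = g.nodes.front().get();
        auto it = captured.find(key);
        if (it != captured.end() && it->second == g.nodes.size()) {
            ++replays;                       // same first node: reuse the capture
        } else {
            captured[key] = g.nodes.size();  // unseen first node: capture again
            ++captures;
        }
    }
};
```

Running the same `Graph` object twice yields one capture and one replay. A second graph with identical structure that is alive at the same time has a different first-node address, so it triggers a new capture even though nothing about the computation changed.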

Notes

Single point of control in CMakeLists.txt, so docker/release/local CUDA builds all inherit it.
Runtime kill-switch GGML_CUDA_DISABLE_GRAPHS=1 still works for A/B testing.
Proven by building parakeet.cpp with -DGGML_CUDA_GRAPHS=ON on the GB10 and benchmarking; this change just automates that flag.
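For A/B harnesses that want to log which mode a run used, the kill-switch can be detected from the process environment. This is a minimal sketch: the function name is hypothetical, and it assumes the variable only needs to be present to disable graphs, which matches the `GGML_CUDA_DISABLE_GRAPHS=1` usage above but is otherwise an assumption about ggml's check.

```cpp
#include <cstdlib>

// Hypothetical helper: reports whether the runtime kill-switch is set.
// Assumes presence alone disables CUDA graphs (e.g. GGML_CUDA_DISABLE_GRAPHS=1).
bool cuda_graphs_disabled_by_env() {
    return std::getenv("GGML_CUDA_DISABLE_GRAPHS") != nullptr;
}
```

A benchmark wrapper could call this once at startup and tag its timing output as graphs ON or graphs OFF accordingly.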

🤖 Generated with Claude Code

ggml leaves GGML_CUDA_GRAPHS off by default. Turning it on lets the CUDA backend capture and replay the compute graph, a small but free speedup (about 1% measured on a GB10: 1.4% on tdt-1.1b, 0.4% on tdt-0.6b-v3, and never negative across interleaved runs). Enable it whenever we forward CUDA so every CUDA build (docker, release, local) inherits it. The runtime kill-switch GGML_CUDA_DISABLE_GRAPHS=1 still disables it for A/B testing.

Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>

localai-bot mentioned this pull request Jun 12, 2026

feat(parakeet-cpp): enable GGML_CUDA_GRAPHS in the cublas build mudler/LocalAI#10273

Merged

mudler approved these changes Jun 12, 2026

mudler merged commit b8012f1 into master Jun 12, 2026
8 checks passed

mudler merged 1 commit into master from feat/cuda-graphs