CUDA: Do not mutate cgraph for fused ADDs by ORippler · Pull Request #19566 · ggml-org/llama.cpp

ORippler · 2026-02-12T16:00:44Z

We should try to minimize in-place changes to the incoming ggml_cgraph where possible (those should happen in a backend's graph_optimize function).
Modifying the graph in place also leads to an additional, unnecessary graph capture step, because the CUDA backend stores the node properties before it modifies the graph: we hit ggml_cuda_graph_node_set_properties via ggml_cuda_graph_update_required before entering ggml_cuda_graph_evaluate_and_capture.
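
To illustrate the second point, here is a minimal toy model of the interaction. It is only a sketch: `toy_node`, `toy_graph`, `toy_cache`, `update_required`, `evaluate`, and `count_captures` are hypothetical stand-ins invented for this example, not the real ggml or CUDA-backend types and functions. It assumes the backend snapshots node properties at the start of each evaluation and re-captures whenever the snapshot differs from the incoming graph.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins; these are NOT the real ggml types.
enum { TOY_OP_NONE = 0, TOY_OP_ADD = 1, TOY_OP_FUSED_ADD = 2 };

struct toy_node {
    int op;      // operation id
    int src[4];  // indices of source nodes, -1 when unused
};

struct toy_graph {
    std::vector<toy_node> nodes;
};

// Snapshot of the node properties seen on the previous evaluation.
struct toy_cache {
    std::vector<toy_node> props;
    int captures = 0;
};

static bool same_props(const toy_node & a, const toy_node & b) {
    if (a.op != b.op) return false;
    for (int i = 0; i < 4; ++i) {
        if (a.src[i] != b.src[i]) return false;
    }
    return true;
}

// Compares the incoming graph with the stored snapshot, then refreshes the snapshot.
static bool update_required(toy_cache & cache, const toy_graph & g) {
    bool changed = cache.props.size() != g.nodes.size();
    for (std::size_t i = 0; !changed && i < g.nodes.size(); ++i) {
        changed = !same_props(cache.props[i], g.nodes[i]);
    }
    cache.props = g.nodes;  // properties are stored BEFORE any fusion rewrite below
    return changed;
}

// One evaluation: check the snapshot first, then fuse node 2 either in place or in a local copy.
static void evaluate(toy_cache & cache, toy_graph & g, bool mutate_in_place) {
    if (update_required(cache, g)) {
        cache.captures++;  // stands in for a graph (re-)capture
    }
    toy_node fused = g.nodes[2];  // local copy of the last ADD
    fused.op     = TOY_OP_FUSED_ADD;
    fused.src[2] = 0;             // extra operand of the fused ADD
    if (mutate_in_place) {
        g.nodes[2] = fused;       // the pattern this PR argues against
    }
    // ... a real backend would launch the fused kernel using `fused` here ...
}

// Evaluates the same three-node graph three times and reports the capture count.
static int count_captures(bool mutate_in_place) {
    toy_graph g;
    g.nodes = {
        { TOY_OP_NONE, { -1, -1, -1, -1 } },
        { TOY_OP_ADD,  {  0,  0, -1, -1 } },
        { TOY_OP_ADD,  {  1,  0, -1, -1 } },
    };
    toy_cache cache;
    for (int i = 0; i < 3; ++i) {
        evaluate(cache, g, mutate_in_place);
    }
    return cache.captures;
}
```

In this toy model, `count_captures(true)` yields two captures because the snapshot taken in the first evaluation no longer matches the mutated graph in the second, whereas `count_captures(false)` yields a single capture.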

Isolated from #19521

1. We should try to minimize in-place changes to the incoming ggml_cgraph where possible (those should happen in graph_optimize)
2. Modifying in-place leads to an additional, unnecessary graph capture step as we store the properties before modifying the graph in-place in the cuda-backend

ggml/src/ggml-cuda/ggml-cuda.cu

Co-authored-by: Aman Gupta <amangupta052@gmail.com>

* Do not mutate cgraph for fused ADDs
  1. We should try to minimize in-place changes to the incoming ggml_cgraph where possible (those should happen in graph_optimize)
  2. Modifying in-place leads to an additional, unnecessary graph capture step as we store the properties before modifying the graph in-place in the cuda-backend
* Assert ggml_tensor is trivially copyable
* Update ggml/src/ggml-cuda/ggml-cuda.cu
  Co-authored-by: Aman Gupta <amangupta052@gmail.com>

---------

Co-authored-by: Aman Gupta <amangupta052@gmail.com>
(cherry picked from commit 43919b7)

ORippler mentioned this pull request Feb 12, 2026

CUDA: Enable cuda graphs for qwen3 next-style architectures #19521

Closed

ORippler changed the title ~~Do not mutate cgraph for fused ADDs~~ CUDA: Do not mutate cgraph for fused ADDs Feb 12, 2026

am17an reviewed Feb 12, 2026

ggml/src/ggml-cuda/ggml-cuda.cu (resolved)

Assert ggml_tensor is trivially copyable

17717f3
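
One plausible reading of this commit, sketched under assumptions: if the backend works on a by-value copy of a node instead of rewriting the node inside the graph, a compile-time check that the struct is trivially copyable guards that copy. The names below (`toy_tensor`, `fused_add3`) are hypothetical and do not reflect the actual change in ggml-cuda.cu.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for a plain C tensor struct; NOT the real ggml_tensor.
struct toy_tensor {
    int          op;
    toy_tensor * src[4];
    float        value;
};

// Copying by plain assignment is only safe to rely on when the type is trivially copyable.
static_assert(std::is_trivially_copyable<toy_tensor>::value,
              "toy_tensor must be trivially copyable");

// Evaluates add2 = add(add(a, b), c) as one three-operand ADD without touching the graph.
static float fused_add3(const toy_tensor * add2) {
    const toy_tensor * add1 = add2->src[0];

    toy_tensor fused = *add2;     // local copy; the node in the graph stays untouched
    fused.src[0] = add1->src[0];  // a
    fused.src[1] = add1->src[1];  // b
    fused.src[2] = add2->src[1];  // c

    return fused.src[0]->value + fused.src[1]->value + fused.src[2]->value;
}
```

After a call to `fused_add3`, the original node still points at its original sources, so a later property comparison against the incoming graph sees no change.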

am17an approved these changes Feb 12, 2026

am17an reviewed Feb 12, 2026

ggml/src/ggml-cuda/ggml-cuda.cu (outdated, resolved)

github-actions bot added the labels "Nvidia GPU" (Issues specific to Nvidia GPUs) and "ggml" (changes relating to the ggml tensor library for machine learning) Feb 12, 2026

Update ggml/src/ggml-cuda/ggml-cuda.cu

d812b69

Co-authored-by: Aman Gupta <amangupta052@gmail.com>

am17an merged commit 43919b7 into ggml-org:master Feb 13, 2026
75 checks passed

ggerganov mentioned this pull request Feb 14, 2026

models : optimizing qwen3next graph #19375

Merged

3 tasks

am17an merged 3 commits into ggml-org:master from ORippler:osimons/fix_multi_add

2 participants
