cuda : cuda graphs now compare all node params (#19383)

Merged
ggerganov merged 1 commit into master from gg/cuda-graph-props-check-all-params on Feb 6, 2026
Conversation

@ggerganov (Member)

ref #19338 (comment)

This should fix the CUDA graph usage logic when the ops have variable op params. This issue is most pronounced during test-backend-ops.

@ggerganov ggerganov requested a review from am17an February 6, 2026 05:50
@github-actions bot added labels: Nvidia GPU (Issues specific to Nvidia GPUs), ggml (changes relating to the ggml tensor library for machine learning) — Feb 6, 2026
@ggerganov ggerganov merged commit 3e21647 into master Feb 6, 2026
74 of 75 checks passed
@ggerganov ggerganov deleted the gg/cuda-graph-props-check-all-params branch February 6, 2026 05:55
@AndVinni
AndVinni commented Feb 6, 2026

After this change, the GTX 970M 3GB (compute capability 5.2) and T2000 4GB (compute capability 7.5) stopped working completely.
Every utility throws an error.
llama-server example:

...
llama_context: n_ctx_seq (2048) < n_ctx_train (262144) -- the full capacity of the model will not be utilized
llama_context: CUDA_Host output buffer size = 2.32 MiB
llama_kv_cache: CUDA0 KV buffer size = 192.00 MiB
llama_kv_cache: size = 192.00 MiB ( 2048 cells, 48 layers, 4/1 seqs), K (f16): 96.00 MiB, V (f16): 96.00 MiB
sched_reserve: reserving ...
sched_reserve: Flash Attention was auto, set to enabled
sched_reserve: CUDA0 compute buffer size = 300.75 MiB
sched_reserve: CUDA_Host compute buffer size = 12.01 MiB
sched_reserve: graph nodes = 3031
sched_reserve: graph splits = 2
sched_reserve: reserve took 14.34 ms, sched copies = 1
common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)
ggml_cuda_compute_forward: MUL_MAT failed
CUDA error: device kernel image is invalid 
current device: 0, in function ggml_cuda_compute_forward at D:\a\llama.cpp\llama.cpp\ggml\src\ggml-cuda\ggml-cuda.cu:2758
err
D:\a\llama.cpp\llama.cpp\ggml\src\ggml-cuda\ggml-cuda.cu:97: CUDA error

With b7951 everything works fine.

The CUDA 13 drivers and Windows 11 are kept up to date at all times.
The prebuilt CUDA 12 release archive is used.

@am17an (Contributor)

am17an commented Feb 7, 2026

Not sure how this PR could cause an issue; the problem is probably elsewhere. I have high confidence in this because this path is not even exercised for anything below Ampere (cc 8.9).

@AndVinni
AndVinni commented Feb 7, 2026

I downloaded b7952 twice, separately on each computer, and got the error both times.
I still have the non-working b7952 build on two computers.
Probably solar magnetic storms :)

e:\AI>c:\llama\b7951\llama-bench.exe   -m e:\AI\nomic-embed-text-v1.Q4_0.gguf   -t 4   -embd 1   -mmp 0
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce GTX 970M, compute capability 5.2, VMM: yes
load_backend: loaded CUDA backend from c:\llama\b7951\ggml-cuda.dll
load_backend: loaded RPC backend from c:\llama\b7951\ggml-rpc.dll
load_backend: loaded CPU backend from c:\llama\b7951\ggml-cpu-haswell.dll
| model                          |       size |     params | backend    | ngl |       embd |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ---------: | --------------: | -------------------: |
| nomic-bert 137M Q4_0           |  73.48 MiB |   136.73 M | CUDA       |  99 |          1 |           pp512 |      4721.43 ± 17.09 |
| nomic-bert 137M Q4_0           |  73.48 MiB |   136.73 M | CUDA       |  99 |          1 |           tg128 |        227.81 ± 3.72 |

build: 22cae8321 (7951)

e:\AI>c:\llama\b7952\llama-bench.exe   -m e:\AI\nomic-embed-text-v1.Q4_0.gguf   -t 4   -embd 1   -mmp 0
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce GTX 970M, compute capability 5.2, VMM: yes
load_backend: loaded CUDA backend from c:\llama\b7952\ggml-cuda.dll
load_backend: loaded RPC backend from c:\llama\b7952\ggml-rpc.dll
load_backend: loaded CPU backend from c:\llama\b7952\ggml-cpu-haswell.dll
| model                          |       size |     params | backend    | ngl |       embd |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ---------: | --------------: | -------------------: |
D:\a\llama.cpp\llama.cpp\ggml\src\ggml-cuda\ggml-cuda.cu:97: CUDA error

e:\AI>c:\llama\b7964\llama-bench.exe   -m e:\AI\nomic-embed-text-v1.Q4_0.gguf   -t 4   -embd 1   -mmp 0
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce GTX 970M, compute capability 5.2, VMM: yes
load_backend: loaded CUDA backend from c:\llama\b7964\ggml-cuda.dll
load_backend: loaded RPC backend from c:\llama\b7964\ggml-rpc.dll
load_backend: loaded CPU backend from c:\llama\b7964\ggml-cpu-haswell.dll
| model                          |       size |     params | backend    | ngl |       embd |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ---------: | --------------: | -------------------: |
| nomic-bert 137M Q4_0           |  73.48 MiB |   136.73 M | CUDA       |  99 |          1 |           pp512 |      4735.51 ± 15.17 |
| nomic-bert 137M Q4_0           |  73.48 MiB |   136.73 M | CUDA       |  99 |          1 |           tg128 |        224.12 ± 7.50 |

build: b83111815 (7964)

liparetejas pushed a commit to liparetejas/llama.cpp that referenced this pull request Feb 23, 2026
bartowski1182 pushed a commit to bartowski1182/llama.cpp that referenced this pull request Mar 2, 2026
