Allow multiple copy function pointers for CUDA graph kernel updates by agray3 · Pull Request #7565 · ggml-org/llama.cpp

agray3 · 2024-05-27T13:07:24Z

CUDA graphs require parameter updates to kernels associated with GGML_OP_CPY nodes. Previously, the implementation checked for only a single CUDA kernel in such nodes, which caused a bug in cases where two such kernels exist. This PR fixes the issue by using a vector, so that multiple function pointers can be stored and checked against.

Fixes #7492
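
As a rough illustration of the idea described above, the host-only sketch below keeps the copy-kernel function pointers in a `std::vector<void *>` and tests each graph kernel node against that list before patching its destination argument. This is not the code from the PR: `KernelNode`, `is_copy_kernel`, and `update_copy_nodes` are hypothetical names, and the real implementation works on CUDA graph kernel node parameters rather than on a plain struct.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one kernel node captured in a CUDA graph.
struct KernelNode {
    void * func; // address of the kernel this node launches
    char * dst;  // destination pointer argument that must be refreshed
};

// True when `func` matches any of the registered copy kernels.
// With a single stored pointer, a second copy kernel would never match.
static bool is_copy_kernel(const std::vector<void *> & cpy_fn_ptrs, void * func) {
    return std::find(cpy_fn_ptrs.begin(), cpy_fn_ptrs.end(), func) != cpy_fn_ptrs.end();
}

// Walk the nodes in graph order and give each copy-kernel node the next
// destination pointer. Returns the number of nodes that were updated.
static int update_copy_nodes(std::vector<KernelNode> & nodes,
                             const std::vector<void *> & cpy_fn_ptrs,
                             const std::vector<char *> & new_dsts) {
    std::size_t k = 0;
    for (KernelNode & node : nodes) {
        if (k < new_dsts.size() && is_copy_kernel(cpy_fn_ptrs, node.func)) {
            node.dst = new_dsts[k++];
        }
    }
    return static_cast<int>(k);
}
```

With two distinct copy kernels registered in the vector, both kinds of node are updated; with only one registered, nodes that use the other kernel keep a stale destination, which is the kind of failure the description refers to.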


agray3 · 2024-05-27T13:16:58Z

@JohannesGaessler Can you check whether this works for #7527?

github-actions · 2024-05-27T14:46:53Z

📈 llama.cpp server for bench-server-baseline on Standard_NC4as_T4_v3 for phi-2-q4_0: 529 iterations 🚀


Concurrent users: 8, duration: 10m
HTTP request : avg=8873.23ms p(95)=21807.9ms fails=, finish reason: stop=476 truncated=53
Prompt processing (pp): avg=105.19tk/s p(95)=468.51tk/s
Token generation (tg): avg=58.91tk/s p(95)=46.72tk/s
ggml-org/models/phi-2/ggml-model-q4_0.gguf parallel=8 ctx-size=16384 ngl=33 batch-size=2048 ubatch-size=256 pp=1024 pp+tg=2048 branch=ag_allow_multiple_cuda_cpy_fn_ptrs commit=21826514dfac9237a32cad6d1f2312298800ebf9

[Chart: llamacpp:prompt_tokens_seconds over time (x-axis: Unix time 1716820583 to 1716821207); llama.cpp bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 529 iterations]

[Chart: llamacpp:predicted_tokens_seconds over time (x-axis: Unix time 1716820583 to 1716821207); llama.cpp bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 529 iterations]

[Chart: llamacpp:kv_cache_usage_ratio over time (x-axis: Unix time 1716820583 to 1716821207); llama.cpp bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 529 iterations]

[Chart: llamacpp:requests_processing over time (x-axis: Unix time 1716820583 to 1716821207); llama.cpp bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 529 iterations]

JohannesGaessler

I can confirm that this fixes the issue both on master and for my PR.

agray3 mentioned this pull request May 27, 2024

CUDA graphs break quantized K cache #7492

Closed

JohannesGaessler approved these changes May 27, 2024


JohannesGaessler merged commit 197c006 into ggml-org:master May 27, 2024
