sycl : Fixes broken build and test-backend-ops by Alcpz · Pull Request #10257 · ggml-org/llama.cpp

Alcpz · 2024-11-11T21:40:10Z

I have read the contributing guidelines
Self-reported review complexity:
- Low
- Medium
- High

- Fixes the broken build of the SYCL CUDA backend, caused by a non-explicit gemm call in outprod (merged with RWKV6 in "Optimize RWKV6 Operator Naming and Implement Multi-core CPU/ SYCL Acceleration" #10133).
- Marks permuted MUL_MAT as unsupported so that test-backend-ops can run.
- Fixes asserts in norm so that debug builds work.

Tests confirmed passing on an Nvidia A100 and an Intel Data Center GPU Max 1100.

Alcpz · 2024-11-11T21:53:55Z

@airMeng I understand you were fixing the unsupported permuted MUL_MAT in #10041, but since there are some issues with the SYCL CI and it seems that it could take longer, can we merge this?

airMeng · 2024-11-12T00:53:32Z

Could you cherry-pick the norm-related cases from #10041 too? They only crash in debug builds.

Alcpz · 2024-11-12T09:38:09Z

Added the changes

Rbiessy

The oneMKL changes look good to me.

easyfab · 2024-11-14T17:02:47Z

These commits negatively affect Intel GPUs. Is this expected?

For example:
Before:

ggml_sycl_init: GGML_SYCL_FORCE_MMQ:   no
ggml_sycl_init: SYCL_USE_XMX: yes
ggml_sycl_init: found 1 SYCL devices:
| model                          |       size |     params | backend    | ngl |          test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------------: | -------------------: |
[SYCL] call ggml_check_sycl
ggml_check_sycl: GGML_SYCL_DEBUG: 0
ggml_check_sycl: GGML_SYCL_F16: no
found 1 SYCL devices:
|  |                   |                                       |       |Max    |        |Max  |Global |                     |
|  |                   |                                       |       |compute|Max work|sub  |mem    |                     |
|ID|        Device Type|                                   Name|Version|units  |group   |group|size   |       Driver version|
|--|-------------------|---------------------------------------|-------|-------|--------|-----|-------|---------------------|
| 0| [level_zero:gpu:0]|                 Intel Iris Xe Graphics|    1.6|     96|     512|   32| 31604M|            1.3.31441|
| qwen2 1.5B Q5_K - Medium       |   1.22 GiB |     1.78 B | SYCL       |  99 |         pp512 |        358.62 ± 8.26 |
| qwen2 1.5B Q5_K - Medium       |   1.22 GiB |     1.78 B | SYCL       |  99 |         tg128 |         13.10 ± 0.34 |

build: 80dd7ff2 (4068)

After:

ggml_sycl_init: GGML_SYCL_FORCE_MMQ:   no
ggml_sycl_init: SYCL_USE_XMX: yes
ggml_sycl_init: found 1 SYCL devices:
| model                          |       size |     params | backend    | ngl |          test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------------: | -------------------: |
[SYCL] call ggml_check_sycl
ggml_check_sycl: GGML_SYCL_DEBUG: 0
ggml_check_sycl: GGML_SYCL_F16: no
found 1 SYCL devices:
|  |                   |                                       |       |Max    |        |Max  |Global |                     |
|  |                   |                                       |       |compute|Max work|sub  |mem    |                     |
|ID|        Device Type|                                   Name|Version|units  |group   |group|size   |       Driver version|
|--|-------------------|---------------------------------------|-------|-------|--------|-----|-------|---------------------|
| 0| [level_zero:gpu:0]|                 Intel Iris Xe Graphics|    1.6|     96|     512|   32| 31604M|            1.3.31441|
| qwen2 1.5B Q5_K - Medium       |   1.22 GiB |     1.78 B | SYCL       |  99 |         pp512 |       276.80 ± 13.68 |
| qwen2 1.5B Q5_K - Medium       |   1.22 GiB |     1.78 B | SYCL       |  99 |         tg128 |         10.64 ± 0.25 |

build: 2e82ffa4 (4069)

Reverting this change on top of master restores the performance.
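To put the two llama-bench tables above into one number per test, the relative change in throughput can be computed as (after - before) / before. The helper below is only an illustrative sketch; `relative_change` is a hypothetical name and not part of llama-bench or ggml.

```cpp
#include <cassert>
#include <cmath>

// Relative change in throughput between two builds.
// Negative values mean the newer build is slower.
// Assumption: `before` is non-zero.
double relative_change(double before, double after) {
    return (after - before) / before;
}
```

With the mean values quoted above, pp512 goes from 358.62 to 276.80 t/s (about a 23% drop) and tg128 from 13.10 to 10.64 t/s (about a 19% drop). The reported standard deviations are ignored in this sketch.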

Alcpz · 2024-11-15T11:20:15Z

I'm currently investigating another performance regression in the SYCL backend, though this seems to be a separate issue. Reverting the change would break the build for non-Intel backends, but we also want to avoid this performance loss. Will look into it.

Edit: Sorry for the inconvenience

Alcpz · 2024-11-15T13:57:47Z

The issue is caused by the mul_mats marked as unsupported. I've also found a new issue with a specific test case:

MUL_MAT(type_a=f16,type_b=f32,m=16,n=1,k=256,bs=[2,3],nr=[1,1],per=[0,2,1,3]): [MUL_MAT] NMSE = 2.227304559 > 0.000500000 FAIL
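For readers unfamiliar with the metric in that log line: NMSE is a normalized mean squared error between a reference result and the backend result, compared against a threshold (0.0005 in the line above). The sketch below uses the common definition, sum of squared differences divided by the sum of squared reference values. The function names are hypothetical, and the exact formula used by test-backend-ops is not shown in this thread, so treat this as an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Normalized mean squared error of `out` against the reference `ref`.
// Assumptions: both vectors have the same length and `ref` is not all zeros.
double nmse(const std::vector<float> & ref, const std::vector<float> & out) {
    assert(ref.size() == out.size());
    double err  = 0.0; // sum of squared differences
    double norm = 0.0; // sum of squared reference values
    for (std::size_t i = 0; i < ref.size(); ++i) {
        const double r = ref[i];
        const double d = r - static_cast<double>(out[i]);
        err  += d * d;
        norm += r * r;
    }
    return err / norm;
}

// A result passes when its NMSE does not exceed the threshold.
// The default of 5e-4 is taken from the failing log line above.
bool passes(const std::vector<float> & ref, const std::vector<float> & out,
            double max_nmse = 5e-4) {
    return nmse(ref, out) <= max_nmse;
}
```

An NMSE above 1, as in the failing case, means the error energy exceeds that of the reference itself, which points to wrong results rather than rounding noise.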

The following is a temporary patch that fixes the regression and makes test-backend-ops not crash. I'm still investigating the test-case from above.

diff --git a/ggml/src/ggml-sycl/ggml-sycl.cpp b/ggml/src/ggml-sycl/ggml-sycl.cpp
index 2dba15d2..72a94a50 100644
--- a/ggml/src/ggml-sycl/ggml-sycl.cpp
+++ b/ggml/src/ggml-sycl/ggml-sycl.cpp
@@ -4350,9 +4350,10 @@ static bool ggml_backend_sycl_device_supports_op(ggml_backend_dev_t dev, const g
                 if (op->op == GGML_OP_MUL_MAT) {
                     a = op->src[0];
                     b = op->src[1];
-                    if (ggml_is_permuted(a) || ggml_is_permuted(b)) {
+                    if (ggml_is_permuted(a)) {
                         // TODO: fix like https://github.com/ggerganov/llama.cpp/pull/10021
-                        return false;
+                        if (a->nb[0] <= a->nb[1] && a->nb[3] <= a->nb[2]) return false; // 0,1,3,2 Unsupported
+                        if (b->type != GGML_TYPE_F32) return false;
                     }
                 } else {
                     a = op->src[2];
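The stride condition in the patch is easier to follow with concrete numbers. The sketch below models only the byte strides (`nb`) of a 4-D tensor. `strides4`, `contiguous`, `permute` and `rejected_by_patch` are hypothetical helpers written for illustration. `is_permuted` follows the stride-ordering check that `ggml_is_permuted` is understood to perform, which is an assumption here. The `b->type` check from the patch is not modelled.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the stride part of a ggml tensor:
// nb[i] is the byte stride of dimension i.
struct strides4 {
    std::array<std::size_t, 4> nb;
};

// Byte strides of a contiguous tensor with element size `elem_size` and shape `ne`.
strides4 contiguous(std::size_t elem_size, const std::array<std::size_t, 4> & ne) {
    strides4 t{};
    t.nb[0] = elem_size;
    for (int i = 1; i < 4; ++i) {
        t.nb[i] = t.nb[i - 1] * ne[i - 1];
    }
    return t;
}

// Move source axis i to position axis[i].
strides4 permute(const strides4 & t, const std::array<int, 4> & axis) {
    strides4 r{};
    for (int i = 0; i < 4; ++i) {
        r.nb[axis[i]] = t.nb[i];
    }
    return r;
}

// A tensor counts as permuted when its strides are not in non-decreasing order.
bool is_permuted(const strides4 & t) {
    return t.nb[0] > t.nb[1] || t.nb[1] > t.nb[2] || t.nb[2] > t.nb[3];
}

// Stride test from the patch, which its comment labels "0,1,3,2 Unsupported".
bool rejected_by_patch(const strides4 & a) {
    return is_permuted(a) && a.nb[0] <= a.nb[1] && a.nb[3] <= a.nb[2];
}
```

For an f16 tensor of shape [256, 16, 2, 3], the contiguous strides are [2, 512, 8192, 16384]. Permuting with 0,1,3,2 gives [2, 512, 16384, 8192], which the stride test rejects. Permuting with 0,2,1,3 gives [2, 8192, 512, 16384], which it lets through.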


NeoZhangJianyu · 2024-11-18T04:10:51Z

@Alcpz
This PR leads to a 50% performance reduction on Intel GPUs.
I don't know the status on other GPUs.
I would like to revert this PR.

What do you think?

Thank you!

Alcpz · 2024-11-18T10:32:14Z

Yes, let's revert until we get a proper fix for the test-backend-ops.

Edit: I can't do it automatically. I will submit a new PR reverting just the problematic changes. CI is gonna fail for SYCL though, so we may need to have a conversation there.

sycl : Fixes RWKV6 broken build in the cuda backend

17b8a2e

github-actions bot added the ggml and SYCL labels Nov 11, 2024

sycl : marks permuted MUL_MAT as unsupported

f6ea8b7

Alcpz force-pushed the Alcpz/sycl-backend-build-fix branch from 06cb3c6 to f6ea8b7 November 11, 2024 21:41

Alcpz requested a review from airMeng November 11, 2024 21:54

sycl : fix norm asserts in debug build

6a2c025

Alcpz requested a review from NeoZhangJianyu November 12, 2024 10:53

Rbiessy approved these changes Nov 12, 2024


airMeng approved these changes Nov 12, 2024


Alcpz merged commit 2e82ffa into ggml-org:master Nov 13, 2024

Alcpz mentioned this pull request Nov 18, 2024

sycl: Revert MUL_MAT_OP support changes #10385

Merged

4 tasks

Rbiessy mentioned this pull request Jan 3, 2025

[SYCL] pass SYCL CI #10041

Closed

4 tasks

Alcpz deleted the Alcpz/sycl-backend-build-fix branch November 27, 2025 12:05

Alcpz merged 3 commits into ggml-org:master from Alcpz:Alcpz/sycl-backend-build-fix