CANN: supports out_prod operator for F32 and F16 #17406
hipudding merged 1 commit into ggml-org:master from
Conversation
noemotiovon
left a comment
LGTM, just a minor issue.
ggml/src/ggml-cann/aclnn_ops.cpp
Outdated
    const int64_t i12 = i2;
    const int64_t i13 = i3;
    aclTensor *accumulator = ggml_cann_create_tensor(
The result of `ggml_cann_create_tensor` should be `acl_tensor_ptr`, not `aclTensor*`.
ggml/src/ggml-cann/aclnn_ops.h
Outdated
 */
void ggml_cann_out_prod(ggml_backend_cann_context & ctx, ggml_tensor * dst);

void ggml_cann_out_prod_fp(ggml_backend_cann_context & ctx, ggml_tensor * dst);
ggml/src/ggml-cann/aclnn_ops.cpp
Outdated
@@ -72,6 +72,7 @@
 #include <aclnnop/aclnn_index_select.h>
 #include <aclnnop/aclnn_clamp.h>
 #include <aclnnop/aclnn_threshold.h>
+#include <aclnnop/aclnn_ger.h>
You should use

    find ggml/src/ggml-cann -iname "*.cpp" -o -iname "*.h" | xargs clang-format -i

to format the code.
ggml/src/ggml-cann/aclnn_ops.cpp
Outdated
    dst->nb,
    2);

GGML_CANN_CALL_ACLNN_OP(ctx, InplaceZero, accumulator);
Currently, `InplaceZero` is called on every iteration of the for loop. I believe we can call it once on `dst` before the loop.
815e770 to 5d9578a
Thank you for your contribution! :)
I think the current computation method is not optimal. The outer product is essentially a product of two 1-D vectors, so it can be computed by broadcasting and multiplying element-wise. The broadcast can be implemented by modifying the view (similar to how it is done in operators like Add), followed by an element-wise multiplication.
Ignore my previous comment: `aclnnGer` actually computes the outer product; the description in the CANN documentation is incorrect.
    ggml_type_size(dst->type), dst->ne, dst->nb, 2);

// The outer product needs to be accumulated in this dimension.
for (int64_t i1 = 0; i1 < ne11; i1++) {
There are three nested loops here, which will result in very poor performance. Optimization should be done once an appropriate operator is available.
acl_tensor_ptr acl_out = ggml_cann_create_tensor(output_buffer, ggml_cann_type_mapping(dst->type),
                                                 ggml_type_size(dst->type), dst->ne, dst->nb, 2);

GGML_CANN_CALL_ACLNN_OP(ctx, Ger, acl_input.get(), acl_weight.get(), acl_out.get());
This operator is used here to compute the outer product, but according to the documentation it computes the inner product, which would be unsuitable.
Ignore my previous comment: `aclnnGer` actually computes the outer product; the description in the CANN documentation is incorrect.
Does the 310P support `aclnnGer`? If not, please update the supported-ops function.
I will check this; if it is not supported, I will fix it in a follow-up PR.
Co-authored-by: tianhao <tianhao42@huawei.com>

The CANN backend supports floating-point outer-product calculations.