[aarch64] add sbgemm inner product op - cherrypick of PR #1768 (#1831)
vpirogov merged 3 commits into uxlfoundation:rls-v3.3
Conversation
With weights pre-packing enabled in torch.compile(), the weights arrive already reordered into the oneDNN format, so this change allows format_kind::blocked as one of the supported weight formats for the ACL inner product primitive.
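As a rough illustration of how a framework can obtain the primitive's preferred (possibly blocked) weight layout ahead of time, here is a minimal oneDNN C++ sketch. The shapes are made up for the example; it only shows the standard `format_tag::any` idiom, not this PR's internals.

```cpp
#include <dnnl.hpp>

int main() {
    using namespace dnnl;
    engine eng(engine::kind::cpu, 0);

    // Made-up shapes: batch 128, 512 input channels, 1024 output channels.
    memory::desc src_md({128, 512}, memory::data_type::f32, memory::format_tag::ab);
    memory::desc dst_md({128, 1024}, memory::data_type::f32, memory::format_tag::ab);
    // format_tag::any lets the implementation pick its preferred weight layout,
    // which may be a blocked (format_kind::blocked) format.
    memory::desc wei_md({1024, 512}, memory::data_type::f32, memory::format_tag::any);

    auto pd = inner_product_forward::primitive_desc(
            eng, prop_kind::forward_inference, src_md, wei_md, dst_md);

    // A framework such as PyTorch can reorder (pre-pack) the weights into
    // pd.weights_desc() once, then pass them in that layout at run time.
    return 0;
}
```

With pre-packing, the reorder cost is paid once at model load instead of on every forward call.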
@cfRod, @jondea please review. @snadampal, do you have tests passing locally?

I'm mainly interested in tests on Arm processors of course :)

Yes, @vpirogov, I tested

These tests do not cover mixed precision cases (

Hi @igorsafo, can you please point me to which tests you were referring to?

Hi @snadampal! UPDATE: It is a

Thanks, @igorsafo, I will look into adding those additional test cases to main.
As the title. Including issue fixes for aarch64:

- uxlfoundation/oneDNN#1831
- uxlfoundation/oneDNN#1834

---

## Validation results (on Intel CPU + Linux)

**Static quantization with Inductor on CV models**

Quant method | Geomean throughput ratio (v3.3.6/baseline)
-- | --
ptq | 0.982937
ptq (cpp wrapper) | 0.978384
qat | 0.978828

**Torchbench cpu userbenchmark with Inductor**

Items | Perf Geomean Ratio (v3.3.6/baseline)
-- | --
eager_throughtput_bf16_infer | 1.00x
eager_throughtput_fp32_infer | 1.00x
jit_llga_throughtput_amp_bf16 | 1.01x
jit_llga_throughtput_fp32 | 1.00x
eager_throughtput_fx_int8 | 1.00x
eager_throughtput_bf16_train | 1.46x
eager_throughtput_fp32_train | 1.41x

**Dynamo benchmarks tests**

Precision | Shape | Wrapper | Thread | Eager old/new GEOMEAN | Inductor old/new GEOMEAN
-- | -- | -- | -- | -- | --
Float32 | Static | Default | Multiple | 1.003836812 | 1.003425
Float32 | Static | Default | Single | 1.000181451 | 0.999611
Float32 | Dynamic | Default | Multiple | 1.003980183 | 1.006563
Float32 | Dynamic | Default | Single | 1.000076939 | 0.999969
AMP | Static | Default | Multiple | 0.996824772 | 0.998715
AMP | Static | Default | Single | 0.996402574 | 1.001483
AMP | Dynamic | Default | Multiple | 0.994919866 | 1.000467
AMP | Dynamic | Default | Single | 0.9962054 | 1.000767

(on Aarch64) pytorch#122164 (comment)

---

Pull Request resolved: pytorch#122164

Approved by: https://github.com/snadampal, https://github.com/malfet, https://github.com/atalman
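For readers unfamiliar with the "geomean ratio" columns above: the geometric mean is used for throughput ratios because it treats a 2x speedup and a 2x slowdown symmetrically. A small sketch (the per-model ratios here are hypothetical; the tables above report only the aggregates):

```python
import math

def geomean(ratios):
    """Geometric mean of per-model throughput ratios (e.g. v3.3.6 / baseline)."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))

# Hypothetical per-model ratios; a value near 1.0 means performance parity.
print(geomean([0.98, 1.00, 1.02]))
```

A geomean of 1.00x, as in most rows above, indicates no measurable regression from the upgrade.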
Description
Added the sbgemm inner product op and blocked-weights support to enable PyTorch torch.compile() and bf16 fastmath kernels to work together on aarch64.
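For context, bf16 fastmath in oneDNN is opted into through the fpmath mode, either globally via the `ONEDNN_DEFAULT_FPMATH_MODE=BF16` environment variable or per primitive via attributes. A minimal sketch of the per-primitive route (shapes are illustrative):

```cpp
#include <dnnl.hpp>

int main() {
    using namespace dnnl;
    engine eng(engine::kind::cpu, 0);

    memory::desc src_md({64, 256}, memory::data_type::f32, memory::format_tag::ab);
    memory::desc dst_md({64, 128}, memory::data_type::f32, memory::format_tag::ab);
    memory::desc wei_md({128, 256}, memory::data_type::f32, memory::format_tag::any);

    // Allow implicit down-conversion of f32 data to bf16 inside the compute,
    // enabling bf16 fastmath kernels (e.g. sbgemm on aarch64) where available.
    primitive_attr attr;
    attr.set_fpmath_mode(fpmath_mode::bf16);

    auto pd = inner_product_forward::primitive_desc(
            eng, prop_kind::forward_inference, src_md, wei_md, dst_md, attr);
    return 0;
}
```

The model still sees f32 inputs and outputs; only the internal accumulation path uses bf16 arithmetic.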
Checklist
General
Do all unit and benchdnn tests (`make test` and `make test_benchdnn_*`) pass locally for each commit? Tested `make test` and the inner product primitive tests (`make test_bench_ip_ci`).
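For reference, the checks above can be run from a configured oneDNN build directory roughly as follows (target names taken from the checklist; exact benchdnn target names can vary between oneDNN versions):

```shell
# From an already-configured CMake build directory of oneDNN.
make -j test              # full unit test suite
make -j test_bench_ip_ci  # inner product benchdnn CI tests (name as used above)
```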