Poor DGEMM performance for armsve build on Neoverse N2 #641

Hi.

Whilst doing some comparative benchmarking on the Alibaba Cloud g8m instances I've run into some BLIS performance issues. g8m is based on Arm's Neoverse N2 technology and has 2x128-bit SVE vector units (i.e. a 128-bit vector length).

When I build for the "armsve" target, I get a peak performance of between 5 and 6 GFLOP/s on a single core, rather than the 20 GFLOP/s I get from the Neon implementation.
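For anyone wanting to compare numbers, here is a minimal sketch of how a GFLOP/s figure like the ones above is usually derived from a timed DGEMM. It assumes the conventional count of 2*m*n*k floating-point operations per call. The function name and the sizes and timing in the usage lines are hypothetical and are not measurements from these runs.

```python
def dgemm_gflops(m, n, k, seconds):
    """Achieved GFLOP/s for one m x n x k DGEMM, counting 2*m*n*k flops."""
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    flops = 2.0 * m * n * k  # one multiply and one add per inner-product term
    return flops / seconds / 1e9


if __name__ == "__main__":
    # Hypothetical example: a 2000 x 2000 x 2000 DGEMM that took 1.0 s.
    print(dgemm_gflops(2000, 2000, 2000, 1.0))
```

With the same problem size, a run that takes roughly four times longer reports roughly a quarter of the rate, which is the kind of gap described here between the armsve and Neon builds.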

A lot of time seems to be spent in the function "bli_dpackm_mrxk_armsve_ref", which makes me think it is packing incorrectly for the 128-bit vector length. Running on AWS Graviton3 instances (with a 256-bit vector length) does not show these issues.
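To make the vector-length difference concrete, the sketch below works out how many doubles fit in one SVE vector on each machine. The second figure, micro-tile rows, is only an illustration: it assumes a kernel whose micro-tile height is defined as two vectors of doubles. That is an assumption for the sake of the example, not something confirmed from the armsve sources. The function names are hypothetical.

```python
def doubles_per_vector(vl_bits):
    """Number of 64-bit doubles in one SVE vector of vl_bits bits."""
    # SVE vector lengths are multiples of 128 bits, from 128 up to 2048.
    if vl_bits % 128 != 0 or not 128 <= vl_bits <= 2048:
        raise ValueError("not a valid SVE vector length: %r" % vl_bits)
    return vl_bits // 64


def rows_if_two_vectors_tall(vl_bits):
    """Micro-tile rows under the ASSUMPTION that the tile is two vectors tall."""
    return 2 * doubles_per_vector(vl_bits)


if __name__ == "__main__":
    for name, vl in (("Neoverse N2", 128), ("Graviton3", 256)):
        print(name, vl, doubles_per_vector(vl), rows_if_two_vectors_tall(vl))
```

Under that assumption, the packed panel would be half as tall at 128 bits as at 256 bits, so any packing routine with a hard-coded or mismatched panel height would behave differently on the two machines.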

Thanks.

Chris
