Skip to content

Poor DGEMM performance for armsve build on Neoverse N2 #641

@chrisgoodyer

Description

@chrisgoodyer

Hi.

Whilst doing some comparative benchmarking on the Alibaba Cloud g8m instances I've run into some BLIS performance issues. g8m is based on Arm's Neoverse N2 technology and has 2x128-bit SVE vectors.

When I've done a build for the target "armsve" I am getting a peak performance of between 5 and 6 GFLOPs on a single core rather than the 20 GFLOPs I get from the Neon implementation.

There seems to be an awful lot of time spent in the function "bli_dpackm_mrxk_armsve_ref" which makes me think it is packing incorrectly for the 128-bit vector length. Running on AWS Graviton3 instances (with a 256-bit vector length) does not show these issues.

Thanks.

Chris

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions