[AArch64][SVE] Use FeatureUseFixedOverScalableIfEqualCost for A320 #480
This pull review modifies files outside of the |
This is a cherry-pick from upstream head. I'm assuming we don't need to adhere to the |
davemgreen
left a comment
This looks the same as the upstream version, and should be pretty safe as it only alters tuning features. LGTM.
On our branch, it's still a downstream change that needs tracking. Also, a reminder:
tblah
left a comment
I think this is okay for ATfL as this is an upstream backport and doesn't impact the cores we benchmark against.
Downstream issue: arm#482

With this new A320 in-order core, we follow the earlier addition of the FeatureUseFixedOverScalableIfEqualCost feature to A510 and A520 (#132246), which gives the same code generation benefit: fixed-width vectorization is preferred over scalable vectorization when the two costs are equal.

So when we have:
```
void foo(float* a, float* b, float* dst, unsigned n) {
  for (unsigned i = 0; i < n; ++i)
    dst[i] = a[i] + b[i];
}
```

When compiling without the feature enabled, we get:
```
...
ld1b { z0.b }, p0/z, [x0, x10]
ld1b { z2.b }, p0/z, [x1, x10]
add x12, x0, x10
ldr z1, [x12, #1, mul vl]
add x12, x1, x10
ldr z3, [x12, #1, mul vl]
fadd z0.s, z2.s, z0.s
add x12, x2, x10
fadd z1.s, z3.s, z1.s
dech x11
st1b { z0.b }, p0, [x2, x10]
incb x10, all, mul #2
str z1, [x12, #1, mul vl]
...
```

When compiling with the feature enabled, we get:
```
...
ldp q0, q1, [x12, #-16]
ldp q2, q3, [x11, #-16]
subs x13, x13, #8
fadd v0.4s, v2.4s, v0.4s
fadd v1.4s, v3.4s, v1.4s
add x11, x11, #32
add x12, x12, #32
stp q0, q1, [x10, #-16]
add x10, x10, #32
...
```

This patch also moves FeatureUseFixedOverScalableIfEqualCost for A510 and A520 from the CPU features to the tune features.
63631f7
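As a rough illustration of the tie-break the commit message describes, the sketch below compares two vectorization candidates by cost and only consults the preference flag when the costs are equal. This is not the LLVM implementation: `VecPlan`, `is_more_profitable`, and the `prefer_fixed_if_equal` parameter are hypothetical names made up for this example, and the assumption that scalable wins ties when the flag is off is inferred from the "without the feature" output above.

```c
#include <stdbool.h>

/* Hypothetical model of one vectorization candidate (not an LLVM type). */
typedef struct {
    bool scalable;  /* true: scalable (SVE-style) vectors, false: fixed-width */
    unsigned cost;  /* estimated cost, lower is better */
} VecPlan;

/*
 * Returns true if candidate `a` should be chosen over candidate `b`.
 * `prefer_fixed_if_equal` stands in for the tuning feature discussed above.
 */
bool is_more_profitable(VecPlan a, VecPlan b, bool prefer_fixed_if_equal)
{
    /* Different costs: the cheaper plan always wins, flag or no flag. */
    if (a.cost != b.cost)
        return a.cost < b.cost;

    /* Equal cost, scalable vs. fixed: the flag decides the tie.
     * Assumption: without the flag, the scalable plan wins ties. */
    if (a.scalable && !b.scalable)
        return !prefer_fixed_if_equal;
    if (!a.scalable && b.scalable)
        return prefer_fixed_if_equal;

    /* Same kind and same cost: keep the incumbent `b`. */
    return false;
}
```

With a fixed-width plan and a scalable plan both at cost 10, `is_more_profitable(fixed, scalable, true)` is true and `is_more_profitable(fixed, scalable, false)` is false, so the flag changes the outcome only on exact ties.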
Force-pushed from 926af2e to 63631f7.
Ok, thanks for the clarification. Done.
dcandler
left a comment
LGTM. The only changes since the previous commit are additional comments, which are non-functional, so I think the previous approvals should still apply.