Audio: DRC: Some HiFi4 and HiFi3 optimizations by singalsu · Pull Request #9646 · thesofproject/sof

singalsu · 2024-11-08T19:08:02Z

No description provided.

singalsu · 2024-11-08T19:13:53Z

src/audio/drc/drc_math_hifi3.c

+	 * p1p0.l holds p0(x)
+	 * in3p1.h holds p1(x) * x^3
+	 */
+	acc = AE_MOVAD32_L(AE_ADD32_HL_LH(in3p1, in3p1));


First working version; it should be possible to optimize this further. If it works, the same approach should apply to the other polynomial-evaluation-based functions in use.

Great, let's take this as step 1. Future optimizations can merge as separate PRs.

There's now a better version, but I'm out of ideas for how to improve it further. Also, the polynomial is of too low an order for the four-parallel-multiplier version of Horner: there would be setup overhead and redundant zero coefficients.
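For readers following the hunk comments (`p0(x)` in one lane, `p1(x) * x^3` in the other), here is a minimal portable sketch of a two-lane Horner split. It is not the PR's code: the degree-5 polynomial, the coefficient values, the `double` arithmetic, and the function names `horner_serial` and `horner_two_lane` are all assumptions made for illustration. The real implementation uses fixed-point HiFi intrinsics, and its formats are not shown in this thread.

```c
/* Hypothetical coefficients for c[0] + c[1]*x + ... + c[5]*x^5. */
static const double c[6] = { 0.5, -1.25, 0.75, 2.0, -0.375, 1.5 };

/* Reference: plain Horner, five dependent multiply-add steps. */
static double horner_serial(double x)
{
	double acc = c[5];
	int i;

	for (i = 4; i >= 0; i--)
		acc = acc * x + c[i];

	return acc;
}

/*
 * Two-lane split: p(x) = p0(x) + x^3 * p1(x), where
 *   p0(x) = c[0] + c[1]*x + c[2]*x^2   (low lane)
 *   p1(x) = c[3] + c[4]*x + c[5]*x^2   (high lane)
 * The two Horner recurrences are independent, so a 2-way SIMD
 * multiplier could advance both lanes in the same instruction.
 */
static double horner_two_lane(double x)
{
	double p0 = c[2];
	double p1 = c[5];
	int i;

	for (i = 1; i >= 0; i--) {
		p0 = p0 * x + c[i];
		p1 = p1 * x + c[i + 3];
	}

	/* Combine the lanes: the high half is scaled by x^3. */
	return p0 + p1 * (x * x * x);
}
```

With only three coefficients per lane, each lane needs just two multiply-add steps, which is consistent with the remark above that a four-lane split would mostly add setup overhead for a polynomial this short.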

lgirdwood

LGTM

lgirdwood · 2024-11-11T16:41:39Z

Great, let's take this as step 1. Future optimizations can merge as separate PRs.

singalsu · 2024-11-11T17:48:59Z

src/audio/drc/drc_math_hifi3.c

-	int32_t precision_inv;
-	int32_t sqrt2_extracted = 0;
-	ae_f32 acc;
+	ae_f32 coef[2];


__aligned(8) ae_f32 coef[2];

@singalsu does this need to be aligned, i.e. will it be used for SIMD? I would assume that the SIMD intrinsics force alignment in their definition.

I think I need to add that, since I declare it as an array of 32-bit values but load it with a 64-bit load in hot code. The compiler may figure that out, but I don't think there is any guarantee. I didn't get alignment faults when I ran this, but better to be safe.
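The alignment concern can be illustrated with a small portable sketch. This is an assumption-laden stand-in, not the PR's code: it uses C11 `alignas(8)` in place of the `__aligned(8)` macro, plain `int32_t` in place of `ae_f32`, and `memcpy` in place of a 64-bit HiFi load. The helper names `is_aligned` and `load_pair` are hypothetical.

```c
#include <stdalign.h>
#include <stdint.h>
#include <string.h>

/* Nonzero if ptr is aligned to n bytes; n must be a power of two. */
static int is_aligned(const void *ptr, uintptr_t n)
{
	return ((uintptr_t)ptr & (n - 1)) == 0;
}

/*
 * Hypothetical helper: store two 32-bit values and read them back
 * as one 64-bit word. Without alignas(8), a two-element int32_t
 * array is only guaranteed 4-byte alignment, so a real 64-bit SIMD
 * load from it could fault on hardware that requires 8-byte
 * alignment. The lane order in the result depends on endianness.
 */
static uint64_t load_pair(int32_t lo, int32_t hi, int *aligned)
{
	alignas(8) int32_t coef[2];
	uint64_t both;

	coef[0] = lo;
	coef[1] = hi;
	*aligned = is_aligned(coef, 8);

	/* memcpy stands in for the 64-bit load of the real code. */
	memcpy(&both, coef, sizeof(both));
	return both;
}
```

An array that happens to land on an 8-byte boundary without the attribute would also work, which may be why no alignment faults showed up in testing; the attribute makes that placement a guarantee rather than luck.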

singalsu · 2024-11-12T14:50:58Z

I added a unit test patch for the changed function separately in #9649.

lgirdwood

LGTM, just one open.

lgirdwood · 2024-11-13T15:54:35Z

@singalsu does this need to be aligned, i.e. will it be used for SIMD? I would assume that the SIMD intrinsics force alignment in their definition.

kv2019i

Very helpful comments. A few minor things, but no showstoppers. I'd add the alignment attribute.

Use 64 bit SIMD for load/store and maximum absolute values search. This saves about 0.1 MCPS in MTL simulation in sof-testbench4.

Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
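As a rough picture of what a two-lane maximum absolute value search looks like, here is a portable C sketch. It is not the code from this commit: the function names, the saturating treatment of `INT32_MIN`, and the handling of an odd tail sample are assumptions, and plain scalar operations stand in for the 64-bit HiFi load and max instructions.

```c
#include <stddef.h>
#include <stdint.h>

/* Saturating absolute value: |INT32_MIN| clamps to INT32_MAX. */
static int32_t abs32_sat(int32_t v)
{
	if (v == INT32_MIN)
		return INT32_MAX;

	return v < 0 ? -v : v;
}

/* Reference: one sample per iteration. */
static int32_t max_abs_scalar(const int32_t *x, size_t n)
{
	int32_t m = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		int32_t a = abs32_sat(x[i]);

		if (a > m)
			m = a;
	}

	return m;
}

/*
 * Two-lane variant: each iteration consumes a pair of samples
 * (one 64-bit load on SIMD hardware) and keeps a separate running
 * maximum per lane. The lanes are merged once at the end.
 */
static int32_t max_abs_two_lane(const int32_t *x, size_t n)
{
	int32_t m0 = 0;
	int32_t m1 = 0;
	size_t i;

	for (i = 0; i + 1 < n; i += 2) {
		int32_t a0 = abs32_sat(x[i]);
		int32_t a1 = abs32_sat(x[i + 1]);

		if (a0 > m0)
			m0 = a0;
		if (a1 > m1)
			m1 = a1;
	}

	/* Odd sample count: fold the last sample into lane 0. */
	if (i < n) {
		int32_t a = abs32_sat(x[i]);

		if (a > m0)
			m0 = a;
	}

	return m0 > m1 ? m0 : m1;
}
```

Because the maximum is order-independent, the two-lane result matches the scalar one exactly, so a change of this kind can stay bit-exact.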

This change improves calculation speed. The implementation is bit exact with previous but implemented with 2-way SIMD multiply. The Horner polynomial evaluation is changed to parallel Horner version for two multipliers. The input and output shifting code is also simplified. The code changes save 1.0 MCPS in sof-testbench4 simulated MTL platform.

Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>

singalsu force-pushed the drc_optimize_detector_average_inverse branch from 8ccefd0 to 9a1a160 Compare November 11, 2024 17:17

singalsu marked this pull request as ready for review November 13, 2024 14:17

singalsu requested a review from a team as a code owner November 13, 2024 14:17

cujomalainey requested a review from johnylin76 November 15, 2024 01:21

kv2019i approved these changes Nov 15, 2024

Audio: DRC: Optimize HiFi4 drc_update_detector_average()

6117ceb

singalsu force-pushed the drc_optimize_detector_average_inverse branch from 9a1a160 to 64a8433 Compare November 15, 2024 14:19

singalsu force-pushed the drc_optimize_detector_average_inverse branch from 64a8433 to c46e4e3 Compare November 15, 2024 14:39

lgirdwood approved these changes Nov 18, 2024

lgirdwood merged commit 5c0bee8 into thesofproject:main Nov 22, 2024
