Audio: DRC: Some HiFi4 and HiFi3 optimizations#9646
Audio: DRC: Some HiFi4 and HiFi3 optimizations#9646lgirdwood merged 2 commits intothesofproject:mainfrom
Conversation
src/audio/drc/drc_math_hifi3.c
Outdated
| * p1p0.l holds p0(x) | ||
| * in3p1.h holds p1(x) * x^3 | ||
| */ | ||
| acc = AE_MOVAD32_L(AE_ADD32_HL_LH(in3p1, in3p1)); |
There was a problem hiding this comment.
1st working version, this should be possible optimize further. If it works, it should apply to other used polynomial evaluation based functions.
There was a problem hiding this comment.
great - lets takes this as step 1. The future optimizations can merge as other PRs.
There was a problem hiding this comment.
There's now a better version, but I'm out of ideas how to improve this further. Also, the polynomial is too low order for four parallel multipliers version of Horner, there would be setup overhead and redundant zero coefficients.
src/audio/drc/drc_math_hifi3.c
Outdated
| * p1p0.l holds p0(x) | ||
| * in3p1.h holds p1(x) * x^3 | ||
| */ | ||
| acc = AE_MOVAD32_L(AE_ADD32_HL_LH(in3p1, in3p1)); |
There was a problem hiding this comment.
great - lets takes this as step 1. The future optimizations can merge as other PRs.
8ccefd0 to
9a1a160
Compare
src/audio/drc/drc_math_hifi3.c
Outdated
| int32_t precision_inv; | ||
| int32_t sqrt2_extracted = 0; | ||
| ae_f32 acc; | ||
| ae_f32 coef[2]; |
There was a problem hiding this comment.
__aligned(8) ae_f32 coef[2];
There was a problem hiding this comment.
@singalsu does this need aligned i.e. will it be used for SIMD ? I would assume that the SIMD intrinsics would force alignment in their definition.
There was a problem hiding this comment.
I think I need to add that since I declare it as array of 32-bit while I use it in hot code with 64 bit load. The compiler may figure that out but I think there is no guarantee. I didn't get align faults when I ran this but better to be safe.
|
I added unit test patch for the changed function separately in #9649 |
src/audio/drc/drc_math_hifi3.c
Outdated
| int32_t precision_inv; | ||
| int32_t sqrt2_extracted = 0; | ||
| ae_f32 acc; | ||
| ae_f32 coef[2]; |
There was a problem hiding this comment.
@singalsu does this need aligned i.e. will it be used for SIMD ? I would assume that the SIMD intrinsics would force alignment in their definition.
kv2019i
left a comment
There was a problem hiding this comment.
Very helpful comments. A few minor things, but no showstoppers. I'd add the alignment attribute.
Use 64 bit SIMD for load/store and maximum absolute values search. This saves about 0.1 MCPS in MTL simulation in sof-testbench4. Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
9a1a160 to
64a8433
Compare
This change improves calculation speed. The implementation is bit exact with previous but implemented with 2-way SIMD multiply. The Horner polynomial evaluation is changed to parallel Horner version for two multipliers. The input and output shifting code is also simplified. The code changes save 1.0 MCPS in sof-testbench4 simulated MTL platform. Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
64a8433 to
c46e4e3
Compare
No description provided.