Math: Optimize sofm_exp_fixed() HiFi version#9631
Math: Optimize sofm_exp_fixed() HiFi version#9631kv2019i merged 2 commits intothesofproject:mainfrom
Conversation
src/math/exp_fcn_hifi.c
Outdated
| xs >>= 1; | ||
| n++; | ||
| n <<= 1; | ||
| } |
There was a problem hiding this comment.
can we not just count the number of bits to shift and shift by N directly? Maybe with a branch like
if (xs >= SOFM_EXP_TWO_Q27)
xs >>= ffs(xs) - ffs(SOFM_EXP_TWO_Q27);
else if (xs <= SOFM_EXP_MINUS_TWO_Q27)
xs >>= ...;
? The actual shift above is likely wrong, but that would be the idea
There was a problem hiding this comment.
There's a HiFi instruction to count bits for normalization, so maybe such instruction could be used. If counting bits is done in a loop like this it would not help.
There's also the Q27 vs. Q28 format issue, here we shift right one too much and in next we shift left. But I could not get measurable improvement from trying to optimize that, and I'm not sure I preserved right the equation so I left it out from this.
There was a problem hiding this comment.
Also, it's max. 3 loops for worst case large input number. So, the overhead from counting bits other way needs to be small.
There was a problem hiding this comment.
@singalsu on Xtensa __builtin_ffs() uses the NSAU instruction to count those bits
There was a problem hiding this comment.
Thanks, I didn't know, I'll try with it.
There was a problem hiding this comment.
OK, I'm able to save another 0.4 MCPS with normalize HiFi instruction based input value scaling to replace the while loop. Also even more saving seems to be possible with allowing range -4 to +4 instead of -2 to +2 for the small value exponent function. I'll update this PR tomorrow.
kv2019i
left a comment
There was a problem hiding this comment.
Another great looking improvement! One question about using rand for the test case, plesae check inline.
| int32_t ivalue, iexp_value; | ||
| int i; | ||
|
|
||
| srand((unsigned int)time(NULL)); |
There was a problem hiding this comment.
A bit mixed feeling about this. Randomization can work, but then you'd have to have enough iterations to cover meaningful amount of the test space to provide repeatable and reliable results (see e.g. https://scotthannen.org/blog/2024/05/20/dont-use-random-numbers-in-tests.html).
There was a problem hiding this comment.
I followed here testing of the other small values exp() function that uses random numbers (there was a discussion about it in review). Random numbers give good coverage over time if the random numbers are good (the failing input / output value is printed). But of course with 256 points in one run, some issue might be missed. But what would then be suitable number of points to trust that 32 bit inputs are properly tested in acceptable time. The cmocka test is run with xtensa simulator, so it could take quite long.
There was a problem hiding this comment.
I still kept this. If it's a blocker, also easy to change to a fixed grid of test points. As you prefer.
There was a problem hiding this comment.
random is not reproducible but does offer a more varied surface area for the tests. @singalsu for the moment can we make this a static table as we dont yet have the infra to deal with test randomization.
There was a problem hiding this comment.
Yep, knowing the exponent nature, I think I could test with a dense grid of 1 lsb at min and max values and some 100 points grid for intermediate values.
I think there's a mistake in random values of the other function test in this same source file so I should fix it too. The values tested are actually only integer decimal values +0.5 and not entire 32 bit range.
There was a problem hiding this comment.
can we have both? A fixed set of points and a few more random ones?
There was a problem hiding this comment.
Yep, I can do that since I have already code for random.
There was a problem hiding this comment.
Changed my mind again. After adding the linear grid test with 3 regions of exp() asymptotic approach of x-axis, curve up, asymptotic approach of y-axis I thought that having random added is low value, so I removed it.
bdd20f9 to
210bff1
Compare
kv2019i
left a comment
There was a problem hiding this comment.
Unless my eyes deceive, the last style issue is still there...
ed44aac to
d8f21c1
Compare
There was no unit test for the function, so it is added. The pass criteria is tuned for current implementation to just pass in the three defined regions. It is useful to verify the next changes to optimize the function. Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
The unnecessary shift and multiply functions can be removed with use of normal C shift left and with use xtensa multiply, shift, and round intrinsics directly in the function. This change saves in TGL HiFi3 platform 1.3 MCPS in DRC processing mode. Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
d8f21c1 to
9b19183
Compare
| double fstep = (b - a) / (step_count - 1); | ||
| double fvalue = a + fstep * point; | ||
|
|
||
| *value = (int32_t)round(fvalue * 134217728.0); |
There was a problem hiding this comment.
out of curiosity - that number has a meaning, right? I might be the only one who doesn't recognise it (ok, I'm sure it isn't Pi and it isn't e, I'm certain enough about that :-) ) but maybe a comment would be helpful
No description provided.