Math: Optimize sofm_exp_fixed() HiFi version by singalsu · Pull Request #9631 · thesofproject/sof

singalsu · 2024-10-31T15:53:45Z

No description provided.

lyakh · 2024-11-04T08:57:12Z

src/math/exp_fcn_hifi.c

 		xs >>= 1;
-		n++;
+		n <<= 1;
 	}


can we not just count the number of bits to shift and shift by N directly? Maybe with a branch like

if (xs >= SOFM_EXP_TWO_Q27) xs >>= ffs(xs) - ffs(SOFM_EXP_TWO_Q27); else if (xs <= SOFM_EXP_MINUS_TWO_Q27) xs >>= ...;

? The actual shift above is likely wrong, but that would be the idea

There's a HiFi instruction to count bits for normalization, so maybe such instruction could be used. If counting bits is done in a loop like this it would not help.

There's also the Q27 vs. Q28 format issue, here we shift right one too much and in next we shift left. But I could not get measurable improvement from trying to optimize that, and I'm not sure I preserved right the equation so I left it out from this.

Also, it's max. 3 loops for worst case large input number. So, the overhead from counting bits other way needs to be small.

@singalsu on Xtensa __builtin_ffs() uses the NSAU instruction to count those bits

Thanks, I didn't know, I'll try with it.

@singalsu yep, we have definitions for all the ffs*() and clz*() variants in

sof/src/include/sof/common.h

Lines 133 to 139 in 53f2d96

#define ffs(i) __builtin_ffs(i)

#define ffsl(i) __builtin_ffsl(i)

#define ffsll(i) __builtin_ffsll(i)

#define clz(i) __builtin_clz(i)

#define clzl(i) __builtin_clzl(i)

#define clzll(i) __builtin_clzll(i)

OK, I'm able to save another 0.4 MCPS with normalize HiFi instruction based input value scaling to replace the while loop. Also even more saving seems to be possible with allowing range -4 to +4 instead of -2 to +2 for the small value exponent function. I'll update this PR tomorrow.

kv2019i

Another great looking improvement! One question about using rand for the test case, plesae check inline.

kv2019i · 2024-11-04T08:59:50Z

test/cmocka/src/math/arithmetic/exponential.c

+	int32_t ivalue, iexp_value;
+	int i;
+
+	srand((unsigned int)time(NULL));


A bit mixed feeling about this. Randomization can work, but then you'd have to have enough iterations to cover meaningful amount of the test space to provide repeatable and reliable results (see e.g. https://scotthannen.org/blog/2024/05/20/dont-use-random-numbers-in-tests.html).

I followed here testing of the other small values exp() function that uses random numbers (there was a discussion about it in review). Random numbers give good coverage over time if the random numbers are good (the failing input / output value is printed). But of course with 256 points in one run, some issue might be missed. But what would then be suitable number of points to trust that 32 bit inputs are properly tested in acceptable time. The cmocka test is run with xtensa simulator, so it could take quite long.

I still kept this. If it's a blocker, also easy to change to a fixed grid of test points. As you prefer.

random is not reproducible but does offer a more varied surface area for the tests. @singalsu for the moment can we make this a static table as we dont yet have the infra to deal with test randomization.

Yep, knowing the exponent nature, I think I could test with a dense grid of 1 lsb at min and max values and some 100 points grid for intermediate values.

I think there's a mistake in random values of the other function test in this same source file so I should fix it too. The values tested are actually only integer decimal values +0.5 and not entire 32 bit range.

can we have both? A fixed set of points and a few more random ones?

Yep, I can do that since I have already code for random.

Changed my mind again. After adding the linear grid test with 3 regions of exp() asymptotic approach of x-axis, curve up, asymptotic approach of y-axis I thought that having random added is low value, so I removed it.

src/math/exp_fcn_hifi.c

kv2019i

Unless my eyes deceive, the last style issue is still there...

src/math/exp_fcn_hifi.c

There was no unit test for the function, so it is added. The pass criteria is tuned for current implementation to just pass in the three defined regions. It is useful to verify the next changes to optimize the function. Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>

The unnecessary shift and multiply functions can be removed with use of normal C shift left and with use xtensa multiply, shift, and round intrinsics directly in the function. This change saves in TGL HiFi3 platform 1.3 MCPS in DRC processing mode. Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>

lyakh · 2024-11-18T08:03:38Z

test/cmocka/src/math/arithmetic/exponential.c

+	double fstep = (b - a) / (step_count - 1);
+	double fvalue = a + fstep * point;
+
+	*value = (int32_t)round(fvalue * 134217728.0);


out of curiosity - that number has a meaning, right? I might be the only one who doesn't recognise it (ok, I'm sure it isn't Pi and it isn't e, I'm certain enough about that :-) ) but maybe a comment would be helpful

singalsu marked this pull request as ready for review November 1, 2024 11:35

singalsu requested review from dbaluta, kv2019i, lbetlej, lgirdwood, mmaka1 and plbossart as code owners November 1, 2024 11:35

lgirdwood approved these changes Nov 1, 2024

View reviewed changes

lyakh reviewed Nov 4, 2024

View reviewed changes

kv2019i reviewed Nov 4, 2024

View reviewed changes

singalsu force-pushed the exp_fixed_optimize branch from bdd20f9 to 210bff1 Compare November 5, 2024 16:28

lyakh reviewed Nov 6, 2024

View reviewed changes

src/math/exp_fcn_hifi.c Show resolved Hide resolved

singalsu requested review from kv2019i and lyakh November 12, 2024 16:02

kv2019i reviewed Nov 12, 2024

View reviewed changes

src/math/exp_fcn_hifi.c Show resolved Hide resolved

singalsu force-pushed the exp_fixed_optimize branch from ed44aac to d8f21c1 Compare November 13, 2024 14:14

singalsu added 2 commits November 15, 2024 15:57

singalsu force-pushed the exp_fixed_optimize branch from d8f21c1 to 9b19183 Compare November 15, 2024 13:59

singalsu requested a review from kv2019i November 15, 2024 14:04

lyakh approved these changes Nov 18, 2024

View reviewed changes

kv2019i approved these changes Nov 18, 2024

View reviewed changes

kv2019i merged commit d2a231f into thesofproject:main Nov 18, 2024

	#define ffs(i) __builtin_ffs(i)
	#define ffsl(i) __builtin_ffsl(i)
	#define ffsll(i) __builtin_ffsll(i)

	#define clz(i) __builtin_clz(i)
	#define clzl(i) __builtin_clzl(i)
	#define clzll(i) __builtin_clzll(i)

Conversation

singalsu commented Oct 31, 2024

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

kv2019i left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

kv2019i left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants