RISC-V: fix mul 8/16 bit for RVV 0.7 #24931

Merged
asmorkalov merged 1 commit into opencv:4.x from mshabunin:fix-rvv07-mul
Jan 29, 2024

@mshabunin
Contributor

In the RVV 0.7.1 implementation, multiplication performed unnecessary operations: it unpacked the widened values and then packed them back again. This part has been removed. On a LicheePi 4A with the Xuantie 2.8.0 toolchain, run time dropped from 5.82 ms to 1.06 ms (1920x1080 / CV_8UC1 - BinaryOpTest.multiply/20). Accuracy tests for core and imgproc pass with the same failures as before the fix.

vuint16m2_t res = vwmulu_vv_u16m2(a, b, 16);

// following calls are not needed - they unpack values and pack them back again
vuint16m1_t c = vget_v_u16m2_u16m1(res, 0);
vuint16m1_t d = vget_v_u16m2_u16m1(res, 1);
vuint16m2_t im = vundefined_u16m2();
im = vset_v_u16m1_u16m2(im, 0, c);
im = vset_v_u16m1_u16m2(im, 1, d);
// end - we can pass 'res' directly to 'vnclipu'

vuint8m1_t fin = vnclipu_wx_u8m1(im, 0, 16);

Note: the scalable RVV implementation already uses the shortened code.
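The snippet above can be modelled in plain scalar C++ to see why the removed calls were redundant. This is only a sketch, not the OpenCV code: `widening_mul`, `narrowing_clip` and `split_and_rejoin` are hypothetical helpers standing in for `vwmulu_vv_u16m2`, `vnclipu_wx_u8m1` with shift 0, and the `vget`/`vset` sequence, and it assumes 16 lanes to match the `vl` argument used above.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>

// Assumption: 16 lanes, matching the vl argument in the PR snippet.
constexpr std::size_t kLanes = 16;

// Hypothetical scalar stand-in for vwmulu_vv_u16m2:
// unsigned widening multiply, 8-bit inputs -> 16-bit products.
std::array<uint16_t, kLanes> widening_mul(const std::array<uint8_t, kLanes>& a,
                                          const std::array<uint8_t, kLanes>& b) {
    std::array<uint16_t, kLanes> r{};
    for (std::size_t i = 0; i < kLanes; ++i)
        r[i] = static_cast<uint16_t>(a[i] * b[i]);  // max 255*255 fits in 16 bits
    return r;
}

// Hypothetical scalar stand-in for vnclipu_wx_u8m1 with shift 0:
// saturating narrow from 16-bit to 8-bit.
std::array<uint8_t, kLanes> narrowing_clip(const std::array<uint16_t, kLanes>& w) {
    std::array<uint8_t, kLanes> r{};
    for (std::size_t i = 0; i < kLanes; ++i)
        r[i] = static_cast<uint8_t>(std::min<uint16_t>(w[i], 255));
    return r;
}

// Models the removed vget/vset calls: split the wide group into two halves,
// then write both halves back into a fresh group at the same positions.
std::array<uint16_t, kLanes> split_and_rejoin(const std::array<uint16_t, kLanes>& w) {
    std::array<uint16_t, kLanes / 2> lo{}, hi{};
    std::copy(w.begin(), w.begin() + kLanes / 2, lo.begin());
    std::copy(w.begin() + kLanes / 2, w.end(), hi.begin());

    std::array<uint16_t, kLanes> out{};
    std::copy(lo.begin(), lo.end(), out.begin());
    std::copy(hi.begin(), hi.end(), out.begin() + kLanes / 2);
    return out;
}
```

In this model `split_and_rejoin(w)` returns `w` unchanged, so clipping the rejoined value and clipping `w` directly give the same result, which is what passing `res` straight to `vnclipu` relies on.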

Contributor

@asmorkalov asmorkalov left a comment

👍

@asmorkalov asmorkalov merged commit 8ed0319 into opencv:4.x Jan 29, 2024
@mshabunin mshabunin deleted the fix-rvv07-mul branch January 29, 2024 09:53
This was referenced Feb 3, 2024
@dkurt dkurt added this to the 4.10.0 milestone Apr 8, 2024