
Convert moments in tile algorithms to HAL (1.3x faster for VSX).#15828

Merged
alalek merged 3 commits into opencv:3.4 from ChipKerchner:momentsToHal on Nov 5, 2019

@ChipKerchner ChipKerchner commented Nov 1, 2019

Convert moments in tile algorithms to HAL (1.3x faster for VSX).

force_builders=ARMv7,ARMv8,Custom
buildworker:Custom=linux-1
#build_image:Custom=javascript-simd
build_image:Custom=mips64el

@ChipKerchner (Contributor, Author) commented:

Found a "bug" in the NEON version of HAL: creating registers does NOT initialize them to zero, unlike the SSE and VSX HAL. Wondering if this should be fixed for consistency.
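One way to stay independent of the backend difference described above is to never rely on a default-constructed register and to zero every accumulator explicitly (in the universal intrinsics this is what the `v_setzero_*` family, e.g. `v_setzero_u32()`, is for). The following is a minimal plain-C++ sketch of that pattern. It does not use OpenCV: `Lanes4`, `lanes_zero`, `lanes_load`, `lanes_add`, and `accumulate` are hypothetical stand-ins for a 4-lane register and its operations, not names from this PR.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a 4-lane, 32-bit SIMD register (not an OpenCV type).
struct Lanes4 {
    std::array<std::uint32_t, 4> v;
};

// Explicit zeroing; plays the role of v_setzero_u32() in this model.
inline Lanes4 lanes_zero() {
    Lanes4 r;
    r.v.fill(0u);
    return r;
}

// Load four consecutive values into a register.
inline Lanes4 lanes_load(const std::uint32_t* p) {
    Lanes4 r;
    for (std::size_t i = 0; i < 4; ++i) r.v[i] = p[i];
    return r;
}

// Lane-wise addition.
inline Lanes4 lanes_add(const Lanes4& a, const Lanes4& b) {
    Lanes4 r;
    for (std::size_t i = 0; i < 4; ++i) r.v[i] = a.v[i] + b.v[i];
    return r;
}

// Sum n_vectors groups of four values, lane by lane.
inline Lanes4 accumulate(const std::uint32_t* data, std::size_t n_vectors) {
    assert(data != nullptr);
    // Start from an explicitly zeroed accumulator. Writing `Lanes4 acc;`
    // would leave the lanes indeterminate, which mirrors the NEON behaviour
    // reported in this comment.
    Lanes4 acc = lanes_zero();
    for (std::size_t k = 0; k < n_vectors; ++k)
        acc = lanes_add(acc, lanes_load(data + 4 * k));
    return acc;
}
```

With explicit zeroing, the accumulated result is the same regardless of what a backend's default constructor happens to do.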

@alalek (Member) left a comment

Thank you for the contribution!

x0 = buf[0];
x1 = buf[1];
x2 = buf[2];
x3 = buf64[0] + buf64[1];

Perhaps we don't need to force 64F here (at least in such complex form).

This is post-processing, so performance here is not critical.

Could you try this approach:

x0 = v_reduce_sum(v_x0);
x1 = v_reduce_sum(v_x1);
x2 = v_reduce_sum(v_x2);
int64 CV_DECL_ALIGNED(16) buf64[2];  // avoid declarations as class fields
v_store_aligned(buf64, v_reinterpret_as_s64(v_x3));
x3 = buf64[0] + buf64[1];
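To make the data flow of this suggestion concrete, here is a plain-C++ model of the two reductions: a horizontal sum over four 32-bit lanes, and a store of a 128-bit register into a local aligned buffer of two 64-bit values that are then added. This is only a sketch and does not call OpenCV. `reduce_sum4` and `sum_two_s64_lanes` are hypothetical names, and the lane widths are assumptions read off the snippet above.

```cpp
#include <cstdint>
#include <cstring>

// Scalar stand-in for v_reduce_sum on a register of four 32-bit lanes.
inline std::uint32_t reduce_sum4(const std::uint32_t lanes[4]) {
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}

// Scalar stand-in for:
//   v_store_aligned(buf64, v_reinterpret_as_s64(v_x3));
//   x3 = buf64[0] + buf64[1];
// reg16 models the 16 raw bytes of a 128-bit register with two 64-bit lanes.
inline std::int64_t sum_two_s64_lanes(const unsigned char reg16[16]) {
    // Local aligned buffer, declared at the point of use rather than as a class field.
    alignas(16) std::int64_t buf64[2];
    // Bit-for-bit copy: the same bytes are viewed as two signed 64-bit integers.
    std::memcpy(buf64, reg16, sizeof(buf64));
    return buf64[0] + buf64[1];
}
```

The 64-bit path matters when the total does not fit in 32 bits; for example, lanes holding 5000000000 and 7 sum to 5000000007, which a 32-bit reduction could not represent.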

@alalek (Member) left a comment

Well done! Thank you 👍

@alalek alalek merged commit 2112aa3 into opencv:3.4 Nov 5, 2019
@ChipKerchner ChipKerchner deleted the momentsToHal branch November 5, 2019 17:54
@alalek alalek mentioned this pull request Nov 11, 2019