Skip to content

Fixed several cases of unaligned pointer cast#26547

Merged
asmorkalov merged 1 commit intoopencv:4.xfrom
mshabunin:fix-type-cast
Dec 3, 2024
Merged

Fixed several cases of unaligned pointer cast#26547
asmorkalov merged 1 commit intoopencv:4.xfrom
mshabunin:fix-type-cast

Conversation

@mshabunin
Copy link
Copy Markdown
Contributor

Fixed several cases of casting from uchar* to wider pointer type (e.g. uint64_t*) which can cause unaligned access.

{
result += (int)CV_POPCNT_U64(*(uint64*)(a + i));
uint64_t val;
std::memcpy(&val, a + i, sizeof(val));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need extra code path under alignment check to avoid possible perf regressions?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On x86_64 and AArch64 the machine code will be the same (https://godbolt.org/z/4sMoE8cWG), on ARMv7 and RISC-V it could be less efficient. IMO we can keep the simple code for now and then try to optimize it if it would be necessary.

It might be better to add separate scalar function <typename T> int popcnt64(T * ptr) instead of this macro and maybe sum bits for each byte instead of switching between aligned/unaligned load for specific platforms.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After discussion decided to leave it without extra alignment checks because this is tail processing and shouldn't cause perf degradation on popular platforms.

{
return (((const int*)"\0\x1\x2\x3\x4\x5\x6\x7")[0] & 255) != 0;
const uint32_t val = 1;
return (*(uint8_t *)&val != 1);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK, we have compiler definition from CMake scripts. See WORDS_BIGENDIAN

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated this function to use preprocessor macro. The same macro is used in libtiff, libwebp and softfloat libraries.

v_int16 v_mul01 = v_reinterpret_as_s16(vx_setall_u32(*((uint32_t*)m)));
uint32_t val01;
std::memcpy(&val01, m, sizeof(val01));
v_int16 v_mul01 = v_reinterpret_as_s16(vx_setall_u32(val01));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code is initially wrong relating to little/big endian correctness.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OpenCV has not been tested on big endian architecture AFAIK, especially with SIMD optimizations. Perhaps we should leave it as-is for now.

@opencv-alalek
Copy link
Copy Markdown
Contributor

BTW, this compiler automated check should also enable CV_STRONG_ALIGNMENT flag from here: #16463

@asmorkalov asmorkalov added this to the 4.11.0 milestone Dec 3, 2024
@asmorkalov asmorkalov merged commit 03cedee into opencv:4.x Dec 3, 2024
@mshabunin mshabunin deleted the fix-type-cast branch December 3, 2024 12:27
@asmorkalov asmorkalov mentioned this pull request Jan 15, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants