Fixed invalid cast and unaligned memory access by seanm · Pull Request #23734 · opencv/opencv

seanm · 2023-06-02T15:10:31Z

Although acceptable to Intel CPUs, it's still undefined behaviour according to the C++ standard.

It can be replaced with memcpy, which makes the code much simpler, and it generates the same assembly code with gcc and clang with -O2 (verified with godbolt).

seanm · 2023-06-02T16:14:55Z

Here you can see that code generation is as good as before: https://godbolt.org/z/bcW3qxd5K

seanm · 2023-06-05T14:52:05Z

The one bot warning:

C:\Program Files (x86)\MSBuild\Microsoft.Cpp\v4.0\V140\Microsoft.CppBuild.targets(392,5): warning MSB8028: The intermediate directory (opencv_perf_xfeatures2d.dir\Release\) contains files shared from another project (*.vcxproj).  This can lead to incorrect clean and rebuild behavior. [C:\build\precommit_windows64\build\modules\xfeatures2d\opencv_perf_xfeatures2d.vcxproj]

Doesn't seem like it could have been caused by my change here...

vpisarev · 2023-06-07T01:21:35Z

@seanm, sorry, this patch cannot be accepted. We use platform-agnostic little-endian data representation. On big-endian platforms the proposed code will provide wrong results.

seanm · 2023-06-07T01:37:34Z

@seanm, sorry, this patch cannot be accepted. We use platform-agnostic little-endian data representation. On big-endian platforms the proposed code will provide wrong results.

@vpisarev thanks for your review. It did look to me like there was supposed to be byte-swapping here, but if that's the intention, I think the code is already broken. Before my change, it had:

#if defined __i386__ || defined(_M_IX86) || defined __x86_64__ || defined(_M_X64)
#define CV_UNALIGNED_LITTLE_ENDIAN_MEM_ACCESS 1
#else
#define CV_UNALIGNED_LITTLE_ENDIAN_MEM_ACCESS 0
#endif

But Intel is not the only little endian processor. As you know, almost everything is little endian now. Apple Silicon for example is little endian, but goes into the else.

What is CV_UNALIGNED_LITTLE_ENDIAN_MEM_ACCESS supposed to mean? Should it be true for all little endian systems? Or only intel little endian?

I actually have OpenBSD working on an old big endian PowerPC Mac, I could try to build OpenCV there...

opencv-alalek · 2023-06-09T04:25:07Z

What is CV_UNALIGNED_LITTLE_ENDIAN_MEM_ACCESS supposed to mean?

It is an internal optimization flag for some platforms which are LE and they support unaligned access by default (even x86 has "align" flag, but it is not the default - https://stackoverflow.com/questions/548164/mis-aligned-pointers-on-x86).

If you have issue with sanitizers please update condition to filter out enabling of that flag.

This patch introduces a bug - on different platforms data is interpreted / stored in a different way (LE / BE). This is critical for storing / transferring of data which should be platform-independent.

See also:

asmorkalov · 2023-06-09T10:42:43Z

The patch was discussed on OpenCV core team meeting. The team decided to reject it, because it breaks some platforms.

seanm · 2023-06-09T13:08:17Z

Please reopen. I can update the patch to fix the endian issue. But discussion is required first.

What is CV_UNALIGNED_LITTLE_ENDIAN_MEM_ACCESS supposed to mean?

It is an internal optimization flag for some platforms which are LE and they support unaligned access by default...

I think you misunderstood my question. I know about unaligned access on x86. I'm saying that, before my patch even, the code was wrong.

This below test is not determining if the system is little endian vs big endian. It is determining if the system is x86 or not. x86 is not the only little endian system.

#if defined __i386__ || defined(_M_IX86) || defined __x86_64__ || defined(_M_X64)
#define CV_UNALIGNED_LITTLE_ENDIAN_MEM_ACCESS 1
#else
#define CV_UNALIGNED_LITTLE_ENDIAN_MEM_ACCESS 0
#endif

Then this code:

static inline int readInt(const uchar* p)
{
#if CV_UNALIGNED_LITTLE_ENDIAN_MEM_ACCESS
    return *(const int*)p;
#else
    int val = (int)(p[0] | (p[1] << 8) | (p[2] << 16) | (p[3] << 24));
    return val;
#endif
}

Is using CV_UNALIGNED_LITTLE_ENDIAN_MEM_ACCESS as if it means big vs little.

Little endian systems like Apple Silicon are going into the #else branch and doing the byte swap. Is that what you want? I though no, which I why I removed the byte swapping code.

Which big endian systems does OpenCV actually support? I tried building on PowerPC OpenBSD, but OpenCV does not even compile. Does your CI have any big endian?

opencv-alalek · 2023-06-09T20:57:37Z

int val = (int)(p[0] | (p[1] << 8) | (p[2] << 16) | (p[3] << 24));

Little endian systems like Apple Silicon are going into the #else branch and doing the byte swap.

There is no byte-swap in that code for other LE systems.
This code is agnostic of used BE/LE systems and provides the same result. memcpy() doesn't do that!

Is using CV_UNALIGNED_LITTLE_ENDIAN_MEM_ACCESS as if it means big vs little.

Please read the previous comment again. Your assumption is wrong.
Again, this is optimization code path which assumes:

unaligned access is allowed/supported.
system is LE (so we don't need bytes swap and could write directly)

I tried building on PowerPC OpenBSD

There is no official support for *BSD. It is community driven activity.

Although acceptible to Intel CPUs, it's still undefined behaviour according to the C++ standard. It can be replaced with memcpy, which makes the code simpler, and it generates the same assembly code with gcc and clang with -O2 (verified with godbolt). Also expanded the test to include other little endian CPUs by testing for __LITTLE_ENDIAN__.

seanm · 2023-06-09T22:57:42Z

There is no byte-swap in that code for other LE systems.
This code is agnostic of used BE/LE systems and provides the same result.

You are correct! I got that very mixed up! Please see my updated commit (and commit message).

seanm · 2023-06-16T01:21:31Z

@opencv-alalek does this seem correct and mergeable now?

asmorkalov · 2023-06-23T17:48:12Z

@vpisarev @opencv-alalek could you take a look?

asmorkalov · 2023-07-07T12:42:05Z

@vpisarev @opencv-alalek could you take a look?

opencv-alalek

godbolt shows that compilers are smart enough to optimize this properly, so no changes are expected for modern platforms.

asmorkalov requested a review from opencv-alalek June 2, 2023 15:37

asmorkalov added the category: core label Jun 2, 2023

vpisarev self-requested a review June 7, 2023 01:18

asmorkalov closed this Jun 9, 2023

asmorkalov reopened this Jun 9, 2023

seanm force-pushed the unaligned-copy branch from 4a68cf0 to 57da72d Compare June 9, 2023 22:58

opencv-alalek approved these changes Aug 11, 2023

View reviewed changes

asmorkalov merged commit 747b7ca into opencv:4.x Aug 11, 2023

asmorkalov added this to the 4.9.0 milestone Aug 11, 2023

asmorkalov assigned opencv-alalek Aug 11, 2023

asmorkalov mentioned this pull request Sep 11, 2023

(5.x) Merge 4.x #24254

Merged

Uh oh!

Conversation

seanm commented Jun 2, 2023

Uh oh!

seanm commented Jun 2, 2023

Uh oh!

seanm commented Jun 5, 2023

Uh oh!

vpisarev commented Jun 7, 2023

Uh oh!

seanm commented Jun 7, 2023

Uh oh!

opencv-alalek commented Jun 9, 2023

Uh oh!

asmorkalov commented Jun 9, 2023

Uh oh!

seanm commented Jun 9, 2023

Uh oh!

opencv-alalek commented Jun 9, 2023

Uh oh!

seanm commented Jun 9, 2023

Uh oh!

seanm commented Jun 16, 2023

Uh oh!

asmorkalov commented Jun 23, 2023

Uh oh!

asmorkalov commented Jul 7, 2023

Uh oh!

opencv-alalek left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants