Skip to content

Fixed invalid cast and unaligned memory access#23734

Merged
asmorkalov merged 1 commit intoopencv:4.xfrom
seanm:unaligned-copy
Aug 11, 2023
Merged

Fixed invalid cast and unaligned memory access#23734
asmorkalov merged 1 commit intoopencv:4.xfrom
seanm:unaligned-copy

Conversation

@seanm
Copy link
Copy Markdown
Contributor

@seanm seanm commented Jun 2, 2023

Although acceptable to Intel CPUs, it's still undefined behaviour according to the C++ standard.

It can be replaced with memcpy, which makes the code much simpler, and it generates the same assembly code with gcc and clang with -O2 (verified with godbolt).

@seanm
Copy link
Copy Markdown
Contributor Author

seanm commented Jun 2, 2023

Here you can see that code generation is as good as before: https://godbolt.org/z/bcW3qxd5K

@seanm
Copy link
Copy Markdown
Contributor Author

seanm commented Jun 5, 2023

The one bot warning:

C:\Program Files (x86)\MSBuild\Microsoft.Cpp\v4.0\V140\Microsoft.CppBuild.targets(392,5): warning MSB8028: The intermediate directory (opencv_perf_xfeatures2d.dir\Release\) contains files shared from another project (*.vcxproj).  This can lead to incorrect clean and rebuild behavior. [C:\build\precommit_windows64\build\modules\xfeatures2d\opencv_perf_xfeatures2d.vcxproj]

Doesn't seem like it could have been caused by my change here...

@vpisarev vpisarev self-requested a review June 7, 2023 01:18
@vpisarev
Copy link
Copy Markdown
Contributor

vpisarev commented Jun 7, 2023

@seanm, sorry, this patch cannot be accepted. We use platform-agnostic little-endian data representation. On big-endian platforms the proposed code will provide wrong results.

@seanm
Copy link
Copy Markdown
Contributor Author

seanm commented Jun 7, 2023

@seanm, sorry, this patch cannot be accepted. We use platform-agnostic little-endian data representation. On big-endian platforms the proposed code will provide wrong results.

@vpisarev thanks for your review. It did look to me like there was supposed to be byte-swapping here, but if that's the intention, I think the code is already broken. Before my change, it had:

#if defined __i386__ || defined(_M_IX86) || defined __x86_64__ || defined(_M_X64)
#define CV_UNALIGNED_LITTLE_ENDIAN_MEM_ACCESS 1
#else
#define CV_UNALIGNED_LITTLE_ENDIAN_MEM_ACCESS 0
#endif

But Intel is not the only little endian processor. As you know, almost everything is little endian now. Apple Silicon for example is little endian, but goes into the else.

What is CV_UNALIGNED_LITTLE_ENDIAN_MEM_ACCESS supposed to mean? Should it be true for all little endian systems? Or only intel little endian?

I actually have OpenBSD working on an old big endian PowerPC Mac, I could try to build OpenCV there...

@opencv-alalek
Copy link
Copy Markdown
Contributor

What is CV_UNALIGNED_LITTLE_ENDIAN_MEM_ACCESS supposed to mean?

It is an internal optimization flag for some platforms which are LE and they support unaligned access by default (even x86 has "align" flag, but it is not the default - https://stackoverflow.com/questions/548164/mis-aligned-pointers-on-x86).

If you have issue with sanitizers please update condition to filter out enabling of that flag.


This patch introduces a bug - on different platforms data is interpreted / stored in a different way (LE / BE). This is critical for storing / transferring of data which should be platform-independent.

See also:

@asmorkalov
Copy link
Copy Markdown
Contributor

The patch was discussed on OpenCV core team meeting. The team decided to reject it, because it breaks some platforms.

@asmorkalov asmorkalov closed this Jun 9, 2023
@seanm
Copy link
Copy Markdown
Contributor Author

seanm commented Jun 9, 2023

Please reopen. I can update the patch to fix the endian issue. But discussion is required first.


What is CV_UNALIGNED_LITTLE_ENDIAN_MEM_ACCESS supposed to mean?

It is an internal optimization flag for some platforms which are LE and they support unaligned access by default...

I think you misunderstood my question. I know about unaligned access on x86. I'm saying that, before my patch even, the code was wrong.

This below test is not determining if the system is little endian vs big endian. It is determining if the system is x86 or not. x86 is not the only little endian system.

#if defined __i386__ || defined(_M_IX86) || defined __x86_64__ || defined(_M_X64)
#define CV_UNALIGNED_LITTLE_ENDIAN_MEM_ACCESS 1
#else
#define CV_UNALIGNED_LITTLE_ENDIAN_MEM_ACCESS 0
#endif

Then this code:

static inline int readInt(const uchar* p)
{
#if CV_UNALIGNED_LITTLE_ENDIAN_MEM_ACCESS
    return *(const int*)p;
#else
    int val = (int)(p[0] | (p[1] << 8) | (p[2] << 16) | (p[3] << 24));
    return val;
#endif
}

Is using CV_UNALIGNED_LITTLE_ENDIAN_MEM_ACCESS as if it means big vs little.

Little endian systems like Apple Silicon are going into the #else branch and doing the byte swap. Is that what you want? I though no, which I why I removed the byte swapping code.

Which big endian systems does OpenCV actually support? I tried building on PowerPC OpenBSD, but OpenCV does not even compile. Does your CI have any big endian?

@asmorkalov asmorkalov reopened this Jun 9, 2023
@opencv-alalek
Copy link
Copy Markdown
Contributor

int val = (int)(p[0] | (p[1] << 8) | (p[2] << 16) | (p[3] << 24));

Little endian systems like Apple Silicon are going into the #else branch and doing the byte swap.

There is no byte-swap in that code for other LE systems.
This code is agnostic of used BE/LE systems and provides the same result. memcpy() doesn't do that!

Is using CV_UNALIGNED_LITTLE_ENDIAN_MEM_ACCESS as if it means big vs little.

Please read the previous comment again. Your assumption is wrong.
Again, this is optimization code path which assumes:

  • unaligned access is allowed/supported.
  • system is LE (so we don't need bytes swap and could write directly)

I tried building on PowerPC OpenBSD

There is no official support for *BSD. It is community driven activity.

Although acceptible to Intel CPUs, it's still undefined behaviour according to the C++ standard.

It can be replaced with memcpy, which makes the code simpler, and it generates the same assembly code with gcc and clang with -O2 (verified with godbolt).

Also expanded the test to include other little endian CPUs by testing for __LITTLE_ENDIAN__.
@seanm
Copy link
Copy Markdown
Contributor Author

seanm commented Jun 9, 2023

There is no byte-swap in that code for other LE systems.
This code is agnostic of used BE/LE systems and provides the same result.

You are correct! I got that very mixed up! Please see my updated commit (and commit message).

@seanm seanm force-pushed the unaligned-copy branch from 4a68cf0 to 57da72d Compare June 9, 2023 22:58
@seanm
Copy link
Copy Markdown
Contributor Author

seanm commented Jun 16, 2023

@opencv-alalek does this seem correct and mergeable now?

@asmorkalov
Copy link
Copy Markdown
Contributor

@vpisarev @opencv-alalek could you take a look?

1 similar comment
@asmorkalov
Copy link
Copy Markdown
Contributor

@vpisarev @opencv-alalek could you take a look?

Copy link
Copy Markdown
Contributor

@opencv-alalek opencv-alalek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

godbolt shows that compilers are smart enough to optimize this properly, so no changes are expected for modern platforms.

@asmorkalov asmorkalov merged commit 747b7ca into opencv:4.x Aug 11, 2023
@asmorkalov asmorkalov added this to the 4.9.0 milestone Aug 11, 2023
@asmorkalov asmorkalov mentioned this pull request Sep 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants