dnn: fix unaligned memory access crash on armv7#20725
Conversation
alalek
left a comment
There was a problem hiding this comment.
Thank you for the contribution!
| } | ||
|
|
||
| Mat getTensorContent(const tensorflow::TensorProto &tensor, bool copy) | ||
| Mat getTensorContentRefUnaligned(const tensorflow::TensorProto& tensor) |
There was a problem hiding this comment.
Mat's data pointer must be aligned on base type requirements.
Otherwise it is impossible to use it in OpenCV algorithms on some platforms.
Mat getTensorContentRefUnaligned
So this function is wrong by design. Mat should not contain "unaligned" pointer.
I believe, it is better to fix existed implementation.
Check pointer alignment and reallocate Mat for such case.
Parameter copy should be renamed to forceCopy due to behavior changes.
There was a problem hiding this comment.
The code's structure does not allow realigning the offending Mat, because PReLUSubgraph's finalize method depends on having write access to the misaligned data block.
My motivation with the PR was focused on resolving the crash without changing semantics.
I do not sufficiently understand the remainder of the implementation at this point to reengineer its complete structure and keep the functionality of PReLUSubgraph by propagating an aligned copy of the tensor content. Instead, I have taken special care that the previously broken code now behaves correctly and as probably intended by the author by accessing the offending Mat via memcpy only.
To address your last sentence regarding renaming copy => forceCopy: You cannot safely keep the existing behavior, it goes against what you said in the beginning where a Mat must be aligned. No copy equals not aligned, since raw Protobuf data goes into that function.
Please let me know how to proceed here. I could have it return a void* instead, which would technically fulfill your requirement of not having an unaligned Mat as return value?
Is a minimal patch sufficient as long as it is an improvement in general, or is a rewrite/restructure required?
The getTensorContent function would return a Mat pointing to some member of a Protobuf-encoded message. Protobuf does not make any alignment guarantees, which results in a crash on armv7 when loading models while bit 2 is set in /proc/cpu/alignment (or the relevant kernel feature for alignment compatibility is disabled). Any read attempt from the previously unaligned data member would send SIGBUS. As workaround, this commit makes an aligned copy via existing clone functionality in getTensorContent. The unsafe copy=false option is removed. Unfortunately, a rather crude hack in PReLUSubgraph in fact writes(!) to the Protobuf message. We limit ourselves to fixing the alignment issues in this commit, and add getTensorContentRefUnaligned to cover the write case with a safe memcpy. A FIXME marks the issue.
|
Rebased to 3.4 branch and reduced amount of |
|
@alalek missed your email, sorry! Thank you for looking into this, I will attempt to verify on Monday. Our application right now builds against OpenCV 4 but the API should be compatible enough to give this a try. |
|
Just a quick status update: I am making progress, OpenCV 3 now builds for affected Android platforms, integration into affected component is pending. (3 hours later:) I can now reproduce the issue with OpenCV 3 🎉 Will proceed with validating the patch. |
| Mat scales = scalesRef.clone() * -1; | ||
| Mat scalesRef = getTensorContentRef_(inputNodes[1]->attr().at("value").tensor()); | ||
| // FIXME: This breaks the const guarantees of tensor() by writing to scalesRef and is likely to perform an | ||
| // unaligned write (hence the separate multiplication above to have this work on ARM). |
There was a problem hiding this comment.
Comment's reference to multiplication (in brackets) is outdated with this commit
|
BTW, you can merge / cherry-pick these commits on the master(4.x) branch for validation. |
|
Good news, I can confirm that the patch with your changes resolves the issue! Note that my validation did not cover the
This gave me a nice excuse to restore compatibility with both OpenCV versions ;-). I prefer to test the exact state submitted here, even if it takes longer. |
|
Thank you too for reviewing and improving performance here 👍 |
Hello,
The getTensorContent function would return a Mat pointing to some member of a Protobuf-encoded message. Protobuf does not make any alignment guarantees, which results in a crash on armv7 when loading models while bit 2 is set in
/proc/cpu/alignment(or the relevant kernel feature for alignment compatibility is disabled/unavailable). Any read attempt from the previously unaligned data member would send SIGBUS.As workaround, this commit makes an aligned copy via existing clone functionality in getTensorContent. The unsafe copy=false option is removed. Unfortunately, a rather crude hack in PReLUSubgraph in fact writes to the Protobuf message. I only fix the alignment issue in this commit, and add getTensorContentRefUnaligned to cover the write case with a safe memcpy. A FIXME marks the issue.
Affected Environment
This issue is reliably reproducible by loading any floating-point model in an optimized(!) build of OpenCV 4.5.X, including master, compiled by clang for armv7 as shipped with the Android NDK 21. The issue does not happen with all phones -- some have kernel emulation enabled to fix up the unaligned read automatically, others have support for unaligned reads on CPU level. The most reliable environment to find this issue is with old Android 6 phones, e.g. Samsung Galaxy S5 Neo.
Caveats
This patch makes loading models just slightly slower in theory, because an aligned copy is made in getTensorContent. It is possible to optimize this branch for dims!=4 when both the input and output data type is a 32-bit float.
However, I could not find a test case where this made a difference beyond a few microseconds, hence I avoided the complexity of implementing a fast branch for that specific case.
The copy cannot safely be avoided otherwise, because unaligned access is undefined behavior in C even on platforms that support it.
Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
There is reference to original bug report and related workI could not find any previous bug report similar to this issue, only unrelated alignment crash issues. Bug described in PR instead.There is accuracy test, performance test and test data in opencv_extra repositoryExisting tests cover the issue and find the bug on affected Linux arm platforms with the kernel settings described in the commit message.The feature is well documented and sample code can be built with the project CMakeNot a feature.