-
-
Notifications
You must be signed in to change notification settings - Fork 56.5k
SIGBUS crash in pointSetBoundingRect() on ARMv7-A (armeabi-v7a) due to unaligned NEON load #25265
Description
System Information
OpenCV version: 3.4+
Platform: Android armeabi-v7a, presumably all 32-bit armv7-a boards with NEON enabled
Compiler: clang ver 14.0.6 in Android NDK 25b, generation of crashing instruction changes randomly with clang version
Detailed description
We have hit this crash in production here at ZenID.
Happens on 32-bit ARM (armeabi-v7a ABI) phones, including natively 64-bit phones forced to run 32-bit executable.
It may happen when feeding data to cv::boundingRect() that isn't aligned to 64 bit, depending on your luck with compiler optimization.
In our case we had
struct Foo {
...
std::array<cv::Point, 4> Outline; // alignof() == 4
...
}
...
Foo bar;
cv::boundingRect(bar.Outline);
It crashes on https://github.com/opencv/opencv/blob/843c09bc/modules/imgproc/src/shapedescr.cpp#L899
vx_load_low() uses vld1_u64(uint64_t*) intrinsic under the hood, which may translate into either vldr {d0}, [r0] or vld1.32 {d0}, [r0:64], depending on your luck with optimizer.
vldr is a generic 64 bit load wich requires alignment to 32 bit, vld1.32 would normally only require 32 bit alignment too (and supports unaligned load?), but clang adds an alignment hint (reasonably since it's being passed int64_t*) and that results in a crash.
For us it compiled to vldr with NDK 21d (clang ver 9.0.8) but then it became vld1.32 with NDK 25b (clang ver 14.0.6) and started crashing.
Godbolt repro of this behavior (play with the alignment attribute on ua64 and see what happens):
- clang 14.0.0
vld1.32 {d16}, [r0:64]- crash - clang 9.0.1
vldr {d16}, [r0]- no crash
also see alignment requirements from ARM reference manual at the end
There are three ways I see of fixing it:
-
Use
int32_t*inpointSetBoundingRect().
I don't really see why it needs to operate on int64_t* when input depth is asserted to be 32 bits and it casts all the 64 bit vector loads to 32 bit anyway. Maybe there's some x86_64 SSE performance reason to do it?
This fixes the crash:
v_int32 ptXY = v_reinterpret_as_s32(v_expand_low(v_reinterpret_as_u32(vx_load_low((int32_t*)(pts + i)))));
However, somewhere in the codebase more unaligned NEON loads are sure to be lurking, hidden by vldr instructions and it may randomly explode on us in the future when the optimizer changes slightly.
-
Make v_load/v_store intrinsics accept unaligned data.
Draft PR: #25267
This results in vld1.8 {d0}, [r0] instruction and solves all cases of load/store on unaligned address.
#if defined(__clang__) && defined(__arm__)
#define OPENCV_HAL_IMPL_NEON_LOAD_LOW_OP(_Tpvec, _Tp, suffix) \
inline _Tpvec v_load_low(const _Tp* ptr) \
{ \
typedef _Tp CV_DECL_ALIGNED(1) unaligned_ptr; \
unaligned_ptr* uptr = (unaligned_ptr*)ptr; \
return _Tpvec(vcombine_##suffix(vld1_##suffix(uptr), vdup_n_##suffix((_Tp)0))); \
}
// and same for other load/store
This is the solution we picked since it means we don't have to sift through the entire opencv codebase looking for potential unaligned access.
-
Allow alignment to lane type but no higher than 32 bit in load/store
Draft PR: #25266
This is less fool-proof but also less invasive. In effect it ensures that vx_load_low() remains as vldr, which happens half the time anyway. I expect that opencv has a few more places where 64 bit NEON load aligned to 32 bit can happen, and very few (or none) places where you end up with ≤32 NEON load that is misaligned.
On the assembly level that means:
vld1.32 {d0}, [r0:64] =>vldr {d0}, [r0]` (clang already does this half the time, non-determinisically!)vld1.64 {d0}, [r0] =>vld1.32 {d0}, [r0]`vld1.64 {d0, d1}, [r0]=>vld1.32 {d0, d1}, [r0]vst1.64 {d0, d1}, [r0]=>vst1.32 {d0, d1}, [r0]
So basically do this, only for 64bit load/store:
inline _Tpvec v_load_low(const _Tp* ptr) \
{ \
typedef _Tp CV_DECL_ALIGNED(4) aligned32_ptr; \
aligned32_ptr* uptr = (aligned32_ptr*)ptr; \
return _Tpvec(vcombine_##suffix(vld1_##suffix(uptr), vdup_n_##suffix((_Tp)0))); \
}
and do the same for v_load(), v_store(), etc.
I've looked at prior art: PR #16152 attempted something like my option 3 but making the 64-bit load/store completely unaligned. I think it doesn't make sense to go this far and only for 64-bit. That PR was closed with #16139 + #16463 used instead.
On the face of it, it makes sense to fix unaligned access on the algo side. In practice, that work is driven by bug reports and vldr vs vld1.32 thing is particularly nasty, because you don't know if it will compile into the crashing instruction, so it's hard to test for it even with full coverage, and crashes may stay lurking for years before suddenly appearing with a new compiler version.
Alignment requirements for the relevant instructions for your reference:

Steps to reproduce
Feed 64-bit misaligned (and 32-bit aligned) array of ints or floats to cv::boundingRect(), using clang ver 14.0.6 or any other version that happens to produce vld1.32 {d0}, [r0:64] instruction.
Issue submission checklist
- I report the issue, it's not a question
- I checked the problem with documentation, FAQ, open issues, forum.opencv.org, Stack Overflow, etc and have not found any solution
- I updated to the latest OpenCV version and the issue is still there
- There is reproducer code and related data files (videos, images, onnx, etc)