core: fix Core_GEMM.accuracy failure on recent macOS#25454
asmorkalov merged 1 commit into opencv:4.x
It is even weirder that if I set unrolling to a factor of 2 instead of 4, it passes. Could it be something related to the compiler?

```cpp
#if CV_ENABLE_UNROLLED
for(; j <= m - 2; j += 2 )
{
    WT t0 = d_buf[j] + WT(b_data[j])*al;
    WT t1 = d_buf[j+1] + WT(b_data[j+1])*al;
    d_buf[j] = t0;
    d_buf[j+1] = t1;
    // Second half of the original unroll-by-4 body, disabled for this experiment:
    // t0 = d_buf[j+2] + WT(b_data[j+2])*al;
    // t1 = d_buf[j+3] + WT(b_data[j+3])*al;
    // d_buf[j+2] = t0;
    // d_buf[j+3] = t1;
}
#endif
```
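In the loop above, each output element depends only on its own inputs, so with non-overlapping buffers the unroll factor should not change the result. The minimal standalone sketch below checks that expectation. It rests on a few assumptions: `T = float` and `WT = double` stand in for the template parameters, the scalar remainder loop after `#endif` is assumed, and the function names are hypothetical rather than OpenCV code. If both variants agree on portable code, a failure that only appears with factor 4 points at code generation rather than at the algorithm.

```cpp
#include <cassert>
#include <vector>

// Stand-ins for OpenCV's template parameters (assumed for this sketch):
// T is the input element type, WT the wider accumulation type.
using T = float;
using WT = double;

// Unroll by 2, as in the modified snippet above.
void accumulate_unroll2(WT* d_buf, const T* b_data, WT al, int m) {
    int j = 0;
    for (; j <= m - 2; j += 2) {
        WT t0 = d_buf[j] + WT(b_data[j]) * al;
        WT t1 = d_buf[j + 1] + WT(b_data[j + 1]) * al;
        d_buf[j] = t0;
        d_buf[j + 1] = t1;
    }
    for (; j < m; j++)  // assumed scalar remainder loop
        d_buf[j] = d_buf[j] + WT(b_data[j]) * al;
}

// Unroll by 4, i.e. the snippet with the commented-out lines restored.
void accumulate_unroll4(WT* d_buf, const T* b_data, WT al, int m) {
    int j = 0;
    for (; j <= m - 4; j += 4) {
        WT t0 = d_buf[j] + WT(b_data[j]) * al;
        WT t1 = d_buf[j + 1] + WT(b_data[j + 1]) * al;
        d_buf[j] = t0;
        d_buf[j + 1] = t1;
        t0 = d_buf[j + 2] + WT(b_data[j + 2]) * al;
        t1 = d_buf[j + 3] + WT(b_data[j + 3]) * al;
        d_buf[j + 2] = t0;
        d_buf[j + 3] = t1;
    }
    for (; j < m; j++)  // assumed scalar remainder loop
        d_buf[j] = d_buf[j] + WT(b_data[j]) * al;
}

// Runs both variants on identical, non-overlapping inputs and reports whether
// every element matches. Inputs are small integers with al = 0.5, so each
// expected value (2*j + 0.5) is exact and the comparison is not affected by rounding.
bool unroll_variants_agree(int m) {
    assert(m >= 0);
    std::vector<T> b(m);
    std::vector<WT> d2(m), d4(m);
    for (int j = 0; j < m; j++) {
        b[j] = T(2 * j + 1);
        d2[j] = d4[j] = WT(j);
    }
    accumulate_unroll2(d2.data(), b.data(), 0.5, m);
    accumulate_unroll4(d4.data(), b.data(), 0.5, m);
    return d2 == d4;
}
```

Using a length such as 11, which is not a multiple of 2 or 4, also exercises the remainder loop in both variants.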
vpisarev left a comment:
Yes, we discussed it. It is probably some weird bug in the latest Clang compiler. Given that modern compilers do a great job of vectorizing code, the version without the manually unrolled loop is not only simpler but probably faster as well.
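A sketch of the accumulation step written without manual unrolling, assuming the same roles for `d_buf`, `b_data`, `al` and `m` as in the snippet earlier in this thread. `accumulate_row` and `accumulate_row_example` are hypothetical names, and the actual patch may be structured differently. A plain loop of this shape is the kind that Clang and GCC can typically auto-vectorize at higher optimization levels.

```cpp
#include <vector>

// Hypothetical helper: the accumulation step as a plain loop with no manual
// unrolling, leaving vectorization to the compiler.
// T is the input element type, WT the wider accumulation type.
template <typename T, typename WT>
void accumulate_row(WT* d_buf, const T* b_data, WT al, int m) {
    for (int j = 0; j < m; j++)
        d_buf[j] += WT(b_data[j]) * al;
}

// Usage example: d = {1, 2, 3}, b = {2, 4, 6}, al = 0.5 gives d = {2, 4, 6}.
std::vector<double> accumulate_row_example() {
    std::vector<double> d = {1.0, 2.0, 3.0};
    std::vector<float> b = {2.0f, 4.0f, 6.0f};
    accumulate_row(d.data(), b.data(), 0.5, static_cast<int>(d.size()));
    return d;
}
```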
Are there any aliased pointers to the same memory used for both reading and writing (in-place processing)?
Probably not.
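Aliasing matters here because the unrolled loop loads `d_buf[j+1]` and `b_data[j+1]` before it stores `d_buf[j]`, while a plain loop stores each element before loading the next. The hypothetical sketch below (plain `double` buffers, names invented for illustration, not OpenCV code) makes `b_data` point one element before `d_buf` in the same buffer. Under that overlap the sequential and unroll-by-2 forms produce different results; with non-overlapping buffers they compute the same values.

```cpp
#include <vector>

// Sequential form: each element is stored before the next one is loaded.
void seq_update(double* d_buf, const double* b_data, double al, int m) {
    for (int j = 0; j < m; j++)
        d_buf[j] = d_buf[j] + b_data[j] * al;
}

// Unroll-by-2 form: both loads of a pair happen before both stores.
void unrolled2_update(double* d_buf, const double* b_data, double al, int m) {
    int j = 0;
    for (; j <= m - 2; j += 2) {
        double t0 = d_buf[j] + b_data[j] * al;
        double t1 = d_buf[j + 1] + b_data[j + 1] * al;  // reads old d_buf[j]
        d_buf[j] = t0;
        d_buf[j + 1] = t1;
    }
    for (; j < m; j++)
        d_buf[j] = d_buf[j] + b_data[j] * al;
}

// Runs one variant in place on an overlapping layout: b_data = buf and
// d_buf = buf + 1, so b_data[j] aliases d_buf[j - 1]. Returns the whole buffer.
std::vector<double> run_overlapping(bool unrolled) {
    std::vector<double> buf(5, 1.0);
    double* d_buf = buf.data() + 1;
    const double* b_data = buf.data();
    if (unrolled)
        unrolled2_update(d_buf, b_data, 1.0, 4);
    else
        seq_update(d_buf, b_data, 1.0, 4);
    return buf;
}
```

With this layout the sequential form yields `{1, 2, 3, 4, 5}`, while the unrolled form yields `{1, 2, 2, 3, 2}`.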
See `opencv/modules/core/src/matmul.simd.hpp`, lines 1246 to 1257 in 5da17a4.
The test is now green with this patch merged: https://github.com/opencv/ci-gha-workflow/actions/runs/8750864630/job/24015266117?pr=171
Resolves #25302
Reproducer: https://github.com/opencv/ci-gha-workflow/actions/runs/8747714722/job/24006610667?pr=171#step:12:1041
Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
Patch to opencv_extra has the same branch name.