core: fix `Core_GEMM.accuracy` failure on recent macOS by fengyuentau · Pull Request #25454 · opencv/opencv

fengyuentau · 2024-04-19T06:26:57Z

Resolves #25302

Reproducer: https://github.com/opencv/ci-gha-workflow/actions/runs/8747714722/job/24006610667?pr=171#step:12:1041

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

I agree to contribute to the project under Apache 2 License.
To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
The PR is proposed to the proper branch
There is a reference to the original bug report and related work
There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
The feature is well documented and sample code can be built with the project CMake

fengyuentau · 2024-04-19T07:56:53Z

Even weirder: if I unroll by a factor of 2 instead of 4, the test passes. Could it be something related to the compiler?

                #if CV_ENABLE_UNROLLED
                for(; j <= m - 2; j += 2 )
                {
                    WT t0 = d_buf[j] + WT(b_data[j])*al;
                    WT t1 = d_buf[j+1] + WT(b_data[j+1])*al;
                    d_buf[j] = t0;
                    d_buf[j+1] = t1;
                    // t0 = d_buf[j+2] + WT(b_data[j+2])*al;
                    // t1 = d_buf[j+3] + WT(b_data[j+3])*al;
                    // d_buf[j+2] = t0;
                    // d_buf[j+3] = t1;
                }
                #endif
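For reference, here is a self-contained sketch of the 4x variant: the snippet above with the commented-out lines restored, followed by a scalar loop for the leftover `m % 4` elements. That tail loop is an assumption, because the snippet does not show it. The names `accumulate_unrolled4` and `accumulate_plain`, and the use of `double` for both `WT` and the B elements, are also illustrative assumptions and do not come from matmul.simd.hpp. When the buffers do not overlap, both functions compute every element with the same single multiply-add. On inputs where every product and sum is exact (small integers, halves), they must therefore agree, so any divergence on such inputs would point at code generation rather than at the source.

```cpp
#include <cassert>

// Sketch: d_buf[j] += b_data[j] * al, manually unrolled by 4 as in the
// snippet above (commented lines restored), plus an assumed scalar tail.
void accumulate_unrolled4(double* d_buf, const double* b_data, double al, int m)
{
    assert(m >= 0);
    int j = 0;
    for (; j <= m - 4; j += 4)
    {
        double t0 = d_buf[j] + b_data[j] * al;
        double t1 = d_buf[j + 1] + b_data[j + 1] * al;
        d_buf[j] = t0;
        d_buf[j + 1] = t1;
        t0 = d_buf[j + 2] + b_data[j + 2] * al;
        t1 = d_buf[j + 3] + b_data[j + 3] * al;
        d_buf[j + 2] = t0;
        d_buf[j + 3] = t1;
    }
    // Remaining m % 4 elements.
    for (; j < m; j++)
        d_buf[j] += b_data[j] * al;
}

// Same arithmetic per element, without manual unrolling.
void accumulate_plain(double* d_buf, const double* b_data, double al, int m)
{
    assert(m >= 0);
    for (int j = 0; j < m; j++)
        d_buf[j] += b_data[j] * al;
}
```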

vpisarev

Yes, we discussed it. It's probably some weird bug in the latest Clang compiler. Given that modern compilers do a great job at vectorizing code, the version without the manually unrolled loop is not only simpler but probably faster as well.
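Below is a sketch of the simpler form suggested here. It keeps the `WT(b_data[j])` widening from the snippet and leaves unrolling and vectorization to the compiler. The name `accumulate_row` and the template parameters `T` (element type of B) and `WT` (accumulator type) are assumptions made for illustration. This is not the exact diff that was merged. You can check whether Clang actually vectorized such a loop with its `-Rpass=loop-vectorize` optimization remark.

```cpp
#include <cassert>

// Sketch of the loop without manual unrolling. T is the element type of B
// (e.g. float); WT is the wider work type used for accumulation
// (e.g. double), mirroring the WT(b_data[j]) cast in the snippet above.
template <typename T, typename WT>
void accumulate_row(WT* d_buf, const T* b_data, WT al, int m)
{
    assert(m >= 0);
    // One multiply-add per element; the compiler may vectorize and
    // unroll this on its own.
    for (int j = 0; j < m; j++)
        d_buf[j] += WT(b_data[j]) * al;
}
```

For example, passing a `float` row of B and a `double` accumulator instantiates `T = float` and `WT = double`, the same widening pattern as the original code.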

opencv-alalek · 2024-04-19T08:21:10Z

Are there some aliased pointers on the same memory for reading and writing? (inplace processing)
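This question matters because the unrolled loop reads two elements before it stores either one. That is equivalent to the plain loop only when `d_buf` and `b_data` do not overlap. The sketch below makes the difference concrete by calling both variants with `b` pointing one element before `d` inside the same array. The names `axpy_unrolled2` and `axpy_plain` and the `double` buffers are hypothetical.

```cpp
#include <cassert>

// Unrolled by 2, like the snippet above: both results are computed
// before either one is stored.
void axpy_unrolled2(double* d, const double* b, double al, int m)
{
    assert(m >= 0);
    int j = 0;
    for (; j <= m - 2; j += 2)
    {
        double t0 = d[j] + b[j] * al;
        double t1 = d[j + 1] + b[j + 1] * al;
        d[j] = t0;
        d[j + 1] = t1;
    }
    for (; j < m; j++)
        d[j] += b[j] * al;
}

// Plain loop: each element is stored before the next one is read.
void axpy_plain(double* d, const double* b, double al, int m)
{
    assert(m >= 0);
    for (int j = 0; j < m; j++)
        d[j] += b[j] * al;
}
```

Start from an array of five ones with `al = 1`. The call `axpy_plain(x + 1, x, 1.0, 4)` turns the array into the running sum {1, 2, 3, 4, 5}, because each store feeds the next read. The same call to `axpy_unrolled2` yields {1, 2, 2, 3, 2}. With disjoint buffers the two variants always agree, so in-place processing is the first thing to rule out.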

fengyuentau · 2024-04-19T08:37:34Z

> Are there some aliased pointers on the same memory for reading and writing? (inplace processing)

Probably not.

d_buf is an AutoBuffer allocated outside the loop. Every element is set to zero just before entering the problematic loop.
The source of b_data is a bit more complicated. It is a function parameter, and for test case 14 it points to a buffer copied from the original source matrix B:

opencv/modules/core/src/matmul.simd.hpp

Lines 1246 to 1257 in 5da17a4

        if( dj < d_size.width )
        {
            Size b_size;
            if( !is_b_t )
                b_size.width = dj, b_size.height = dk;
            else
                b_size.width = dk, b_size.height = dj;
            _b_step = b_size.width*elem_size;
            GEMM_CopyBlock( _b, b_step, b_buf, _b_step, b_size, elem_size );
            _b = b_buf;
        }
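The snippet above packs a block of B into `b_buf` with a dense step (`_b_step = b_size.width*elem_size`). The sketch below shows what such a strided block copy amounts to. It also adds a debug-only overlap check that could back up the "probably not" answer by asserting that the accumulator and the packed B block never share memory. The names `copy_block`, `BlockSize` and `ranges_overlap` are hypothetical. This is not OpenCV's `GEMM_CopyBlock` implementation; only the argument order mirrors the call shown above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for cv::Size.
struct BlockSize { int width; int height; };

// Copies a width x height block of elements (elem_size bytes each) from a
// strided source to a strided destination; both steps are in bytes.
// Assumes the source and destination blocks do not overlap.
void copy_block(const unsigned char* src, std::size_t src_step,
                unsigned char* dst, std::size_t dst_step,
                BlockSize size, std::size_t elem_size)
{
    const std::size_t row_bytes = static_cast<std::size_t>(size.width) * elem_size;
    assert(size.height <= 1 || (src_step >= row_bytes && dst_step >= row_bytes));
    for (int y = 0; y < size.height; y++)
        std::memcpy(dst + y * dst_step, src + y * src_step, row_bytes);
}

// Debug helper: true if the byte ranges [a, a + a_bytes) and
// [b, b + b_bytes) share at least one byte. Addresses are compared as
// integers because '<' on pointers into unrelated objects has no
// guaranteed ordering in C++.
bool ranges_overlap(const void* a, std::size_t a_bytes,
                    const void* b, std::size_t b_bytes)
{
    if (a_bytes == 0 || b_bytes == 0)
        return false;
    const std::uintptr_t a0 = reinterpret_cast<std::uintptr_t>(a);
    const std::uintptr_t b0 = reinterpret_cast<std::uintptr_t>(b);
    return a0 < b0 + b_bytes && b0 < a0 + a_bytes;
}
```

In the GEMM loop, you could place a check such as `assert(!ranges_overlap(d_buf, m*sizeof(WT), b_data, m*sizeof(T)))` just before the accumulation loop, with `m`, `WT` and `T` as in the loop snippet earlier in this thread. A debug build would then verify the no-aliasing claim instead of leaving it as an assumption.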

fengyuentau · 2024-04-19T08:51:49Z

Test is now green with this patch merged https://github.com/opencv/ci-gha-workflow/actions/runs/8750864630/job/24015266117?pr=171

remove manual unrolling that causes problem

4ef5986

fengyuentau added the category: core label Apr 19, 2024

fengyuentau self-assigned this Apr 19, 2024

fengyuentau added this to the 4.10.0 milestone Apr 19, 2024

asmorkalov added the bug label Apr 19, 2024

asmorkalov requested a review from vpisarev April 19, 2024 06:28

fengyuentau changed the title ~~remove manual unrolling that causes problem~~ core: fix Core_GEMM.accuracy failure on recent macOS Apr 19, 2024

asmorkalov assigned vpisarev and unassigned fengyuentau Apr 19, 2024

vpisarev approved these changes Apr 19, 2024


asmorkalov merged commit 5da17a4 into opencv:4.x Apr 19, 2024

fengyuentau deleted the fix_core_gemm_acc branch April 19, 2024 08:38

asmorkalov mentioned this pull request Apr 19, 2024

5.x merge 4.x #25460

Merged

asmorkalov merged 1 commit into opencv:4.x from fengyuentau:fix_core_gemm_acc
4 participants