T-API: changed optimal vector width for Intel #2893
opencv-pushbot merged 1 commit into opencv:master
@mostafahagog or @abatushi or @myshevts or @mletavin
Is this the one that gets the best results? How about
I tried
@mostafahagog Hi, please check this again.
@ilya-lavrenov I see that you don't use vector loads (vload4, vload8, vload16) for U8 data. That could be why 16 and 8 are not the optimal vector widths and 4 shows better performance for 8U. Could you try using vloadn/vstoren operations for uchar instead of tuning the vector width?
The result is:
@ilya-lavrenov Ilya, could you please rerun the performance tests for this pull request to rule out the influence of other PRs? Thanks.
@SergeySivolgin, the process has been initiated.
@krodyush, can you please review the pull request? It has been open for quite a long time already.
@vpisarev, I don't see a speedup according to the last measurement, so I don't see a reason for such changes.
@krodyush, the latest measurements are still in progress (the results you see were made with a different driver).
@ilya-lavrenov I looked at the last 2 perf reports, from 07.25 and 07.29, and see big deviations in the results. It looks like the perf test is not reliable enough. Could you rerun it several times to make sure that we see a real improvement and not measurement noise?
0ef70f6 to 970de35
@krodyush, see the latest performance report.
What was changed?
Nothing. I've rerun the perf report generation, and we can see stable results that show a performance gain.
Then could you make several perf reports so we can see whether the improvements are stable?
@ilya-lavrenov please resolve the merge conflict.
6c5dec2 to 28cd305
@ElenaGvozdeva, done.
28cd305 to 1f598b5
1f598b5 to 98e7d4c
@krodyush, I've made 3 performance reports, and each of them shows a stable performance gain for uchars, so please review this PR once again.
@ilya-lavrenov what was the reason for reducing the number of tests from ~3000 to ~900?
@krodyush, Sergey S. asked me to do that, because these are the functions most affected by the patch.
👍 |
Description:
Performance report:
http://ocl.itseez.com/intel/export/perf/pr/2893/report/
check_regression=OCL_AbsDiff:OCL_Add:OCL_Sub:OCL_Mul:OCL_Div:OCL_Bitwise:OCL_Compare:OCL_Min:OCL_Max:OCL_Flip:OCL_Repeat:OCL_AbsDiff:OCL_Sum:OCL_Count:OCL_Norm:OCL_Mean:_OCL_CalcHist*
test_filter=OCL_AbsDiff:OCL_Add:OCL_Sub:OCL_Mul:OCL_Div:OCL_Bitwise:OCL_Compare:OCL_Min:OCL_Max:OCL_Flip:OCL_Repeat:OCL_AbsDiff:OCL_Sum:OCL_Count:OCL_Norm:OCL_Mean:_OCL_CalcHist*
test_modules=core,imgproc
build_examples=OFF