Skip to content

T-API: changed optimal vector width for Intel#2893

Merged
opencv-pushbot merged 1 commit intoopencv:masterfrom
ilya-lavrenov:tapi_vector_width_intel
Sep 18, 2014
Merged

T-API: changed optimal vector width for Intel#2893
opencv-pushbot merged 1 commit intoopencv:masterfrom
ilya-lavrenov:tapi_vector_width_intel

Conversation

@ilya-lavrenov
Copy link
Copy Markdown
Contributor

Description:

  • changed optimal vector width for Intel platform since it shows better performance in most of cases

Performance report:
http://ocl.itseez.com/intel/export/perf/pr/2893/report/

check_regression=OCL_AbsDiff:OCL_Add:OCL_Sub:OCL_Mul:OCL_Div:OCL_Bitwise:OCL_Compare:OCL_Min:OCL_Max:OCL_Flip:OCL_Repeat:OCL_AbsDiff:OCL_Sum:OCL_Count:OCL_Norm:OCL_Mean:_OCL_CalcHist*
test_filter=OCL_AbsDiff:OCL_Add:OCL_Sub:OCL_Mul:OCL_Div:OCL_Bitwise:OCL_Compare:OCL_Min:OCL_Max:OCL_Flip:OCL_Repeat:OCL_AbsDiff:OCL_Sum:OCL_Count:OCL_Norm:OCL_Mean:_OCL_CalcHist*
test_modules=core,imgproc
build_examples=OFF

@ilya-lavrenov
Copy link
Copy Markdown
Contributor Author

@mostafahagog or @abatushi or @myshevts or @mletavin
what do you think about this change? In most of cases it shows better performance. We can see a little performance degradation on some tests (even on 32f, but the patch affects only 8u) and I believe they are outliers.

@mostafahagog
Copy link
Copy Markdown

is this the one that gets the best results? how about
int vectorWidthsIntel[] = { 8, 8, 4, 4, 1, 1, 1, -1 }; ?
or
int vectorWidthsIntel[] = { 8, 8, 8, 8, 1, 1, 1, -1 };?

@ilya-lavrenov
Copy link
Copy Markdown
Contributor Author

I tried 8 as vector width for uchar and it shows a worse performance with compare to 4. My observations argued me that 32bit is optimal in most of cases

@Daniil-Osokin
Copy link
Copy Markdown

@mostafahagog Hi, please check this again.

@krodyush
Copy link
Copy Markdown
Contributor

@ilya-lavrenov I see that you dont use vector load for U8 data (vload4 vload8 vload16 ). that could be the reason why 16 and 8 is not optimal vector size and 4 shows better perf for 8U. could you try to use vloadn/vstoren operations for uchar instead of tuning vector size?

@ilya-lavrenov
Copy link
Copy Markdown
Contributor Author

could you try to use vloadn/vstoren operations for uchar instead of tuning vector size?

the result is - uchar4 is better than uchar16 (see #2969)

@SergeySivolgin
Copy link
Copy Markdown

@ilya-lavrenov Ilya, could you please rerun performance tests for this pull request to avoid influence of other PRs. Thanks.

@ilya-lavrenov
Copy link
Copy Markdown
Contributor Author

@SergeySivolgin, the process has been initiated.

@vpisarev vpisarev assigned krodyush and unassigned mostafahagog Jul 25, 2014
@vpisarev
Copy link
Copy Markdown
Contributor

@krodyush, can you please review the pull request, it's here for quite a long time already

@krodyush
Copy link
Copy Markdown
Contributor

@vpisarev, I dont see speedup according last measurement. So, I dont see reason for such changes

@ilya-lavrenov
Copy link
Copy Markdown
Contributor Author

@krodyush, last measurement are in progress yet (result you see were made with another driver)

@krodyush
Copy link
Copy Markdown
Contributor

@ilya-lavrenov I looked into last 2 perf reports from 07.25 and 07.29 and see big deviations in results. It looks like perf test is not reliable enough. Could you re run it several times to be sure that we see real improvement but not some noise in measurement?

@ilya-lavrenov ilya-lavrenov force-pushed the tapi_vector_width_intel branch from 0ef70f6 to 970de35 Compare August 25, 2014 07:30
@ilya-lavrenov
Copy link
Copy Markdown
Contributor Author

@krodyush, see the latest performance report.

@krodyush
Copy link
Copy Markdown
Contributor

what was changed?

@ilya-lavrenov
Copy link
Copy Markdown
Contributor Author

nothing, I've rerun perf report generation and we can see stable results that show performance gain.

@krodyush
Copy link
Copy Markdown
Contributor

Then could you make several perf reports to be able to see the improvements.stability?

@ElenaGvozdeva
Copy link
Copy Markdown
Contributor

@ilya-lavrenov please resolve merge conflict

@ilya-lavrenov ilya-lavrenov force-pushed the tapi_vector_width_intel branch 2 times, most recently from 6c5dec2 to 28cd305 Compare September 3, 2014 09:05
@ilya-lavrenov
Copy link
Copy Markdown
Contributor Author

@ElenaGvozdeva, done.

@ilya-lavrenov ilya-lavrenov force-pushed the tapi_vector_width_intel branch from 28cd305 to 1f598b5 Compare September 3, 2014 12:30
@ilya-lavrenov ilya-lavrenov force-pushed the tapi_vector_width_intel branch from 1f598b5 to 98e7d4c Compare September 4, 2014 07:59
@ilya-lavrenov
Copy link
Copy Markdown
Contributor Author

@krodyush, I've made 3 performance reports and each of them shows stable performance gain for uchars. So, please review this PR once again.

@krodyush
Copy link
Copy Markdown
Contributor

krodyush commented Sep 9, 2014

@ilya-lavrenov what was the reason to reduce number of tests from ~3000 to ~900?

@ilya-lavrenov
Copy link
Copy Markdown
Contributor Author

@krodyush, Sergey S. asked me to do that, because mostly these functions are affected by the patch.

@krodyush
Copy link
Copy Markdown
Contributor

krodyush commented Sep 9, 2014

👍

@opencv-pushbot opencv-pushbot merged commit 98e7d4c into opencv:master Sep 18, 2014
@ilya-lavrenov ilya-lavrenov deleted the tapi_vector_width_intel branch September 18, 2014 12:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants