Improve vectorization in the 'norm' functions #15402

Merged
opencv-pushbot merged 1 commit into opencv:3.4 from ChipKerchner:normUnroll
Aug 31, 2019

@ChipKerchner
Contributor

Unroll normL2Sqr_, float normL1_ and int normL1_ to hide latencies - 30% improvement on VSX.
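
For readers unfamiliar with the idea, here is a minimal scalar sketch of unrolling with independent accumulators. It is not the code from this PR, which works on SIMD vectors; the function names `normL2SqrSimple` and `normL2SqrUnrolled` are hypothetical and chosen only for illustration. With a single accumulator every add has to wait for the previous one, while four accumulators give the CPU four independent dependency chains to overlap.

```cpp
#include <cassert>
#include <cstddef>

// Baseline (hypothetical name): one accumulator, so each addition
// depends on the result of the previous one.
float normL2SqrSimple(const float* a, std::size_t n)
{
    assert(a != nullptr || n == 0);
    float s = 0.f;
    for (std::size_t i = 0; i < n; ++i)
        s += a[i] * a[i];
    return s;
}

// Unrolled by 4 (hypothetical name): four independent accumulators let
// the additions overlap instead of forming one long dependency chain.
float normL2SqrUnrolled(const float* a, std::size_t n)
{
    assert(a != nullptr || n == 0);
    float s0 = 0.f, s1 = 0.f, s2 = 0.f, s3 = 0.f;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4)
    {
        s0 += a[i]     * a[i];
        s1 += a[i + 1] * a[i + 1];
        s2 += a[i + 2] * a[i + 2];
        s3 += a[i + 3] * a[i + 3];
    }
    // Combine the partial sums, then handle the leftover elements.
    float s = (s0 + s1) + (s2 + s3);
    for (; i < n; ++i)
        s += a[i] * a[i];
    return s;
}
```

Note that the unrolled version adds the floats in a different order, so for general inputs the result may differ from the baseline in the last bits.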

Vectorize normHamming - 5.6x improvement on VSX and SSE and 8x speedup for AVX.
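
As a rough illustration of the Hamming norms involved, the sketch below counts bits eight bytes at a time using plain 64-bit integer tricks. It is an assumption-laden stand-in, not the universal-intrinsics code from this PR: the names `popcount64`, `hammingBits` and `hammingCells2` are hypothetical. `hammingBits` mirrors what NORM_HAMMING counts (set bits), and `hammingCells2` mirrors NORM_HAMMING2 (non-zero 2-bit cells) by OR-ing each cell's two bits into the lower one before counting.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Portable population count of a 64-bit word (classic bit-slicing trick).
static int popcount64(std::uint64_t x)
{
    x = x - ((x >> 1) & 0x5555555555555555ULL);
    x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;
    return static_cast<int>((x * 0x0101010101010101ULL) >> 56);
}

// Number of set bits in the buffer (what NORM_HAMMING measures).
int hammingBits(const unsigned char* a, std::size_t n)
{
    assert(a != nullptr || n == 0);
    int result = 0;
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8)
    {
        std::uint64_t w;
        std::memcpy(&w, a + i, 8);   // safe unaligned load
        result += popcount64(w);
    }
    for (; i < n; ++i)               // tail, one byte at a time
        result += popcount64(a[i]);
    return result;
}

// Number of non-zero 2-bit cells (what NORM_HAMMING2 measures).
int hammingCells2(const unsigned char* a, std::size_t n)
{
    assert(a != nullptr || n == 0);
    const std::uint64_t lowBits = 0x5555555555555555ULL;
    int result = 0;
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8)
    {
        std::uint64_t w;
        std::memcpy(&w, a + i, 8);
        // Fold the high bit of each cell onto the low bit, keep low bits only.
        result += popcount64((w | (w >> 1)) & lowBits);
    }
    for (; i < n; ++i)
    {
        std::uint64_t b = a[i];
        result += popcount64((b | (b >> 1)) & 0x55ULL);
    }
    return result;
}
```

To get a distance between two buffers, the same counting would be applied to the XOR of corresponding bytes.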

@terfendail
Contributor

Performance for SSE2 baseline

| Performance test | Reference time | PR time | Speedup |
|---|---|---|---|
| hal_normL1_f32::test_len::300000 | 0.087 | 0.085 | 1.02 |
| hal_normL1_f32::test_len::2000000 | 0.806 | 0.792 | 1.02 |
| hal_normL1_u8::test_len::300000 | 0.012 | 0.012 | 1.04 |
| hal_normL1_u8::test_len::2000000 | 0.145 | 0.142 | 1.02 |
| hal_normL2Sqr::test_len::300000 | 0.087 | 0.086 | 1.01 |
| hal_normL2Sqr::test_len::2000000 | 0.802 | 0.795 | 1.01 |
| norm2::PerfHamming::(NORM_HAMMING2, 8UC1, 640x480) | 0.120 | 0.037 | 3.27 |
| norm2::PerfHamming::(NORM_HAMMING2, 8UC1, 1920x1080) | 0.780 | 0.228 | 3.42 |
| norm2::PerfHamming::(NORM_HAMMING, 8UC1, 640x480) | 0.031 | 0.031 | 1.00 |
| norm2::PerfHamming::(NORM_HAMMING, 8UC1, 1920x1080) | 0.203 | 0.193 | 1.05 |
| norm::PerfHamming::(NORM_HAMMING2, 8UC1, 640x480) | 0.078 | 0.034 | 2.28 |
| norm::PerfHamming::(NORM_HAMMING2, 8UC1, 1920x1080) | 0.515 | 0.215 | 2.39 |
| norm::PerfHamming::(NORM_HAMMING, 8UC1, 640x480) | 0.028 | 0.028 | 1.00 |
| norm::PerfHamming::(NORM_HAMMING, 8UC1, 1920x1080) | 0.190 | 0.178 | 1.07 |

Performance for SSE3 baseline

| Performance test | Reference time | PR time | Speedup |
|---|---|---|---|
| hal_normL1_f32::test_len::300000 | 0.086 | 0.085 | 1.01 |
| hal_normL1_f32::test_len::2000000 | 0.815 | 0.794 | 1.03 |
| hal_normL1_u8::test_len::300000 | 0.012 | 0.012 | 1.03 |
| hal_normL1_u8::test_len::2000000 | 0.145 | 0.143 | 1.01 |
| hal_normL2Sqr::test_len::300000 | 0.086 | 0.085 | 1.01 |
| hal_normL2Sqr::test_len::2000000 | 0.815 | 0.798 | 1.02 |
| norm2::PerfHamming::(NORM_HAMMING2, 8UC1, 640x480) | 0.119 | 0.037 | 3.24 |
| norm2::PerfHamming::(NORM_HAMMING2, 8UC1, 1920x1080) | 0.780 | 0.240 | 3.25 |
| norm2::PerfHamming::(NORM_HAMMING, 8UC1, 640x480) | 0.031 | 0.031 | 1.00 |
| norm2::PerfHamming::(NORM_HAMMING, 8UC1, 1920x1080) | 0.203 | 0.203 | 1.00 |
| norm::PerfHamming::(NORM_HAMMING2, 8UC1, 640x480) | 0.079 | 0.034 | 2.29 |
| norm::PerfHamming::(NORM_HAMMING2, 8UC1, 1920x1080) | 0.514 | 0.226 | 2.28 |
| norm::PerfHamming::(NORM_HAMMING, 8UC1, 640x480) | 0.028 | 0.028 | 1.01 |
| norm::PerfHamming::(NORM_HAMMING, 8UC1, 1920x1080) | 0.191 | 0.187 | 1.02 |

Performance for SSE4_2 baseline

| Performance test | Reference time | PR time | Speedup |
|---|---|---|---|
| hal_normL1_f32::test_len::300000 | 0.090 | 0.091 | 0.99 |
| hal_normL1_f32::test_len::2000000 | 0.848 | 0.827 | 1.03 |
| hal_normL1_u8::test_len::300000 | 0.012 | 0.012 | 1.04 |
| hal_normL1_u8::test_len::2000000 | 0.148 | 0.153 | 0.96 |
| hal_normL2Sqr::test_len::300000 | 0.091 | 0.091 | 0.99 |
| hal_normL2Sqr::test_len::2000000 | 0.852 | 0.830 | 1.03 |
| norm2::PerfHamming::(NORM_HAMMING2, 8UC1, 640x480) | 0.118 | 0.037 | 3.21 |
| norm2::PerfHamming::(NORM_HAMMING2, 8UC1, 1920x1080) | 0.782 | 0.245 | 3.19 |
| norm2::PerfHamming::(NORM_HAMMING, 8UC1, 640x480) | 0.021 | 0.021 | 1.00 |
| norm2::PerfHamming::(NORM_HAMMING, 8UC1, 1920x1080) | 0.162 | 0.165 | 0.98 |
| norm::PerfHamming::(NORM_HAMMING2, 8UC1, 640x480) | 0.079 | 0.034 | 2.28 |
| norm::PerfHamming::(NORM_HAMMING2, 8UC1, 1920x1080) | 0.514 | 0.229 | 2.25 |
| norm::PerfHamming::(NORM_HAMMING, 8UC1, 640x480) | 0.015 | 0.015 | 1.00 |
| norm::PerfHamming::(NORM_HAMMING, 8UC1, 1920x1080) | 0.113 | 0.108 | 1.04 |

Performance for AVX2 baseline

| Performance test | Reference time | PR time | Speedup |
|---|---|---|---|
| hal_normL1_f32::test_len::300000 | 0.089 | 0.089 | 1.00 |
| hal_normL1_f32::test_len::2000000 | 0.826 | 0.831 | 0.99 |
| hal_normL1_u8::test_len::300000 | 0.010 | 0.010 | 1.04 |
| hal_normL1_u8::test_len::2000000 | 0.153 | 0.153 | 1.00 |
| hal_normL2Sqr::test_len::300000 | 0.089 | 0.089 | 1.00 |
| hal_normL2Sqr::test_len::2000000 | 0.811 | 0.816 | 0.99 |
| norm2::PerfHamming::(NORM_HAMMING2, 8UC1, 640x480) | 0.115 | 0.015 | 7.92 |
| norm2::PerfHamming::(NORM_HAMMING2, 8UC1, 1920x1080) | 0.745 | 0.160 | 4.67 |
| norm2::PerfHamming::(NORM_HAMMING, 8UC1, 640x480) | 0.012 | 0.012 | 0.94 |
| norm2::PerfHamming::(NORM_HAMMING, 8UC1, 1920x1080) | 0.157 | 0.159 | 0.99 |
| norm::PerfHamming::(NORM_HAMMING2, 8UC1, 640x480) | 0.076 | 0.012 | 6.36 |
| norm::PerfHamming::(NORM_HAMMING2, 8UC1, 1920x1080) | 0.514 | 0.097 | 5.31 |
| norm::PerfHamming::(NORM_HAMMING, 8UC1, 640x480) | 0.010 | 0.010 | 0.98 |
| norm::PerfHamming::(NORM_HAMMING, 8UC1, 1920x1080) | 0.092 | 0.089 | 1.03 |

@opencv-pushbot opencv-pushbot merged commit 288e6f9 into opencv:3.4 Aug 31, 2019
@ChipKerchner ChipKerchner deleted the normUnroll branch September 3, 2019 13:47
@alalek alalek mentioned this pull request Sep 5, 2019
4 participants