StereoSGBM algorithm updated to use wide universal intrinsics#15478

Merged
opencv-pushbot merged 1 commit into opencv:3.4 from terfendail:wintr_stereosgbm
Oct 31, 2019

@terfendail terfendail commented Sep 6, 2019

resolves #15206

This pull request changes

StereoSGBM algorithm updated to use wide universal intrinsics

force_builders=Linux AVX2,Custom
buildworker:Custom=linux-3
build_image:Custom=ubuntu:18.04
CPU_BASELINE:Custom=AVX512_SKX
disable_ipp=ON

@terfendail
Contributor Author

Performance for SSE2 baseline

| Performance test | Reference time | PR time | Speedup |
|---|---:|---:|---:|
| SGBM::TestStereoCorrespSGBM::(640x480, 128, StereoSGBM::MODE_HH4) | 127.486 | 132.974 | 0.96 |
| SGBM::TestStereoCorrespSGBM::(640x480, 128, StereoSGBM::MODE_SGBM) | 93.480 | 86.080 | 1.09 |
| SGBM::TestStereoCorrespSGBM::(640x480, 128, StereoSGBM::MODE_SGBM_3WAY) | 74.211 | 76.667 | 0.97 |
| SGBM::TestStereoCorrespSGBM::(640x480, 256, StereoSGBM::MODE_HH4) | 180.248 | 183.064 | 0.98 |
| SGBM::TestStereoCorrespSGBM::(640x480, 256, StereoSGBM::MODE_SGBM) | 119.444 | 115.848 | 1.03 |
| SGBM::TestStereoCorrespSGBM::(640x480, 256, StereoSGBM::MODE_SGBM_3WAY) | 99.528 | 100.390 | 0.99 |
| SGBM::TestStereoCorrespSGBM::(1280x720, 128, StereoSGBM::MODE_HH4) | 457.854 | 460.814 | 0.99 |
| SGBM::TestStereoCorrespSGBM::(1280x720, 128, StereoSGBM::MODE_SGBM) | 323.993 | 302.777 | 1.07 |
| SGBM::TestStereoCorrespSGBM::(1280x720, 128, StereoSGBM::MODE_SGBM_3WAY) | 256.581 | 264.158 | 0.97 |
| SGBM::TestStereoCorrespSGBM::(1280x720, 256, StereoSGBM::MODE_HH4) | 762.396 | 807.578 | 0.94 |
| SGBM::TestStereoCorrespSGBM::(1280x720, 256, StereoSGBM::MODE_SGBM) | 510.502 | 481.793 | 1.06 |
| SGBM::TestStereoCorrespSGBM::(1280x720, 256, StereoSGBM::MODE_SGBM_3WAY) | 406.937 | 413.859 | 0.98 |

Performance for SSE3 baseline

| Performance test | Reference time | PR time | Speedup |
|---|---:|---:|---:|
| SGBM::TestStereoCorrespSGBM::(640x480, 128, StereoSGBM::MODE_HH4) | 128.651 | 129.730 | 0.99 |
| SGBM::TestStereoCorrespSGBM::(640x480, 128, StereoSGBM::MODE_SGBM) | 94.933 | 86.877 | 1.09 |
| SGBM::TestStereoCorrespSGBM::(640x480, 128, StereoSGBM::MODE_SGBM_3WAY) | 74.903 | 76.043 | 0.99 |
| SGBM::TestStereoCorrespSGBM::(640x480, 256, StereoSGBM::MODE_HH4) | 184.795 | 179.593 | 1.03 |
| SGBM::TestStereoCorrespSGBM::(640x480, 256, StereoSGBM::MODE_SGBM) | 123.253 | 115.259 | 1.07 |
| SGBM::TestStereoCorrespSGBM::(640x480, 256, StereoSGBM::MODE_SGBM_3WAY) | 100.512 | 100.662 | 1.00 |
| SGBM::TestStereoCorrespSGBM::(1280x720, 128, StereoSGBM::MODE_HH4) | 466.527 | 458.792 | 1.02 |
| SGBM::TestStereoCorrespSGBM::(1280x720, 128, StereoSGBM::MODE_SGBM) | 327.592 | 302.283 | 1.08 |
| SGBM::TestStereoCorrespSGBM::(1280x720, 128, StereoSGBM::MODE_SGBM_3WAY) | 252.955 | 256.413 | 0.99 |
| SGBM::TestStereoCorrespSGBM::(1280x720, 256, StereoSGBM::MODE_HH4) | 768.995 | 799.933 | 0.96 |
| SGBM::TestStereoCorrespSGBM::(1280x720, 256, StereoSGBM::MODE_SGBM) | 518.386 | 472.727 | 1.10 |
| SGBM::TestStereoCorrespSGBM::(1280x720, 256, StereoSGBM::MODE_SGBM_3WAY) | 410.078 | 411.482 | 1.00 |

Performance for SSE4_2 baseline

| Performance test | Reference time | PR time | Speedup |
|---|---:|---:|---:|
| SGBM::TestStereoCorrespSGBM::(640x480, 128, StereoSGBM::MODE_HH4) | 126.389 | 125.046 | 1.01 |
| SGBM::TestStereoCorrespSGBM::(640x480, 128, StereoSGBM::MODE_SGBM) | 90.838 | 81.128 | 1.12 |
| SGBM::TestStereoCorrespSGBM::(640x480, 128, StereoSGBM::MODE_SGBM_3WAY) | 68.740 | 69.941 | 0.98 |
| SGBM::TestStereoCorrespSGBM::(640x480, 256, StereoSGBM::MODE_HH4) | 178.245 | 175.912 | 1.01 |
| SGBM::TestStereoCorrespSGBM::(640x480, 256, StereoSGBM::MODE_SGBM) | 116.176 | 109.258 | 1.06 |
| SGBM::TestStereoCorrespSGBM::(640x480, 256, StereoSGBM::MODE_SGBM_3WAY) | 89.002 | 92.805 | 0.96 |
| SGBM::TestStereoCorrespSGBM::(1280x720, 128, StereoSGBM::MODE_HH4) | 451.558 | 444.315 | 1.02 |
| SGBM::TestStereoCorrespSGBM::(1280x720, 128, StereoSGBM::MODE_SGBM) | 314.827 | 287.050 | 1.10 |
| SGBM::TestStereoCorrespSGBM::(1280x720, 128, StereoSGBM::MODE_SGBM_3WAY) | 232.987 | 242.633 | 0.96 |
| SGBM::TestStereoCorrespSGBM::(1280x720, 256, StereoSGBM::MODE_HH4) | 749.226 | 782.894 | 0.96 |
| SGBM::TestStereoCorrespSGBM::(1280x720, 256, StereoSGBM::MODE_SGBM) | 495.161 | 461.669 | 1.07 |
| SGBM::TestStereoCorrespSGBM::(1280x720, 256, StereoSGBM::MODE_SGBM_3WAY) | 372.483 | 382.057 | 0.97 |

Performance for AVX2 baseline

| Performance test | Reference time | PR time | Speedup |
|---|---:|---:|---:|
| SGBM::TestStereoCorrespSGBM::(640x480, 128, StereoSGBM::MODE_HH4) | 124.361 | 114.816 | 1.08 |
| SGBM::TestStereoCorrespSGBM::(640x480, 128, StereoSGBM::MODE_SGBM) | 90.213 | 72.189 | 1.25 |
| SGBM::TestStereoCorrespSGBM::(640x480, 128, StereoSGBM::MODE_SGBM_3WAY) | 65.086 | 50.145 | 1.30 |
| SGBM::TestStereoCorrespSGBM::(640x480, 256, StereoSGBM::MODE_HH4) | 176.013 | 163.201 | 1.08 |
| SGBM::TestStereoCorrespSGBM::(640x480, 256, StereoSGBM::MODE_SGBM) | 117.757 | 96.591 | 1.22 |
| SGBM::TestStereoCorrespSGBM::(640x480, 256, StereoSGBM::MODE_SGBM_3WAY) | 86.871 | 65.565 | 1.32 |
| SGBM::TestStereoCorrespSGBM::(1280x720, 128, StereoSGBM::MODE_HH4) | 443.719 | 418.918 | 1.06 |
| SGBM::TestStereoCorrespSGBM::(1280x720, 128, StereoSGBM::MODE_SGBM) | 315.630 | 260.964 | 1.21 |
| SGBM::TestStereoCorrespSGBM::(1280x720, 128, StereoSGBM::MODE_SGBM_3WAY) | 222.996 | 187.938 | 1.19 |
| SGBM::TestStereoCorrespSGBM::(1280x720, 256, StereoSGBM::MODE_HH4) | 739.070 | 704.779 | 1.05 |
| SGBM::TestStereoCorrespSGBM::(1280x720, 256, StereoSGBM::MODE_SGBM) | 502.672 | 417.489 | 1.20 |
| SGBM::TestStereoCorrespSGBM::(1280x720, 256, StereoSGBM::MODE_SGBM_3WAY) | 360.991 | 295.564 | 1.22 |

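
The Speedup column appears to be the reference time divided by the PR time, rounded to two decimals. That reading is inferred from the numbers rather than stated in the thread, so treat the sketch below as an assumption; the function names `speedup` and `speedupHundredths` are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Speedup as it appears to be defined in the tables: reference time / PR time.
// Values above 1.0 mean the PR is faster than the reference build.
double speedup(double referenceTime, double prTime)
{
    assert(prTime > 0.0);
    return referenceTime / prTime;
}

// Speedup in hundredths, rounded to nearest, for comparison with the
// two-decimal Speedup column.
long speedupHundredths(double referenceTime, double prTime)
{
    return std::lround(speedup(referenceTime, prTime) * 100.0);
}
```

For the first SSE2 row, 127.486 / 132.974 is about 0.959, which rounds to the listed 0.96.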

@terfendail terfendail force-pushed the wintr_stereosgbm branch 4 times, most recently from 3be502c to d9d62fb Compare September 25, 2019 12:08
@terfendail terfendail requested a review from alalek October 8, 2019 18:04

@alalek alalek left a comment

The Calib3d_StereoSGBM.regression test crashes on AVX512.
Please take a look.

C[x] = (CostType)(Cprev[x] + hsumAdd[x] - hsumSub[x]);
#endif
}
else*/
Member

dead code?

Contributor Author

This part performs cost evaluation for the bottom of the image, which was missing from the original implementation, so the result of this code is inconsistent with the test reference data. However, evaluation at the bottom is inaccurate anyway, because it mostly matches border padding rather than real data.
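
For reference, the `C[x] = Cprev[x] + hsumAdd[x] - hsumSub[x]` line in the commented-out snippet above reads as a running-sum update: the cost for the current row is the previous row's cost, plus the horizontal sum of the row entering the window, minus the one leaving it. The following is a minimal scalar sketch of that reading, not the OpenCV code. The helper `slideWindowDown` is hypothetical, and `CostType` is assumed to be a 16-bit integer here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using CostType = std::int16_t;  // assumed cost type for this sketch

// Hypothetical scalar helper: slides the vertical window down by one row.
// hsumAdd holds the horizontal sums of the row entering the window,
// hsumSub the horizontal sums of the row leaving it.
std::vector<CostType> slideWindowDown(const std::vector<CostType>& Cprev,
                                      const std::vector<CostType>& hsumAdd,
                                      const std::vector<CostType>& hsumSub)
{
    assert(Cprev.size() == hsumAdd.size() && Cprev.size() == hsumSub.size());
    std::vector<CostType> C(Cprev.size());
    for (std::size_t x = 0; x < C.size(); ++x)
        C[x] = static_cast<CostType>(Cprev[x] + hsumAdd[x] - hsumSub[x]);
    return C;
}
```

Near the bottom of the image the entering row would come from border padding rather than image data, which fits the author's remark that the evaluation there is inaccurate.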

v_store_aligned(cost + x*D + d, _c0 + v_reinterpret_as_s16(diff1 >> diff_scale));
v_store_aligned(cost + x*D + d + 8, _c1 + v_reinterpret_as_s16(diff2 >> diff_scale));
}
for( ; d <= maxD - v_uint8::nlanes; d += v_uint8::nlanes )
Member

v_uint8::nlanes

Why is v_int16::nlanes * 2 not used here? The destination buffer is accessed through two int16 vectors.

Contributor Author

updated
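
The reviewer's point can be illustrated without OpenCV. The sketch below assumes a 128-bit register, where a uint8 vector holds 16 lanes and an int16 vector holds 8. The constants `kU8Lanes` and `kS16Lanes` are hypothetical stand-ins for `v_uint8::nlanes` and `v_int16::nlanes`, and `accumulateCost` is an illustrative scalar loop, not the PR's code. Both bounds give the same number, but stating the bound in int16 lanes matches the two int16 stores done per step.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical lane counts for a 128-bit register; stand-ins for
// v_uint8::nlanes and v_int16::nlanes, not the OpenCV definitions.
constexpr int kU8Lanes = 16;
constexpr int kS16Lanes = 8;
static_assert(kU8Lanes == 2 * kS16Lanes,
              "one uint8 vector widens to two int16 vectors");

// Adds 8-bit differences to 16-bit costs, one "uint8 vector" per step.
// Returns the first index left over for a scalar tail loop.
int accumulateCost(const std::vector<std::uint8_t>& diff,
                   std::vector<std::int16_t>& cost)
{
    assert(diff.size() == cost.size());
    const int maxD = static_cast<int>(cost.size());
    int d = 0;
    // Each step writes two int16 "vectors", so the bound is stated in int16 lanes.
    for (; d <= maxD - 2 * kS16Lanes; d += 2 * kS16Lanes)
    {
        for (int i = 0; i < kS16Lanes; ++i)  // low half: first int16 store
            cost[d + i] = static_cast<std::int16_t>(cost[d + i] + diff[d + i]);
        for (int i = 0; i < kS16Lanes; ++i)  // high half: second int16 store
            cost[d + kS16Lanes + i] = static_cast<std::int16_t>(
                cost[d + kS16Lanes + i] + diff[d + kS16Lanes + i]);
    }
    return d;
}
```

With 40 elements, two full steps run and the function returns 32, leaving the last 8 elements for a tail loop.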

for( x = width-1-maxX2; x < width-1- minX2; x++ )
// to process values from [minX2, maxX2) we should check memory location (width - 1 - maxX2, width - 1 - minX2]
// so iterate through [width - maxX2, width - minX2)
for( x = width-maxX2; x < width-minX2; x++ )
Member

The prow2[x] indexes have changed. Is that intentional?

Contributor Author

Yes, this change addresses #15206.
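
The interval arithmetic in the new loop's comment can be checked with a small sketch. It assumes, as the comment implies, that a value `v` is read from the mirrored position `width - 1 - v`. The helpers `loopPositions` and `mirroredPositions` are hypothetical and exist only for this check.

```cpp
#include <vector>

// Hypothetical helper: the x positions visited by a loop over [first, last).
std::vector<int> loopPositions(int first, int last)
{
    std::vector<int> out;
    for (int x = first; x < last; ++x)
        out.push_back(x);
    return out;
}

// Mirrored positions width-1-v for every v in [minX2, maxX2), in ascending order.
std::vector<int> mirroredPositions(int width, int minX2, int maxX2)
{
    std::vector<int> out;
    for (int v = maxX2 - 1; v >= minX2; --v)
        out.push_back(width - 1 - v);
    return out;
}
```

For `width = 10`, `minX2 = 2`, `maxX2 = 5`, the mirrored positions are 5, 6, 7. The new bounds `[width - maxX2, width - minX2)` visit exactly those, while the old bounds `[width - 1 - maxX2, width - 1 - minX2)` visit 4, 5, 6, which is shifted by one.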

@alalek alalek left a comment

Well done!
Thank you 👍

@opencv-pushbot opencv-pushbot merged commit 42b1d04 into opencv:3.4 Oct 31, 2019
@alalek alalek mentioned this pull request Nov 4, 2019
@terfendail terfendail deleted the wintr_stereosgbm branch November 5, 2019 17:20