Skip to content

fixed several problems when running tests on Mac#16608

Merged
alalek merged 4 commits intoopencv:3.4from
vpisarev:fix_mac_ocl_tests
Feb 21, 2020
Merged

fixed several problems when running tests on Mac#16608
alalek merged 4 commits intoopencv:3.4from
vpisarev:fix_mac_ocl_tests

Conversation

@vpisarev
Copy link
Copy Markdown
Contributor

@vpisarev vpisarev commented Feb 17, 2020

  • failures in ocl_pyrUp
  • failures in ocl_flip when processing CV_16UC3 or CV_16SC3 data
  • some basic UMat tests; Mat => UMat => Mat chain did not work properly on Discrete video cards.
  • One of histogram badarg tests tried to access the 2nd element of 1-element input array. Most of the time it worked fine, because the function reported the expected error because of some other bad arg, but sometimes it crashed on histogram allocation when the size was valid but huge.
force_builders=Custom Mac,Custom
test_opencl:Custom Mac=ON

build_image:Custom=ubuntu:18.04
buildworker:Custom=linux-5
test_opencl:Custom=ON

* OCL_pyrUp
* OCL_flip
* some basic UMat tests
* histogram badarg test (out of range access)
@vpisarev
Copy link
Copy Markdown
Contributor Author

@alalek, could you please add the magic comment to enable OpenCL on Mac?


bool copyOnMap = (flags0 & UMatData::COPY_ON_MAP) != 0;
if( copyOnMap )
accessFlags &= ~ACCESS_FAST;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you remember which test triggers that?

I have forced hostUnifiedMemory = false, but nothing is failed on Macs.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alalek, the test is "UMat.SyncTemp". When you put some condition and nothing is failed, it does not mean that there is no bug in the logic. Before the patch the code attempts to do Map/Unmap even on machines without unified (read "shared") GPU-CPU memory. Somehow modern drivers on Windows/Linux implement this map unmap by doing implicit copying. On Mac it does not work well. On your macs everything works well probably because there is just iGPU there, no dGPU. I have 2 macs with dGPU, and they just freeze on "UMat.SyncTemp" test for infinite amount of time. After the patch everything works well.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On Mac it does not work well.

So, where should we put #ifdef __APPLE__?

Dropping ACCESS_FAST flag here does not look like a good idea.
AFAIK, it was designed to fail fast if any data copying is required.

Please change in ocl.cpp:

-#define CV_OPENCL_TRACE_CHECK                    0
+#define CV_OPENCL_TRACE_CHECK                    1

and collect stdout dump on problematic test (e.g, which requires this fix).


BTW, I can't find any place where ACCESS_FAST is enabled (including tests).
GitHub results: 1 declaration + 2 checks.

Copy link
Copy Markdown
Contributor Author

@vpisarev vpisarev Feb 19, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the tracing turned on sometimes it hangs on unmap:

OpenCV(OpenCL:0): clCreateBuffer(CL_MEM_USE_HOST_PTR|createFlags, sz=100, origdata=0x7fa0d5278680) => 0x7fa0d5277ea0
OpenCV(OpenCL:0): clCreateBuffer() => 0x7fa0d5277ea0
OpenCV(OpenCL:0): handle = clCreateCommandQueue_pfn(ch, dh, props, &retval)
OpenCV(OpenCL:0): clEnqueueReadBuffer(q, handle=0x7fa0d5277ea0, CL_TRUE, 0, sz=100, data=0x7fa0d5278680, 0, 0, 0)
OpenCV(OpenCL:0): clEnqueueWriteBuffer(q, handle=0x7fa0d5277ea0, CL_TRUE, 0, sz=100, data=0x7fa0d5278680, 0, 0, 0)
[ INFO:0] Successfully initialized OpenCL cache directory: /var/folders/ll/ddsp7fq10lvb2lf9v91x6ks40000gn/T/opencv/3.4.x-dev/opencl_cache/
[ INFO:0] Preparing OpenCL cache configuration for context: 32-bit--AMD--AMD_Radeon_HD_-_FirePro_D700_Compute_Engine--1_2__Jan_13_2020_20_12_40_
OpenCV(OpenCL:0): clBuildProgram(binary: core/copyset)
OpenCV(OpenCL:0): result = clGetProgramBuildInfo_pfn(handle, devices[0], 0x1181, sizeof(build_status), &build_status, &retsz)
OpenCV(OpenCL:0): clCreateKernel('set')
OpenCV(OpenCL:0): clSetKernelArg('set', arg_index=0, cl_mem=0x7fa0d5277ea0)
OpenCV(OpenCL:0): clSetKernelArg('set', arg_index=1, step_value=10)
OpenCV(OpenCL:0): clSetKernelArg('set', arg_index=2, offset_value=0)
OpenCV(OpenCL:0): clSetKernelArg('set', arg_index=3, rows_value=10)
OpenCV(OpenCL:0): clSetKernelArg('set', arg_index=4, cols_value=5)
OpenCV(OpenCL:0): clSetKernelArg('set', arg_index=5, size=2, obj=0x7ffeea546ea0)
OpenCV(OpenCL:0): clFinish_pfn(qq)
OpenCV(OpenCL:0): clReleaseKernel_pfn(handle)
OpenCV(OpenCL:0): clEnqueueReadBuffer(q, handle=0x7fa0d5277ea0, CL_TRUE, 0, sz=100, data=0x7fa0d5278680, 0, 0, 0)
OpenCV(OpenCL:0): clEnqueueWriteBuffer(q, handle=0x7fa0d5277ea0, CL_TRUE, 0, sz=100, data=0x7fa0d5278680, 0, 0, 0)
OpenCV(OpenCL:0): clEnqueueMapBuffer(handle=0x7fa0d5277ea0, sz=100) => 0x7fa0d5278680
OpenCV(OpenCL:0): clEnqueueUnmapMemObject(handle=0x7fa0d5277ea0, data=0x7fa0d5278680, [sz=100])

sometimes it crashes at the OpenCL kernel (setTo) release,
sometimes it crashes at some other random places.

if the code that removes ACCESS_FAST flag is commented off, no any error/exception happens. The code returns UMat with OpenCL handle=NULL. Then, obviously, OpenCL kernel refuses to run and then it automatically falls back to CPU implementation. But I'm almost sure that this is much worse and error-prone behaviour than the real data copy. I moved ACCESS_FAST remove code into non-SVM branch, because ACCESS_FAST is really for SVM.

…e the OpenCL compiler on Mac generates incorrect code
…ared virtual memory (in OpenCL 2.x), not support vector machine)
…ines with discrete video only on Macs. On Windows/Linux the drivers are seemingly smart enough to implement map/unmap properly (and maybe more efficiently than explicit read/write)
Copy link
Copy Markdown
Member

@alalek alalek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me 👍

@alalek alalek merged commit 07b4750 into opencv:3.4 Feb 21, 2020
jshiwam pushed a commit to jshiwam/opencv that referenced this pull request Feb 22, 2020
* fixed several problems when running tests on Mac:
* OCL_pyrUp
* OCL_flip
* some basic UMat tests
* histogram badarg test (out of range access)

* retained the storepix fix in ocl_flip only for 16U/16S datatype, where the OpenCL compiler on Mac generates incorrect code

* moved deletion of ACCESS_FAST flag to non-SVM branch (where SVM is shared virtual memory (in OpenCL 2.x), not support vector machine)

* force OpenCL to use read/write for GPU<=>CPU memory transfers on machines with discrete video only on Macs. On Windows/Linux the drivers are seemingly smart enough to implement map/unmap properly (and maybe more efficiently than explicit read/write)
jshiwam added a commit to jshiwam/opencv that referenced this pull request Feb 22, 2020
…m added

dnn: don't require setInput in .dump()

dnn: auto network dump through parameter

Improved GStreamer documentation.

improved documentation for imread()

videoio/MSMF: refactored format handling and selection, property reading and writing

dnn(test): skip failed ngraph tests

add cuda 10 support

3.4 docs for 3.4 branch

Resolve opencv#14566

Merge pull request opencv#16445 from atinfinity:fixed-typo

* fixed typo

* add compatibility code to handle migration

Enable Mask R-CNN with Inference Engine. Full coverage with nGraph

intrin: fixed int64->double conversion for AVX-512

Merge pull request opencv#16608 from vpisarev:fix_mac_ocl_tests

* fixed several problems when running tests on Mac:
* OCL_pyrUp
* OCL_flip
* some basic UMat tests
* histogram badarg test (out of range access)

* retained the storepix fix in ocl_flip only for 16U/16S datatype, where the OpenCL compiler on Mac generates incorrect code

* moved deletion of ACCESS_FAST flag to non-SVM branch (where SVM is shared virtual memory (in OpenCL 2.x), not support vector machine)

* force OpenCL to use read/write for GPU<=>CPU memory transfers on machines with discrete video only on Macs. On Windows/Linux the drivers are seemingly smart enough to implement map/unmap properly (and maybe more efficiently than explicit read/write)

Merge pull request opencv#16594 from vpisarev:hull_ordering_fix

fixed the ordering of contour convex hull points

* partially fixed the issue opencv#4539

* fixed warnings and test failures

* fixed integer overflow (issue opencv#14521)

* added comment to force buildbot to re-run

* extended the test for the issue 4539. Check the expected behaviour on the original contour as well

* added comment; fixed typo, renamed another variable for a little better clarity

* added yet another part to the test for issue opencv#4539, where we run convexHull and convexityDetects on the original contour, without any manipulations. the rest of the test stays the same
@alalek alalek mentioned this pull request Feb 26, 2020
@vpisarev vpisarev deleted the fix_mac_ocl_tests branch May 28, 2020 18:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants