fixed several problems when running tests on Mac#16608
Conversation
* OCL_pyrUp * OCL_flip * some basic UMat tests * histogram badarg test (out of range access)
|
@alalek, could you please add the magic comment to enable OpenCL on Mac? |
modules/core/src/ocl.cpp
Outdated
|
|
||
| bool copyOnMap = (flags0 & UMatData::COPY_ON_MAP) != 0; | ||
| if( copyOnMap ) | ||
| accessFlags &= ~ACCESS_FAST; |
There was a problem hiding this comment.
@alalek, the test is "UMat.SyncTemp". When you put some condition and nothing is failed, it does not mean that there is no bug in the logic. Before the patch the code attempts to do Map/Unmap even on machines without unified (read "shared") GPU-CPU memory. Somehow modern drivers on Windows/Linux implement this map unmap by doing implicit copying. On Mac it does not work well. On your macs everything works well probably because there is just iGPU there, no dGPU. I have 2 macs with dGPU, and they just freeze on "UMat.SyncTemp" test for infinite amount of time. After the patch everything works well.
There was a problem hiding this comment.
On Mac it does not work well.
So, where should we put #ifdef __APPLE__?
Dropping ACCESS_FAST flag here does not look like a good idea.
AFAIK, it was designed to fail fast if any data copying is required.
Please change in ocl.cpp:
-#define CV_OPENCL_TRACE_CHECK 0
+#define CV_OPENCL_TRACE_CHECK 1and collect stdout dump on problematic test (e.g, which requires this fix).
BTW, I can't find any place where ACCESS_FAST is enabled (including tests).
GitHub results: 1 declaration + 2 checks.
There was a problem hiding this comment.
With the tracing turned on sometimes it hangs on unmap:
OpenCV(OpenCL:0): clCreateBuffer(CL_MEM_USE_HOST_PTR|createFlags, sz=100, origdata=0x7fa0d5278680) => 0x7fa0d5277ea0
OpenCV(OpenCL:0): clCreateBuffer() => 0x7fa0d5277ea0
OpenCV(OpenCL:0): handle = clCreateCommandQueue_pfn(ch, dh, props, &retval)
OpenCV(OpenCL:0): clEnqueueReadBuffer(q, handle=0x7fa0d5277ea0, CL_TRUE, 0, sz=100, data=0x7fa0d5278680, 0, 0, 0)
OpenCV(OpenCL:0): clEnqueueWriteBuffer(q, handle=0x7fa0d5277ea0, CL_TRUE, 0, sz=100, data=0x7fa0d5278680, 0, 0, 0)
[ INFO:0] Successfully initialized OpenCL cache directory: /var/folders/ll/ddsp7fq10lvb2lf9v91x6ks40000gn/T/opencv/3.4.x-dev/opencl_cache/
[ INFO:0] Preparing OpenCL cache configuration for context: 32-bit--AMD--AMD_Radeon_HD_-_FirePro_D700_Compute_Engine--1_2__Jan_13_2020_20_12_40_
OpenCV(OpenCL:0): clBuildProgram(binary: core/copyset)
OpenCV(OpenCL:0): result = clGetProgramBuildInfo_pfn(handle, devices[0], 0x1181, sizeof(build_status), &build_status, &retsz)
OpenCV(OpenCL:0): clCreateKernel('set')
OpenCV(OpenCL:0): clSetKernelArg('set', arg_index=0, cl_mem=0x7fa0d5277ea0)
OpenCV(OpenCL:0): clSetKernelArg('set', arg_index=1, step_value=10)
OpenCV(OpenCL:0): clSetKernelArg('set', arg_index=2, offset_value=0)
OpenCV(OpenCL:0): clSetKernelArg('set', arg_index=3, rows_value=10)
OpenCV(OpenCL:0): clSetKernelArg('set', arg_index=4, cols_value=5)
OpenCV(OpenCL:0): clSetKernelArg('set', arg_index=5, size=2, obj=0x7ffeea546ea0)
OpenCV(OpenCL:0): clFinish_pfn(qq)
OpenCV(OpenCL:0): clReleaseKernel_pfn(handle)
OpenCV(OpenCL:0): clEnqueueReadBuffer(q, handle=0x7fa0d5277ea0, CL_TRUE, 0, sz=100, data=0x7fa0d5278680, 0, 0, 0)
OpenCV(OpenCL:0): clEnqueueWriteBuffer(q, handle=0x7fa0d5277ea0, CL_TRUE, 0, sz=100, data=0x7fa0d5278680, 0, 0, 0)
OpenCV(OpenCL:0): clEnqueueMapBuffer(handle=0x7fa0d5277ea0, sz=100) => 0x7fa0d5278680
OpenCV(OpenCL:0): clEnqueueUnmapMemObject(handle=0x7fa0d5277ea0, data=0x7fa0d5278680, [sz=100])
sometimes it crashes at the OpenCL kernel (setTo) release,
sometimes it crashes at some other random places.
if the code that removes ACCESS_FAST flag is commented off, no any error/exception happens. The code returns UMat with OpenCL handle=NULL. Then, obviously, OpenCL kernel refuses to run and then it automatically falls back to CPU implementation. But I'm almost sure that this is much worse and error-prone behaviour than the real data copy. I moved ACCESS_FAST remove code into non-SVM branch, because ACCESS_FAST is really for SVM.
…e the OpenCL compiler on Mac generates incorrect code
…ared virtual memory (in OpenCL 2.x), not support vector machine)
…ines with discrete video only on Macs. On Windows/Linux the drivers are seemingly smart enough to implement map/unmap properly (and maybe more efficiently than explicit read/write)
* fixed several problems when running tests on Mac: * OCL_pyrUp * OCL_flip * some basic UMat tests * histogram badarg test (out of range access) * retained the storepix fix in ocl_flip only for 16U/16S datatype, where the OpenCL compiler on Mac generates incorrect code * moved deletion of ACCESS_FAST flag to non-SVM branch (where SVM is shared virtual memory (in OpenCL 2.x), not support vector machine) * force OpenCL to use read/write for GPU<=>CPU memory transfers on machines with discrete video only on Macs. On Windows/Linux the drivers are seemingly smart enough to implement map/unmap properly (and maybe more efficiently than explicit read/write)
…m added dnn: don't require setInput in .dump() dnn: auto network dump through parameter Improved GStreamer documentation. improved documentation for imread() videoio/MSMF: refactored format handling and selection, property reading and writing dnn(test): skip failed ngraph tests add cuda 10 support 3.4 docs for 3.4 branch Resolve opencv#14566 Merge pull request opencv#16445 from atinfinity:fixed-typo * fixed typo * add compatibility code to handle migration Enable Mask R-CNN with Inference Engine. Full coverage with nGraph intrin: fixed int64->double conversion for AVX-512 Merge pull request opencv#16608 from vpisarev:fix_mac_ocl_tests * fixed several problems when running tests on Mac: * OCL_pyrUp * OCL_flip * some basic UMat tests * histogram badarg test (out of range access) * retained the storepix fix in ocl_flip only for 16U/16S datatype, where the OpenCL compiler on Mac generates incorrect code * moved deletion of ACCESS_FAST flag to non-SVM branch (where SVM is shared virtual memory (in OpenCL 2.x), not support vector machine) * force OpenCL to use read/write for GPU<=>CPU memory transfers on machines with discrete video only on Macs. On Windows/Linux the drivers are seemingly smart enough to implement map/unmap properly (and maybe more efficiently than explicit read/write) Merge pull request opencv#16594 from vpisarev:hull_ordering_fix fixed the ordering of contour convex hull points * partially fixed the issue opencv#4539 * fixed warnings and test failures * fixed integer overflow (issue opencv#14521) * added comment to force buildbot to re-run * extended the test for the issue 4539. Check the expected behaviour on the original contour as well * added comment; fixed typo, renamed another variable for a little better clarity * added yet another part to the test for issue opencv#4539, where we run convexHull and convexityDetects on the original contour, without any manipulations. the rest of the test stays the same
Mat => UMat => Matchain did not work properly on Discrete video cards.