OpenCL: core support for more formats, including float16 #20288
JoeHowse wants to merge 6 commits into opencv:master
* Support conversion for CL_HALF_FLOAT (float16)
* Support conversion for additional channel orders: CL_A, CL_INTENSITY, CL_LUMINANCE, CL_RG, CL_RA
* Comment on why conversion is unsupported for CL_RGB
* Predict optimal vector width for float16
* Support float16 in ocl::kernelToStr
* Drop the artificial requirement for OpenCL version >= 1.2 in ocl::Device::halfFPConfig. Even OpenCL 1.0 supports the underlying config property, CL_DEVICE_HALF_FP_CONFIG.
* Update opencl_info.hpp to provide info on OpenCL half-float support, like pre-existing info on double-float support.
* Report preferred half-float vector width when dumping OpenCL info
* Support for float16 in ocl_gemm implementation
* Performance test cases for ocl_gemm with float16 and float64
* Supporting default range [-1.0, 1.0] for float16 in randu
* Accept float16 input in TestBase::warmup
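The opencl_info.hpp change reports half-float support alongside double-float support. As a rough illustration of what such a report can decode, here is a minimal sketch that summarizes a CL_DEVICE_HALF_FP_CONFIG bitfield, assuming (as with double-float support) that a zero value means "no half-float support". describeHalfFPConfig is a hypothetical helper, and the FP_* constants copy the CL_FP_* bit values from the OpenCL headers so the sketch compiles without them.

```cpp
#include <cstdint>
#include <string>

// Copies of the cl_device_fp_config bit values defined by the OpenCL spec
// (CL_FP_DENORM, CL_FP_INF_NAN, ...), so no OpenCL headers are needed.
constexpr std::uint64_t FP_DENORM = 1u << 0;
constexpr std::uint64_t FP_INF_NAN = 1u << 1;
constexpr std::uint64_t FP_ROUND_TO_NEAREST = 1u << 2;
constexpr std::uint64_t FP_ROUND_TO_ZERO = 1u << 3;
constexpr std::uint64_t FP_ROUND_TO_INF = 1u << 4;
constexpr std::uint64_t FP_FMA = 1u << 5;
constexpr std::uint64_t FP_SOFT_FLOAT = 1u << 6;
constexpr std::uint64_t FP_CORRECTLY_ROUNDED_DIVIDE_SQRT = 1u << 7;

// Hypothetical helper: turns a half-float config value into a short,
// human-readable summary. Zero is reported as "No" (assumed: no support).
std::string describeHalfFPConfig(std::uint64_t config) {
    if (config == 0) return "No";
    const struct { std::uint64_t bit; const char* name; } flags[] = {
        {FP_DENORM, "denorm"},
        {FP_INF_NAN, "inf_nan"},
        {FP_ROUND_TO_NEAREST, "round_to_nearest"},
        {FP_ROUND_TO_ZERO, "round_to_zero"},
        {FP_ROUND_TO_INF, "round_to_inf"},
        {FP_FMA, "fma"},
        {FP_SOFT_FLOAT, "soft_float"},
        {FP_CORRECTLY_ROUNDED_DIVIDE_SQRT, "correctly_rounded_divide_sqrt"},
    };
    std::string out = "Yes (";
    bool first = true;
    for (const auto& f : flags) {
        if (config & f.bit) {
            if (!first) out += ", ";  // separate flag names with commas
            out += f.name;
            first = false;
        }
    }
    return out + ")";
}
```

For example, a config with only the INF/NaN and round-to-nearest bits set is summarized as "Yes (inf_nan, round_to_nearest)".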
Support for float16 computations is proving to be more problematic than I originally expected. After trying many hardware/driver combinations, I have concluded that OpenCL drivers rarely implement support for float16 computations, even when the hardware is capable of float16 vectorization. I am leaving this pull request open as a draft for discussion. Meanwhile, I plan to factor out some of the other changes (the ones that do not depend on OpenCL float16 computations) into a separate pull request. See details below.

Tested hardware/driver combinations

To check whether OpenCL float16 computations are supported, I am calling

On Windows, I tried the following combinations and found that none of them support float16 computations in OpenCL:
Likewise, on Mac, I tried the following combinations and found that none of them support float16 computations in OpenCL:
On Linux, results were mixed:
To-do: On Linux, I should try other types of AMD hardware and AMD drivers (AMDGPU, AMD ROCm).

Plans to refactor

The following subset of the changes does not depend on OpenCL float16 computations and could be refactored into a separate pull request:
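For context on the float16 check discussed in this comment: in OpenCL, half is always usable as a storage format (for example via vload_half/vstore_half), but half arithmetic inside kernels requires the device to advertise the cl_khr_fp16 extension. A minimal sketch of testing for that extension, assuming the input is the space-separated extension list a device reports (in OpenCV, ocl::Device::extensions() returns such a string). hasExtension is a hypothetical helper, not necessarily the call used in the tests above.

```cpp
#include <sstream>
#include <string>

// Returns true if `name` appears as a whole token in the space-separated
// extension list `extensions` (e.g. the CL_DEVICE_EXTENSIONS string).
// Whole-token matching avoids false positives from longer names that merely
// start with `name`.
bool hasExtension(const std::string& extensions, const std::string& name) {
    std::istringstream tokens(extensions);
    std::string token;
    while (tokens >> token) {
        if (token == name) return true;
    }
    return false;
}
```

A device whose list contains "cl_khr_fp16" passes the check; note that, per the results above, hardware capable of float16 vectorization may still report a list without it.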
alalek left a comment
Improving only the gemm OpenCL code path, without touching the generic C++ / CPU code, doesn't make sense.
Adding performance tests without accuracy tests doesn't make sense either.
My suggestion is to drop the gemm() changes from this patch as incomplete.
randu(src3, -10.0, 10.0);
OCL_TEST_CYCLE() cv::gemm(src1, src2, 0.6, src3, 1.5, dst, flags);
Add below:
if (CV_MAT_DEPTH(type) == CV_16F)
    SANITY_CHECK_NOTHING();
else
    SANITY_CHECK(dst, 0.01);
Sorry, could you please explain this comment/suggestion further?
This is used to bypass the test failure message caused by missing sanity data in the opencv_extra repository.
It makes sense to bypass the accuracy check for 16F results in performance tests (results may be inaccurate on different platforms / OpenCL devices).
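One way to see why a fixed tolerance is fragile for 16F results: float16 has only 10 explicit fraction bits, so adjacent representable values near a magnitude x are 2^(e - 10) apart, where e is the binary exponent of x. Assuming SANITY_CHECK(dst, 0.01) applies 0.01 as an absolute tolerance, any output of magnitude 64 or more has a spacing of 2^-4 = 0.0625, so even correct rounding alone can miss the tolerance there. halfSpacing below is a hypothetical helper for the normal float16 range only.

```cpp
#include <cassert>
#include <cmath>

// Spacing between adjacent float16 values around |x|, valid for x in the
// normal float16 range [2^-14, 65504]. float16 stores 10 fraction bits, so
// the spacing is 2^(e - 10) with e = floor(log2(|x|)).
double halfSpacing(double x) {
    const double ax = std::fabs(x);
    assert(ax >= std::ldexp(1.0, -14) && ax <= 65504.0);  // normal range only
    return std::ldexp(1.0, std::ilogb(ax) - 10);
}
```

For instance, halfSpacing(100.0) is 0.0625, so rounding to nearest alone can introduce an error of up to 0.03125 at that magnitude, before any accumulation error inside gemm, and the exact result can differ between OpenCL devices.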
- CV_Assert_N( type == matB.type(), (type == CV_32FC1 || type == CV_64FC1 || type == CV_32FC2 || type == CV_64FC2) );
+ CV_Assert_N( type == matB.type(),
+              (type == CV_32FC1 || type == CV_64FC1 || type == CV_16FC1 ||
+               type == CV_32FC2 || type == CV_64FC2 || type == CV_16FC2) );
Does it really work?
- matA of 32F
- matB of 16F
The assertion's first condition, type == matB.type(), ensures that the two matrices have the same type.
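To make the exchange above concrete, here is a standalone sketch of how the proposed two-part condition behaves. MatType and gemmTypesAccepted are hypothetical stand-ins, not OpenCV's CV_* codes or CV_Assert_N; the sketch returns false where the real assertion would throw.

```cpp
#include <initializer_list>

// Hypothetical stand-ins for OpenCV type codes (not the real CV_* values).
enum class MatType { F16C1, F32C1, F64C1, F16C2, F32C2, F64C2, U8C1 };

// Mirrors the structure of the proposed check: both inputs must share one
// type, and that type must be one of the allowed float types.
bool gemmTypesAccepted(MatType a, MatType b) {
    if (a != b) return false;  // mixed types, e.g. 32F with 16F, are rejected
    for (MatType allowed : {MatType::F32C1, MatType::F64C1, MatType::F16C1,
                            MatType::F32C2, MatType::F64C2, MatType::F16C2}) {
        if (a == allowed) return true;
    }
    return false;  // same type, but not in the allowed list
}
```

The mixed case from the question (matA of 32F, matB of 16F) fails the equality test alone, while two 16F matrices pass both conditions.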
I would be happy to provide a refactored patch, excluding the GEMM changes and anything else that is currently incomplete or problematic. @alalek, what are your thoughts on the changes in
The changes besides gemm() (ocl.cpp, ts update) look good to me.
Thank you for the review. I have opened a refactored pull request: #20336.
I am closing this pull request because a refactored version of it (#20336) has been merged.
Changes
This draft pull request attempts to improve the support for float16, as well as some additional channel formats, via the OCL back-end. Specifically, the changes include the following:
* Support conversion for CL_HALF_FLOAT (float16)
* Support conversion for additional channel orders: CL_A, CL_INTENSITY, CL_LUMINANCE, CL_RG, CL_RA
* Comment on why conversion is unsupported for CL_RGB. (See #8108, "CV_8UC3 Mat not convertible to ocl::Image2D", and the latest OpenCL documentation on supported types for CL_RGB.)
* Predict optimal vector width for float16
* Support string conversion for float16 kernels
* Support querying OpenCL's float16 support (via Device::halfFPConfig) for any OpenCL version. Previously, the code enforced OpenCL version >= 1.2, but this limitation was artificial. Even OpenCL 1.0 supports the underlying config property, CL_DEVICE_HALF_FP_CONFIG, as shown in the OpenCL 1.0 documentation.
* Support for float16 in the ocl_gemm implementation
* Performance test cases for ocl_gemm with float16 and float64
* Support a default range of [-1.0, 1.0] for float16 in randu. (randu already supported float16 with a specified range. With this change, randu also supports float16 with a default range, the same way randu already supported a default range for other types.)
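CL_HALF_FLOAT image channels hold IEEE 754 binary16 values, the same 16-bit layout as OpenCV's CV_16F, which is what makes a direct format mapping possible. As an illustration of that layout only, here is a minimal sketch that decodes one binary16 bit pattern into a float; halfToFloat is a hypothetical helper, not OpenCV's own conversion routine.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Decodes an IEEE 754 binary16 bit pattern: 1 sign bit, 5 exponent bits
// (bias 15), 10 fraction bits.
float halfToFloat(std::uint16_t h) {
    const int sign = (h >> 15) & 0x1;
    const int exponent = (h >> 10) & 0x1F;
    const int fraction = h & 0x3FF;
    float magnitude;
    if (exponent == 0) {
        // Zero or subnormal: (fraction / 2^10) * 2^-14 = fraction * 2^-24.
        magnitude = std::ldexp(static_cast<float>(fraction), -24);
    } else if (exponent == 0x1F) {
        // All-ones exponent: infinity if the fraction is zero, NaN otherwise.
        magnitude = (fraction == 0) ? std::numeric_limits<float>::infinity()
                                    : std::numeric_limits<float>::quiet_NaN();
    } else {
        // Normal: (1 + fraction / 2^10) * 2^(exponent - 15),
        // computed as (2^10 + fraction) * 2^(exponent - 25).
        magnitude = std::ldexp(static_cast<float>(0x400 | fraction), exponent - 25);
    }
    return sign ? -magnitude : magnitude;
}
```

For example, 0x3C00 decodes to 1.0f, 0xC000 to -2.0f, and 0x7BFF to 65504.0f, the largest finite float16 value.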