
OpenCL: core support for more formats, including float16#20288

Closed
JoeHowse wants to merge 6 commits into opencv:master from JoeHowse:cl_image-float16-conversions

Conversation

@JoeHowse
Contributor

@JoeHowse JoeHowse commented Jun 21, 2021

Changes

This draft pull request attempts to improve support for float16, as well as for some additional channel formats, in the OCL back-end. Specifically, the changes include the following:

  • Support conversion for CL_HALF_FLOAT (float16)

  • Support conversion for additional channel orders: CL_A, CL_INTENSITY, CL_LUMINANCE, CL_RG, CL_RA

  • Comment on why conversion is unsupported for CL_RGB. (See CV_8UC3 Mat not convertible to ocl::Image2D #8108 and the latest OpenCL documentation on supported types for CL_RGB.)

  • Predict optimal vector width for float16

  • Support string conversion for float16 kernels

  • Support querying OpenCL's float16 support (via Device::halfFPConfig) for any OpenCL version. Previously, the code enforced OpenCL version >= 1.2, but this limitation was artificial. Even OpenCL 1.0 supports the underlying config property, CL_DEVICE_HALF_FP_CONFIG, as shown in the OpenCL 1.0 documentation.

  • Support for float16 in ocl_gemm implementation

  • Performance test cases for ocl_gemm with float16 and float64

  • Support default range [-1.0, 1.0] for float16 in randu. (randu already supported float16 when a range was explicitly specified. With this change, randu also supports float16 with a default range, just as it already does for other types.)
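For background on what CL_HALF_FLOAT conversion involves, the sketch below encodes and decodes IEEE 754 binary16 (float16) values in plain C++. It is an illustrative toy, not code from this patch: rounding is by truncation, denormal results are flushed to zero, and the function names are made up for this example.

```cpp
#include <cstdint>
#include <cstring>

// Encode a 32-bit float as IEEE 754 binary16. Illustrative only:
// truncating rounding, denormal results flushed to zero.
static uint16_t float_to_half(float f) {
    uint32_t x;
    std::memcpy(&x, &f, sizeof(x));
    uint16_t sign = (uint16_t)((x >> 16) & 0x8000u);
    uint32_t exp32 = (x >> 23) & 0xFFu;
    uint32_t mant = x & 0x7FFFFFu;
    if (exp32 == 0xFFu)                        // inf or NaN
        return sign | 0x7C00u | (mant ? 0x200u : 0u);
    int32_t exp16 = (int32_t)exp32 - 127 + 15; // rebias the exponent
    if (exp16 >= 31) return sign | 0x7C00u;    // overflow -> +/-inf
    if (exp16 <= 0)  return sign;              // underflow -> +/-0
    return sign | (uint16_t)(exp16 << 10) | (uint16_t)(mant >> 13);
}

// Decode binary16 back to a 32-bit float (denormals decoded as zero).
static float half_to_float(uint16_t h) {
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp16 = (h >> 10) & 0x1Fu;
    uint32_t mant = (uint32_t)(h & 0x3FFu);
    uint32_t x;
    if (exp16 == 0)       x = sign;                              // +/-0
    else if (exp16 == 31) x = sign | 0x7F800000u | (mant << 13); // inf/NaN
    else                  x = sign | ((exp16 - 15 + 127) << 23) | (mant << 13);
    float f;
    std::memcpy(&f, &x, sizeof(f));
    return f;
}
```

Note that values such as -1.0, 0.5, and 1.0 are exactly representable in binary16, which is one reason a default range of [-1.0, 1.0] is a safe choice for randu.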


force_builders=Custom Mac
test_opencl:Custom Mac=ON

JoeHowse added 3 commits June 21, 2021 00:46
* Support conversion for CL_HALF_FLOAT (float16)

* Support conversion for additional channel orders: CL_A, CL_INTENSITY,
  CL_LUMINANCE, CL_RG, CL_RA

* Comment on why conversion is unsupported for CL_RGB

* Predict optimal vector width for float16
* Support float16 in ocl::kernelToStr
* Drop the artificial requirement for OpenCL version >= 1.2 in
  ocl::Device::halfFPConfig. Even OpenCL 1.0 supports the underlying
  config property, CL_DEVICE_HALF_FP_CONFIG.

* Update opencl_info.hpp to provide info on OpenCL half-float support,
  like pre-existing info on double-float support.
@JoeHowse JoeHowse changed the title from "cl_image: support more formats, including float16" to "OpenCL: core support for more formats, including float16" Jun 22, 2021
* Report preferred half-float vector width when dumping OpenCL info
@JoeHowse JoeHowse marked this pull request as draft June 27, 2021 00:11
JoeHowse added 2 commits June 26, 2021 21:20
* Support for float16 in ocl_gemm implementation

* Performance test cases for ocl_gemm with float16 and float64

* Supporting default range [-1.0, 1.0] for float16 in randu
* Accept float16 input in TestBase::warmup
@JoeHowse
Contributor Author

Support for float16 computations is proving to be more problematic than I originally expected. After trying many hardware/driver combinations, I have concluded that OpenCL drivers rarely implement support for float16 computations, even if the hardware is capable of float16 vectorization.

I am leaving this pull request open as a draft for discussion. Meanwhile, I plan to factor out some of the other changes (the ones that do not depend on OpenCL float16 computations) as a separate pull request.

See details below.

Tested hardware/driver combinations

To check whether OpenCL float16 computations are supported, I am calling cv::ocl::Device::halfFPConfig, which simply queries OpenCL's CL_DEVICE_HALF_FP_CONFIG property.

On Windows, I tried the following combinations and found that none of them support float16 computations in OpenCL:

  • NVIDIA GeForce RTX 3080 with the latest Game Ready Driver
  • AMD Radeon Pro WX 9100 with the latest Enterprise Driver
  • AMD Radeon RX 580 with the latest Adrenalin Driver
  • AMD Radeon RX 580 with the latest Enterprise Driver
  • Udoo Bolt V8 board with Vega 8 integrated GPU and the Embedded V-Series APU Catalyst Driver

Likewise, on Mac, I tried the following combinations and found that none of them support float16 computations in OpenCL:

  • Intel Iris Pro integrated GPU in a MacBook Pro mid-2015
  • AMD Radeon Pro WX 9100 as an external GPU
  • AMD Radeon RX 560 as an external GPU

On Linux, results were mixed:

  • NVIDIA GeForce RTX 3080 with the closed-source NVIDIA drivers: Reports that it does not support float16 computations.
  • Udoo Bolt V8 board with Vega 8 integrated GPU and the open-source Mesa (OpenCL 1.1) drivers: Reports that it does support float16 computations. However, in general, when this system runs OpenCV's OpenCL tests, it freezes with a black screen.
  • ODROID-XU4 board with ARM Mali-T628MP6 integrated GPU and the open-source Panfrost drivers: Reports that it does support float16 computations. However, in general, when this system runs OpenCV's OpenCL tests, it reports memory allocation errors. With the changes in this draft pull request, it can get as far as trying to compile a float16 kernel, but then the compiler reports that the half type is unsupported.

To-do: On Linux, I should try other types of AMD hardware and AMD drivers (AMDGPU, AMD ROCm).

Plans to refactor

The following subset of the changes does not depend on OpenCL float16 computations and could be refactored into a separate pull request:

  • Support clImage conversion for additional channel orders: CL_A, CL_INTENSITY, CL_LUMINANCE, CL_RG, CL_RA

  • Comment on why clImage conversion is unsupported for CL_RGB.

  • Support querying OpenCL's float16 support (via Device::halfFPConfig) for any OpenCL version. Previously, the code enforced OpenCL version >= 1.2, but this limitation was artificial.

  • Add performance test cases for ocl_gemm with float64. These cases are skipped if the system does not support OpenCL float64 computations.

  • Support default range [-1.0, 1.0] for float16 in randu. (randu already supported float16 when a range was specified in arguments. With this change, randu also supports float16 with a default range, just as it already does for other types.)

Member

@alalek alalek left a comment


Improving the gemm OpenCL code path alone, without touching the generic C++/CPU code, doesn't make sense.

Adding performance tests without accuracy tests doesn't make sense either.

My suggestion is to drop the gemm() changes from this patch as incomplete.

randu(src3, -10.0, 10.0);

OCL_TEST_CYCLE() cv::gemm(src1, src2, 0.6, src3, 1.5, dst, flags);

Member


Below:

if (CV_MAT_DEPTH(type) == CV_16F)
    SANITY_CHECK_NOTHING();
else
    SANITY_CHECK(dst, 0.01);

Contributor Author


Sorry, could you please explain this comment/suggestion further?

Member


This is used to bypass a test failure message caused by missing sanity data in the opencv_extra repository.
It makes sense to bypass the accuracy check for 16F results in performance tests (the result may be inaccurate on different platforms / OpenCL devices).

CV_Assert_N( type == matB.type(), (type == CV_32FC1 || type == CV_64FC1 || type == CV_32FC2 || type == CV_64FC2) );
CV_Assert_N( type == matB.type(),
(type == CV_32FC1 || type == CV_64FC1 || type == CV_16FC1 ||
type == CV_32FC2 || type == CV_64FC2 || type == CV_16FC2) );
Member


Does it really work?

  • matA of 32F
  • matB of 16F

Contributor Author


The assertion's first condition, type == matB.type(), ensures that the two matrices have the same type.
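As a toy illustration (this is not OpenCV code; the type tags and function name are made up for this example), the combined check can be reduced to the following: because the first clause pins matB's type to matA's, the whitelist only needs to inspect one of them.

```cpp
// Hypothetical stand-ins for OpenCV type codes such as CV_32FC1.
enum MatType { T16F, T32F, T64F, T8U };

// Mirrors the shape of the CV_Assert_N condition above: the first clause
// requires matA and matB to share a type; the second whitelists that type.
static bool gemm_types_ok(MatType matA, MatType matB) {
    return matA == matB && (matA == T32F || matA == T64F || matA == T16F);
}
```

With matA of 32F and matB of 16F, the first clause already fails, so the mixed pair is rejected before the whitelist is even consulted.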

@JoeHowse
Contributor Author

Improving the gemm OpenCL code path alone, without touching the generic C++/CPU code, doesn't make sense.

Adding performance tests without accuracy tests doesn't make sense either.

My suggestion is to drop the gemm() changes from this patch as incomplete.

I would be happy to provide a refactored patch, excluding the GEMM changes and anything else that is currently incomplete or problematic.

@alalek What are your thoughts on the changes in ocl.cpp?

@alalek
Member

alalek commented Jun 30, 2021

The changes besides gemm() (ocl.cpp, ts update) look good to me.

@JoeHowse
Contributor Author

Thank you for the review. I have opened a refactored pull request, #20336.

@JoeHowse
Contributor Author

JoeHowse commented Aug 3, 2021

I am closing this pull request because a refactored version of it (#20336) has been merged.

@JoeHowse JoeHowse closed this Aug 3, 2021