core(OpenCL): optimize convertTo() with CV_16F (convertFp16() replacement) by opencv-alalek · Pull Request #24918 · opencv/opencv

opencv-alalek · 2024-01-24T12:03:12Z

relates #24909
relates #24917
relates #24892

Performance changes:

12700K (1 thread) + Intel iGPU

|Name of Test|noOCL|convertFp16|convertTo BASE|convertTo PATCH|
|---|:-:|:-:|:-:|:-:|
|ConvertFP16FP32MatMat::OCL_Core|3.130|3.152|3.127|3.136|
|ConvertFP16FP32MatUMat::OCL_Core|3.030|3.996|3.007|2.671|
|ConvertFP16FP32UMatMat::OCL_Core|3.010|3.101|3.056|2.854|
|ConvertFP16FP32UMatUMat::OCL_Core|3.016|3.298|2.072|2.061|
|ConvertFP32FP16MatMat::OCL_Core|2.697|2.652|2.723|2.721|
|ConvertFP32FP16MatUMat::OCL_Core|2.752|4.268|2.662|2.947|
|ConvertFP32FP16UMatMat::OCL_Core|2.706|2.601|2.603|2.528|
|ConvertFP32FP16UMatUMat::OCL_Core|2.704|3.215|1.999|1.988|

The patched version is not slower than convertFp16() or the convertTo() baseline (except MatUMat 32->16, where the baseline uses CPU code plus a dst buffer map).
Gaps remain compared with noOpenCL (CPU-only) mode due to T-API implementation issues (unnecessary synchronization).
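For reference, CV_16F holds IEEE 754 binary16 (half-precision) values, so the ConvertFP16FP32* and ConvertFP32FP16* tests time the per-element conversions sketched below. This is a minimal, portable sketch of those semantics only, assuming round-to-nearest-even for FP32->FP16, overflow to Inf, and NaN mapped to a quiet NaN. `halfToFloat()` and `floatToHalf()` are hypothetical helper names, not OpenCV API, and this is not how OpenCV's OpenCL kernels or CPU code are implemented.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical helper: binary16 -> binary32. Exact, since every half value fits in a float.
float halfToFloat(uint16_t h) {
    uint32_t sign = static_cast<uint32_t>(h & 0x8000u) << 16;
    uint32_t exp = (h >> 10) & 0x1Fu;
    uint32_t mant = h & 0x3FFu;
    uint32_t bits;
    if (exp == 0x1Fu) {                        // Inf or NaN: keep payload bits
        bits = sign | 0x7F800000u | (mant << 13);
    } else if (exp != 0u) {                    // normal: rebias exponent 15 -> 127
        bits = sign | ((exp + 112u) << 23) | (mant << 13);
    } else if (mant == 0u) {                   // signed zero
        bits = sign;
    } else {                                   // subnormal: value = mant * 2^-24
        float f = std::ldexp(static_cast<float>(mant), -24);
        return (h & 0x8000u) ? -f : f;
    }
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// Hypothetical helper: binary32 -> binary16, round-to-nearest-even, overflow to +/-Inf.
uint16_t floatToHalf(float f) {
    uint32_t x;
    std::memcpy(&x, &f, sizeof x);
    uint16_t sign = static_cast<uint16_t>((x >> 16) & 0x8000u);
    uint32_t absx = x & 0x7FFFFFFFu;
    if (absx >= 0x7F800000u)                   // Inf, or NaN -> quiet NaN (payload dropped)
        return static_cast<uint16_t>(sign | (absx > 0x7F800000u ? 0x7E00u : 0x7C00u));
    if (absx >= 0x477FF000u)                   // >= 65520 rounds past 65504 -> Inf
        return static_cast<uint16_t>(sign | 0x7C00u);
    if (absx >= 0x38800000u) {                 // normal half range (>= 2^-14)
        uint32_t e = absx - (112u << 23);      // rebias exponent 127 -> 15
        uint32_t h = e >> 13;                  // keep the top 10 mantissa bits
        uint32_t rem = e & 0x1FFFu;            // dropped bits decide the rounding
        if (rem > 0x1000u || (rem == 0x1000u && (h & 1u))) ++h;
        return static_cast<uint16_t>(sign | h);
    }
    uint32_t E = absx >> 23;                   // biased float exponent
    if (E < 102u) return sign;                 // below 2^-25: rounds to (signed) zero
    uint32_t m = (absx & 0x7FFFFFu) | 0x800000u;  // restore the implicit leading 1
    uint32_t shift = 126u - E;                 // in [14, 24]
    uint32_t h = m >> shift;                   // subnormal half mantissa
    uint32_t rem = m & ((1u << shift) - 1u);
    uint32_t halfway = 1u << (shift - 1u);
    if (rem > halfway || (rem == halfway && (h & 1u))) ++h;  // may carry into 0x0400
    return static_cast<uint16_t>(sign | h);
}
```

Because FP16 -> FP32 is exact, `floatToHalf(halfToFloat(h)) == h` holds for every non-NaN half value `h`, which makes a cheap exhaustive self-check for a sketch like this.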

12700K + AMD dGPU

|Name of Test|noOCL|convertFp16 dGPU|convertTo BASE dGPU|convertTo PATCH dGPU|
|---|:-:|:-:|:-:|:-:|
|ConvertFP16FP32MatMat::OCL_Core|3.130|3.133|3.172|3.087|
|ConvertFP16FP32MatUMat::OCL_Core|3.030|1.713|9.559|1.729|
|ConvertFP16FP32UMatMat::OCL_Core|3.010|6.515|6.309|4.452|
|ConvertFP16FP32UMatUMat::OCL_Core|3.016|0.242|23.597|0.170|
|ConvertFP32FP16MatMat::OCL_Core|2.697|2.641|2.713|2.689|
|ConvertFP32FP16MatUMat::OCL_Core|2.752|4.076|6.483|4.191|
|ConvertFP32FP16UMatMat::OCL_Core|2.706|9.042|16.481|1.834|
|ConvertFP32FP16UMatUMat::OCL_Core|2.704|0.229|15.730|0.176|

The convertTo() baseline can't compile the OpenCL kernel for FP16 properly; this is FIXED in the patch.
The dGPU has much more compute power, so results are x16-17 better than a single CPU core.
The patched version is not slower than convertFp16() or the convertTo() baseline.
Gaps remain compared with noOpenCL (CPU-only) mode due to T-API implementation issues (unnecessary synchronization) and required memory transfers.
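The x16-17 figure is a plain time ratio: CPU-only (noOCL) time divided by the dGPU time. A minimal sketch of that arithmetic on the UMatUMat rows of the dGPU table, with `speedup()` and the constant names being hypothetical:

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: speed-up of a run relative to a baseline (baseline time / run time).
// A value above 1 means the run is faster than the baseline.
double speedup(double baselineTime, double runTime) {
    assert(baselineTime > 0.0 && runTime > 0.0);
    return baselineTime / runTime;
}

// UMatUMat rows of the dGPU table: noOCL (CPU-only) vs convertTo PATCH dGPU.
const double kFp16ToFp32Cpu = 3.016, kFp16ToFp32Gpu = 0.170;
const double kFp32ToFp16Cpu = 2.704, kFp32ToFp16Gpu = 0.176;
```

With these inputs, ConvertFP16FP32UMatUMat gives 3.016 / 0.170 ≈ 17.7 and ConvertFP32FP16UMatUMat gives 2.704 / 0.176 ≈ 15.4.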

force_builders=Linux OpenCL,Linux AVX2,Win64 OpenCL

asmorkalov

👍

fengyuentau · 2024-01-25T06:42:23Z

modules/core/include/opencv2/core/opencl/opencl_info.hpp

        DUMP_CONFIG_PROPERTY("cv_ocl_current_maxMemAllocSize", device.maxMemAllocSize());

-        const char* doubleSupportStr = device.doubleFPConfig() > 0 ? "Yes" : "No";
+        const char* doubleSupportStr = device.hasFP64() ? "Yes" : "No";


So what is going to happen to doubleFPConfig() and halfFPConfig()? Are they being deprecated as well?

No, they are still needed if we want to compute with proper Inf/NaN support.

I checked doubleFPConfig() usage across the whole OpenCV project, and it is basically used like this:

bool doubleSupport = ocl::Device::getDefault().doubleFPConfig() > 0;

> they are still needed if we want to compute with proper inf/nans support

Did I miss anything here? Or is it in user code instead?

Usage is not correct.

What is the correct way? Is all of this code wrong?

All of them are subject to revision.
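The thread does not spell out the correct usage. One point that can be illustrated, assuming `doubleFPConfig()` returns the OpenCL `CL_DEVICE_DOUBLE_FP_CONFIG` bitfield (`cl_device_fp_config`): `> 0` only says that some capability bit is set, while Inf/NaN handling is one specific bit (`CL_FP_INF_NAN`). A minimal sketch, with the bit values mirrored locally from the OpenCL headers and hypothetical helper names:

```cpp
#include <cassert>

// cl_device_fp_config bit values, mirrored from the OpenCL headers (CL/cl.h)
// so this sketch compiles without them.
constexpr unsigned kFpDenorm         = 1u << 0;  // CL_FP_DENORM
constexpr unsigned kFpInfNan         = 1u << 1;  // CL_FP_INF_NAN
constexpr unsigned kFpRoundToNearest = 1u << 2;  // CL_FP_ROUND_TO_NEAREST
constexpr unsigned kFpFma            = 1u << 5;  // CL_FP_FMA

// What the `doubleFPConfig() > 0` pattern tests: is *any* capability bit set?
bool anyFpCapability(int fpConfig) {
    return fpConfig > 0;
}

// Hypothetical stricter check: does the device report IEEE Inf/NaN handling?
bool hasInfNanSupport(int fpConfig) {
    return (static_cast<unsigned>(fpConfig) & kFpInfNan) != 0u;
}

// Hypothetical check that every bit in `required` is reported.
bool hasAllBits(int fpConfig, unsigned required) {
    return (static_cast<unsigned>(fpConfig) & required) == required;
}
```

In real code the value would come from the device, e.g. `ocl::Device::getDefault().doubleFPConfig()` as in the snippet quoted above; which bits a given code path should require is left open by the thread.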

…16s_usage DNN: avoid CV_16S usage for FP16 #24892

**Merge after**: #24918

TODO:
- [x] measure performance changes
- [x] optimize convertTo for OpenCL: #24918

12700K iGPU:

|Name of Test|0|1|1 vs 0 (x-factor)|
|---|:-:|:-:|:-:|
|AlexNet::DNNTestNetwork::OCV/OCL_FP16|7.441|7.480|0.99|
|CRNN::DNNTestNetwork::OCV/OCL_FP16|10.776|10.736|1.00|
|DenseNet_121::DNNTestNetwork::OCV/OCL_FP16|52.762|52.833|1.00|
|EAST_text_detection::DNNTestNetwork::OCV/OCL_FP16|60.694|60.721|1.00|
|EfficientNet::DNNTestNetwork::OCV/OCL_FP16|33.373|33.173|1.01|
|FastNeuralStyle_eccv16::DNNTestNetwork::OCV/OCL_FP16|81.840|81.724|1.00|
|GoogLeNet::DNNTestNetwork::OCV/OCL_FP16|20.965|20.927|1.00|
|Inception_5h::DNNTestNetwork::OCV/OCL_FP16|22.204|22.173|1.00|
|Inception_v2_SSD_TensorFlow::DNNTestNetwork::OCV/OCL_FP16|47.115|47.460|0.99|
|MPHand::DNNTestNetwork::OCV/OCL_FP16|6.760|6.670|1.01|
|MPPalm::DNNTestNetwork::OCV/OCL_FP16|10.188|10.171|1.00|
|MPPose::DNNTestNetwork::OCV/OCL_FP16|12.510|12.561|1.00|
|MobileNet_SSD_Caffe::DNNTestNetwork::OCV/OCL_FP16|17.290|17.072|1.01|
|MobileNet_SSD_v1_TensorFlow::DNNTestNetwork::OCV/OCL_FP16|19.473|19.306|1.01|
|MobileNet_SSD_v2_TensorFlow::DNNTestNetwork::OCV/OCL_FP16|22.874|23.404|0.98|
|OpenFace::DNNTestNetwork::OCV/OCL_FP16|9.568|9.517|1.01|
|OpenPose_pose_mpi_faster_4_stages::DNNTestNetwork::OCV/OCL_FP16|539.899|539.845|1.00|
|PPHumanSeg::DNNTestNetwork::OCV/OCL_FP16|18.015|18.769|0.96|
|PPOCRv3::DNNTestNetwork::OCV/OCL_FP16|63.122|63.540|0.99|
|ResNet_50::DNNTestNetwork::OCV/OCL_FP16|34.947|34.925|1.00|
|SFace::DNNTestNetwork::OCV/OCL_FP16|10.249|10.206|1.00|
|SSD::DNNTestNetwork::OCV/OCL_FP16|213.068|213.108|1.00|
|SqueezeNet_v1_1::DNNTestNetwork::OCV/OCL_FP16|4.867|4.878|1.00|
|VIT_B_32::DNNTestNetwork::OCV/OCL_FP16|200.563|190.788|1.05|
|VitTrack::DNNTestNetwork::OCV/OCL_FP16|7.528|7.173|1.05|
|YOLOX::DNNTestNetwork::OCV/OCL_FP16|132.858|132.701|1.00|
|YOLOv3::DNNTestNetwork::OCV/OCL_FP16|209.559|208.809|1.00|
|YOLOv4::DNNTestNetwork::OCV/OCL_FP16|221.357|220.924|1.00|
|YOLOv4_tiny::DNNTestNetwork::OCV/OCL_FP16|24.446|24.382|1.00|
|YOLOv5::DNNTestNetwork::OCV/OCL_FP16|43.922|44.080|1.00|
|YOLOv8::DNNTestNetwork::OCV/OCL_FP16|64.159|63.842|1.00|
|YuNet::DNNTestNetwork::OCV/OCL_FP16|10.177|10.231|0.99|
|opencv_face_detector::DNNTestNetwork::OCV/OCL_FP16|15.121|15.445|0.98|

Co-authored-by: Alexander Alekhin <alexander.a.alekhin@gmail.com>

opencv-alalek added the optimization, category: core, and category: ocl labels Jan 24, 2024

opencv-alalek added this to the 4.10.0 milestone Jan 24, 2024

alalek added 5 commits January 24, 2024 13:56

core: convert FP16 perf tests

37743ef

core(ocl): fix convertTo() perfomance

3346d10

core(ocl): add Device::hasFP64() / hasFP16()

05b5684

core(ocl): fix FP16/FP64 checks in convertTo()

fc552ea

core: deprecate convertFp16()

b9b3860

opencv-pushbot force-pushed the gitee/alalek/core_convertfp16_replacement branch from 99ba03c to b9b3860 January 24, 2024 13:56

opencv-alalek marked this pull request as ready for review January 24, 2024 21:57

opencv-alalek requested review from asmorkalov, fengyuentau and vpisarev January 24, 2024 21:59

opencv-alalek mentioned this pull request Jan 24, 2024

DNN: avoid CV_16S usage for FP16 #24892

Merged

2 tasks

asmorkalov approved these changes Jan 25, 2024

fengyuentau reviewed Jan 25, 2024

fengyuentau approved these changes Jan 26, 2024

vpisarev approved these changes Jan 26, 2024

asmorkalov self-assigned this Jan 26, 2024

asmorkalov merged commit 40533db into opencv:4.x Jan 26, 2024

This was referenced Feb 3, 2024

5.x merge 4.x #24958

Closed

5.x merge 4.x #24981

Merged

asmorkalov merged 5 commits into opencv:4.x from opencv-pushbot:gitee/alalek/core_convertfp16_replacement

5 participants
