implementation for dnn fp16 ocl support #11397

Merged
opencv-pushbot merged 4 commits into opencv:3.4 from pengli:dnn_half
May 16, 2018

Conversation

@pengli

@pengli pengli commented Apr 26, 2018

This PR is for the feature "Adding FP16 path in DNN" at #11009

buildworker:Win64 OpenCL=windows-2
allow_multiple_commits=1

@pengli pengli force-pushed the dnn_half branch 3 times, most recently from de0b4a6 to d74705a Compare April 27, 2018 05:44
-Mat blob_ = blob.getMat();
+Mat blob_;
+if (impl->preferableTarget == DNN_TARGET_OPENCL &&
+    impl->preferablePrecision == DNN_PRECISION_FP16)
Member

I think we can use just a DNN_TARGET_OPENCL_FP16 target instead of a separate precisions enum.

Author

Yes, this flag should be good. The patch is updated.

heights.copyTo(umat_heights);
if (use_half)
{
convertFp16(offsetsX, umat_offsetsX);
Member

All non-weights hyper-parameters should be in the original precision.

Author

Switched back to the original precision for non-weights parameters.

@pengli pengli force-pushed the dnn_half branch 16 times, most recently from ac38568 to 3fb53e5 Compare May 3, 2018 08:04
std::vector<UMat> inputs;
std::vector<UMat> outputs;

bool use_half = (inps.depth() == CV_16S);
Member

The PriorBox layer does not use input data; it generates a fixed set of bounding boxes. So I think we need to keep its output in single-precision floats, because otherwise we can face significant accuracy loss.

Author

Thanks for the review; I kept float precision for the prior_box layer.

@@ -302,17 +302,18 @@ TEST(Test_TensorFlow, defun)

TEST(Test_TensorFlow, fp16)
Member

Can we make this test parametric and add it to the Test_TensorFlow_layers group?

Author

I changed the fp16 test to take DNN_TARGET as a parameter.

@dkurt
Member

dkurt commented May 3, 2018

Here are some performance measurements for the DNN_TARGET_OPENCL and DNN_TARGET_OPENCL_FP16 targets:

Name of Test                                                               fp32     fp16   fp16 vs fp32 (x-factor)
AlexNet::DNNTestNetwork::(DNN_BACKEND_DEFAULT)                            15.358   13.377     1.15   
DenseNet_121::DNNTestNetwork::(DNN_BACKEND_DEFAULT)                       67.542  105.010     0.64   
ENet::DNNTestNetwork::(DNN_BACKEND_DEFAULT)                               27.672   failed      -     
GoogLeNet::DNNTestNetwork::(DNN_BACKEND_DEFAULT)                          19.509   27.480     0.71   
Inception_5h::DNNTestNetwork::(DNN_BACKEND_DEFAULT)                       21.778   29.363     0.74   
Inception_v2_SSD_TensorFlow::DNNTestNetwork::(DNN_BACKEND_DEFAULT)        66.978   84.560     0.79   
MobileNet_SSD_Caffe::DNNTestNetwork::(DNN_BACKEND_DEFAULT)                18.848   25.258     0.75   
MobileNet_SSD_TensorFlow::DNNTestNetwork::(DNN_BACKEND_DEFAULT)          skipped   34.681      -     
OpenFace::DNNTestNetwork::(DNN_BACKEND_DEFAULT)                           9.639    13.296     0.72   
OpenPose_pose_coco::DNNTestNetwork::(DNN_BACKEND_DEFAULT)                1055.318 1170.794    0.90   
OpenPose_pose_mpi::DNNTestNetwork::(DNN_BACKEND_DEFAULT)                 1046.968 1156.758    0.91   
OpenPose_pose_mpi_faster_4_stages::DNNTestNetwork::(DNN_BACKEND_DEFAULT) 752.793  811.864     0.93   
ResNet_50::DNNTestNetwork::(DNN_BACKEND_DEFAULT)                          35.452  101.747     0.35   
SSD::DNNTestNetwork::(DNN_BACKEND_DEFAULT)                               393.071  347.915     1.13   
SqueezeNet_v1_1::DNNTestNetwork::(DNN_BACKEND_DEFAULT)                    5.956    8.726      0.68   
YOLOv3::DNNTestNetwork::(DNN_BACKEND_DEFAULT)                            346.168  440.524     0.79   
opencv_face_detector::DNNTestNetwork::(DNN_BACKEND_DEFAULT)               28.412   33.469     0.85   

CPU: Intel® Core™ i7-6700K CPU @ 4.00GHz x 8
GPU: Intel® HD Graphics 530 (Skylake GT2)


if (preferableTarget == DNN_TARGET_OPENCL_FP16)
{
convertFp16(ld.outputBlobs[pin.oid], output_blob);
Member

I think it's better to check whether ld.outputBlobs[pin.oid] contains fp16 values rather than preferableTarget == DNN_TARGET_OPENCL_FP16, because if ld.outputBlobs[pin.oid] has fp32 type, output_blob will have an fp16 one.

 blobManager.allocateBlobsForLayer(ld, layerShapesIt->second, pinsForInternalBlobs,
-                                  preferableBackend == DNN_BACKEND_INFERENCE_ENGINE);
+                                  preferableBackend == DNN_BACKEND_INFERENCE_ENGINE,
+                                  preferableTarget == DNN_TARGET_OPENCL_FP16);
Member

We need to allocate halves only if preferableTarget == DNN_TARGET_OPENCL_FP16 and preferableBackend == DNN_BACKEND_DEFAULT, because there is one more backend (Intel's Inference Engine) that supports FP16 computation but accepts inputs and outputs in FP32.

net.setHalideScheduler(halideScheduler);
}

net.setInput(inp);
Member

Can we convert the input blob to FP16 at the network's initialization stage (i.e. setUpNet)? Before the first forward call, we could call any net.set* methods in any order.

Author

Code is updated; net.set* calls can come in any order before the forward call.

@pengli pengli force-pushed the dnn_half branch 4 times, most recently from 497120d to 2b5e0ee Compare May 4, 2018 03:19
@dkurt
Member

dkurt commented May 10, 2018

@pengli, looks like I measured YOLOv3's performance incorrectly in the previous posts. See the actual numbers below.

Name of Test fp32 fp16 x-factor
AlexNet::DNNTestNetwork::(DNN_BACKEND_DEFAULT) 15.610 9.952 1.57
DenseNet_121::DNNTestNetwork::(DNN_BACKEND_DEFAULT) 66.527 86.417 0.77
ENet::DNNTestNetwork::(DNN_BACKEND_DEFAULT) 27.526 skipped -
GoogLeNet::DNNTestNetwork::(DNN_BACKEND_DEFAULT) 19.631 19.336 1.02
Inception_5h::DNNTestNetwork::(DNN_BACKEND_DEFAULT) 21.723 21.434 1.01
Inception_v2_SSD_TensorFlow::DNNTestNetwork::(DNN_BACKEND_DEFAULT) 66.995 76.540 0.88
MobileNet_SSD_Caffe::DNNTestNetwork::(DNN_BACKEND_DEFAULT) 18.837 23.211 0.81
MobileNet_SSD_TensorFlow::DNNTestNetwork::(DNN_BACKEND_DEFAULT) skipped 26.919 -
OpenFace::DNNTestNetwork::(DNN_BACKEND_DEFAULT) 9.452 11.308 0.84
OpenPose_pose_coco::DNNTestNetwork::(DNN_BACKEND_DEFAULT) 1051.954 1160.058 0.91
OpenPose_pose_mpi::DNNTestNetwork::(DNN_BACKEND_DEFAULT) 1034.653 1145.306 0.90
OpenPose_pose_mpi_faster_4_stages::DNNTestNetwork::(DNN_BACKEND_DEFAULT) 744.890 805.034 0.93
ResNet_50::DNNTestNetwork::(DNN_BACKEND_DEFAULT) 35.714 38.967 0.92
SSD::DNNTestNetwork::(DNN_BACKEND_DEFAULT) 387.188 344.652 1.12
SqueezeNet_v1_1::DNNTestNetwork::(DNN_BACKEND_DEFAULT) 5.975 7.126 0.84
YOLOv3::DNNTestNetwork::(DNN_BACKEND_DEFAULT) 286.128 285.209 1.00
opencv_face_detector::DNNTestNetwork::(DNN_BACKEND_DEFAULT) 26.438 31.126 0.85

CPU: Intel® Core™ i7-6700K CPU @ 4.00GHz x 8
GPU: Intel® HD Graphics 530 (Skylake GT2)

throw SkipTestException("");
Mat sample = imread(findDataFile("dnn/street.png", false));
Mat inp = blobFromImage(sample, 1.0f / 127.5, Size(300, 300), Scalar(127.5, 127.5, 127.5), false);
float l1 = (target == DNN_TARGET_OPENCL_FP16) ? 0.0007 : 0.0;
Member

Please replace this with backend == DNN_BACKEND_DEFAULT && target == DNN_TARGET_OPENCL_FP16.

Author

fixed, thank you

throw SkipTestException("");
Mat sample = imread(findDataFile("dnn/street.png", false));
Mat inp = blobFromImage(sample, 1.0f / 127.5, Size(300, 300), Scalar(127.5, 127.5, 127.5), false);
float l1 = (target == DNN_TARGET_OPENCL_FP16) ? 0.008 : 0.0;
Member

The same here: backend == DNN_BACKEND_DEFAULT && target == DNN_TARGET_OPENCL_FP16.

Author

fixed, thank you


 Mat ref = blobFromNPY(_tf("mobilenet_ssd_caffe_out.npy"));
-normAssertDetections(ref, out);
+normAssertDetections(ref, out, "", 0.0, 4e-4, 5e-3);
Member

Please keep the default values 1e-5 and 1e-4 for non-DNN_TARGET_OPENCL_FP16 targets.

Author

fixed, thank you

@pengli pengli force-pushed the dnn_half branch 3 times, most recently from cf0954b to 3248a0b Compare May 11, 2018 02:46
@dkurt
Member

dkurt commented May 11, 2018

👍 Looks good to me.
@alalek, Should we choose base branch 3.4 or master according to evolution proposal #11009?

@pengli
Author

pengli commented May 14, 2018

@alalek, hi, any feedback?

@alalek
Member

alalek commented May 14, 2018

Please rebase this patch on the 3.4 branch (almost all DNN patches come into 3.4, so we want to minimize future merge conflicts).

So, please:

  • change the "base" branch of this PR: master => 3.4 (use the "Edit" button near the PR title)
  • rebase your commits from master onto the 3.4 branch. For example:
    git rebase -i --onto upstream/3.4 upstream/master
    (check the list of your commits, then save and quit: Esc + ":wq" + Enter),
    where upstream is configured by following this GitHub guide and fetched (git fetch upstream).
  • push the rebased commits into the source branch of your fork (with the --force option)

@pengli pengli changed the base branch from master to 3.4 May 14, 2018 13:41
@pengli
Author

pengli commented May 14, 2018

@alalek, done, the code is rebased onto the 3.4 branch.

btw, will you also merge this patchset into master?

@alalek
Member

alalek commented May 14, 2018

Yes, via regular 3.4 => master merges (weekly/bi-weekly).

DNN_TARGET_OPENCL_FP16
};

#define IS_DNN_OPENCL_TARGET(id) (id == DNN_TARGET_OPENCL || id == DNN_TARGET_OPENCL_FP16)
Member

We should not pollute the global macro namespace. So please either:

  • add a CV_ prefix
  • or move this into the src/precomp.hpp file (preferable)

Author

Fixed, moved to src/precomp.hpp.

 {
-    if (backend == DNN_BACKEND_INFERENCE_ENGINE) throw SkipTestException("");
+    if (backend == DNN_BACKEND_INFERENCE_ENGINE ||
+        backend == DNN_BACKEND_DEFAULT && target == DNN_TARGET_OPENCL_FP16)
Member

Please use more brackets to make static code analyzers happy.

Author

fixed

#define Dtype float
#define Dtype4 float4
#define Dtype8 float8
#pragma OPENCL EXTENSION cl_khr_fp16 : enable
Member

Does this break kernel compilation if fp16 is not supported and we request "float" only?

Author
@pengli pengli May 15, 2018

Added #if defined(cl_khr_fp16) before using the extension.
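The guard mentioned in the reply can be sketched as below. This is an illustration of the pattern, not the exact patched kernel source; the half typedefs in the fp16 branch are an assumption for illustration. (The fragment is preprocessor-only, so it also compiles as plain C/C++ on the host, where cl_khr_fp16 is undefined.)

```cpp
// Enable the half-precision extension only when the OpenCL compiler
// defines cl_khr_fp16; otherwise keep the float typedefs, so kernel
// compilation still succeeds on devices without fp16 support.
#if defined(cl_khr_fp16)
#pragma OPENCL EXTENSION cl_khr_fp16 : enable
#define Dtype  half   /* assumed fp16 mapping, for illustration */
#define Dtype4 half4
#define Dtype8 half8
#else
#define Dtype  float
#define Dtype4 float4
#define Dtype8 float8
#endif
```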

@pengli pengli force-pushed the dnn_half branch 4 times, most recently from 03afd37 to 075a883 Compare May 15, 2018 08:09
@pengli
Author

pengli commented May 15, 2018

@alalek, it is strange that the Windows OCL buildbot fails; IIRC, it ran successfully with the same code before.

@alalek
Member

alalek commented May 15, 2018

Don't worry, it looks like it is related to the OpenCL runtime.

We have two build machines:

  • windows-1: graphics driver is installed via regular Windows Update
  • windows-2: graphics driver is installed from downloadcenter.intel.com

Builds on windows-1 are fine.

Currently tests fail only on the windows-2 machine. But it looks like the driver version is not the latest: 23.20.16.4849.
I will try to update this driver.

Signed-off-by: Li Peng <peng.li@intel.com>
Signed-off-by: Li Peng <peng.li@intel.com>
Signed-off-by: Li Peng <peng.li@intel.com>
Signed-off-by: Li Peng <peng.li@intel.com>
@pengli
Author

pengli commented May 16, 2018

Resolved conflict with 3.4.

