implementation for dnn fp16 ocl support #11397

Merged
opencv-pushbot merged 4 commits into opencv:3.4 from pengli:dnn_half
May 16, 2018

Conversation

@pengli

@pengli pengli commented Apr 26, 2018

This PR is for the feature "Adding FP16 path in DNN" at #11009

buildworker:Win64 OpenCL=windows-2
allow_multiple_commits=1

@pengli pengli force-pushed the dnn_half branch 3 times, most recently from de0b4a6 to d74705a Compare April 27, 2018 05:44
-Mat blob_ = blob.getMat();
+Mat blob_;
+if (impl->preferableTarget == DNN_TARGET_OPENCL &&
+    impl->preferablePrecision == DNN_PRECISION_FP16)
Member

I think we can use just a DNN_TARGET_OPENCL_FP16 target instead of a separate precisions enum.

Author

Yes, this flag should be good. The patch is updated.

heights.copyTo(umat_heights);
if (use_half)
{
convertFp16(offsetsX, umat_offsetsX);
Member

All non-weights hyper-parameters should be in the original precision.

Author

Switched back to the original precision for non-weights parameters.

@pengli pengli force-pushed the dnn_half branch 16 times, most recently from ac38568 to 3fb53e5 Compare May 3, 2018 08:04
std::vector<UMat> inputs;
std::vector<UMat> outputs;

bool use_half = (inps.depth() == CV_16S);
Member

The PriorBox layer does not use input data; it generates a fixed set of bounding boxes. So I think we need to keep its output in single-precision floats, because otherwise we can face significant accuracy loss.

Author

Thanks for the review; I kept float precision for the prior_box layer.

@@ -302,17 +302,18 @@ TEST(Test_TensorFlow, defun)

TEST(Test_TensorFlow, fp16)
Member

Can we make this test parametric and add it to the Test_TensorFlow_layers group?

Author

I changed the fp16 test to take DNN_TARGET as a parameter.

@dkurt
Member

dkurt commented May 3, 2018

Here are some performance measurements for the DNN_TARGET_OPENCL and DNN_TARGET_OPENCL_FP16 targets:

Name of Test                                                               fp32     fp16   fp16 vs fp32 (x-factor)
AlexNet::DNNTestNetwork::(DNN_BACKEND_DEFAULT)                            15.358   13.377     1.15   
DenseNet_121::DNNTestNetwork::(DNN_BACKEND_DEFAULT)                       67.542  105.010     0.64   
ENet::DNNTestNetwork::(DNN_BACKEND_DEFAULT)                               27.672   failed      -     
GoogLeNet::DNNTestNetwork::(DNN_BACKEND_DEFAULT)                          19.509   27.480     0.71   
Inception_5h::DNNTestNetwork::(DNN_BACKEND_DEFAULT)                       21.778   29.363     0.74   
Inception_v2_SSD_TensorFlow::DNNTestNetwork::(DNN_BACKEND_DEFAULT)        66.978   84.560     0.79   
MobileNet_SSD_Caffe::DNNTestNetwork::(DNN_BACKEND_DEFAULT)                18.848   25.258     0.75   
MobileNet_SSD_TensorFlow::DNNTestNetwork::(DNN_BACKEND_DEFAULT)          skipped   34.681      -     
OpenFace::DNNTestNetwork::(DNN_BACKEND_DEFAULT)                           9.639    13.296     0.72   
OpenPose_pose_coco::DNNTestNetwork::(DNN_BACKEND_DEFAULT)                1055.318 1170.794    0.90   
OpenPose_pose_mpi::DNNTestNetwork::(DNN_BACKEND_DEFAULT)                 1046.968 1156.758    0.91   
OpenPose_pose_mpi_faster_4_stages::DNNTestNetwork::(DNN_BACKEND_DEFAULT) 752.793  811.864     0.93   
ResNet_50::DNNTestNetwork::(DNN_BACKEND_DEFAULT)                          35.452  101.747     0.35   
SSD::DNNTestNetwork::(DNN_BACKEND_DEFAULT)                               393.071  347.915     1.13   
SqueezeNet_v1_1::DNNTestNetwork::(DNN_BACKEND_DEFAULT)                    5.956    8.726      0.68   
YOLOv3::DNNTestNetwork::(DNN_BACKEND_DEFAULT)                            346.168  440.524     0.79   
opencv_face_detector::DNNTestNetwork::(DNN_BACKEND_DEFAULT)               28.412   33.469     0.85   

CPU: Intel® Core™ i7-6700K CPU @ 4.00GHz x 8
GPU: Intel® HD Graphics 530 (Skylake GT2)


if (preferableTarget == DNN_TARGET_OPENCL_FP16)
{
convertFp16(ld.outputBlobs[pin.oid], output_blob);
Member

I think it's better to check whether ld.outputBlobs[pin.oid] contains fp16 values rather than preferableTarget == DNN_TARGET_OPENCL_FP16, because if ld.outputBlobs[pin.oid] has fp32 type, output_blob will have an fp16 one.

 blobManager.allocateBlobsForLayer(ld, layerShapesIt->second, pinsForInternalBlobs,
-                                  preferableBackend == DNN_BACKEND_INFERENCE_ENGINE);
+                                  preferableBackend == DNN_BACKEND_INFERENCE_ENGINE,
+                                  preferableTarget == DNN_TARGET_OPENCL_FP16);
Member

We need to allocate halves only if preferableTarget == DNN_TARGET_OPENCL_FP16 and preferableBackend == DNN_BACKEND_DEFAULT, because there is one more backend (Intel's Inference Engine) that supports FP16 computation but accepts inputs and outputs in FP32.

net.setHalideScheduler(halideScheduler);
}

net.setInput(inp);
Member

Can we convert the input blob to FP16 at the network's initialization stage (i.e. setUpNet)? Before the first forward call, we could call any net.set* methods in any order.

Author

Code is updated; net.set* calls can come in any order before the forward call.

@pengli pengli force-pushed the dnn_half branch 4 times, most recently from 497120d to 2b5e0ee Compare May 4, 2018 03:19
@dkurt
Member

dkurt commented May 10, 2018

@pengli, looks like I measured YOLOv3's performance incorrectly in the previous posts. See the actual numbers below.

Name of Test fp32 fp16 x-factor
AlexNet::DNNTestNetwork::(DNN_BACKEND_DEFAULT) 15.610 9.952 1.57
DenseNet_121::DNNTestNetwork::(DNN_BACKEND_DEFAULT) 66.527 86.417 0.77
ENet::DNNTestNetwork::(DNN_BACKEND_DEFAULT) 27.526 skipped -
GoogLeNet::DNNTestNetwork::(DNN_BACKEND_DEFAULT) 19.631 19.336 1.02
Inception_5h::DNNTestNetwork::(DNN_BACKEND_DEFAULT) 21.723 21.434 1.01
Inception_v2_SSD_TensorFlow::DNNTestNetwork::(DNN_BACKEND_DEFAULT) 66.995 76.540 0.88
MobileNet_SSD_Caffe::DNNTestNetwork::(DNN_BACKEND_DEFAULT) 18.837 23.211 0.81
MobileNet_SSD_TensorFlow::DNNTestNetwork::(DNN_BACKEND_DEFAULT) skipped 26.919 -
OpenFace::DNNTestNetwork::(DNN_BACKEND_DEFAULT) 9.452 11.308 0.84
OpenPose_pose_coco::DNNTestNetwork::(DNN_BACKEND_DEFAULT) 1051.954 1160.058 0.91
OpenPose_pose_mpi::DNNTestNetwork::(DNN_BACKEND_DEFAULT) 1034.653 1145.306 0.90
OpenPose_pose_mpi_faster_4_stages::DNNTestNetwork::(DNN_BACKEND_DEFAULT) 744.890 805.034 0.93
ResNet_50::DNNTestNetwork::(DNN_BACKEND_DEFAULT) 35.714 38.967 0.92
SSD::DNNTestNetwork::(DNN_BACKEND_DEFAULT) 387.188 344.652 1.12
SqueezeNet_v1_1::DNNTestNetwork::(DNN_BACKEND_DEFAULT) 5.975 7.126 0.84
YOLOv3::DNNTestNetwork::(DNN_BACKEND_DEFAULT) 286.128 285.209 1.00
opencv_face_detector::DNNTestNetwork::(DNN_BACKEND_DEFAULT) 26.438 31.126 0.85

CPU: Intel® Core™ i7-6700K CPU @ 4.00GHz x 8
GPU: Intel® HD Graphics 530 (Skylake GT2)

throw SkipTestException("");
Mat sample = imread(findDataFile("dnn/street.png", false));
Mat inp = blobFromImage(sample, 1.0f / 127.5, Size(300, 300), Scalar(127.5, 127.5, 127.5), false);
float l1 = (target == DNN_TARGET_OPENCL_FP16) ? 0.0007 : 0.0;
Member

Please replace this with backend == DNN_BACKEND_DEFAULT && target == DNN_TARGET_OPENCL_FP16.

Author

fixed, thank you

throw SkipTestException("");
Mat sample = imread(findDataFile("dnn/street.png", false));
Mat inp = blobFromImage(sample, 1.0f / 127.5, Size(300, 300), Scalar(127.5, 127.5, 127.5), false);
float l1 = (target == DNN_TARGET_OPENCL_FP16) ? 0.008 : 0.0;
Member

The same here: backend == DNN_BACKEND_DEFAULT && target == DNN_TARGET_OPENCL_FP16.

Author

fixed, thank you


 Mat ref = blobFromNPY(_tf("mobilenet_ssd_caffe_out.npy"));
-normAssertDetections(ref, out);
+normAssertDetections(ref, out, "", 0.0, 4e-4, 5e-3);
Member

Please keep the default values 1e-5 and 1e-4 for non-DNN_TARGET_OPENCL_FP16 targets.

Author

fixed, thank you

@pengli pengli force-pushed the dnn_half branch 3 times, most recently from cf0954b to 3248a0b Compare May 11, 2018 02:46
@dkurt
Member

dkurt commented May 11, 2018

👍 Looks good to me.
@alalek, Should we choose base branch 3.4 or master according to evolution proposal #11009?

@pengli
Author

pengli commented May 14, 2018

@alalek, hi, any feedback?

@alalek
Member

alalek commented May 14, 2018

Please rebase this patch on the 3.4 branch (almost all DNN patches come into 3.4, so we want to minimize future merge conflicts).

So, please:

  • change the "base" branch of this PR: master => 3.4 (use the "Edit" button near the PR title)
  • rebase your commits from master onto the 3.4 branch. For example:
    git rebase -i --onto upstream/3.4 upstream/master
    (check the list of your commits, then save and quit: Esc + ":wq" + Enter),
    where upstream is configured by following this GitHub guide and fetched (git fetch upstream).
  • push the rebased commits into the source branch of your fork (with the --force option)

@pengli pengli changed the base branch from master to 3.4 May 14, 2018 13:41
@pengli
Author

pengli commented May 14, 2018

@alalek, done, the code is rebased onto the 3.4 branch.

btw, will you also merge this patchset into master?

@alalek
Member

alalek commented May 14, 2018

Yes, via regular 3.4 => master merges (weekly/bi-weekly).

DNN_TARGET_OPENCL_FP16
};

#define IS_DNN_OPENCL_TARGET(id) (id == DNN_TARGET_OPENCL || id == DNN_TARGET_OPENCL_FP16)
Member

We should not pollute the global macro namespace. So please either:

  • add a CV_ prefix
  • or move this into the src/precomp.hpp file (preferable)

Author

Fixed, moved to src/precomp.hpp.

 {
-    if (backend == DNN_BACKEND_INFERENCE_ENGINE) throw SkipTestException("");
+    if (backend == DNN_BACKEND_INFERENCE_ENGINE ||
+        backend == DNN_BACKEND_DEFAULT && target == DNN_TARGET_OPENCL_FP16)
Member

Please use more brackets to make static code analyzers happy.

Author

fixed

#define Dtype float
#define Dtype4 float4
#define Dtype8 float8
#pragma OPENCL EXTENSION cl_khr_fp16 : enable
Member

Does this break kernel compilation if fp16 is not supported and we request "float" only?

Author
@pengli pengli May 15, 2018

Added #if defined(cl_khr_fp16) before using the extension.
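The guard mentioned in the reply can be sketched as below. This is an illustration of the pattern, not the exact patched kernel source; the half typedefs in the fp16 branch are an assumption for illustration. (The fragment is preprocessor-only, so it also compiles as plain C/C++ on the host, where cl_khr_fp16 is undefined.)

```cpp
// Enable the half-precision extension only when the OpenCL compiler
// defines cl_khr_fp16; otherwise keep the float typedefs, so kernel
// compilation still succeeds on devices without fp16 support.
#if defined(cl_khr_fp16)
#pragma OPENCL EXTENSION cl_khr_fp16 : enable
#define Dtype  half   /* assumed fp16 mapping, for illustration */
#define Dtype4 half4
#define Dtype8 half8
#else
#define Dtype  float
#define Dtype4 float4
#define Dtype8 float8
#endif
```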

@pengli pengli force-pushed the dnn_half branch 4 times, most recently from 03afd37 to 075a883 Compare May 15, 2018 08:09
@pengli
Author

pengli commented May 15, 2018

@alalek, it is strange that the Windows OCL buildbot fails; IIRC, it ran successfully with the same code before.

@alalek
Member

alalek commented May 15, 2018

Don't worry, it looks like it is related to the OpenCL runtime.

We have two build machines:

  • windows-1: graphics driver is installed via regular Windows Update
  • windows-2: graphics driver is installed from downloadcenter.intel.com

Builds on windows-1 are fine.

Currently tests fail only on the windows-2 machine. But it looks like the driver version is not the latest: 23.20.16.4849.
I will try to update this driver.

Signed-off-by: Li Peng <peng.li@intel.com>
Signed-off-by: Li Peng <peng.li@intel.com>
Signed-off-by: Li Peng <peng.li@intel.com>
Signed-off-by: Li Peng <peng.li@intel.com>
@pengli
Author

pengli commented May 16, 2018

Resolved conflict with 3.4.

