implementation for dnn fp16 ocl support (#11397)
(force-pushed de0b4a6 to d74705a)
modules/dnn/src/dnn.cpp (outdated diff)
    Mat blob_ = blob.getMat();
    Mat blob_;
    if (impl->preferableTarget == DNN_TARGET_OPENCL &&
        impl->preferablePrecision == DNN_PRECISION_FP16)
I think we can use just the DNN_TARGET_OPENCL_FP16 target instead of a separate precisions enum.
Yes, this flag should be good. The patch is updated.
    heights.copyTo(umat_heights);
    if (use_half)
    {
        convertFp16(offsetsX, umat_offsetsX);
All non-weight hyper-parameters should stay in the original precision.
Switched back to the original precision for non-weight parameters.
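The accuracy concern behind this review point can be illustrated with a small stand-alone sketch. The helper below is hypothetical (it is NOT OpenCV's cv::convertFp16); it only rounds a float to the nearest fp16-representable value for normal-range inputs, which is enough to show why weights tolerate the conversion while exact hyper-parameters may not.

```cpp
#include <cstdint>
#include <cstring>
#include <cmath>
#include <cassert>

// Hypothetical helper, not part of OpenCV: truncate a float's mantissa to
// the 10 bits an fp16 value stores (normal range only; rounding-to-nearest
// and subnormals are omitted for brevity -- truncation already shows the
// precision loss).
static float roundToFp16(float f)
{
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof(bits));
    uint32_t sign = bits & 0x80000000u;
    int32_t  exp  = (int32_t)((bits >> 23) & 0xFFu) - 127;  // unbiased exponent
    uint32_t mant = bits & 0x007FFFFFu;
    if (exp < -14 || exp > 15)       // outside fp16 normal range: not handled here
        return f;
    mant &= ~0x1FFFu;                // drop the 13 low mantissa bits fp16 cannot keep
    bits = sign | ((uint32_t)(exp + 127) << 23) | mant;
    std::memcpy(&f, &bits, sizeof(f));
    return f;
}
```

For example, 0.1f is not exactly representable in fp16 and comes back with an error of roughly 2e-5; a weight absorbs that, but an exact offset or stride used for index arithmetic would silently change.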
(force-pushed ac38568 to 3fb53e5)
    std::vector<UMat> inputs;
    std::vector<UMat> outputs;
    bool use_half = (inps.depth() == CV_16S);
The PriorBox layer does not use input data; it generates a fixed set of bounding boxes. So I think we need to keep its output in single-precision floats, because otherwise we can face significant accuracy loss.
Thanks for the review; I kept float precision for the prior_box layer.
    @@ -302,17 +302,18 @@ TEST(Test_TensorFlow, defun)
    TEST(Test_TensorFlow, fp16)
Can we make this test parametric and add it to the Test_TensorFlow_layers group?
I changed the fp16 test to use DNN_TARGET as the parameter.
There are some performance measurements on CPU: Intel® Core™ i7-6700K CPU @ 4.00GHz x 8
    if (preferableTarget == DNN_TARGET_OPENCL_FP16)
    {
        convertFp16(ld.outputBlobs[pin.oid], output_blob);
I think it's better to check whether ld.outputBlobs[pin.oid] contains fp16 values rather than preferableTarget == DNN_TARGET_OPENCL_FP16, because if ld.outputBlobs[pin.oid] has fp32 type, output_blob will get an fp16 one.
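The suggested check can be sketched with stand-in depth tags (the real code would query Mat::depth(), where OpenCV stores fp16 blobs with the 16-bit CV_16S element type and fp32 blobs as CV_32F):

```cpp
#include <cassert>

// Stand-in depth tags mirroring OpenCV's CV_32F / CV_16S element types;
// hypothetical names, not the real constants.
enum Depth { DEPTH_32F, DEPTH_16S };

// Sketch of the reviewer's suggestion: convert the output blob back to fp32
// only when the source blob actually holds fp16 data, instead of keying the
// decision off preferableTarget == DNN_TARGET_OPENCL_FP16.
bool needsFp16ToFp32Conversion(Depth outputBlobDepth)
{
    return outputBlobDepth == DEPTH_16S;  // fp16 payload -> convert
}
```

This way an fp32 blob is copied as-is even when the preferable target is the fp16 one.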
    blobManager.allocateBlobsForLayer(ld, layerShapesIt->second, pinsForInternalBlobs,
                                      preferableBackend == DNN_BACKEND_INFERENCE_ENGINE);
                                      preferableBackend == DNN_BACKEND_INFERENCE_ENGINE,
                                      preferableTarget == DNN_TARGET_OPENCL_FP16);
We need to allocate halves only if preferableTarget == DNN_TARGET_OPENCL_FP16 and preferableBackend == DNN_BACKEND_DEFAULT, because there is one more backend (Intel's Inference Engine) which supports FP16 computation but accepts inputs and outputs in FP32.
modules/dnn/test/test_backends.cpp (outdated diff)
    net.setHalideScheduler(halideScheduler);
    }
    net.setInput(inp);
Can we convert the input blob to FP16 at the network's initialization stage (i.e. setUpNet)? Before the first forward call we could call any net.set* methods in any order.
The code is updated; net.set* calls can now be made in any order before the forward call.
(force-pushed 497120d to 2b5e0ee)
@pengli, looks like I measured YOLOv3's efficiency wrongly in previous posts. See actual numbers below.
CPU: Intel® Core™ i7-6700K CPU @ 4.00GHz x 8
modules/dnn/test/test_backends.cpp (outdated diff)
    throw SkipTestException("");
    Mat sample = imread(findDataFile("dnn/street.png", false));
    Mat inp = blobFromImage(sample, 1.0f / 127.5, Size(300, 300), Scalar(127.5, 127.5, 127.5), false);
    float l1 = (target == DNN_TARGET_OPENCL_FP16) ? 0.0007 : 0.0;
Please replace with backend == DNN_BACKEND_DEFAULT && target == DNN_TARGET_OPENCL_FP16.
modules/dnn/test/test_backends.cpp (outdated diff)
    throw SkipTestException("");
    Mat sample = imread(findDataFile("dnn/street.png", false));
    Mat inp = blobFromImage(sample, 1.0f / 127.5, Size(300, 300), Scalar(127.5, 127.5, 127.5), false);
    float l1 = (target == DNN_TARGET_OPENCL_FP16) ? 0.008 : 0.0;
The same: backend == DNN_BACKEND_DEFAULT && target == DNN_TARGET_OPENCL_FP16.
    Mat ref = blobFromNPY(_tf("mobilenet_ssd_caffe_out.npy"));
    normAssertDetections(ref, out);
    normAssertDetections(ref, out, "", 0.0, 4e-4, 5e-3);
Please keep the default values 1e-5 and 1e-4 for non-DNN_TARGET_OPENCL_FP16 targets.
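The tolerance selection the reviewer asks for can be sketched as a small helper. The enum and function names below are stand-ins (the real test uses OpenCV's DNN_BACKEND_* / DNN_TARGET_* ids inline), but the thresholds come from the diff and the review comment:

```cpp
#include <cassert>

// Hypothetical stand-in ids for OpenCV's backend/target enums.
enum Backend { BACKEND_DEFAULT, BACKEND_INFERENCE_ENGINE };
enum Target  { TARGET_CPU, TARGET_OPENCL, TARGET_OPENCL_FP16 };

struct Tolerances { double l1, lInf; };

// Sketch of the requested pattern: keep the framework defaults (1e-5 / 1e-4)
// and relax the thresholds only for the default-backend OpenCL fp16 path.
Tolerances pickTolerances(Backend backend, Target target)
{
    if ((backend == BACKEND_DEFAULT) && (target == TARGET_OPENCL_FP16))
        return Tolerances{4e-4, 5e-3};  // relaxed bounds for half precision
    return Tolerances{1e-5, 1e-4};      // defaults for all other targets
}
```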
(force-pushed cf0954b to 3248a0b)
@alalek, hi, any feedback?
Please rebase this patch onto the 3.4 branch.
@alalek, done, the code is rebased onto the 3.4 branch. BTW, will you also merge this patchset into master?
Yes, via regular 3.4 => master merges (weekly/bi-weekly).
    DNN_TARGET_OPENCL_FP16
    };
    #define IS_DNN_OPENCL_TARGET(id) (id == DNN_TARGET_OPENCL || id == DNN_TARGET_OPENCL_FP16)
We should not pollute the global macro namespace. So please either:
- add a CV_ prefix, or
- move this into the src/precomp.hpp file (preferable)
modules/dnn/perf/perf_net.cpp (outdated diff)
    {
    if (backend == DNN_BACKEND_INFERENCE_ENGINE) throw SkipTestException("");
    if (backend == DNN_BACKEND_INFERENCE_ENGINE ||
        backend == DNN_BACKEND_DEFAULT && target == DNN_TARGET_OPENCL_FP16)
Please use more brackets to make static code analyzers happy.
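A fully bracketed version of that condition could be sketched as follows (enum and function names are stand-ins for the OpenCV ids used inline in the perf test). Since && binds tighter than ||, the brackets do not change the meaning; they only make the grouping explicit for readers and static analyzers:

```cpp
#include <cassert>

// Hypothetical stand-in ids for OpenCV's backend/target enums.
enum Backend { BACKEND_DEFAULT, BACKEND_INFERENCE_ENGINE };
enum Target  { TARGET_CPU, TARGET_OPENCL, TARGET_OPENCL_FP16 };

// Sketch of the skip condition with explicit brackets around each
// comparison and around the && sub-expression.
bool shouldSkipTest(Backend backend, Target target)
{
    return (backend == BACKEND_INFERENCE_ENGINE) ||
           ((backend == BACKEND_DEFAULT) && (target == TARGET_OPENCL_FP16));
}
```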
    #define Dtype float
    #define Dtype4 float4
    #define Dtype8 float8
    #pragma OPENCL EXTENSION cl_khr_fp16 : enable
Does this break kernel compilation if fp16 is not supported and we request "float" only?
Added #if defined(cl_khr_fp16) before using the extension.
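The guard the author describes might look like the OpenCL C fragment below. This is a sketch, not the exact kernel header from the patch: in the real code the Dtype defines are selected via build options, so the #else fallback here is an assumption added for self-containment.

```c
/* OpenCL kernel source: guard the fp16 extension so that the kernel still
 * compiles on devices without cl_khr_fp16 when only "float" is requested. */
#if defined(cl_khr_fp16)
#pragma OPENCL EXTENSION cl_khr_fp16 : enable
#define Dtype  half
#define Dtype4 half4
#define Dtype8 half8
#else
#define Dtype  float
#define Dtype4 float4
#define Dtype8 float8
#endif
```

On devices without the extension the preprocessor never emits the #pragma, so the float-only build path is unaffected.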
(force-pushed 03afd37 to 075a883)
@alalek, it is strange that the Windows OCL buildbot failed; IIRC, it ran successfully with the same code before.
Don't worry, it looks like it is related to the OpenCL runtime. We have two build machines: builds on windows-1 are fine, and currently tests fail on the windows-2 machine only. But it looks like its driver version is not the latest: 23.20.16.4849
Signed-off-by: Li Peng <peng.li@intel.com>
Resolved the conflict with 3.4.
This PR is for the feature "Adding FP16 path in DNN" at #11009