
Flush to zero Convolution denormal weights#17295

Merged
opencv-pushbot merged 1 commit into opencv:3.4 from dkurt:dnn_fusion_ftz
May 22, 2020

Conversation

@dkurt
Member

@dkurt dkurt commented May 14, 2020

model from #17259

(base) model_1: 56.99ms
(base) model_2: 170.85ms
(ftz) model_1: 49.18ms
(ftz) model_2: 49.33ms
import numpy as np
import cv2 as cv
import time

net = cv.dnn.readNet('model_1.prototxt', 'model_1.caffemodel')
net.setPreferableBackend(cv.dnn.DNN_BACKEND_OPENCV)
inp = np.random.standard_normal([1, 3, 112, 112]).astype(np.float32)
net.setInput(inp)
net.forward()

speeds = []
for i in range(10):
    start = time.time()
    net.forward()
    speeds.append((time.time() - start) * 1000)
print(np.median(speeds))

net = cv.dnn.readNet('model_1.prototxt', 'model_2.caffemodel')
net.setPreferableBackend(cv.dnn.DNN_BACKEND_OPENCV)
inp = np.random.standard_normal([1, 3, 112, 112]).astype(np.float32)
net.setInput(inp)
net.forward()

speeds = []
for i in range(10):
    start = time.time()
    net.forward()
    speeds.append((time.time() - start) * 1000)
print(np.median(speeds))

opencv_perf_dnn:

Median (ms)

                       Name of Test                         base     ftz   ftz vs base (x-factor)
AlexNet::DNNTestNetwork::OCV/CPU                           14.280  14.235     1.00   
DenseNet_121::DNNTestNetwork::OCV/CPU                      39.178  39.567     0.99   
EAST_text_detection::DNNTestNetwork::OCV/CPU               69.456  69.436     1.00   
ENet::DNNTestNetwork::OCV/CPU                              44.60   23.26      1.91   (separate run)
FastNeuralStyle_eccv16::DNNTestNetwork::OCV/CPU            125.432 124.266    1.01   
GoogLeNet::DNNTestNetwork::OCV/CPU                         15.315  15.273     1.00   
Inception_5h::DNNTestNetwork::OCV/CPU                      16.713  16.819     0.99   
Inception_v2_Faster_RCNN::DNNTestNetwork::OCV/CPU          286.398 290.170    0.99   
Inception_v2_SSD_TensorFlow::DNNTestNetwork::OCV/CPU       43.136  43.012     1.00   
MobileNet_SSD_Caffe::DNNTestNetwork::OCV/CPU               20.919  20.866     1.00   
MobileNet_SSD_v1_TensorFlow::DNNTestNetwork::OCV/CPU       22.510  22.410     1.00   
MobileNet_SSD_v2_TensorFlow::DNNTestNetwork::OCV/CPU       31.676  31.742     1.00   
OpenFace::DNNTestNetwork::OCV/CPU                           3.922   3.968     0.99   
OpenPose_pose_mpi_faster_4_stages::DNNTestNetwork::OCV/CPU 607.502 619.329    0.98   
ResNet_50::DNNTestNetwork::OCV/CPU                         36.333  35.966     1.01   
SSD::DNNTestNetwork::OCV/CPU                               270.494 272.510    0.99   
SqueezeNet_v1_1::DNNTestNetwork::OCV/CPU                    3.918   3.909     1.00   
YOLOv3::DNNTestNetwork::OCV/CPU                            212.866 210.719    1.01   
opencv_face_detector::DNNTestNetwork::OCV/CPU              13.940  14.016     0.99 

CPU: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz x8

$ cat /proc/cpuinfo | grep "MHz"
cpu MHz         : 4000.102
cpu MHz         : 4002.573
cpu MHz         : 4179.525
cpu MHz         : 4009.356
cpu MHz         : 3998.455
cpu MHz         : 4001.585
cpu MHz         : 4000.148
cpu MHz         : 4181.959
force_builders=Custom,Custom Win,Custom Mac
build_image:Custom=ubuntu-openvino-2020.2.0:16.04
build_image:Custom Win=openvino-2020.2.0
build_image:Custom Mac=openvino-2020.2.0

test_modules:Custom=dnn,python2,python3,java
test_modules:Custom Win=dnn,python2,python3,java
test_modules:Custom Mac=dnn,python2,python3,java

buildworker:Custom=linux-1
# disabled due to high memory usage: test_opencl:Custom=ON
test_opencl:Custom=OFF
test_bigdata:Custom=1
test_filter:Custom=*

@dkurt dkurt changed the title Flush to zero Convolution activation weights Flush to zero Convolution denormal weights May 14, 2020
@dkurt dkurt linked an issue May 14, 2020 that may be closed by this pull request
@YashasSamaga
Contributor

YashasSamaga commented May 15, 2020

The 1e-15 in that post was arbitrary. Why not simply enable FTZ in the hardware before convolving?

bool already_enabled = is_ftz_enabled();
enable_ftz();

// do convolution

if (!already_enabled)
    disable_ftz();

It's possible that there are no denormals in the weights (this can be checked using std::fpclassify) but that denormals are being generated during convolution. This can still happen with clipped weights if some input to the convolution layer happens to be really small. So if an attempt is being made to avoid denormals, why not suppress them completely instead of allowing some amount of denormals through?
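On x86 that suggestion could look roughly like the following sketch using the SSE MXCSR intrinsics (`multiply_with_ftz` and `is_denormal` are illustrative names, not OpenCV code):

```cpp
#include <cassert>
#include <cmath>        // std::fpclassify
#include <xmmintrin.h>  // _MM_GET/SET_FLUSH_ZERO_MODE (x86 SSE)

// Detect denormal (subnormal) values, e.g. to scan weights up front.
bool is_denormal(float v) { return std::fpclassify(v) == FP_SUBNORMAL; }

// Save the current FTZ mode, enable it around the arithmetic, then restore it.
float multiply_with_ftz(float a, float b) {
    unsigned prev = _MM_GET_FLUSH_ZERO_MODE();
    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);

    volatile float x = a, y = b;   // volatile blocks constant folding
    float result = x * y;          // a denormal result is flushed to zero here

    _MM_SET_FLUSH_ZERO_MODE(prev); // restore the caller's mode
    return result;
}
```

With FTZ off, `1e-37f * 1e-6f` would instead produce a subnormal value around 1e-43; with FTZ on, the hardware returns exactly zero.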

@dkurt
Member Author

dkurt commented May 15, 2020

@YashasSamaga, that's a good point, but it needs more experimentation. It's hard to pin down the // do convolution region because we have multiple CPU backends which receive the fused weights as parameters (TEngine, Intel Inference Engine).

Update: there is FTZ right in the MKL-DNN source code. OK, I'll do as you recommended, thanks!

@dkurt dkurt force-pushed the dnn_fusion_ftz branch from a0240b4 to be03a07 Compare May 15, 2020 20:26
@JulienMaille
Contributor

Just FYI, I gave this patch a try and my inference time is 10% slower with it (from 92ms to 102ms).
Before this patch, I also tried compiling with FAST_MATH, and inference time was unchanged (92ms).

@dkurt dkurt force-pushed the dnn_fusion_ftz branch from be03a07 to 68d59a2 Compare May 15, 2020 20:45
@dkurt
Member Author

dkurt commented May 15, 2020

@JulienMaille, I just updated the PR. Have you tried the version with manual flushing or the one with intrinsics? Can you provide a reproducer?

@JulienMaille
Contributor

I tried this:

            // Flush to zero (FTZ) denormal weights: https://github.com/opencv/opencv/issues/17259
            Mat mask = abs(weightsMat) <= 1e-15f;
            weightsMat.setTo(0.0f, mask & (weightsMat > 0.0f));
            weightsMat.setTo(-0.0f, mask & (weightsMat < 0.0f));

Will try the new one and report back.

@JulienMaille
Contributor

Same 10% slowdown. Do you want an ONNX model to reproduce?

@dkurt
Member Author

dkurt commented May 16, 2020

@JulienMaille, can you share your measurement methodology? Here is an example with min, median, mean, and std estimated on your model (several runs):

         min      median   mean     std
baseline 79.30ms  80.55ms  81.24ms  3.77
baseline 78.71ms  79.43ms  79.99ms  3.06
baseline 77.19ms  78.43ms  78.97ms  3.18
FTZ      77.85ms  79.33ms  79.86ms  3.26
FTZ      78.77ms  79.46ms  80.02ms  3.00
FTZ      77.73ms  78.64ms  79.14ms  3.05
import numpy as np
import cv2 as cv
import time
print(cv.__file__)

net = cv.dnn.readNet('model.onnx')
net.setPreferableBackend(cv.dnn.DNN_BACKEND_OPENCV)
inp = np.random.standard_normal([1, 1, 256, 256]).astype(np.float32)
net.setInput(inp)
net.forward()

speeds = []
for i in range(1000):
    start = time.time()
    net.forward()
    speeds.append((time.time() - start) * 1000)
print('%.2fms|%.2fms|%.2fms|%.2f' % (np.min(speeds), np.median(speeds), np.mean(speeds), np.std(speeds)))

@JulienMaille
Contributor

JulienMaille commented May 16, 2020

I can do that; the numbers I gave were the average over 20 runs, with the first run discarded.
I'm working with the C++ API; is there a way to access "nightly" Python builds with OpenVINO? That would make testing easier.

@dkurt
Member Author

dkurt commented May 16, 2020

@JulienMaille, you don't need OpenVINO: the changes in this patch won't affect it. They apply only to the default implementation, DNN_BACKEND_OPENCV.

@JulienMaille
Contributor

Seems like the difference was due to the CPU being busier when I ran the FTZ benchmarks.
Even when running 100 loops I can see a 10ms difference in avg/median.
The best runs for both baseline and FTZ give the same results:

avg: 90.7, min: 88, median: 91, std: 2.75

@JulienMaille
Contributor

JulienMaille commented May 16, 2020

Uh, sorry. Ok so for reference (100 loops):

INFERENCE_ENGINE    avg:  90.7, min:  88, median:  91, std: 2.75
BACKEND_OPENCV base avg: 137.5, min: 132, median: 137, std: 5.59
                    avg: 137.4, min: 135, median: 138, std: 5.37
                    avg: 137.2, min: 136, median: 137, std: 5.50
BACKEND_OPENCV FTZ  avg: 139.7, min: 134, median: 138, std: 7.32
                    avg: 137.9, min: 132, median: 137, std: 5.28
                    avg: 137.7, min: 134, median: 138, std: 5.41

@asmorkalov
Contributor

@dkurt Do you have a final decision on the patch?

@dkurt
Member Author

dkurt commented May 22, 2020

@asmorkalov, yes, this is the final version.

@alalek, you mentioned that there is also a way to apply the FTZ optimization on ARM CPUs; shall I add it here?

Member

@alalek alalek left a comment


Thank you!

@YashasSamaga
Contributor

YashasSamaga commented Aug 22, 2020

  1. What is the scope of the DAZ and FTZ modes? Do they have to be set for each thread?

  2. What if an exception is thrown during convolution, or an early return is taken (the OCL path does this)? The DAZ and FTZ modes won't be reset, and would therefore alter the modes in the end-user's thread that invoked net.forward().

An RAII-based solution would solve problem 2: the RAII object automatically resets the FTZ and DAZ modes after an exception or before a return. It's safer and future-proof.

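The RAII idea could be sketched like this (x86-only; `DenormalsGuard` is a hypothetical name, not OpenCV code):

```cpp
#include <cassert>
#include <xmmintrin.h>  // _MM_GET/SET_FLUSH_ZERO_MODE (SSE)
#include <pmmintrin.h>  // _MM_GET/SET_DENORMALS_ZERO_MODE (SSE3)

// The constructor records the current FTZ/DAZ modes and enables both; the
// destructor restores them on scope exit, covering early returns and
// exceptions alike.
class DenormalsGuard {
public:
    DenormalsGuard()
        : ftz_(_MM_GET_FLUSH_ZERO_MODE()),
          daz_(_MM_GET_DENORMALS_ZERO_MODE()) {
        _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
        _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);
    }
    ~DenormalsGuard() {
        _MM_SET_FLUSH_ZERO_MODE(ftz_);
        _MM_SET_DENORMALS_ZERO_MODE(daz_);
    }
    DenormalsGuard(const DenormalsGuard&) = delete;
    DenormalsGuard& operator=(const DenormalsGuard&) = delete;
private:
    unsigned ftz_, daz_;
};

// FTZ/DAZ are active only inside the guard's scope.
float guarded_multiply(float a, float b) {
    DenormalsGuard guard;
    volatile float x = a, y = b;   // volatile blocks constant folding
    return x * y;                  // denormal products flush to zero
}
```

Note that MXCSR is per-thread state, so each worker thread doing convolutions would need its own guard.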

The following posts suggest that the FTZ and DAZ modes have to be set per thread:

I don't know how ParallelLoopBody works, but if it reuses worker threads from a global thread pool, each thread might have its own FTZ and DAZ modes.

@dkurt dkurt deleted the dnn_fusion_ftz branch December 7, 2020 10:59

Development

Successfully merging this pull request may close these issues.

same dnn model with different params, the speed is different

6 participants