Flush to zero Convolution denormal weights #17295
|
It's possible that there are no denormals in the weights (can check using |
|
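The tool named in the comment above is cut off, so the following is only an illustrative sketch, not necessarily what the commenter meant. It assumes the convolution weights are available as a float32 NumPy array and counts values that are nonzero but smaller in magnitude than the smallest normal float32. The helper name `count_denormals` and the sample array are hypothetical.

```python
import numpy as np

def count_denormals(weights):
    """Return how many values in a float32 array are denormal (subnormal)."""
    w = np.asarray(weights, dtype=np.float32)
    tiny = np.finfo(np.float32).tiny  # smallest positive *normal* float32
    mag = np.abs(w)
    # A denormal is nonzero but smaller in magnitude than the smallest normal.
    return int(np.count_nonzero((mag > 0) & (mag < tiny)))

# Hypothetical weights: one normal value, one zero, two denormals.
example = np.array([1.0, 0.0, 1e-40, -3e-42], dtype=np.float32)
print(count_denormals(example))
```

If the count is zero for every layer, flushing the weights should not be expected to change inference time.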
@YashasSamaga, that's a good point, but we need more experiments with it. It's hard to determine ... Update: there is FTZ right in the MKL-DNN source code. OK, I'll do as you recommended, thanks!
|
Just FYI, I gave this patch a try and my inference time is about 10% slower with it (from 92 ms to 102 ms).
|
@JulienMaille, I just updated the PR. Have you tried the version with manual flushing or the one with intrinsics? Can you provide a reproducer?
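For readers unfamiliar with the distinction: "manual flushing" rewrites the stored weights once, while the intrinsics variant changes the CPU's floating-point mode. The PR itself is C++; what follows is only a NumPy sketch of the manual approach under the assumption that the weights are a float32 array. The function name `flush_denormals_to_zero` is hypothetical and not taken from the patch.

```python
import numpy as np

def flush_denormals_to_zero(weights):
    """Return a float32 copy of `weights` with denormal values replaced by 0."""
    w = np.array(weights, dtype=np.float32)  # copy, so the input is left untouched
    tiny = np.finfo(np.float32).tiny         # smallest positive normal float32
    # Everything below the normal range becomes +0.0 (the sign is not preserved).
    w[np.abs(w) < tiny] = 0.0
    return w

# Hypothetical weights containing two denormals.
weights = np.array([1.0, 1e-40, -3e-42, -2.0], dtype=np.float32)
flushed = flush_denormals_to_zero(weights)
print(flushed)
```

Because this is a one-time pass over the weights, it adds no per-inference cost and does not depend on per-thread CPU state.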
|
I tried this: Will try the new one and report |
|
Same 10% slowdown. Do you want an ONNX model to reproduce it?
|
@JulienMaille, can you share how you measure performance? Here is an example that reports the min, median, mean, and standard deviation over several runs with your model:
import numpy as np
import cv2 as cv
import time

print(cv.__file__)  # confirm which OpenCV build is being loaded
net = cv.dnn.readNet('model.onnx')
net.setPreferableBackend(cv.dnn.DNN_BACKEND_OPENCV)
inp = np.random.standard_normal([1, 1, 256, 256]).astype(np.float32)
net.setInput(inp)
net.forward()  # warm-up run, not timed

speeds = []
for i in range(1000):
    start = time.time()
    net.forward()
    speeds.append((time.time() - start) * 1000)  # elapsed time in ms

# min | median | mean | std
print('%.2fms|%.2fms|%.2fms|%.2f' % (np.min(speeds), np.median(speeds), np.mean(speeds), np.std(speeds)))
|
I can do that. The numbers I gave were the average over 20 runs, with the first run discarded.
|
@JulienMaille, you don't need OpenVINO - the changes in this patch won't affect it - it's for default implementation, |
|
Seems like the difference was due to the CPU being busier when I ran the FTZ benchmarks.
|
|
Uh, sorry. Ok so for reference (100 loops): |
|
@dkurt Do you have a final decision on the patch?
|
@asmorkalov, yes, it's the final version. @alalek, you mentioned that there is also a way to apply the FTZ optimization on ARM CPUs. Shall I add it here?
An RAII-based solution would solve problem 2. The RAII objects will automatically restore the FTZ and DAZ modes when the scope exits, whether through an exception or a return. It's safer and future-proof. The following posts suggest that the FTZ and DAZ modes have to be set for each thread:
I don't know how |
model from #17259
opencv_perf_dnn:CPU: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz x8