cv::dnn::Net::forward() returns wrong timings #20077
Closed
Labels: category: dnn, category: gpu/cuda (contrib), OpenCV 4.0+: moved to opencv_contrib
Description
I tried OpenCV 4.5.2-dev on Ubuntu 18.04 and 20.04 with gcc-9, CUDA 11 and NVIDIA GPUs. This code snippet:
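#include <opencv2/core.hpp>
#include <opencv2/dnn.hpp>
#include <opencv2/imgcodecs.hpp>
#include <chrono>
#include <cstdio>
#include <vector>

int main() {
    using namespace cv;
    using namespace cv::dnn;
    using std::chrono::duration;
    using std::chrono::high_resolution_clock;

    Net net = readNetFromDarknet("/home/paul/darknet/cfg/yolov3.cfg", "/home/paul/darknet/yolov3.weights");
    net.setPreferableBackend(DNN_BACKEND_CUDA);
    net.setPreferableTarget(DNN_TARGET_CUDA);
    for (int i = 0; i < 7; i++) {
        Mat frame = imread(cv::format("/home/paul/data/s%d.jpg", i));
        std::vector<double> timings;  // per-layer timings filled by getPerfProfile()
        std::vector<Mat> preds;
        Mat blob = blobFromImage(frame, 1/255.0, Size(608, 608), Scalar(0,0,0), true, false);
        net.setInput(blob);
        auto t0 = high_resolution_clock::now();
        net.forward(preds, net.getUnconnectedOutLayersNames());
        auto t1 = high_resolution_clock::now();
        // Left: total reported by getPerfProfile() (ticks -> s); right: wall-clock time around forward()
        std::printf("%.4fs vs. %.4fs %zu preds\n",
                    net.getPerfProfile(timings) / getTickFrequency(),
                    duration<float>{t1 - t0}.count(), preds.size());
    }
}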
produces this output (time reported by getPerfProfile() vs. wall-clock time measured around forward()) on a GTX 1650 Super GPU:
0.2857s vs. 1.2992s 3 preds
0.0038s vs. 0.0565s 3 preds
0.0038s vs. 0.0589s 3 preds
0.0039s vs. 0.0571s 3 preds
0.0039s vs. 0.0520s 3 preds
0.0040s vs. 0.0544s 3 preds
0.0037s vs. 0.0510s 3 preds
and this one on a GT 730 GPU:
0.1767s vs. 1.8613s 3 preds
0.0019s vs. 0.8146s 3 preds
0.0023s vs. 0.7725s 3 preds
0.0023s vs. 0.8310s 3 preds
0.0021s vs. 0.8263s 3 preds
0.0021s vs. 0.8203s 3 preds
0.0021s vs. 0.7628s 3 preds
The timings reported by cv::dnn::Net::getPerfProfile() after cv::dnn::Net::forward() are more than an order of magnitude too short on a fast GPU and two orders of magnitude too short on a slow GPU.