Performance Loss from OpenCV 4.5.5 to 4.7.0 using CUDA backend #23278
Description
System Information
OpenCV versions tested: 4.5.5, 4.7.0
Operating System / Platform: Ubuntu 18.04
Device: NVIDIA Jetson TX2 DevKit
CUDA version: 10.2
CUDNN version: 8.2.1
Detailed description
Hi,
I was using OpenCV 4.5.5 with the CUDA backend on an NVIDIA Jetson TX2 DevKit with the specs listed above. A couple of days ago I decided to update to OpenCV 4.7.0 to check whether it would improve performance for the models I'm currently using. However, what I saw instead was a performance loss (longer execution times) for the majority of the models. Do you know the reason for this loss of performance?
These are the execution times obtained with both versions of OpenCV:
Test 1
- Device: TX2 DevKit
- CUDA version: 10.2
- CUDNN version: 8.2.1
- OpenCV version: 4.7.0
| Metric | Model 1 | Model 2 | Model 3 | Model 4 |
|---|---|---|---|---|
| Input Size | (112, 112) | (112, 112) | (112, 112) | (112, 112) |
| Model Architecture | Resnet100 | MobileFaceNet | Resnet100 | Resnet18 |
| Jetson CPU (ms) | 702 | 20.5 | 699 | 167 |
| Jetson GPU (ms) | 91.7 | 10.5 | 91.6 | 52.2 |
Test 2
- Device: TX2 DevKit
- CUDA version: 10.2
- CUDNN version: 8.2.1
- OpenCV version: 4.5.5
| Metric | Model 1 | Model 2 | Model 3 | Model 4 |
|---|---|---|---|---|
| Input Size | (112, 112) | (112, 112) | (112, 112) | (112, 112) |
| Model Architecture | Resnet100 | MobileFaceNet | Resnet100 | Resnet18 |
| Jetson CPU (ms) | 1088 | 23.1 | 1096 | 257 |
| Jetson GPU (ms) | 60.9 | 5.34 | 60.7 | 19.9 |
Note: Both tests used builds made with the same OpenCV flags and requirements; the only thing that changed was the version of opencv and opencv_contrib. All execution times in the tables are in ms.
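To make the size of the regression easier to read, the table values can be turned into ratios. The sketch below is only arithmetic on the numbers reported above; the array and function names (`kGpuMs470`, `timeRatio`, `printRatios`) are made up for illustration. With these numbers the GPU times come out roughly 1.5x to 2.6x slower on 4.7.0, while the CPU times are lower on 4.7.0 for all four models.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <iostream>

// Execution times in ms, copied from the tables above (Model 1..4).
const std::array<double, 4> kGpuMs455 = {60.9, 5.34, 60.7, 19.9};
const std::array<double, 4> kGpuMs470 = {91.7, 10.5, 91.6, 52.2};
const std::array<double, 4> kCpuMs455 = {1088.0, 23.1, 1096.0, 257.0};
const std::array<double, 4> kCpuMs470 = {702.0, 20.5, 699.0, 167.0};

// Ratio of the 4.7.0 time to the 4.5.5 time for one model.
// A value above 1 means 4.7.0 is slower; below 1 means it is faster.
double timeRatio(const std::array<double, 4>& ms470,
                 const std::array<double, 4>& ms455,
                 std::size_t model) {
    assert(model < ms470.size());
    return ms470[model] / ms455[model];
}

// Prints one line per model with the GPU and CPU ratios.
void printRatios() {
    for (std::size_t m = 0; m < kGpuMs470.size(); ++m) {
        std::cout << "Model " << (m + 1)
                  << "  GPU 4.7.0/4.5.5: " << timeRatio(kGpuMs470, kGpuMs455, m)
                  << "  CPU 4.7.0/4.5.5: " << timeRatio(kCpuMs470, kCpuMs455, m)
                  << std::endl;
    }
}
```

Calling `printRatios()` from a `main` function prints the four GPU and CPU ratios side by side.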
Steps to reproduce
You can use this piece of code to reproduce this issue/loss of performance:
#include <chrono>
#include <cstdlib>
#include <exception>
#include <iostream>
#include <string>
#include <vector>
#include <opencv2/core.hpp>
#include <opencv2/dnn.hpp>
#include <opencv2/imgcodecs.hpp>
#include <opencv2/imgproc.hpp>

// Usage: <program> <image> <model.onnx> <input width> <input height>
int main(int argc, char** argv)
{
    if (argc < 5) {
        std::cerr << "Usage: " << argv[0] << " <image> <model.onnx> <width> <height>" << std::endl;
        return 1;
    }
    std::string imagefilename = argv[1];
    std::string modelToTestOnnx = argv[2];
    int modelInputWidth = std::atoi(argv[3]);
    int modelInputHeight = std::atoi(argv[4]);
    cv::Size currSize(modelInputWidth, modelInputHeight);
    unsigned int num_inferences = 100;

    // Load the ONNX model and select the CUDA backend and target.
    cv::dnn::Net net = cv::dnn::readNetFromONNX(modelToTestOnnx);
    net.setPreferableBackend(cv::dnn::DNN_BACKEND_CUDA);
    net.setPreferableTarget(cv::dnn::DNN_TARGET_CUDA);

    // Read the test image and resize it to the model input size.
    cv::Mat img = cv::imread(imagefilename, cv::IMREAD_ANYCOLOR);
    if (img.empty()) {
        std::cerr << "Could not read image: " << imagefilename << std::endl;
        return 1;
    }
    cv::Mat resized;
    cv::resize(img, resized, currSize);

    // Build a single-image batch scaled to [0, 1], without channel swap or crop.
    std::vector<cv::Mat> imgBatch = { resized };
    bool swaprbchannels = false;
    cv::Mat blob = cv::dnn::blobFromImages(imgBatch, 1.0f / 255.0f, cv::Size(), cv::Scalar(), swaprbchannels, false, CV_32F);
    net.setInput(blob);
    std::vector<cv::String> unconnectedOutLayerNames = net.getUnconnectedOutLayersNames();
    std::vector<cv::Mat> outputs;

    // First forward() call: timed separately because it also includes one-time network setup.
    auto timeLoadModelPlusInference1 = std::chrono::high_resolution_clock::now();
    net.forward(outputs, unconnectedOutLayerNames);
    auto timeLoadModelPlusInference2 = std::chrono::high_resolution_clock::now();
    std::chrono::duration<double, std::milli> ms_doubleTimeLoadModelPlusInference = timeLoadModelPlusInference2 - timeLoadModelPlusInference1;
    std::cout << "Execution time (load model + inference): " << ms_doubleTimeLoadModelPlusInference.count() << std::endl; // in ms

    // Average time per inference over num_inferences further calls.
    auto time1 = std::chrono::high_resolution_clock::now();
    try {
        for (unsigned int i = 0; i < num_inferences; i++)
            net.forward(outputs, unconnectedOutLayerNames);
    }
    catch (std::exception& ex)
    {
        std::cout << ex.what() << std::endl;
    }
    auto time2 = std::chrono::high_resolution_clock::now();
    std::chrono::duration<double, std::milli> ms_double = time2 - time1;
    std::cout << "Execution time inference only: " << ms_double.count() / num_inferences << std::endl; // in ms

    if (!outputs.empty()) {
        std::cout << "Outputs Size: " << outputs[0].size[0] << "x" << outputs[0].size[1] << std::endl;
        std::cout << "Outputs value: " << outputs[0] << std::endl;
    }
    return 0;
}
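The reproducer reports a mean over 100 calls, which a few slow iterations can skew. A possible cross-check, sketched here as an assumption rather than as part of the original measurement, is to discard some warm-up calls and report the median instead. `medianMs` is a hypothetical helper, not an OpenCV function, and it uses only the standard library.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of OpenCV): calls fn() `warmup` times without
// timing, then times `runs` further calls individually and returns the median
// duration of those timed calls in milliseconds.
template <typename Fn>
double medianMs(Fn&& fn, std::size_t warmup, std::size_t runs) {
    assert(runs > 0);
    for (std::size_t i = 0; i < warmup; ++i)
        fn();

    std::vector<double> samples;
    samples.reserve(runs);
    for (std::size_t i = 0; i < runs; ++i) {
        auto t0 = std::chrono::steady_clock::now();
        fn();
        auto t1 = std::chrono::steady_clock::now();
        samples.push_back(std::chrono::duration<double, std::milli>(t1 - t0).count());
    }

    // Median: middle sample, or the mean of the two middle samples for even counts.
    std::sort(samples.begin(), samples.end());
    const std::size_t mid = runs / 2;
    return (runs % 2 == 1) ? samples[mid] : 0.5 * (samples[mid - 1] + samples[mid]);
}
```

In the reproducer this could be called as `medianMs([&] { net.forward(outputs, unconnectedOutLayerNames); }, 10, 100)`, where the warm-up count of 10 is an arbitrary choice.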
Issue submission checklist
- I report the issue, it's not a question
- I checked the problem with documentation, FAQ, open issues, forum.opencv.org, Stack Overflow, etc and have not found any solution
- I updated to the latest OpenCV version and the issue is still there
- There is reproducer code and related data files (videos, images, onnx, etc)