DNN inference is using 3.5 times more memory for 4.8.0 when compared to 4.5.2 #24134

@ukoehler

System Information

OpenCV version: 4.8.0 vs. 4.5.2, compiled from source
Operating System: Both Windows and Linux
Compiler: GCC 11

Detailed description

During regression testing between versions 4.8.0 and 4.5.2, most regression tests ran perfectly. All results were the same; however, the runtime with larger data sets increased dramatically. For smaller data sets I may see a slightly longer runtime. It turns out that swapping was the cause of the slowdown for the larger data sets.

I used valgrind massif to measure memory usage for a single unit test, run in isolation with one of the networks, and noticed that 4.5.2 used 553.8 MB of RAM while version 4.8.0 needed 1.89 GB of RAM for the same task.
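Massif profiles heap allocations under Valgrind. As a lightweight cross-check, peak resident memory can also be read from inside the test process. The following is a minimal sketch and not what produced the numbers above: it assumes a POSIX system (so it does not cover the Windows build), and `peakRssRaw` is a hypothetical helper name. Resident set size and the Massif heap figure measure different things, so the two values are not expected to match exactly.

```cpp
#include <sys/resource.h>  // getrusage (POSIX only)

// Peak resident set size of the calling process.
// Units are platform-specific: kilobytes on Linux, bytes on macOS.
// Returns -1 if the query fails.
long peakRssRaw() {
    struct rusage usage {};
    if (getrusage(RUSAGE_SELF, &usage) != 0) {
        return -1;
    }
    return usage.ru_maxrss;
}
```

Calling this once after `net.forward(...)` in each build would give a rough per-version peak without the slowdown of running under Valgrind.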

Find the network data here:
https://drive.google.com/file/d/1hPetNOt76xze8cD3DriDuZira65IeKc4/view?usp=sharing
and
https://drive.google.com/file/d/1swyevb0xsQhKeFxHOz-G00Dl-mz2wVCD/view?usp=sharing

As requested in issue #23223, I provide the per-layer performance data (there was NO SWAPPING) for 4.8.0:

conv_0 Convolution 2.84259e+07
bn_0 BatchNorm 0
leaky_1 ReLU 0
conv_1 Convolution 6.85735e+07
bn_1 BatchNorm 0
leaky_2 ReLU 0
conv_2 Convolution 1.24519e+07
bn_2 BatchNorm 0
leaky_3 ReLU 0
conv_3 Convolution 2.511e+07
bn_3 BatchNorm 0
leaky_4 ReLU 0
shortcut_4 Eltwise 1.83846e+06
conv_5 Convolution 6.18066e+07
bn_5 BatchNorm 0
leaky_6 ReLU 0
conv_6 Convolution 8.75183e+06
bn_6 BatchNorm 0
leaky_7 ReLU 0
conv_7 Convolution 2.21703e+07
bn_7 BatchNorm 0
leaky_8 ReLU 0
shortcut_8 Eltwise 1.19246e+06
conv_9 Convolution 9.9281e+06
bn_9 BatchNorm 0
leaky_10 ReLU 0
conv_10 Convolution 2.12841e+07
bn_10 BatchNorm 0
leaky_11 ReLU 0
shortcut_11 Eltwise 1.27645e+06
conv_12 Convolution 6.04233e+07
bn_12 BatchNorm 0
leaky_13 ReLU 0
conv_13 Convolution 7.77107e+06
bn_13 BatchNorm 0
leaky_14 ReLU 0
conv_14 Convolution 2.67491e+07
bn_14 BatchNorm 0
leaky_15 ReLU 0
shortcut_15 Eltwise 927776
conv_16 Convolution 8.20111e+06
bn_16 BatchNorm 0
leaky_17 ReLU 0
conv_17 Convolution 2.45839e+07
bn_17 BatchNorm 0
leaky_18 ReLU 0
shortcut_18 Eltwise 905403
conv_19 Convolution 8.05668e+06
bn_19 BatchNorm 0
leaky_20 ReLU 0
conv_20 Convolution 2.45622e+07
bn_20 BatchNorm 0
leaky_21 ReLU 0
shortcut_21 Eltwise 876047
conv_22 Convolution 8.17928e+06
bn_22 BatchNorm 0
leaky_23 ReLU 0
conv_23 Convolution 2.47782e+07
bn_23 BatchNorm 0
leaky_24 ReLU 0
shortcut_24 Eltwise 869224
conv_25 Convolution 8.12968e+06
bn_25 BatchNorm 0
leaky_26 ReLU 0
conv_26 Convolution 2.407e+07
bn_26 BatchNorm 0
leaky_27 ReLU 0
shortcut_27 Eltwise 867241
conv_28 Convolution 9.44835e+06
bn_28 BatchNorm 0
leaky_29 ReLU 0
conv_29 Convolution 2.51462e+07
bn_29 BatchNorm 0
leaky_30 ReLU 0
shortcut_30 Eltwise 861329
conv_31 Convolution 8.20646e+06
bn_31 BatchNorm 0
leaky_32 ReLU 0
conv_32 Convolution 2.53283e+07
bn_32 BatchNorm 0
leaky_33 ReLU 0
shortcut_33 Eltwise 827745
conv_34 Convolution 8.03971e+06
bn_34 BatchNorm 0
leaky_35 ReLU 0
conv_35 Convolution 2.46168e+07
bn_35 BatchNorm 0
leaky_36 ReLU 0
shortcut_36 Eltwise 868062
conv_37 Convolution 6.40316e+07
bn_37 BatchNorm 0
leaky_38 ReLU 0
conv_38 Convolution 7.39178e+06
bn_38 BatchNorm 0
leaky_39 ReLU 0
conv_39 Convolution 5.90104e+07
bn_39 BatchNorm 0
leaky_40 ReLU 0
shortcut_40 Eltwise 627514
conv_41 Convolution 8.2697e+06
bn_41 BatchNorm 0
leaky_42 ReLU 0
conv_42 Convolution 5.19797e+07
bn_42 BatchNorm 0
leaky_43 ReLU 0
shortcut_43 Eltwise 626462
conv_44 Convolution 8.17896e+06
bn_44 BatchNorm 0
leaky_45 ReLU 0
conv_45 Convolution 5.59095e+07
bn_45 BatchNorm 0
leaky_46 ReLU 0
shortcut_46 Eltwise 638976
conv_47 Convolution 8.28036e+06
bn_47 BatchNorm 0
leaky_48 ReLU 0
conv_48 Convolution 5.31428e+07
bn_48 BatchNorm 0
leaky_49 ReLU 0
shortcut_49 Eltwise 628797
conv_50 Convolution 8.21851e+06
bn_50 BatchNorm 0
leaky_51 ReLU 0
conv_51 Convolution 5.88002e+07
bn_51 BatchNorm 0
leaky_52 ReLU 0
shortcut_52 Eltwise 678821
conv_53 Convolution 8.20721e+06
bn_53 BatchNorm 0
leaky_54 ReLU 0
conv_54 Convolution 5.25305e+07
bn_54 BatchNorm 0
leaky_55 ReLU 0
shortcut_55 Eltwise 643926
conv_56 Convolution 8.41081e+06
bn_56 BatchNorm 0
leaky_57 ReLU 0
conv_57 Convolution 5.20968e+07
bn_57 BatchNorm 0
leaky_58 ReLU 0
shortcut_58 Eltwise 637153
conv_59 Convolution 8.26404e+06
bn_59 BatchNorm 0
leaky_60 ReLU 0
conv_60 Convolution 6.02537e+07
bn_60 BatchNorm 0
leaky_61 ReLU 0
shortcut_61 Eltwise 620270
conv_62 Convolution 7.23844e+07
bn_62 BatchNorm 0
leaky_63 ReLU 0
conv_63 Convolution 9.8334e+06
bn_63 BatchNorm 0
leaky_64 ReLU 0
conv_64 Convolution 2.18966e+08
bn_64 BatchNorm 0
leaky_65 ReLU 0
shortcut_65 Eltwise 437773
conv_66 Convolution 9.76036e+06
bn_66 BatchNorm 0
leaky_67 ReLU 0
conv_67 Convolution 2.08698e+08
bn_67 BatchNorm 0
leaky_68 ReLU 0
shortcut_68 Eltwise 433034
conv_69 Convolution 9.82622e+06
bn_69 BatchNorm 0
leaky_70 ReLU 0
conv_70 Convolution 1.76859e+08
bn_70 BatchNorm 0
leaky_71 ReLU 0
shortcut_71 Eltwise 428305
conv_72 Convolution 9.81772e+06
bn_72 BatchNorm 0
leaky_73 ReLU 0
conv_73 Convolution 1.81768e+08
bn_73 BatchNorm 0
leaky_74 ReLU 0
shortcut_74 Eltwise 433105
conv_75 Convolution 1.01164e+07
bn_75 BatchNorm 0
leaky_76 ReLU 0
conv_76 Convolution 1.83666e+08
bn_76 BatchNorm 0
leaky_77 ReLU 0
conv_77 Convolution 9.73596e+06
bn_77 BatchNorm 0
leaky_78 ReLU 0
conv_78 Convolution 1.88721e+08
bn_78 BatchNorm 0
leaky_79 ReLU 0
conv_79 Convolution 9.82813e+06
bn_79 BatchNorm 0
leaky_80 ReLU 0
conv_80 Convolution 1.83166e+08
bn_80 BatchNorm 0
leaky_81 ReLU 0
conv_81 Convolution 5.09009e+06
permute_82 Permute 95973
yolo_82 Region 920492
identity_83 Identity 2254
conv_84 Convolution 2.71254e+06
bn_84 BatchNorm 0
leaky_85 ReLU 0
upsample_85 Resize 836693
concat_86 Concat 272749
conv_87 Convolution 1.14419e+07
bn_87 BatchNorm 0
leaky_88 ReLU 0
conv_88 Convolution 6.20935e+07
bn_88 BatchNorm 0
leaky_89 ReLU 0
conv_89 Convolution 8.19448e+06
bn_89 BatchNorm 0
leaky_90 ReLU 0
conv_90 Convolution 5.79426e+07
bn_90 BatchNorm 0
leaky_91 ReLU 0
conv_91 Convolution 7.96112e+06
bn_91 BatchNorm 0
leaky_92 ReLU 0
conv_92 Convolution 5.80056e+07
bn_92 BatchNorm 0
leaky_93 ReLU 0
conv_93 Convolution 7.66066e+06
permute_94 Permute 195382
yolo_94 Region 3.38031e+06
identity_95 Identity 2866
conv_96 Convolution 2.2153e+06
bn_96 BatchNorm 0
leaky_97 ReLU 0
upsample_97 Resize 739277
concat_98 Concat 548914
conv_99 Convolution 1.13213e+07
bn_99 BatchNorm 0
leaky_100 ReLU 0
conv_100 Convolution 2.55236e+07
bn_100 BatchNorm 0
leaky_101 ReLU 0
conv_101 Convolution 7.85209e+06
bn_101 BatchNorm 0
leaky_102 ReLU 0
conv_102 Convolution 2.45604e+07
bn_102 BatchNorm 0
leaky_103 ReLU 0
conv_103 Convolution 7.7966e+06
bn_103 BatchNorm 0
leaky_104 ReLU 0
conv_104 Convolution 2.83962e+07
bn_104 BatchNorm 0
leaky_105 ReLU 0
conv_105 Convolution 1.36612e+07
permute_106 Permute 1.00528e+06
yolo_106 Region 1.32232e+07

and for 4.5.2:

conv_0 Convolution 2.6347e+07
bn_0 BatchNorm 0
leaky_1 ReLU 0
conv_1 Convolution 7.16753e+07
bn_1 BatchNorm 0
leaky_2 ReLU 0
conv_2 Convolution 1.12384e+07
bn_2 BatchNorm 0
leaky_3 ReLU 0
conv_3 Convolution 7.2381e+07
bn_3 BatchNorm 0
leaky_4 ReLU 0
shortcut_4 Eltwise 3.77401e+06
conv_5 Convolution 6.44327e+07
bn_5 BatchNorm 0
leaky_6 ReLU 0
conv_6 Convolution 9.09992e+06
bn_6 BatchNorm 0
leaky_7 ReLU 0
conv_7 Convolution 6.48448e+07
bn_7 BatchNorm 0
leaky_8 ReLU 0
shortcut_8 Eltwise 1.94067e+06
conv_9 Convolution 9.30628e+06
bn_9 BatchNorm 0
leaky_10 ReLU 0
conv_10 Convolution 6.53347e+07
bn_10 BatchNorm 0
leaky_11 ReLU 0
shortcut_11 Eltwise 1.98969e+06
conv_12 Convolution 6.24834e+07
bn_12 BatchNorm 0
leaky_13 ReLU 0
conv_13 Convolution 7.92278e+06
bn_13 BatchNorm 0
leaky_14 ReLU 0
conv_14 Convolution 6.22588e+07
bn_14 BatchNorm 0
leaky_15 ReLU 0
shortcut_15 Eltwise 1.02496e+06
conv_16 Convolution 7.93264e+06
bn_16 BatchNorm 0
leaky_17 ReLU 0
conv_17 Convolution 6.22617e+07
bn_17 BatchNorm 0
leaky_18 ReLU 0
shortcut_18 Eltwise 1.05038e+06
conv_19 Convolution 7.85361e+06
bn_19 BatchNorm 0
leaky_20 ReLU 0
conv_20 Convolution 6.43121e+07
bn_20 BatchNorm 0
leaky_21 ReLU 0
shortcut_21 Eltwise 1.03558e+06
conv_22 Convolution 7.8001e+06
bn_22 BatchNorm 0
leaky_23 ReLU 0
conv_23 Convolution 6.2343e+07
bn_23 BatchNorm 0
leaky_24 ReLU 0
shortcut_24 Eltwise 1.06368e+06
conv_25 Convolution 8.00235e+06
bn_25 BatchNorm 0
leaky_26 ReLU 0
conv_26 Convolution 6.22367e+07
bn_26 BatchNorm 0
leaky_27 ReLU 0
shortcut_27 Eltwise 1.06686e+06
conv_28 Convolution 7.59016e+06
bn_28 BatchNorm 0
leaky_29 ReLU 0
conv_29 Convolution 6.22161e+07
bn_29 BatchNorm 0
leaky_30 ReLU 0
shortcut_30 Eltwise 1.0338e+06
conv_31 Convolution 7.76942e+06
bn_31 BatchNorm 0
leaky_32 ReLU 0
conv_32 Convolution 6.23441e+07
bn_32 BatchNorm 0
leaky_33 ReLU 0
shortcut_33 Eltwise 1.05267e+06
conv_34 Convolution 7.66518e+06
bn_34 BatchNorm 0
leaky_35 ReLU 0
conv_35 Convolution 6.19013e+07
bn_35 BatchNorm 0
leaky_36 ReLU 0
shortcut_36 Eltwise 1.04865e+06
conv_37 Convolution 6.41796e+07
bn_37 BatchNorm 0
leaky_38 ReLU 0
conv_38 Convolution 7.42708e+06
bn_38 BatchNorm 0
leaky_39 ReLU 0
conv_39 Convolution 6.40619e+07
bn_39 BatchNorm 0
leaky_40 ReLU 0
shortcut_40 Eltwise 624418
conv_41 Convolution 7.63641e+06
bn_41 BatchNorm 0
leaky_42 ReLU 0
conv_42 Convolution 6.43142e+07
bn_42 BatchNorm 0
leaky_43 ReLU 0
shortcut_43 Eltwise 652441
conv_44 Convolution 7.49428e+06
bn_44 BatchNorm 0
leaky_45 ReLU 0
conv_45 Convolution 6.42385e+07
bn_45 BatchNorm 0
leaky_46 ReLU 0
shortcut_46 Eltwise 631932
conv_47 Convolution 7.50422e+06
bn_47 BatchNorm 0
leaky_48 ReLU 0
conv_48 Convolution 6.4686e+07
bn_48 BatchNorm 0
leaky_49 ReLU 0
shortcut_49 Eltwise 597918
conv_50 Convolution 7.67551e+06
bn_50 BatchNorm 0
leaky_51 ReLU 0
conv_51 Convolution 6.40918e+07
bn_51 BatchNorm 0
leaky_52 ReLU 0
shortcut_52 Eltwise 652051
conv_53 Convolution 7.59911e+06
bn_53 BatchNorm 0
leaky_54 ReLU 0
conv_54 Convolution 6.41076e+07
bn_54 BatchNorm 0
leaky_55 ReLU 0
shortcut_55 Eltwise 640980
conv_56 Convolution 7.52332e+06
bn_56 BatchNorm 0
leaky_57 ReLU 0
conv_57 Convolution 6.55695e+07
bn_57 BatchNorm 0
leaky_58 ReLU 0
shortcut_58 Eltwise 714310
conv_59 Convolution 7.76392e+06
bn_59 BatchNorm 0
leaky_60 ReLU 0
conv_60 Convolution 6.40974e+07
bn_60 BatchNorm 0
leaky_61 ReLU 0
shortcut_61 Eltwise 631242
conv_62 Convolution 7.06249e+07
bn_62 BatchNorm 0
leaky_63 ReLU 0
conv_63 Convolution 8.10891e+06
bn_63 BatchNorm 0
leaky_64 ReLU 0
conv_64 Convolution 7.02548e+07
bn_64 BatchNorm 0
leaky_65 ReLU 0
shortcut_65 Eltwise 611314
conv_66 Convolution 1.01592e+07
bn_66 BatchNorm 0
leaky_67 ReLU 0
conv_67 Convolution 7.04279e+07
bn_67 BatchNorm 0
leaky_68 ReLU 0
shortcut_68 Eltwise 464534
conv_69 Convolution 8.1087e+06
bn_69 BatchNorm 0
leaky_70 ReLU 0
conv_70 Convolution 7.0373e+07
bn_70 BatchNorm 0
leaky_71 ReLU 0
shortcut_71 Eltwise 740880
conv_72 Convolution 1.54541e+07
bn_72 BatchNorm 0
leaky_73 ReLU 0
conv_73 Convolution 7.02675e+07
bn_73 BatchNorm 0
leaky_74 ReLU 0
shortcut_74 Eltwise 443173
conv_75 Convolution 8.13352e+06
bn_75 BatchNorm 0
leaky_76 ReLU 0
conv_76 Convolution 7.05095e+07
bn_76 BatchNorm 0
leaky_77 ReLU 0
conv_77 Convolution 8.21635e+06
bn_77 BatchNorm 0
leaky_78 ReLU 0
conv_78 Convolution 7.65501e+07
bn_78 BatchNorm 0
leaky_79 ReLU 0
conv_79 Convolution 1.01681e+07
bn_79 BatchNorm 0
leaky_80 ReLU 0
conv_80 Convolution 7.04393e+07
bn_80 BatchNorm 0
leaky_81 ReLU 0
conv_81 Convolution 4.17549e+06
permute_82 Permute 95582
yolo_82 Region 911274
identity_83 Identity 3156
conv_84 Convolution 2.1118e+06
bn_84 BatchNorm 0
leaky_85 ReLU 0
upsample_85 Resize 1.04758e+06
concat_86 Concat 279501
conv_87 Convolution 1.13795e+07
bn_87 BatchNorm 0
leaky_88 ReLU 0
conv_88 Convolution 6.51143e+07
bn_88 BatchNorm 0
leaky_89 ReLU 0
conv_89 Convolution 7.61145e+06
bn_89 BatchNorm 0
leaky_90 ReLU 0
conv_90 Convolution 6.43658e+07
bn_90 BatchNorm 0
leaky_91 ReLU 0
conv_91 Convolution 7.65415e+06
bn_91 BatchNorm 0
leaky_92 ReLU 0
conv_92 Convolution 6.39612e+07
bn_92 BatchNorm 0
leaky_93 ReLU 0
conv_93 Convolution 7.23372e+06
permute_94 Permute 266236
yolo_94 Region 3.42418e+06
identity_95 Identity 3046
conv_96 Convolution 2.09349e+06
bn_96 BatchNorm 0
leaky_97 ReLU 0
upsample_97 Resize 766659
concat_98 Concat 398589
conv_99 Convolution 1.33242e+07
bn_99 BatchNorm 0
leaky_100 ReLU 0
conv_100 Convolution 6.18128e+07
bn_100 BatchNorm 0
leaky_101 ReLU 0
conv_101 Convolution 7.52835e+06
bn_101 BatchNorm 0
leaky_102 ReLU 0
conv_102 Convolution 6.18159e+07
bn_102 BatchNorm 0
leaky_103 ReLU 0
conv_103 Convolution 7.52287e+06
bn_103 BatchNorm 0
leaky_104 ReLU 0
conv_104 Convolution 6.1976e+07
bn_104 BatchNorm 0
leaky_105 ReLU 0
conv_105 Convolution 1.44498e+07
permute_106 Permute 877501
yolo_106 Region 1.30326e+07
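The two listings are easier to compare layer by layer than by eye. A minimal sketch, assuming each line has the three-column form `name type ticks` printed by the reproducer in this report; `Profile`, `parseProfile` and `slowdown` are hypothetical names introduced only for illustration:

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Layer name -> reported timing value.
using Profile = std::map<std::string, double>;

// Parse a listing of "<layer name> <layer type> <ticks>" lines.
// Lines that do not match this three-column form are skipped.
Profile parseProfile(std::istream& in) {
    Profile profile;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string name, type;
        double ticks = 0.0;
        if (fields >> name >> type >> ticks) {
            profile[name] = ticks;
        }
    }
    return profile;
}

// Ratio newer/older for every layer present in both profiles.
// Layers whose baseline value is 0 are left out to avoid dividing by zero.
std::map<std::string, double> slowdown(const Profile& newer, const Profile& older) {
    std::map<std::string, double> ratios;
    for (const auto& entry : newer) {
        auto it = older.find(entry.first);
        if (it != older.end() && it->second > 0.0) {
            ratios[entry.first] = entry.second / it->second;
        }
    }
    return ratios;
}
```

Applied to the data above, `conv_64` comes out at about 3.1 (2.18966e+08 / 7.02548e+07) while `conv_3` comes out at about 0.35 (2.511e+07 / 7.2381e+07), so the timing change between the two versions is not uniform across layers.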

Steps to reproduce

        // Fragment from the test loop: iter, net (a cv::dnn::Net) and outputnames are set up elsewhere.
        std::string fileName = (*iter);
        // imreadpng is not an OpenCV function; presumably a project-specific PNG loader.
        cv::Mat image = imreadpng( fileName, cv::IMREAD_UNCHANGED );
        cv::Mat blob = cv::dnn::blobFromImage(image, 1/255.0, cv::Size(416, 416), cv::Scalar(0, 0, 0), true, false);
        net.setInput(blob);
        std::vector<cv::Mat> outs;
        net.forward(outs, outputnames);

        // Print one line per layer: name, type, and timing (getPerfProfile reports ticks).
        std::vector<double> timings;
        net.getPerfProfile(timings);
        std::vector<std::string> names = net.getLayerNames();
        CV_Assert(names.size() == timings.size());
        for (size_t i = 0; i < names.size(); ++i) {
            cv::Ptr<cv::dnn::Layer> l = net.getLayer(net.getLayerId(names[i]));
            std::cout << names[i] << " " << l->type << " " << timings[i] << std::endl;
        }
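The values printed by this loop are tick counts, not seconds. A minimal sketch of converting them to milliseconds, kept free of OpenCV so it stands alone: `ticksToMs` and `totalMs` are hypothetical helpers, and `ticksPerSecond` is assumed to be the value returned by `cv::getTickFrequency()` on the machine that produced the timings.

```cpp
#include <numeric>
#include <vector>

// Convert a tick count to milliseconds for a given tick frequency (ticks per second).
double ticksToMs(double ticks, double ticksPerSecond) {
    return ticks * 1000.0 / ticksPerSecond;
}

// Sum the per-layer tick counts and convert the total to milliseconds.
double totalMs(const std::vector<double>& layerTicks, double ticksPerSecond) {
    const double total = std::accumulate(layerTicks.begin(), layerTicks.end(), 0.0);
    return ticksToMs(total, ticksPerSecond);
}
```

In the reproducer this would be called as `totalMs(timings, cv::getTickFrequency())`, which makes the totals of the two versions directly comparable across machines with different tick frequencies.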

Issue submission checklist

  • I report the issue, it's not a question
  • I checked the problem with documentation, FAQ, open issues, forum.opencv.org, Stack Overflow, etc and have not found any solution
  • I updated to the latest OpenCV version and the issue is still there
  • There is reproducer code and related data files (videos, images, onnx, etc)
