Added DNN Darknet Yolo v2 for object detection #9705
opencv-pushbot merged 1 commit into opencv:master
Conversation
@AlexeyAB, thank you, this is a very valuable contribution! Could you please add some regression test(s) for this functionality?
@vpisarev I added:
```cpp
for (it_type i = net->layers_cfg.begin(); i != net->layers_cfg.end(); ++i) {
    ++layers_counter;
    std::map<std::string, std::string> &layer_params = i->second;
    std::string layer_type = layer_params["type"];
```
Please add an assertion for unknown layer types to prevent unexpected errors. For example, I can't read any model right now because every layer_type ends with a ] character (convolutional], maxpool]) (Ubuntu OS).
It works now, but the Reproducibility_TinyYoloVoc and Reproducibility_YoloVoc tests fail for me. Do they pass locally?
```cpp
 * @param darknetModel path to the .weights file with learned network.
 * @returns Pointer to the created importer, NULL in failure cases.
 */
CV_EXPORTS_W Ptr<Importer> createDarknetImporter(const String &cfgFile, const String &darknetModel = String());
```
We defined methods like createCaffeImporter as deprecated. Please keep only readNetFromDarknet.
```cpp
cv::Mat frame = cv::imread(parser.get<string>("image"), -1);

if (frame.channels() == 4)
```
It isn't necessary: just use imread with default argument (http://docs.opencv.org/master/d4/da8/group__imgcodecs.html#ga288b8b3da0892bd651fce07b3bbd3a56).
@AlexeyAB, thanks, but I meant that cv::imread can read images with alpha into 24bit, http://docs.opencv.org/master/d4/da8/group__imgcodecs.html#gga61d9b0126a3e57d9277ac48327799c80af660544735200cbe942eea09232eb822.
@dkurt I fixed it.
Initially I did it as in ssd_object_detection.cpp and I thought that this has some hidden meaning :)
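For readers following along, here is a self-contained sketch (plain C++, no OpenCV dependency; the function name `bgraToBgr` is mine) of what dropping the alpha channel amounts to — the operation that `cvtColor(..., COLOR_BGRA2BGR)` performs, and that `imread`'s default flag already makes unnecessary:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Drop the alpha channel from interleaved BGRA pixel data, producing BGR.
// This mirrors conceptually what cv::cvtColor(..., COLOR_BGRA2BGR) does,
// and what cv::imread already returns with its default IMREAD_COLOR flag.
std::vector<unsigned char> bgraToBgr(const std::vector<unsigned char>& bgra)
{
    assert(bgra.size() % 4 == 0);
    std::vector<unsigned char> bgr;
    bgr.reserve(bgra.size() / 4 * 3);
    for (std::size_t i = 0; i < bgra.size(); i += 4) {
        bgr.push_back(bgra[i]);     // B
        bgr.push_back(bgra[i + 1]); // G
        bgr.push_back(bgra[i + 2]); // R
        // bgra[i + 3] (alpha) is discarded
    }
    return bgr;
}
```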
```cpp
if (frame.channels() == 4)
    cvtColor(frame, frame, cv::COLOR_BGRA2BGR);
//! [Prepare blob]
Mat preprocessedFrame = preprocess(frame, network_width, network_height);
```
Please use blobFromImage's arguments to do the preprocessing (http://docs.opencv.org/3.3.0/d6/d0f/group__dnn.html#ga0507466a789702eda8ffcdfa37f4d194).
```cpp
return false;

// Darknet ROUTE-layer
if (useRoute) return true;
```
Is there some difference between Route layer and Concat? getMemoryShapes returns true if layer can work in-place (all element-wise layers).
I don't know why, but it doesn't work for Yolo if getMemoryShapes returns false.
The Route layer simply copies outputs from several layers unchanged: https://github.com/pjreddie/darknet/blob/master/src/route_layer.c#L83
It uses copy_cpu() with INCX=1 and INCY=1: https://github.com/pjreddie/darknet/blob/master/src/blas.c#L208
@AlexeyAB, it seems to me the problem is in the route layer with a single input (which means the problem is in the current concat layer with #inputs == 1): https://github.com/pjreddie/darknet/blob/master/cfg/yolo-voc.cfg#L208. It is used like an identity layer, right?
@dkurt Yes, a route with a single input (bottom layer) is used as an identity layer.
@AlexeyAB, could you add an extra branch during route layer creation: add a Concat layer if the number of inputs is more than 1, or an Identity layer otherwise?
@dkurt I added an identity layer for the single-input case. But why can't the concat layer work with 1 input, and why is there no CV_Assert for this case?
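To illustrate the semantics being discussed, a minimal sketch (plain C++; the names are mine, not the dnn API) of what a Darknet route layer does on flattened CHW tensors that share the same spatial size: with several inputs it concatenates them along the channel axis (Concat), with a single input it degenerates into a plain copy (Identity):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Route = channel-wise concatenation of same-H,W tensors stored as flat
// CHW buffers. With one input, the output is simply a copy of that input.
std::vector<float> route(const std::vector<std::vector<float> >& inputs)
{
    assert(!inputs.empty());
    std::vector<float> out;
    for (std::size_t i = 0; i < inputs.size(); ++i)
        out.insert(out.end(), inputs[i].begin(), inputs[i].end());
    return out;
}
```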
```cpp
    setParams.setConcat(layers_vec.size(), layers_vec.data());
}
else if (layer_type == "reorg")
```
I'm a bit confused about the reorg layer. Let the input be:
```
channel_0   channel_1   channel_2   channel_3
0 1         4 5         8 9         c d
2 3         6 7         a b         e f
```
and reorgStride = 2. So the output shape is 4x4x1 and the values are:
```
output
1 4 1 5
8 c 9 d
2 6 3 7
a e b f
```
?
I left the slightly strange original implementation of this layer unchanged. It increases the field of view of each final activation.
Reshape: 26 x 26 x 64 -> 13 x 13 x 256
For stride = 2:
```
input
0, 1, 2, 3,
4, 5, 6, 7,
8, 9, a, b,
c, d, e, f

output
channel_0   channel_1   channel_2   channel_3
0 2         1 3         4 6         5 7
8 a         9 b         c e         d f
```
- OpenCV C++ example: http://coliru.stacked-crooked.com/a/eb13942be083fa3d
- Darknet C++ example: http://coliru.stacked-crooked.com/a/225d7fb1f25b286c
- The param `reverse` is usually absent in the cfg-file of a model, so `reverse = 0` by default: https://github.com/pjreddie/darknet/blob/master/src/parser.c#L387
- Then `l.reverse = reverse;`: https://github.com/pjreddie/darknet/blob/master/src/reorg_layer.c#L28
- So the function `reorg()` is called with `forward=0`: https://github.com/pjreddie/darknet/blob/8215a8864d4ad07e058acafd75b2c6ff6600b9e8/src/reorg_layer.c#L108
- The `reorg_cpu()` implementation uses `out[in_index] = x[out_index];`: https://github.com/pjreddie/darknet/blob/master/src/blas.c#L25
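Putting the links above together, here is a self-contained sketch of the resulting forward mapping (assuming `reorg_cpu` is invoked with the output dimensions and `forward = 0`, as discussed; the function name and signature are mine). It reproduces the stride-2 example above, where a 1 x 4 x 4 input holding 0..f becomes a 4 x 2 x 2 output:

```cpp
#include <cassert>
#include <vector>

// Sketch of the Darknet reorg forward pass: stride s turns a c x h x w input
// into a (c*s*s) x (h/s) x (w/s) output. The index arithmetic follows
// darknet's reorg_cpu with forward = 0, i.e. dst[outIdx] = src[mappedIdx].
std::vector<float> reorg(const std::vector<float>& src,
                         int c, int h, int w, int s)
{
    const int outC = c * s * s, outH = h / s, outW = w / s;
    std::vector<float> dst(src.size());
    for (int k = 0; k < outC; ++k) {
        const int c2 = k % c;       // source channel
        const int offset = k / c;   // which of the s*s sub-grids
        for (int j = 0; j < outH; ++j)
            for (int i = 0; i < outW; ++i) {
                const int srcX = i * s + offset % s;
                const int srcY = j * s + offset / s;
                dst[i + outW * (j + outH * k)] =
                    src[srcX + w * (srcY + h * c2)];
            }
    }
    return dst;
}
```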
Thanks! Anyway, I suggest implementing it as a single layer, or thinking about how we can do the same transformation using existing layers (Permute, Reshape). A Reshape layer doesn't change the data, by definition, in any of the frameworks.
@dkurt I added reorg as a separate layer: reorg_layer.cpp.
```cpp
setParams.setReshape(stride, current_shape.input_channels, current_shape.input_h, current_shape.input_w);

current_shape.input_channels = 256;
```
@AlexeyAB, thank you for the valuable contribution! We need to test all the new functionality carefully. Can you add some unit tests with small few-layer networks, like we do for the other importers? (https://github.com/opencv/opencv/blob/master/modules/dnn/test/test_torch_importer.cpp and https://github.com/opencv/opencv_extra/blob/master/testdata/dnn/torch/torch_gen_test_data.lua, https://github.com/opencv/opencv/blob/master/modules/dnn/test/test_tf_importer.cpp and https://github.com/opencv/opencv_extra/blob/master/testdata/dnn/tensorflow/generate_tf_models.py). For example: write simple configs, run darknet to initialize the weights, pass some random input, get the output, and put the configs/weights/inputs/outputs into a darknet subfolder of opencv_extra/testdata/dnn.
@dkurt So I already added testdata and models for object detection using DNN Darknet Yolo v2 to the
In this pull request there is:
@AlexeyAB, yeah, it's great, but I meant tests for separate layers. First of all, it's necessary to protect the work done from bugs that might appear in future development. The next thing is that BuildBot doesn't test these models for now because they aren't there. My local tests fail, and I think we can solve the problem with small checks for separate layers. I referenced how we write unit tests for different frameworks. The binary size of the required data is not so huge (i.e. less than 0.5MB for TensorFlow layers) and you can add it in a single PR @ opencv_extra.
All tests passed on both Windows 7 x64 and Linux Debian 8.2 x64. Results for comparison with the OpenCV version were obtained on Linux Debian 8.2 using the current last commit of Darknet Yolo v2 compiled with GPU=0, OPENMP=1 and OpenCV=1: https://github.com/pjreddie/darknet Using commands:
```cpp
}
net->transpose = (net->major_ver > 1000) || (net->minor_ver > 1000);

layerShape current_shape;
```
Why do we track shapes? Doesn't the weights file contain the kernel shapes?
Right, the weights file doesn't contain kernel shapes.
Darknet also tracks layer shapes while parsing a cfg-file:
- parse_network_cfg(): https://github.com/pjreddie/darknet/blob/master/src/parser.c#L630
- parse_convolutional(): https://github.com/pjreddie/darknet/blob/master/src/parser.c#L169
- make_convolutional_layer(): https://github.com/pjreddie/darknet/blob/master/src/convolutional_layer.c#L166
- convolutional_out_height(): https://github.com/pjreddie/darknet/blob/master/src/convolutional_layer.c#L66
```cpp
int convolutional_out_height(convolutional_layer l)
{
    return (l.h + 2*l.pad - l.size) / l.stride + 1;
}
```
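As a tiny sanity check of the formula above (the helper name is mine), the standard convolution output-size rule with YOLO-typical values — a 416-pixel input, 3x3 kernel, pad 1 — keeps the size at stride 1 and halves it at stride 2:

```cpp
#include <cassert>

// Spatial output extent of a convolution, as in Darknet's
// convolutional_out_height: (in + 2*pad - kernel) / stride + 1.
int convOutSize(int in, int pad, int kernel, int stride)
{
    return (in + 2 * pad - kernel) / stride + 1;
}
```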
@AlexeyAB, maybe we can remove at least the width/height tracking? As far as I can see, only current_shape.input_channels is used to read the convolutional layer weights.
```cpp
ifile.open(darknetModel, std::ios::binary);
CV_Assert(ifile.is_open());

ifile.read(reinterpret_cast<char *>(&net->major_ver), sizeof(int32_t));
```
The version numbers are used only to decide how many bytes to skip for the seen value; transpose isn't used at all. Please make all unused NetParameter variables local.
@dkurt Yes, a route layer with a single input (bottom layer) is used like an identity layer.
```cpp
void setMaxpool(size_t kernel, size_t pad, size_t stride, size_t channels_num)
{
    cv::dnn::experimental_dnn_v1::LayerParams maxpool_param;
    maxpool_param.set<cv::String>("pool", "max");
```
Please set only the actual parameters: "pool", "kernel_size", "pad", "stride".
Ok. Also, maxpool_param.set<cv::String>("pad_mode", "SAME"); is required for odd layer sizes.
However, only one padding strategy can be used at a time: manual values, or padMode ("SAME", "VALID") from TensorFlow. Please take a look at the "ceil_mode" flag instead: https://github.com/opencv/opencv/blob/master/modules/dnn/src/layers/pooling_layer.cpp#L629.
- The accuracy test passes for Tiny-Yolo if padMode="SAME", with any ceil_mode value.
- The accuracy test can't pass for Tiny-Yolo with any other padMode setting (padMode="VALID", or padMode not set), with any ceil_mode value.
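For the odd 13x13 feature maps where this matters in Tiny-Yolo, the rounding strategies under discussion can be sketched as follows (helper names are mine; the formulas follow the usual TensorFlow SAME/VALID and Caffe-style ceil conventions). With input 13, kernel 2, stride 2, VALID drops the last column (output 6) while SAME and ceil rounding both cover it (output 7):

```cpp
#include <cassert>
#include <cmath>

// "VALID": no padding, floor rounding.
int poolOutValid(int in, int kernel, int stride)
{
    return (in - kernel) / stride + 1;
}
// "SAME": pad so every input element is covered; out = ceil(in / stride).
int poolOutSame(int in, int stride)
{
    return (in + stride - 1) / stride;
}
// Caffe-style: explicit padding with ceil rounding of the window count.
int poolOutCeil(int in, int kernel, int pad, int stride)
{
    return (int)std::ceil((double)(in + 2 * pad - kernel) / stride) + 1;
}
```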
```cpp
int w2 = i*reorgStride + offset % reorgStride;
int h2 = j*reorgStride + offset / reorgStride;
int out_index = w2 + width*reorgStride*(h2 + height*reorgStride*c2);
dstData[in_index] = srcData[out_index];
```
Isn't there a typo in the placement of the in<->out indices?
No, there is no typo; initially I left the slightly strange original implementation of this layer unchanged.
But now I have changed this place so that there is no confusion.
```cpp
CV_Assert(outputs[0][0] > 0 && outputs[0][1] > 0 && outputs[0][2] > 0 && outputs[0][3] > 0);

return true;
```
It seems to me the Reorg layer can't work in-place; getMemoryShapes returns true if a layer can do that.
```cpp
{
    CV_Assert(inputs.size() > 0);
    outputs = std::vector<MatShape>(inputs.size(), shape(inputs[0][1] * inputs[0][2] * anchors, inputs[0][3] / anchors));
    return true;
```
Same as the Reorg layer: it should return false.
```cpp
darknet::LayerParameter lp;
std::string layer_name = toString(layer_id);
if (use_batch_normalize || use_relu) layer_name = "conv_" + layer_name;
```
It's better to always name layers with a type prefix. Moreover, some layers are just named with numbers, and it's hard to debug them.
```cpp
}

cv::dnn::experimental_dnn_v1::LayerParams getParamConvolution(int kernel, int pad,
                                                              int stride, int filters_num, int channels_num)
```
```cpp
    fused_layer_names.push_back(last_layer);
}

void setMaxpool(size_t kernel, size_t pad, size_t stride, size_t channels_num)
```
```cpp
std::string top(const int index) const { return layer_name; }
};

struct layerShape {
```
```cpp
    params.blobs = blobs;
}

void setLastLayerName(std::string layer_name)
```
```cpp
                 inputs[0][3] / reorgStride));

CV_Assert(outputs[0][0] > 0 && outputs[0][1] > 0 && outputs[0][2] > 0 && outputs[0][3] > 0);
```
Please, add an assertion that total(outputs[0]) == total(inputs[0]).
```cpp
int out_c = channels / (reorgStride*reorgStride);

for (int k = 0; k < channels; ++k) {
```
Please make it clearer: iterate over output dimensions and map them to input ones.

Most likely a logical mistake was made in the reorg layer of the original Darknet: https://github.com/pjreddie/darknet/blob/master/src/blas.c#L9
It works as I described if it is called as reorg(input, output, out_w, out_h, out_c, ...); #9705 (comment)
But in the original Darknet the function is called as reorg(input, output, in_w, in_h, in_c, ...);, so the one-to-one correspondence of the input and output parameters is preserved, but very strange permutations occur.
But because the original Darknet works with this implementation of reorg and all models were trained using it, we can't fix this logical mistake.
Why hasn't the author found and corrected this error? I think:
- Perhaps this logical error does not spoil the detection accuracy, so it was never detected.
- Theoretically, we can assume that this error even increased the accuracy, so the author found it and left it.
> iterate over output dimensions and map them to input ones.

So I can implement it this way, but it works correctly only if (in_w % 2 == 0 && in_h % 2 == 0 && in_c % 4 == 0): http://coliru.stacked-crooked.com/a/b962d20938362d4f
```cpp
void reorg_my(const float *const srcData, float *const dstData, int width, int height, int channels, int reorgStride)
{
    int outChannels = channels * reorgStride * reorgStride;
    int outHeight = height / reorgStride;
    int outWidth = width / reorgStride;
    for (int y = 0; y < outHeight; ++y) {
        for (int x = 0; x < outWidth; ++x) {
            for (int c = 0; c < outChannels; ++c) {
                int out_index = x + outWidth*(y + outHeight*c);
                int step = c / channels;
                int x_offset = step % reorgStride;
                int y_offset = reorgStride * ((step / reorgStride) % reorgStride);
                int in_x = x * reorgStride + x_offset;
                int out_seq_y = y + c*outHeight;
                int in_intermediate_y = out_seq_y*2 - out_seq_y%2;
                in_intermediate_y = in_intermediate_y % (channels*height);
                int in_c = in_intermediate_y / height;
                int in_y = in_intermediate_y % height + y_offset;
                int in_index = in_x + width*(in_y + height*in_c);
                dstData[out_index] = srcData[in_index];
            }
        }
    }
}
```
Is there a GPU version of reorg?
```cpp
const float confidenceThreshold = 0.24;

for (int i = 0; i < out.rows; i++) {
    float const*const prob_ptr = &out.at<float>(i, 5);
```
float const*const is a bit confusing (there are 4 places with it).
May I ask you to use named constant variables or add comments? The magic numbers are hard to understand, especially in the samples and tests.
I removed const*const, added named constant variables, and described the format of the network output that is compared to the reference in the tests.
But why is const*const confusing? Is it contrary to the code style conventions accepted in OpenCV?
The 1st const forbids modification of the values pointed to by this pointer; the 2nd const forbids modification of the pointer itself.
```cpp
getParamConvolution(kernel, pad, stride, filters_num);

darknet::LayerParameter lp;
std::string layer_name = "conv_" + toString(layer_id);
```
Please try to use cv::format("conv_%d", layer_id) instead of toString here and in other places.
```cpp
namespace darknet {

class LayerParameter {
```
I hope we can omit this structure. Layers are connected sequentially or via explicit numeric offsets relative to the newly added layer, so I think it's possible to use a single vector of layers during network building. May I ask you to try it?
Do you mean that I should try to use cv::dnn::experimental_dnn_v1::LayerParams instead of darknet::LayerParameter?
Yeah, I think we can just parse the text and binary files simultaneously: for every entry in the config we create a new LayerParams and fill it depending on the layer type. If a specific layer has weights, we read them from the opened binary file. Then we add the layer to the final network (addLayerToPrev, or addLayer with multiple connections based on the id of the new layer and the offsets, i.e. -1, -4 of route).
@AlexeyAB, on the other hand, let's keep it as is for now. I'll just install darknet, compare it with the PR, and then we can merge it.
Will there be a Python example for cv2 Darknet DNN? When running:
This works OK:
This seems to work OK:
But the detection result doesn't make sense:
Referring to yolo_object_detection.cpp. Thanks for your work porting this to OpenCV.
This repo shows a custom face detection model demo using Python. The models were trained on the Widerface dataset and are available for download (weights and cfgs).
It seems that sometimes the implementation does not return the correct detection class type. Even with the default pre-trained 80 class model, sometimes the kind of class is unknown (only zeros in the
Also, when I train my own model with two different classes, it only tells me the confidence, but not the class itself. Here is the dump of the result matrix (2 class model, every class is always zero):
When using darknet to detect the objects, it is always able to tell what kind of object it is (with the confidence), even if the confidence is very low. Do you experience the same behaviour?
@cansik The fact is that for values that are less than the threshold, Darknet zeroes the
You can get the same bounded boxes with the same probability in both OpenCV and Darknet (but not the same probs which are less than the threshold) only with:
Note: in the original Darknet, dets[index].objectness = scale > thresh ? scale : 0; and:
```cpp
if(dets[index].objectness){
    for(j = 0; j < l.classes; ++j){
        int class_index = entry_index(l, 0, n*l.w*l.h + i, l.coords + 1 + j);
        float prob = scale*predictions[class_index];
        dets[index].prob[j] = (prob > thresh) ? prob : 0;
    }
}
```
@AlexeyAB As far as I know, the probability gets cleared by OpenCV. That is ok for me if the confidence is under the threshold. How do I set the threshold in OpenCV? Is it possible to lower it to zero, to get the probabilities of all predictions? Or do you mean by threshold the threshold defined in the
Here is an example which is really strange: On this image the trained network finds three characters (in OpenCV). All of them have a confidence higher than
Second example: For this picture I have exported the result matrix: yolo_results.sheets
Why are there only three probabilities, and not more for each item? Or do I understand the result matrix wrongly?
@cansik, Have you tried to vary
@dkurt Yes, that helped, thank you. I was not sure where to set the threshold, and a (stupid) bug in my result evaluation led to no difference even when I played with this param. Now everything works as expected. But is it possible to set this threshold directly on the
@cansik, OpenCV parses
@AlexeyAB can we visualize our model on TensorBoard after training?
@ahmadfaizan1990 TensorBoard can visualize only data from models that were trained using TensorFlow. For Darknet, you can see a Loss & mAP (accuracy) chart during training: https://github.com/AlexeyAB/darknet#when-should-i-stop-training
@AlexeyAB thanks for your reply. I have one more question: what if I change some convolutional layer in the cfg file, or want to reduce the number of layers? After the change in the cfg file I am getting this error.




This pull request changes
Added neural network Darknet Yolo v2 for object detection: https://pjreddie.com/darknet/yolo/
Added example of usage:
- yolo_object_detection.cpp / example_dnn-yolo_object_detection.exe
- yolo.cfg, yolo-voc.cfg, tiny-yolo.cfg, tiny-yolo-voc.cfg can be downloaded: https://drive.google.com/drive/folders/0BwRgzHpNbsWBN3JtSjBocng5YW8
- yolo9000.cfg

Supported layers:
Merge with extra: opencv/opencv_extra#385
Comparison of use:
original Darknet-Yolo-v2:
darknet.exe detector test data/voc.data yolo-voc.cfg yolo-voc.weights -i 0 -thresh 0.24 data/dog.jpg
OpenCV Yolo example:
example_dnn-yolo_object_detection.exe -cfg=yolo/yolo.cfg -model=yolo/yolo.weights -image=yolo/dog.jpg -min_confidence=0.24
Comparison of results, OpenCV example vs original Darknet: https://github.com/pjreddie/darknet
For cfg, weights and jpg-s from: https://drive.google.com/drive/folders/0BwRgzHpNbsWBN3JtSjBocng5YW8
- yolo.cfg & yolo.weights: dog.jpg, eagle.jpg, giraffe.jpg
- yolo-voc.cfg & yolo-voc.weights: dog.jpg, eagle.jpg, giraffe.jpg

How to train (to detect your custom objects): https://github.com/AlexeyAB/darknet#how-to-train-to-detect-your-custom-objects
Accuracy-speed: