
Added DNN Darknet Yolo v2 for object detection#9705

Merged
opencv-pushbot merged 1 commit into opencv:master from AlexeyAB:dnn_darknet_yolo_v2
Oct 10, 2017

Conversation

@AlexeyAB
Contributor

@AlexeyAB AlexeyAB commented Sep 24, 2017

opencv_extra=dnn_model_darknet_yolo_v2

This pull request changes:

Added the Darknet Yolo v2 neural network for object detection: https://pjreddie.com/darknet/yolo/
Added an example of usage: yolo_object_detection.cpp / example_dnn-yolo_object_detection.exe

Supported layers:

  • route (as concat-layer)
  • reorg (as an addition to the reshape-layer)
  • maxpool
  • convolutional (conv+bn+relu)
  • region (detection_out) - added layer

Merge with extra: opencv/opencv_extra#385


Usage comparison:

  • original Darknet-Yolo-v2: darknet.exe detector test data/voc.data yolo-voc.cfg yolo-voc.weights -i 0 -thresh 0.24 data/dog.jpg

  • OpenCV Yolo example: example_dnn-yolo_object_detection.exe -cfg=yolo/yolo.cfg -model=yolo/yolo.weights -image=yolo/dog.jpg -min_confidence=0.24


Comparison of results (OpenCV example vs. original Darknet): https://github.com/pjreddie/darknet

Using the cfg, weights, and jpg files from: https://drive.google.com/drive/folders/0BwRgzHpNbsWBN3JtSjBocng5YW8

  • Network resolution: 416 x 416
  • threshold = 0.24
  • nms-threshold = 0.4
  1. yolo.cfg & yolo.weights

    • dog.jpg using yolo.cfg
      coco_dog

    • eagle.jpg using yolo.cfg
      coco_eagle

    • giraffe.jpg using yolo.cfg
      coco_giraffe


  2. yolo-voc.cfg & yolo-voc.weights

    • dog.jpg using yolo-voc.cfg
      voc_dog

    • eagle.jpg using yolo-voc.cfg
      voc_eagle

    • giraffe.jpg using yolo-voc.cfg
      voc_giraffe


How to train (to detect your custom objects): https://github.com/AlexeyAB/darknet#how-to-train-to-detect-your-custom-objects


Accuracy-speed:

https://hsto.org/files/a24/21e/068/a2421e0689fb43f08584de9d44c2215f.jpg

https://hsto.org/files/3a6/fdf/b53/3a6fdfb533f34cee9b52bdd9bb0b19d9.jpg

@vpisarev
Contributor

@AlexeyAB, thank you, this is a very valuable contribution! Could you please add some regression test(s) for this functionality?

@AlexeyAB
Contributor Author

@vpisarev I added: modules/dnn/test/test_darknet_importer.cpp
Also added test data and models (cfg, weights) for DNN Darknet Yolo v2: opencv/opencv_extra#385

for (it_type i = net->layers_cfg.begin(); i != net->layers_cfg.end(); ++i) {
++layers_counter;
std::map<std::string, std::string> &layer_params = i->second;
std::string layer_type = layer_params["type"];
Member

@dkurt dkurt Sep 26, 2017


Please add an assertion for unknown layer types to prevent unexpected errors. For example, I can't read any model now because every layer_type ends with a ']' character (convolutional], maxpool]) on Ubuntu.

Member

It works now, but the Reproducibility_TinyYoloVoc and Reproducibility_YoloVoc tests fail for me. Do they pass locally?

* @param darknetModel path to the .weights file with learned network.
* @returns Pointer to the created importer, NULL in failure cases.
*/
CV_EXPORTS_W Ptr<Importer> createDarknetImporter(const String &cfgFile, const String &darknetModel = String());
Member

Methods like createCaffeImporter are deprecated. Please keep only readNetFromDarknet.


cv::Mat frame = cv::imread(parser.get<string>("image"), -1);

if (frame.channels() == 4)
Member

It isn't necessary: just use imread with the default argument (http://docs.opencv.org/master/d4/da8/group__imgcodecs.html#ga288b8b3da0892bd651fce07b3bbd3a56).


Contributor Author

@dkurt I fixed it.
Initially I did it as in ssd_object_detection.cpp and thought it had some hidden meaning :)

if (frame.channels() == 4)
cvtColor(frame, frame, cv::COLOR_BGRA2BGR);
//! [Prepare blob]
Mat preprocessedFrame = preprocess(frame, network_width, network_height);
Member

Please use blobFromImage's arguments to do the preprocessing (http://docs.opencv.org/3.3.0/d6/d0f/group__dnn.html#ga0507466a789702eda8ffcdfa37f4d194).

@AlexeyAB AlexeyAB force-pushed the dnn_darknet_yolo_v2 branch 2 times, most recently from ce7d140 to b32cdab Compare September 26, 2017 15:31
return false;

// Darknet ROUTE-layer
if (useRoute) return true;
Member

Is there any difference between the Route layer and Concat? getMemoryShapes returns true if the layer can work in-place (all element-wise layers).

Contributor Author

I don't know why, but it doesn't work for Yolo if getMemoryShapes returns false.
The Route layer simply copies unchanged outputs from several layers: https://github.com/pjreddie/darknet/blob/master/src/route_layer.c#L83

It uses copy_cpu() with INCX=1 and INCY=1: https://github.com/pjreddie/darknet/blob/master/src/blas.c#L208

Member

@AlexeyAB, it seems to me the problem is in the route layer with a single input (i.e. in the current concat layer with #inputs == 1): https://github.com/pjreddie/darknet/blob/master/cfg/yolo-voc.cfg#L208. It is used as an identity layer, right?

Contributor Author

@dkurt Yes, a route with a single input (bottom layer) is used as an identity layer.

Member

@AlexeyAB, could you add an extra branch during route layer creation: add a Concat layer if the number of inputs is more than 1, or an Identity layer otherwise?

Contributor Author

@dkurt I added an identity layer for the single-input case. But why can't the concat layer work with 1 input, and why is there no CV_Assert for this case?


setParams.setConcat(layers_vec.size(), layers_vec.data());
}
else if (layer_type == "reorg")
Member

I'm a bit confused about the reorg layer. Suppose the input is:

channel_0  channel_1  channel_2  channel_3
0 1        4 5        8 9        c d
2 3        6 7        a b        e f

and reorgStride = 2, so the output shape is 4x4x1. Are the values:

output
1 4 1 5
8 c 9 d
2 6 3 7
a e b f

?

Contributor Author

@AlexeyAB AlexeyAB Sep 27, 2017

I left the somewhat strange original implementation of this layer unchanged. It increases the receptive field of each final activation.
Reshape: 26 x 26 x 64 -> 13 x 13 x 256
reorg

For stride = 2

input
    0, 1, 2, 3,
    4, 5, 6, 7, 
    8, 9, a, b,
    c, d, e, f
output
channel_0  channel_1  channel_2  channel_3
0 2        1 3        4 6        5 7
8 a        9 b        c e        d f

Member

Thanks! Anyway, I suggest implementing it as a separate layer, or thinking about how we can do the same transformation using existing ones (Permute, Reshape). By definition, a Reshape layer doesn't change the data in any of the frameworks.

Contributor Author

@dkurt I added reorg as a separate layer: reorg_layer.cpp


setParams.setReshape(stride, current_shape.input_channels, current_shape.input_h, current_shape.input_w);

current_shape.input_channels = 256;
Member

Magic number?

@dkurt
Member

dkurt commented Sep 27, 2017

@AlexeyAB, thank you for the valuable contribution! We need to test all the new functionality carefully. Can you add some unit tests with small few-layer networks, like we do for the other importers? (https://github.com/opencv/opencv/blob/master/modules/dnn/test/test_torch_importer.cpp and https://github.com/opencv/opencv_extra/blob/master/testdata/dnn/torch/torch_gen_test_data.lua; https://github.com/opencv/opencv/blob/master/modules/dnn/test/test_tf_importer.cpp and https://github.com/opencv/opencv_extra/blob/master/testdata/dnn/tensorflow/generate_tf_models.py). For example: write simple configs, run darknet to initialize the weights, pass some random input, get the output, then put the configs/weights/inputs/outputs into the opencv_extra/testdata/dnn darknet subfolder?

@AlexeyAB
Contributor Author

AlexeyAB commented Sep 27, 2017

@dkurt I already added test data and models for object detection with DNN Darknet Yolo v2 to opencv_extra: opencv/opencv_extra#385

  • testdata/dnn/dogr.jpg - test image resized to the network size 416x416, to eliminate the side effects of resizing
  • testdata/dnn/tiny-yolo-voc.cfg - tiny model of Yolo v2 for Pascal VOC dataset
  • testdata/dnn/yolo-voc.cfg - full model of Yolo v2 for Pascal VOC dataset
  • Changed testdata/dnn/download_models.py:
    • Downloads https://pjreddie.com/media/files/yolo-voc.weights - full model of Yolo v2 trained for Pascal VOC dataset
    • Downloads https://pjreddie.com/media/files/tiny-yolo-voc.weights - tiny model of Yolo v2 trained for Pascal VOC dataset

This pull request already contains: modules/dnn/test/test_darknet_importer.cpp

@dkurt
Member

dkurt commented Sep 27, 2017

@AlexeyAB, yeah, it's great, but I meant tests for separate layers. First of all, it's necessary to protect your work from bugs that might appear in future development. Also, the BuildBot doesn't test these models for now because they aren't there. My local tests fail, and I think we can solve the problem with small checks for separate layers.

[----------] 1 test from Reproducibility_TinyYoloVoc
[ RUN      ] Reproducibility_TinyYoloVoc.Accuracy
unknown file: Failure
C++ exception with description "/home/dkurtaev/opencv/modules/ts/src/ts_func.cpp:1374: error: (-215) src1.type() == src2.type() && src1.size == src2.size in function norm
" thrown in the test body.
[  FAILED  ] Reproducibility_TinyYoloVoc.Accuracy (101 ms)
[----------] 1 test from Reproducibility_TinyYoloVoc (101 ms total)

[----------] 1 test from Reproducibility_YoloVoc
[ RUN      ] Reproducibility_YoloVoc.Accuracy
/home/dkurtaev/opencv/modules/dnn/test/test_common.hpp:54: Failure
Expected: (normL1) <= (l1), actual: 0.000232658 vs 1e-05
/home/dkurtaev/opencv/modules/dnn/test/test_common.hpp:57: Failure
Expected: (normInf) <= (lInf), actual: 0.00485086 vs 0.0001
[  FAILED  ] Reproducibility_YoloVoc.Accuracy (317 ms)
[----------] 1 test from Reproducibility_YoloVoc (317 ms total)

I referenced how we write unit tests for the different frameworks. The binary size of the required data is not so large (e.g. less than 0.5 MB for the TensorFlow layers), and you can add it in a single PR @ opencv_extra.

@AlexeyAB AlexeyAB force-pushed the dnn_darknet_yolo_v2 branch 2 times, most recently from bbf860a to c3bc2ca Compare September 28, 2017 22:42
@AlexeyAB
Contributor Author

AlexeyAB commented Sep 28, 2017

@dkurt

  1. I replaced the test image dogr.jpg with the lossless dog416.png in opencv_extra, and now it works.
  2. I added tests for the Region and Reorg layers. Added to opencv_extra: region.cfg, region.npy, region.input.npy, reorg.cfg, reorg.npy, reorg.input.npy.

All tests pass on both Windows 7 x64 and Linux Debian 8.2 x64:

[----------] 2 tests from Test_Darknet
[ RUN      ] Test_Darknet.read_tiny_yolo_voc
[       OK ] Test_Darknet.read_tiny_yolo_voc (0 ms)
[ RUN      ] Test_Darknet.read_yolo_voc
[       OK ] Test_Darknet.read_yolo_voc (1 ms)
[----------] 2 tests from Test_Darknet (3 ms total)

[----------] 1 test from Reproducibility_TinyYoloVoc
[ RUN      ] Reproducibility_TinyYoloVoc.Accuracy
[       OK ] Reproducibility_TinyYoloVoc.Accuracy (134 ms)
[----------] 1 test from Reproducibility_TinyYoloVoc (135 ms total)

[----------] 1 test from Reproducibility_YoloVoc
[ RUN      ] Reproducibility_YoloVoc.Accuracy
[       OK ] Reproducibility_YoloVoc.Accuracy (475 ms)
[----------] 1 test from Reproducibility_YoloVoc (475 ms total)
...
[----------] 1 test from Layer_Test_Region
[ RUN      ] Layer_Test_Region.Accuracy
[       OK ] Layer_Test_Region.Accuracy (2 ms)
[----------] 1 test from Layer_Test_Region (3 ms total)

[----------] 1 test from Layer_Test_Reorg
[ RUN      ] Layer_Test_Reorg.Accuracy
[       OK ] Layer_Test_Reorg.Accuracy (0 ms)
[----------] 1 test from Layer_Test_Reorg (1 ms total)

Results for comparison with the OpenCV version were obtained on Linux Debian 8.2 using the latest commit of Darknet Yolo v2, compiled with GPU=0, OPENMP=1 and OPENCV=1: https://github.com/pjreddie/darknet

Using commands:

  • ./darknet detector test ./cfg/voc.data ./cfg/tiny-yolo-voc.cfg ./tiny-yolo-voc.weights -thresh 0.24 ./dog416.png

  • ./darknet detector test ./cfg/voc.data ./cfg/yolo-voc.cfg ./yolo-voc.weights -thresh 0.24 ./dog416.png

@AlexeyAB AlexeyAB force-pushed the dnn_darknet_yolo_v2 branch 2 times, most recently from dd7b464 to 03e4d3f Compare September 30, 2017 12:11
}
net->transpose = (net->major_ver > 1000) || (net->minor_ver > 1000);

layerShape current_shape;
Member

Why do we track shapes? Doesn't the weights file contain kernel shapes?

Contributor Author

@AlexeyAB AlexeyAB Sep 30, 2017

Correct, the weights file doesn't contain kernel shapes.
Darknet also tracks layer shapes while parsing a cfg file:

int convolutional_out_height(convolutional_layer l)
{
    return (l.h + 2*l.pad - l.size) / l.stride + 1;
}

Member

@AlexeyAB, maybe we can remove at least the width/height tracking? As far as I can see, only current_shape.input_channels is used to read convolutional layer weights.

ifile.open(darknetModel, std::ios::binary);
CV_Assert(ifile.is_open());

ifile.read(reinterpret_cast<char *>(&net->major_ver), sizeof(int32_t));
Member

Version numbers are used only to decide how many bytes to skip for the seen value. transpose isn't used at all. Please make all unused NetParameter variables local.


void setMaxpool(size_t kernel, size_t pad, size_t stride, size_t channels_num)
{
cv::dnn::experimental_dnn_v1::LayerParams maxpool_param;
maxpool_param.set<cv::String>("pool", "max");
Member

Please set up only the actual parameters: "pool", "kernel_size", "pad", "stride".

Contributor Author

OK. maxpool_param.set<cv::String>("pad_mode", "SAME"); is also required for odd layer sizes.

Member

However, only one padding strategy can be used at a time: manual values, or padMode ("SAME", "VALID") from TensorFlow. Please take a look at the "ceil_mode" flag instead: https://github.com/opencv/opencv/blob/master/modules/dnn/src/layers/pooling_layer.cpp#L629.

Contributor Author

  • The accuracy test passes for Tiny-Yolo if padMode="SAME", with any ceil_mode value.
  • The accuracy test can't pass for Tiny-Yolo with any other padMode (padMode="VALID" or padMode unset), with any ceil_mode value.

int w2 = i*reorgStride + offset % reorgStride;
int h2 = j*reorgStride + offset / reorgStride;
int out_index = w2 + width*reorgStride*(h2 + height*reorgStride*c2);
dstData[in_index] = srcData[out_index];
Member

Isn't there a typo in the placement of the in<->out indices?

Contributor Author

No, there is no typo; initially I left the somewhat strange original implementation of this layer unchanged.
But I have now changed this place so that there is no confusion.


CV_Assert(outputs[0][0] > 0 && outputs[0][1] > 0 && outputs[0][2] > 0 && outputs[0][3] > 0);

return true;
Member

It seems to me the Reorg layer can't work in-place. getMemoryShapes should return true only if the layer can.

{
CV_Assert(inputs.size() > 0);
outputs = std::vector<MatShape>(inputs.size(), shape(inputs[0][1] * inputs[0][2] * anchors, inputs[0][3] / anchors));
return true;
Member

The same as for the Reorg layer: it should return false.


darknet::LayerParameter lp;
std::string layer_name = toString(layer_id);
if (use_batch_normalize || use_relu) layer_name = "conv_" + layer_name;
Member

It's better to name layers with a type prefix every time. Moreover, some layers are just named with numbers, which makes them hard to debug.

}

cv::dnn::experimental_dnn_v1::LayerParams getParamConvolution(int kernel, int pad,
int stride, int filters_num, int channels_num)
Member

Unused variable channels_num

fused_layer_names.push_back(last_layer);
}

void setMaxpool(size_t kernel, size_t pad, size_t stride, size_t channels_num)
Member

Unused variable channels_num

@AlexeyAB AlexeyAB force-pushed the dnn_darknet_yolo_v2 branch 2 times, most recently from 793e696 to 6b5d6fe Compare October 4, 2017 10:53
std::string top(const int index) const { return layer_name; }
};

struct layerShape {
Member

Unused structure

params.blobs = blobs;
}

void setLastLayerName(std::string layer_name)
Member

Unused function

inputs[0][3] / reorgStride));

CV_Assert(outputs[0][0] > 0 && outputs[0][1] > 0 && outputs[0][2] > 0 && outputs[0][3] > 0);

Member

Please add an assertion that total(outputs[0]) == total(inputs[0]).


int out_c = channels / (reorgStride*reorgStride);

for (int k = 0; k < channels; ++k) {
Member

Please make it clearer: iterate over the output dimensions and map them to the input ones.

Contributor Author

@AlexeyAB AlexeyAB Oct 6, 2017

Most likely, a logical mistake was made in the reorg layer of the original Darknet: https://github.com/pjreddie/darknet/blob/master/src/blas.c#L9

It works as I described if it is called as reorg(input, output, out_w, out_h, out_c, ...); #9705 (comment)

But in the original Darknet the function is called as reorg(input, output, in_w, in_h, in_c, ...);, so the one-to-one correspondence of the input and output parameters is preserved, but very strange permutations occur.

But because the original Darknet works with this implementation of reorg, and all models were trained using it, we can't fix this logical mistake.

Why hasn't the author found and corrected this error? I think:

  • Perhaps this logical error doesn't spoil the detection accuracy, so it wasn't noticed.
  • Theoretically, this error may even have increased the accuracy, so the author found it and left it.

Regarding "iterate over output dimensions and map them to input ones":

I can implement it in such a way, but it works correctly only if (in_w % 2 == 0 && in_h % 2 == 0 && in_c % 4 == 0): http://coliru.stacked-crooked.com/a/b962d20938362d4f

void reorg_my(const float*const srcData,  float *const dstData, int width, int height, int channels, int reorgStride)
{
	int outChannels = channels * reorgStride * reorgStride;
	int outHeight = height / reorgStride;
	int outWidth = width / reorgStride;

	for (int y = 0; y < outHeight; ++y) {
		for (int x = 0; x < outWidth; ++x) {
			for (int c = 0; c < outChannels; ++c) {
				int out_index = x + outWidth*(y + outHeight*c);

				int step = c / channels;
				int x_offset = step % reorgStride;
				int y_offset = reorgStride * ((step / reorgStride) % reorgStride);

				int in_x = x * reorgStride + x_offset;
				
				int out_seq_y = y + c*outHeight;
				int in_intermediate_y = out_seq_y*2 - out_seq_y%2;
				in_intermediate_y = in_intermediate_y % (channels*height);
				int in_c = in_intermediate_y / height;
				int in_y = in_intermediate_y % height + y_offset;
						
				int in_index = in_x + width*(in_y + height*in_c);
				dstData[out_index] = srcData[in_index];
			}
		}
	}
}


Is there a GPU version of reorg?

const float confidenceThreshold = 0.24;

for (int i = 0; i < out.rows; i++) {
float const*const prob_ptr = &out.at<float>(i, 5);
Member

float const*const is a bit confusing (there are 4 places with it).

May I ask you to use named constant variables or add comments? It's hard to understand the magic numbers, especially in the samples and tests.

Contributor Author

I removed const*const, added named constant variables, and described the format of the network output that is compared against the reference in the tests.

But why is const*const confusing? Does it contradict the code style conventions accepted in OpenCV?
The 1st const forbids modification of the values pointed to by the pointer; the 2nd const forbids modification of the pointer itself.

getParamConvolution(kernel, pad, stride, filters_num);

darknet::LayerParameter lp;
std::string layer_name = "conv_" + toString(layer_id);
Member

Please try to use cv::format("conv_%d", layer_id) instead of toString here and in other places.


namespace darknet {

class LayerParameter {
Member

I hope we can omit this structure. Layers are connected sequentially or using explicit numeric offsets relative to the newly added layer, so I think it's possible to use a single vector of layers during network building. May I ask you to try it?

Contributor Author

Do you mean that I should try to use cv::dnn::experimental_dnn_v1::LayerParams instead of darknet::LayerParameter?

Member

Yeah, I think we can just parse the text and binary files simultaneously: for every entry in the config we create a new LayerParams and fill it depending on the layer type. If a specific layer has weights, we read them from the opened binary file. Then we add the layer to the final network (addLayerToPrev, or addLayer with multiple connections based on the id of the new layer and the route offsets, e.g. -1, -4).

Member

@AlexeyAB, on the other hand, let's keep it as is for now. I'll just install darknet and compare it with the PR, and then we can merge it.

@AlexeyAB AlexeyAB force-pushed the dnn_darknet_yolo_v2 branch from 6b5d6fe to af5f333 Compare October 6, 2017 16:17
@adamhrv

adamhrv commented Nov 30, 2017

Will there be a Python example for CV2 Darknet DNN?

When running readNetFromDarknet --> net.forward() in Python, the Yolo v2 (yolo-voc.cfg and yolo-voc.weights) prediction result doesn't seem to provide any detection info.

This works OK:

net = cv2.dnn.readNetFromDarknet(path_to_prototxt, path_to_model)

This seems to work OK:

imw,imh = (416,416)
blob = cv2.dnn.blobFromImage(cv2.resize(im, (416, 416)))  
net.setInput(blob)
detections = net.forward()

But the detection result doesn't make sense:

print('Detections len: {}'.format(len(detections)))
Detections len: 845
print('Detections 0: {}'.format(detections[0]))
Detections 0: [  5.25983423e-02   2.70044785e-02   9.04742535e-03   2.62199971e-03
   2.96895425e-10   0.00000000e+00   0.00000000e+00   0.00000000e+00
   0.00000000e+00   0.00000000e+00   0.00000000e+00   0.00000000e+00
   0.00000000e+00   0.00000000e+00   0.00000000e+00   0.00000000e+00
   0.00000000e+00   0.00000000e+00   0.00000000e+00   0.00000000e+00
   0.00000000e+00   0.00000000e+00   0.00000000e+00   0.00000000e+00
   0.00000000e+00]

Referring to yolo_object_detection.cpp

Thanks for your work porting to opencv.

@fabito

fabito commented Jan 27, 2018

This repo shows a custom face detection model demo using Python. The models were trained on the Widerface dataset, and the weights and cfgs are available for download.

@cansik

cansik commented Apr 12, 2018

It seems that sometimes the implementation does not return the correct detection class. Even with the default pre-trained 80-class model, the class is sometimes unknown (only zeros in columns 5 to 85).

Also, when I train my own model with two different classes, it only gives me the confidence, but not the class itself. Here is a dump of the result matrix (2-class model; every class score is always zero):

[0.0401995, 0.031487927, 0.031374719, 0.056794502, 0.0012705148, 0, 0;
 0.04043559, 0.049086317, 0.22208165, 0.25339442, 0.00040254797, 0, 0;
 0.037753381, 0.036353372, 0.60338461, 0.96344143, 0.00046182834, 0, 0;
 0.034598812, 0.039104842, 0.82008636, 0.50834161, 0.0003691405, 0, 0;
 0.038062122, 0.038740713, 1.2965844, 0.84108508, 0.00030294483, 0, 0;
 0.11062718, 0.023617726, 0.083281159, 0.038289066, 0.00061990606, 0, 0;
 0.11572818, 0.046723619, 0.24326266, 0.19994149, 0.00070420129, 0, 0;
 0.11441542, 0.03014034, 0.48347056, 0.84074426, 0.00024233655, 0, 0;
 0.11178426, 0.035311423, 0.71218336, 0.46975818, 0.00027817892, 0, 0;
 0.11641605, 0.038460143, 1.1835779, 0.68950963, 0.00037787345, 0, 0;
 0.19388326, 0.02510196, 0.091353334, 0.028100489, 0.00042861109, 0, 0;
 0.19292459, 0.048351433, 0.22517532, 0.21802522, 0.00040042991, 0, 0;
 0.18578193, 0.031620376, 0.63702989, 0.94969654, 0.0002352587, 0, 0;
 0.18794204, 0.038929399, 0.74781221, 0.44536978, 0.00019949637, 0, 0;
 0.19130239, 0.03961565, 0.9100669, 0.58336502, 0.00025648749, 0, 0;
 0.27135891, 0.02685423, 0.090977632, 0.032328393, 0.00058053283, 0, 0;
 0.27235663, 0.050213691, 0.25882462, 0.26637048, 0.00038077682, 0, 0;
 0.25689453, 0.0360987, 0.69405353, 0.82805032, 0.00038927642, 0, 0;
 0.25953323, 0.045636557, 0.58337092, 0.38006222, 0.00038867182, 0, 0;
 0.2683081, 0.042552318, 0.89848542, 0.72007042, 0.00038684186, 0, 0;
...

When using Darknet to detect the objects, it is always able to tell what kind of object it is (with the confidence), even if the confidence is very low. Do you experience the same behaviour?

@AlexeyAB
Contributor Author

AlexeyAB commented Apr 12, 2018

@cansik The difference is in how values below the threshold are handled: Darknet zeroes the scale (objectness), while OpenCV zeroes the prob, and Darknet zeroes prob only if scale wasn't already zeroed.

So you get the same bounding boxes with the same probabilities in both OpenCV and Darknet, but not the same probs for values below the threshold: in the original Darknet, if scale < thresh, then objectness = 0 and prob is not zeroed even if prob < thresh, whereas in OpenCV-dnn prob is zeroed whenever prob < thresh.

What thresh did you use for Darknet and for OpenCV-dnn-yolo?


1. OpenCV-dnn-yolo:

```cpp
for (int j = 0; j < classes; ++j) {
    float prob = scale * dstData[class_index + j]; // prob = IoU(box, object) = t0 * class-probability
    dstData[class_index + j] = (prob > thresh) ? prob : 0; // if (IoU < threshold) IoU = 0;
}
```

2. Darknet:

```c
dets[index].objectness = scale > thresh ? scale : 0;
if (dets[index].objectness) {
    for (j = 0; j < l.classes; ++j) {
        int class_index = entry_index(l, 0, n*l.w*l.h + i, l.coords + 1 + j);
        float prob = scale * predictions[class_index];
        dets[index].prob[j] = (prob > thresh) ? prob : 0;
    }
}
```
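To make the difference concrete, here is a hedged Python sketch of the two thresholding policies (the helper names are illustrative; what happens to the probs when Darknet skips the loop is modeled as "left at their raw values", per the note above, rather than taken from Darknet's memory handling):

```python
def opencv_probs(scale, preds, thresh):
    # OpenCV dnn: threshold each class prob independently
    return [p * scale if p * scale > thresh else 0.0 for p in preds]

def darknet_probs(scale, preds, thresh):
    # Darknet: zero objectness first; the prob loop runs only if it survives.
    # When objectness is zeroed, probs are NOT zeroed (modeled here as raw values).
    objectness = scale if scale > thresh else 0.0
    if objectness:
        return [p * scale if p * scale > thresh else 0.0 for p in preds]
    return [p * scale for p in preds]

# Below threshold the two diverge; above it they agree.
print(opencv_probs(0.2, [0.9, 0.1], 0.24))   # all zeroed
print(darknet_probs(0.2, [0.9, 0.1], 0.24))  # raw probs survive
print(opencv_probs(0.5, [0.9, 0.1], 0.24) == darknet_probs(0.5, [0.9, 0.1], 0.24))
```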

@cansik

cansik commented Apr 19, 2018

@AlexeyAB As far as I know, the probability gets cleared by OpenCV. That is OK for me if the confidence is under the threshold.

How do I set the threshold in OpenCV? Is it possible to lower it to zero, to get the probabilities of all predictions? Or by threshold do you mean the threshold defined in the cfg file?

Here is an example which is really strange:

image

On this image the trained network finds three characters (in opencv). All of them have a confidence higher than 80%, but the probability for these classes is zeroed. Do you know why this happens?

Class: none, Confidence: 0.8652
Class: none, Confidence: 0.8734
Class: none, Confidence: 0.8448

Second example:

image

For this picture I have exported the result matrix: yolo_results.sheets

Why are there only three probabilities, and not more, one for each item? Or am I misreading the result matrix?

@dkurt
Member

dkurt commented Apr 19, 2018

@cansik, have you tried varying the thresh parameter of the [region] layer in the .cfg file? You may set it to zero and threshold out low-confidence detections yourself.
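For illustration, the change described here would look like this in the .cfg; the surrounding [region] fields are typical of YOLOv2 configs and the values shown (other than thresh) are placeholders, not taken from any specific file:

```
[region]
# anchors, bias_match, etc. as in the original file (unchanged)
classes = 80
num = 5
thresh = 0      # was e.g. 0.24; zero keeps all probabilities in the output
```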

@cansik

cansik commented Apr 19, 2018

@dkurt Yes, that helped, thank you. I was not sure where to set the threshold, and a (stupid) bug in my result evaluation led to no difference even when I played with this param.

Now everything works as expected. But is it possible to set this threshold directly on the Net object?

@dkurt
Member

dkurt commented Apr 19, 2018

@cansik, OpenCV parses .cfg file and extracts this threshold for non-maximum suppression procedure. I think the best solution is to set thresh to zero but post-process output detections in your application. You may try object_detection.py sample. There is a slider you can use to change a confidence threshold and see the difference.
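The suggestion above (thresh = 0 in the cfg, then post-process in the application) amounts to score filtering plus non-maximum suppression. Newer OpenCV versions ship a helper for this (cv2.dnn.NMSBoxes); below is a dependency-free sketch of the idea, with illustrative box and score values:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (left, top, width, height)."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    iw = max(0.0, min(ax2, bx2) - max(a[0], b[0]))
    ih = max(0.0, min(ay2, by2) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, score_thresh=0.24, nms_thresh=0.4):
    """Greedy NMS: keep the highest-scoring box, drop boxes overlapping it."""
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep = []
    for i in order:
        if scores[i] <= score_thresh:
            continue
        if all(iou(boxes[i], boxes[j]) <= nms_thresh for j in keep):
            keep.append(i)
    return keep

boxes = [(100, 100, 50, 50), (102, 98, 52, 50), (300, 300, 40, 40)]
scores = [0.9, 0.75, 0.6]
print(nms(boxes, scores))  # the overlapping second box is suppressed
```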

@ahmadfaizan1990

@AlexeyAB Can we visualize our model on TensorBoard after training?
If yes, kindly show me the steps.
Thanks!

@AlexeyAB
Contributor Author

AlexeyAB commented Apr 7, 2019

@ahmadfaizan1990 TensorBoard can visualize only data from models that were trained using TensorFlow.

For Darknet, you can see a Loss & mAP (accuracy) chart during training: https://github.com/AlexeyAB/darknet#when-should-i-stop-training

(Loss & mAP chart: https://hsto.org/webt/yd/vl/ag/ydvlagutof2zcnjodstgroen8ac.jpeg)

@ahmadfaizan1990

> @ahmadfaizan1990 TensorBoard can visualize only data from models which were trained by using TensorFlow.
>
> For Darknet you can see Loss & mAP (accuracy) chart during training: https://github.com/AlexeyAB/darknet#when-should-i-stop-training

@AlexeyAB Thanks for your reply. I have one more question: if I change some convolutional layers in the cfg file, or want to reduce the number of layers, after the change in the cfg file I get this error.
Untitled

@AlexeyAB
Contributor Author

AlexeyAB commented Apr 8, 2019



9 participants