History of the region layer:
YOLOv2 support was added in "Added DNN Darknet Yolo v2 for object detection" (#9705). YOLOv2 detects objects at a single scale, so the network has just one region layer. On its own, the region layer would output unfiltered detections, and the obvious next step is NMS, which is presumably why NMS was integrated into the region layer.
YOLOv3 support was added in "YOLOv3 support" (#11322). YOLOv3 detects objects at three different scales. In darknet, each scale has its own yolo layer, and NMS is performed once at the end over all the detections. The implementation from #11322 instead creates a region layer for each yolo layer and therefore performs NMS separately in each region layer.
Is the NMS spurious?
NMS was added in the YOLOv2 PR, when there was only one region layer and NMS was the natural next step after the region computation (which is probably why it was integrated into the region layer). YOLOv3 has three region layers, so NMS now runs once per scale rather than once over all the detections, as it does in darknet. Is it right to use the region layer as it is without disabling NMS?
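To make the concern concrete, here is a minimal pure-Python sketch of greedy NMS applied per layer versus once over the merged detections. It is not the OpenCV or darknet implementation: the `iou` and `nms` helpers, the 0.4 IoU threshold, and the boxes and scores are all hypothetical values chosen for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(detections, iou_threshold=0.4):
    """Greedy NMS over (box, score) pairs; returns the kept detections."""
    kept = []
    # Visit detections from highest to lowest score and keep a box only
    # if it does not overlap too much with any box already kept.
    for box, score in sorted(detections, key=lambda d: d[1], reverse=True):
        if all(iou(box, k[0]) <= iou_threshold for k in kept):
            kept.append((box, score))
    return kept


# Hypothetical outputs of three region layers (one per scale). The first
# object is detected at every scale; a second object only at the last one.
scale_outputs = [
    [((10, 10, 50, 50), 0.90), ((12, 11, 51, 49), 0.60)],
    [((11, 10, 50, 51), 0.80)],
    [((9, 9, 49, 50), 0.70), ((100, 100, 140, 140), 0.85)],
]

# NMS inside each layer only removes duplicates within that scale.
per_layer = [d for layer in scale_outputs for d in nms(layer)]

# A single NMS over the merged detections, as darknet does at the end.
global_only = nms([d for layer in scale_outputs for d in layer])

# Per-layer NMS still needs a final pass to remove cross-scale duplicates.
per_layer_then_global = nms(per_layer)

print(len(per_layer), len(global_only), len(per_layer_then_global))
```

In this example the per-layer pass leaves duplicates of the first object from different scales, so a final NMS over all detections is still required, and that final pass alone already yields the same result here. This is only an illustration of why the per-layer step looks redundant, not a proof that the two pipelines agree in general.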
Performance penalty for the CUDA backend:
In the CUDA backend, NMS is performed on the CPU, which carries a large performance cost. Switching from GPU to CPU during inference introduces additional synchronization overhead, exposes inference to interference from OS scheduling decisions, and defeats the latency hiding that CUDA streams provide.
The cost is so significant that the FPS of YOLOv4 improves by almost 40% on an RTX 2070 Super when NMS is disabled.
Issue submission checklist
I report the issue, it's not a question
I checked the problem with documentation, FAQ, open issues, answers.opencv.org, Stack Overflow, etc and have not found solution
I updated to latest OpenCV version and the issue is still there
There is reproducer code and related data files: videos, images, onnx, etc