
cuda4dnn(region): optimize kernels #16096

Merged
opencv-pushbot merged 1 commit into opencv:master from YashasSamaga:cuda4dnn-region-optimize
Dec 9, 2019

YashasSamaga (contributor) commented Dec 8, 2019

This pull request:

  • optimizes the region kernels based on profiling results
  • mainly targets YOLOv3's region pathway
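For readers unfamiliar with the layer, here is a minimal CPU sketch of the per-box work a YOLOv3-style region pathway performs. It assumes the usual YOLOv3 decode: a logistic activation on the box-center offsets, the objectness, and the class scores, plus an exponential on width/height scaled by an anchor. The function name, argument list, in-place memory layout, and thresholding step are all illustrative assumptions. This is not the cuda4dnn kernel code and not OpenCV's API.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Logistic function used throughout the YOLOv3-style decode.
inline float logistic(float x) { return 1.0f / (1.0f + std::exp(-x)); }

// Hypothetical CPU reference for ONE anchor box in ONE grid cell.
// Assumed in-place layout of `box`: [tx, ty, tw, th, to, tc_0, ..., tc_{C-1}]
// On return it holds:               [x, y, w, h, objectness, score_0, ...]
// x, y are normalized by the grid size; w, h by the network input size.
void decode_yolov3_box(float* box, std::size_t num_classes,
                       int cell_x, int cell_y, int grid_w, int grid_h,
                       float anchor_w, float anchor_h,
                       int input_w, int input_h, float threshold)
{
    assert(grid_w > 0 && grid_h > 0 && input_w > 0 && input_h > 0);

    // Box center: cell index plus a logistic-squashed offset inside the cell.
    box[0] = (cell_x + logistic(box[0])) / grid_w;
    box[1] = (cell_y + logistic(box[1])) / grid_h;

    // Box size: anchor (in input pixels) scaled by exp of the raw prediction.
    box[2] = std::exp(box[2]) * anchor_w / input_w;
    box[3] = std::exp(box[3]) * anchor_h / input_h;

    // Objectness, then independent logistic class scores (no softmax here).
    const float objectness = logistic(box[4]);
    box[4] = objectness;
    for (std::size_t c = 0; c < num_classes; ++c) {
        const float score = objectness * logistic(box[5 + c]);
        box[5 + c] = (score > threshold) ? score : 0.0f;  // zero out weak scores
    }
}
```

Every box is processed independently, so a GPU version can assign one thread per box (or per element) with no inter-thread communication. That independence is what makes this pathway a good target for kernel-level optimization.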

The CUDA part of the region layer took nearly 700 µs for single-image inference on a GTX 1050. It now takes around 270 µs, roughly a 2.6x improvement.
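Here is a quick check of the speedup figure, using the approximate timings quoted above:

$$
\text{speedup} \approx \frac{700\,\mu\text{s}}{270\,\mu\text{s}} \approx 2.59
$$

The "before" timing is only "nearly" 700 µs, so "roughly 2.6x" is the most defensible reading of the result.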

The YOLOv2 path is still poorly optimized, but it is faster than before. It can be optimized further if required (I don't think anybody uses YOLOv2 anyway).

Benchmark:

  • this PR + PR #16092
  • GTX 1050 GPU and Intel Core i7-7700HQ CPU

Warmup runs: 3
Benchmark runs: 100

Model     CUDA backend   Darknet
YOLOv3    54.154 ms      57.384 ms
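For the end-to-end YOLOv3 numbers in the table, the relative difference works out as:

$$
\frac{57.384\ \text{ms}}{54.154\ \text{ms}} \approx 1.060,
\qquad
\frac{57.384 - 54.154}{57.384} \approx 5.6\%
$$

So under this setup, the CUDA backend is about 1.06x as fast as Darknet and takes roughly 5.6% less time per run.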

Benchmark code: https://gist.github.com/YashasSamaga/26eb2eb16be2cc749e3394d300a7585e
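The gist linked above contains the actual benchmark code, which is not reproduced here. The methodology described (3 warmup runs, then 100 timed runs) can be sketched roughly as the generic C++ timing helper below. The function name `mean_runtime_ms` and the use of `std::function` are illustrative assumptions, not taken from the gist. If the workload launches GPU work, it must block until that work has finished (for example, by waiting for the results on the host); otherwise the timer measures little more than launch overhead.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical benchmark helper: run `warmup_runs` untimed iterations to absorb
// one-time costs (lazy initialization, memory allocation, cache warm-up), then
// return the mean wall-clock time in milliseconds over `benchmark_runs` runs.
double mean_runtime_ms(const std::function<void()>& workload,
                       int warmup_runs, int benchmark_runs)
{
    assert(warmup_runs >= 0 && benchmark_runs > 0);

    for (int i = 0; i < warmup_runs; ++i)
        workload();  // results discarded; only used to warm things up

    using clock = std::chrono::steady_clock;  // monotonic, suitable for intervals
    const auto start = clock::now();
    for (int i = 0; i < benchmark_runs; ++i)
        workload();
    const auto stop = clock::now();

    const std::chrono::duration<double, std::milli> total = stop - start;
    return total.count() / benchmark_runs;
}
```

With an OpenCV `cv::dnn::Net`, the workload might simply wrap `net.forward()`, which returns the output on the host, e.g. `mean_runtime_ms([&] { net.forward(); }, 3, 100)`.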

DISCLAIMER: I am not very comfortable editing Darknet code, but I hope it's correct.

force_builders=Custom,docs
buildworker:Custom=linux-4
docker_image:Custom=ubuntu-cuda:16.04

YashasSamaga force-pushed the cuda4dnn-region-optimize branch from 9103f52 to dd3f517 on December 8, 2019 17:38

alalek (member) left a comment


Well done! Looks good to me 👍
