cuda4dnn: improve host-device transfer performance by YashasSamaga · Pull Request #16230 · opencv/opencv

YashasSamaga · 2019-12-24T13:40:18Z

This pull request:

- performs fp conversions on the GPU
- eliminates the costly intermediate host memory allocation required for every transfer involving fp16 data
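For context, the CPU-conversion path that this change moves away from can be pictured roughly as follows. This is a host-only sketch under stated assumptions, not cuda4dnn code: `float_to_half_bits`, `DeviceBuffer`, and `upload_with_cpu_conversion` are hypothetical names, and a plain vector copy stands in for the real host-to-device copy. The point is the staging buffer that has to be allocated on every transfer.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Illustrative fp32 -> fp16 bit conversion (round to nearest, ties to even).
// Hypothetical helper, not the converter OpenCV uses.
std::uint16_t float_to_half_bits(float value) {
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    const std::uint32_t sign = (bits >> 16) & 0x8000u;
    const std::uint32_t exponent = (bits >> 23) & 0xFFu;
    std::uint32_t mantissa = bits & 0x007FFFFFu;

    if (exponent == 0xFFu)  // infinity or NaN
        return static_cast<std::uint16_t>(sign | 0x7C00u | (mantissa ? 0x0200u : 0u));

    const int half_exponent = static_cast<int>(exponent) - 127 + 15;
    if (half_exponent >= 31)  // too large for fp16: becomes infinity
        return static_cast<std::uint16_t>(sign | 0x7C00u);

    if (half_exponent <= 0) {  // fp16 subnormal or zero
        if (half_exponent < -10) return static_cast<std::uint16_t>(sign);
        mantissa |= 0x00800000u;  // restore the implicit leading one
        const int shift = 14 - half_exponent;
        std::uint32_t result = mantissa >> shift;
        const std::uint32_t remainder = mantissa & ((1u << shift) - 1u);
        const std::uint32_t halfway = 1u << (shift - 1);
        if (remainder > halfway || (remainder == halfway && (result & 1u))) ++result;
        return static_cast<std::uint16_t>(sign | result);
    }

    std::uint32_t result =
        (static_cast<std::uint32_t>(half_exponent) << 10) | (mantissa >> 13);
    const std::uint32_t remainder = mantissa & 0x1FFFu;
    // A carry out of the mantissa bumps the exponent, which is the correct result.
    if (remainder > 0x1000u || (remainder == 0x1000u && (result & 1u))) ++result;
    return static_cast<std::uint16_t>(sign | result);
}

// Stand-in for device memory (assumption: a real path would copy to the GPU).
struct DeviceBuffer {
    std::vector<std::uint16_t> fp16_bits;
};

// CPU-conversion upload: allocate a staging buffer, convert, then copy.
// Returns the number of extra host bytes allocated for this one transfer.
std::size_t upload_with_cpu_conversion(const std::vector<float>& host,
                                       DeviceBuffer& device) {
    std::vector<std::uint16_t> staging(host.size());  // fresh allocation per call
    for (std::size_t i = 0; i < host.size(); ++i)
        staging[i] = float_to_half_bits(host[i]);
    device.fp16_bits = staging;  // models the host-to-device copy
    return staging.size() * sizeof(std::uint16_t);
}
```

Converting on the GPU instead means the fp32 data can be copied as-is and converted on the device, so no staging buffer of this kind is needed on the host.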

Test Size = number of 32-bit floats

Test Size: 100
	CPU Conversion H2D Time: 0.0088ms
	GPU Conversion H2D Time: 0.0148ms
	CPU Conversion D2H Time: 0.0077ms
	GPU Conversion D2H Time: 0.0098ms
Test Size: 1000
	CPU Conversion H2D Time: 0.0068ms
	GPU Conversion H2D Time: 0.0147ms
	CPU Conversion D2H Time: 0.0078ms
	GPU Conversion D2H Time: 0.0098ms
Test Size: 10000
	CPU Conversion H2D Time: 0.0125ms
	GPU Conversion H2D Time: 0.0187ms
	CPU Conversion D2H Time: 0.0140ms
	GPU Conversion D2H Time: 0.0123ms
Test Size: 100000
	CPU Conversion H2D Time: 0.0622ms
	GPU Conversion H2D Time: 0.0496ms
	CPU Conversion D2H Time: 0.0780ms
	GPU Conversion D2H Time: 0.0423ms
Test Size: 1000000
	CPU Conversion H2D Time: 0.4920ms
	GPU Conversion H2D Time: 0.3530ms
	CPU Conversion D2H Time: 0.6348ms
	GPU Conversion D2H Time: 0.3262ms
Test Size: 10000000
	CPU Conversion H2D Time: 8.5243ms
	GPU Conversion H2D Time: 3.5249ms
	CPU Conversion D2H Time: 7.6002ms
	GPU Conversion D2H Time: 3.1775ms

For small data sizes, converting on the GPU is slower than converting on the CPU and then transferring, but the difference is on the order of a few microseconds to a few tens of microseconds. For large data sizes, the GPU beats the CPU by a large margin (roughly 2.4x at 10,000,000 floats in both directions).
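The crossover point can be read off the reported timings directly. The following is a small sketch that does so; the timings are copied from the benchmark output above, while `Sample`, `kSamples`, `speedup`, and `first_gpu_win` are hypothetical names introduced only for illustration.

```cpp
#include <array>
#include <cstddef>

// One benchmark row: test size in 32-bit floats and the four timings in ms.
struct Sample {
    std::size_t size;
    double cpu_h2d_ms;
    double gpu_h2d_ms;
    double cpu_d2h_ms;
    double gpu_d2h_ms;
};

// Timings copied from the benchmark output in this pull request.
const std::array<Sample, 6> kSamples = {{
    {100,      0.0088, 0.0148, 0.0077, 0.0098},
    {1000,     0.0068, 0.0147, 0.0078, 0.0098},
    {10000,    0.0125, 0.0187, 0.0140, 0.0123},
    {100000,   0.0622, 0.0496, 0.0780, 0.0423},
    {1000000,  0.4920, 0.3530, 0.6348, 0.3262},
    {10000000, 8.5243, 3.5249, 7.6002, 3.1775},
}};

// Ratio of CPU-conversion time to GPU-conversion time; above 1 means the GPU wins.
double speedup(const Sample& s, bool host_to_device) {
    return host_to_device ? s.cpu_h2d_ms / s.gpu_h2d_ms
                          : s.cpu_d2h_ms / s.gpu_d2h_ms;
}

// Smallest measured size at which GPU conversion is faster, or 0 if it never is.
std::size_t first_gpu_win(bool host_to_device) {
    for (const Sample& s : kSamples) {
        if (speedup(s, host_to_device) > 1.0) return s.size;
    }
    return 0;
}
```

With these figures, `first_gpu_win(true)` gives 100000 floats for H2D and `first_gpu_win(false)` gives 10000 floats for D2H.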

force_builders=Custom,docs
buildworker:Custom=linux-4
docker_image:Custom=ubuntu-cuda:18.04

build_image:Custom Mac=openvino-2019r3.0
build_image:Custom Win=openvino-2019r3.0
test_opencl:Custom Win=OFF
test_modules:Custom Mac=dnn,java,python3

alalek

Looks good to me! Thank you 👍

YashasSamaga force-pushed the cuda4dnn-fp-conversion branch from 11675c2 to f5fe63f Compare December 25, 2019 08:11

YashasSamaga changed the title ~~cuda4dnn: improve host-device transfer performance~~ [WIP] cuda4dnn: improve host-device transfer performance Dec 27, 2019

YashasSamaga force-pushed the cuda4dnn-fp-conversion branch 2 times, most recently from 23442dd to ab06364 Compare December 28, 2019 05:15

perfor fp conversions on GPU

01f97f1

YashasSamaga changed the title ~~[WIP] cuda4dnn: improve host-device transfer performance~~ cuda4dnn: improve host-device transfer performance Dec 29, 2019

YashasSamaga force-pushed the cuda4dnn-fp-conversion branch from ab06364 to 01f97f1 Compare December 29, 2019 18:50

alalek approved these changes Jan 5, 2020

opencv-pushbot pushed a commit that referenced this pull request Jan 5, 2020

Merge pull request #16230 from YashasSamaga:cuda4dnn-fp-conversion

1f2b2c5

opencv-pushbot merged commit 01f97f1 into opencv:master Jan 5, 2020

YashasSamaga mentioned this pull request Jan 8, 2020

Enable cuda4dnn on hardware without support for __half #16218

Merged
