cudnnFindEx by aromnvidia · Pull Request #1594 · pytorch/pytorch

aromnvidia · 2017-05-19T06:40:20Z

Thank you guys again for your comments and recommendations on my prev. pull request.
Here is updated version of my implementation.

torch/lib/THC/THCGeneral.c

+  size_t large = 0;
+  return THCudaMemGetInfo1(state, freeBytes, totalBytes, &large);
+}
+


torch/lib/THC/THCGeneral.h.in

 THC_API void THCudaHostRecord(THCState *state, void *ptr);

 THC_API cudaError_t THCudaMemGetInfo(THCState *state, size_t* freeBytes, size_t* totalBytes);
+THC_API cudaError_t THCudaMemGetInfo1(THCState *state, size_t* freeBytes, size_t* totalBytes, size_t* large);


torch/csrc/cudnn/Conv.cpp

+	    CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM,
+	    CUDNN_CONVOLUTION_FWD_ALGO_DIRECT,
+            CUDNN_CONVOLUTION_FWD_ALGO_WINOGRAD,
+            CUDNN_CONVOLUTION_FWD_ALGO_WINOGRAD_NONFUSED


apaszke

The PR looks good now. We only need one more change so we don't make the allocator cache unnecessarily large.

torch/csrc/cudnn/Conv.cpp

+
+    CHECK(cudnnFindConvolutionForwardAlgorithmEx(handle, conv.idesc.desc, in,
+        conv.wdesc.desc, wght, conv.cdesc.desc, conv.odesc.desc, out, 1, &algoCount, 
+        &perfResults, ws.data, ws.size));


aromnvidia · 2017-05-24T11:04:19Z

Sorry, Adam. Removed your comment by mistake. I thought I remove mine.

I thought it's not the right place to call emptyCache after CHECK(cudnnFindConvolutionForwardAlgorithmEx(handle, conv.idesc.desc, in... We will fall into deadlock on cuda_free_mutex. I'd suggested to flush the cache in chooseAlgorithm() after findAlgorithm(state, handle, conv, benchmark, in, out, wght, algo);

apaszke · 2017-05-24T11:20:33Z

Yeah sure. Everything is ok as long as the cache is going to be flushed only when benchmarks are run (this should only happen a few times during training).

Actually I've just started wondering if it wouldn't be simpler to limit the max workspace size to max(largest_cached_free_block, free_device_memory) and use use cudaMalloc (without using THC) to allocate the benchmarks workspace. This shouldn't change the state of cache at all, and wouldn't require the flush.

ngimel · 2017-05-24T16:08:42Z

You can't cudaMalloc largest_cached_free_block without first emptying cache, right? And in some unfortunate cases subsequent cudaFree'ing can interfere with nccl (though in some unfortunate cases that may happen also when caching allocator can't allocate requested block and starts cleaning up). Grrr, this is so hard to solve cleanly.

apaszke · 2017-05-24T17:01:17Z

Right, I meant to use the max and:

if there's more free space on the device then use cudaMalloc,
if the largest cached (and free!) block is larger, use that amount with the caching allocator. This shouldn't alter its state (no mallocs/frees called).

cudnnFind (no ex) used to call cudaFree too, and this part is already guarded by a free mutex.

colesbury · 2017-05-24T17:13:03Z

We discussed this earlier and I don't think these heuristics are a good idea. If you're worried that the caching allocator will unnecessarily hold on to large blocks, call THCCachingAllocator_emptyCache after benchmarking.

…oc/cudaFree

apaszke · 2017-05-25T12:25:54Z

@pytorchbot test this please

soumith · 2017-05-25T17:34:04Z

@aromnvidia looks like you haven't tested the GPU builds after you pushed your latest commits? They actually fail with basic compiler errors.
https://build.pytorch.org/job/pytorch-PR-py2-linux/38//console

soumith · 2017-05-26T06:36:30Z

@pytorchbot test this please

soumith · 2017-05-26T14:09:18Z

@pytorchbot add to whitelist

soumith · 2017-06-02T15:53:20Z

this is now merged into master.

…002d19 Summary: Previous import was 882c5283c54345d131e8fe5c859e4844dcf7ca8e Included changes: - **[f461f7a](onnx/onnx@f461f7a)**: Show the op's type and name when the shape inference is failed. (pytorch#1623) <Jerry> - **[ab8aaf9](onnx/onnx@ab8aaf9)**: Add scan test case (pytorch#1586) <G. Ramalingam> - **[c95357e](onnx/onnx@c95357e)**: link the tutorial (pytorch#1650) <Lu Fang> - **[d7e2420](onnx/onnx@d7e2420)**: Upgrade label encoder to support more input types (pytorch#1596) <Wei-Sheng Chin> - **[6425108](onnx/onnx@6425108)**: Add Doc about Adding New Operator into ONNX (pytorch#1647) <Lu Fang> - **[295889c](onnx/onnx@295889c)**: use an empty initializer to create map (pytorch#1643) <Lu Fang> - **[e38f3ec](onnx/onnx@e38f3ec)**: Remove redundant const (pytorch#1639) <daquexian> - **[ea694bf](onnx/onnx@ea694bf)**: implement fuse reduce->unsqueeze + fix assumption in nop_dropout pass (pytorch#1565) <Armen> - **[6db386e](onnx/onnx@6db386e)**: make output shape clear enough for Softmax family (pytorch#1634) <Lu Fang> - **[2b67c6e](onnx/onnx@2b67c6e)**: fix batchnorm doc (pytorch#1633) <Lu Fang> - **[c901784](onnx/onnx@c901784)**: remove inappropriate consts (pytorch#1632) <Lu Fang> - **[de82119](onnx/onnx@de82119)**: Shape inference fix for broadcast, concat and scan (pytorch#1594) <KeDengMS> - **[d7ffe3b](onnx/onnx@d7ffe3b)**: Update Optimizer Docs (pytorch#1607) <Armen> - **[d09d139](onnx/onnx@d09d139)**: mark PROTOBUF_INCLUDE_DIRS as BUILD_INTERFACE (pytorch#1466) <Yuta Okamoto> - **[eb4b7c2](onnx/onnx@eb4b7c2)**: allow variadic parameters of different types (pytorch#1615) <G. Ramalingam> - **[4166246](onnx/onnx@4166246)**: Fix onnxifi test (pytorch#1617) <Yinghai Lu> - **[6706a4d](onnx/onnx@6706a4d)**: Fix a bug in vector address access (pytorch#1598) <Raymond Yang> - **[ae39866](onnx/onnx@ae39866)**: Separate types of inputs 1 and 2 in OneHot op. (pytorch#1610) <Spandan Tiwari> - **[45ba661](onnx/onnx@45ba661)**: Handle new types in the switch. (pytorch#1608) <Dmitri Smirnov> - **[14853b6](onnx/onnx@14853b6)**: Bump docker image version to 230 used in CircleCI (pytorch#1606) <bddppq> - **[e0993b8](onnx/onnx@e0993b8)**: [onnxifi] Make sure that backend handles run async. (pytorch#1599) <Roman Dzhabarov> - **[e6965cc](onnx/onnx@e6965cc)**: Introduce SparseTensor ML proto (pytorch#1554) <Dmitri Smirnov> - **[75b782f](onnx/onnx@75b782f)**: In driver test check the return status of onnxGetBackendIDs (pytorch#1597) <bddppq> - **[c05b364](onnx/onnx@c05b364)**: Make CI log less verbose (pytorch#1595) <bddppq> - **[fa568e4](onnx/onnx@fa568e4)**: Loop type shape inferencing (pytorch#1591) <Scott McKay> - **[937e64c](onnx/onnx@937e64c)**: add uint8 (pytorch#1590) <Lu Fang> - **[f86e951](onnx/onnx@f86e951)**: Add domain as an optional parameter for make_node function (pytorch#1588) <Young Kim> - **[ff45588](onnx/onnx@ff45588)**: Remove unreachable code in shape_inference.h (pytorch#1585) <Changming Sun> - **[f7dcad0](onnx/onnx@f7dcad0)**: Add several hyperbolic function ops. (pytorch#1499) <Sergii Dymchenko> - **[a60ac7d](onnx/onnx@a60ac7d)**: Add OneHot op to ONNX. (pytorch#1567) <Spandan Tiwari> - **[f6c3a7e](onnx/onnx@f6c3a7e)**: [compiler flag] Issue a warning if class has virtual method but missing virtual dtor. (pytorch#1583) <Roman Dzhabarov> - **[88d1784](onnx/onnx@88d1784)**: Fix MaxUnpool shape inference when output_shape is provided as input (pytorch#1578) <Spandan Tiwari> - **[20041b7](onnx/onnx@20041b7)**: Add type shape inferencing for the If operator (pytorch#1571) <Scott McKay> - **[d6c4c75](onnx/onnx@d6c4c75)**: Add a virtual destructor to GraphInferencer (pytorch#1574) <Changming Sun> - **[a339598](onnx/onnx@a339598)**: fix ConvTranspose spec (pytorch#1566) <Wenhao Hu> Differential Revision: D13263831 fbshipit-source-id: 0c158dd12c45d704b6f37f63f3d74ed34ef2f534

…002d19 (#14568) Summary: Pull Request resolved: #14568 Previous import was 882c5283c54345d131e8fe5c859e4844dcf7ca8e Included changes: - **[f461f7a](onnx/onnx@f461f7a)**: Show the op's type and name when the shape inference is failed. (#1623) <Jerry> - **[ab8aaf9](onnx/onnx@ab8aaf9)**: Add scan test case (#1586) <G. Ramalingam> - **[c95357e](onnx/onnx@c95357e)**: link the tutorial (#1650) <Lu Fang> - **[d7e2420](onnx/onnx@d7e2420)**: Upgrade label encoder to support more input types (#1596) <Wei-Sheng Chin> - **[6425108](onnx/onnx@6425108)**: Add Doc about Adding New Operator into ONNX (#1647) <Lu Fang> - **[295889c](onnx/onnx@295889c)**: use an empty initializer to create map (#1643) <Lu Fang> - **[e38f3ec](onnx/onnx@e38f3ec)**: Remove redundant const (#1639) <daquexian> - **[ea694bf](onnx/onnx@ea694bf)**: implement fuse reduce->unsqueeze + fix assumption in nop_dropout pass (#1565) <Armen> - **[6db386e](onnx/onnx@6db386e)**: make output shape clear enough for Softmax family (#1634) <Lu Fang> - **[2b67c6e](onnx/onnx@2b67c6e)**: fix batchnorm doc (#1633) <Lu Fang> - **[c901784](onnx/onnx@c901784)**: remove inappropriate consts (#1632) <Lu Fang> - **[de82119](onnx/onnx@de82119)**: Shape inference fix for broadcast, concat and scan (#1594) <KeDengMS> - **[d7ffe3b](onnx/onnx@d7ffe3b)**: Update Optimizer Docs (#1607) <Armen> - **[d09d139](onnx/onnx@d09d139)**: mark PROTOBUF_INCLUDE_DIRS as BUILD_INTERFACE (#1466) <Yuta Okamoto> - **[eb4b7c2](onnx/onnx@eb4b7c2)**: allow variadic parameters of different types (#1615) <G. Ramalingam> - **[4166246](onnx/onnx@4166246)**: Fix onnxifi test (#1617) <Yinghai Lu> - **[6706a4d](onnx/onnx@6706a4d)**: Fix a bug in vector address access (#1598) <Raymond Yang> - **[ae39866](onnx/onnx@ae39866)**: Separate types of inputs 1 and 2 in OneHot op. (#1610) <Spandan Tiwari> - **[45ba661](onnx/onnx@45ba661)**: Handle new types in the switch. (#1608) <Dmitri Smirnov> - **[14853b6](onnx/onnx@14853b6)**: Bump docker image version to 230 used in CircleCI (#1606) <bddppq> - **[e0993b8](onnx/onnx@e0993b8)**: [onnxifi] Make sure that backend handles run async. (#1599) <Roman Dzhabarov> - **[e6965cc](onnx/onnx@e6965cc)**: Introduce SparseTensor ML proto (#1554) <Dmitri Smirnov> - **[75b782f](onnx/onnx@75b782f)**: In driver test check the return status of onnxGetBackendIDs (#1597) <bddppq> - **[c05b364](onnx/onnx@c05b364)**: Make CI log less verbose (#1595) <bddppq> - **[fa568e4](onnx/onnx@fa568e4)**: Loop type shape inferencing (#1591) <Scott McKay> - **[937e64c](onnx/onnx@937e64c)**: add uint8 (#1590) <Lu Fang> - **[f86e951](onnx/onnx@f86e951)**: Add domain as an optional parameter for make_node function (#1588) <Young Kim> - **[ff45588](onnx/onnx@ff45588)**: Remove unreachable code in shape_inference.h (#1585) <Changming Sun> - **[f7dcad0](onnx/onnx@f7dcad0)**: Add several hyperbolic function ops. (#1499) <Sergii Dymchenko> - **[a60ac7d](onnx/onnx@a60ac7d)**: Add OneHot op to ONNX. (#1567) <Spandan Tiwari> - **[f6c3a7e](onnx/onnx@f6c3a7e)**: [compiler flag] Issue a warning if class has virtual method but missing virtual dtor. (#1583) <Roman Dzhabarov> - **[88d1784](onnx/onnx@88d1784)**: Fix MaxUnpool shape inference when output_shape is provided as input (#1578) <Spandan Tiwari> - **[20041b7](onnx/onnx@20041b7)**: Add type shape inferencing for the If operator (#1571) <Scott McKay> - **[d6c4c75](onnx/onnx@d6c4c75)**: Add a virtual destructor to GraphInferencer (#1574) <Changming Sun> - **[a339598](onnx/onnx@a339598)**: fix ConvTranspose spec (#1566) <Wenhao Hu> Reviewed By: zrphercule Differential Revision: D13263831 fbshipit-source-id: a2ff22c6454e2430429e5a7d18d21661a7ffb0cb

aromnvidia added 6 commits May 15, 2017 10:35

substitute cudnnFind* functions with cudnnFind*Ex

c5d7e94

clean up the code

e73166c

clean up the code

47e18cb

Fix an issue

5e48957

Rewrite the code for cudnnFind*Ex function as it was recommended

559060e

remove redundant 'const' qualifier

66dca16

apaszke reviewed May 19, 2017

View reviewed changes

aromnvidia added 2 commits May 21, 2017 19:00

Rename variables and routines

b81d207

Fix

cb1dfa6

apaszke reviewed May 22, 2017

View reviewed changes

torch/csrc/cudnn/Conv.cpp

CHECK(cudnnFindConvolutionForwardAlgorithmEx(handle, conv.idesc.desc, in,

conv.wdesc.desc, wght, conv.cdesc.desc, conv.odesc.desc, out, 1, &algoCount,

&perfResults, ws.data, ws.size));

This comment was marked as off-topic.

Sign in to view

aromnvidia added 2 commits May 25, 2017 02:54

We don't need this lock here since cudnnFind*Ex doesn't call cudaMall…

63d8ec3

…oc/cudaFree

call emptyCache() to flush large blocks after benchmarking

a9651b9

apaszke approved these changes May 25, 2017

View reviewed changes

Update Conv.cpp

91abf2b

Update Conv.cpp

12ef45d

apaszke approved these changes Jun 1, 2017

View reviewed changes

apaszke closed this Jun 1, 2017

apaszke reopened this Jun 1, 2017

soumith closed this Jun 2, 2017

houseroad mentioned this pull request Nov 29, 2018

Automatic update of fbcode/onnx to f461f7aad9987635b4aff108620ed7918f002d19 #14568

Closed

ezyang added the open source label Jun 24, 2019

Conversation

aromnvidia commented May 19, 2017

Uh oh!

This comment was marked as off-topic.

Uh oh!

This comment was marked as off-topic.

Uh oh!

This comment was marked as off-topic.

Uh oh!

This comment was marked as off-topic.

Uh oh!

This comment was marked as off-topic.

Uh oh!

This comment was marked as off-topic.

Uh oh!

This comment was marked as off-topic.

Uh oh!

apaszke left a comment

Choose a reason for hiding this comment

Uh oh!

This comment was marked as off-topic.

Uh oh!

aromnvidia commented May 24, 2017

Uh oh!

apaszke commented May 24, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ngimel commented May 24, 2017

Uh oh!

apaszke commented May 24, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

colesbury commented May 24, 2017

Uh oh!

apaszke commented May 25, 2017

Uh oh!

soumith commented May 25, 2017

Uh oh!

soumith commented May 26, 2017

Uh oh!

soumith commented May 26, 2017

Uh oh!

soumith commented Jun 2, 2017

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

apaszke commented May 24, 2017 •

edited

Loading

apaszke commented May 24, 2017 •

edited

Loading