
Cudnn find ex #1561

Closed

aromnvidia wants to merge 5 commits into pytorch:master from aromnvidia:cudnnFindEx

Conversation

@aromnvidia
Contributor

The cudnnFind* functions were replaced with their cudnnFind*Ex counterparts.

THCudaMalloc(state, &data, max_ws_size);
// if the allocation failed, fall back to a zero-sized workspace
max_ws_size = (NULL == data) ? 0 : max_ws_size;
}
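The fallback pattern in this excerpt, try the allocation and report a zero-sized workspace if it fails, can be sketched in plain C. A mock allocator stands in for THCudaMalloc here; the names and structure are illustrative, not the actual THC API:

```c
#include <stdlib.h>
#include <stddef.h>

/* Mock of THCudaMalloc for illustration: tries to allocate `size`
 * bytes and stores the result in *data (NULL on failure). */
static void mock_malloc(void **data, size_t size) {
    *data = (size > 0) ? malloc(size) : NULL;
}

/* Try to grab the largest workspace; if the allocation fails,
 * fall back to a zero-sized workspace so the find step can still
 * run with algorithms that need no scratch memory. */
static size_t alloc_workspace(void **data, size_t max_ws_size) {
    mock_malloc(data, max_ws_size);
    /* if the allocation failed, report a workspace size of 0 */
    return (*data == NULL) ? 0 : max_ws_size;
}
```

The point of returning 0 rather than erroring out is that cudnnFind*Ex can still benchmark the no-workspace algorithms.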


CUDNN_CONVOLUTION_BWD_DATA_ALGO_WINOGRAD_NONFUSED
};
size_t max_ws_size = 0;
void *data = NULL; // workspace


THCudaFree(state, data);
data = NULL;
}
THCudaMalloc(state, &data, sz);


}
}
if (NULL == data) { // if we freed the old workspace and the bigger allocation then failed, retry with the previous maximum
THCudaMalloc(state, &data, max_ws_size);
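The grow-and-retry logic spread across these excerpts (free the old workspace, attempt the bigger size, and on failure fall back to the largest size that previously succeeded) might look like this as a self-contained sketch. Again, plain malloc/free stand in for THCudaMalloc/THCudaFree, and the function name is made up for illustration:

```c
#include <stdlib.h>
#include <stddef.h>

/* Grow the workspace to `sz` bytes. If that allocation fails,
 * retry with `max_ws_size`, the largest size known to have fit,
 * so we are not left with no workspace after freeing the old one. */
static size_t grow_workspace(void **data, size_t sz, size_t max_ws_size) {
    if (*data != NULL) {
        free(*data);        /* THCudaFree(state, data) in the PR */
        *data = NULL;
    }
    *data = malloc(sz);     /* THCudaMalloc(state, &data, sz) */
    if (*data == NULL) {
        /* the old block is already gone; retry with the previous maximum */
        *data = malloc(max_ws_size);
        return (*data == NULL) ? 0 : max_ws_size;
    }
    return sz;
}
```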


@apaszke
Contributor

apaszke commented May 16, 2017

Actually I just realized that doing these mallocs + frees isn't the best idea in our case. We're using a caching allocator, and this implementation of findAlgorithm can disrupt its state quite heavily (e.g. if FFT requires 8GB of workspace, we'll allocate that and cache this block!). It'd be better to avoid these allocs and use cudaMemGetInfo + cacheInfo from THCDeviceAllocator to determine a cap on the workspace size.
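The capping idea could be sketched as below. A caller-supplied free-byte count stands in for what cudaMemGetInfo (or THCDeviceAllocator's cacheInfo) would report, and the one-half fraction is an arbitrary illustrative choice, not anything the comment prescribes:

```c
#include <stddef.h>

/* Cap a requested workspace size at a fraction of the currently
 * free device memory, so findAlgorithm never asks the caching
 * allocator for (and then caches) an enormous block. */
static size_t cap_workspace(size_t requested, size_t free_bytes) {
    size_t cap = free_bytes / 2;  /* illustrative: at most half of free memory */
    return (requested < cap) ? requested : cap;
}
```

In real code the free-byte figure would come from `cudaMemGetInfo(&free, &total)` queried on the current device.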

@ngimel
Collaborator

ngimel commented May 16, 2017

You still need to allocate a workspace after you've determined a cap on it, which means you'd allocate and cache an even bigger block. That might be OK, since you'll be able to split it later if needed, but I'm not sure it is strictly better than allocating the maximum you could possibly need for this convolution.
On the other hand, I agree with @apaszke that you should not be allocating/freeing at each iteration of the algo loop. I think it is best to find the maximum workspace required by the applicable algos and try to allocate that. Keep in mind that sometimes an inordinate (~40GB) workspace is returned; you'd have to ignore that.
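The "take the maximum across applicable algos, ignoring inordinate requests" step suggested here could be sketched like this (the function name and the sanity limit are hypothetical; the sizes would come from cudnnGetConvolution*WorkspaceSize queries per algorithm):

```c
#include <stddef.h>

/* Pick the workspace to allocate once, up front: the maximum of
 * the sizes requested by the applicable algorithms, ignoring any
 * request above `sane_limit` (cuDNN can report absurd sizes,
 * e.g. ~40GB, as noted in the review). */
static size_t max_sane_workspace(const size_t *sizes, int n, size_t sane_limit) {
    size_t max_ws = 0;
    for (int i = 0; i < n; ++i) {
        if (sizes[i] <= sane_limit && sizes[i] > max_ws)
            max_ws = sizes[i];
    }
    return max_ws;
}
```

Allocating this once before the loop avoids the per-iteration malloc/free churn both reviewers objected to.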

@aromnvidia
Contributor Author

Thank you all for the comments!

@aromnvidia aromnvidia closed this May 19, 2017
zasdfgbnm pushed a commit to zasdfgbnm/pytorch that referenced this pull request Apr 7, 2022
…#1561)

* do not re-compute unary op with output and allow expr duplication in debug print.
petrex pushed a commit to petrex/pytorch that referenced this pull request Sep 23, 2024
New tests introduced for testing NHWC and NCHW batchnorm on MIOpen:

- test_batchnorm_nhwc_miopen_cuda_float32
- test_batchnorm_nchw_miopen_cuda_float32

These tests verify weight and bias gradients, running_mean, and running_var. Other dtypes can be added later.

How to run:
`MIOPEN_ENABLE_LOGGING_CMD=1 python -u test/test_nn.py -v -k test_batchnorm_nhwc_miopen_cuda_float32`

There is a difference in running_variance for NHWC batchnorm fp32 between MIOpen and native:
```
MIOPEN_ENABLE_LOGGING_CMD=1 python -u test/test_nn.py -v -k test_batchnorm_nhwc_miopen_cuda_float32
...
self.assertEqual(mod.running_var, ref_mod.running_var)
AssertionError: Tensor-likes are not close!
Mismatched elements: 8 / 8 (100.0%)
Greatest absolute difference: 0.05455732345581055 at index (5,) (up to 1e-05 allowed)
Greatest relative difference: 0.030772637575864792 at index (5,) (up to 1.3e-06 allowed)
```
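For context on what the test is comparing: per the torch.nn.BatchNorm documentation, the native path updates running_var with the unbiased (Bessel-corrected) batch variance via a momentum term, so a backend using the biased variance or a different momentum convention would drift from the reference like this. A minimal sketch of that reference update, written here as a standalone C routine for illustration:

```c
#include <stddef.h>

/* Reference running-statistics update as documented for
 * torch.nn.BatchNorm*: running_mean tracks the batch mean and
 * running_var tracks the *unbiased* batch variance, each blended
 * in with weight `momentum` (PyTorch's default is 0.1). */
static void update_running_stats(double *running_mean, double *running_var,
                                 const double *batch, size_t n, double momentum) {
    double mean = 0.0, var = 0.0;
    for (size_t i = 0; i < n; ++i) mean += batch[i];
    mean /= (double)n;
    for (size_t i = 0; i < n; ++i) {
        double d = batch[i] - mean;
        var += d * d;
    }
    var /= (double)(n - 1);  /* unbiased (Bessel-corrected) variance */
    *running_mean = (1.0 - momentum) * (*running_mean) + momentum * mean;
    *running_var  = (1.0 - momentum) * (*running_var)  + momentum * var;
}
```

A ~3% relative mismatch in running_var, as in the log above, is consistent with a systematic difference in the variance estimator rather than simple floating-point noise.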
jagadish-amd pushed a commit to jagadish-amd/pytorch that referenced this pull request Jan 29, 2025