
Fix cuDNN dropout state cache #10662

Closed
apaszke wants to merge 1 commit into pytorch:master from apaszke:cudnn_dropout_cache_fix

Conversation


@apaszke apaszke commented Aug 18, 2018

Minor fix for the cuDNN cache. Previously, if an RNN function was called on GPU 0 and then called in eval mode on GPU 1, we would skip re-initializing the event, which caused an incorrect resource handle error when we later tried to record the event.

@facebook-github-bot left a comment


apaszke has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

struct DropoutState {
  // Both buffer and event are lazily instantiated when a dropout state is needed
  // for the first time. Note that in this case needed != used, as we don't need
  // a buffer to e.g. run RNNs in test mode.
  at::Tensor buffer;
-  cuda::CUDAEvent event;
+  at::optional<cuda::CUDAEvent> event;


zdevito pushed a commit to zdevito/ATen that referenced this pull request Aug 20, 2018
Pull Request resolved: pytorch/pytorch#10662

Reviewed By: soumith

Differential Revision: D9393629

Pulled By: apaszke

fbshipit-source-id: e64c1c1d2860e80f5a7ba727d0b01aeb5f762d90
PenghuiCheng pushed a commit to PenghuiCheng/pytorch that referenced this pull request Sep 11, 2018