
Fix cuDNN dropout state cache #10662

Closed
apaszke wants to merge 1 commit into pytorch:master from apaszke:cudnn_dropout_cache_fix

Conversation


@apaszke apaszke commented Aug 18, 2018

Minor fix for the cuDNN cache. Previously, if an RNN function was called on GPU 0 and then called in eval mode on GPU 1, we would skip re-initializing the event, which caused an incorrect resource handle error when we later tried to record the event.

@facebook-github-bot left a comment


apaszke has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

struct DropoutState {
  // Both buffer and event are lazily instantiated when a dropout state is needed
  // for the first time. Note that in this case needed != used, as we don't need
  // a buffer to e.g. run RNNs in test mode.
  at::Tensor buffer;
-  cuda::CUDAEvent event;
+  at::optional<cuda::CUDAEvent> event;


zdevito pushed a commit to zdevito/ATen that referenced this pull request Aug 20, 2018
Pull Request resolved: pytorch/pytorch#10662

Reviewed By: soumith

Differential Revision: D9393629

Pulled By: apaszke

fbshipit-source-id: e64c1c1d2860e80f5a7ba727d0b01aeb5f762d90
PenghuiCheng pushed a commit to PenghuiCheng/pytorch that referenced this pull request Sep 11, 2018