[doc] Add LSTM non-deterministic workaround #40893
xwang233 wants to merge 6 commits into pytorch:master
Conversation
cc @ptrblck
💊 CI failures summary and remediations — as of commit 7f71fa0 (more details on the Dr. CI page):
🕵️ 1 new failure recognized by patterns; it does not appear to be due to an upstream breakage.
> where :math:`k = \frac{1}{\text{hidden\_size}}`
>
> .. warning::
>     There are known deterministic issues for LSTM using cuDNN 7.6.5, 8.0 on CUDA 10.1 or later.
known non-determinism issues. Is it for LSTM only, or are RNN/GRU also affected?
Is the non-deterministic behavior really only on those two versions of cuDNN and only if the version of CUDA is 10.1 or later?
It could affect other RNN variants as well. I'll check that and add docs in other places if necessary.
You may also want to cover yourself and say "On some versions of cuDNN and CUDA...". It's not great, since then people will never know whether they may hit this issue, but it's better than telling them it may only happen in cases X and Y and then seeing it happen in case Z, too.
Too bad there's no way to query for whether the function will be deterministic or not in the current environment, or request that it be run deterministically.
I tested on CUDA 10.2 with cuDNN 7.6.5. RNN and LSTM are affected; GRU is deterministic.
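A determinism check like the one described above can be sketched generically: re-seed, run the same computation several times, and compare outputs. The helper below is hypothetical (it is not part of this PR); with PyTorch one would seed via `torch.manual_seed` and run the LSTM forward pass on the GPU instead of the toy stand-in used here.

```python
import random

def runs_deterministically(fn, seed, trials=3):
    """Re-seed, run fn repeatedly, and report whether all outputs agree."""
    outputs = []
    for _ in range(trials):
        random.seed(seed)  # with PyTorch: torch.manual_seed(seed)
        outputs.append(fn())
    return all(out == outputs[0] for out in outputs)

# Toy stand-in for an RNN forward pass; a real check would compare tensors
# from repeated LSTM runs (e.g. with torch.equal).
print(runs_deterministically(lambda: [random.random() for _ in range(4)], seed=0))
```

A non-deterministic op would make repeated outputs diverge, and the helper would return `False`.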
Can you also cross-reference this from https://pytorch.org/docs/stable/notes/randomness.html? We try to keep a list of all non-deterministic ops in that note.
> This may affect performance.
>
> On CUDA 10.2 or later, set environment variable
> (note the leading colon symbol)
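The variable referred to here is `CUBLAS_WORKSPACE_CONFIG` (the reviewer below mentions its values). As a hedged sketch, the two values I believe cuBLAS accepts for deterministic workspace configurations on CUDA 10.2+ are shown below; check the cuBLAS documentation for the authoritative list:

```shell
# Set before the process creates a CUDA context; note the leading colon.
export CUBLAS_WORKSPACE_CONFIG=:4096:8
# Alternative, smaller workspace: export CUBLAS_WORKSPACE_CONFIG=:16:8
echo "$CUBLAS_WORKSPACE_CONFIG"
```

`:4096:8` reserves a larger workspace (higher memory use), while `:16:8` is smaller but may reduce performance.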
Do you mean the CUBLAS_WORKSPACE_CONFIG values? Either one would be fine.
facebook-github-bot left a comment:
@ngimel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Summary: Related: pytorch#35661. Pull Request resolved: pytorch#40893. Reviewed By: vincentqb. Differential Revision: D22535418. Pulled By: ngimel. fbshipit-source-id: f194ddaff8ec6d03a3616c87466e2cbbe7e429a9
Related: #35661
