[CUDNN] RNNv6 API deprecation support #115719
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/115719
Note: Links to docs will display an error until the docs builds have been completed. ✅ You can merge normally! (3 unrelated failures.) As of commit 9fea679 with merge base 362bc6d. FLAKY: the following jobs failed but were likely due to flakiness present on trunk.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
albanD left a comment:
That's quite a lot of ifdefing, but I guess there is no way around it? :(
aten/src/ATen/cudnn/Descriptors.h
Outdated
proj_size ? proj_size : hidden_size,
num_layers,
dropout_desc_.desc(),
false ? CUDNN_RNN_PADDED_IO_DISABLED : CUDNN_RNN_PADDED_IO_ENABLED));
Good catch, this was left over from debugging earlier; reverted it to be conditioned on packed.
std::vector<int> seqLengthArray(batch_size, 1);
// TODO(eqy): There's probably a smarter way to do this than O(SN)
for (auto it = batch_sizes.begin(); it != batch_sizes.end(); it++) {
// everyone starts at sequence length 1 so we skip an iteration
What does this mean? That wasn't there in the old code.
Added a comment to correspond to the packed-batch explanation: cuDNN wants the per-sequence sequence lengths of a packed batch as if they were unpacked. This saves having to create the vector of TensorDescriptors that we needed previously, but the trade-off is that we now have to provide this additional metadata.
aten/src/ATen/native/cudnn/RNN.cpp
Outdated
void set(IntArrayRef input_sizes, IntArrayRef batch_sizes_, bool batch_first) {
#if defined(CUDNN_VERSION) && CUDNN_VERSION >= RNNV8VERSION
int batch_first = -1;
Is this a global variable?!
It was hiding as part of the struct, but the declaration location makes it look global ;)
Removed it, as it was a vestigial variable from when I thought we needed to track the layout for creating the RNNDataDescriptors.
aten/src/ATen/native/cudnn/RNN.cpp
Outdated
mode,
#if defined(CUDNN_VERSION) && CUDNN_VERSION >= RNNV8VERSION
input_size,
false, // bogus
Should we do something about it?
Added a comment; basically it's a "don't care" value for this function call.
I can't really tell what's going on with the unexpected #endif complaint from the VS build. Do we have any precompiled headers here, or something else that would cause this?

Is there a Windows-only #ifdef in there that breaks yours? :p

What's also interesting is that some win-vs builds don't seem to break.

That's fun; it looks like there is now some nondeterminism? Made some trivial changes, but the previously succeeding

Now we're back to the case where one VS build shows the unexpected #endif but the other doesn't? :/

Ah, it looks like it failed but the output still reports success. Very interesting.
@pytorchmergebot rebase

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here.

Successfully rebased d4ac762 to defa0fd (Compare).
albanD left a comment:
Thanks for the update.
Sounds good once CI is green!
aten/src/ATen/native/cudnn/RNN.cpp
Outdated
return r;
}

auto rnn_descriptor(const Tensor& tensor, const int batch_size, const int seq_len, const int vector_size) {
Two nits: can those be unsigned values? (A vector size can't really be negative, and I assume the same holds for batch size and everything else.) Also, why do they need to be const?
- auto rnn_descriptor(const Tensor& tensor, const int batch_size, const int seq_len, const int vector_size) {
+ auto rnn_descriptor(const Tensor& tensor, uint32_t batch_size, uint32_t seq_len, uint32_t vector_size) {
I was taking the approach of making things const unless proven otherwise; I'll take another pass to make this more consistent.
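For what it's worth, top-level const on a by-value parameter is purely an implementation detail: it does not change the function's signature or restrict callers in any way; it only stops the function body from reassigning the parameter. A minimal illustration with hypothetical names:

```cpp
// These two declarations refer to the SAME function as far as overload
// resolution and callers are concerned: top-level const on a value
// parameter is not part of the function type.
int scaled(const int batch_size, const int factor);

// The definition may legally drop (or keep) the const; behavior for
// callers is identical either way.
int scaled(int batch_size, int factor) {
  return batch_size * factor;
}
```

Whether to also make such parameters unsigned is a separate question; unsigned arithmetic has its own wraparound pitfalls, which is why some codebases keep signed int even for logically non-negative sizes.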
@pytorchmergebot rebase

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here.

Successfully rebased 4555754 to 7b730b5 (Compare).
@pytorchmergebot rebase main
Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
Successfully rebased 7b730b5 to 9fea679 (Compare).
@pytorchmergebot merge

Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Adds back `CUDNN_TENSOR_OP_MATH`, which was erroneously dropped by #115719. CC @malfet @ptrblck. Pull Request resolved: #120277. Approved by: https://github.com/drisspg
The cuDNN RNNv6 API has been deprecated and support will be dropped in an upcoming release; this PR migrates to the newer API to support newer cuDNN versions that would otherwise break the build. Note that it may not be tested yet in upstream CI if the upstream CI cuDNN version is less than 8.9.7. CC @ptrblck @malfet. Pull Request resolved: pytorch#115719. Approved by: https://github.com/albanD, https://github.com/malfet
cc @csarofeen @ptrblck @xwang233 @zou3519 @mikaylagawarecki