
[CUDNN] RNNv6 API deprecation support#115719

Closed
eqy wants to merge 39 commits into pytorch:main from eqy:rnn9

Conversation

@eqy
Collaborator

@eqy eqy commented Dec 13, 2023

The cuDNN RNNv6 API has been deprecated and support will be dropped in an upcoming release; this PR migrates to the newer API to support newer cuDNN versions that would otherwise break the build.

Note that it may not be tested yet in upstream CI if the upstream CI cuDNN version is less than 8.9.7.

CC @ptrblck @malfet

cc @csarofeen @ptrblck @xwang233 @zou3519 @mikaylagawarecki

@eqy eqy added module: cudnn Related to torch.backends.cudnn, and CuDNN support module: rnn Issues related to RNN support (LSTM, GRU, etc) open source ciflow/trunk Trigger trunk jobs on your pull request topic: not user facing topic category ciflow/periodic Trigger jobs ran periodically on master (periodic.yml) on the PR labels Dec 13, 2023
@pytorch-bot

pytorch-bot bot commented Dec 13, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/115719

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (3 Unrelated Failures)

As of commit 9fea679 with merge base 362bc6d (image):

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

Collaborator

@albanD albanD left a comment


That's quite a lot of ifdef-ing, but I guess there is no way around it? :(

proj_size ? proj_size : hidden_size,
num_layers,
dropout_desc_.desc(),
false ? CUDNN_RNN_PADDED_IO_DISABLED : CUDNN_RNN_PADDED_IO_ENABLED));
Collaborator


dead code?

Collaborator Author


Good catch; this was left over from earlier debugging. Reverted so that it is conditioned on packed.
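For context, the fix described above can be sketched standalone: the hard-coded `false` in the ternary becomes the `packed` condition. The enum mirrors cuDNN's constant names but is defined locally so this compiles without cuDNN headers, and which branch is right for packed input is taken purely from the snippet's ternary order, not verified against cuDNN semantics.

```cpp
// Locally defined stand-in for cuDNN's padded-IO constants; values are
// illustrative only.
enum cudnnRNNPaddedIOMode {
  CUDNN_RNN_PADDED_IO_DISABLED = 0,
  CUDNN_RNN_PADDED_IO_ENABLED = 1,
};

// Sketch of the corrected condition: `packed` replaces the hard-coded `false`.
cudnnRNNPaddedIOMode paddedIOMode(bool packed) {
  return packed ? CUDNN_RNN_PADDED_IO_DISABLED : CUDNN_RNN_PADDED_IO_ENABLED;
}
```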

std::vector<int> seqLengthArray(batch_size, 1);
// TODO(eqy): There's probably a smarter way to do this than O(SN)
for (auto it = batch_sizes.begin(); it != batch_sizes.end(); it++) {
// everyone starts at sequence length 1 so we skip an iteration
Collaborator


What does this mean? That wasn't there in the old code

Collaborator Author


Added a comment to correspond to the packed-batch explanation: cuDNN wants the per-sequence lengths of a packed batch as if it were unpacked. This saves having to create the vector of TensorDescriptors that we needed previously, but the trade-off is that we now have to provide this additional metadata.
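The metadata being computed in the quoted loop can be illustrated with a hypothetical standalone sketch (not the PR's exact code): given a packed batch's per-timestep batch sizes, recover each sequence's length as if the batch were unpacked. Here `batch_sizes[t]` is the number of sequences still active at timestep t, assumed non-increasing as in PyTorch's packed-sequence convention.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Recover per-sequence lengths from a packed batch's per-timestep batch
// sizes. batch_sizes is non-increasing; batch_sizes[t] counts the sequences
// still active at timestep t.
std::vector<int32_t> unpackSeqLengths(const std::vector<int32_t>& batch_sizes) {
  const int32_t batch_size = batch_sizes.empty() ? 0 : batch_sizes.front();
  // Every sequence is active at timestep 0, so all lengths start at 1 and the
  // first entry is skipped below (matching the "skip an iteration" comment).
  std::vector<int32_t> seq_lengths(batch_size, 1);
  for (std::size_t t = 1; t < batch_sizes.size(); ++t) {
    // The first batch_sizes[t] sequences are still active at timestep t.
    for (int32_t i = 0; i < batch_sizes[t]; ++i) {
      ++seq_lengths[i];  // O(S*N) overall, as the TODO notes
    }
  }
  return seq_lengths;
}
```

For example, batch sizes {3, 2, 2, 1} correspond to three sequences of lengths 4, 3, and 1.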


void set(IntArrayRef input_sizes, IntArrayRef batch_sizes_, bool batch_first) {
#if defined(CUDNN_VERSION) && CUDNN_VERSION >= RNNV8VERSION
int batch_first = -1;
Collaborator


Is this a global variable?!

Collaborator Author


It was hiding as part of the struct but the declaration location makes it look global ;)

Removed as this was a vestigial variable from when I thought we needed to track the layout for creation of the RNNDataDescriptors.
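The version-gating pattern used throughout the PR can be sketched standalone. The `RNNV8VERSION` value below is a placeholder, and the fallback `CUDNN_VERSION` define exists only so the snippet compiles without cuDNN headers (the real macro comes from cudnn_version.h).

```cpp
// Fallback only so this sketch builds without cuDNN; normally the toolchain
// provides CUDNN_VERSION (major*1000 + minor*100 + patch).
#ifndef CUDNN_VERSION
#define CUDNN_VERSION 8907
#endif
// Placeholder threshold for the v8 RNN API migration discussed in this PR.
#define RNNV8VERSION 8907

// Returns true when the build would take the new RNNDataDescriptor-based path.
bool usesV8RNNAPI() {
#if defined(CUDNN_VERSION) && CUDNN_VERSION >= RNNV8VERSION
  return true;   // new cuDNN v8 RNN API path
#else
  return false;  // legacy RNNv6 path
#endif
}
```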

mode,
#if defined(CUDNN_VERSION) && CUDNN_VERSION >= RNNV8VERSION
input_size,
false, // bogus
Collaborator


Should we do something about it?

Collaborator Author


Added a comment; basically it's a "don't care" value for this function call.

@albanD albanD requested a review from malfet December 13, 2023 16:42
@albanD albanD added the triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module label Dec 13, 2023
@eqy
Collaborator Author

eqy commented Dec 14, 2023

I can't really tell what's going on with the unexpected #endif complaint from the VS build. Do we have any precompiled headers here or something else that would cause this?

@albanD
Collaborator

albanD commented Dec 14, 2023

Is there a windows-only #ifdef in there that breaks yours? :p

@eqy
Collaborator Author

eqy commented Dec 14, 2023

Is there a windows-only #ifdef in there that breaks yours? :p

What's also interesting is that there are some win-vs builds that don't seem to break
e.g.,

trunk / win-vs2019-cuda11.8-py3 / build (push) Successful in 18m

@eqy
Collaborator Author

eqy commented Dec 14, 2023

That's fun; it looks like there is some nondeterminism now. I made some trivial changes, but the previously succeeding trunk / win-vs2019-cuda11.8-py3 / build (push) job seems to fail now.

@eqy
Collaborator Author

eqy commented Dec 15, 2023

Now we're back to the case where one VS build shows the unexpected #endif but the other doesn't? :/

@eqy
Collaborator Author

eqy commented Dec 15, 2023

Ah, it looks like it failed but the step still returns success; very interesting:

C:\actions-runner\_work\pytorch\pytorch\build\win_tmp\bin\sccache-cl.exe  /nologo /TP -DAT_PER_OPERATOR_HEADERS -DIDEEP_USE_MKL -DMINIZ_DISABLE_ZIP_READER_CRC32_CHECKS -DNOMINMAX -DONNXIFI_ENABLE_EXT=1 -DONNX_ML=1 -DONNX_NAMESPACE=onnx_torch -DTORCH_CUDA_BUILD_MAIN_LIB -DUSE_C10D_GLOO -DUSE_CUDA -DUSE_DISTRIBUTED -DUSE_EXPERIMENTAL_CUDNN_V8_API -DUSE_EXTERNAL_MZCRC -DUSE_MEM_EFF_ATTENTION -DUSE_MIMALLOC -DWIN32_LEAN_AND_MEAN -D_CRT_SECURE_NO_DEPRECATE=1 -D_UCRT_LEGACY_INFINITY -Dtorch_cuda_EXPORTS -IC:\actions-runner\_work\pytorch\pytorch\build\aten\src -IC:\actions-runner\_work\pytorch\pytorch\aten\src -IC:\actions-runner\_work\pytorch\pytorch\build -IC:\actions-runner\_work\pytorch\pytorch -IC:\actions-runner\_work\pytorch\pytorch\cmake\..\third_party\benchmark\include -IC:\actions-runner\_work\pytorch\pytorch\third_party\onnx -IC:\actions-runner\_work\pytorch\pytorch\build\third_party\onnx -IC:\actions-runner\_work\pytorch\pytorch\third_party\foxi -IC:\actions-runner\_work\pytorch\pytorch\build\third_party\foxi -IC:\actions-runner\_work\pytorch\pytorch\third_party\mimalloc\include -IC:\actions-runner\_work\pytorch\pytorch\aten\src\THC -IC:\actions-runner\_work\pytorch\pytorch\aten\src\ATen\cuda -IC:\actions-runner\_work\pytorch\pytorch\aten\src\ATen\..\..\..\third_party\cutlass\include -IC:\actions-runner\_work\pytorch\pytorch\build\caffe2\aten\src -IC:\actions-runner\_work\pytorch\pytorch\aten\src\ATen\.. -IC:\actions-runner\_work\pytorch\pytorch\c10\cuda\..\.. -IC:\actions-runner\_work\pytorch\pytorch\c10\.. 
-IC:\actions-runner\_work\pytorch\pytorch\torch\csrc\api -IC:\actions-runner\_work\pytorch\pytorch\torch\csrc\api\include -external:I C:\actions-runner\_work\pytorch\pytorch\build\third_party\gloo -external:I C:\actions-runner\_work\pytorch\pytorch\cmake\..\third_party\gloo -external:I C:\actions-runner\_work\pytorch\pytorch\cmake\..\third_party\googletest\googlemock\include -external:I C:\actions-runner\_work\pytorch\pytorch\cmake\..\third_party\googletest\googletest\include -external:I C:\actions-runner\_work\pytorch\pytorch\third_party\protobuf\src -external:I C:\actions-runner\_work\pytorch\pytorch\build\win_tmp\mkl\include -external:I C:\actions-runner\_work\pytorch\pytorch\third_party\XNNPACK\include -external:I C:\actions-runner\_work\pytorch\pytorch\third_party\ittapi\include -external:I C:\actions-runner\_work\pytorch\pytorch\cmake\..\third_party\eigen -external:I "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\include" -external:I C:\actions-runner\_work\pytorch\pytorch\third_party\ideep\mkl-dnn\include\oneapi\dnnl -external:I C:\actions-runner\_work\pytorch\pytorch\third_party\ideep\include -external:I C:\actions-runner\_work\pytorch\pytorch\third_party\NVTX\c\include -external:I C:\actions-runner\_work\pytorch\pytorch\cmake\..\third_party\cudnn_frontend\include -external:I C:\actions-runner\_work\pytorch\pytorch\build\win_tmp\magma\include -external:W0 /DWIN32 /D_WINDOWS /GR /EHsc /Zc:__cplusplus /bigobj /FS /utf-8 -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE /wd4624 /wd4068 /wd4067 /wd4267 /wd4661 /wd4717 /wd4244 /wd4804 /wd4273 -DHAVE_AVX512_CPU_DEFINITION -DHAVE_AVX2_CPU_DEFINITION /O2 /Ob2 /DNDEBUG /bigobj -DNDEBUG -MD -DMKL_HAS_SBGEMM -DCAFFE2_USE_GLOO /EHsc /bigobj -O2 -std:c++17 /showIncludes /Focaffe2\CMakeFiles\torch_cuda.dir\__\aten\src\ATen\native\cudnn\RNN.cpp.obj /Fdcaffe2\CMakeFiles\torch_cuda.dir\ /FS -c 
C:\actions-runner\_work\pytorch\pytorch\aten\src\ATen\native\cudnn\RNN.cpp
C:\actions-runner\_work\pytorch\pytorch\aten\src\ATen\native\cudnn\RNN.cpp(2161): fatal error C1020: unexpected #endif

@eqy
Collaborator Author

eqy commented Dec 19, 2023

@pytorchmergebot rebase

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Collaborator

Successfully rebased rnn9 onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout rnn9 && git pull --rebase)

Collaborator

@albanD albanD left a comment


Thanks for the update.
Sounds good once CI is green!

return r;
}

auto rnn_descriptor(const Tensor& tensor, const int batch_size, const int seq_len, const int vector_size) {
Contributor


Two nits: can those be unsigned values? (i.e., vector size can't really be negative, and I assume the same holds for batch size and everything else.) Also, why do they need to be const?

Suggested change
auto rnn_descriptor(const Tensor& tensor, const int batch_size, const int seq_len, const int vector_size) {
auto rnn_descriptor(const Tensor& tensor, uint32_t batch_size, uint32_t seq_len, uint32_t vector_size) {

Collaborator Author


I was taking the approach of making things const unless proven otherwise; I'll take another pass to make this more consistent.
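The nit can be illustrated with a hypothetical standalone sketch: sizes can't be negative, so unsigned parameters encode that invariant, and top-level const on by-value parameters is not part of a function's signature anyway. The `Tensor` struct and the element-count body are stand-ins; the real `rnn_descriptor` builds a cuDNN RNN data descriptor.

```cpp
#include <cstdint>

// Dummy stand-in for at::Tensor so the sketch is self-contained.
struct Tensor {};

// Unsigned size parameters, per the review suggestion; const on by-value
// parameters adds nothing to the interface, so it is dropped.
uint64_t rnn_descriptor_size(const Tensor& /*tensor*/, uint32_t batch_size,
                             uint32_t seq_len, uint32_t vector_size) {
  // Illustrative body: the number of elements such a descriptor would cover,
  // widened to 64 bits before multiplying to avoid overflow.
  return static_cast<uint64_t>(batch_size) * seq_len * vector_size;
}
```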

@eqy
Collaborator Author

eqy commented Dec 27, 2023

@pytorchmergebot rebase

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Collaborator

Successfully rebased rnn9 onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout rnn9 && git pull --rebase)

@eqy
Collaborator Author

eqy commented Dec 27, 2023

@pytorchmergebot rebase main

@pytorchmergebot
Collaborator

Successfully rebased rnn9 onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout rnn9 && git pull --rebase)

@eqy
Collaborator Author

eqy commented Dec 27, 2023

@pytorchmergebot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team


pytorchmergebot pushed a commit that referenced this pull request Feb 23, 2024
Adds back `CUDNN_TENSOR_OP_MATH` which was erroneously dropped by #115719

CC @malfet @ptrblck

Pull Request resolved: #120277
Approved by: https://github.com/drisspg
@ptrblck ptrblck mentioned this pull request Apr 7, 2024
CLiqing pushed a commit to CLiqing/pytorch that referenced this pull request Nov 18, 2024

Pull Request resolved: pytorch#115719
Approved by: https://github.com/albanD, https://github.com/malfet

Labels

ciflow/inductor ciflow/periodic Trigger jobs ran periodically on master (periodic.yml) on the PR ciflow/trunk Trigger trunk jobs on your pull request Merged module: cudnn Related to torch.backends.cudnn, and CuDNN support module: rnn Issues related to RNN support (LSTM, GRU, etc) open source topic: not user facing topic category triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module
