
[CUDNN] RNNv6 API deprecation support#115719

Closed
eqy wants to merge 39 commits into pytorch:main from eqy:rnn9

Conversation

@eqy
Collaborator

@eqy eqy commented Dec 13, 2023

The cuDNN RNNv6 API has been deprecated and support will be dropped in an upcoming release; this PR migrates to the newer API to support newer cuDNN versions that would otherwise break the build.

Note that it may not be tested yet in upstream CI if the upstream CI cuDNN version is less than 8.9.7.

CC @ptrblck @malfet

cc @csarofeen @ptrblck @xwang233 @zou3519 @mikaylagawarecki

@eqy eqy added module: cudnn Related to torch.backends.cudnn, and CuDNN support module: rnn Issues related to RNN support (LSTM, GRU, etc) open source ciflow/trunk Trigger trunk jobs on your pull request topic: not user facing topic category ciflow/periodic Trigger jobs ran periodically on master (periodic.yml) on the PR labels Dec 13, 2023
@pytorch-bot

pytorch-bot bot commented Dec 13, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/115719

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (3 Unrelated Failures)

As of commit 9fea679 with merge base 362bc6d (image):

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

Collaborator

@albanD albanD left a comment


That's quite a lot of ifdef-ing, but I guess there is no way around it? :(

proj_size ? proj_size : hidden_size,
num_layers,
dropout_desc_.desc(),
false ? CUDNN_RNN_PADDED_IO_DISABLED : CUDNN_RNN_PADDED_IO_ENABLED));
Collaborator


dead code?

Collaborator Author


Good catch; this was left over from earlier debugging. Reverted so that it is conditioned on packed.
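For context, the fix described above can be sketched standalone: the hard-coded `false` in the ternary becomes the `packed` condition. The enum mirrors cuDNN's constant names but is defined locally so this compiles without cuDNN headers, and which branch is right for packed input is taken purely from the snippet's ternary order, not verified against cuDNN semantics.

```cpp
// Locally defined stand-in for cuDNN's padded-IO constants; values are
// illustrative only.
enum cudnnRNNPaddedIOMode {
  CUDNN_RNN_PADDED_IO_DISABLED = 0,
  CUDNN_RNN_PADDED_IO_ENABLED = 1,
};

// Sketch of the corrected condition: `packed` replaces the hard-coded `false`.
cudnnRNNPaddedIOMode paddedIOMode(bool packed) {
  return packed ? CUDNN_RNN_PADDED_IO_DISABLED : CUDNN_RNN_PADDED_IO_ENABLED;
}
```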

std::vector<int> seqLengthArray(batch_size, 1);
// TODO(eqy): There's probably a smarter way to do this than O(SN)
for (auto it = batch_sizes.begin(); it != batch_sizes.end(); it++) {
// everyone starts at sequence length 1 so we skip an iteration
Collaborator


What does this mean? That wasn't there in the old code

Collaborator Author


Added a comment to correspond to the packed-batch explanation: cuDNN wants the per-sequence lengths of a packed batch as if it were unpacked. This saves having to create the vector of TensorDescriptors that we needed previously, but the trade-off is that we now have to provide this additional metadata.
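The metadata being computed in the quoted loop can be illustrated with a hypothetical standalone sketch (not the PR's exact code): given a packed batch's per-timestep batch sizes, recover each sequence's length as if the batch were unpacked. Here `batch_sizes[t]` is the number of sequences still active at timestep t, assumed non-increasing as in PyTorch's packed-sequence convention.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Recover per-sequence lengths from a packed batch's per-timestep batch
// sizes. batch_sizes is non-increasing; batch_sizes[t] counts the sequences
// still active at timestep t.
std::vector<int32_t> unpackSeqLengths(const std::vector<int32_t>& batch_sizes) {
  const int32_t batch_size = batch_sizes.empty() ? 0 : batch_sizes.front();
  // Every sequence is active at timestep 0, so all lengths start at 1 and the
  // first entry is skipped below (matching the "skip an iteration" comment).
  std::vector<int32_t> seq_lengths(batch_size, 1);
  for (std::size_t t = 1; t < batch_sizes.size(); ++t) {
    // The first batch_sizes[t] sequences are still active at timestep t.
    for (int32_t i = 0; i < batch_sizes[t]; ++i) {
      ++seq_lengths[i];  // O(S*N) overall, as the TODO notes
    }
  }
  return seq_lengths;
}
```

For example, batch sizes {3, 2, 2, 1} correspond to three sequences of lengths 4, 3, and 1.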


void set(IntArrayRef input_sizes, IntArrayRef batch_sizes_, bool batch_first) {
#if defined(CUDNN_VERSION) && CUDNN_VERSION >= RNNV8VERSION
int batch_first = -1;
Collaborator


Is this a global variable?!

Collaborator Author


It was hiding as part of the struct but the declaration location makes it look global ;)

Removed as this was a vestigial variable from when I thought we needed to track the layout for creation of the RNNDataDescriptors.
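The version-gating pattern used throughout the PR can be sketched standalone. The `RNNV8VERSION` value below is a placeholder, and the fallback `CUDNN_VERSION` define exists only so the snippet compiles without cuDNN headers (the real macro comes from cudnn_version.h).

```cpp
// Fallback only so this sketch builds without cuDNN; normally the toolchain
// provides CUDNN_VERSION (major*1000 + minor*100 + patch).
#ifndef CUDNN_VERSION
#define CUDNN_VERSION 8907
#endif
// Placeholder threshold for the v8 RNN API migration discussed in this PR.
#define RNNV8VERSION 8907

// Returns true when the build would take the new RNNDataDescriptor-based path.
bool usesV8RNNAPI() {
#if defined(CUDNN_VERSION) && CUDNN_VERSION >= RNNV8VERSION
  return true;   // new cuDNN v8 RNN API path
#else
  return false;  // legacy RNNv6 path
#endif
}
```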

mode,
#if defined(CUDNN_VERSION) && CUDNN_VERSION >= RNNV8VERSION
input_size,
false, // bogus
Collaborator


Should we do something about it?

Collaborator Author


Added a comment; basically it's a "don't care" value for this function call.

@albanD albanD requested a review from malfet December 13, 2023 16:42
@albanD albanD added the triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module label Dec 13, 2023
@eqy
Collaborator Author

eqy commented Dec 14, 2023

I can't really tell what's going on with the unexpected #endif complaint from the VS build. Do we have any precompiled headers here or something else that would cause this?

@albanD
Collaborator

albanD commented Dec 14, 2023

Is there a windows-only #ifdef in there that breaks yours? :p

@eqy
Collaborator Author

eqy commented Dec 14, 2023

Is there a windows-only #ifdef in there that breaks yours? :p

What's also interesting is that there are some win-vs builds that don't seem to break
e.g.,

trunk / win-vs2019-cuda11.8-py3 / build (push) Successful in 18m

@eqy
Collaborator Author

eqy commented Dec 14, 2023

That's fun; it looks like there is some nondeterminism now. I made some trivial changes, but the previously succeeding trunk / win-vs2019-cuda11.8-py3 / build (push) job seems to fail now.

@eqy
Collaborator Author

eqy commented Dec 15, 2023

Now we're back to the case where one VS build shows the unexpected #endif but the other doesn't? :/

@eqy
Collaborator Author

eqy commented Dec 15, 2023

Ah, it looks like it failed but the step still returns success; very interesting:

C:\actions-runner\_work\pytorch\pytorch\build\win_tmp\bin\sccache-cl.exe  /nologo /TP -DAT_PER_OPERATOR_HEADERS -DIDEEP_USE_MKL -DMINIZ_DISABLE_ZIP_READER_CRC32_CHECKS -DNOMINMAX -DONNXIFI_ENABLE_EXT=1 -DONNX_ML=1 -DONNX_NAMESPACE=onnx_torch -DTORCH_CUDA_BUILD_MAIN_LIB -DUSE_C10D_GLOO -DUSE_CUDA -DUSE_DISTRIBUTED -DUSE_EXPERIMENTAL_CUDNN_V8_API -DUSE_EXTERNAL_MZCRC -DUSE_MEM_EFF_ATTENTION -DUSE_MIMALLOC -DWIN32_LEAN_AND_MEAN -D_CRT_SECURE_NO_DEPRECATE=1 -D_UCRT_LEGACY_INFINITY -Dtorch_cuda_EXPORTS -IC:\actions-runner\_work\pytorch\pytorch\build\aten\src -IC:\actions-runner\_work\pytorch\pytorch\aten\src -IC:\actions-runner\_work\pytorch\pytorch\build -IC:\actions-runner\_work\pytorch\pytorch -IC:\actions-runner\_work\pytorch\pytorch\cmake\..\third_party\benchmark\include -IC:\actions-runner\_work\pytorch\pytorch\third_party\onnx -IC:\actions-runner\_work\pytorch\pytorch\build\third_party\onnx -IC:\actions-runner\_work\pytorch\pytorch\third_party\foxi -IC:\actions-runner\_work\pytorch\pytorch\build\third_party\foxi -IC:\actions-runner\_work\pytorch\pytorch\third_party\mimalloc\include -IC:\actions-runner\_work\pytorch\pytorch\aten\src\THC -IC:\actions-runner\_work\pytorch\pytorch\aten\src\ATen\cuda -IC:\actions-runner\_work\pytorch\pytorch\aten\src\ATen\..\..\..\third_party\cutlass\include -IC:\actions-runner\_work\pytorch\pytorch\build\caffe2\aten\src -IC:\actions-runner\_work\pytorch\pytorch\aten\src\ATen\.. -IC:\actions-runner\_work\pytorch\pytorch\c10\cuda\..\.. -IC:\actions-runner\_work\pytorch\pytorch\c10\.. 
-IC:\actions-runner\_work\pytorch\pytorch\torch\csrc\api -IC:\actions-runner\_work\pytorch\pytorch\torch\csrc\api\include -external:I C:\actions-runner\_work\pytorch\pytorch\build\third_party\gloo -external:I C:\actions-runner\_work\pytorch\pytorch\cmake\..\third_party\gloo -external:I C:\actions-runner\_work\pytorch\pytorch\cmake\..\third_party\googletest\googlemock\include -external:I C:\actions-runner\_work\pytorch\pytorch\cmake\..\third_party\googletest\googletest\include -external:I C:\actions-runner\_work\pytorch\pytorch\third_party\protobuf\src -external:I C:\actions-runner\_work\pytorch\pytorch\build\win_tmp\mkl\include -external:I C:\actions-runner\_work\pytorch\pytorch\third_party\XNNPACK\include -external:I C:\actions-runner\_work\pytorch\pytorch\third_party\ittapi\include -external:I C:\actions-runner\_work\pytorch\pytorch\cmake\..\third_party\eigen -external:I "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\include" -external:I C:\actions-runner\_work\pytorch\pytorch\third_party\ideep\mkl-dnn\include\oneapi\dnnl -external:I C:\actions-runner\_work\pytorch\pytorch\third_party\ideep\include -external:I C:\actions-runner\_work\pytorch\pytorch\third_party\NVTX\c\include -external:I C:\actions-runner\_work\pytorch\pytorch\cmake\..\third_party\cudnn_frontend\include -external:I C:\actions-runner\_work\pytorch\pytorch\build\win_tmp\magma\include -external:W0 /DWIN32 /D_WINDOWS /GR /EHsc /Zc:__cplusplus /bigobj /FS /utf-8 -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE /wd4624 /wd4068 /wd4067 /wd4267 /wd4661 /wd4717 /wd4244 /wd4804 /wd4273 -DHAVE_AVX512_CPU_DEFINITION -DHAVE_AVX2_CPU_DEFINITION /O2 /Ob2 /DNDEBUG /bigobj -DNDEBUG -MD -DMKL_HAS_SBGEMM -DCAFFE2_USE_GLOO /EHsc /bigobj -O2 -std:c++17 /showIncludes /Focaffe2\CMakeFiles\torch_cuda.dir\__\aten\src\ATen\native\cudnn\RNN.cpp.obj /Fdcaffe2\CMakeFiles\torch_cuda.dir\ /FS -c 
C:\actions-runner\_work\pytorch\pytorch\aten\src\ATen\native\cudnn\RNN.cpp
C:\actions-runner\_work\pytorch\pytorch\aten\src\ATen\native\cudnn\RNN.cpp(2161): fatal error C1020: unexpected #endif

@eqy
Collaborator Author

eqy commented Dec 19, 2023

@pytorchmergebot rebase

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Collaborator

Successfully rebased rnn9 onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout rnn9 && git pull --rebase)

Collaborator

@albanD albanD left a comment


Thanks for the update.
Sounds good once CI is green!

return r;
}

auto rnn_descriptor(const Tensor& tensor, const int batch_size, const int seq_len, const int vector_size) {
Contributor


Two nits: can those be unsigned values? (i.e., vector size can't really be negative, and I assume the same holds for batch size and everything else.) Also, why do they need to be const?

Suggested change
auto rnn_descriptor(const Tensor& tensor, const int batch_size, const int seq_len, const int vector_size) {
auto rnn_descriptor(const Tensor& tensor, uint32_t batch_size, uint32_t seq_len, uint32_t vector_size) {

Collaborator Author


I was taking the approach of making things const unless proven otherwise; I'll take another pass to make this more consistent.
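The nit can be illustrated with a hypothetical standalone sketch: sizes can't be negative, so unsigned parameters encode that invariant, and top-level const on by-value parameters is not part of a function's signature anyway. The `Tensor` struct and the element-count body are stand-ins; the real `rnn_descriptor` builds a cuDNN RNN data descriptor.

```cpp
#include <cstdint>

// Dummy stand-in for at::Tensor so the sketch is self-contained.
struct Tensor {};

// Unsigned size parameters, per the review suggestion; const on by-value
// parameters adds nothing to the interface, so it is dropped.
uint64_t rnn_descriptor_size(const Tensor& /*tensor*/, uint32_t batch_size,
                             uint32_t seq_len, uint32_t vector_size) {
  // Illustrative body: the number of elements such a descriptor would cover,
  // widened to 64 bits before multiplying to avoid overflow.
  return static_cast<uint64_t>(batch_size) * seq_len * vector_size;
}
```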

@eqy
Collaborator Author

eqy commented Dec 27, 2023

@pytorchmergebot rebase

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Collaborator

Successfully rebased rnn9 onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout rnn9 && git pull --rebase)

@eqy
Collaborator Author

eqy commented Dec 27, 2023

@pytorchmergebot rebase main

@pytorchmergebot
Collaborator

Successfully rebased rnn9 onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout rnn9 && git pull --rebase)

@eqy
Collaborator Author

eqy commented Dec 27, 2023

@pytorchmergebot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team


pytorchmergebot pushed a commit that referenced this pull request Feb 23, 2024
Adds back `CUDNN_TENSOR_OP_MATH` which was erroneously dropped by #115719

CC @malfet @ptrblck

Pull Request resolved: #120277
Approved by: https://github.com/drisspg
@ptrblck ptrblck mentioned this pull request Apr 7, 2024
CLiqing pushed a commit to CLiqing/pytorch that referenced this pull request Nov 18, 2024

Pull Request resolved: pytorch#115719
Approved by: https://github.com/albanD, https://github.com/malfet

Labels

ciflow/inductor ciflow/periodic Trigger jobs ran periodically on master (periodic.yml) on the PR ciflow/trunk Trigger trunk jobs on your pull request Merged module: cudnn Related to torch.backends.cudnn, and CuDNN support module: rnn Issues related to RNN support (LSTM, GRU, etc) open source topic: not user facing topic category triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module
