[cuDNN V8 API] (reopen) Allow the number of kernels profiled under torch.backends.cudnn.benchmark = True to be limited #77002
Conversation
✅ No Failures (0 Pending) as of commit 753ae22 (more details on the Dr. CI page). 💚 💚 Looks good so far! There are no failures yet. 💚 💚 This comment was automatically generated by Dr. CI.
Force-pushed from ec25620 to 5cb2a63
ngimel left a comment:
This looks good, left small comments.
How many cuDNN-specific params do we anticipate? Has the time come to have a dedicated cuDNN config in the global context, instead of querying benchmark/deterministic/tf32/benchmark_limit one by one? (No action required on this PR, but something to think about for the future.)
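A grouped config along the lines ngimel suggests could look something like this minimal sketch. The `CudnnConfig` name, the `get_cudnn_config` accessor, and the defaults are hypothetical illustrations, not PyTorch API:

```python
# Hypothetical sketch: grouping the cuDNN knobs in one config object
# instead of querying benchmark/deterministic/tf32/benchmark_limit
# one by one from the global context.
from dataclasses import dataclass


@dataclass
class CudnnConfig:
    benchmark: bool = False
    deterministic: bool = False
    allow_tf32: bool = True
    benchmark_limit: int = 10  # 0 means "try every kernel the heuristic returns"


def get_cudnn_config() -> CudnnConfig:
    # A single context-level accessor replaces four separate getters.
    return CudnnConfig()


cfg = get_cudnn_config()
print(cfg.benchmark_limit)  # -> 10
```

One possible upside of this shape is that new cuDNN knobs become a field on one struct rather than another getter/setter pair threaded through the context.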
torch/csrc/Module.cpp (Outdated)
Should you warn unconditionally here? This can only be the result of the user calling set_benchmark_limit, so whatever value is passed, ROCm should warn.
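The behavior being asked for — warn no matter what value was passed, because reaching the setter at all means the user explicitly called it — can be sketched in plain Python. `HAS_CUDNN_V8` and the module-level `_benchmark_limit` are hypothetical stand-ins for the real build check and C++ state:

```python
import warnings

HAS_CUDNN_V8 = False  # stand-in for "this build has no cuDNN V8 (e.g. ROCm)"
_benchmark_limit = 10  # stand-in for the real backend-side setting


def set_benchmark_limit(limit: int) -> None:
    global _benchmark_limit
    if not HAS_CUDNN_V8:
        # Warn unconditionally: even limit=0 (the "try everything" value)
        # was an explicit user request, so it still merits a warning.
        warnings.warn("cuDNN benchmark_limit has no effect on this build")
        return
    _benchmark_limit = limit
```

With this shape, `set_benchmark_limit(0)` warns just like any other value, which matches the review comment.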
torch/csrc/Module.cpp (Outdated)
Again, unconditional warning?
Resolve conflicts please?
Force-pushed from 86ee006 to 5cc58ae
Have you checked the CI failures @eqy? I can't right now, because GitHub is showing unicorns.
They looked spurious (e.g., JIT tests), but I will rebase to check (I also couldn't identify the cause of the docs failure).
Force-pushed from 5cc58ae to 185061e
Mea culpa, it looks like some merge garbage was left in the docs by accident; hopefully fixed now.
@pytorchbot merge this |
Merge failed due to: Matched rule superuser, but PR has not been reviewed yet
@pytorchbot merge this |
Hey @eqy. |
…under torch.backends.cudnn.benchmark = True to be limited (#77002)" This reverts commit c274f2a. Reverted #77002 on behalf of https://github.com/malfet because it breaks internal CI; also, no CUDA headers should be included from `torch/csrc/Module.cpp`; they should instead be implemented/registered in `torch/csrc/cuda/Module.cpp`.
This can potentially catch issues like #77002 before they hit Meta-internal CI, because Bazel is more strict about header visibility than (our current setup of) CMake. [ghstack-poisoned]
…rch.backends.cudnn.benchmark = True to be limited (#77002)

(reopening due to botched merge)

The cuDNN V8 API (main support merged in #60755) potentially exposes many more kernels with benchmark=True. While these additional kernels can improve performance, it is often unnecessary to run every kernel returned by the heuristic and doing so may degrade the user experience by causing the first model iteration to be very slow. To alleviate this issue, this PR introduces torch.backends.cudnn.benchmark_limit. benchmark_limit specifies the maximum number of working cuDNN kernels to try for a given workload, with the default being 10 (similar to what TensorFlow does). benchmark_limit = 0 yields the current behavior of trying every kernel returned by the heuristic.

CC @ptrblck @ngimel @xwang233

Pull Request resolved: #77002 Approved by: https://github.com/ngimel
…h Bazel in CI" This can potentially catch issues like #77002 before they hit Meta-internal CI, because Bazel is more strict about header visibility than (our current setup of) CMake. [ghstack-poisoned]
This can potentially catch issues like #77002 before they hit Meta-internal CI, because Bazel is more strict about header visibility than (our current setup of) CMake. Pull Request resolved: #78221 Approved by: https://github.com/malfet
Summary: This can potentially catch issues like #77002 before they hit Meta-internal CI, because Bazel is more strict about header visibility than (our current setup of) CMake. Pull Request resolved: #78221 Approved by: https://github.com/malfet Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/8449ac770c29a52257a3f2842c1dfd06d2c2a497 Reviewed By: mehtanirav Differential Revision: D36707271 Pulled By: dreiss fbshipit-source-id: d9c3d17e9227b7660a0819e9894767035d10863f
This should prevent failures like #77002 from sneaking in, as CUDAConfig.h would no longer be available for CPU builds. The note from 2018 about MIOpen builds does not seem relevant, though CUDAConfig.h is still needed by ROCm (tested in https://github.com/pytorch/pytorch/runs/6613660811?check_suite_focus=true build) Pull Request resolved: #78218 Approved by: https://github.com/seemethere, https://github.com/atalman
Summary: This should prevent failures like #77002 from sneaking in, as CUDAConfig.h would no longer be available for CPU builds. The note from 2018 about MIOpen builds does not seem relevant, though CUDAConfig.h is still needed by ROCm (tested in https://github.com/pytorch/pytorch/runs/6613660811?check_suite_focus=true build) Pull Request resolved: #78218 Approved by: https://github.com/seemethere, https://github.com/atalman Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/dfd78bf4ab6ce42a85458553724556995fa0b4a4 Reviewed By: seemethere Differential Revision: D36783414 Pulled By: malfet fbshipit-source-id: 7277a395772c4dbdc6fc2f55a6e9b594f724b955
…torch.backends.cudnn.benchmark = True to be limited (#78299) Summary: Reopen of #77002 to address comments by malfet CC ngimel ptrblck Pull Request resolved: #78299 Approved by: https://github.com/ngimel Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/ae6dd20ba7725ea7c10d759f82891d86eb724c11 Reviewed By: mehtanirav, cpuhrsch Differential Revision: D37690491 fbshipit-source-id: 6c603a706663dcc7086755a5d4b6e382233dad7b
(reopening due to botched merge)
The cuDNN V8 API (main support merged in #60755) potentially exposes many more kernels with benchmark=True. While these additional kernels can improve performance, it is often unnecessary to run every kernel returned by the heuristic and doing so may degrade the user experience by causing the first model iteration to be very slow. To alleviate this issue, this PR introduces torch.backends.cudnn.benchmark_limit. benchmark_limit specifies the maximum number of working cuDNN kernels to try for a given workload, with the default being 10 (similar to what TensorFlow does). benchmark_limit = 0 yields the current behavior of trying every kernel returned by the heuristic.
CC @ptrblck @ngimel @xwang233
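The benchmark_limit semantics described above can be modeled in plain Python. This is an illustrative sketch, not the actual cuDNN V8 frontend code; `kernels_to_profile` and `works` are hypothetical names standing in for the heuristic's candidate list and the "does this kernel actually build and run" check:

```python
# Model of benchmark_limit: profile at most `benchmark_limit` *working*
# kernels from the heuristic's candidates; 0 means try them all.
def kernels_to_profile(candidates, works, benchmark_limit=10):
    selected = []
    for kernel in candidates:
        if benchmark_limit and len(selected) >= benchmark_limit:
            break
        if works(kernel):  # non-working kernels don't count toward the limit
            selected.append(kernel)
    return selected


candidates = list(range(25))
works = lambda k: k % 2 == 0  # pretend odd-numbered kernels fail to build

print(len(kernels_to_profile(candidates, works)))      # -> 10 (default cap)
print(len(kernels_to_profile(candidates, works, 0)))   # -> 13 (all working ones)
```

Note that the cap counts only working kernels, which matches the PR's wording ("maximum number of working cuDNN kernels to try"), and that a limit of 0 is falsy, so the loop never breaks early.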