
[cuDNN V8 API] (reopen) Allow the number of kernels profiled under torch.backends.cudnn.benchmark = True to be limited #77002

Closed
eqy wants to merge 7 commits into pytorch:master from eqy:cudnnv8_benchmark_limit

Conversation

@eqy
Collaborator

@eqy eqy commented May 6, 2022

(reopening due to botched merge)
The cuDNN V8 API (main support merged in #60755) potentially exposes many more kernels with benchmark=True. While these additional kernels can improve performance, it is often unnecessary to run every kernel returned by the heuristic and doing so may degrade the user experience by causing the first model iteration to be very slow. To alleviate this issue, this PR introduces torch.backends.cudnn.benchmark_limit. benchmark_limit specifies the maximum number of working cuDNN kernels to try for a given workload, with the default being 10 (similar to what TensorFlow does). benchmark_limit = 0 yields the current behavior of trying every kernel returned by the heuristic.
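The selection behavior described above can be sketched in plain Python. This is an illustrative model of the described semantics, not PyTorch source; the function and parameter names are invented for the sketch:

```python
import time

def pick_best_kernel(candidates, run_kernel, benchmark_limit=10):
    """Illustrative sketch of the benchmark_limit semantics: profile at
    most `benchmark_limit` *working* kernels from the heuristic-ordered
    candidate list (all of them when the limit is 0) and keep the fastest."""
    if benchmark_limit == 0:  # 0 restores the old "try everything" behavior
        benchmark_limit = len(candidates)
    best, best_time, tried = None, float("inf"), 0
    for kernel in candidates:
        if tried >= benchmark_limit:
            break
        try:
            start = time.perf_counter()
            run_kernel(kernel)
            elapsed = time.perf_counter() - start
        except RuntimeError:  # failing kernels do not count toward the limit
            continue
        tried += 1
        if elapsed < best_time:
            best, best_time = kernel, elapsed
    return best
```

In PyTorch itself the knob is just an attribute assignment, e.g. `torch.backends.cudnn.benchmark_limit = 10`, alongside `torch.backends.cudnn.benchmark = True`.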

CC @ptrblck @ngimel @xwang233

@facebook-github-bot
Contributor

facebook-github-bot commented May 6, 2022

🔗 Helpful links

✅ No Failures (0 Pending)

As of commit 753ae22 (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no failures yet. 💚 💚


This comment was automatically generated by Dr. CI.

@soulitzer soulitzer requested a review from ngimel May 9, 2022 04:43
@soulitzer soulitzer added the triaged label (this issue has been looked at by a team member, and triaged and prioritized into an appropriate module) May 9, 2022
@eqy eqy force-pushed the cudnnv8_benchmark_limit branch from ec25620 to 5cb2a63 Compare May 9, 2022 18:11
Collaborator

@ngimel ngimel left a comment


This looks good, left small comments.


how many cudnn specific params do we anticipate? Did the time come to have a special cudnn config in the global context, instead of querying benchmark/deterministic/tf32/benchmarklimit one-by-one? (No action required on this PR, but something to think about for the future)
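The grouped-config idea floated here could look something like the sketch below. This is purely hypothetical — PyTorch exposes these as separate `torch.backends.cudnn` attributes, and no such object exists — but it shows the four flags named in the comment collected in one place:

```python
from dataclasses import dataclass

@dataclass
class CudnnConfig:
    """Hypothetical grouping of the cuDNN knobs mentioned above.
    Field names and defaults mirror the torch.backends.cudnn attributes
    they would replace; the class itself is only a sketch."""
    benchmark: bool = False
    deterministic: bool = False
    allow_tf32: bool = True
    benchmark_limit: int = 10
```

Querying would then be one attribute lookup on the context's config object instead of four separate global queries.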


should you warn unconditionally here? This can only be the result of a user calling set_benchmark_limit, so whatever value is passed, ROCm should warn.
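A minimal sketch of what the reviewer is asking for — the `is_rocm` flag and the function body are illustrative stand-ins, not the actual setter in the PR:

```python
import warnings

def set_benchmark_limit(limit, *, is_rocm=False):
    """Illustrative only: since this setter is reached solely via an
    explicit user call, a ROCm build warns unconditionally, whatever
    value was passed, and the setting stays inert."""
    if is_rocm:
        warnings.warn("cudnn.benchmark_limit is not supported on ROCm; "
                      "the value is ignored.")
        return 0
    return limit
```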


again, unconditional warning?

@ngimel
Collaborator

ngimel commented May 11, 2022

Resolve conflicts please?

@eqy eqy force-pushed the cudnnv8_benchmark_limit branch 2 times, most recently from 86ee006 to 5cc58ae Compare May 11, 2022 22:09
@ngimel
Collaborator

ngimel commented May 17, 2022

Have you checked the CI failures @eqy? I can't now, because github is showing unicorns.

@eqy
Collaborator Author

eqy commented May 17, 2022

> Have you checked the CI failures @eqy? I can't now, because github is showing unicorns.

Looked spurious (e.g., JIT tests), but I will rebase to check (I also couldn't identify the cause of the docs failure).

@eqy eqy force-pushed the cudnnv8_benchmark_limit branch from 5cc58ae to 185061e Compare May 17, 2022 22:32
@eqy
Collaborator Author

eqy commented May 18, 2022

mea culpa, looks like some merge garbage was left in the docs by accident, hopefully fixed now

@ngimel
Collaborator

ngimel commented May 21, 2022

@pytorchbot merge this

@pytorchmergebot
Collaborator

Merge failed due to Matched rule superuser, but PR has not been reviewed yet
Raised by https://github.com/pytorch/pytorch/actions/runs/2361317500

@ngimel
Collaborator

ngimel commented May 24, 2022

@pytorchbot merge this

@github-actions
Contributor

Hey @eqy.
You've committed this PR, but it does not have both a 'release notes: ...' and 'topics: ...' label. Please add one of each to the PR. The 'release notes: ...' label should represent the part of PyTorch that this PR changes (fx, autograd, distributed, etc) and the 'topics: ...' label should represent the kind of PR it is (not user facing, new feature, bug fix, perf improvement, etc). The list of valid labels can be found here for the 'release notes: ...' and here for the 'topics: ...'.
For changes that are 'topic: not user facing' there is no need for a release notes label.

pytorchmergebot added a commit that referenced this pull request May 24, 2022
…under torch.backends.cudnn.benchmark = True to be limited Cudnnv8 benchmark limit (#77002)"

This reverts commit c274f2a.

Reverted #77002 on behalf of https://github.com/malfet as it breaks internal CI; also, no CUDA headers should be included from `torch/csrc/Module.cpp`, but rather should be implemented/registered in `torch/csrc/cuda/Module.cpp`
dreiss added a commit that referenced this pull request May 24, 2022
This can potentially catch issues like #77002 before they hit
Meta-internal CI, because Bazel is more strict about header visibility
than (our current setup of) CMake.

[ghstack-poisoned]
swang392 pushed a commit that referenced this pull request May 25, 2022
…rch.backends.cudnn.benchmark = True to be limited Cudnnv8 benchmark limit (#77002)

(reopening due to botched merge)
The cuDNN V8 API (main support merged in #60755) potentially exposes many more kernels with benchmark=True. While these additional kernels can improve performance, it is often unnecessary to run every kernel returned by the heuristic and doing so may degrade the user experience by causing the first model iteration to be very slow. To alleviate this issue, this PR introduces torch.backends.cudnn.benchmark_limit. benchmark_limit specifies the maximum number of working cuDNN kernels to try for a given workload, with the default being 10 (similar to what TensorFlow does). benchmark_limit = 0 yields the current behavior of trying every kernel returned by the heuristic.

CC @ptrblck @ngimel @xwang233
Pull Request resolved: #77002
Approved by: https://github.com/ngimel
pytorchmergebot pushed a commit that referenced this pull request May 25, 2022
This can potentially catch issues like #77002 before they hit
Meta-internal CI, because Bazel is more strict about header visibility
than (our current setup of) CMake.

Pull Request resolved: #78221

Approved by: https://github.com/malfet
facebook-github-bot pushed a commit that referenced this pull request May 26, 2022
Summary:
This can potentially catch issues like #77002 before they hit
Meta-internal CI, because Bazel is more strict about header visibility
than (our current setup of) CMake.

Pull Request resolved: #78221

Approved by: https://github.com/malfet

Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/8449ac770c29a52257a3f2842c1dfd06d2c2a497

Reviewed By: mehtanirav

Differential Revision: D36707271

Pulled By: dreiss

fbshipit-source-id: d9c3d17e9227b7660a0819e9894767035d10863f
pytorchmergebot pushed a commit that referenced this pull request May 26, 2022
This should prevent failures like #77002 from sneaking in as CUDAConfig.h would no longer be available for cpu builds.
The note from 2018 about MIOpen builds does not seem relevant, though CUDAConfig.h is still needed by ROCm (tested in https://github.com/pytorch/pytorch/runs/6613660811?check_suite_focus=true build)
Pull Request resolved: #78218
Approved by: https://github.com/seemethere, https://github.com/atalman
facebook-github-bot pushed a commit that referenced this pull request May 31, 2022
Summary:
This should prevent failures like #77002 from sneaking in as CUDAConfig.h would no longer be available for cpu builds.
The note from 2018 about MIOpen builds does not seem relevant, though CUDAConfig.h is still needed by ROCm (tested in https://github.com/pytorch/pytorch/runs/6613660811?check_suite_focus=true build)

Pull Request resolved: #78218
Approved by: https://github.com/seemethere, https://github.com/atalman

Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/dfd78bf4ab6ce42a85458553724556995fa0b4a4

Reviewed By: seemethere

Differential Revision: D36783414

Pulled By: malfet

fbshipit-source-id: 7277a395772c4dbdc6fc2f55a6e9b594f724b955
pytorchmergebot pushed a commit that referenced this pull request Jul 7, 2022
…torch.backends.cudnn.benchmark = True to be limited Cudnnv8 benchmark limit (#78299)

Reopen of #77002 to address comments by @malfet

CC @ngimel @ptrblck
Pull Request resolved: #78299
Approved by: https://github.com/ngimel
facebook-github-bot pushed a commit that referenced this pull request Jul 8, 2022
…torch.backends.cudnn.benchmark = True to be limited Cudnnv8 benchmark limit (#78299)

Summary:
Reopen of #77002 to address comments by malfet

CC ngimel ptrblck

Pull Request resolved: #78299
Approved by: https://github.com/ngimel

Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/ae6dd20ba7725ea7c10d759f82891d86eb724c11

Reviewed By: mehtanirav, cpuhrsch

Differential Revision: D37690491

fbshipit-source-id: 6c603a706663dcc7086755a5d4b6e382233dad7b

Labels

cla signed · Merged · open source · Reverted · triaged (this issue has been looked at by a team member, and triaged and prioritized into an appropriate module)


7 participants