Simplify device guard code generation. #47765
ezyang wants to merge 1 commit into gh/ezyang/864/base
Conversation
There are no semantic changes, just trying to reduce the condition nest. This is not byte-for-byte compatible as I removed some whitespace for clarity.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
[ghstack-poisoned]
💊 CI failures summary and remediations

As of commit 0966628 (more details on the Dr. CI page):

🚧 1 ongoing upstream failure, probably caused by upstream breakages that are not fixed yet:

ci.pytorch.org: 1 failed
bhosmer left a comment:
LG, one nontrivial suggestion inline
    cuda_guard = ""
    # TODO: It's not entirely clear that generic dispatch keys should
    # get device guard, as the internal kernels they call ought
    # to also device guard
#47778 ...unless there's one already 😁
    """
    # Initialization happens on construction of CUDA tensors, so
    # that all methods/functions that operates on CUDA tensors can
    # assume initialization has already occurred
Might clarify the comment if it were made explicit that the code below it is going to be in a constructing kernel - otherwise it looks like it's doing the opposite of what the comment is saying. (Edit: but see my comment later about restructuring the if statement for a broader suggestion)
    if f.device_guard and has_tensor_options:
        if local.use_c10_dispatcher() is UseC10Dispatcher.full:
            cuda_guard = "const DeviceGuard device_guard(device_or_default(device));"
Now that the code is cleaner cuda_guard seems like a weird name for this, how about just device_guard?
    cuda_guard = """\
    // DeviceGuard omitted
    """
    cuda_guard = "// DeviceGuard omitted"
High-level comment on how this logic is structured: it might be clarifying to use has_tensor_options as the outer partition. I think it cleans up the dependencies and better aligns with the semantics of the wrapper we're generating (note: the snippet is big because it also tucks self_args, candidate_args, and device_of into the else branch where they belong):
    device_guard = "// DeviceGuard omitted"  # default
    if has_tensor_options:
        # kernel is creating a tensor
        if is_cuda_dispatch_key(self.dispatch_key):
            # initialize CUDA on construction of CUDA tensors
            init_cuda = "globalContext().lazyInitCUDA();"
        if f.device_guard:
            if local.use_c10_dispatcher() is UseC10Dispatcher.full:
                device_guard = "const DeviceGuard device_guard(device_or_default(device));"
            else:
                device_guard = "const DeviceGuard device_guard(options.device());"
    else:
        # kernel is operating on existing tensors
        if f.device_guard:
            self_args = (a for a in f.func.arguments if a.name == "self")
            # There is a precedence order for which argument we use to do
            # the device guard; candidate_args encodes that order.
            candidate_args = itertools.chain(self_args, f.func.out_arguments, f.func.arguments)
            # Only tensor-like arguments are eligible
            device_of = next((f'{a.name}' for a in candidate_args if a.type.is_tensor_like()), None)
            if device_of is not None:
                device_guard = f"const OptionalDeviceGuard device_guard(device_of({device_of}));"
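The suggested restructuring can be exercised on its own with stand-in stubs. Below is a minimal runnable sketch: Arg, Func, and NativeFunction are hypothetical simplified placeholders for the real codegen model classes, the dispatcher check is reduced to a boolean flag, and the lazyInitCUDA branch is omitted for brevity.

```python
import itertools
from dataclasses import dataclass, field

@dataclass
class Arg:
    # Stand-in for a codegen argument; is_tensor_like is a simplification
    # of the real a.type.is_tensor_like() check.
    name: str
    is_tensor_like: bool = True

@dataclass
class Func:
    arguments: list
    out_arguments: list = field(default_factory=list)

@dataclass
class NativeFunction:
    func: Func
    device_guard: bool = True

def compute_device_guard(f: NativeFunction, has_tensor_options: bool,
                         full_dispatcher: bool) -> str:
    device_guard = "// DeviceGuard omitted"  # default
    if has_tensor_options:
        # kernel is creating a tensor
        if f.device_guard:
            if full_dispatcher:
                device_guard = "const DeviceGuard device_guard(device_or_default(device));"
            else:
                device_guard = "const DeviceGuard device_guard(options.device());"
    else:
        # kernel is operating on existing tensors
        if f.device_guard:
            self_args = (a for a in f.func.arguments if a.name == "self")
            # Precedence order: self, then out arguments, then everything else
            candidate_args = itertools.chain(self_args, f.func.out_arguments,
                                             f.func.arguments)
            # Only tensor-like arguments are eligible
            device_of = next((a.name for a in candidate_args if a.is_tensor_like), None)
            if device_of is not None:
                device_guard = f"const OptionalDeviceGuard device_guard(device_of({device_of}));"
    return device_guard
```

The point of the partition is visible in the sketch: the tensor-options branch never needs self_args or candidate_args, and the existing-tensors branch never needs the dispatcher distinction.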
    else:
        assert local.use_c10_dispatcher() in [UseC10Dispatcher.with_codegenerated_unboxing_wrapper,
                                              UseC10Dispatcher.hacky_wrapper_for_legacy_signatures]
I know it's not new code, but is this assert necessary? It's a 3-place enum, right?
We litigated this in #45742 (see #45742 (comment)) and smessmer did add the helper functions, but it looks like a bunch of direct tests against the individual enums crept back in (and I approved them, oops!)
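To illustrate the helper-function style being argued for here, a sketch of encoding the partition once on the enum rather than testing individual members at each call site. The member names follow the discussion above, but treat dispatcher_uses_new_style as an assumed name modeled on the helpers from #45742, not the exact tools/codegen definition.

```python
from enum import Enum, auto

class UseC10Dispatcher(Enum):
    full = auto()
    with_codegenerated_unboxing_wrapper = auto()
    hacky_wrapper_for_legacy_signatures = auto()

    def dispatcher_uses_new_style(self) -> bool:
        # Single place that encodes the two-way partition of a
        # three-member enum; callers ask a question instead of
        # enumerating members, so no defensive assert is needed.
        return self is UseC10Dispatcher.full
```

With this in place, the codegen branch above would read `if local.use_c10_dispatcher().dispatcher_uses_new_style():` and the else branch needs no assert, since the predicate is exhaustive by construction.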
Looks like this PR hasn't been updated in a while, so we're going to go ahead and mark this as Stale.
Based on #47765 Differential Revision: [D27487085](https://our.internmc.facebook.com/intern/diff/D27487085/) [ghstack-poisoned]
Pull Request resolved: #55112
Based on #47765
ghstack-source-id: 125446022
Differential Revision: [D27487085](https://our.internmc.facebook.com/intern/diff/D27487085/)
Summary: Pull Request resolved: pytorch#55112
Based on pytorch#47765
ghstack-source-id: 126114775
Test Plan: buck build //caffe2/aten/...
Reviewed By: ezyang
Differential Revision: D27487085
fbshipit-source-id: 157fcd19f538ce0c1e053e3e974b48bdb93a0226