
Manually call lazyInitCUDA in structured CUDA calls#61882

Closed
ezyang wants to merge 1 commit into gh/ezyang/1048/base from gh/ezyang/1048/head

Conversation

@ezyang
Contributor

@ezyang ezyang commented Jul 20, 2021

Stack from ghstack:

If you call the native implementation directly, that bypasses the
initialization, which is bad! This probably slows things down a little,
though...

Fixes problem uncovered by #61642

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D29783856

@facebook-github-bot
Contributor

facebook-github-bot commented Jul 20, 2021

💊 CI failures summary and remediations

As of commit 09a3f41 (more details on the Dr. CI page and at hud.pytorch.org/pr/61882):


💚 💚 Looks good so far! There are no failures yet. 💚 💚


Preview docs built from this PR

This comment was automatically generated by Dr. CI.

ezyang added a commit that referenced this pull request Jul 20, 2021

ghstack-source-id: f61845e
Pull Request resolved: #61882
@ezyang ezyang requested a review from bdhirsh July 20, 2021 02:59
@ezyang
Contributor Author

ezyang commented Jul 20, 2021

@ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@ngimel
Collaborator

ngimel commented Jul 20, 2021

Do you know by how much it slows things down?

@ezyang ezyang requested a review from ngimel July 20, 2021 17:33
@ezyang
Contributor Author

ezyang commented Jul 20, 2021

I haven't measured the performance. It's no worse than calling the fully dispatched at::empty factory function, though, since that already calls lazyInitCUDA in its wrapper (as in, structured kernels had some ill-gotten gains that we're now being more realistic about).

However, your comment got me thinking about whether or not initializing CUDA in structured functions really is necessary. After all, if you are given a CUDA tensor, the invariant ought to be that CUDA is already initialized. Indeed #61642 only tickles the problem through a very particular case:

TEST(CUDACaffe2ToPytorch, Op) {
  if (!at::cuda::is_available()) return;
  caffe2::Tensor c2_tensor =
      caffe2::empty({3, 3}, at::dtype<int64_t>().device(caffe2::CUDA));
  auto data = c2_tensor.mutable_data<int64_t>();
  {
    caffe2::CUDAContext context;
    caffe2::math::Set<int64_t>(9, 111, data, &context);
  }
  at::Tensor at_tensor(c2_tensor);
  ASSERT_TRUE(at_tensor.is_cuda());

  ASSERT_EQ(at::sum(at_tensor).item<int64_t>(), 999);
}

What I bet is happening is that at::Tensor at_tensor(c2_tensor); is how this test bypasses lazy initialization entirely. So maybe if we just add the call there, that will suffice...

@ezyang
Contributor Author

ezyang commented Jul 20, 2021

Drat! I can't easily force the initialization in the constructor because it's in ATen/core and the context is in ATen.

@bhosmer bhosmer left a comment

Makes sense. Is it worth checking the slowdown before landing, or do we not really have a choice if we want to fix it?

@ezyang
Contributor Author

ezyang commented Jul 22, 2021

Is it worth checking the slowdown before landing, or do we not really have a choice if we want to fix it?

I think the correct terminal state is to not initialize here, and fix the caffe2-to-aten constructor. I'm going to go ahead and land this for now to unblock the other PRs though.

@facebook-github-bot
Contributor

@ezyang merged this pull request in f3f7e92.

@facebook-github-bot facebook-github-bot deleted the gh/ezyang/1048/head branch July 26, 2021 14:17