
Manually call lazyInitCUDA in structured CUDA calls#61882

Closed
ezyang wants to merge 1 commit into gh/ezyang/1048/base from gh/ezyang/1048/head

Conversation

@ezyang
Contributor

@ezyang ezyang commented Jul 20, 2021

Stack from ghstack:

If you call the native implementation directly, that bypasses the
initialization, which is bad! This probably slows things down a little,
though...

Fixes problem uncovered by #61642

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D29783856

@facebook-github-bot
Contributor

facebook-github-bot commented Jul 20, 2021

💊 CI failures summary and remediations

As of commit 09a3f41 (more details on the Dr. CI page and at hud.pytorch.org/pr/61882):


💚 💚 Looks good so far! There are no failures yet. 💚 💚


Preview docs built from this PR

This comment was automatically generated by Dr. CI.

ezyang added a commit that referenced this pull request Jul 20, 2021

ghstack-source-id: f61845e
Pull Request resolved: #61882
@ezyang ezyang requested a review from bdhirsh July 20, 2021 02:59
@ezyang
Contributor Author

ezyang commented Jul 20, 2021

@ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@ngimel
Collaborator

ngimel commented Jul 20, 2021

Do you know by how much it slows things down?

@ezyang ezyang requested a review from ngimel July 20, 2021 17:33
@ezyang
Contributor Author

ezyang commented Jul 20, 2021

I haven't measured the performance. It's no worse than calling the fully dispatched at::empty factory function, though, since that already calls lazyInitCUDA in its wrapper (as in, structured kernels had some ill-gotten gains that we're now being more realistic about).

However, your comment got me thinking about whether or not initializing CUDA in structured functions really is necessary. After all, if you are given a CUDA tensor, the invariant ought to be that CUDA is already initialized. Indeed #61642 only tickles the problem through a very particular case:

TEST(CUDACaffe2ToPytorch, Op) {
  if (!at::cuda::is_available()) return;
  caffe2::Tensor c2_tensor =
      caffe2::empty({3, 3}, at::dtype<int64_t>().device(caffe2::CUDA));
  auto data = c2_tensor.mutable_data<int64_t>();
  {
    caffe2::CUDAContext context;
    caffe2::math::Set<int64_t>(9, 111, data, &context);
  }
  at::Tensor at_tensor(c2_tensor);
  ASSERT_TRUE(at_tensor.is_cuda());

  ASSERT_EQ(at::sum(at_tensor).item<int64_t>(), 999);
}

What I bet is happening is that at::Tensor at_tensor(c2_tensor); is how this test bypasses lazy initialization entirely. So maybe if we just add the call there, that will suffice...

@ezyang
Contributor Author

ezyang commented Jul 20, 2021

Drat! I can't easily force the initialization in the constructor because it's in ATen/core and the context is in ATen.

@bhosmer bhosmer left a comment

Makes sense. Is it worth checking the slowdown before landing, or do we not really have a choice if we want to fix it?

@ezyang
Contributor Author

ezyang commented Jul 22, 2021

Is it worth checking the slowdown before landing, or do we not really have a choice if we want to fix it?

I think the correct terminal state is to not initialize here, and fix the caffe2-to-aten constructor. I'm going to go ahead and land this for now to unblock the other PRs though.

@facebook-github-bot
Contributor

@ezyang merged this pull request in f3f7e92.

@facebook-github-bot facebook-github-bot deleted the gh/ezyang/1048/head branch July 26, 2021 14:17