
[JIT] Ensure offset is a multiple of 4 to fix "Philox" RNG in jitted kernels #50169

Closed
mcarilli wants to merge 3 commits into pytorch:master from mcarilli:rng_increment_fix_for_jit_philox

Conversation

@mcarilli
Collaborator

@mcarilli mcarilli commented Jan 6, 2021

Immediately-upstreamable part of #50148.

This PR fixes what I'm fairly sure is a subtle bug with custom `Philox` class usage in jitted kernels. `Philox` [constructors in kernels](https://github.com/pytorch/pytorch/blob/30206b504ed5e786ad2792061ec5ebe4b9b6abe9/torch/csrc/jit/codegen/cuda/codegen.cpp#L102) take the cuda rng generator's current offset. The Philox constructor then carries out [`offset/4`](https://github.com/pytorch/pytorch/blob/677f0d6383cde8700c41a6ca8e69a6f1d9748b4e/torch/csrc/jit/codegen/cuda/runtime/random_numbers.cu#L13) (a `uint64_t` division) to compute its internal offset in its virtual Philox bitstream of 128-bit chunks. In other words, it assumes the incoming offset is a multiple of 4. But (in current code) that's not guaranteed. For example, the increments used by [these eager kernels](https://github.com/pytorch/pytorch/blob/677f0d6383cde8700c41a6ca8e69a6f1d9748b4e/aten/src/ATen/native/cuda/Distributions.cu#L171-L216) could easily make the offset not divisible by 4.

I figured the easiest fix was to round all incoming increments up to the nearest multiple of 4 in CUDAGeneratorImpl itself.

Another option would be to round the current offset up to the next multiple of 4 at the jit point of use. But that would be a jit-specific offset jump, so jit rng kernels wouldn't have a prayer of being bitwise accurate with eager rng kernels that used non-multiple-of-4 offsets. Restricting the offset to multiples of 4 for everyone at least gives jit rng the chance to match eager rng. (Of course, there are still many other ways the numerics could diverge, like if a jit kernel launches a different number of threads than an eager kernel, or assigns threads to data elements differently.)

@facebook-github-bot
Contributor

facebook-github-bot commented Jan 6, 2021

💊 CI failures summary and remediations

As of commit 25cdf69 (more details on the Dr. CI page):


  • 1/1 failures possibly* introduced in this PR
    • 1/1 non-CircleCI failure(s)

This comment was automatically generated by Dr. CI.

Please report bugs/suggestions to the (internal) Dr. CI Users group.

This comment has been revised 6 times.

@mcarilli mcarilli requested a review from ngimel January 6, 2021 23:02
@H-Huang H-Huang added the triaged label (this issue has been looked at by a team member, and triaged and prioritized into an appropriate module) Jan 7, 2021
@codecov

codecov Bot commented Jan 7, 2021

Codecov Report

Merging #50169 (25cdf69) into master (eef5eb0) will increase coverage by 0.18%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master   #50169      +/-   ##
==========================================
+ Coverage   80.49%   80.68%   +0.18%     
==========================================
  Files        1900     1900              
  Lines      206254   206254              
==========================================
+ Hits       166018   166409     +391     
+ Misses      40236    39845     -391     

@ngimel
Collaborator

ngimel commented Jan 8, 2021

Please add the PR description as a note somewhere in the code.

Contributor

@facebook-github-bot facebook-github-bot left a comment


@ngimel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@ngimel merged this pull request in 271240a.

laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request Apr 24, 2026
…kernels (pytorch#50169)

Summary:
Immediately-upstreamable part of pytorch#50148. (The commit message repeats the PR description above.)

Pull Request resolved: pytorch#50169

Reviewed By: mruberry

Differential Revision: D25857934

Pulled By: ngimel

fbshipit-source-id: 43a75e2d0c8565651b0f12a5694c744fd86ece99

Labels

cla signed · Merged · open source · triaged (this issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants