
fix multinomial kernels to properly advance random states #38046

Closed

ngimel wants to merge 5 commits into pytorch:master from ngimel:multinomial

Conversation

ngimel (Collaborator) commented May 7, 2020

Before, multinomial kernels did not advance random states enough, which led to the same sequence being generated over and over with a shift of 4. This PR fixes that.
Fixes #37403
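
For readers coming from the linked issue: with curand's Philox generator, each thread initializes its state from a shared seed, a per-thread subsequence, and a host-tracked offset, and the host must advance that offset past everything a launch consumed before the next launch. A minimal sketch of this pattern (the kernel and its names are illustrative, not the PyTorch source; offsets are assumed to count individual 32-bit outputs):

```cuda
#include <curand_kernel.h>

// Illustrative sketch, not the actual PyTorch kernel. Each thread resumes
// its Philox subsequence at `offset`; if the host fails to advance `offset`
// past what a launch consumed, the next launch replays the same values.
// curand_uniform4 yields 4 uniforms per call, which is where the
// "same sequence with a shift of 4" symptom comes from.
__global__ void drawUniforms(float* out, int totalSamples,
                             unsigned long long seed,
                             unsigned long long offset) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    int stride = gridDim.x * blockDim.x;
    curandStatePhilox4_32_10_t state;
    curand_init(seed, tid, offset, &state);  // shared seed, per-thread subsequence
    for (int i = tid; i < totalSamples; i += stride) {
        float4 r = curand_uniform4(&state);
        out[i] = r.x;  // this thread keeps one of the four values
    }
}
```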

gchanan (Contributor) commented May 7, 2020

is it feasible to write a test?

ezyang (Contributor) commented May 7, 2020

Is there any reasonable way to test this?

dr-ci bot commented May 7, 2020

💊 CI failures summary and remediations

As of commit 350b490 (more details on the Dr. CI page):


  • 1/1 failures possibly introduced in this PR
    • 1/1 non-CircleCI failure(s)

ci.pytorch.org: 1 failed


This comment was automatically generated by Dr. CI.

Contributor

I can never remember how to determine the index formula XD

Collaborator Author

It's like multi-d tensor indexing XD

ezyang (Contributor) commented May 11, 2020

I attempted a review but apparently my CUDA has atrophied significantly. One thing I couldn't figure out was why it was necessary to switch from 2D-1D setup to 1D-2D.

ngimel (Collaborator, Author) commented May 11, 2020

It was not strictly necessary for fixing this bug; I did it to improve efficiency.
Before:
1D grid: each block was responsible for num_distributions/max_blocks distributions, processing one distribution at a time in a loop. If the number of distributions is small, we don't have enough blocks to fill the device.
2D blocks: each block is (32, 4) threads, but only the 4 threads with threadIdx.x == 0 actually did anything, and they were responsible for generating all the samples for a given distribution. So if we are talking about generating 100000 samples, that's 25000 iterations in the loop while the other threads idle, and we potentially don't even have enough blocks to fill the device. In total we had 128 * min(num_distributions, max_blocks) threads.
Now:
2D grid: the y-dimension of the grid is responsible for num_distributions/max_y_blocks distributions, processing one distribution at a time in a loop. The x-dimension of the grid and the x-dimension of the block are responsible for generating samples. Generating samples with replacement does not depend on previous results, so there is no point in serializing anything. Roughly speaking, we can launch num_distributions * num_samples threads (subject to some limits; exact formulas omitted for clarity). This allows us to get reasonable device utilization as long as the number of samples is not tiny (say, more than 128) and num_distributions * num_samples is large enough to fill the device (whereas previously we depended on num_distributions alone being large enough, and used only a quarter of the threads).
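
A hedged sketch of the two launch configurations described above, in CUDA host code (the helper names and exact clamping limits are illustrative assumptions, not the PR's code):

```cuda
#include <algorithm>

// Illustrative limits; the real kernels clamp against device properties.
constexpr int MAX_BLOCKS   = 1024;   // old 1D grid limit (assumed)
constexpr int MAX_BLOCKS_X = 1024;   // new grid.x limit (assumed)
constexpr int MAX_BLOCKS_Y = 65535;  // new grid.y limit (assumed)

// Before: 1D grid over distributions, (32, 4) blocks, but only the 4 threads
// with threadIdx.x == 0 drew samples, each looping over all the samples of
// one distribution.
dim3 oldGrid(int numDist)  { return dim3(std::min(numDist, MAX_BLOCKS)); }
dim3 oldBlock()            { return dim3(32, 4); }

// Now: grid.x * block.x threads cooperate on the samples of one distribution
// and grid.y walks over distributions, so roughly
// num_distributions * num_samples threads are in flight.
dim3 newGrid(int numDist, int numSamples, int blockX = 128) {
    int gridX = std::min((numSamples + blockX - 1) / blockX, MAX_BLOCKS_X);
    int gridY = std::min(numDist, MAX_BLOCKS_Y);
    return dim3(gridX, gridY);
}
dim3 newBlock()            { return dim3(128); }
```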

ezyang (Contributor), May 11, 2020

Wow, this is so obviously wrong it isn't even funny. Oh we used to only use one random

Contributor

So what, this is grid.x * ((numDist - 1) / grid.x + 1) * 4... so is this just a really long-winded way of saying numDist * 4? Or maybe with some extra slop at the end?

Collaborator Author

Oh, you are right, I messed this up: I changed the kernel but did not change this. Will fix now.

Collaborator Author

I fixed it and added a comment that hopefully makes things clearer (it's also simpler than it used to be, because each thread uses just 1 random in most cases).
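
Roughly what this thread converged on, as a hedged sketch (the helper and its ceil-division formula are mine, not the PR's exact code): the stored offset has to advance by 4 outputs per curand_uniform4 call, times the number of calls the busiest thread makes; when every thread draws only once, that is just a constant 4.

```cuda
// Illustrative host-side bookkeeping, not the PR's exact code. Each
// curand_uniform4 call consumes 4 Philox outputs, so the generator's stored
// offset must move forward by 4 * (calls made by the busiest thread).
void advancePhiloxOffset(unsigned long long* philox_offset,
                         long long totalSamples, long long totalThreads) {
    long long callsPerThread =
        (totalSamples + totalThreads - 1) / totalThreads;  // ceil division
    *philox_offset += 4ULL * callsPerThread;  // == 4 when each thread draws once
}
```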

ezyang (Contributor) left a comment

I won't claim to understand all the subtleties of the indexing arithmetic here, but I'm going to approve to move things along. I will admit that I spent half an hour trying to puzzle out the indexing computations and could not figure them out. If this were a paper, I'd ask for an argument for why the new offset calculation is correct (whereas the old one is not). But maybe this is not worth the effort. I left some comments on the bits that were puzzling me below.

Comment thread test/test_torch.py Outdated
Contributor

You probably didn't want this print here

Comment thread test/test_torch.py Outdated
Contributor

You modified the logic for replacement=False. Maybe that should be tested too?

Collaborator Author

Changes to replacement=False were very superficial, but I'll add the test.

facebook-github-bot (Contributor) left a comment

@ngimel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.


facebook-github-bot (Contributor)

@ngimel merged this pull request in 3d96808.

@ngimel ngimel added this to the 1.5.1 milestone May 15, 2020
gchanan pushed a commit to gchanan/pytorch that referenced this pull request May 28, 2020
fix multinomial kernels to properly advance random states (pytorch#38046)

Summary:
Before, multinomial kernels did not advance random states enough, which led to the same sequence being generated over and over with a shift of 4. This PR fixes that.
Fixes pytorch#37403
Pull Request resolved: pytorch#38046

Differential Revision: D21516542

Pulled By: ngimel

fbshipit-source-id: 23248a8c3a5c44316c4c35cd71a8c3b5f76c90f2
gchanan pushed a commit that referenced this pull request May 28, 2020
Summary:
Before, multinomial kernels did not advance random states enough, which led to the same sequence being generated over and over with a shift of 4. This PR fixes that.
Fixes #37403
Pull Request resolved: #38046

Differential Revision: D21516542

Pulled By: ngimel

fbshipit-source-id: 23248a8c3a5c44316c4c35cd71a8c3b5f76c90f2
laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request Apr 24, 2026
fix multinomial kernels to properly advance random states (pytorch#38046)

Summary:
Before, multinomial kernels did not advance random states enough, which led to the same sequence being generated over and over with a shift of 4. This PR fixes that.
Fixes pytorch#37403
Pull Request resolved: pytorch#38046

Differential Revision: D21516542

Pulled By: ngimel

fbshipit-source-id: 23248a8c3a5c44316c4c35cd71a8c3b5f76c90f2

Development

Successfully merging this pull request may close these issues.

torch.multinomial behaves abnormally with CUDA tensor

5 participants