Use random64 in Fischer-Yates algorithm for large N by ngimel · Pull Request #143682 · pytorch/pytorch

ngimel · 2024-12-20T22:03:52Z

Fixes bug in randperm https://nbsanity.com/static/a4774194938414dedcec7d6e99727d31/Shuffling_20in_20torch_20vs_20numpy-public.html

pytorch-bot · 2024-12-20T22:03:56Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/143682

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 419b9ca with merge base b5cf8e2:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

eqy

is it worthwhile to add a test on what the distribution looks like in the blog post?

TimZaman · 2024-12-21T06:30:31Z

pretty fast turnaround natalia 😎

ngimel · 2024-12-21T07:10:52Z

> is it worthwhile to add a test on what the distribution looks like in the blog post?

It takes 4 minutes and requires a lot of CPU memory.

albanD

Not sure if we want to go that far, but for better randomness, NumPy uses repeated sampling rather than modulo: https://github.com/numpy/numpy/blob/9aa5cda4c502487fc69717507f6a9936b420365f/numpy/random/src/distributions/distributions.c#L1091-L1098

From measuring the runtime on CPU based on the blog post, it doesn't make a noticeable difference. Maybe we should also consider doing the same to avoid border effects for numbers close to the int32/int64 limits?
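
To make the modulo-versus-resampling trade-off concrete, here is a minimal Python sketch. It is not NumPy's or PyTorch's code: the function names (`randbelow_modulo`, `randbelow_rejection`, `modulo_counts`) are hypothetical, and Python's `random.getrandbits` stands in for the raw generator.

```python
import random


def randbelow_modulo(n, bits=32, rng=random):
    """Map a raw `bits`-bit draw into [0, n) with %.

    Slightly biased whenever n does not divide 2**bits.
    """
    return rng.getrandbits(bits) % n


def randbelow_rejection(n, bits=32, rng=random):
    """Redraw until the raw value is below the largest multiple of n
    that fits in 2**bits, then reduce. Uniform on [0, n)."""
    span = 1 << bits
    limit = span - (span % n)  # largest multiple of n that is <= 2**bits
    while True:
        x = rng.getrandbits(bits)
        if x < limit:
            return x % n


def modulo_counts(n, bits):
    """Exact number of raw values that land on each residue under %."""
    span = 1 << bits
    base, extra = divmod(span, n)
    return [base + 1 if r < extra else base for r in range(n)]
```

With a 4-bit generator and n = 6, `modulo_counts(6, 4)` shows residues 0 to 3 receiving three raw values each and residues 4 and 5 only two. Resampling removes that imbalance at the cost of occasional extra draws.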

ngimel · 2024-12-23T21:17:13Z

Discussed offline with @albanD: with our random generator, resampling adds too much overhead, so instead we switched to the no-init version of Fisher-Yates and always use random64. For a 700-million-element input tensor, that gives approximately the same perf as before.
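
For readers unfamiliar with the term, a minimal sketch follows. It assumes "no-init" refers to the inside-out variant of the Fisher-Yates shuffle, which builds the permutation without first filling the output with 0..n-1. This is an illustration in Python, not the ATen implementation; `randperm_inside_out` is a hypothetical name and `getrandbits(64)` stands in for random64.

```python
import random


def randperm_inside_out(n, rng=None):
    """Return a permutation of range(n) using the inside-out Fisher-Yates
    shuffle, drawing a 64-bit value at every step."""
    if rng is None:
        rng = random.Random()
    perm = [0] * n  # placeholder contents; every slot is overwritten below
    for i in range(n):
        # 64-bit draw reduced with %, so the residual bias is at most
        # about (i + 1) / 2**64 per step.
        j = rng.getrandbits(64) % (i + 1)
        perm[i] = perm[j]  # move the previous occupant of slot j to slot i
        perm[j] = i        # place the new element i into slot j
    return perm
```

Because element `i` is written in the same pass that picks its position, the separate initialization pass over the output buffer is not needed, which is one plausible reason the switch kept performance roughly flat.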

ngimel · 2024-12-24T07:24:55Z

@pytorchbot merge

pytorchmergebot · 2024-12-24T07:26:58Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status
here

pytorchmergebot · 2024-12-24T08:46:21Z

Merge failed

Reason: 1 jobs have failed, first few of them are: trunk / macos-py3-arm64 / test (default, 3, 3, macos-m1-stable)

Details for Dev Infra team

Raised by workflow job

ngimel · 2024-12-24T19:10:46Z

@pytorchbot merge

pytorchmergebot · 2024-12-24T19:13:13Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

pytorchmergebot · 2024-12-24T20:16:37Z

Merge failed

Reason: 1 mandatory check(s) failed. The first few are:

pull / linux-focal-py3.13-clang10 / test (crossref, 1, 2, lf.linux.2xlarge)

Dig deeper by viewing the failures on hud

Details for Dev Infra team

Raised by workflow job

Failing merge rule: Core Maintainers

ngimel · 2024-12-24T20:26:13Z

@pytorchbot merge

pytorchmergebot · 2024-12-24T20:28:34Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

pytorchmergebot · 2024-12-24T21:31:55Z

Merge failed

Reason: 1 mandatory check(s) failed. The first few are:

pull / linux-focal-cuda12.4-py3.10-gcc9-sm89 / test (default, 4, 5, lf.linux.g6.4xlarge.experimental.nvidia.gpu)

Dig deeper by viewing the failures on hud

Details for Dev Infra team

Raised by workflow job

Failing merge rule: Core Maintainers

pytorchmergebot · 2024-12-27T09:09:36Z

@ngimel your PR has been successfully reverted.

This reverts commit 7013be0. Reverted #143682 on behalf of https://github.com/wdvr due to failing Meta internal tests that need to be updated (see the comment on #143682).

albanD

Still ok, even though we shouldn't really hardcode random values in tests...

albanD · 2024-12-27T11:34:35Z

Should we add back the conditional on "n" to decide if we sample 32 or 64 bits? Maybe that will also make the CI handling easier as it wouldn't change the randomness for very small inputs.

ngimel · 2024-12-30T20:12:30Z

I'd still like to change the algorithm to no-init, and it will produce different permutations. We'll try to update the reference values.

facebook-github-bot · 2025-01-06T18:10:45Z

@ngimel has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot · 2025-01-06T21:43:03Z

@ngimel has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

ngimel · 2025-01-07T01:18:33Z

@pytorchbot merge

pytorchmergebot · 2025-01-07T01:20:21Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Fixes #ISSUE_NUMBER

Similar to #143682: for large maximum values we were sampling integers via `%`, which doesn't provide a uniform distribution. Here we limit the max skew to approx 1% (random32 is used for max values `<= 2**32 / 128`). This comes with a significant perf penalty, especially for CUDA, but it's a pretty bad bug, so we'll have to figure out what can be done to improve it.

`torch.compile` has always been producing correct results for this, and its performance is also significantly better than current eager (eager is ~660 GB/s on H100, `torch.compile` 1200 GB/s), so we have to figure out why `torch.compile` is better. `__launch_bounds__` slightly regresses perf, so perhaps we can figure out how to specify them better, but it's only 20-30 GB/s, so the big difference is still unexplained.

Pull Request resolved: #143787
Approved by: https://github.com/eqy
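
The "approx 1%" figure in the #143787 description can be checked with a little arithmetic. The sketch below is an illustration only, not code from either PR; `modulo_skew` is a hypothetical helper that reports how much more likely the most favored output is than the least favored one when a `bits`-bit draw is reduced with `%`.

```python
def modulo_skew(n, bits=32):
    """Relative excess probability of the most likely residue over the
    least likely one when a uniform `bits`-bit value is reduced mod n."""
    span = 1 << bits
    base, extra = divmod(span, n)
    if extra == 0:
        return 0.0  # n divides 2**bits, so % is exactly uniform
    # `extra` residues receive base + 1 raw values, the rest receive base.
    return 1.0 / base


# Just under the 2**32 / 128 cutoff, 32-bit draws are off by at most 1/128.
print(modulo_skew(2**25 - 1))           # 0.0078125
# Just above 2**31, some outputs are twice as likely as others.
print(modulo_skew(2**31 + 1))           # 1.0
# The same n with a 64-bit draw is skewed by roughly 1e-10.
print(modulo_skew(2**31 + 1, bits=64))
```

For any n up to `2**32 / 128`, `base` is at least 128, so the skew is bounded by 1/128, or about 0.8%, consistent with the approximate 1% quoted above.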

kit1980 · 2025-01-14T01:46:42Z

@pytorchbot cherry-pick --onto release/2.6 -c critical

pytorchbot · 2025-01-14T01:50:41Z

Cherry picking #143682

Command git -C /home/runner/work/pytorch/pytorch cherry-pick -x 2e42be0595481fb448eaa41ea8bc281f1afff5c2 returned non-zero exit code 1

Auto-merging aten/src/ATen/native/TensorFactories.cpp
CONFLICT (content): Merge conflict in aten/src/ATen/native/TensorFactories.cpp
Auto-merging test/test_sparse_csr.py
Auto-merging test/test_tensor_creation_ops.py
error: could not apply 2e42be05954... Use random64 in Fischer-Yates algorithm for large N (#143682)
hint: After resolving the conflicts, mark them with
hint: "git add/rm <pathspec>", then run
hint: "git cherry-pick --continue".
hint: You can instead skip this commit with "git cherry-pick --skip".
hint: To abort and get back to the state before "git cherry-pick",
hint: run "git cherry-pick --abort".
hint: Disable this message with "git config advice.mergeConflict false"

Details for Dev Infra team

Raised by workflow job

Fixes bug in randperm https://nbsanity.com/static/a4774194938414dedcec7d6e99727d31/Shuffling_20in_20torch_20vs_20numpy-public.html Pull Request resolved: #143682 Approved by: https://github.com/eqy, https://github.com/albanD, https://github.com/malfet

kit1980 · 2025-01-14T02:06:07Z

Manually cherry-picked #142814 (comment)

Fixes bug in randperm https://nbsanity.com/static/a4774194938414dedcec7d6e99727d31/Shuffling_20in_20torch_20vs_20numpy-public.html Pull Request resolved: #143682 Approved by: https://github.com/eqy, https://github.com/albanD, https://github.com/malfet Co-authored-by: Natalia Gimelshein <ngimel@meta.com>

Use random64 in Fischer-Yates algorithm for large N

fbbdd88

ngimel added the release notes: cpp release notes category label Dec 20, 2024

eqy approved these changes Dec 21, 2024

albanD approved these changes Dec 22, 2024

switch to no-init shuffle version, add test

437a45d

ngimel added 2 commits December 23, 2024 18:10

fix np test

ac8e38c

fix hardcoded datasets test

b0dcdf3

ngimel mentioned this pull request Dec 24, 2024

fix randint distribution for large max #143787

Closed

pytorch-bot Bot added the ciflow/trunk Trigger trunk jobs on your pull request label Dec 24, 2024

pytorchmergebot added the merging label Dec 24, 2024

pytorchmergebot removed the merging label Dec 24, 2024

Merge branch 'main' into randperm_fix

0ff7872

pytorchmergebot added the merging label Dec 24, 2024

pytorchmergebot removed the merging label Dec 24, 2024

relax bfloat16 tolerance

03acc9f

pytorchmergebot added the merging label Dec 24, 2024

pytorchmergebot removed the merging label Dec 24, 2024

pytorchmergebot added Reverted ci-no-td Do not run TD on this PR labels Dec 27, 2024

pytorchmergebot reopened this Dec 27, 2024

albanD reviewed Dec 27, 2024

albanD approved these changes Dec 27, 2024

ngimel added 2 commits January 6, 2025 12:55

Merge branch 'main' into randperm_fix

f5f8eba

preserve old behavior for small values

419b9ca

malfet approved these changes Jan 7, 2025

pytorchmergebot added the merging label Jan 7, 2025

pytorchmergebot closed this in 2e42be0 Jan 7, 2025

pytorchmergebot removed the merging label Jan 7, 2025

kit1980 added a commit that referenced this pull request Jan 14, 2025

Revert "Use random64 in Fischer-Yates algorithm for large N (#143682) (…

afa9047

…#143875)" This reverts commit b1a10ec.

malfet pushed a commit that referenced this pull request Jan 14, 2025

Revert "Use random64 in Fischer-Yates algorithm for large N (#143682)… (

e2067a6

#144730) Revert "Use random64 in Fischer-Yates algorithm for large N (#143682) (#143875)" This reverts commit b1a10ec.

kit1980 mentioned this pull request Jan 14, 2025

Use random64 in Fischer-Yates algorithm for large N (#143682) #144735

Merged

github-actions Bot deleted the ngimel/randperm_fix branch February 14, 2025 02:04
