Skip to content

Use random64 in Fischer-Yates algorithm for large N#143682

Closed
ngimel wants to merge 9 commits into
mainfrom
ngimel/randperm_fix
Closed

Use random64 in Fischer-Yates algorithm for large N#143682
ngimel wants to merge 9 commits into
mainfrom
ngimel/randperm_fix

Conversation

@ngimel

@ngimel ngimel commented Dec 20, 2024

Copy link
Copy Markdown
Collaborator

@pytorch-bot

pytorch-bot Bot commented Dec 20, 2024

Copy link
Copy Markdown

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/143682

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 419b9ca with merge base b5cf8e2 (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@ngimel ngimel added the release notes: cpp release notes category label Dec 20, 2024

@eqy eqy left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it worthwhile to add a test on what the distribution looks like in the blog post?

@TimZaman

Copy link
Copy Markdown

pretty fast turnaround natalia 😎

@ngimel

ngimel commented Dec 21, 2024

Copy link
Copy Markdown
Collaborator Author

is it worthwhile to add a test on what the distribution looks like in the blog post?
It takes 4 minutes and requires a lot of cpu memory

@albanD albanD left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if we want to go that far, but for better randomness, Numpy uses repeated sampling rather than modulo: https://github.com/numpy/numpy/blob/9aa5cda4c502487fc69717507f6a9936b420365f/numpy/random/src/distributions/distributions.c#L1091-L1098

From measuring the runtime on cpu based on the blogpost, it doesn't make a noticeable difference. Maybe we should also consider doing the same to avoid border effects for numbers close to the int32/64 limits?

@ngimel

ngimel commented Dec 23, 2024

Copy link
Copy Markdown
Collaborator Author

Discussed offline with @albanD, with our random generator resampling adds too much overhead, instead we switched to no-init version of Fischer-Yates and always use random64. For 700 million input tensor that gives approx the same perf as previously.

@ngimel

ngimel commented Dec 24, 2024

Copy link
Copy Markdown
Collaborator Author

@pytorchbot merge

@pytorch-bot pytorch-bot Bot added the ciflow/trunk Trigger trunk jobs on your pull request label Dec 24, 2024
@pytorchmergebot

Copy link
Copy Markdown
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@pytorchmergebot

Copy link
Copy Markdown
Collaborator

Merge failed

Reason: 1 jobs have failed, first few of them are: trunk / macos-py3-arm64 / test (default, 3, 3, macos-m1-stable)

Details for Dev Infra team Raised by workflow job

@ngimel

ngimel commented Dec 24, 2024

Copy link
Copy Markdown
Collaborator Author

@pytorchbot merge

@pytorchmergebot

Copy link
Copy Markdown
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@pytorchmergebot

Copy link
Copy Markdown
Collaborator

Merge failed

Reason: 1 mandatory check(s) failed. The first few are:

Dig deeper by viewing the failures on hud

Details for Dev Infra team Raised by workflow job

Failing merge rule: Core Maintainers

@ngimel

ngimel commented Dec 24, 2024

Copy link
Copy Markdown
Collaborator Author

@pytorchbot merge

@pytorchmergebot

Copy link
Copy Markdown
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@pytorchmergebot

Copy link
Copy Markdown
Collaborator

Merge failed

Reason: 1 mandatory check(s) failed. The first few are:

Dig deeper by viewing the failures on hud

Details for Dev Infra team Raised by workflow job

Failing merge rule: Core Maintainers

@pytorchmergebot

Copy link
Copy Markdown
Collaborator

@ngimel your PR has been successfully reverted.

pytorchmergebot added a commit that referenced this pull request Dec 27, 2024
This reverts commit 7013be0.

Reverted #143682 on behalf of https://github.com/wdvr due to failing Meta internal tests that need to be updated ([comment](#143682 (comment)))
@pytorchmergebot pytorchmergebot added Reverted ci-no-td Do not run TD on this PR labels Dec 27, 2024

@albanD albanD left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still ok, even though we shouldn't really hardcode random values in tests...

@albanD

albanD commented Dec 27, 2024

Copy link
Copy Markdown
Collaborator

Should we add back the conditional on "n" to decide if we sample 32 or 64 bits? Maybe that will also make the CI handling easier as it wouldn't change the randomness for very small inputs.

@ngimel

ngimel commented Dec 30, 2024

Copy link
Copy Markdown
Collaborator Author

I'd still like to change the algo to no-init, and it will produce different permutations. We'll try to update ref values

@facebook-github-bot

Copy link
Copy Markdown
Contributor

@ngimel has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot

Copy link
Copy Markdown
Contributor

@ngimel has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@ngimel

ngimel commented Jan 7, 2025

Copy link
Copy Markdown
Collaborator Author

@pytorchbot merge

@pytorchmergebot

Copy link
Copy Markdown
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

pytorchmergebot pushed a commit that referenced this pull request Jan 8, 2025
Fixes #ISSUE_NUMBER
Similar to #143682, for large maximum values we were sampling integers via % and it doesn't provide uniform distribution. Here we limit the max skew to approx 1% (random32 is used for max values `<= 2**32 / 128`)
This comes with significant perf penalty, especially for cuda, but it's a pretty bad bug, so we'll have to figure out what can be done to improve it.
`torch.compile` has always been producing correct results for this, and it's performance is also significantly better than current eager (eager is ~660 GB/s on H100, torch.compile 1200 GB/s), so we have to figure out why torch.compile is better.
`__launch_bounds__` slightly regress perf, so perhaps we can figure out how to specify them better, but it's only 20-30 GB/s, so the big difference is still unexplained.

Pull Request resolved: #143787
Approved by: https://github.com/eqy
kit1980 added a commit that referenced this pull request Jan 14, 2025
malfet pushed a commit that referenced this pull request Jan 14, 2025
#144730)

Revert "Use random64 in Fischer-Yates algorithm for large N (#143682) (#143875)"

This reverts commit b1a10ec.
@kit1980

kit1980 commented Jan 14, 2025

Copy link
Copy Markdown
Contributor

@pytorchbot cherry-pick --onto release/2.6 -c critical

@pytorchbot

Copy link
Copy Markdown
Collaborator

Cherry picking #143682

Command git -C /home/runner/work/pytorch/pytorch cherry-pick -x 2e42be0595481fb448eaa41ea8bc281f1afff5c2 returned non-zero exit code 1

Auto-merging aten/src/ATen/native/TensorFactories.cpp
CONFLICT (content): Merge conflict in aten/src/ATen/native/TensorFactories.cpp
Auto-merging test/test_sparse_csr.py
Auto-merging test/test_tensor_creation_ops.py
error: could not apply 2e42be05954... Use random64 in Fischer-Yates algorithm for large N (#143682)
hint: After resolving the conflicts, mark them with
hint: "git add/rm <pathspec>", then run
hint: "git cherry-pick --continue".
hint: You can instead skip this commit with "git cherry-pick --skip".
hint: To abort and get back to the state before "git cherry-pick",
hint: run "git cherry-pick --abort".
hint: Disable this message with "git config advice.mergeConflict false"
Details for Dev Infra team Raised by workflow job

@kit1980

kit1980 commented Jan 14, 2025

Copy link
Copy Markdown
Contributor

Manually cherry-picked #142814 (comment)

@github-actions github-actions Bot deleted the ngimel/randperm_fix branch February 14, 2025 02:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ci-no-td Do not run TD on this PR ciflow/trunk Trigger trunk jobs on your pull request Merged release notes: cpp release notes category Reverted

Projects

None yet

Development

Successfully merging this pull request may close these issues.

10 participants