[ROCm] TopK optimizations for AMD GPUs#146387
Conversation
Dr. CI: ✅ No failures as of commit ff7648f with merge base 71855a1. See artifacts and rendered test results at hud.pytorch.org/pr/146387. (This comment was automatically generated by Dr. CI and updates every 15 minutes.)
malfet left a comment:
Sure, though it would be good to paste an example of a benchmark run and the perf observed before and after the change
Thank you, will add that as well. This PR is a work in progress.
@ngimel,
```cpp
size_t n_multidims = 0; // number of dimensions with dimensionality more than one
for (int s : self.sizes()) {
  n_multidims += (s > 1);
}
return (n_multidims == 1 && self.numel() >= 10000); // based on the experiments in https://github.com/pytorch/pytorch/pull/146387
```
ngimel: I think you want `self.numel() == self.size(dim)`
Thank you @ngimel for your suggestion. Will change it to `return (self.numel() == self.size(dim) && self.numel() >= 10000);`
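The distinction the review raises is subtle: counting non-singleton dimensions does not check that the one large dimension is actually the `dim` topk runs over, whereas `numel() == size(dim)` does. A small sketch (plain Python over shape tuples as a stand-in for real tensors, not the PR's actual code) shows where the two formulations disagree:

```python
from math import prod

def draft_check(shape, dim):
    # Early draft: exactly one dimension larger than 1, ignoring `dim`.
    n_multidims = sum(1 for s in shape if s > 1)
    return n_multidims == 1 and prod(shape) >= 10000

def fixed_check(shape, dim):
    # Suggested form: the topk dimension itself must hold all elements.
    return prod(shape) == shape[dim] and prod(shape) >= 10000

# Both agree when topk runs along the single large dimension.
print(draft_check((1, 20000), 1), fixed_check((1, 20000), 1))  # True True
# They disagree when topk runs along a singleton dimension:
# the draft would still pick sort even though dim 0 has size 1.
print(draft_check((1, 20000), 0), fixed_check((1, 20000), 0))  # True False
```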
@pytorchbot merge
Pull workflow has not been scheduled for the PR yet. It could be because the author doesn't have permissions to run those workflows, or because skip-checks keywords were added to the PR/commits; aborting merge. Please get/give approval for the workflows and/or remove skip-ci decorators before the next merge attempt. If you think this is a mistake, please contact PyTorch Dev Infra.
@apakbin
@pytorchbot rebase
@apakbin,
```cpp
bool disable_sort_for_topk();
bool should_use_sort(const Tensor& self, int64_t dim) {
#if defined(USE_ROCM)
  return (self.numel() == self.size(dim) && self.numel() >= 10000); // based on the experiments in https://github.com/pytorch/pytorch/pull/146387
```
A suggested change swaps the order of the two conditions:

```cpp
return (self.numel() >= 10000 && self.numel() == self.size(dim)); // based on the experiments in https://github.com/pytorch/pytorch/pull/146387
```
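Either ordering expresses the same heuristic: prefer a full sort over the topk kernel only when the tensor is effectively one-dimensional along `dim` and large enough to amortize the sort. A runnable sketch of the final condition (hypothetical helper name, plain Python shape tuples rather than real tensors):

```python
from math import prod

def should_use_sort(shape, dim, threshold=10000):
    """Heuristic from the PR discussion: use sort for topk when the
    topk dimension holds every element (all other dims are size 1)
    and the tensor has at least `threshold` elements."""
    numel = prod(shape)
    # numel == shape[dim] iff every dimension other than `dim` has size 1
    return numel >= threshold and numel == shape[dim]

# A 1-D tensor of 20000 elements: use sort.
print(should_use_sort((20000,), 0))        # True
# Effectively 1-D (leading singleton dims): still use sort.
print(should_use_sort((1, 1, 20000), 2))   # True
# Batched case: keep the dedicated topk kernel.
print(should_use_sort((8, 20000), 1))      # False
# Too small to amortize a full sort.
print(should_use_sort((5000,), 0))         # False
```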
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: 3 mandatory check(s) failed. Dig deeper by viewing the failures on hud.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
TopK on ROCm performs better on the test suite with the default config. Pull Request resolved: pytorch#146387. Approved by: https://github.com/malfet, https://github.com/ngimel
cc @jeffdaily @sunway513 @jithunnair-amd @pruthvistony @ROCmSupport @dllehr-amd @jataylo @hongxiayang @naromero77amd