[Inductor] Set the default value of min_chunk_size to 512 #150762
jiayisunx wants to merge 12 commits into gh/jiayisunx/63/base
Conversation
See artifacts and rendered test results at hud.pytorch.org/pr/150762.
✅ No failures as of commit 81030e5 with merge base 70b4a88. (This comment was automatically generated by Dr. CI and updates every 15 minutes.)
torch/_inductor/codegen/cpp.py (outdated)

```python
        f"{local_buf_dtype}* {local_buffer_name} = buf_{local_buffer_name}.get();"
    )
    gen_loop_nest(loop_nest)
    worksharing.close()
```
May I know in which case we need this added line?
Many UTs in test/inductor/test_cpu_select_algorithm.py failed after parallelization. Without this line, the worksharing scope was not closed and the generated code was missing a closing `}`.
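To illustrate the failure mode (a minimal sketch with hypothetical class and function names, not Inductor's actual `WorkSharing` implementation): a code emitter that opens a parallel work-sharing scope must close it before the enclosing function is finalized, or the emitted source ends with an unbalanced `{` and fails to compile.

```python
# Minimal sketch (hypothetical names, not Inductor's real classes) of why a
# work-sharing scope must be closed before code generation finishes.
class WorkSharing:
    """Emits an OpenMP-style parallel block into a line buffer."""

    def __init__(self, lines):
        self.lines = lines
        self.is_open = False

    def parallel(self, num_threads):
        # Opening the scope emits a '{' that must eventually be matched.
        self.lines.append(f"#pragma omp parallel num_threads({num_threads})")
        self.lines.append("{")
        self.is_open = True

    def close(self):
        # Idempotent: only emits '}' if a scope is actually open.
        if self.is_open:
            self.lines.append("}")
            self.is_open = False


def gen_kernel(parallelize):
    lines = ["void kernel() {"]
    ws = WorkSharing(lines)
    if parallelize:
        ws.parallel(8)
    lines.append("    // ... loop nest body ...")
    ws.close()  # without this call, the parallel scope's '{' is never matched
    lines.append("}")
    return "\n".join(lines)


code = gen_kernel(parallelize=True)
assert code.count("{") == code.count("}")  # braces balance only if close() ran
```

The guard inside `close()` is what makes the call safe to place unconditionally at the end of codegen: it is a no-op for serial kernels and emits the missing `}` for parallel ones.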
I would like to understand the difference better. Will these UTs also fail if we keep the chunk size but increase the loop size?
leslie-fang-intel left a comment:
Thanks for the PR. Please elaborate on why the worksharing.close() change is needed.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
@pytorchbot revert -m 'Sorry for reverting your change, but an inductor compilation error shows up in trunk' -c nosignal https://github.com/pytorch/pytorch/actions/runs/16255653984/job/45891797651#step:25:5755
@pytorchbot successfully started a revert job. Check the current status here.
…50762)" This reverts commit 3321acc. Reverted #150762 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but an inductor compilation error shows up in trunk ([comment](#150762 (comment)))
@jiayisunx your PR has been successfully reverted.
Change the default value of min_chunk_size from 4096 to 512 to allow more for loops to be parallelized. I tested the Inductor benchmark with this PR on CPU and saw ~10% improvement in the torchbench geomean speedup, with no change in huggingface/timm_models. About 15 torchbench models show varying degrees of performance improvement; among them, functorch_dp_cifar10, opacus_cifar10, hf_Reformer, and pyhpc_turbulent_kinetic_energy improve by more than 50%. Pull Request resolved: #150762. Approved by: https://github.com/leslie-fang-intel, https://github.com/jansel. ghstack-source-id: b524b9e
@pytorchbot merge |
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Stack from ghstack (oldest at bottom):
Change the default value of min_chunk_size from 4096 to 512 to allow more for loops to be parallelized.
I tested the Inductor benchmark with this PR on CPU and saw ~10% improvement in the torchbench geomean speedup, with no change in huggingface/timm_models. About 15 torchbench models show varying degrees of performance improvement; among them, functorch_dp_cifar10, opacus_cifar10, hf_Reformer, and pyhpc_turbulent_kinetic_energy improve by more than 50%.
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben @mlazos