remove redundant to_dtype in Fused Schedular Nodes#118365
remove redundant to_dtype in Fused Schedular Nodes#118365zhuhaozhe wants to merge 15 commits intogh/zhuhaozhe/9/basefrom
Conversation
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/118365
Note: Links to docs will display an error until the docs builds have been completed. ✅ You can merge normally! (2 Unrelated Failures)As of commit d40433e with merge base cc7ef43 ( BROKEN TRUNK - The following jobs failed but were present on the merge base:👉 Rebase onto the `viable/strict` branch to avoid these failures
This comment was automatically generated by Dr. CI and updates every 15 minutes. |
[ghstack-poisoned]
cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 kadeng muchulee8 aakhundov ColinPeppler [ghstack-poisoned]
|
@pytorchbot rebase |
|
@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here |
cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 kadeng muchulee8 aakhundov ColinPeppler [ghstack-poisoned]
|
Successfully rebased |
Fix #115260. cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 kadeng muchulee8 aakhundov ColinPeppler [ghstack-poisoned]
Fix #115260. cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 kadeng muchulee8 aakhundov ColinPeppler [ghstack-poisoned]
Fix #115260. This issue is triggered by `FusedSchedularNodes` cases. We always store `lowp buffer` to `store_cache` then load `lowp buffer` from `store_cache` and `convert it to float` before `compute ops`. Now we will also store `float buffer` so we can load `float buffer` from `store_cache` and directly do compute on it. cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 kadeng muchulee8 aakhundov ColinPeppler [ghstack-poisoned]
Fix #115260. This issue is triggered by `FusedSchedularNodes` cases. We always store `lowp buffer` to `store_cache` then load `lowp buffer` from `store_cache` and `convert it to float` before `compute ops`. Now we will also store `float buffer` so we can load `float buffer` from `store_cache` and directly do compute on it. cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 kadeng muchulee8 aakhundov ColinPeppler [ghstack-poisoned]
Fix #115260. This issue is triggered by `FusedSchedularNodes` cases. We always store `lowp buffer` to `store_cache` then load `lowp buffer` from `store_cache` and `convert it to float` before `compute ops`. Now we will generate a `{key: to(float32)_expr, value: the float32 cse var before to_dtype and store}` in `cse.cache`. Then the `to_dtype(float32)` after `load` will hit this cache and not generate a new var with cast codes. cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 kadeng muchulee8 aakhundov ColinPeppler [ghstack-poisoned]
Fix #115260. This issue is triggered by `FusedSchedularNodes` cases. We always store `lowp buffer` to `store_cache` then load `lowp buffer` from `store_cache` and `convert it to float` before `compute ops`. Now we will generate a `{key: to(float32)_expr, value: the float32 cse var before to_dtype and store}` in `cse.cache`. Then the `to_dtype(float32)` after `load` will hit this cache and not generate a new var with cast codes. cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 kadeng muchulee8 aakhundov ColinPeppler [ghstack-poisoned]
@jansel Yes, please review. |
|
@pytorchbot merge |
Merge startedYour change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team |
Merge failedReason: Command Details for Dev Infra teamRaised by workflow job |
Fix #115260. This issue is triggered by `FusedSchedularNodes` cases. We always store `lowp buffer` to `store_cache` then load `lowp buffer` from `store_cache` and `convert it to float` before `compute ops`. Now we will generate a `{key: to(float32)_expr, value: the float32 cse var before to_dtype and store}` in `cse.cache`. Then the `to_dtype(float32)` after `load` will hit this cache and not generate a new var with cast codes. cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 kadeng muchulee8 aakhundov ColinPeppler amjames [ghstack-poisoned]
|
@pytorchbot merge |
Merge startedYour change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team |
Merge failedReason: 1 mandatory check(s) failed. The first few are: Dig deeper by viewing the failures on hud |
|
@pytorchbot rebase |
|
@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here |
|
@pytorchbot merge |
Merge startedYour change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team |
#120243) This reverts commit cc7ef43. Manual revert because of the conflict in: test/inductor/test_cpu_repro.py , conflict with this PR: #118365 Pull Request resolved: #120243 Approved by: https://github.com/malfet, https://github.com/huydhn
Fix #115260.
This issue is triggered by
FusedSchedularNodescases.We always store
lowp buffertostore_cachethen loadlowp bufferfromstore_cacheandconvert it to floatbeforecompute ops.Now we will generate a
{key: to(float32)_expr, value: the float32 cse var before to_dtype and store}incse.cache.Then the
to_dtype(float32)afterloadwill hit this cache and not generate a new var with cast codes.Stack from ghstack (oldest at bottom):
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @aakhundov @ColinPeppler @amjames