further scheduler changes for invoke_quant: prologue low prec, (slightly) more aggressive fusion #145104
eellison wants to merge 13 commits into gh/eellison/752/base
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/145104
Note: Links to docs will display an error until the docs builds have been completed.
⏳ No Failures, 1 Pending as of commit cf18367 with merge base 49082f9
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@@ -3477,36 +3521,7 @@ def can_fuse(self, node1: BaseSchedulerNode, node2: BaseSchedulerNode) -> bool:
)
Maybe move the whole if block regarding prologue fusion to can_fuse_prologue to make can_fuse smaller.
"""
Heuristics to avoid benchmarking predictably slow prologue fusions
"""
# user opt into more aggressive prologue fusion, dont use heuristics
Does the user opt in by using invoke_quant?
torch/_inductor/graph.py (Outdated)
self.low_precision_codegen_ops = OrderedSet[str]()
# more aggressive prologue fusion
self.invoke_quant_ops = OrderedSet[str]()
Suggested change:
- self.low_precision_codegen_ops = OrderedSet[str]()
+ self.low_precision_codegen_ops : OrderedSet[str] = OrderedSet()
  # more aggressive prologue fusion
- self.invoke_quant_ops = OrderedSet[str]()
+ self.invoke_quant_ops : OrderedSet[str] = OrderedSet()
to avoid runtime calls to __getitem__.
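The point of the suggestion is that `OrderedSet[str]()` evaluates the subscript at every construction, while a local variable annotation is never evaluated at runtime (PEP 526). A small stand-alone demonstration, using a counting stand-in class rather than Inductor's actual `OrderedSet` (the `TrackedSet` name and the call-counting `__class_getitem__` are assumptions for illustration):

```python
from typing import Generic, TypeVar

T = TypeVar("T")
subscript_calls = []  # records each runtime evaluation of TrackedSet[...]


class TrackedSet(Generic[T]):
    """Stand-in for OrderedSet that counts __class_getitem__ calls."""

    def __class_getitem__(cls, item):
        # __class_getitem__ is implicitly a classmethod (PEP 560)
        subscript_calls.append(item)
        return cls  # simplified: stay callable after subscripting


def init_with_runtime_subscript():
    # TrackedSet[str] is evaluated here, calling __class_getitem__ each time
    return TrackedSet[str]()


def init_with_annotation():
    # local variable annotations are not evaluated at runtime (PEP 526),
    # so no __class_getitem__ call happens here
    s: TrackedSet[str] = TrackedSet()
    return s
```

Both forms give the type checker the same information; the annotated form just avoids the per-`__init__` subscript call.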
…ec, (slightly) more aggressive fusion" Respect invoke_quant low precision options; also, be more aggressive in attempting fusion. cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy yf225 chenyang78 kadeng muchulee8 ColinPeppler amjames desertfire chauhang aakhundov [ghstack-poisoned]
@pytorchbot merge

Merge started. Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
@pytorchbot merge -f "rocm test taking a while"

Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Stack from ghstack (oldest at bottom):
Respect invoke_quant low precision options; also, be more aggressive in attempting fusion.
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @aakhundov