[Inductor] fix performance regression caused by #173662 (#176772)
AmesingFlank wants to merge 9 commits into gh/AmesingFlank/6/base from
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/176772
Note: Links to docs will display an error until the docs builds have been completed. ✅ You can merge normally! (1 Unrelated Failure) As of commit 5aecd88 with merge base 9774102. UNSTABLE: the following job is marked as unstable, possibly due to flakiness on trunk.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
This PR needs a
@pytorchbot merge

Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.

Merge failed. Reason: 1 job has failed, first few of them are: trunk / macos-py3-arm64 / test (mps, 1, 1, macos-m1-14). Details for Dev Infra team: raised by workflow job.

@pytorchbot merge

Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.

Merge failed. Reason: 1 job has failed, first few of them are: inductor / inductor-test / test (inductor_torchbench, 1, 2, linux.g5.4xlarge.nvidia.gpu). Details for Dev Infra team: raised by workflow job.

@pytorchbot merge

Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.

@pytorchbot merge

The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.

Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
The inductor NameError flake (buf1 not defined) that prompted the pin has been fixed upstream by multiple PyTorch PRs:
- pytorch/pytorch#176772 (fix is_unfusable scheduler logic)
- pytorch/pytorch#176832 (guard get_read_writes behind config flag)
- pytorch/pytorch#177062 (fix MultiOutput write deps)

The root cause was pytorch/pytorch#173662, which overrode UserDefinedTritonKernel.get_read_writes() even when the epilogue fusion feature was disabled, breaking buffer scheduling for TritonTemplateBuffer subclasses like Helion's HelionTemplateBuffer. Verified the previously-failing test passes against current PyTorch main.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
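The "guard get_read_writes behind config flag" fix described above can be sketched roughly as follows. This is a hypothetical, simplified illustration of the pattern only: the class names mirror the ones mentioned in this thread, but the config flag name, method bodies, and buffer names are assumptions, not PyTorch's actual implementation. The point is that when the epilogue-fusion feature is disabled, the overridden get_read_writes() defers to the base class, so buffer scheduling for TritonTemplateBuffer subclasses is unchanged.

```python
# Hypothetical sketch of the guard pattern; names are illustrative,
# not PyTorch's real API.

class Config:
    enable_epilogue_fusion = False  # assumed feature-flag name


class TemplateBuffer:
    def get_read_writes(self):
        # Baseline dependency tracking that the scheduler relies on.
        return {"reads": {"arg0"}, "writes": {"buf1"}}


class UserDefinedTritonKernel(TemplateBuffer):
    def get_read_writes(self):
        if not Config.enable_epilogue_fusion:
            # Guard: with the feature off, behave exactly like the base
            # class so subclasses still get correct buffer scheduling.
            return super().get_read_writes()
        # Feature-specific dependency rewriting would go here.
        deps = super().get_read_writes()
        deps["writes"].add("epilogue_buf")  # illustrative only
        return deps


deps = UserDefinedTritonKernel().get_read_writes()
print(sorted(deps["writes"]))  # with the flag off: ['buf1']
```

Without such a guard, the overridden dependency sets leak into the scheduler even when the feature is off, which is consistent with the symptom reported here (a buffer referenced before it is defined).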
Stack from ghstack (oldest at bottom):
Tested by running
before this PR:
with this PR:
Also, user triton kernel fusion isn't affected:
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben @jataylo @Lucaskabela