[inductor] enable mkldnn op weight pre-packing on aarch64 #115037
snadampal wants to merge 1 commit into pytorch:main
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/115037
Note: Links to docs will display an error until the docs builds have been completed.
✅ You can merge normally! (2 unrelated failures.) As of commit a97e61b with merge base 71bf4f3:
FLAKY: the following job failed but was likely due to flakiness present on trunk.
BROKEN TRUNK: the following job failed but was also failing on the merge base. 👉 Rebase onto the `viable/strict` branch to avoid these failures.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Hi @jgong5, @XiaobingSuper, @CaoE, can you please review this PR? It would be great if it gets merged for the PyTorch 2.2 release. Thank you!
Force-pushed from db59207 to cb44ee2.
@pytorchbot merge
Merge failed. Reason: Approval needed from one of the following:
By the way, I had already tested BERT sentiment analysis with torch.compile() and the results look correct on aarch64. My questions above were just to better understand the fx_pass and mkldnn rewrite behavior for non-fusion cases; I will dig into the code. This PR is ready for merge once approved by the module owners.
Force-pushed from cb44ee2 to fade410.
I have updated the PR to allow dynamic shapes on aarch64 even for fp32 inputs.
malfet left a comment
LGTM, but I wonder if is_mkldnn_acl_supported() is in any way fundamentally different from _is_mkldnn_bf16_supported()? If not, why not merge the two functions?
These two are different. For example, on Neoverse N1 we have
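The distinction the reply is drawing can be sketched as follows (hypothetical flags and function bodies; the real predicates query the build configuration and CPU features): a PyTorch build can link ACL while the CPU, e.g. Neoverse N1, lacks bf16 support, so one check can be true while the other is false.

```python
# Hypothetical capability flags to illustrate why the two predicates
# cannot be merged; real values come from the build config and from
# CPU feature detection at runtime.
BUILT_WITH_ACL = True    # build used the USE_MKLDNN_ACL option
CPU_HAS_BF16 = False     # e.g. Neoverse N1 lacks bf16 instructions

def is_mkldnn_acl_supported():
    # ACL-backed mkldnn paths only need the ACL build.
    return BUILT_WITH_ACL

def is_mkldnn_bf16_supported():
    # bf16 paths additionally require hardware bf16 support.
    return BUILT_WITH_ACL and CPU_HAS_BF16

# On an N1 with an ACL build: the ACL path is usable, the bf16 path is not.
print(is_mkldnn_acl_supported(), is_mkldnn_bf16_supported())  # True False
```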
This PR enables the fx passes and mkldnn optimizations for aarch64. It improved BERT inference performance by up to 5.8x on an AWS c7g instance when comparing torch.compile() against the no-compile path. This is enabled when PyTorch is built with the USE_MKLDNN_ACL option on aarch64.
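The weight pre-packing idea in the PR title can be sketched in plain Python with a toy linear layer (names and layout are illustrative, not PyTorch or oneDNN APIs): the weight is reordered once ahead of time so each inference call skips the per-call layout conversion.

```python
# Toy illustration of weight pre-packing: reorder (here, transpose) the
# weight once so the matmul inner loop walks memory contiguously, instead
# of converting the layout on every call. Illustrative names only.

def pack_weight(weight):
    # weight: list of rows, shape (out_features, in_features).
    # Pre-packing stores the transpose, shape (in_features, out_features).
    return [list(col) for col in zip(*weight)]

def linear_prepacked(x, packed_weight):
    # x: 1-D input of length in_features.
    out_features = len(packed_weight[0])
    out = [0.0] * out_features
    for i, xi in enumerate(x):
        row = packed_weight[i]  # contiguous row thanks to pre-packing
        for j in range(out_features):
            out[j] += xi * row[j]
    return out

weight = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # 3 outputs, 2 inputs
packed = pack_weight(weight)                    # done once, "at compile time"
print(linear_prepacked([1.0, 1.0], packed))     # [3.0, 7.0, 11.0]
```

In Inductor the analogous reordering happens during the mkldnn fx passes, which is why it only pays off under torch.compile(); the eager path would have to repack on every call.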
Force-pushed from fade410 to a97e61b.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Pull Request resolved: pytorch#115037
Approved by: https://github.com/jgong5, https://github.com/malfet
cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @voznesenskym @penguinwu @EikanWang @Guobing-Chen @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @aakhundov @ColinPeppler