🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/176703
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 New Failure
As of commit 2738fc0 with merge base 08b6f48:
NEW FAILURE - The following job has failed:
This comment was automatically generated by Dr. CI and updates every 15 minutes. |
|
|
Attention! native_functions.yaml was changed. If you are adding a new function or defaulted argument to native_functions.yaml, you cannot use it from pre-existing Python frontend code until our FC window passes (two weeks). Split your PR into two PRs, one which adds the new C++ functionality, and one that makes use of it from Python, and land them two weeks apart. See https://github.com/pytorch/pytorch/wiki/PyTorch's-Python-Frontend-Backward-and-Forward-Compatibility-Policy#forwards-compatibility-fc for more info. Caused by: |
458a6b7 to
e5e17f7
|
To add the ciflow label This helps ensure we don't trigger CI on this PR until it is actually authorized to do so. Please ping one of the reviewers if you do not have access to approve and run workflows. |
kurtamohler
left a comment
Thank you for the PR! Left a few suggestions
e5e17f7 to
f8ce1f2
|
@pytorchbot rebase |
|
@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here |
|
Successfully rebased |
f8ce1f2 to
2738fc0
|
@pytorchbot label "ciflow/mps" |
|
@pytorchbot merge |
Merge started. Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team |
|
BTW: the triangular solve is extremely slow on MPS, and after some investigation I have no idea how to make it faster. Any suggestions? 😿 |
Merge failed. Reason: 1 jobs have failed, first few of them are: trunk / macos-py3-arm64 / build. Details for Dev Infra team: Raised by workflow job |
Is it slow just for batched inputs with more than 2 dims, or is it also slow for non-batched inputs? It seems that if the input is batched, each matrix is computed serially, so a major improvement would be to parallelize the batches. It looks like MPSMatrixSolveTriangular does not support batched matrices, so it would have to be implemented as a custom Metal kernel. |
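To illustrate the point about batches, here is a minimal pure-Python sketch, not the PyTorch or Metal code. It assumes lower-triangular matrices stored as nested lists, and every name in it (`solve_lower`, `solve_batch_serial`, `solve_batch_parallel`) is hypothetical. It only shows that each matrix in a batch can be solved independently of the others, which is what makes a batch-parallel kernel possible. The thread pool stands in for parallel GPU work and is not expected to be faster in CPython.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def solve_lower(L: Matrix, b: Vector) -> Vector:
    """Solve L y = b by forward substitution (L is lower triangular)."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        # Subtract the contribution of the already-computed unknowns.
        s = sum(L[i][j] * y[j] for j in range(i))
        y[i] = (b[i] - s) / L[i][i]
    return y


def solve_batch_serial(Ls: List[Matrix], bs: List[Vector]) -> List[Vector]:
    """One solve after another, as in a per-matrix loop."""
    return [solve_lower(L, b) for L, b in zip(Ls, bs)]


def solve_batch_parallel(Ls: List[Matrix], bs: List[Vector],
                         max_workers: int = 4) -> List[Vector]:
    """Same solves dispatched independently; map() preserves batch order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(solve_lower, Ls, bs))


if __name__ == "__main__":
    Ls = [[[2.0, 0.0], [1.0, 3.0]], [[1.0, 0.0], [4.0, 2.0]]]
    bs = [[4.0, 11.0], [3.0, 16.0]]
    print(solve_batch_serial(Ls, bs))
    print(solve_batch_parallel(Ls, bs))
```

No solve reads another solve's output, so the two functions return the same result in the same order. Within a single matrix the rows still depend on each other, so the batch dimension is the easy axis to parallelize.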
|
@pytorchbot merge -i |
Merge started. Your change will be merged while ignoring the following 1 checks: pull / linux-jammy-py3.14t-clang15 / test (crossref, 2, 2, lf.linux.2xlarge). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team |
As decomposition via two triangular solves
Frequently requested op in pytorch#154052
Pull Request resolved: pytorch#176703
Approved by: https://github.com/kurtamohler, https://github.com/malfet
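The phrase "decomposition via two triangular solves" can be illustrated with a small pure-Python sketch. The thread does not say which op or factorization is involved, so this assumes, purely for illustration, a real Cholesky-style factor L with A = L Lᵀ. In that case A x = b splits into L y = b followed by Lᵀ x = y. All function names are hypothetical and this is not the code from the PR.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def forward_substitution(L: Matrix, b: Vector) -> Vector:
    """Solve L y = b for lower-triangular L."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        s = sum(L[i][j] * y[j] for j in range(i))
        y[i] = (b[i] - s) / L[i][i]
    return y


def back_substitution_transposed(L: Matrix, y: Vector) -> Vector:
    """Solve L^T x = y, reading the upper-triangular L^T directly out of L."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        # (L^T)[i][j] equals L[j][i], so no explicit transpose is built.
        s = sum(L[j][i] * x[j] for j in range(i + 1, n))
        x[i] = (y[i] - s) / L[i][i]
    return x


def solve_via_two_triangular_solves(L: Matrix, b: Vector) -> Vector:
    """Solve (L L^T) x = b (assumed factorization) with two triangular solves."""
    y = forward_substitution(L, b)             # first solve:  L y = b
    return back_substitution_transposed(L, y)  # second solve: L^T x = y


if __name__ == "__main__":
    L = [[2.0, 0.0], [1.0, 3.0]]   # A = L L^T = [[4, 2], [2, 10]]
    b = [8.0, 22.0]                # b = A @ [1, 2]
    print(solve_via_two_triangular_solves(L, b))  # [1.0, 2.0]
```

The same two-step shape would apply to other factorizations with triangular factors. Only the matrices passed to each solve would change.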