Validate pivot range in linalg.ldl_solve CPU kernel #181032
qflen wants to merge 1 commit into pytorch:main
LAPACK SYTRS writes past the matrix when |IPIV(k)| falls outside [1, N], which surfaces as heap corruption during tensor teardown. Mirror the lu_solve CPU kernel's sanity check so that out-of-range pivots raise a clean RuntimeError instead. Fixes pytorch#163450.
Dr. CI (automatically generated, updates every 15 minutes): artifacts and rendered test results are at hud.pytorch.org/pr/181032. 1 active SEV. 1 new failure as of commit d26c4f8 with merge base ba36784.
@pytorchbot merge
Merge started: Your change will be merged once all checks pass (ETA 0-4 Hours).
Merge failed. Reason: 1 jobs have failed, first few of them are: trunk / linux-jammy-cuda13.0-py3.10-gcc11 / test (distributed, 1, 3, linux.g4dn.12xlarge.nvidia.gpu)
@pytorchbot merge -i
Merge started: Your change will be merged while ignoring the following 1 checks: trunk / linux-jammy-cuda13.0-py3.10-gcc11 / test (distributed, 1, 3, linux.g4dn.12xlarge.nvidia.gpu)
Merge failed. Reason: HTTP Error 503: Service Unavailable
@pytorchbot merge -i
Merge started: Your change will be merged while ignoring the following 1 checks: trunk / linux-jammy-cuda13.0-py3.10-gcc11 / test (distributed, 1, 3, linux.g4dn.12xlarge.nvidia.gpu)
Summary
torch.linalg.ldl_solve forwards user-provided pivots straight to LAPACK SYTRS, which writes into unrelated memory when |IPIV(k)| falls outside [1, N]. The out-of-bounds writes corrupt the heap, so the crash manifests much later, during tensor teardown, as a malformed-free abort (the reproducer from #163450 shows malloc(): unsorted double linked list corrupted).

The CPU kernel for linalg.lu_solve already guards against the same class of bug; this PR ports that sanity check to ldl_solve_kernel. Negative pivots are accepted since they legally encode 2x2 block pivots, but |pivot| must still satisfy 1 <= |pivot| <= N.

Fixes #163450.
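
For readers who want the rule in executable form, here is a minimal pure-Python model of the range check described above. It is a sketch, not the code added to ldl_solve_kernel: the function name check_ldl_pivots and the error message wording are assumptions made for illustration, and the real check operates on the pivots tensor inside the CPU kernel.

```python
def check_ldl_pivots(pivots, n):
    """Raise RuntimeError unless every pivot satisfies 1 <= |pivot| <= n.

    Hypothetical helper: models the rule from the PR summary, not the
    actual kernel code or its error message.
    """
    for k, p in enumerate(pivots):
        # LAPACK pivots are 1-based. A negative value marks a 2x2 block
        # pivot, so only the magnitude is range-checked.
        if not 1 <= abs(p) <= n:
            raise RuntimeError(
                f"pivot {p} at position {k} is outside the valid range "
                f"1 <= |pivot| <= {n}"
            )


# Negative entries whose magnitude is in range are accepted.
check_ldl_pivots([1, -3, -3, 4, 5], n=5)

# Pivots from the #163450 reproducer on a 5x5 matrix are rejected.
try:
    check_ldl_pivots([2, 3, 5, 7, 11], n=5)
except RuntimeError as err:
    print(err)
```

Checking the magnitude rather than the signed value is what lets 2x2 block pivots through while still rejecting zero and anything beyond N.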
Test plan
test_ldl_solve_cpu_errors covers zero pivots and out-of-range positive/negative pivots across floating and complex dtypes. The new check catches the repro from #163450, "torch.linalg.ldl_solve aborted heap corruption with wrong input" (pivots [2, 3, 5, 7, 11] on a 5x5 matrix), before reaching SYTRS.
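
The categories in the test plan can be illustrated with a small pure-Python table. This is only a sketch: apart from the reproducer pivots [2, 3, 5, 7, 11], the concrete lists are made-up examples rather than the values used in test_ldl_solve_cpu_errors, and out_of_range is a hypothetical helper. It deliberately avoids calling torch.linalg.ldl_solve with bad pivots, since builds without the fix can corrupt the heap.

```python
def out_of_range(pivots, n):
    """Return the pivots whose magnitude falls outside [1, n]."""
    return [p for p in pivots if not 1 <= abs(p) <= n]


n = 5  # matrix is n x n
cases = {
    "zero pivot": [1, 2, 0, 4, 5],
    "positive out of range": [1, 2, 3, 4, 6],
    "negative out of range": [1, 2, 3, -6, -6],
    "repro from #163450": [2, 3, 5, 7, 11],
    "in range, with negative pivots": [1, -3, -3, 4, 5],
}

for name, pivots in cases.items():
    bad = out_of_range(pivots, n)
    verdict = "rejected" if bad else "accepted"
    print(f"{name}: {verdict} {bad}")
```

Only the last case passes the range check; for the reproducer, the offending entries are 7 and 11.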