Batched MAGMA calls illegally read CUDA memory #26996
Status: Closed
Labels: high priority · module: dependency bug (problem caused by an upstream library we use) · module: linear algebra (specialized linear algebra operations in PyTorch, including matmul) · triaged
🐛 Bug
(Some?) batched MAGMA calls illegally read CUDA memory.
These illegal reads are often "silent" and harmless. If, however, they touch unallocated device memory, the program's subsequent CUDA calls will fail.
To Reproduce
See #26789, or, once that PR lands, re-enable test_cholesky_batched_many_batches.
The bug can also be reproduced by calling magma_dpotrf_batched directly on a tensor allocated with cudaMalloc. Run under cuda-memcheck to report all illegal memory accesses, including the "silent" ones.
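A minimal Python repro sketch along those lines. The shape is illustrative, chosen to mirror the many-small-matrices case that test_cholesky_batched_many_batches exercises; whether a given run's over-read lands in unallocated memory depends on the allocator's state.

```python
import torch

# Illustrative sizes, not a verified trigger: a large batch of small
# matrices routes torch.cholesky's CUDA path to the batched MAGMA kernel
# (magma_dpotrf_batched).
batch, n = 4096, 2
a = torch.randn(batch, n, n, device="cuda")
spd = a @ a.transpose(-2, -1) + n * torch.eye(n, device="cuda")  # make SPD
l = torch.cholesky(spd)
torch.cuda.synchronize()
```

Running this as `cuda-memcheck python repro.py` reports every illegal access, including the ones that never crash the program.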
Additional context
This issue was discovered in #26789 and diagnosed by @ngimel and me. We are following up with @vishwakftw.
A workaround may be to pad tensor inputs to batched MAGMA calls, i.e., copy the original tensor into a buffer larger than it strictly needs. How much extra space is required needs further investigation.
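A sketch of that padding idea. The helper name and the pad size are placeholders, not a verified bound:

```python
import torch

def padded_clone(t, pad_elems=512):
    # Copy `t` to the front of an allocation with `pad_elems` extra trailing
    # elements, so reads past the end of the last batch matrix stay inside
    # memory we own. `pad_elems` is a guess; the bound MAGMA actually needs
    # is the open question above.
    flat = t.new_empty(t.numel() + pad_elems)
    padded = flat[: t.numel()].view(t.shape)
    padded.copy_(t)
    return padded
```

Note this only helps if the padded storage actually reaches the MAGMA call; torch.cholesky may copy its input into a fresh internal buffer first, so the real fix is likely padding that internal allocation.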
cc @ezyang @gchanan @zou3519 @vincentqb @vishwakftw @jianyuh @ssnl