Batched MAGMA calls illegally read CUDA memory #26996
Status: Closed
Labels: high priority · module: dependency bug (problem caused by an upstream library we use) · module: linear algebra (specialized linear algebra operations in PyTorch, including matmul) · triaged
🐛 Bug
(Some?) batched MAGMA calls illegally read CUDA memory.
These illegal reads are often "silent" and harmless. If, however, they touch unallocated device memory, the program's subsequent CUDA calls will fail.
To Reproduce
See #26789, or, once that PR lands, re-enable test_cholesky_batched_many_batches.
The bug can also be reproduced by calling magma_dpotrf_batched directly on a tensor allocated with cudaMalloc. Run under cuda-memcheck to report all illegal memory accesses, including the "silent" ones.
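A minimal Python repro sketch along those lines. The shape is illustrative, chosen to mirror the many-small-matrices case that test_cholesky_batched_many_batches exercises; whether a given run's over-read lands in unallocated memory depends on the allocator's state.

```python
import torch

# Illustrative sizes, not a verified trigger: a large batch of small
# matrices routes torch.cholesky's CUDA path to the batched MAGMA kernel
# (magma_dpotrf_batched).
batch, n = 4096, 2
a = torch.randn(batch, n, n, device="cuda")
spd = a @ a.transpose(-2, -1) + n * torch.eye(n, device="cuda")  # make SPD
l = torch.cholesky(spd)
torch.cuda.synchronize()
```

Running this as `cuda-memcheck python repro.py` reports every illegal access, including the ones that never crash the program.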
Additional context
This issue was discovered in #26789 and diagnosed by @ngimel and me. We are following up with @vishwakftw.
A workaround may be to pad tensor inputs to batched MAGMA calls, i.e., copy the original tensor into a buffer larger than it strictly needs. How much extra space is required needs further investigation.
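A sketch of that padding idea. The helper name and the pad size are placeholders, not a verified bound:

```python
import torch

def padded_clone(t, pad_elems=512):
    # Copy `t` to the front of an allocation with `pad_elems` extra trailing
    # elements, so reads past the end of the last batch matrix stay inside
    # memory we own. `pad_elems` is a guess; the bound MAGMA actually needs
    # is the open question above.
    flat = t.new_empty(t.numel() + pad_elems)
    padded = flat[: t.numel()].view(t.shape)
    padded.copy_(t)
    return padded
```

Note this only helps if the padded storage actually reaches the MAGMA call; torch.cholesky may copy its input into a fresh internal buffer first, so the real fix is likely padding that internal allocation.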
cc @ezyang @gchanan @zou3519 @vincentqb @vishwakftw @jianyuh @ssnl