Add non-allocating helper function for torch.linalg.qr #56254
IvanYashchuk wants to merge 14 commits into gh/ivanyashchuk/15/base
Conversation
💊 CI failures summary and remediations (as of commit 87182f9; more details on the Dr. CI page):
🕵️ 1 new failure recognized by patterns. The following CI failures do not appear to be due to upstream breakages.
ghstack-source-id: 0a89999 Pull Request resolved: pytorch#56254
ghstack-source-id: a668bd2 Pull Request resolved: pytorch#56254
ghstack-source-id: c2ef227 Pull Request resolved: pytorch#56254
ghstack-source-id: 5005854 Pull Request resolved: pytorch#56254
@lezcano would you sanity check this?
Differential Revision: [D27960151](https://our.internmc.facebook.com/intern/diff/D27960151)
ghstack-source-id: 8307873 Pull Request resolved: pytorch#56254
antocuni
left a comment
LGTM.
If I understand it correctly, this PR by itself is mostly a no-op: it introduces _linalg_qr_out_helper but never uses it except to re-implement the old _linalg_qr_helper. I assume it will be used by subsequent PRs in the ghstack.
TORCH_INTERNAL_ASSERT(Q.sizes().equals(expected_Q_shape));
// Q tensor must be in batched column major order (Fortran contiguous)
TORCH_INTERNAL_ASSERT(Q.transpose(-2, -1).is_contiguous());
This has nothing to do with this PR; it's just a random idea that came to mind while reading it.
Should we have something like is_fortran_contiguous()? And maybe even introduce at::MemoryFormat::FortranContiguous, which would be useful when creating tensors.
I have seen similar patterns all over the linalg code, and having first-class support for Fortran-style memory would probably simplify a lot of it.
In linalg code we only need the last two dimensions to be Fortran contiguous; all the other dimensions are normal C contiguous. Real Fortran-style memory would put the matrix in the first two dimensions and the batches at the end, which is not useful to have.
lezcano
left a comment
It looks very good. I just found one part that was a bit difficult to follow. I made a proposal for a simpler way to handle it, which I think should also save one allocation.
If this sounds reasonable, the handling of the "unpacking" of the tensors after the geqrf call should be changed as well.
// geqrf requires m x n workspace input that is modified in-place
// if m > n and reduced==true we use Q tensor for storing the result of geqrf operation
// otherwise R tensor is used
Tensor QR = R;
This logic is a bit difficult to follow; it's hard to see that all the cases are covered. I think it'd be better to initialise it with a lambda and an if/else structure that has all the elses:

const Tensor QR = [&]{
  if (m <= n) {
    return R;
  } else {
    if (compute_q) {
      return reduced_mode ? Q : R;
    } else {
      return at::empty(input.transpose(-2, -1).sizes(), input.options()).transpose_(-2, -1);
    }
  }
}();

Note that the current implementation allocates a new tensor whenever compute_q == false, which is not necessary.
An analogous if/else (without the lambda) should be written below to handle the unpacking of the result.
> the current implementation allocates a new tensor whenever compute_q == false, which is not necessary
Right, I'll fix that.
I changed the code to use your conditions.
@lezcano would you take a look at this PR?
The previous helper was allocating memory on the go, with narrowing and resizes, which is not a good strategy. Here we first allocate memory of the correct size and then fill it with the result.
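The allocate-first strategy can be sketched outside of PyTorch (NumPy stands in for ATen here; the function names and shapes are illustrative, not the PR's actual code):

```python
import numpy as np

# Strategy A (old helper): grow the output as results arrive.
# Each concatenate reallocates and copies the whole buffer.
def fill_growing(chunks):
    out = np.empty((0,))
    for c in chunks:
        out = np.concatenate([out, c])  # reallocates every iteration
    return out

# Strategy B (this PR): compute the final size up front, allocate once,
# then write each result into the preallocated buffer.
def fill_preallocated(chunks):
    total = sum(len(c) for c in chunks)
    out = np.empty(total)
    pos = 0
    for c in chunks:
        out[pos:pos + len(c)] = c  # in-place write, no reallocation
        pos += len(c)
    return out

chunks = [np.arange(3.0), np.arange(4.0)]
assert np.array_equal(fill_growing(chunks), fill_preallocated(chunks))
```

Both produce the same result; the second does a single allocation regardless of how many pieces are written.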
Differential Revision: [D27960151](https://our.internmc.facebook.com/intern/diff/D27960151)
ghstack-source-id: 4a7d272 Pull Request resolved: pytorch#56254
ghstack-source-id: b31055e Pull Request resolved: pytorch#56254
Summary: Pull Request resolved: pytorch#56254
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D27960151
Pulled By: mruberry
fbshipit-source-id: 4067afed0dcca3f32d0fa153e50a268a850817b2
Stack from ghstack:
Differential Revision: D27960151