Add torch.linalg.cholesky_ex without checking for errors by default #56724
IvanYashchuk wants to merge 23 commits into pytorch:master
Conversation
Rewrote the internal code for `torch.linalg.cholesky`. Added `cholesky_stub` dispatch. `linalg_cholesky` is implemented as a call to `linalg_cholesky_ex`. `torch.linalg.cholesky_ex` skips checking the LAPACK error codes by default and returns them instead, which puts the responsibility for checking the error codes on the user.
💊 CI failures summary and remediations. As of commit c73bf24 (more details on the Dr. CI page):

🕵️ 1 new failure recognized by patterns. The following CI failures do not appear to be due to upstream breakages:
Docs failure is real: the named tuple list needs an update (so many lists is a pain, sorry).
linalg_cholesky_out_info(input, L, info);
}

if (check_errors) {
so if someone calls torch.linalg.cholesky_ex(..., check_errors=True) and there is an error, then the error message would say torch.linalg.cholesky...?
That's not a big deal but it'd probably be easy to pass the calling function's string here?
std::tuple<Tensor&, Tensor&> linalg_cholesky_ex_out(const Tensor& input, bool check_errors, Tensor& L, Tensor& info) {
  squareCheckInputs(input);
  checkSameDevice("torch.linalg.cholesky_ex", L, input, "L");
Because cholesky is implemented as a call to cholesky_ex, this would show the name cholesky_ex if a call to cholesky hit an error, right? Also see the comment below for when a call to cholesky_ex would show the name cholesky.
Error messages aren't the end of the world, but do you think we should bother taking a string for the function name? Alternatively we could just always use torch.linalg.cholesky, even when someone calls torch.linalg.cholesky_ex.
I agree that we could just use torch.linalg.cholesky.
Here torch.linalg.cholesky performs the same checks itself, differing only in the function name and in using "result" instead of "L". It duplicates the check, but that shouldn't add overhead.
So the error messages of torch.linalg.cholesky are not changed in this PR. The tests would need to be modified if the error messages changed.
CompositeExplicitAutograd: linalg_cholesky_ex

- func: linalg_cholesky.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!)
- func: linalg_cholesky_ex.L(Tensor self, *, bool check_errors=False, Tensor(a!) L, Tensor(b!) info) -> (Tensor(a!) L, Tensor(b!) info)
Jumping ahead of @ezyang here: could we look at making these structured? If that seems challenging we could look at doing it in a follow-up PR.
I don't see a CPU, CUDA dispatch key here so this might be an alias style kernel that may not work with structured. You should add this to the binder of cases to solve with the transpiler.
Is a CPU, CUDA dispatch key preferred over CompositeExplicitAutograd if dispatch stubs are used?
Found the answer in aten/native/README.md:

Note: kernels which call DispatchStub should NOT be registered as CompositeExplicitAutograd, as DispatchStub only works for CPU, CUDA
I haven't looked at porting to structured in detail, so I can't estimate how challenging it is. I was planning to port linalg module functions to structured after the 1.9 branch cut.
@skipCUDAIfNoMagmaAndNoCusolver
@skipCPUIfNoLapack
@dtypes(torch.float32, torch.float64, torch.complex64, torch.complex128)
def test_cholesky_with_info(self, device, dtype):
test_cholesky_ex? That will make it easier for people to find the test(s) associated with the operator
from torch.testing._internal.common_utils import random_hermitian_pd_matrix

def run_test(n, batch):
    A = random_hermitian_pd_matrix(n, *batch, dtype=dtype, device=device)
I'm curious about these tests because we already have a lot of tests for cholesky, and cholesky_ex and cholesky are "near aliases" of each other; should we just refactor the cholesky tests so that test_cholesky_foo and test_cholesky_ex_foo both call a common helper, but pass a different torch function? Then there can be separate tests that validate their divergent behavior on infos
I completely agree. I think it would take a lot of energy from me to refactor the tests now. If those functions were aliases the task would of course be easy. I'd prefer to do it in a follow-up PR, after the 1.9 branch cut, if that's okay.
True
""")

cholesky_ex = _add_docstr(_linalg.linalg_cholesky_ex, r"""
I like that we're documenting this completely separately from linalg.cholesky because it's an experimental function
That said, we should probably notify the reader early that this function is "experimental" and that it may change in a future PyTorch release
I realize it's already in an "experimental" section, but it's common to search functions directly in the docs
I added a warning; is it okay?
.. warning:: This function is "experimental" and it may change in a future PyTorch release.
Not sure about the use of quotes though.
@skipCUDAIfNoMagmaAndNoCusolver
@skipCPUIfNoLapack
@dtypes(torch.float32, torch.float64, torch.complex64, torch.complex128)
def test_cholesky_with_info_non_pd(self, device, dtype):
lstsq
householder_product

Experimental Functions
Hey @IvanYashchuk! This is awesome! I made a few comments about error messages, a docs tweak, a possible test refactor, and using structured kernels. I don't think any of them are blocking; we just need to focus on fixing the tests, and any additional tweaks are a bonus. We can pursue the docs as part of our docs review, for example. @lezcano would you like to sanity check the impl, too? Let's also give @ezyang an opportunity to look at the native_functions.yaml changes.
This is great @IvanYashchuk. Let me see if using this and checking the …
@Balandat has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
DEFINE_DISPATCH(cholesky_stub);

void linalg_cholesky_out_info(const Tensor& input, const Tensor& result, const Tensor& info) {
  TORCH_INTERNAL_ASSERT(input.dim() >= 2);
as in the other PR, consider making these TORCH_INTERNAL_ASSERT_DEBUG_ONLY
checkLinalgCompatibleDtype("torch.linalg.cholesky_ex", L, input, "L");
checkSameDevice("torch.linalg.cholesky_ex", info, input, "info");
ScalarType info_output_type = ScalarType::Int;
checkLinalgCompatibleDtype("torch.linalg.cholesky_ex", info.scalar_type(), info_output_type);
we could require that infos are of the correct type and error out if they aren't; that would slightly simplify the code.
I think info should follow the same behavior that is expected for all other "out" tensors: it should be possible to provide a tensor with a different castable dtype, tensors with zero elements are resized, etc. Is that right?
Not exactly. We are limiting type promotion behavior to pointwise functions and reductions (#56356). Existing additional functionality is grandfathered in, but we should avoid adding new cases. So linalg functions casting to their out tensors exist today, but info is a new addition and should not be promoted.
// if upper=true we need to transpose the self tensor
if (upper) {
  // self.transpose_(-2, -1);
  self_data = self.transpose(-2, -1).data_ptr<scalar_t>();
this is a dangling data pointer, or rather, this would be a dangling pointer if it were not guaranteed to be the same as self.data_ptr(), which raises the question of what this conditional achieves.
Looking at it now, this portion of the code is wrong and I am removing it. Thank you!
The original intention was to solve #57032. That approach was wrong, and the issue is now resolved with changes to the cholesky_helper_magma function.
@@ -1659,65 +1672,89 @@ AT_ERROR("cholesky: MAGMA library not found in "
// which concisely is equal to batch_size % batch_limit
if (batch_size % batch_limit != 0) {
  magmaCholeskyBatched<scalar_t>(
nit, but consider folding this into the loop above; the only changes needed are mini_idx < batch_size in the loop condition and nbatches = std::min(batch_limit, batch_size - mini_idx) as the argument to magmaCholeskyBatched.
Tensor _cholesky_helper_cuda(const Tensor& self, bool upper) {
  auto info_shape = IntArrayRef(
      self.sizes().cbegin(), self.sizes().cend() - 2); // self.shape[:-2]
  Tensor self_working_copy = cloneBatchedColumnMajor(self);
can you please leave a comment here that clone is called because the legacy cholesky path doesn't clone beforehand?
This is great. In this notebook I made a quick and dirty update of `psd_safe_cholesky` using this. For a small matrix example this gets the runtime down from 123 ms to 76 µs. Yes, those are the correct units; this is a speedup of ~2,000X.
Add upper argument to cholesky_stub
force-pushed from 454f89b to a075e78
// ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ cholesky ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

template<typename scalar_t>
static void apply_cholesky(Tensor& self, bool upper, std::vector<int64_t>& infos) {
This function is moved to the BatchLinearAlgebraKernel.cpp file.
}
squareCheckInputs(self);

auto raw_cholesky_output = at::_cholesky_helper(self, upper);
Removed at::_cholesky_helper; it is replaced with cholesky_stub.
@mruberry, I updated this pull request. Most of the changes in this PR are rewrites of internal code for cholesky needed to expose the …
@Balandat has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
This can *significantly* speed up `psd_safe_cholesky` due to cutting out the pytorch error-handling middle man, achieving ~2,000X speedups: pytorch/pytorch#56724 (comment). This also allows us to add jitter to the specific batch elements for which the decomp failed (rather than indiscriminately to all). This requires pytorch/pytorch#56724, which hasn't landed yet but will be part of 1.9. Either way, I implemented this in a backward-compatible fashion so this will work with older pytorch versions as well.
@mruberry For …
@mruberry, thanks!
@Balandat has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
…ytorch#56724)

Summary: The new function has the following signature: `cholesky_ex(Tensor input, *, bool check_errors=False) -> (Tensor L, Tensor infos)`. When `check_errors=True`, an error is thrown if the decomposition fails; when `check_errors=False`, the responsibility for checking the decomposition is on the user. When `check_errors=False`, we don't have host-device memory transfers for checking the values of the `info` tensor.

Rewrote the internal code for `torch.linalg.cholesky`. Added `cholesky_stub` dispatch. `linalg_cholesky` is implemented using calls to `linalg_cholesky_ex` now.

Resolves pytorch#57032. Ref. pytorch#34272, pytorch#47608, pytorch#47953

Pull Request resolved: pytorch#56724
Reviewed By: ngimel
Differential Revision: D27960176
Pulled By: mruberry
fbshipit-source-id: f05f3d5d9b4aa444e41c4eec48ad9a9b6fd5dfa5