Port CPU torch.ormqr to ATen #57315
IvanYashchuk wants to merge 4 commits into gh/ivanyashchuk/27/base
Conversation
This PR ports `torch.ormqr` from TH to ATen. The CUDA path will be implemented in a follow-up PR. With the ATen port, support for complex and batched inputs is added, the tests are rewritten, and an OpInfo entry is added. We can implement the least squares solver with geqrf + ormqr + triangular_solve, so it is useful to have this function renewed, at least for internal code. Resolves #24748
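As a rough illustration of that last point (a sketch, not code from this PR; the shapes and float64 dtype are arbitrary choices), a least-squares solve for a tall full-rank matrix can be assembled from exactly these three primitives:

```python
import torch

# Sketch: solve min ||A @ x - b|| via geqrf + ormqr + triangular_solve.
m, n = 5, 3
A = torch.randn(m, n, dtype=torch.float64)
b = torch.randn(m, 2, dtype=torch.float64)

h, tau = torch.geqrf(A)                       # A = Q R, with Q kept in implicit form
qtb = torch.ormqr(h, tau, b, transpose=True)  # Q^T @ b, without materializing Q
R = h[:n, :].triu()                           # R is the upper triangle of h
x = torch.triangular_solve(qtb[:n], R).solution
assert torch.allclose(A.t() @ (A @ x), A.t() @ b)  # the normal equations hold
```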
💊 CI failures summary and remediations: as of commit c9a690d, 2 failures were not recognized by known patterns (more details on the Dr. CI page).
lezcano
left a comment
Left a few comments, none of them too important.
other_matrix_shape = (m, n) if left else (n, m)
other = make_tensor((*batch, *other_matrix_shape), device, dtype, requires_grad=requires_grad)
kwargs = {"left": left, "transpose": transpose}
sample_inputs.append(SampleInput(reflectors, args=(tau, other,), kwargs=kwargs))
Prefer writing it as a generator, so that when OpInfos accept generators it's easier to port. See pytorch/torch/testing/_internal/common_methods_invocations.py, lines 1626 to 1653 at 51fc406.
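Something like this, perhaps (a sketch only; it assumes make_tensor and SampleInput are in scope, as they are in that file):

```python
from itertools import product

def sample_inputs_ormqr(op_info, device, dtype, requires_grad):
    # Generator form: yield each SampleInput instead of appending to a list.
    for batch, (m, n) in product(((), (2,)), ((4, 4), (5, 3))):
        for left, transpose in product((True, False), repeat=2):
            reflectors = make_tensor((*batch, m, n), device, dtype,
                                     requires_grad=requires_grad)
            tau = make_tensor((*batch, min(m, n)), device, dtype,
                              requires_grad=requires_grad)
            other_shape = (m, n) if left else (n, m)
            other = make_tensor((*batch, *other_shape), device, dtype,
                                requires_grad=requires_grad)
            yield SampleInput(reflectors, args=(tau, other),
                              kwargs={"left": left, "transpose": transpose})
```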
add_docstr(torch.ormqr,
           r"""
-ormqr(input, input2, input3, left=True, transpose=False) -> Tensor
+ormqr(input, tau, other, left=True, transpose=False, *, out=None) -> Tensor
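For reference, the flags in the new signature behave as follows (a small illustrative check with real inputs; not part of the docs diff itself):

```python
import torch

a = torch.randn(4, 4, dtype=torch.float64)
c = torch.randn(4, 4, dtype=torch.float64)
h, tau = torch.geqrf(a)
q, _ = torch.linalg.qr(a)  # explicit Q, for comparison only

assert torch.allclose(torch.ormqr(h, tau, c), q @ c)                      # Q @ c
assert torch.allclose(torch.ormqr(h, tau, c, left=False), c @ q)          # c @ Q
assert torch.allclose(torch.ormqr(h, tau, c, transpose=True), q.t() @ c)  # Q^T @ c
```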
Thank you for correcting and improving the docs!!
    return samples

def sample_inputs_ormqr(op_info, device, dtype, requires_grad):
Consider defining a helper function of the form:
make_arg = partial(make_tensor, dtype=dtype, device=device, requires_grad=requires_grad)
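i.e., roughly (a sketch; make_tensor and SampleInput come from the same test utilities):

```python
from functools import partial

def sample_inputs_ormqr(op_info, device, dtype, requires_grad):
    # Bind the arguments that never change across samples.
    make_arg = partial(make_tensor, dtype=dtype, device=device,
                       requires_grad=requires_grad)
    reflectors = make_arg((5, 3))
    tau = make_arg((3,))
    other = make_arg((5, 4))
    return [SampleInput(reflectors, args=(tau, other))]
```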
.. seealso::

        :func:`torch.geqrf` can be used to form the Householder representation of matrix `Q`
"the Householder representation of matrix Q" -> "a Householder representation (input, tau) of the matrix Q
TORCH_CHECK(other.dim() >= 2, "torch.ormqr: other must have at least 2 dimensions.");

int64_t left_size_condition = left ? -2 : -1;
TORCH_CHECK(
Missing a check that input.size(-2) >= input.size(-1)?
No, as you can see from the passing tests, this function works both for m >= n and m < n matrices.
In the m < n case, LAPACK uses only the first m columns, which represent m Householder vectors.
In [1]: import torch
In [2]: a = torch.randn(3, 5)
In [3]: h, tau = torch.geqrf(a)
In [4]: h.shape
Out[4]: torch.Size([3, 5])
In [5]: c = torch.randn(3, 7)
In [6]: res = torch.ormqr(h, tau, c)
In [7]: q, _ = torch.linalg.qr(a)
In [8]: torch.allclose(q @ c, res)
Out[8]: True
Then we should add the m < n case to the docs.
Why do you think the case m < n is special?
Because in householder_product it is not even supported:
https://pytorch.org/docs/master/generated/torch.linalg.householder_product.html
Well, householder_product is related, but it's a different function. It has the same constraints on the input as the original orgqr implementation had, and LAPACK's orgqr has this m >= n requirement. But n there is the number of columns of Q to be computed, not n = input.shape[-1]. A side effect is that the output of torch.geqrf can't be used directly with torch.orgqr, while it can be used with ormqr in the current implementation.
In [1]: import torch
In [2]: a = torch.randn(3, 5)
In [3]: h, tau = torch.geqrf(a)
In [4]: c = torch.eye(3)
In [5]: torch.ormqr(h, tau, c) # narrow of `h` is not required
Out[5]:
tensor([[-0.1468, 0.8266, -0.5433],
[-0.0029, -0.5496, -0.8354],
[-0.9892, -0.1211, 0.0831]])
In [6]: torch.linalg.qr(a)[0]
Out[6]:
tensor([[-0.1468, 0.8266, -0.5433],
[-0.0029, -0.5496, -0.8354],
[-0.9892, -0.1211, 0.0831]])
In [7]: torch.linalg.householder_product(h, tau)
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-7-70a58677659d> in <module>
----> 1 torch.linalg.householder_product(h, tau)
RuntimeError: torch.linalg.householder_product: input.shape[-2] must be greater than or equal to input.shape[-1]
In [8]: torch.linalg.householder_product(h.narrow(-1, 0, 3), tau) # narrow is required here
Out[8]:
tensor([[-0.1468, 0.8266, -0.5433],
[-0.0029, -0.5496, -0.8354],
[-0.9892, -0.1211, 0.0831]])
I do not have any strong opinions on how to handle this, as I really think these functions are too low-level to form part of the Python API.
That being said, it's quite annoying that we have an (h, tau) decomposition in several functions and that each of them has slightly different requirements.
Even more, the seealso section of householder_product should be updated to reflect this dissonance.
In every place where we multiply Q by some other matrix, using this function should be more efficient. In the PyTorch Python code there is, for example, one place it could be used: in the implementation of lobpcg, the Q matrix is explicitly formed and then used in a multiplication (line 878 in c371542; pytorch/torch/_linalg_utils.py, lines 80 to 88 in c371542).
To make this function more accessible, the documentation should be improved, the name should be changed, and autograd support added. If we do that, we can revisit the constraints on the input and make them consistent.
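To illustrate the kind of rewrite I mean (a sketch with made-up shapes, not the actual lobpcg code):

```python
import torch

A = torch.randn(6, 3, dtype=torch.float64)
B = torch.randn(6, 4, dtype=torch.float64)

# Current pattern: materialize Q explicitly, then multiply.
q, _ = torch.linalg.qr(A)
out_explicit = q.t() @ B

# With ormqr: apply Q^T directly from the compact (h, tau) form,
# keeping only the rows that correspond to the reduced Q.
h, tau = torch.geqrf(A)
out_implicit = torch.ormqr(h, tau, B, transpose=True)[:3]

assert torch.allclose(out_explicit, out_implicit)
```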
That's spot on. Let's do that in a follow-up after the branch-cut.
@lezcano, thank you for your feedback! I've updated this PR according to your suggestions, expanded the documentation on input sizes a bit, and added a "Raises:" section. Could you please take a look?
lezcano
left a comment
I just left a couple of comments, just stylistic points.
In any case, this PR is ready to be merged.
@mruberry, I think this stack is ready to be merged. Could you please take another look? I also moved the "cuBLAS path for lstsq" PR to this stack, after the cuSOLVER PR.
@mruberry has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Summary: Pull Request resolved: pytorch#57315. This PR ports `torch.ormqr` from TH to ATen. The CUDA path will be implemented in a follow-up PR. With the ATen port, support for complex and batched inputs is added, the tests are rewritten, and an OpInfo entry is added. We can implement the least squares solver with geqrf + ormqr + triangular_solve, so it is useful to have this function renewed, at least for internal code. Resolves pytorch#24748. Test Plan: Imported from OSS. Reviewed By: ngimel. Differential Revision: D28242070. Pulled By: mruberry. fbshipit-source-id: f070bb6ac2f5a3269b163b22f7354e9089ed3061