
Added OpInfo-based testing of some linalg functions #51107

Closed
IvanYashchuk wants to merge 33 commits into pytorch:master from IvanYashchuk:linalg-opinfo-tests

Conversation

@IvanYashchuk
Collaborator

Added OpInfo-based testing of the following linear algebra functions:

  • cholesky, linalg.cholesky
  • linalg.eigh
  • inverse, linalg.inv
  • qr, linalg.qr
  • solve

The output of torch.linalg.pinv for empty inputs was not differentiable; this is now fixed.

In some cases, batched grad checks are disabled because they don't work well with 0x0 matrices (see #50743 (comment)).

Ref. #50006
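For context, the OpInfo pattern pairs each operator with metadata and a sample-input generator, so one test template covers many ops. Here is a minimal pure-Python sketch of that shape (`SimpleOpInfo` and its fields are hypothetical stand-ins, not PyTorch's actual `OpInfo` API), including the 0x0 case that motivated disabling some batched grad checks:

```python
# Hedged sketch of the OpInfo idea: an op name paired with a
# sample-input generator. Plain Python only, so it runs without torch.
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class SimpleOpInfo:
    name: str
    # Shapes include a batched case and the degenerate 0x0 matrix.
    sample_shapes: List[Tuple[int, ...]] = field(
        default_factory=lambda: [(3, 3), (2, 3, 3), (0, 0)])

    def sample_inputs(self):
        # Zero-filled placeholders standing in for random matrices.
        for shape in self.sample_shapes:
            n = 1
            for d in shape:
                n *= d
            yield shape, [0.0] * n


op = SimpleOpInfo(name="linalg.cholesky")
samples = list(op.sample_inputs())
```

A generic test can then loop over `samples` and call the op on each one; the 0x0 entry produces an empty input, which is exactly the edge case where batched gradcheck misbehaves.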

@IvanYashchuk IvanYashchuk added module: tests Issues related to tests (not the torch.testing module) module: linear algebra Issues related to specialized linear algebra operations in PyTorch; includes matrix multiply matmul labels Jan 26, 2021
@IvanYashchuk IvanYashchuk requested a review from mruberry January 26, 2021 11:36
@facebook-github-bot
Contributor

facebook-github-bot commented Jan 26, 2021

💊 CI failures summary and remediations

As of commit 558d744 (more details on the Dr. CI page):


  • 1/3 failures possibly* introduced in this PR
    • 1/1 non-scanned failure(s)
  • 2/3 broken upstream at merge base fe08671 on Mar 12 from 2:49am to 4:28pm

🚧 2 fixed upstream failures:

These were probably caused by upstream breakages that were already fixed.

Please rebase on the viable/strict branch.

If your commit is older than viable/strict, run these commands:

git fetch https://github.com/pytorch/pytorch viable/strict
git rebase FETCH_HEAD

ci.pytorch.org: 1 failed


This comment was automatically generated by Dr. CI.

Please report bugs/suggestions to the (internal) Dr. CI Users group.

@IvanYashchuk
Collaborator Author

Tests for QR decomposition are quite slow:

========================================================= slowest 20 durations =========================================================
32.76s call     test/test_ops.py::TestGradientsCUDA::test_fn_grad_qr_cuda_complex128
32.08s call     test/test_ops.py::TestGradientsCUDA::test_fn_grad_linalg_qr_cuda_complex128
16.98s call     test/test_ops.py::TestCommonCUDA::test_variant_consistency_jit_qr_cuda_float64
15.77s call     test/test_ops.py::TestCommonCUDA::test_variant_consistency_jit_solve_cuda_complex64
15.09s call     test/test_ops.py::TestCommonCUDA::test_variant_consistency_jit_solve_cuda_complex128
14.89s call     test/test_ops.py::TestCommonCUDA::test_variant_consistency_jit_qr_cuda_float32
13.98s call     test/test_ops.py::TestCommonCUDA::test_variant_consistency_jit_solve_cuda_float32
13.53s call     test/test_ops.py::TestGradientsCUDA::test_fn_gradgrad_flip_cuda_complex128
13.19s call     test/test_ops.py::TestCommonCUDA::test_variant_consistency_jit_qr_cuda_complex64
13.05s call     test/test_ops.py::TestGradientsCUDA::test_fn_grad_svd_cuda_complex128
13.04s call     test/test_ops.py::TestGradientsCUDA::test_fn_grad_linalg_svd_cuda_complex128
12.51s call     test/test_ops.py::TestCommonCUDA::test_variant_consistency_jit_svd_cuda_float64
11.87s call     test/test_ops.py::TestCommonCUDA::test_variant_consistency_jit_qr_cuda_complex128
10.38s call     test/test_ops.py::TestCommonCUDA::test_variant_consistency_jit_linalg_solve_cuda_float32
10.06s call     test/test_ops.py::TestCommonCUDA::test_variant_consistency_jit_linalg_solve_cuda_complex64
9.71s call     test/test_ops.py::TestCommonCUDA::test_variant_consistency_jit_linalg_solve_cuda_complex128
9.08s call     test/test_ops.py::TestGradientsCUDA::test_fn_grad_qr_cuda_float64
9.05s call     test/test_ops.py::TestGradientsCUDA::test_fn_grad_linalg_qr_cuda_float64
8.80s call     test/test_ops.py::TestCommonCUDA::test_variant_consistency_jit_linalg_qr_cuda_float64
8.68s call     test/test_ops.py::TestCommonCUDA::test_variant_consistency_jit_svd_cuda_float32
===================================== 5139 passed, 1009 skipped, 61 warnings in 1069.53s (0:17:49) =====================================
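For context on what the QR consistency tests above exercise: QR factors a matrix A into Q (orthonormal columns) and an upper-triangular R with A = QR. A pure-Python Gram-Schmidt sketch of that property (illustrative only; `torch.qr` dispatches to LAPACK/MAGMA, not this algorithm):

```python
# Classical Gram-Schmidt QR for a small square matrix given as a
# list of rows. Assumes full rank (no zero-norm columns).
def qr_gram_schmidt(a):
    n = len(a)
    cols = [[a[i][j] for i in range(n)] for j in range(n)]
    q_cols, r = [], [[0.0] * n for _ in range(n)]
    for j, v in enumerate(cols):
        w = list(v)
        for i, qi in enumerate(q_cols):
            # Project out the already-orthonormalized directions.
            r[i][j] = sum(qi[k] * v[k] for k in range(n))
            w = [w[k] - r[i][j] * qi[k] for k in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        r[j][j] = norm
        q_cols.append([x / norm for x in w])
    q = [[q_cols[j][i] for j in range(n)] for i in range(n)]
    return q, r


def matmul(x, y):
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


a = [[2.0, 1.0], [1.0, 3.0]]
q, r = qr_gram_schmidt(a)
qr_prod = matmul(q, r)  # should reconstruct a
```

Gradient tests like `test_fn_grad_qr_cuda_complex128` are slower still because gradcheck evaluates the op once per perturbed input entry, and complex inputs double the number of real degrees of freedom.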

@IvanYashchuk
Collaborator Author

The failing tests should be fixed after merging #51109.

@codecov

codecov Bot commented Feb 2, 2021

Codecov Report

Merging #51107 (8c68fd5) into master (aeb3e93) will increase coverage by 0.01%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master   #51107      +/-   ##
==========================================
+ Coverage   77.29%   77.30%   +0.01%     
==========================================
  Files        1888     1888              
  Lines      183512   183613     +101     
==========================================
+ Hits       141852   141951      +99     
- Misses      41660    41662       +2     

op=torch.solve,
dtypes=floating_and_complex_types(),
test_inplace_grad=False,
# TODO: TypeError: empty_like(): argument 'input' (position 1) must be Tensor, not torch.return_types.solve
Collaborator


I appreciate this TODO but this is more a failure of our out= testing (which we should be updating soonish) and not an issue with torch.solve().
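The failure mode in that TODO can be reproduced in miniature with plain Python (all names below are hypothetical stand-ins, not torch's real implementation): `torch.solve` returns a `torch.return_types.solve` namedtuple, which helpers expecting a single tensor argument reject:

```python
# Sketch: a helper that accepts only a single "tensor" (a list here)
# raises TypeError when handed a namedtuple of results, mirroring
# empty_like() choking on torch.return_types.solve.
from collections import namedtuple

SolveResult = namedtuple("SolveResult", ["solution", "LU"])


def empty_like(x):
    # Stand-in for torch.empty_like: single tensor-like input only.
    if not isinstance(x, list):
        raise TypeError(
            "empty_like(): argument 'input' must be a tensor, "
            f"not {type(x).__name__}")
    return [0.0] * len(x)


result = SolveResult(solution=[1.0, 2.0], LU=[[1.0]])
try:
    empty_like(result)  # namedtuple, not a plain tensor
    raised = False
except TypeError:
    raised = True
```

The fix in the out= testing machinery is to unpack such structured returns before calling tensor-only helpers on each field.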

Collaborator

@mruberry mruberry left a comment


Awesome! Nice work, @IvanYashchuk. I really appreciate how thorough and consistent you were with documentation. It makes the code much more readable and easier to maintain.

This needs a rebase. Just ping me when it's ready to merge.

@IvanYashchuk
Collaborator Author

@mruberry I updated and rebased this PR. It's ready to merge.

@mruberry
Collaborator

mruberry commented Mar 3, 2021

The ROCm test failures on this PR are interesting and linalg related. Would you take a look at them before we merge this?

@IvanYashchuk
Collaborator Author

That was my fault: I wasn't careful enough when resolving the merge conflict and accidentally removed skipCUDAIfRocm for logdet. Let's wait for CI now.

@IvanYashchuk
Collaborator Author

@mruberry I updated this pull request with the recent changes for "out" testing. I also added all relevant ROCm skips, which will be fixed sometime later. Could you take a look once again and hopefully merge this?

Contributor

@facebook-github-bot facebook-github-bot left a comment


@mruberry has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@mruberry
Collaborator

I'll try to get this landed ASAP. I was concerned about the ROCm failure because it's a timeout, but that test build doesn't appear to be running these tests. The overall test time seems similar to current CI timings.

Contributor

@facebook-github-bot facebook-github-bot left a comment


@mruberry has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@mruberry merged this pull request in 7df176b.

xsacha pushed a commit to xsacha/pytorch that referenced this pull request Mar 31, 2021
Summary:
Added OpInfo-based testing of the following linear algebra functions:
* cholesky, linalg.cholesky
* linalg.eigh
* inverse, linalg.inv
* qr, linalg.qr
* solve

The output of `torch.linalg.pinv` for empty inputs was not differentiable, now it's fixed.

In some cases, batched grad checks are disabled because it doesn't work well with 0x0 matrices (see pytorch#50743 (comment)).

Ref. pytorch#50006

Pull Request resolved: pytorch#51107

Reviewed By: albanD

Differential Revision: D27006115

Pulled By: mruberry

fbshipit-source-id: 3c1d00e3d506948da25d612fb114e6d4a478c5b1
laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request Apr 24, 2026

Labels

cla signed Merged module: linear algebra Issues related to specialized linear algebra operations in PyTorch; includes matrix multiply matmul module: tests Issues related to tests (not the torch.testing module) open source


4 participants