
Dispatch to mv rather than mm in the case that tensor1.ndim == 1 and tensor2.ndim == 2#75195

Closed
lezcano wants to merge 10 commits into gh/Lezcano/57/base from gh/Lezcano/57/head

Conversation

@lezcano
Collaborator

@lezcano lezcano commented Apr 4, 2022

…tensor2.ndim == 2

This should hopefully be faster; it makes the calling code simpler, and
it fixes a bug when using matmul with the out= parameter (previously it
would throw an incorrect error).

[ghstack-poisoned]
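For context, the 1-D @ 2-D case can be served directly by a matrix-vector kernel: v @ M equals mv applied to M's transpose, so there is no need to unsqueeze v into a 1 x N matrix and go through mm. A minimal plain-Python sketch of that identity (illustrative only; mv and matmul_1d_2d are hypothetical helpers, not PyTorch internals):

```python
# Hedged sketch (plain Python, not PyTorch's actual dispatch): the
# 1-D @ 2-D case v @ M can be computed as mv(M^T, v), avoiding the
# unsqueeze-to-1xN-then-mm path the old code used.

def mv(M, v):
    """Matrix-vector product: returns M @ v as a list of floats."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

def matmul_1d_2d(v, M):
    """v @ M for 1-D v and 2-D M, dispatched to mv on M's transpose."""
    Mt = [list(col) for col in zip(*M)]  # transpose of M
    return mv(Mt, v)

v = [1.0, 2.0]
M = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]
print(matmul_1d_2d(v, M))  # [9.0, 12.0, 15.0]
```

Because the result is produced by a single mv call, the out= tensor only needs to match mv's 1-D output shape, which is where the old mm-based path raised its incorrect error.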
@facebook-github-bot
Contributor

facebook-github-bot commented Apr 4, 2022


💊 CI failures summary and remediations

As of commit debf0bf (more details on the Dr. CI page):

  • 1/1 failures introduced in this PR

🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages

See GitHub Actions build pull / pytorch-xla-linux-bionic-py3.7-clang8 / test (xla, 1, 1, linux.2xlarge) (1/1)

Step: "Test"

2022-05-04T20:36:24.5513452Z [ RUN      ] XlaUtilCacheTest.BasicTest
2022-05-04T20:36:24.5514102Z [       OK ] XlaUtilCacheTest.BasicTest (0 ms)
2022-05-04T20:36:24.5517062Z [----------] 1 test from XlaUtilCacheTest (0 ms total)
2022-05-04T20:36:24.5517392Z 
2022-05-04T20:36:24.5570100Z [----------] Global test environment tear-down
2022-05-04T20:36:24.5570786Z [==========] 623 tests from 8 test suites ran. (512555 ms total)
2022-05-04T20:36:24.5571192Z [  PASSED  ] 621 tests.
2022-05-04T20:36:24.5571554Z [  SKIPPED ] 1 test, listed below:
2022-05-04T20:36:24.5572040Z [  SKIPPED ] AtenXlaTensorTest.TestGroupNormBackward
2022-05-04T20:36:24.5572501Z [  FAILED  ] 1 test, listed below:
2022-05-04T20:36:24.5572937Z [  FAILED  ] AtenXlaTensorTest.TestMatmul_1x2
2022-05-04T20:36:24.5573171Z 
2022-05-04T20:36:24.5573268Z  1 FAILED TEST
2022-05-04T20:36:24.7276308Z + cleanup
2022-05-04T20:36:24.7276526Z + retcode=1
2022-05-04T20:36:24.7276685Z + set +x
2022-05-04T20:36:24.7569431Z ##[error]Process completed with exit code 1.

This comment was automatically generated by Dr. CI (expand for details).

Please report bugs/suggestions to the (internal) Dr. CI Users group.


@lezcano lezcano mentioned this pull request Apr 4, 2022
@lezcano
Collaborator Author

lezcano commented Apr 4, 2022

@ezyang This PR shows (or will show, once the CI of the previous PRs is green) that this is the line that's causing trouble.

We get this error: https://github.com/pytorch/pytorch/runs/5820894454?check_suite_focus=true in the distributed build. I have tried to track down what could be causing it, but following the execution from the stack trace it throws, I don't see how that state could be reached. It's also particularly odd that this happens only in a seemingly random test in a distributed setting...

@ezyang
Contributor

ezyang commented Apr 5, 2022

Can you repro it locally?

I looked over the patch and it looks... fine-ish? The contiguous changes look a little skeevy

@lezcano
Collaborator Author

lezcano commented Apr 5, 2022

I could not reproduce it locally. This PR should just be one line; I pushed some further changes to start debugging it via CI, but that is certainly not the best approach.

We used to have SSH access into the jobs, but I believe that's not working at the moment...

lezcano added 2 commits April 5, 2022 18:03
@lezcano lezcano mentioned this pull request May 4, 2022
@ngimel
Collaborator

ngimel commented May 4, 2022

@pytorchbot merge this

facebook-github-bot pushed a commit that referenced this pull request May 6, 2022
…tensor2.ndim == 2 (#75195)

Summary:
This should hopefully be faster, it makes the calling code simpler, and
it solves a bug when using matmul with the out= parameter (before it
would throw an incorrect error).

Pull Request resolved: #75195

Approved by: https://github.com/ezyang

Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/4baf7c0899a2fa9c3630613f37d5fc65971db21c

Reviewed By: malfet

Differential Revision: D36171145

fbshipit-source-id: f6054f25940e6a8ad9de0e7576def098b15e9c3f
@facebook-github-bot facebook-github-bot deleted the gh/Lezcano/57/head branch May 8, 2022 14:16
zou3519 added a commit that referenced this pull request Jul 29, 2022
…= 1 and tensor2.ndim == 2 (#75195)"

This reverts commit d024d26.

ghstack-source-id: d5daa0c
Pull Request resolved: #82484
zou3519 added a commit that referenced this pull request Jul 29, 2022
This is a short-term fix for a serious regression in functorch
(pytorch/functorch#989).

Why is this a partial revert?
- the out= tests for nn.functional.linear fail on a complete revert
- the profiler tests fail on the revert (so I updated the expecttests
for the profiler tests)

Test Plan:
- test offline that the functorch regression was fixed

[ghstack-poisoned]
@zou3519 zou3519 mentioned this pull request Aug 1, 2022
zou3519 added a commit that referenced this pull request Aug 1, 2022
This is a short-term fix for a serious regression in functorch
(pytorch/functorch#989).

Additional things this PR does:
- the out= tests for nn.functional.linear fail after the revert. I added
some xfails. These xfails were present in the original PR (#75195).
- the profiler tests fail on the revert, so I updated the expecttests
for the profiler tests

Test Plan:
- test offline that the functorch regression was fixed

[ghstack-poisoned]
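The xfail bookkeeping described above can be illustrated with stdlib unittest's expectedFailure decorator (PyTorch's OpInfo-based tests have their own xfail machinery; LinearOutTest here is a hypothetical stand-in, not the real test):

```python
# Hedged sketch: marking a known-failing out= test as an expected failure
# so CI stays green while the regression is tracked.
import unittest

class LinearOutTest(unittest.TestCase):
    @unittest.expectedFailure
    def test_linear_out_shape_check(self):
        # Placeholder for the out= behaviour that fails after the revert.
        raise RuntimeError("out= shape check rejects a valid call")

result = unittest.TextTestRunner(verbosity=0).run(
    unittest.defaultTestLoader.loadTestsFromTestCase(LinearOutTest))
print(result.wasSuccessful())  # True: the failure was expected, not counted
```

An expected failure that later starts passing is reported as an "unexpected success", which makes it easy to spot when the xfail can be removed.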
pytorchmergebot pushed a commit that referenced this pull request Aug 2, 2022
This is a short-term fix for a serious regression in functorch
(pytorch/functorch#989).

Additional things this PR does:
- the out= tests for nn.functional.linear fail after the revert. I added
some xfails. These xfails were present in the original PR (#75195).
- the profiler tests fail on the revert, so I updated the expecttests
for the profiler tests

Test Plan:
- test offline that the functorch regression was fixed
Pull Request resolved: #82504
Approved by: https://github.com/ngimel, https://github.com/ezyang, https://github.com/atalman
@zou3519 zou3519 mentioned this pull request Aug 2, 2022
atalman pushed a commit that referenced this pull request Aug 2, 2022
facebook-github-bot pushed a commit that referenced this pull request Aug 4, 2022
Summary:
This is a short-term fix for a serious regression in functorch
(pytorch/functorch#989).

Additional things this PR does:
- the out= tests for nn.functional.linear fail after the revert. I added
some xfails. These xfails were present in the original PR (#75195).
- the profiler tests fail on the revert, so I updated the expecttests
for the profiler tests

Pull Request resolved: #82504
Approved by: https://github.com/ngimel, https://github.com/ezyang, https://github.com/atalman

Test Plan:
contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/8f86e361918e3c8a0ee2b569be6c82dfbf32d705

Test plan from GitHub:
- test offline that the functorch regression was fixed

Reviewed By: kit1980

Differential Revision: D38394573

Pulled By: zou3519

fbshipit-source-id: f9185d9cb447fb439d8e402712f2f2617f73b8cc

Labels

cla signed · module: linear algebra (issues related to specialized linear algebra operations in PyTorch; includes matrix multiply / matmul) · open source · topic: performance (topic category) · with-ssh

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants