Revert #75195 #82504

Closed
zou3519 wants to merge 5 commits into gh/zou3519/455/base from gh/zou3519/455/head

@zou3519
Contributor

@zou3519 zou3519 commented Jul 29, 2022

Stack from ghstack:

This is a short-term fix for a serious regression in functorch
(pytorch/functorch#989).

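Additional things this PR does:

  • the out= tests for nn.functional.linear fail after the revert. I added some xfails. These xfails were present in the original PR (#75195).
  • the profiler tests fail on the revert, so I updated the expecttests for the profiler tests
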
Test Plan:

  • test offline that the functorch regression was fixed

This is a short-term fix for a serious regression in functorch
(pytorch/functorch#989).

Why is this a partial revert?
- the out= tests for nn.functional.linear fail on a complete revert
- the profiler tests fail on the revert (so I updated the expecttests
for the profiler tests)

Test Plan:
- test offline that the functorch regression was fixed

[ghstack-poisoned]
zou3519 added a commit that referenced this pull request Jul 29, 2022
ghstack-source-id: 1332851
Pull Request resolved: #82504
@facebook-github-bot
Contributor

facebook-github-bot commented Jul 29, 2022

❌ 3 New Failures

As of commit 21f8156 (more details on the Dr. CI page):

  • 3/3 failures introduced in this PR

🕵️ 3 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages

See GitHub Actions build pull / win-vs2019-cpu-py3 / test (default, 2, 2, windows.4xlarge) (1/3)

Step: "Test" (full log | diagnosis details)

2022-08-02T15:03:36.5559418Z RuntimeError: test_ops_gradients failed!
2022-08-02T15:03:36.1263455Z =========================== short test summary info ===========================
2022-08-02T15:03:36.1263797Z FAILED test_ops_gradients.py::TestGradientsCPU::test_fn_fwgrad_bwgrad_nn_functional_prelu_cpu_float64
2022-08-02T15:03:36.1264122Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!
2022-08-02T15:03:36.1264398Z !!!!!!!!!!!! xdist.dsession.Interrupted: stopping after 1 failures !!!!!!!!!!!!
2022-08-02T15:03:36.1264715Z = 1 failed, 320 passed, 1000 skipped, 7 xfailed, 49 warnings, 2 rerun in 49.13s =
2022-08-02T15:03:36.5558160Z Traceback (most recent call last):
2022-08-02T15:03:36.5558501Z   File "run_test.py", line 974, in <module>
2022-08-02T15:03:36.5558718Z     main()
2022-08-02T15:03:36.5558937Z   File "run_test.py", line 952, in main
2022-08-02T15:03:36.5559179Z     raise RuntimeError(err_message)
2022-08-02T15:03:36.5559418Z RuntimeError: test_ops_gradients failed!
2022-08-02T15:03:36.7478105Z 
2022-08-02T15:03:36.7478540Z (base) C:\actions-runner\_work\pytorch\pytorch\test>if ERRORLEVEL 1 goto fail 
2022-08-02T15:03:36.7480438Z 
2022-08-02T15:03:36.7480662Z (base) C:\actions-runner\_work\pytorch\pytorch\test>exit /b 1 
2022-08-02T15:03:36.7533965Z ##[error]Process completed with exit code 1.
2022-08-02T15:03:36.7663606Z Prepare all required actions
2022-08-02T15:03:36.7664123Z Getting action download info
2022-08-02T15:03:36.9479400Z Download action repository 'nick-fields/retry@71062288b76e2b6214ebde0e673ce0de1755740a' (SHA:71062288b76e2b6214ebde0e673ce0de1755740a)
2022-08-02T15:03:37.1106696Z ##[group]Run ./.github/actions/get-workflow-job-id
2022-08-02T15:03:37.1106918Z with:

See GitHub Actions build pull / win-vs2019-cpu-py3 / test (default, 1, 2, windows.4xlarge) (2/3)

Step: "Test" (full log | diagnosis details)

2022-08-02T14:55:34.7019160Z AssertionError: Th...functional.prelu on device type cpu are incorrect!
2022-08-02T14:55:34.7008817Z [gw2] [  1%] FAILED test_ops.py::TestCommonCPU::test_dtypes__refs_nn_functional_prelu_cpu 
2022-08-02T14:55:34.7009011Z 
2022-08-02T14:55:34.7009127Z ================================== FAILURES ===================================
2022-08-02T14:55:34.7009429Z ___________ TestCommonCPU.test_dtypes__refs_nn_functional_prelu_cpu ___________
2022-08-02T14:55:34.7009726Z [gw2] win32 -- Python 3.8.13 C:\Jenkins\Miniconda3\python.exe
2022-08-02T14:55:34.7010068Z Traceback (most recent call last):
2022-08-02T14:55:34.7017724Z   File "C:\actions-runner\_work\pytorch\pytorch\test\test_ops.py", line 1230, in test_dtypes
2022-08-02T14:55:34.7018224Z     self.fail(msg)
2022-08-02T14:55:34.7018517Z   File "C:\Jenkins\Miniconda3\lib\unittest\case.py", line 753, in fail
2022-08-02T14:55:34.7018823Z     raise self.failureException(msg)
2022-08-02T14:55:34.7019160Z AssertionError: The supported dtypes for _refs.nn.functional.prelu on device type cpu are incorrect!
2022-08-02T14:55:34.7019578Z The following dtypes did not work in forward but are listed by the OpInfo: {torch.float64, torch.float32, torch.bfloat16}.
2022-08-02T14:55:34.7019804Z 
2022-08-02T14:55:34.7020026Z - generated xml file: C:\actions-runner\_work\pytorch\pytorch\test\test-reports\python-pytest\test_ops\test_ops.xml -
2022-08-02T14:55:34.7020398Z =========================== short test summary info ===========================
2022-08-02T14:55:34.7020703Z FAILED test_ops.py::TestCommonCPU::test_dtypes__refs_nn_functional_prelu_cpu
2022-08-02T14:55:34.7021002Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!
2022-08-02T14:55:34.7021284Z !!!!!!!!!!!! xdist.dsession.Interrupted: stopping after 1 failures !!!!!!!!!!!!
2022-08-02T14:55:34.7021586Z ======= 1 failed, 241 passed, 7 skipped, 42 warnings, 2 rerun in 20.11s =======
2022-08-02T14:55:35.2791964Z Traceback (most recent call last):
2022-08-02T14:55:35.2792329Z   File "run_test.py", line 974, in <module>

See GitHub Actions build pull / win-vs2019-cpu-py3 / test (functorch, 1, 1, windows.4xlarge) (3/3)

Step: "Test" (full log | diagnosis details)

2022-08-02T15:10:15.0545509Z RuntimeError: C:\a...rk\pytorch\pytorch\functorch\test\test_ops failed!
2022-08-02T15:10:14.3301728Z 
2022-08-02T15:10:14.3301865Z FAILED (errors=10, skipped=1620, expected failures=304)
2022-08-02T15:10:14.3302013Z 
2022-08-02T15:10:14.3302099Z Generating XML reports...
2022-08-02T15:10:14.3302468Z Generated XML report: test-reports\python-unittest\functorch\test\test_ops\TEST-TestOperatorsCPU-20220802145642.xml
2022-08-02T15:10:15.0544176Z Traceback (most recent call last):
2022-08-02T15:10:15.0544527Z   File "run_test.py", line 974, in <module>
2022-08-02T15:10:15.0544723Z     main()
2022-08-02T15:10:15.0544932Z   File "run_test.py", line 952, in main
2022-08-02T15:10:15.0545203Z     raise RuntimeError(err_message)
2022-08-02T15:10:15.0545509Z RuntimeError: C:\actions-runner\_work\pytorch\pytorch\functorch\test\test_ops failed!
2022-08-02T15:10:15.3612236Z 
2022-08-02T15:10:15.3613009Z (base) C:\actions-runner\_work\pytorch\pytorch\test>popd
2022-08-02T15:10:15.3617033Z 
2022-08-02T15:10:15.3617252Z (base) C:\actions-runner\_work\pytorch\pytorch>if ERRORLEVEL 1 goto fail 
2022-08-02T15:10:15.3619272Z 
2022-08-02T15:10:15.3619442Z (base) C:\actions-runner\_work\pytorch\pytorch>exit /b 1 
2022-08-02T15:10:15.3679592Z ##[error]Process completed with exit code 1.
2022-08-02T15:10:15.3823661Z Prepare all required actions
2022-08-02T15:10:15.3824261Z Getting action download info
2022-08-02T15:10:15.5337051Z Download action repository 'nick-fields/retry@71062288b76e2b6214ebde0e673ce0de1755740a' (SHA:71062288b76e2b6214ebde0e673ce0de1755740a)

This comment was automatically generated by Dr. CI (expand for details).

@zou3519 zou3519 requested review from ezyang and ngimel July 29, 2022 22:50
@zou3519 zou3519 added this to the 1.12.1 milestone Jul 29, 2022
Collaborator

@lezcano lezcano left a comment

I think we should dispatch both the out and the non-out versions to mm. Otherwise the divergence may bite us in the future.
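
To illustrate the concern, here is a minimal pure-Python sketch in which the functional and the out= variants of a linear layer both route through one shared matrix-multiply helper, so the two code paths cannot drift apart. This is not PyTorch's actual implementation: `mm`, `transpose`, `linear`, and `linear_out` are hypothetical stand-ins operating on nested lists.

```python
from typing import List, Optional

Matrix = List[List[float]]


def mm(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix multiply: (n x k) times (k x m) gives (n x m)."""
    k = len(b)
    assert all(len(row) == k for row in a), "inner dimensions must match"
    m = len(b[0])
    return [[sum(row[i] * b[i][j] for i in range(k)) for j in range(m)] for row in a]


def transpose(w: Matrix) -> Matrix:
    """Swap rows and columns."""
    return [list(col) for col in zip(*w)]


def linear(x: Matrix, weight: Matrix, bias: Optional[List[float]] = None) -> Matrix:
    """Functional variant (hypothetical): y = x @ weight^T + bias."""
    y = mm(x, transpose(weight))
    if bias is not None:
        y = [[v + b for v, b in zip(row, bias)] for row in y]
    return y


def linear_out(x: Matrix, weight: Matrix, bias: Optional[List[float]], out: Matrix) -> Matrix:
    """out= variant (hypothetical): reuse linear(), so both variants share
    the same mm path, then write the result into `out` in place."""
    out[:] = linear(x, weight, bias)
    return out
```

Because `linear_out` delegates to `linear`, any change to the shared `mm` path affects both variants identically, which is the property the comment above asks for.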

@zou3519
Contributor Author

zou3519 commented Aug 1, 2022

cc @JackCaoG for the XLA failure. Since this PR is doing a revert, we probably just need to revert pytorch/xla#3154 on the XLA side

@JackCaoG
Collaborator

JackCaoG commented Aug 1, 2022

@wonjoolee95 Can you help revert pytorch/xla#3154 and land with this pr?

@zou3519
Contributor Author

zou3519 commented Aug 1, 2022

> I think we should dispatch both the out and the non-out versions to mm. Otherwise the divergence may bite us in the future.

SGTM, let me do that in this PR

This is a short-term fix for a serious regression in functorch
(pytorch/functorch#989).

Additional things this PR does:
- the out= tests for nn.functional.linear fail after the revert. I added
some xfails. These xfails were present in the original PR (#75195).
- the profiler tests fail on the revert, so I updated the expecttests
for the profiler tests
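
As a rough illustration of what an xfail does, the sketch below uses the standard library's `unittest.expectedFailure` to mark a test that is known to fail, so the suite still reports success. This is not the OpInfo-based mechanism used in PyTorch's own test suite; `linear_out_stub`, `TestLinearOut`, and `run_suite` are hypothetical names invented for the example.

```python
import unittest


def linear_out_stub(x, weight, out):
    # Hypothetical stand-in for an out= variant that is currently broken.
    raise NotImplementedError("out= variant not supported in this sketch")


class TestLinearOut(unittest.TestCase):
    @unittest.expectedFailure
    def test_out(self):
        # Known failure: recorded as an expected failure, not as an error.
        out = [[0.0]]
        linear_out_stub([[1.0]], [[2.0]], out)
        self.assertEqual(out, [[2.0]])


def run_suite() -> unittest.TestResult:
    # Load and run the single test case, collecting results in memory.
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestLinearOut)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

If the underlying bug is later fixed, the decorated test is reported as an unexpected success, which signals that the xfail can be removed.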

Test Plan:
- test offline that the functorch regression was fixed

[ghstack-poisoned]
@zou3519 zou3519 changed the title from "Partially revert #75195" to "Revert #75195" Aug 1, 2022
@zou3519 zou3519 requested a review from mruberry as a code owner August 1, 2022 18:25
@wonjoo-wj
Collaborator

Hey @zou3519, we opened pytorch/xla#3814 on XLA's end. Can we update the XLA pin to 73c64a55fb096f1e132029d3decbb6f4e532cc7b on this PR?

@zou3519
Contributor Author

zou3519 commented Aug 1, 2022

Yup, let me do that

zou3519 added a commit that referenced this pull request Aug 1, 2022
ghstack-source-id: ff73cb6
Pull Request resolved: #82504
@zou3519
Contributor Author

zou3519 commented Aug 1, 2022

@wonjoolee95 -- what is the right order for merging {this PR, the XLA-side PR}?

@wonjoo-wj
Collaborator

> @wonjoolee95 -- what is the right order for merging {this PR, the XLA-side PR}?

When the CIs are all green on both ends, we can merge this PR first (since XLA is pinned to the fix commit already) and then the XLA PR.

zou3519 added a commit that referenced this pull request Aug 1, 2022
ghstack-source-id: f127f9b
Pull Request resolved: #82504
@zou3519 zou3519 requested a review from a team as a code owner August 1, 2022 20:13
@zou3519
Contributor Author

zou3519 commented Aug 2, 2022

@pytorchbot merge -f "cpp docs timed out"

@pytorchmergebot
Collaborator

@pytorchbot successfully started a merge job. Check the current status here

@pytorchmergebot
Collaborator

Merge failed due to Command git -C /home/runner/work/pytorch/pytorch cherry-pick -x b2bc557f6ecc4f669c46232421ffe69a4fc512f8 returned non-zero exit code 1

Auto-merging .github/ci_commit_pins/xla.txt
CONFLICT (content): Merge conflict in .github/ci_commit_pins/xla.txt
Auto-merging torch/testing/_internal/common_methods_invocations.py
error: could not apply b2bc557f6e... Revert #75195
hint: After resolving the conflicts, mark them with
hint: "git add/rm <pathspec>", then run
hint: "git cherry-pick --continue".
hint: You can instead skip this commit with "git cherry-pick --skip".
hint: To abort and get back to the state before "git cherry-pick",
hint: run "git cherry-pick --abort".

Raised by https://github.com/pytorch/pytorch/actions/runs/2782784605

zou3519 added a commit that referenced this pull request Aug 2, 2022
ghstack-source-id: ea72d0f
Pull Request resolved: #82504
@zou3519
Contributor Author

zou3519 commented Aug 2, 2022

@pytorchbot merge -f "windows failures are on trunk"

@pytorchmergebot
Collaborator

@pytorchbot successfully started a merge job. Check the current status here

@github-actions
Contributor

github-actions bot commented Aug 2, 2022

Hey @zou3519.
You've committed this PR, but it does not have both a 'release notes: ...' and 'topics: ...' label. Please add one of each to the PR. The 'release notes: ...' label should represent the part of PyTorch that this PR changes (fx, autograd, distributed, etc) and the 'topics: ...' label should represent the kind of PR it is (not user facing, new feature, bug fix, perf improvement, etc). The list of valid labels can be found here for the 'release notes: ...' and here for the 'topics: ...'.
For changes that are 'topic: not user facing' there is no need for a release notes label.

zou3519 added a commit that referenced this pull request Aug 2, 2022
Pull Request resolved: #82504
Approved by: https://github.com/ngimel, https://github.com/ezyang, https://github.com/atalman
@zou3519 zou3519 mentioned this pull request Aug 2, 2022
atalman pushed a commit that referenced this pull request Aug 2, 2022
Pull Request resolved: #82504
Approved by: https://github.com/ngimel, https://github.com/ezyang, https://github.com/atalman
facebook-github-bot pushed a commit that referenced this pull request Aug 4, 2022
Summary:
This is a short-term fix for a serious regression in functorch
(pytorch/functorch#989).

Additional things this PR does:
- the out= tests for nn.functional.linear fail after the revert. I added
some xfails. These xfails were present in the original PR (#75195).
- the profiler tests fail on the revert, so I updated the expecttests
for the profiler tests

Pull Request resolved: #82504
Approved by: https://github.com/ngimel, https://github.com/ezyang, https://github.com/atalman

Test Plan:
contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/8f86e361918e3c8a0ee2b569be6c82dfbf32d705

Test plan from GitHub:
- test offline that the functorch regression was fixed

Reviewed By: kit1980

Differential Revision: D38394573

Pulled By: zou3519

fbshipit-source-id: f9185d9cb447fb439d8e402712f2f2617f73b8cc
@facebook-github-bot facebook-github-bot deleted the gh/zou3519/455/head branch August 6, 2022 14:19
9 participants