[JIT] nvfuser CI fixes #75116

Closed
davidberard98 wants to merge 3 commits into gh/davidberard98/81/base from gh/davidberard98/81/head

Conversation

@davidberard98
Contributor

@davidberard98 davidberard98 commented Apr 1, 2022

Stack from ghstack:

* test_native_batch_norm_backward
* test_reduction_empty_axes
* test_register_fuser
* test_category_rule

[ghstack-poisoned]
@facebook-github-bot
Contributor

facebook-github-bot commented Apr 1, 2022

💊 CI failures summary and remediations

As of commit 98f2f4b (more details on the Dr. CI page):


  • 1/1 failures introduced in this PR

🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages

See GitHub Actions build pull / linux-bionic-rocm5.0-py3.7 / test (default, 1, 2, linux.rocm.gpu) (1/1)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

2022-04-04T18:51:32.9546083Z FAIL [5.376s]: test_fft_plan_repeatable_cuda (__main__.TestFFTCUDA)
2022-04-04T18:51:32.5945701Z   test_reference_nd_fft_rfftn_cuda_float64 (__main__.TestFFTCUDA) ... skip: test doesn't currently work on the ROCm stack (0.003s)
2022-04-04T18:51:32.7157732Z   test_stft_cuda_float64 (__main__.TestFFTCUDA) ... /opt/conda/lib/python3.7/site-packages/torch/functional.py:727: UserWarning: stft will soon require the return_complex parameter be given for real inputs, and will further require that return_complex=True in a future PyTorch release. (Triggered internally at  /var/lib/jenkins/workspace/aten/src/ATen/native/SpectralOps.cpp:795.)
2022-04-04T18:51:32.7159465Z   normalized, onesided, return_complex)
2022-04-04T18:51:32.7331806Z ok (0.139s)
2022-04-04T18:51:32.7363087Z   test_stft_requires_complex_cuda (__main__.TestFFTCUDA) ... ok (0.003s)
2022-04-04T18:51:32.8261152Z   test_stft_roundtrip_complex_window_cuda_complex128 (__main__.TestFFTCUDA) ... ok (0.090s)
2022-04-04T18:51:32.9364390Z   test_stft_roundtrip_complex_window_cuda_float64 (__main__.TestFFTCUDA) ... ok (0.110s)
2022-04-04T18:51:32.9538360Z   test_stft_window_device_cuda (__main__.TestFFTCUDA) ... ok (0.017s)
2022-04-04T18:51:32.9544896Z 
2022-04-04T18:51:32.9545216Z ======================================================================
2022-04-04T18:51:32.9546083Z FAIL [5.376s]: test_fft_plan_repeatable_cuda (__main__.TestFFTCUDA)
2022-04-04T18:51:32.9547607Z ----------------------------------------------------------------------
2022-04-04T18:51:32.9548521Z Traceback (most recent call last):
2022-04-04T18:51:32.9549964Z   File "/opt/conda/lib/python3.7/site-packages/torch/testing/_internal/common_utils.py", line 1780, in wrapper
2022-04-04T18:51:32.9551010Z     method(*args, **kwargs)
2022-04-04T18:51:32.9552519Z   File "/opt/conda/lib/python3.7/site-packages/torch/testing/_internal/common_device_type.py", line 376, in instantiated_test
2022-04-04T18:51:32.9553641Z     result = test(self, **param_kwargs)
2022-04-04T18:51:32.9555074Z   File "/opt/conda/lib/python3.7/site-packages/torch/testing/_internal/common_device_type.py", line 973, in only_fn
2022-04-04T18:51:32.9556123Z     return fn(self, *args, **kwargs)
2022-04-04T18:51:32.9557765Z   File "/opt/conda/lib/python3.7/site-packages/torch/testing/_internal/common_device_type.py", line 815, in dep_fn
2022-04-04T18:51:32.9558804Z     return fn(slf, *args, **kwargs)

This comment was automatically generated by Dr. CI (expand for details).

Please report bugs/suggestions to the (internal) Dr. CI Users group.

davidberard98 added a commit that referenced this pull request Apr 1, 2022
* test_native_batch_norm_backward
* test_reduction_empty_axes
* test_register_fuser
* test_category_rule

ghstack-source-id: 051ce67
Pull Request resolved: #75116
@davidberard98 davidberard98 linked an issue Apr 1, 2022 that may be closed by this pull request
9 tasks
davidberard98 added a commit that referenced this pull request Apr 2, 2022
* test_native_batch_norm_backward
* test_reduction_empty_axes
* test_register_fuser
* test_category_rule

ghstack-source-id: 8f8d6ac
Pull Request resolved: #75116
davidberard98 added a commit that referenced this pull request Apr 4, 2022
* test_native_batch_norm_backward
* test_reduction_empty_axes
* test_register_fuser
* test_category_rule

ghstack-source-id: 85fabc6
Pull Request resolved: #75116
Collaborator

@jjsjann123 jjsjann123 left a comment

LGTM, unfortunately the reduction issues are not repro'ing on my local Pascal card, maybe I got lucky with the heuristics....

torch._C._jit_nvfuser_clear_comparison_callback()

class TestPassManagerCudaFuser(JitTestCase):
    def setUp(self):
Collaborator

Looks like some other test leaked the flag and accidentally left nvfuser enabled. Or is it a thread race, where other tests running in parallel are flipping the nvfuser switch?

Contributor Author

another test leaked the flag (my fault, fixed it on line 159)
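
A minimal sketch of the kind of leak discussed in this thread, and one way to guard against it: record the global switch in `setUp` and restore it in `tearDown`. The `_fuser_enabled` flag and the `set_fuser_enabled` / `fuser_enabled` helpers are hypothetical stand-ins for the real nvfuser switch, not PyTorch APIs, and this is not necessarily how the PR itself fixed the leak.

```python
import unittest

# Hypothetical process-wide switch standing in for the nvfuser enable flag.
_fuser_enabled = False


def set_fuser_enabled(value: bool) -> bool:
    """Set the global flag and return its previous value."""
    global _fuser_enabled
    previous = _fuser_enabled
    _fuser_enabled = value
    return previous


def fuser_enabled() -> bool:
    """Report the current value of the global flag."""
    return _fuser_enabled


class FlagRestoringTestCase(unittest.TestCase):
    def setUp(self):
        # Enable the flag for this test and remember what it was before.
        self._old_flag = set_fuser_enabled(True)

    def tearDown(self):
        # Restore the earlier value so later tests do not inherit our setting.
        set_fuser_enabled(self._old_flag)

    def test_flag_is_on_inside_test(self):
        self.assertTrue(fuser_enabled())


def run_suite() -> unittest.TestResult:
    """Run the test case above and return the result object."""
    loader = unittest.defaultTestLoader
    suite = loader.loadTestsFromTestCase(FlagRestoringTestCase)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

If `tearDown` skipped the restore, the flag would stay on after the suite finished, and any test that assumes the default would see the leaked value.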

Collaborator

@jjsjann123 jjsjann123 left a comment

forgot to stamp

@davidberard98
Contributor Author

> LGTM, unfortunately the reduction issues are not repro'ing on my local Pascal card, maybe I got lucky with the heuristics....

@jjsjann123 I was assuming it was safe to skip since the other reduction tests are marked with the same, lmk if that's not accurate

@jjsjann123
Collaborator

> > LGTM, unfortunately the reduction issues are not repro'ing on my local Pascal card, maybe I got lucky with the heuristics....
>
> @jjsjann123 I was assuming it was safe to skip since the other reduction tests are marked with the same, lmk if that's not accurate

No, it looks fine to me as well. I need to double-check what went wrong with the repro on my local machine... This PR looks good to merge.
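
For readers unfamiliar with the pattern, skipping a test on older GPUs can be sketched with `unittest.skipIf`. The `DEVICE_CAPABILITY` constant and the `is_pre_volta` helper here are assumptions written for illustration; in a real suite the capability tuple would come from the device under test (for example via `torch.cuda.get_device_capability()`), and the actual decorators in the nvfuser tests may differ.

```python
import unittest

# Hypothetical: hard-coded so the sketch runs without a GPU.
# (6, 1) is a Pascal-class compute capability.
DEVICE_CAPABILITY = (6, 1)


def is_pre_volta(capability=DEVICE_CAPABILITY) -> bool:
    """Volta is compute capability 7.0; a lower major version is older."""
    return capability[0] < 7


class ReductionTests(unittest.TestCase):
    @unittest.skipIf(is_pre_volta(), "reduction not supported on pre-Volta devices")
    def test_reduction(self):
        # Stand-in body; a real test would exercise a fused reduction.
        self.assertEqual(sum([1, 2, 3]), 6)

    def test_pointwise(self):
        # Not marked, so it runs on every device.
        self.assertEqual(1 + 1, 2)


def run_suite() -> unittest.TestResult:
    """Run both tests and return the result object."""
    loader = unittest.defaultTestLoader
    suite = loader.loadTestsFromTestCase(ReductionTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

With the capability set to a Pascal value, `test_reduction` is reported as skipped rather than failed, while `test_pointwise` still runs.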

Contributor

@eellison eellison left a comment

LGTM

@davidberard98
Contributor Author

@pytorchmergebot merge this please

@github-actions
Contributor

github-actions Bot commented Apr 4, 2022

Hey @davidberard98.
You've committed this PR, but it does not have both a 'release notes: ...' and 'topics: ...' label. Please add one of each to the PR. The 'release notes: ...' label should represent the part of PyTorch that this PR changes (fx, autograd, distributed, etc) and the 'topics: ...' label should represent the kind of PR it is (not user facing, new feature, bug fix, perf improvement, etc). The list of valid labels can be found here for the 'release notes: ...' and here for the 'topics: ...'.
For changes that are 'topic: not user facing' there is no need for a release notes label.

@davidberard98 davidberard98 added the topic: not user facing topic category label Apr 4, 2022
facebook-github-bot pushed a commit that referenced this pull request Apr 5, 2022
Summary:
* test_native_batch_norm_backward
* test_reduction_empty_axes
* test_register_fuser
* test_category_rule

Pull Request resolved: #75116

Approved by: https://github.com/jjsjann123, https://github.com/eellison

Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/83400e836ebbb43d0b5b3c8b881288ed50bf4803

Reviewed By: b0noI

Differential Revision: D35404315

Pulled By: davidberard98

fbshipit-source-id: cd467428a4940f58af78705640443ad7d280a22e
@jjsjann123
Collaborator

> LGTM, unfortunately the reduction issues are not repro'ing on my local Pascal card, maybe I got lucky with the heuristics....

Just for the record, I've been really dumb and somehow messed up the device number on my machine. Unsurprisingly, I've been running all my tests on a Volta card instead of the Pascal card.... No wonder it does not repro.... 😮‍💨
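
One way to catch this kind of mix-up is to print the architecture of the device a test run will actually use. The helper below is an illustrative sketch, not part of PyTorch: `cuda_arch_name` is a hypothetical name, and it only maps a compute capability tuple to an architecture family.

```python
def cuda_arch_name(major: int, minor: int) -> str:
    """Map a CUDA compute capability to its NVIDIA architecture family."""
    # Two families share a major version with another, so check them first.
    if (major, minor) == (7, 5):
        return "Turing"
    if (major, minor) == (8, 9):
        return "Ada Lovelace"
    names = {
        3: "Kepler",
        5: "Maxwell",
        6: "Pascal",
        7: "Volta",
        8: "Ampere",
        9: "Hopper",
    }
    return names.get(major, "unknown")


# With PyTorch and a GPU available, the tuple could come from the runtime:
#   import torch
#   for i in range(torch.cuda.device_count()):
#       print(i, cuda_arch_name(*torch.cuda.get_device_capability(i)))
```

Device indices depend on settings such as `CUDA_VISIBLE_DEVICES`, so checking the reported architecture is more reliable than assuming which card index 0 refers to.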

@facebook-github-bot facebook-github-bot deleted the gh/davidberard98/81/head branch April 8, 2022 14:17
laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request Apr 25, 2026
* test_native_batch_norm_backward
* test_reduction_empty_axes
* test_register_fuser
* test_category_rule

Pull Request resolved: pytorch#75116

Approved by: https://github.com/jjsjann123, https://github.com/eellison

Development

Successfully merging this pull request may close these issues.

2022-03-31 disabled nvfuser tests

4 participants