
[ONNX] Fix for constant folding flaky tests #32546

Closed
neginraoof wants to merge 3 commits into pytorch:master from neginraoof:neraoof/fixConstFoldingFlaky

Contributor

@neginraoof neginraoof commented Jan 23, 2020

Fix for flaky constant folding tests.
It looks like the constant folding test modules are sometimes exported with the ONNX_ATEN operator export type, which is causing the CI failures.
I'm unable to reproduce this issue locally, but my guess is that the operator export type parameter is being overwritten at some point during the CI build.
This PR sets the operator export type in the tests and hopefully fixes the issue.
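
The hypothesis above (a setting silently changed elsewhere in the same CI process) can be illustrated with a small, self-contained sketch. Everything here is hypothetical: `ExportType`, `set_default_export_type`, and `export_graph` are toy stand-ins invented for illustration, not PyTorch APIs, and the sketch does not claim to show the actual cause of the flakiness. It only shows why passing the export type in each test call is more robust than relying on a shared default.

```python
from enum import Enum


class ExportType(Enum):
    """Hypothetical stand-in for an operator export type setting."""
    ONNX = "ONNX"
    ONNX_ATEN = "ONNX_ATEN"


# Hypothetical process-wide default. Any code running in the same process
# (for example, another test) can change it.
_default_export_type = ExportType.ONNX


def set_default_export_type(export_type):
    """Overwrite the shared default (simulates interference from other code)."""
    global _default_export_type
    _default_export_type = export_type


def export_graph(ops, export_type=None):
    """Toy exporter: prefixes op names according to the chosen export type."""
    chosen = export_type if export_type is not None else _default_export_type
    prefix = "aten::" if chosen is ExportType.ONNX_ATEN else "onnx::"
    return [prefix + op for op in ops]


def fragile_test():
    # Relies on the shared default, so the result depends on what ran before.
    return export_graph(["Add"]) == ["onnx::Add"]


def robust_test():
    # Passes the export type explicitly, so the shared default is irrelevant.
    return export_graph(["Add"], export_type=ExportType.ONNX) == ["onnx::Add"]
```

In this toy setup both checks pass on a fresh process. After a call such as `set_default_export_type(ExportType.ONNX_ATEN)`, `fragile_test()` returns `False` while `robust_test()` still returns `True`, which is the kind of order-dependent behavior that tends to show up only on CI.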

Member

kostmo commented Jan 23, 2020

💊 CircleCI build failures summary and remediations

As of commit 3627aad:

  • 1/3 broken upstream at merge base 957a07f since Jan 27

    Please rebase on the viable/strict branch.

    If your commit is newer than viable/strict, you can try basing on an older, stable commit:

    git fetch origin viable/strict
    git rebase --onto viable/strict $(git merge-base origin/master HEAD)
    

    If your commit is older than viable/strict:

    git fetch origin viable/strict
    git rebase viable/strict
    

    Check out the recency history of this "viable master" tracking branch.

  • 1/3 failures introduced in this PR

  • 1/3 recognized as flaky ❄️


Detailed failure analysis

One may explore the probable reasons each build failed interactively on the Dr. CI website.

🕵️ 1 new failure recognized by patterns

The following build failures do not appear to be due to upstream breakage:

See CircleCI build pytorch_linux_xenial_py3_6_gcc5_4_test (1/1)

Step: "Test"

Jan 27 17:06:27 RuntimeError: test_quantization failed!
Jan 27 17:06:27 Ran 36 tests in 50.351s 
Jan 27 17:06:27  
Jan 27 17:06:27 FAILED (errors=1, skipped=1) 
Jan 27 17:06:27  
Jan 27 17:06:27 Generating XML reports... 
Jan 27 17:06:27 Traceback (most recent call last): 
Jan 27 17:06:27   File "test/run_test.py", line 456, in <module> 
Jan 27 17:06:27     main() 
Jan 27 17:06:27   File "test/run_test.py", line 449, in main 
Jan 27 17:06:27     raise RuntimeError(message) 
Jan 27 17:06:27 RuntimeError: test_quantization failed! 
Jan 27 17:06:27 + cleanup 
Jan 27 17:06:27 + retcode=1 
Jan 27 17:06:27 + set +x 
Jan 27 17:06:27 =================== sccache compilation log =================== 
Jan 27 17:06:27 =========== If your build fails, please take a look at the log above for possible reasons =========== 
Jan 27 17:06:27 Compile requests                  7 
Jan 27 17:06:27 Compile requests executed         6 
Jan 27 17:06:27 Cache hits                        0 
Jan 27 17:06:27 Cache misses                      6 
Jan 27 17:06:27 Cache timeouts                    0 

❄️ 1 failure recognized as flaky

The following build failures have been detected as flaky and may not be your fault:

See CircleCI build pytorch_linux_xenial_cuda10_1_cudnn7_py3_gcc7_test (1/1)

Step: "Test" ❄️

Jan 27 17:53:28 AssertionError: 10 not less than or equal to 1e-05 :
Jan 27 17:53:28 ---------------------------------------------------------------------- 
Jan 27 17:53:28 Traceback (most recent call last): 
Jan 27 17:53:28   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_distributed.py", line 174, in wrapper 
Jan 27 17:53:28     self._join_processes(fn) 
Jan 27 17:53:28   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_distributed.py", line 255, in _join_processes 
Jan 27 17:53:28     self._check_return_codes(elapsed_time) 
Jan 27 17:53:28   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_distributed.py", line 275, in _check_return_codes 
Jan 27 17:53:28     self.assertEqual(p.exitcode, first_process.exitcode) 
Jan 27 17:53:28   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_utils.py", line 892, in assertEqual 
Jan 27 17:53:28     super(TestCase, self).assertLessEqual(abs(x - y), prec, message) 
Jan 27 17:53:28 AssertionError: 10 not less than or equal to 1e-05 :  
Jan 27 17:53:28  
Jan 27 17:53:28 ---------------------------------------------------------------------- 
Jan 27 17:53:28 Ran 117 tests in 150.248s 
Jan 27 17:53:28  
Jan 27 17:53:28 FAILED (failures=1, skipped=7) 
Jan 27 17:53:28  
Jan 27 17:53:28 Generating XML reports... 
Jan 27 17:53:29 Traceback (most recent call last): 
Jan 27 17:53:29   File "test/run_test.py", line 456, in <module> 
Jan 27 17:53:29     main() 

🚧 1 upstream failure recognized by patterns:

These builds matched patterns, but were probably caused by upstream breakages:


This comment was automatically generated by Dr. CI. Follow this link to opt out of these comments for your Pull Requests.

Please report bugs/suggestions on the GitHub issue tracker.

This comment has been revised 6 times.

Collaborator

@BowenBao BowenBao left a comment

LGTM, thanks!

Contributor

@facebook-github-bot facebook-github-bot left a comment

@houseroad has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@neginraoof
Contributor Author

@houseroad Can we merge this? Thanks!

Member

@houseroad houseroad left a comment

Hopefully this will work.

@facebook-github-bot
Contributor

@houseroad merged this pull request in 5c019fe.

BowenBao pushed a commit to BowenBao/pytorch that referenced this pull request Feb 12, 2020
Summary:
Fix for constant folding flaky tests
Looks like the constant folding test modules are sometimes exported with ONNX_ATEN op export type, which is causing the CI failures.
I'm unable to repro this issue locally, but my guess is that the op export param is being overwritten on CI build at some point.
This PR sets the op export type and hopefully fixes the issue.
Pull Request resolved: pytorch#32546

Reviewed By: hl475

Differential Revision: D19606919

Pulled By: houseroad

fbshipit-source-id: 31793d6857bbbf99b43b4a7c22a045a56ae19e44
ttumiel pushed a commit to ttumiel/pytorch that referenced this pull request Mar 4, 2020