Fix generate_not_implemented_tests not testing unknown types correctly#56997
Fix generate_not_implemented_tests not testing unknown types correctly#56997asi1024 wants to merge 4 commits intopytorch:masterfrom
generate_not_implemented_tests not testing unknown types correctly#56997Conversation
💊 CI failures summary and remediationsAs of commit 61407c1 (more details on the Dr. CI page):
🕵️ 2 new failures recognized by patternsThe following CI failures do not appear to be due to upstream breakages:
|
| Job | Step | Action |
|---|---|---|
| Run clang-format | 🔁 rerun |
❄️ 1 failure tentatively classified as flaky
but reruns have not yet been triggered to confirm:
pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_test2 (1/1)
Step: "Run tests" (full log | diagnosis details | 🔁 rerun) ❄️
May 06 14:30:17 RuntimeError: Process 0 terminated or timed out after 110.04512786865234 seconds
May 06 14:30:17 ======================================================================
May 06 14:30:17 ERROR [110.068s]: test_nccl_high_priority_stream (__main__.TestDistBackendWithSpawn)
May 06 14:30:17 ----------------------------------------------------------------------
May 06 14:30:17 Traceback (most recent call last):
May 06 14:30:17 File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_distributed.py", line 374, in wrapper
May 06 14:30:17 self._join_processes(fn)
May 06 14:30:17 File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_distributed.py", line 566, in _join_processes
May 06 14:30:17 self._check_return_codes(elapsed_time)
May 06 14:30:17 File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_distributed.py", line 614, in _check_return_codes
May 06 14:30:17 raise RuntimeError('Process {} terminated or timed out after {} seconds'.format(i, elapsed_time))
May 06 14:30:17 RuntimeError: Process 0 terminated or timed out after 110.04512786865234 seconds
May 06 14:30:17
May 06 14:30:17 ----------------------------------------------------------------------
May 06 14:30:17 Ran 196 tests in 657.476s
May 06 14:30:17
May 06 14:30:17 FAILED (errors=4, skipped=117)
May 06 14:30:17
May 06 14:30:17 Generating XML reports...
May 06 14:30:17 Generated XML report: test-reports/dist-nccl/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20210506141919.xml
May 06 14:30:17 Traceback (most recent call last):
May 06 14:30:17 File "test/run_test.py", line 1156, in <module>
This comment was automatically generated by Dr. CI (expand for details).
Follow this link to opt-out of these comments for your Pull Requests.Please report bugs/suggestions to the (internal) Dr. CI Users group.
mruberry
left a comment
There was a problem hiding this comment.
Hey @asi1024! Thanks for noticing this issue; this is often a tricky Python nuance.
It looks like some of the tests are failing, probably because this test wasn't running for so long and it regressed. Do you the test can be updated? Alternatively, is there something else these failures are telling us? That is, is there something outside this test we should be fixing?
|
cc @pmeier, who's investigating updates to the binary pwise tests |
14b6d5e to
1d209db
Compare
|
@mruberry has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator. |
|
With apologies, @asi1024, I believe I spoke too soon about the test failures being unrelated. I suspect the following test failure is related to this change: These tests work by generating text files and comparing the new text file to an old "expect" file. So when one of these tests breaks it doesn't mean something is wrong, necessarily, simple that something has changed. For simplicity, however, would you revert the changes to _tensor.py and python_variable_methods.cpp, then skip testing the ops which will cause the test to fail? Then this PR will fix the issue it was originally trying to solve. Additionally, however, would you please file an issue for the other changes this PR was making so we don't forget about them? |
This reverts commit 1d209db.
|
@mruberry I reverted the commit and will open an issue for this change! |
|
@mruberry has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator. |
…tly (pytorch#56997) Summary: Currently, the test code is not testing unknown types correctly because `op` is overwritten in the for-loop (i.e., currently only `__ior__` is tested). This PR fixes the test `generate_not_implemented_tests` to bind operator name to each method, and remove operators currently unsupported (`__rand__`, …). cc/ mruberry This fix is be needed to add tests for the operator we are going to introduce (e.g., `__rand__`) Pull Request resolved: pytorch#56997 Reviewed By: astaff Differential Revision: D28118465 Pulled By: mruberry fbshipit-source-id: c5a466a7604262ed5490862300d47043aff63d0b
…tly (pytorch#56997) Summary: Currently, the test code is not testing unknown types correctly because `op` is overwritten in the for-loop (i.e., currently only `__ior__` is tested). This PR fixes the test `generate_not_implemented_tests` to bind operator name to each method, and remove operators currently unsupported (`__rand__`, …). cc/ mruberry This fix is be needed to add tests for the operator we are going to introduce (e.g., `__rand__`) Pull Request resolved: pytorch#56997 Reviewed By: astaff Differential Revision: D28118465 Pulled By: mruberry fbshipit-source-id: c5a466a7604262ed5490862300d47043aff63d0b
Currently, the test code is not testing unknown types correctly because
opis overwritten in the for-loop (i.e., currently only__ior__is tested).This PR fixes the test
generate_not_implemented_teststo bind operator name to each method, and remove operators currently unsupported (__rand__, …).cc/ @mruberry This fix is be needed to add tests for the operator we are going to introduce (e.g.,
__rand__)