
Refactor test_jit_fuser legacy tests. #34983

Closed

ZolotukhinM wants to merge 4 commits into gh/ZolotukhinM/196/base from gh/ZolotukhinM/196/head

Conversation


ZolotukhinM commented Mar 18, 2020

Stack from ghstack:

Previously we relied on a global setting to specify which executor to
use. However, it didn't work as expected, which can be demonstrated
by the difference between the following two test runs:

```
 # Run in the standard order
 > pytest test/test_jit_fuser.py test/test_jit_fuser_legacy.py

test/test_jit_fuser.py ..................ss..sss.s.....s.............
test/test_jit_fuser_legacy.py ..................ss..sss.sF....s.............

 # Run in the reversed order
 > pytest test/test_jit_fuser_legacy.py test/test_jit_fuser.py

test/test_jit_fuser_legacy.py ............F.....s....ss.......s...FF........
test/test_jit_fuser.py ............F.....s....ss..F....s...FF........
```

This PR makes the tests actually run with the desired executor and
disables the tests that turn out to be failing.
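
For illustration, here is a minimal, hypothetical sketch (the module and variable names are invented, not the actual test code) of why an import-time global makes results depend on file order:

```
# jit_test_config.py (shared module, name invented for illustration)
graph_executor_mode = "profiling"

# test_jit_fuser_legacy.py (old style): flip the shared global at
# import time, as a module-level statement.
import jit_test_config
jit_test_config.graph_executor_mode = "legacy"

# pytest imports every collected module before running any test, so
# the last import "wins" and every test in the session reads the same
# value, regardless of which file it lives in. That is the source of
# the order-dependent failures in the runs above.
```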

With this change:

```
 # Run in the standard order
 > pytest test/test_jit_fuser.py test/test_jit_fuser_legacy.py

test/test_jit_fuser.py ..................ss..sss.s.....s.............
test/test_jit_fuser_legacy.py ..................s...ssssss....s...ss........

 # Run in the reversed order
 > pytest test/test_jit_fuser_legacy.py test/test_jit_fuser.py

test/test_jit_fuser_legacy.py ..................s...ssssss....s...ss........
test/test_jit_fuser.py ..................ss..sss.s.....s.............
```


Don't use argv/globals to specify which graph executor to use;
instead, set the desired mode in the test fixture.
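
A minimal sketch of the fixture approach. The torch._C._jit_set_profiling_executor and torch._C._jit_set_profiling_mode calls match the internal toggles PyTorch exposed around this time, but treat the exact code as an assumption, not the PR's literal implementation:

```
import unittest
import torch

class TestFuserLegacy(unittest.TestCase):
    def setUp(self):
        # Force the legacy executor right before each test. The
        # setters return the previous value, so it can be restored.
        self.prev_exec = torch._C._jit_set_profiling_executor(False)
        self.prev_prof = torch._C._jit_set_profiling_mode(False)

    def tearDown(self):
        # Restore whatever mode the rest of the suite was using, so
        # this file no longer leaks executor state into other files.
        torch._C._jit_set_profiling_executor(self.prev_exec)
        torch._C._jit_set_profiling_mode(self.prev_prof)
```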

Differential Revision: [D20520367](https://our.internmc.facebook.com/intern/diff/D20520367)

ZolotukhinM changed the title from "[WIP] Refactor test_jit_fuser tests." to "Refactor test_jit_fuser legacy tests." on Mar 18, 2020
ZolotukhinM pushed a commit that referenced this pull request Mar 18, 2020
ghstack-source-id: b6d8f5e
Pull Request resolved: #34983
Comment thread on test/test_jit_fuser.py:

```
@unittest.skipIf(not RUN_CUDA_HALF, "no half support")
@unittest.skipIf(graph_executor_mode() != ProfilingMode.LEGACY, "no half support with profiling on")
def test_cuda_half(self):
    if graph_executor_mode() == ProfilingMode.PROFILING:
```

Contributor:
This might actually be a bit confusing, since folks (including me) are used to looking for skipIf decorators. What's the reason for switching?

Author:

The reason is that skipIf decorators are executed early and thus don't do what we expect in this case: the graph_executor_mode() == ... checks might not be executed immediately before the test is run, and thus can result in incorrect behaviour.
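
Roughly, the distinction is the following (graph_executor_mode and ProfilingMode are the helpers used in the diff; the test name and body here are invented for illustration):

```
# skipIf's condition is evaluated while the class body executes, i.e.
# at import time, before any fixture has set the executor mode:
@unittest.skipIf(graph_executor_mode() != ProfilingMode.LEGACY, "...")
def test_foo(self):
    ...

# A check inside the test body runs right before the test itself,
# after setUp has configured the executor, so it sees the real mode:
def test_foo(self):
    if graph_executor_mode() != ProfilingMode.LEGACY:
        self.skipTest("...")
    ...
```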

Comment thread on test/test_jit_fuser.py:

```
if graph_executor_mode() == ProfilingMode.PROFILING:
    self.skipTest("no half support with profiling on")
if graph_executor_mode() == ProfilingMode.LEGACY:
    self.skipTest("broken with legacy executor too")
```

Contributor:
eerr.. shouldn't this test work with the legacy executor?

Author:

It should, but apparently it was never tested and was broken :)

Comment thread on test/test_jit_fuser.py:

```
@unittest.skipIf(not RUN_CUDA, "fuser requires CUDA")
def test_rand_broadcast_cuda(self):
    if graph_executor_mode() != ProfilingMode.PROFILING:
        self.skipTest("Passes only with profiling executor")
```

Contributor:
why did it stop passing with the legacy executor?

Author:
I have no idea, but we never actually tested it for the reasons I mentioned previously.

Comment thread on test/test_jit_fuser.py:

```
@unittest.skipIf(not RUN_CUDA_MULTI_GPU, "needs non-zero device")
@enable_cpu_fuser
def test_fusion_reuse_multi_gpu(self):
    if graph_executor_mode() == ProfilingMode.LEGACY:
```

Contributor:
Most tests should work with the legacy executor; I'm not sure I understand why there are failures now.


dr-ci Bot commented Mar 19, 2020

💊 CircleCI build failures summary and remediations

As of commit 1587c86 (more details on the Dr. CI page):


  • 6/7 failures introduced in this PR

  • 1/7 broken upstream at merge base a3de359 on Mar 18 from 7:01am to 8:48am (2 commits; b712905 - a1eaaea)

Please rebase on the viable/strict branch.

Since your merge base is older than viable/strict, run these commands:

```
git fetch https://github.com/pytorch/pytorch viable/strict
git rebase FETCH_HEAD
```

Check out the recency history of this "viable master" tracking branch.


🕵️ 6 new failures recognized by patterns

The following build failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_macos_10_13_py3_test (1/6)

Step: "Test" (full log | pattern match details)

Mar 18 16:19:08 RuntimeError: test_jit failed!
Mar 18 16:19:07 Ran 2434 tests in 135.218s 
Mar 18 16:19:07  
Mar 18 16:19:07 FAILED (failures=1, errors=2, skipped=72, expected failures=1) 
Mar 18 16:19:07  
Mar 18 16:19:07 Generating XML reports... 
Mar 18 16:19:08 Traceback (most recent call last): 
Mar 18 16:19:08   File "test/run_test.py", line 674, in <module> 
Mar 18 16:19:08     main() 
Mar 18 16:19:08   File "test/run_test.py", line 667, in main 
Mar 18 16:19:08     raise RuntimeError(message) 
Mar 18 16:19:08 RuntimeError: test_jit failed! 
Mar 18 16:19:09 + cleanup 
Mar 18 16:19:09 + retcode=1 
Mar 18 16:19:09 + set +x 

See CircleCI build pytorch_linux_xenial_py3_6_gcc5_4_ge_config_simple_test (2/6)

Step: "Test" (full log | pattern match details)

Mar 18 23:43:54 RuntimeError: test_jit_simple failed!
Mar 18 23:43:54 Ran 2437 tests in 84.605s 
Mar 18 23:43:54  
Mar 18 23:43:54 FAILED (failures=1, errors=2, skipped=86, expected failures=1) 
Mar 18 23:43:54  
Mar 18 23:43:54 Generating XML reports... 
Mar 18 23:43:54 Traceback (most recent call last): 
Mar 18 23:43:54   File "test/run_test.py", line 674, in <module> 
Mar 18 23:43:54     main() 
Mar 18 23:43:54   File "test/run_test.py", line 667, in main 
Mar 18 23:43:54     raise RuntimeError(message) 
Mar 18 23:43:54 RuntimeError: test_jit_simple failed! 
Mar 18 23:43:54 + cleanup 
Mar 18 23:43:54 + retcode=1 
Mar 18 23:43:54 + set +x 
Mar 18 23:43:54 =================== sccache compilation log =================== 
Mar 18 23:43:54 =========== If your build fails, please take a look at the log above for possible reasons =========== 
Mar 18 23:43:54 Compile requests                 2 
Mar 18 23:43:54 Compile requests executed        0 
Mar 18 23:43:54 Cache hits                       0 
Mar 18 23:43:54 Cache misses                     0 
Mar 18 23:43:54 Cache timeouts                   0 

See CircleCI build pytorch_linux_xenial_py3_6_gcc5_4_ge_config_legacy_test (3/6)

Step: "Test" (full log | pattern match details)

Mar 18 23:45:49 RuntimeError: test_jit_legacy failed!
Mar 18 23:45:49 Ran 2437 tests in 174.448s 
Mar 18 23:45:49  
Mar 18 23:45:49 FAILED (failures=1, errors=2, skipped=70, expected failures=1) 
Mar 18 23:45:49  
Mar 18 23:45:49 Generating XML reports... 
Mar 18 23:45:49 Traceback (most recent call last): 
Mar 18 23:45:49   File "test/run_test.py", line 674, in <module> 
Mar 18 23:45:49     main() 
Mar 18 23:45:49   File "test/run_test.py", line 667, in main 
Mar 18 23:45:49     raise RuntimeError(message) 
Mar 18 23:45:49 RuntimeError: test_jit_legacy failed! 
Mar 18 23:45:49 + cleanup 
Mar 18 23:45:49 + retcode=1 
Mar 18 23:45:49 + set +x 
Mar 18 23:45:49 =================== sccache compilation log =================== 
Mar 18 23:45:49 =========== If your build fails, please take a look at the log above for possible reasons =========== 
Mar 18 23:45:49 Compile requests               250 
Mar 18 23:45:49 Compile requests executed        0 
Mar 18 23:45:49 Cache hits                       0 
Mar 18 23:45:49 Cache misses                     0 
Mar 18 23:45:49 Cache timeouts                   0 

See CircleCI build pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_test (4/6)

Step: "Test" (full log | pattern match details)

Mar 19 00:24:36 RuntimeError: test_jit failed!
Mar 19 00:24:36 Ran 2437 tests in 113.831s 
Mar 19 00:24:36  
Mar 19 00:24:36 FAILED (failures=1, errors=2, skipped=34, expected failures=1) 
Mar 19 00:24:36  
Mar 19 00:24:36 Generating XML reports... 
Mar 19 00:24:36 Traceback (most recent call last): 
Mar 19 00:24:36   File "test/run_test.py", line 674, in <module> 
Mar 19 00:24:36     main() 
Mar 19 00:24:36   File "test/run_test.py", line 667, in main 
Mar 19 00:24:36     raise RuntimeError(message) 
Mar 19 00:24:36 RuntimeError: test_jit failed! 
Mar 19 00:24:37 + cleanup 
Mar 19 00:24:37 + retcode=1 
Mar 19 00:24:37 + set +x 
Mar 19 00:24:37 =================== sccache compilation log =================== 
Mar 19 00:24:37 =========== If your build fails, please take a look at the log above for possible reasons =========== 
Mar 19 00:24:37 Compile requests               133 
Mar 19 00:24:37 Compile requests executed       46 
Mar 19 00:24:37 Cache hits                      44 
Mar 19 00:24:37 Cache misses                     1 
Mar 19 00:24:37 Cache timeouts                   0 

See CircleCI build pytorch_linux_xenial_py3_6_gcc5_4_test (5/6)

Step: "Test" (full log | pattern match details)

Mar 19 00:29:36 RuntimeError: test_jit failed!
Mar 19 00:29:36 Ran 2437 tests in 118.855s 
Mar 19 00:29:36  
Mar 19 00:29:36 FAILED (failures=1, errors=2, skipped=67, expected failures=1) 
Mar 19 00:29:36  
Mar 19 00:29:36 Generating XML reports... 
Mar 19 00:29:36 Traceback (most recent call last): 
Mar 19 00:29:36   File "test/run_test.py", line 674, in <module> 
Mar 19 00:29:36     main() 
Mar 19 00:29:36   File "test/run_test.py", line 667, in main 
Mar 19 00:29:36     raise RuntimeError(message) 
Mar 19 00:29:36 RuntimeError: test_jit failed! 
Mar 19 00:29:36 =================== sccache compilation log =================== 
Mar 19 00:29:36 + cleanup 
Mar 19 00:29:36 + retcode=1 
Mar 19 00:29:36 + set +x 
Mar 19 00:29:36 =========== If your build fails, please take a look at the log above for possible reasons =========== 
Mar 19 00:29:36 Compile requests                 48 
Mar 19 00:29:36 Compile requests executed        22 
Mar 19 00:29:36 Cache hits                       10 
Mar 19 00:29:36 Cache misses                     11 
Mar 19 00:29:36 Cache timeouts                    0 

See CircleCI build pytorch_linux_xenial_py3_clang5_asan_test (6/6)

Step: "Test" (full log | pattern match details)

Mar 19 02:18:55 RuntimeError: test_jit failed!
Mar 19 02:18:55 Ran 2404 tests in 5277.840s 
Mar 19 02:18:55  
Mar 19 02:18:55 FAILED (failures=1, errors=2, skipped=67, expected failures=1) 
Mar 19 02:18:55  
Mar 19 02:18:55 Generating XML reports... 
Mar 19 02:18:55 Traceback (most recent call last): 
Mar 19 02:18:55   File "test/run_test.py", line 674, in <module> 
Mar 19 02:18:55     main() 
Mar 19 02:18:55   File "test/run_test.py", line 667, in main 
Mar 19 02:18:55     raise RuntimeError(message) 
Mar 19 02:18:55 RuntimeError: test_jit failed! 
Mar 19 02:18:56 + cleanup 
Mar 19 02:18:56 + retcode=1 
Mar 19 02:18:56 + set +x 
Mar 19 02:18:56 =================== sccache compilation log =================== 
Mar 19 02:18:56 =========== If your build fails, please take a look at the log above for possible reasons =========== 
Mar 19 02:18:56 Compile requests                 0 
Mar 19 02:18:56 Compile requests executed        0 
Mar 19 02:18:56 Cache hits                       0 
Mar 19 02:18:56 Cache misses                     0 
Mar 19 02:18:56 Cache timeouts                   0 

🚧 1 upstream failure:

These were probably caused by upstream breakages:



facebook-github-bot deleted the gh/ZolotukhinM/196/head branch April 19, 2020 14:17