pytest to run test_ops, test_ops_gradients, test_ops_jit in non linux cuda environments #79898
Conversation
✅ No failures (0 pending) as of commit 1332182 (more details on the Dr. CI page).

💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI.
```python
# when running in parallel in pytest, adding the test times will not give the correct
# time used to run the file, which will make the sharding incorrect, so if the test is
# run in parallel, we take the time reported by the testsuite
if key in pytest_parallel_times:
```
Ah, so you track the pytest times, and then for these test summaries, and these test summaries ONLY, you override the time corresponding to the invoking file?
oh wait you're not modifying the existing stats, you're just adding a new table for file times!
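To make the logic under discussion concrete, here is a minimal sketch (illustrative names, not the actual PR code) of why the per-file time is overridden: when a file's tests ran in parallel under pytest, summing the per-test times overstates the file's wall-clock time, so the time reported by the test suite itself is used instead.

```python
# Sketch of the sharding-time logic under discussion. All names here are
# illustrative assumptions, not the real variables from the PR.

def invoking_file_time(key, summed_test_times, pytest_parallel_times):
    """Return the wall-clock time to attribute to an invoking test file."""
    if key in pytest_parallel_times:
        # Parallel run: the sum of individual test times would overcount,
        # so take the time the pytest run reported for the whole file.
        return pytest_parallel_times[key]
    # Serial run: summing the individual test times is accurate.
    return summed_test_times[key]
```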
```python
upload_to_s3(
    args.workflow_run_id,
    args.workflow_run_attempt,
    "invoking_file_times",
```
Would be good to verify that these stats are what you expect after landing
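For reference, the records being verified might look something like the sketch below; the field names here are assumptions for illustration, not the actual schema consumed by `upload_to_s3`.

```python
# Hypothetical shape of rows for the new "invoking_file_times" table.
# Field names are guesses, not the real schema.

def make_invoking_file_records(workflow_run_id, workflow_run_attempt, file_times):
    """Flatten a {file: seconds} mapping into one record per invoking file."""
    return [
        {
            "workflow_run_id": workflow_run_id,
            "workflow_run_attempt": workflow_run_attempt,
            "invoking_file": name,
            "time_seconds": seconds,
        }
        for name, seconds in sorted(file_times.items())
    ]
```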
@pytorchbot merge

@pytorchbot successfully started a merge job. Check the current status here
… cuda environments (#79898) (#79898)

Summary: This PR uses pytest to run test_ops, test_ops_gradients, and test_ops_jit in parallel in non-Linux-CUDA environments to decrease TTS. Linux CUDA is excluded because running in parallel there results in errors from running out of memory.

Notes:
* Update the hypothesis version for compatibility with pytest.
* Use rerun-failures to rerun tests (similar to flaky tests, although these test files generally don't have flaky tests).
* Reruns are denoted by a rerun tag in the XML. Failed reruns also have the failure tag; successes (meaning the test is flaky) do not.
* See https://docs.google.com/spreadsheets/d/1aO0Rbg3y3ch7ghipt63PG2KNEUppl9a5b18Hmv2CZ4E/edit#gid=602543594 for info on the speedup (or slowdown, in the case of slow tests).
* Windows tests are expected to decrease by 60 minutes total.
* The slow test infra is expected to stay the same; verified by running pytest and unittest on the same job and checking the number of skipped/run tests.
* Test reports to S3 changed: an entirely new table was added to keep track of invoking_file times.

Pull Request resolved: #79898
Approved by: https://github.com/malfet, https://github.com/janeyx99
Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/06a0cfc0ea0cc703a1ebc8148181ac3e3cb80ab5
Reviewed By: jeanschmidt
Differential Revision: D37990830
Pulled By: clee2000
fbshipit-source-id: bf781f39829c03f167470e2222ed0496a54fca72
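The dispatch the summary describes can be sketched as follows. This is an illustrative example, not the actual run_test.py logic: the `-n` flag comes from pytest-xdist and `--reruns` from pytest-rerunfailures, and the environment check mirrors the PR's exclusion of Linux CUDA builds.

```python
import os

# Sketch: run a test file under pytest with parallel workers and reruns,
# except on Linux CUDA builds, where parallelism runs out of memory.
# Worker/rerun counts here are arbitrary illustrative values.

def build_test_command(test_file, env=None):
    env = os.environ.get("BUILD_ENVIRONMENT", "") if env is None else env
    if "cuda" in env and "linux" in env:
        # Linux CUDA: parallel workers OOM, so fall back to the serial runner.
        return ["python", test_file]
    return ["python", "-m", "pytest", test_file, "-n", "2", "--reruns", "2"]
```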
```python
            verbosity=2 if verbose else 1,
            resultclass=XMLTestResultVerbose))
    if test_filename in PYTEST_FILES and not IS_SANDCASTLE and not (
        "cuda" in os.environ["BUILD_ENVIRONMENT"] and "linux" in os.environ["BUILD_ENVIRONMENT"]
```
Hi @malfet @janeyx99, it seems like our test_ops tests were completely skipped after this PR because we don't have BUILD_ENVIRONMENT in our environment. Is this check really necessary for non-GitHub-CI builds? Can you provide a fix to re-enable test_ops and the other PYTEST_FILES tests?

cc @ptrblck
```
Running test_ops ... [2022-07-28 03:25:21.888236]
Executing ['/opt/conda/bin/python', '-bb', 'test_ops.py', '-v', '--save-xml', '--import-slow-tests', '--import-disabled-tests'] ... [2022-07-28 03:25:21.888297]
Traceback (most recent call last):
  File "test_ops.py", line 1736, in <module>
    run_tests()
  File "/opt/pytorch/pytorch/torch/testing/_internal/common_utils.py", line 716, in run_tests
    "cuda" in os.environ["BUILD_ENVIRONMENT"] and "linux" in os.environ["BUILD_ENVIRONMENT"]
  File "/opt/conda/lib/python3.8/os.py", line 675, in __getitem__
    raise KeyError(key) from None
KeyError: 'BUILD_ENVIRONMENT'
test_ops failed!
```

### Description
Quick fix for #79898 (comment)

Pull Request resolved: #82452
Approved by: https://github.com/huydhn
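The shape of such a fix can be sketched as below: read BUILD_ENVIRONMENT with `os.environ.get` and a default, so the check degrades gracefully when the variable is unset (as in non-GitHub-CI builds) instead of raising KeyError. This is an illustrative sketch, not the exact diff from #82452.

```python
import os

# Guarded environment check: unlike os.environ["BUILD_ENVIRONMENT"],
# os.environ.get never raises KeyError when the variable is unset.

def is_linux_cuda_build():
    build_env = os.environ.get("BUILD_ENVIRONMENT", "")  # "" when unset
    return "cuda" in build_env and "linux" in build_env
```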