[c2] framework for committed serialized tests #10594
ajyu wants to merge 1 commit into pytorch:master from
Conversation
facebook-github-bot
left a comment
ajyu has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@pytorchbot retest this please
Can you explain why setup.py needs to register these serialized tests as data_files?

I needed to add the files to CMakeLists so that they would be included in the Jenkins test build and the tests would pass. I figured that making the setup.py build work as well would be desirable, in case we switch Jenkins to use it rather than cmake. But I can take that out if desired.
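To make the setup.py side of this discussion concrete, here is a hedged sketch of how serialized test data files might be enumerated for registration in a build script. The helper name is invented and the data directory path is taken from the PR summary; this is illustrative, not the actual PR code.

```python
# Hypothetical helper for gathering serialized test data files so a build
# script (e.g. setup.py data_files) can register them. The directory layout
# follows the PR summary; the function itself is an assumed illustration.
import glob
import os

def collect_serialized_test_data(root="caffe2/python/serialized_test/data"):
    """Return a sorted list of every file under `root`."""
    pattern = os.path.join(root, "**", "*")
    return sorted(p for p in glob.glob(pattern, recursive=True)
                  if os.path.isfile(p))
```

As the thread concludes below, the setup.py registration was ultimately dropped in favor of reading the files from their source-control locations.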
ezyang
left a comment
OK, in that case, let's remove it from data_files. You should figure out another way to get Jenkins tests to pass. Is the problem that these files are generated at build time and so they get lost when we move files from build to test time? There's some prior art about this in .jenkins/pytorch/build.sh, see:
# Add the test binaries so that they won't be git clean'ed away
git add -f build/bin
The files are not generated at build time; they live in the codebase, and engineers generate them when they write their tests. They are copied during build time, and I guess subsequently copied during test time. I'll remove the changes from setup.py then. Are repo files available to Jenkins at test time? If so, I can try to figure out how to read directly from the repo location. Otherwise, I'd need to keep the CMakeLists.txt changes to allow Jenkins to continue working.
In that case, yes, all source-controlled files are available during tests, at their source code locations. By the way, how big are these files going to be? If someone checks in a large binary file, that binary's size is permanent in the git repository, and constitutes a tax on everyone who ever git clones the repo at any point in the future. Let's make it hard for people to do this accidentally.
@ezyang for the model and input/output data, they are small, just like the test data we committed to the onnx repo. They should be fine. The goal is to detect incompatible changes to gradient ops to prevent SEVs (like the broadcasting one).
orionr
left a comment
Great work! I'm curious what the size of the test files is, but if they're small we're in good shape, provided you can remove the CMakeLists.txt changes. If this requires some CI changes as well, let me know and we can adjust them.
These are the sizes of the binary files as of now. I can add a test in the next diff to warn against binaries that are too large, and warn if we surpass some limit on the entire directory. I've been looking through the test output files and discussing with @ezyang to figure out what we can do. Seems like currently the source code files live in
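The size-limit test proposed here might look something like the following minimal sketch. The byte budgets and the helper name are assumptions for illustration, not values from the PR.

```python
# Minimal sketch of the proposed size-limit check: fail if any checked-in
# serialized test file, or the data directory as a whole, exceeds a budget.
# MAX_FILE_BYTES / MAX_TOTAL_BYTES are assumed values, not from the PR.
import os

MAX_FILE_BYTES = 64 * 1024       # assumed per-file budget
MAX_TOTAL_BYTES = 1024 * 1024    # assumed budget for the whole directory

def check_serialized_test_sizes(data_dir):
    """Walk data_dir and assert every file and the total stay under budget."""
    total = 0
    for root, _, files in os.walk(data_dir):
        for name in files:
            size = os.path.getsize(os.path.join(root, name))
            assert size <= MAX_FILE_BYTES, \
                "%s is too large (%d bytes)" % (name, size)
            total += size
    assert total <= MAX_TOTAL_BYTES, \
        "data directory too large (%d bytes)" % total
    return total
```

A check like this addresses the repository-size concern raised earlier in the thread: it makes accidentally committing a large binary fail loudly in CI.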
Summary:
Generate serialized test inputs/outputs/backward graphs of tests inside `caffe2/python/operator_test` that call assertSerializedOperatorCheck(). Tests should be decorated with serialized_test.collect_tests.given_and_seeded to run hypothesis tests that are actually random and a single fixed-seed hypothesis test.

To use:
1. Refactor your test to be a SerializedTestCase
1a. Decorate it with given_and_seeded
1b. Call testWithArgs in main
2. Run your test with -g to generate the output. Check it in.
3. Subsequent runs of the test without generating the output will check against the checked-in test case.

Details:
Run your test with `python caffe2/python/operator_test/[your_test].py -g`. Outputs are in `caffe2/python/serialized_test/data`. The operator test outputs are in a further subdirectory `operator_test`, to allow for other tests in the future (model zoo tests?). Currently, we've only refactored weighted_sum_test to use this, but in the next diff, we'll refactor as many as possible. The directory structure may also change, as usually there are multiple tests in a single file, so we may create more structure to account for that.

Pull Request resolved: pytorch#10594
Differential Revision: D9370359
fbshipit-source-id: 8688cfca5daa46adcfec1e872a12e4342145af88
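The generate-then-compare workflow in the summary can be illustrated with a self-contained toy. JSON stands in for the framework's real serialization format (which stores operator protobufs and tensors), so everything below is an assumed simplification, not caffe2 code.

```python
# Toy version of the -g workflow: generate mode records the expected output
# to a file to be checked in; later runs load that file and compare.
# JSON is used purely for illustration; the real framework serializes
# operator defs and tensor data.
import json

def serialized_check(output, path, generate=False):
    if generate:                 # analogous to running the test with -g
        with open(path, "w") as f:
            json.dump(output, f)
        return True
    with open(path) as f:        # normal run: compare to the checked-in file
        expected = json.load(f)
    return output == expected
```

The key property, mirrored from the summary, is that a run without `-g` never writes; it only compares against what was previously committed.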
ezyang
left a comment
I didn't review the non-setup/jenkins bits of the patch, but those parts now look extremely good. Thanks!
Do we really believe that checked-in tests are a good idea? I'm pretty sure that a more rigorous test harness will provide much better coverage, and has the benefit of not checking in a new file after every single operator change.

Here's how I see it: operator code is generally quite self-contained, so it's unlikely that a change in one part of the system will affect a random operator on the other end, unless it's a breakage so bad that most operator tests will be red anyway. On the other hand, if someone changes the operator kernels, tiny numerical differences might change e.g. the few least-significant bits, raising a failure when compared to the expected output. Since this will happen in a PR that updates an op, I believe the author will be more than happy to check in a new set of expected outputs, meaning that those values don't mean much. Finally, you can't test all ops this way, because if you use an external library, just updating it to a new version (or having a different cuDNN version installed) might produce those tiny differences, flaring up the whole test suite for no good reason.

We actually took a similar approach for testing the PyTorch JIT some time ago (we serialize IR as text, not binary tensor values), but it has many scalability problems and is notoriously hard to review. We already had many situations where a test had a good expect file checked in, only to have it overwritten with a non-working one in a later PR (and we do pretty careful code review). After the release, we'll probably try to move away from comparing to expected values, and you might want to consider that too.
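The least-significant-bit concern raised above can be seen with an ordinary reassociated float sum: exact comparison flags a difference that a tolerance-based check would accept. This is a generic illustration, not PyTorch code.

```python
# Reassociating a floating-point sum changes the last bits of the result,
# so bitwise comparison against a checked-in value is brittle, while a
# tolerance-based comparison is stable across such kernel-level changes.
import math

a = (0.1 + 0.2) + 0.3   # one summation order
b = 0.1 + (0.2 + 0.3)   # the order a different kernel might use
assert a != b                            # bitwise-unequal
assert math.isclose(a, b, rel_tol=1e-9)  # but numerically the same
```

This is exactly why a different kernel, compiler, or cuDNN version can fail an exact-match expect test without any real regression.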
Pull Request resolved: pytorch#10594
Reviewed By: ezyang
Differential Revision: D9370359
Pulled By: ajyu
fbshipit-source-id: 2ce77389cd8bcc0255d3bccd61569833e545ede8
Summary:
Followup to [the serialized test framework](#10594). Round 1 of refactoring tests, starting alphabetically. I added some functionality, so I wanted to send out some of these initial changes sooner. I'm skipping all tests that don't explicitly call assertReferenceChecks. Some tests directly call np.allclose, and others are simply TestCase (rather than HypothesisTestCase).

1. Start alphabetically producing serialized outputs for test functions, annotating those we want to include with `serialized_test_util.given`. So far I've only added one test per operator, but this already does seem to add quite a few tests.
2. Add functionality to allow us to generate outputs using pytest by adding pytest argument options. This allows us to skip adding a `__main__` function to quite a few tests.
3. Catch any exceptions generating the gradient operator and skip serializing/reading it, since certain operators don't have gradients.
4. Add functionality to better handle jagged array inputs, which numpy doesn't handle very well. We simply do the conversion to dtype=object explicitly.
5. Make only one file per test function, rather than 4, to reduce the number of files in the github repo.

I also noticed that there is some hypothesis handling that makes `serialized_test_util.given` incompatible with adding more hypothesis decorators on top. For example, there are tests that do

```
settings(...)
given(...)
def test_my_stuff(...)
```

But there is a hypothesis handler that explicitly checks that `given` is called below `settings`, so we cannot refactor these to `serialized_test_util.given`. I've just avoided decorating these kinds of tests for now; I hope that's alright.

Pull Request resolved: #11350
Reviewed By: houseroad
Differential Revision: D9693857
Pulled By: ajyu
fbshipit-source-id: a9b4279afbe51c90cf2025c5ac6b2db2111f4af7
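The jagged-array handling mentioned in the summary (explicit conversion to dtype=object) can be sketched as follows. The helper name and the exact conversion strategy are assumptions illustrating the described approach, not the PR's code.

```python
# NumPy cannot build a regular ndarray from rows of unequal length, so
# jagged inputs are converted explicitly to a 1-D object array, one row
# per element. The helper name is an assumed illustration.
import numpy as np

def to_object_array(jagged):
    """Wrap a list of unequal-length rows in a 1-D dtype=object array."""
    arr = np.empty(len(jagged), dtype=object)
    for i, row in enumerate(jagged):
        arr[i] = np.asarray(row)
    return arr

arr = to_object_array([[1, 2, 3], [4, 5]])
```

Pre-allocating with `np.empty(..., dtype=object)` and filling element-wise avoids `np.array`'s ambiguous inference on ragged input, which newer NumPy versions reject outright.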
Summary:
Generate serialized test inputs/outputs/backward graphs of tests inside `caffe2/python/operator_test` that call assertSerializedOperatorCheck(). Tests should be decorated with serialized_test.collect_tests.given_and_seeded to run hypothesis tests that are actually random and a single fixed-seed hypothesis test.

To use:
1. Refactor your test to be a SerializedTestCase
1a. Decorate it with @given_and_seeded
1b. Call testWithArgs in main
2. Run your test with -g to generate the output. Check it in.
3. Subsequent runs of the test without generating the output will check against the checked-in test case.

Details:
Run your test with `python caffe2/python/operator_test/[your_test].py -g`. Outputs are in `caffe2/python/serialized_test/data`. The operator test outputs are in a further subdirectory `operator_test`, to allow for other tests in the future (model zoo tests?). Currently, we've only refactored weighted_sum_test to use this, but in the next diff, we'll refactor as many as possible. The directory structure may also change, as usually there are multiple tests in a single file, so we may create more structure to account for that.

Test Plan:
`python caffe2/python/operator_test/weighted_sum_test.py -g` to generate the outputs
`python caffe2/python/operator_test/weighted_sum_test.py`; check that the unit test itself still runs as expected
`python caffe2/python/gradient_check_test.py`; check that CheckSimple still works as expected
`with-proxy FULL_CAFFE2=1 python setup.py develop`; check there are no errors