[c2] framework for committed serialized tests #10594
ajyu wants to merge 1 commit into pytorch:master from
Conversation
facebook-github-bot
left a comment
ajyu has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@pytorchbot retest this please
Can you explain why setup.py needs to register these serialized tests as data_files?

I needed to add the files to CMakeLists so that they would be included in the Jenkins test build and the tests would pass. I figured that making the setup.py build work as well would be desirable, in case we switch Jenkins to use it rather than cmake. But I can take that out if desired.
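To make the setup.py side of this discussion concrete, here is a hedged sketch of how serialized test data files might be enumerated for registration in a build script. The helper name is invented and the data directory path is taken from the PR summary; this is illustrative, not the actual PR code.

```python
# Hypothetical helper for gathering serialized test data files so a build
# script (e.g. setup.py data_files) can register them. The directory layout
# follows the PR summary; the function itself is an assumed illustration.
import glob
import os

def collect_serialized_test_data(root="caffe2/python/serialized_test/data"):
    """Return a sorted list of every file under `root`."""
    pattern = os.path.join(root, "**", "*")
    return sorted(p for p in glob.glob(pattern, recursive=True)
                  if os.path.isfile(p))
```

As the thread concludes below, the setup.py registration was ultimately dropped in favor of reading the files from their source-control locations.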
ezyang
left a comment
OK, in that case, let's remove it from data_files. You should figure out another way to get Jenkins tests to pass. Is the problem that these files are generated at build time and so they get lost when we move files from build to test time? There's some prior art about this in .jenkins/pytorch/build.sh, see:
# Add the test binaries so that they won't be git clean'ed away
git add -f build/bin
The files are not generated at build time; they live in the codebase, and engineers generate them when they write their tests. They are copied during build time, and I guess subsequently copied during test time. I'll remove the changes from setup.py then. Are repo files available to Jenkins at test time? If so, I can try to figure out how to read directly from the repo location. Otherwise, I'd need to keep the CMakeLists.txt changes to allow Jenkins to continue working.
In that case, yes, all source-controlled files are available during tests, at their source code locations. By the way, how big are these files going to be? If someone checks in a large binary file, that binary's size is permanent in the git repository, and constitutes a tax on everyone who ever git clones the repo at any point in the future. Let's make it hard for people to do this accidentally.
@ezyang for the model and input/output data, they are small, just like the test data we committed to the onnx repo. They should be fine. The goal is to detect incompatible changes to gradient ops to prevent SEVs (like the broadcasting one).
orionr
left a comment
Great work! I'm curious what the size of the test files is, but if they're small we're in good shape, provided you can remove the CMakeLists.txt changes. If this requires some CI changes as well, let me know and we can adjust them.
These are the sizes of the binary files as of now. I can add a test in the next diff to warn against binaries that are too large, and warn if we surpass some limit on the entire directory. I've been looking through the test output files and discussing with @ezyang to figure out what we can do. Seems like currently the source code files live in
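The size-limit test proposed here might look something like the following minimal sketch. The byte budgets and the helper name are assumptions for illustration, not values from the PR.

```python
# Minimal sketch of the proposed size-limit check: fail if any checked-in
# serialized test file, or the data directory as a whole, exceeds a budget.
# MAX_FILE_BYTES / MAX_TOTAL_BYTES are assumed values, not from the PR.
import os

MAX_FILE_BYTES = 64 * 1024       # assumed per-file budget
MAX_TOTAL_BYTES = 1024 * 1024    # assumed budget for the whole directory

def check_serialized_test_sizes(data_dir):
    """Walk data_dir and assert every file and the total stay under budget."""
    total = 0
    for root, _, files in os.walk(data_dir):
        for name in files:
            size = os.path.getsize(os.path.join(root, name))
            assert size <= MAX_FILE_BYTES, \
                "%s is too large (%d bytes)" % (name, size)
            total += size
    assert total <= MAX_TOTAL_BYTES, \
        "data directory too large (%d bytes)" % total
    return total
```

A check like this addresses the repository-size concern raised earlier in the thread: it makes accidentally committing a large binary fail loudly in CI.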
Summary:
Generate serialized test inputs/outputs/backward graphs of tests inside `caffe2/python/operator_test` that call assertSerializedOperatorCheck(). Tests should be decorated with serialized_test.collect_tests.given_and_seeded to run hypothesis tests that are actually random and a single fixed-seed hypothesis test.

To use:
1. Refactor your test to be a SerializedTestCase
1a. Decorate it with given_and_seeded
1b. Call testWithArgs in main
2. Run your test with -g to generate the output. Check it in.
3. Subsequent runs of the test without generating the output will check against the checked-in test case.

Details:
Run your test with `python caffe2/python/operator_test/[your_test].py -g`. Outputs are in `caffe2/python/serialized_test/data`. The operator test outputs are in a further subdirectory `operator_test`, to allow for other tests in the future (model zoo tests?). Currently, we've only refactored weighted_sum_test to use this, but in the next diff, we'll refactor as many as possible. The directory structure may also change, as usually there are multiple tests in a single file, so we may create more structure to account for that.

Pull Request resolved: pytorch#10594
Differential Revision: D9370359
fbshipit-source-id: 8688cfca5daa46adcfec1e872a12e4342145af88
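The generate-then-compare workflow in the summary can be illustrated with a self-contained toy. JSON stands in for the framework's real serialization format (which stores operator protobufs and tensors), so everything below is an assumed simplification, not caffe2 code.

```python
# Toy version of the -g workflow: generate mode records the expected output
# to a file to be checked in; later runs load that file and compare.
# JSON is used purely for illustration; the real framework serializes
# operator defs and tensor data.
import json

def serialized_check(output, path, generate=False):
    if generate:                 # analogous to running the test with -g
        with open(path, "w") as f:
            json.dump(output, f)
        return True
    with open(path) as f:        # normal run: compare to the checked-in file
        expected = json.load(f)
    return output == expected
```

The key property, mirrored from the summary, is that a run without `-g` never writes; it only compares against what was previously committed.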
ezyang
left a comment
I didn't review the non-setup/jenkins bits of the patch, but those parts now look extremely good. Thanks!
Do we really believe that checked-in tests are a good idea? I'm pretty sure that a more rigorous test harness will provide much better coverage, and has the benefit of not checking in a new file after every single operator change.

Here's how I see it: operator code is generally quite self-contained, so it's unlikely that a change in one part of the system will affect a random operator on the other end, unless it's a breakage so bad that most operator tests will be red anyway. On the other hand, if someone changes the operator kernels, tiny numerical differences might change e.g. the few least-significant bits, raising a failure when compared to the expected output. Since this will happen in a PR that updates an op, I believe the author will be more than happy to check in a new set of expected outputs, meaning that those values don't mean much. Finally, you can't test all ops this way, because if you use an external library, just updating it to a new version (or having a different cuDNN version installed) might produce those tiny differences, flaring up the whole test suite for no good reason.

We actually took a similar approach for testing the PyTorch JIT some time ago (we serialize IR as text, not binary tensor values), but it has many scalability problems and is notoriously hard to review. We already had many situations where a test had a good expect file checked in, only to have it overwritten with a non-working one in a later PR (and we do pretty careful code review). After the release, we'll probably try to move away from comparing to expected values, and you might want to consider that too.
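The least-significant-bit concern raised above can be seen with an ordinary reassociated float sum: exact comparison flags a difference that a tolerance-based check would accept. This is a generic illustration, not PyTorch code.

```python
# Reassociating a floating-point sum changes the last bits of the result,
# so bitwise comparison against a checked-in value is brittle, while a
# tolerance-based comparison is stable across such kernel-level changes.
import math

a = (0.1 + 0.2) + 0.3   # one summation order
b = 0.1 + (0.2 + 0.3)   # the order a different kernel might use
assert a != b                            # bitwise-unequal
assert math.isclose(a, b, rel_tol=1e-9)  # but numerically the same
```

This is exactly why a different kernel, compiler, or cuDNN version can fail an exact-match expect test without any real regression.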
Pull Request resolved: pytorch#10594
Reviewed By: ezyang
Differential Revision: D9370359
Pulled By: ajyu
fbshipit-source-id: 2ce77389cd8bcc0255d3bccd61569833e545ede8
Summary:
Followup to [the serialized test framework](#10594). Round 1 of refactoring tests, starting alphabetically. I added some functionality, so I wanted to send out some of these initial changes sooner. I'm skipping all tests that don't explicitly call assertReferenceChecks. Some tests directly call np.allclose, and others are simply TestCase (rather than HypothesisTestCase).

1. Start alphabetically producing serialized outputs for test functions, annotating those we want to include with `serialized_test_util.given`. So far I've only added one test per operator, but this already does seem to add quite a few tests.
2. Add functionality to allow us to generate outputs using pytest by adding pytest argument options. This allows us to skip adding a `__main__` function to quite a few tests.
3. Catch any exceptions generating the gradient operator and skip serializing/reading it, since certain operators don't have gradients.
4. Add functionality to better handle jagged array inputs, which numpy doesn't handle very well. We simply do the conversion to dtype=object explicitly.
5. Make only one file per test function, rather than 4, to reduce the number of files in the github repo.

I also noticed that there is some hypothesis handling that makes `serialized_test_util.given` incompatible with adding more hypothesis decorators on top. For example, there are tests that do

```
settings(...)
given(...)
def test_my_stuff(...)
```

But there is a hypothesis handler that explicitly checks that `given` is called below `settings`, so we cannot refactor these to `serialized_test_util.given`. I've just avoided decorating these kinds of tests for now; I hope that's alright.

Pull Request resolved: #11350
Reviewed By: houseroad
Differential Revision: D9693857
Pulled By: ajyu
fbshipit-source-id: a9b4279afbe51c90cf2025c5ac6b2db2111f4af7
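The jagged-array handling mentioned in the summary (explicit conversion to dtype=object) can be sketched as follows. The helper name and the exact conversion strategy are assumptions illustrating the described approach, not the PR's code.

```python
# NumPy cannot build a regular ndarray from rows of unequal length, so
# jagged inputs are converted explicitly to a 1-D object array, one row
# per element. The helper name is an assumed illustration.
import numpy as np

def to_object_array(jagged):
    """Wrap a list of unequal-length rows in a 1-D dtype=object array."""
    arr = np.empty(len(jagged), dtype=object)
    for i, row in enumerate(jagged):
        arr[i] = np.asarray(row)
    return arr

arr = to_object_array([[1, 2, 3], [4, 5]])
```

Pre-allocating with `np.empty(..., dtype=object)` and filling element-wise avoids `np.array`'s ambiguous inference on ragged input, which newer NumPy versions reject outright.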
Summary:
Generate serialized test inputs/outputs/backward graphs of tests inside `caffe2/python/operator_test` that call assertSerializedOperatorCheck(). Tests should be decorated with serialized_test.collect_tests.given_and_seeded to run hypothesis tests that are actually random and a single fixed-seed hypothesis test.

To use:
1. Refactor your test to be a SerializedTestCase
1a. Decorate it with @given_and_seeded
1b. Call testWithArgs in main
2. Run your test with -g to generate the output. Check it in.
3. Subsequent runs of the test without generating the output will check against the checked-in test case.

Details:
Run your test with `python caffe2/python/operator_test/[your_test].py -g`. Outputs are in `caffe2/python/serialized_test/data`. The operator test outputs are in a further subdirectory `operator_test`, to allow for other tests in the future (model zoo tests?). Currently, we've only refactored weighted_sum_test to use this, but in the next diff, we'll refactor as many as possible. The directory structure may also change, as usually there are multiple tests in a single file, so we may create more structure to account for that.

Test Plan:
`python caffe2/python/operator_test/weighted_sum_test.py -g` to generate the outputs
`python caffe2/python/operator_test/weighted_sum_test.py`; check that the unit test itself still runs as expected
`python caffe2/python/gradient_check_test.py`; check that CheckSimple still works as expected
`with-proxy FULL_CAFFE2=1 python setup.py develop`; check there are no errors