
[c2] framework for committed serialized tests #10594

Closed

ajyu wants to merge 1 commit into pytorch:master from ajyu:tmp

Conversation

@ajyu (Contributor) commented Aug 16, 2018

Summary:
Generate serialized test inputs/outputs/backward graphs for tests inside caffe2/python/operator_test that call assertSerializedOperatorCheck(). Tests should be decorated with serialized_test.collect_tests.given_and_seeded to run both the usual randomized hypothesis tests and a single fixed-seed hypothesis test.

To use:

  1. Refactor your test to be a SerializedTestCase
    1a. Decorate it with @given_and_seeded
    1b. Call testWithArgs in main
  2. Run your test with -g to generate the output. Check it in.
  3. Subsequent runs of the test without generating the output will check against the checked in test case.
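
For orientation, here's a minimal sketch of what a refactored test might look like. The class, decorator, and check names are the ones described above; the import path, decorator arguments, and the check's exact signature are illustrative assumptions, not taken from this diff.

```
# Hypothetical sketch: module path, decorator arguments, and the argument
# list of assertSerializedOperatorCheck are assumptions for illustration.
import numpy as np

from caffe2.python import core
import caffe2.python.hypothesis_test_util as hu
from caffe2.python.serialized_test.collect_tests import (
    SerializedTestCase, given_and_seeded, testWithArgs)


class TestWeightedSum(SerializedTestCase):

    @given_and_seeded(inputs=hu.tensors(n=2), **hu.gcs_cpu_only)
    def test_weighted_sum(self, inputs, gc, dc):
        x0, x1 = inputs
        w0 = np.array([0.5], dtype=np.float32)
        w1 = np.array([1.5], dtype=np.float32)
        op = core.CreateOperator(
            "WeightedSum", ["x0", "w0", "x1", "w1"], ["y"])

        def weighted_sum_ref(x0, w0, x1, w1):
            return [x0 * w0 + x1 * w1]

        # With -g this records inputs/outputs/backward graph under
        # caffe2/python/serialized_test/data; without -g it compares the op
        # against the checked-in files.
        self.assertSerializedOperatorCheck(
            gc, op, [x0, w0, x1, w1], weighted_sum_ref, [0])


if __name__ == "__main__":
    testWithArgs()
```

Running the test once with -g writes the serialized files; subsequent runs without -g compare against the checked-in data.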

Details:
Run your test with python caffe2/python/operator_test/[your_test].py -g
Outputs are in caffe2/python/serialized_test/data. The operator test outputs are in a further subdirectory, operator_test, to allow for other tests in the future (model zoo tests?)

Currently, we've only refactored weighted_sum_test to use this, but in the next diff we'll refactor as many tests as possible. The directory structure may also change, since there are usually multiple tests in a single file, so we may add more structure to account for that.

Test Plan:

  1. Run python caffe2/python/operator_test/weighted_sum_test.py -g to generate the outputs
  2. Run python caffe2/python/operator_test/weighted_sum_test.py; Check that the unit test itself still runs as expected
  3. Run python caffe2/python/gradient_check_test.py; Check that CheckSimple still works as expected
  4. Run with-proxy FULL_CAFFE2=1 python setup.py develop; Check there are no errors

@facebook-github-bot left a comment (Contributor)

ajyu has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@ilia-cher self-requested a review August 16, 2018 22:23




@ajyu (Contributor Author) commented Aug 30, 2018

@pytorchbot retest this please

@ezyang (Contributor) commented Aug 30, 2018

Can you explain why setup.py needs to register these serialized tests as data_files? I don't think we want to distribute them as part of the package.

@ajyu (Contributor Author) commented Aug 30, 2018

I needed to add the files to CMakeLists so that they would be included in the Jenkins test build and the tests would pass. I figured it would be useful for the setup.py build to work as well, in case we ever switch Jenkins to use it rather than cmake, but I can take that out if you'd prefer.

@ezyang left a comment (Contributor)

OK, in that case, let's remove it from data_files. You should figure out another way to get Jenkins tests to pass. Is the problem that these files are generated at build time and so they get lost when we move files from build to test time? There's some prior art about this in .jenkins/pytorch/build.sh, see:


# Add the test binaries so that they won't be git clean'ed away
git add -f build/bin

@ajyu (Contributor Author) commented Aug 30, 2018

The files are not generated at build time; they live in the codebase, and engineers generate them when they write their tests. They are copied at build time and, I assume, copied again at test time.

I'll remove the changes from setup.py then.

Are repo files available to Jenkins at test time? If so, I can try to figure out how to read directly from the repo location. Otherwise, I'd need to keep the CMakeLists.txt changes so Jenkins continues to work.
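
A rough sketch of what reading directly from the repo location could look like (paths follow the summary above; the helper name is made up):

```
import os

# Hypothetical helper: resolve caffe2/python/serialized_test/data/<subdir>
# relative to the test's own source file, so tests read from the source
# checkout rather than from a copied build tree.
def serialized_data_dir(test_file, subdir="operator_test"):
    operator_test_dir = os.path.dirname(os.path.abspath(test_file))
    caffe2_python_dir = os.path.dirname(operator_test_dir)
    return os.path.join(caffe2_python_dir, "serialized_test", "data", subdir)

# Usage from a test under caffe2/python/operator_test/:
#   data_dir = serialized_data_dir(__file__)
```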

@ezyang (Contributor) commented Aug 30, 2018

In that case, yes: all source-controlled files are available during tests, at their source code locations.

By the way, how big are these files going to be? If someone checks in a large binary file, that size becomes a permanent part of the git repository and is a tax on everyone who ever clones the repo from then on. Let's make it hard for people to do this accidentally.

@houseroad (Member)

@ezyang the model and input/output data are small, just like the test data we committed to the onnx repo, so they should be fine. The goal is to detect incompatible changes to gradient ops and prevent SEVs (like the broadcasting one).

@orionr left a comment (Contributor)

Great work! I'm curious what the size of the test files is, but if they're small we're in good shape once you remove the CMakeLists.txt changes. If this requires some CI changes as well, let me know and we can adjust them.


@ajyu (Contributor Author) commented Aug 30, 2018

These are the sizes of the binary files as of now.

-rw-r--r--. 1 ansha users  67 Aug 28 11:36 gradient_0.pb
-rw-r--r--. 1 ansha users 390 Aug 28 11:37 inputs.npz
-rw-r--r--. 1 ansha users  43 Aug 28 11:37 operator_0.pb
-rw-r--r--. 1 ansha users 234 Aug 28 11:37 outputs.npz

I can add a test in the next diff that warns when individual binaries are too large and when the entire caffe2/python/serialized_test/data directory exceeds some size limit.

I've been looking through the test output files and discussing with @ezyang to figure out what we can do. Seems like currently the source code files live in /var/lib/jenkins/workspace/[...]. We can change CI to cd into WORKSPACE instead, before running these operator tests.
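
A rough sketch of that size guard (the limit, path, and test name are placeholders, not part of this PR):

```
import os
import unittest

# Placeholders: the actual limit and location would be decided in the next diff.
DATA_DIR = "caffe2/python/serialized_test/data"
MAX_TOTAL_BYTES = 2 * 1024 * 1024


class TestSerializedDataSize(unittest.TestCase):
    def test_total_size_under_limit(self):
        # Sum the sizes of all checked-in serialized test files.
        total = 0
        for root, _, files in os.walk(DATA_DIR):
            for name in files:
                total += os.path.getsize(os.path.join(root, name))
        self.assertLessEqual(
            total, MAX_TOTAL_BYTES,
            "serialized test data is %d bytes; trim or raise the limit" % total)


if __name__ == "__main__":
    unittest.main()
```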

@orionr left a comment (Contributor)

Looks good!

@ezyang left a comment (Contributor)

I didn't review the non-setup/jenkins bits of the patch, but those parts now look extremely good. Thanks!

@apaszke (Contributor) commented Sep 4, 2018

Do we really believe that checked-in tests are a good idea? I'm pretty sure that a more rigorous test harness would provide much better coverage, with the added benefit of not requiring a new checked-in file after every single operator change. Here's how I see it:

Operator code is generally quite self-contained, so it's unlikely that a change in one part of the system will affect a random operator on the other end, unless it's a breakage so bad that most operator tests will be red anyway. On the other hand, if someone changes the operator kernels, tiny numerical differences might change e.g. the few least-significant bits, raising a failure when compared against the expected output. Since such a change would come from a PR that updates the op, I believe the author will be more than happy to check in a new set of expected outputs, meaning those values don't mean much. Finally, you can't test all ops this way: if an op uses an external library, just updating it to a new version (or having a different cuDNN version installed) might introduce those tiny differences, flaring up the whole test suite for no good reason.

We actually took a similar approach for testing the PyTorch JIT some time ago (we serialize IR as text, not binary tensor values), but it has many scalability problems, and is notoriously hard to review. We already had many situations where the test had a good expect file checked in, only to have it overwritten with a non-working one in a later PR (and we do pretty careful code review). After the release, we'll probably try to move away from comparing to expected values, and you might want to consider that too.

PenghuiCheng pushed a commit to PenghuiCheng/pytorch that referenced this pull request Sep 11, 2018
Summary: as in the PR description above.
Pull Request resolved: pytorch#10594

Reviewed By: ezyang

Differential Revision: D9370359

Pulled By: ajyu

fbshipit-source-id: 2ce77389cd8bcc0255d3bccd61569833e545ede8
facebook-github-bot pushed a commit that referenced this pull request Sep 18, 2018
Summary:
Followup to [the serialized test framework](#10594)

Round 1 for refactoring tests, starting alphabetically. I added some functionality, so I wanted to send out some of these initial changes sooner.

I'm skipping all tests that don't explicitly call assertReferenceChecks. Some tests directly call np.allclose, and others are simply TestCase (rather than HypothesisTestCase).

1. Start alphabetically producing serialized outputs for test functions, annotating those we want to include with `serialized_test_util.given`. So far I've only added one test per operator, but this already does seem to add quite a few tests.
2. Add functionality to allow us to generate outputs using pytest by adding pytest argument options. This allows us to skip adding a `__main__` function to quite a few tests.
3. Catch any exceptions generating the gradient operator and skip serializing/reading it, since certain operators don't have gradients.
4. Add functionality to better handle jagged array inputs, which numpy doesn't handle very well: we simply convert them explicitly to dtype=object (see the sketch after this list).
5. Make only one file per test function, rather than 4, to reduce the number of files in the github repo.
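
A minimal sketch of the jagged-array conversion mentioned in item 4 (illustrative only, not the helper from this diff):

```
import numpy as np

jagged = [[1.0, 2.0, 3.0], [4.0], [5.0, 6.0]]

# numpy can't build a rectangular array from rows of different lengths,
# so build an object array explicitly before serializing with np.savez.
arr = np.empty(len(jagged), dtype=object)
for i, row in enumerate(jagged):
    arr[i] = np.asarray(row, dtype=np.float32)

np.savez("inputs.npz", inputs=arr)
loaded = np.load("inputs.npz", allow_pickle=True)["inputs"]
```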

I also noticed that there is some hypothesis handling that makes `serialized_test_util.given` not compatible with adding more hypothesis decorators on top. For example, there are tests that do
```
@settings(...)
@given(...)
def test_my_stuff(...):
```
But there is a hypothesis handler that explicitly checks that `given` is applied below `settings`, so we cannot refactor these to `serialized_test_util.given`. I've just avoided decorating these kinds of tests for now; I hope that's alright.
Pull Request resolved: #11350

Reviewed By: houseroad

Differential Revision: D9693857

Pulled By: ajyu

fbshipit-source-id: a9b4279afbe51c90cf2025c5ac6b2db2111f4af7
@ezyang added the merged label Jun 26, 2019
