add autodiff expressions for common operations#11832

Closed
zou3519 wants to merge 18 commits into pytorch:master from zou3519:better-autodiff
Conversation


@zou3519 zou3519 commented Sep 18, 2018

This PR does a few things:

Enable testing autodiff in test_jit.py

Previously test_jit.py only tested autograd on backward graphs.
This is because we borrow from test_autograd and construct graphs with a small
number of nodes. Because the number of nodes is small (typically 1-2), those graphs
do not end up containing autodiff subgraphs, so autodiff never gets tested.

This PR enables autodiff testing by doing the following:

  • Added a disableDebugAutodiffSubgraphInlining function to graph_executor that
    disables autodiff subgraph inlining.
  • (implementation) added autodiffSubgraphNodeThreshold and autodiffSubgraphInlineThreshold.
    These are set to their default values (2, 5) but disableDebugAutodiffSubgraphInlining()
    sets both to 1, disabling subgraph inlining and allowing 1-node autodiff subgraphs.
  • The relevant backward jit tests disable autodiff subgraph inlining so they
    will test the autodiff versions of the operators instead of autograd whenever
    an autodiff variant exists.
  • We no longer run the tests that inline autodiff subgraphs. This does not
    reduce correctness coverage because the assumption is that the autograd
    functions are correct and are already tested in test_autograd.py.
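The interplay of the two thresholds above can be sketched as follows. This is a minimal Python sketch; the real logic is C++ inside graph_executor, and these function names are illustrative, not the actual API:

```python
# Sketch of the two thresholds described above (illustrative names;
# the real implementation lives in C++ in graph_executor).

def should_create_autodiff_subgraph(num_nodes, node_threshold=2):
    # Default threshold of 2: a single differentiable node is not grouped
    # into an autodiff subgraph. Setting the threshold to 1 (what
    # disableDebugAutodiffSubgraphInlining does) allows one-node
    # subgraphs, so autodiff gets exercised even on tiny test graphs.
    return num_nodes >= node_threshold

def should_inline_subgraph(num_nodes, inline_threshold=5):
    # Default threshold of 5: small autodiff subgraphs are folded back
    # into the surrounding graph. A threshold of 1 means no subgraph is
    # ever inlined away.
    return num_nodes < inline_threshold
```

With both thresholds set to 1, every differentiable node stays in its own autodiff subgraph, which is exactly what the debug tests need.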

Add autodiff for a lot of ops

This allows the graph fuser to work better, because many of these ops were previously fusible but not autodiff-compatible. As a concrete example, LSTM backward contains many tensor-scalar operations; these autodiff formulas help its double backward pass.

Included:

  • arithmetic overloads
  • abs, acos, asin, atan, ceil, cos, cosh, exp, expm1, floor, fmod, frac, log, log10, log1p, log2, reciprocal, remainder, round, rsqrt, sin, sinh, tan, trunc
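For illustration, the symbolic gradients for a few of these ops look like the following. This is a scalar Python sketch only; the real autodiff entries produce graph IR that operates on tensors:

```python
import math

# Scalar sketches of a few of the gradient formulas in the list above
# (illustrative; the JIT entries are symbolic and tensor-valued).
autodiff_formulas = {
    "sin":        lambda x, grad: grad * math.cos(x),
    "exp":        lambda x, grad: grad * math.exp(x),
    "log":        lambda x, grad: grad / x,
    "reciprocal": lambda x, grad: -grad / (x * x),
    "rsqrt":      lambda x, grad: grad * (-0.5) * x ** -1.5,
}

def numeric_grad(f, x, eps=1e-6):
    # Central finite difference, handy for sanity-checking a formula.
    return (f(x + eps) - f(x - eps)) / (2 * eps)
```

Each formula can be checked against `numeric_grad` at a test point, which is essentially what the generated gradcheck-style tests do on tensors.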

Test Plan

TestJitGenerated tests autodiff for all of the added operations.

cc @apaszke @zdevito

@pytorchbot pytorchbot added the oncall: jit Add this issue/PR to JIT oncall triage queue label Sep 18, 2018
@zou3519 zou3519 force-pushed the better-autodiff branch 2 times, most recently from 201f382 to c14e20d Compare September 20, 2018 16:59

zou3519 commented Sep 21, 2018

There is something really fishy going on with LSTM's double backward (that is the test that is failing). Disabling the autodiff formula for mul(Tensor, Scalar) causes the test to pass; re-enabling it causes the test to fail. As far as I can tell the autodiff formula is correct, so I'm not sure what is going on. (It's possible that it's a numeric precision issue, but the gradients are off by ~100, and they are on the order of magnitude of 1000.)
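For reference, the formula under suspicion is simple when written out in scalar form (a sketch, not the actual JIT symbolic entry):

```python
# Scalar sketch of the mul(Tensor, Scalar) backward being discussed:
# for y = x * s with s a constant scalar, dy/dx = s, so the backward
# pass just scales the incoming gradient. (Illustrative only.)
def mul_scalar_backward(grad_output, s):
    return grad_output * s

def numeric_grad(f, x, eps=1e-6):
    # Central finite difference for a quick sanity check.
    return (f(x + eps) - f(x - eps)) / (2 * eps)
```

A pointwise check like this agrees with the formula, which is why the failure looks like it comes from elsewhere in AD rather than from the formula itself.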


apaszke commented Sep 21, 2018

@zou3519 it's a known bug in AD. It will be fixed by #11654, but it's blocked on some fbcode stuff that I've been dealing with since Monday...


zou3519 commented Sep 21, 2018

@apaszke thank you for pointing that out! (I would have gone on a long search for the cause otherwise...).


zou3519 commented Sep 24, 2018

Should be ready for review now. I excluded some advanced indexing tests from running because they fail autodiff tests (they were previously incorrect but this PR exposed them) and I don't see an easy fix for them. The root cause is #12024.


@apaszke apaszke left a comment


Nice! I haven't reviewed all the AD formulas, but they should be tested anyway.

test/test_jit.py Outdated

def assert_fully_differentiable_graph(graph):
    nodes = list(graph.nodes())
    if not (len(nodes) == 1 and 'prim::DifferentiableGraph' in str(nodes[0])):


friend struct GraphExecutor;

size_t autodiffSubgraphNodeThreshold = 2;
size_t autodiffSubgraphInlineThreshold = 5;



@facebook-github-bot facebook-github-bot left a comment


zou3519 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

petrex pushed a commit to petrex/pytorch that referenced this pull request Sep 26, 2018
* upstream/master: (117 commits)
  Add full namespace resolution in CAFFE_DURATION (pytorch#12065)
  T33898723: Simple put operators for caffe2 stats (pytorch#12057)
  add narrow() support for sparse tensors re: pytorch#8853 (pytorch#11342)
  Fix ONNX bug, add symbolic for full
  Enable tracing of tensor factories with an out argument
  Fix warnings emitted when testing distributions (pytorch#12038)
  Unify versions across setup.py, libtorch, and libcaffe2 (pytorch#12053)
  add autodiff expressions for common operations (pytorch#11832)
  Blob doesn't allow access to destroyCall anymore (pytorch#11548)
  IValue can store Blob (pytorch#11414)
  Move Blob to ATen/core (pytorch#11924)
  Use tempfile during serialized test comparison (pytorch#12021)
  fix segfault when grad to a hook fn is None (pytorch#12028)
  Fallback CreateMutex/AtomicIter operators for mkl-dnn
  Unify all *_EXPORT and *_IMPORT macros across c++ backend (pytorch#12019)
  Add safety asserts for methods on TensorImpl which don't work on Variable. (pytorch#12058)
  Make USE_IDEEP work again (pytorch#12026)
  Fix "identifier following the 'template' keyword does not refer to a template" (pytorch#12037)
  Delete some unused variables. (pytorch#12059)
  Support TypeIdentifier::name() (pytorch#12036)
  ...
@ezyang ezyang added the merged label Jun 26, 2019