Make GraphExecutors work on Stacks instead of variable_tensor_lists #9763
Conversation
Force-pushed from b34dccf to 49916a6
facebook-github-bot left a comment:
@apaszke has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@pytorchbot retest this please
zdevito left a comment:
Cool! This will make it easier to add IValue<->Python conversions and to support non-Tensor inputs/outputs.
I have two small but important changes:
- We need to maintain the invariant that stacks can have additional elements in them: the GraphExecutor should always pop exactly the last N elements off the stack, where N is the number of inputs to the graph. In all places where stacks are used, we already know the number of inputs from the graph (see the sketch after this list).
- We shouldn't expose the tag in IValue; it will make it difficult to change IValue's internal representation. This one is easy to work around.
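A minimal sketch of that pop-last-N convention, using simplified stand-ins for the JIT types (the `IValue`/`Stack` definitions and the `run` signature here are illustrative assumptions, not the PR's actual code):

```c++
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-ins for the real JIT types.
struct IValue { /* tagged union of Tensor, Double, Int, ... */ };
using Stack = std::vector<IValue>;

// The executor consumes only its own inputs: the last num_inputs
// elements, where num_inputs comes from the graph. Anything deeper
// on the stack belongs to the caller and is left untouched.
void run(Stack& stack, std::size_t num_inputs) {
  assert(stack.size() >= num_inputs);
  // Inputs live in [stack.end() - num_inputs, stack.end()).
  auto inputs_begin = stack.end() - static_cast<std::ptrdiff_t>(num_inputs);
  // ... execute the graph on those inputs ...
  // Pop the inputs, then push the outputs in their place:
  stack.erase(inputs_begin, stack.end());
  // stack.push_back(output0); stack.push_back(output1); ...
}
```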
torch/csrc/jit/argument_spec.h (outdated)
```c++
auto & pod = pods[i];
pod.kind = static_cast<uint32_t>(inputs[i].kind());
if (!inputs[i].isTensor()) continue;
const auto & t = inputs[i].toTensor();
```
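For context on the snippet above, here is a hedged sketch of the fixed-size-record idea behind it; the struct name and fields below are illustrative assumptions, not the actual contents of `argument_spec.h`:

```c++
#include <cstdint>

// Hypothetical: one fixed-size POD per input argument, so that an
// ArgumentSpec can be hashed and compared as raw bytes.
struct ArgumentInfoPOD {
  uint32_t kind;    // which IValue variant this input was
  uint8_t  defined; // tensor-only fields below are valid iff nonzero
  uint8_t  device;  // device index for tensor inputs
  int8_t   type;    // scalar type for tensor inputs
  uint8_t  dims;    // number of dimensions for tensor inputs
};
static_assert(sizeof(ArgumentInfoPOD) == 8,
              "keep the record fixed-size so byte-wise hashing stays valid");
```

Recording `kind` for every input, and skipping the tensor-only fields for non-Tensor inputs (as the `continue` above does), is what lets non-Tensor arguments participate in spec matching.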
torch/csrc/jit/graph_executor.cpp (outdated)
```diff
 // entry point where execution begins
-variable_tensor_list run(variable_tensor_list inputs) {
+void run(Stack & inputs) {
```
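With the new in-place signature, a call site pushes its inputs onto a stack, runs the executor, and reads the outputs back off the same stack. A minimal usage sketch, reusing the simplified `IValue`/`Stack` stand-ins from the earlier sketch (`GraphExecutor` here is a hypothetical handle, not the real class):

```c++
#include <utility>

// Hypothetical executor handle with the new in-place interface.
struct GraphExecutor { void run(Stack& stack); };

void call(GraphExecutor& executor) {
  Stack stack;
  stack.emplace_back();  // input 0 (e.g. a tensor)
  stack.emplace_back();  // input 1 (e.g. a scalar)
  executor.run(stack);   // pops its 2 inputs, pushes its outputs
  IValue output = std::move(stack.back());  // outputs are on top
  stack.pop_back();
}
```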
```diff
 captureInputs(*grad_fn, stack);

-auto stack = unwrapVariables(std::move(inputs));
+detachVariables(stack);
```
```diff
-//TODO: has graph executor work from a stack as well
-variable_tensor_list toutputs = executor->run(variable_tensor_list(std::move(tinputs)));
-stack.insert(stack.end(), toutputs.begin(), toutputs.end());
+executor->run(stack);
```
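The effect at this call site: instead of converting to a temporary `variable_tensor_list`, running the executor, and splicing the outputs back onto the stack, the caller hands the executor the stack itself and lets it pop its inputs and push its outputs in place, which is what resolves the old TODO.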
torch/csrc/jit/ivalue.h (outdated)
```c++
#define TORCH_FORALL_TAGS(_) \
  _(None) _(Tensor) _(Double) _(Int) _(Tuple) _(IntList) _(DoubleList)

enum class IValueKind : uint32_t {
```
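The review suggestion above (don't expose the tag) usually takes the following shape: keep the tag private and offer only predicates and checked accessors, so the internal representation can change without breaking callers. A hedged sketch, not the actual `ivalue.h` code:

```c++
#include <cstdint>
#include <stdexcept>

class IValue {
 public:
  static IValue fromDouble(double d) {
    IValue v;
    v.tag_ = Tag::Double;
    v.d_ = d;
    return v;
  }
  // Callers see only predicates and checked accessors...
  bool isDouble() const { return tag_ == Tag::Double; }
  double toDouble() const {
    if (!isDouble()) throw std::runtime_error("IValue is not a Double");
    return d_;
  }
 private:
  // ...so the tag and payload layout stay free to change.
  enum class Tag : uint32_t { None, Tensor, Double, Int };
  Tag tag_ = Tag::None;
  double d_ = 0;
};
```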
```diff
 for(size_t i = 0; i < spec.size(); ++i) {
-  graph.inputs()[i]->setType(spec.tensorInfo(i));
+  auto argspec = spec.at(i);
+  if (argspec.kind() != IValueKind::Tensor) continue;
```
Force-pushed from 67b6a51 to 668888b
Force-pushed from 668888b to f2c0a22
facebook-github-bot left a comment:
@apaszke has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
…9807)

Summary: Based on top of #9763 (the first 3 commits belong to that PR). The first commits from this PR are "Stop using attributes ...". I tried to separate the changes into fairly meaningful commits. I can't split them up into smaller PRs, because everything starts working and all tests pass only after the whole sequence, but hopefully this will make reviewing somewhat easier.

Known issues/regressions/future tasks:
- `aten::lerp` and `aten::clamp` are no longer fusable.
- `CreateAutodiffSubgraphs` needs a rewrite:
  - It is much more strict now, and will miss a lot of opportunities, especially when viewing ops are involved. Our previous approach was "ignore the assumption on shape availability in gradient formulas to determine differentiability, and hope that shape prop will be robust enough to actually deliver them before we differentiate", which obviously doesn't scale well to more complex cases. We should either work on reducing the size dependency of grad formulas (feasible e.g. for `view`/`reshape`, unfeasible for `squeeze`/`unsqueeze`), or make `CreateAutodiffSubgraphs` integrate some kind of "I could integrate this node into an AD subgraph, but will I be able to infer the shape of its input" reasoning (kind of like a limited shape prop that doesn't infer anything, and only tells whether it *could* infer something).
  - It sometimes creates constant-only (or constants + one node) graphs, which is useless.
- Broken `aten::add` in auto-batching, because it gained a non-tensor input. I changed the test for pointwise operations to use `aten::mul` instead, but I needed to disable the LSTM cell test. I'm not sure how scalar constants should be implemented in this case, because I don't fully understand our format. cc: ChunliF
- Graph import does some hacks to recover the types of constants. This code should be removed once we gain the ability to export the IR along with value types.
- There's still a fair amount of dead code that can be removed. I didn't want to make this diff any bigger, and removing it is an easy task.
- The graph fuser could be improved to use signature matching (possibly using `OperatorSet`) instead of relying on node kinds.
- Manual constant propagation for the `ListConstruct` node in `torch/onnx/utils.py` should be replaced with a proper constant-propagation pass (or we should ensure that the one we have handles at least this case before we remove this code).

zdevito

Pull Request resolved: #9807
Reviewed By: ezyang
Differential Revision: D9004285
Pulled By: apaszke
fbshipit-source-id: fe88026a765f6b687354add034c86402362508b7
…ytorch#9763)

Summary: This is blocking the IR operator unification, because I need to be able to pass scalars to backward functions.

zdevito

Pull Request resolved: pytorch#9763
Reviewed By: zou3519
Differential Revision: D8978457
Pulled By: apaszke
fbshipit-source-id: 570b4c3409322459cb0f2592069730a7d586ab20
This is blocking the IR operator unification because I need to be able to pass scalars to backward functions.
@zdevito
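To make the motivation concrete: once inputs travel as IValues on a stack, a backward graph can take a scalar argument alongside tensors. A minimal sketch with the same kind of simplified stand-ins used above (illustrative only; the real types live in `torch/csrc/jit`):

```c++
#include <vector>

// Hypothetical, simplified IValue that can carry a scalar, so that
// non-Tensor inputs share one stack with tensor inputs.
struct IValue {
  enum class Kind { None, Tensor, Double };
  Kind kind = Kind::None;
  double scalar = 0;  // valid only when kind == Kind::Double
};
using Stack = std::vector<IValue>;

int main() {
  Stack stack;
  IValue grad_output;  // stand-in for a tensor input
  grad_output.kind = IValue::Kind::Tensor;
  IValue exponent;     // the scalar a backward formula needs
  exponent.kind = IValue::Kind::Double;
  exponent.scalar = 2.0;
  stack.push_back(grad_output);
  stack.push_back(exponent);
  // backward_executor.run(stack);  // would pop both, push grad_input
}
```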