[wip] update graph fuser aliasdb in-place #37106
suo wants to merge 12 commits into gh/suo/306/base
Conversation
Recomputing the aliasdb on every fusion iteration + in every subblock is hugely expensive. Instead, update it in-place when doing fusion.

The graph fuser pass operates by pushing nodes into a fusion group. So we start with

```
x, y = f(a, b, c)
```

and end with:

```
x_out, y_out = prim::fusionGroup(a, b, c)
  x_in, y_in = f(a_in, b_in, c_in)
  -> x_in, y_in
```

We destroy the `x` and `y` `Value*`s in the process. This operation is easy to express as an update to the aliasDb: `x_out` just takes on all the aliasing information `x` used to have. In particular, since we know `f` and `prim::fusionGroup` are purely functional, we don't have to mess with any write information.

This PR is the bare minimum to get this working, in the interest of unscrewing the compilation times ASAP. After this change, on the baseline introduced in #36345 we go from ~193s to ~5.2s, bringing us to a reasonable range (although there are still substantial improvements that can be made).

Followups I want to do:
- We don't have a way of expressing deletion of values in AliasDb. In `graph_fuser.cpp` we sometimes construct nodes that we end up throwing away, and we are littering `MemoryDAG` with references to dangling pointers. Because of the way the pass works, it's fine, but this is fragile so I want to fix it.
- We should decouple alias analysis from write tracking, to simplify the job of keeping the write caches consistent as we mutate the aliasing information.
- The tensorexpr fuser doesn't do this and is thus incorrect today; we need to update it to work.

Differential Revision: [D21219179](https://our.internmc.facebook.com/intern/diff/D21219179)

[ghstack-poisoned]
I think this code is in the same place as shape analysis, where it's soon to be deleted and it's not on by default anywhere. Could you instead submit a PR speeding up https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/passes/tensorexpr_fuser.cpp if it's warranted?
This pass is still what the existing fusion compiler uses, and thus what people who are trying to fuse code today use, so we need these changes. I will also fix up the texpr fuser too though.
eellison
left a comment
Is there a POC for graph_fuser who can verify some of these changes? If not, I can try to understand all of how the graph fuser works, but I have not done that yet. Or a high-level explanation of why it's correct would help. "f and prim::fusionGroup are purely functional" may be true, but I'm not convinced all inputs and outputs don't have writers.
```diff
-void AliasDb::unsafeGiveFreshAlias(const Value* value) {
+void AliasDb::createValue(const Value* value) {
   TORCH_INTERNAL_ASSERT(isMutableTypeInternal(value->type()));
-  TORCH_INTERNAL_ASSERT(value->type()->containedTypes().size() == 0);
```
Why were these assertions removed ?
Adding contained types stuff to the mutation API wasn't necessary for this PR—but I think it's a good followup/could do it for this one if you feel that it's important.
sure it doesn't need to be done in this PR
```cpp
  return ret;
}

void Lint(const AliasDb* db) {
```
It's worth noting that not all passes maintain the invariants here as they are mutating the graph. For example, constant propagation inserts new Tensor Constants which will not have an element in the memory dag.
```cpp
 * `Lint()`ing the AliasDb can help with this.
 */
// Copy `existing`'s aliasing info to `new_value`, and remove `existing`.
void replaceWithNewValue(Value* existing, Value* new_value);
```
It's a little scary to make this a public API. I would maybe prepend `_` or `__` to all of these methods, or name them internal. Now that they are public, we no longer need `friend struct MutationRemover`.
I put a big warning in the comment, idk. The reason they are named this way is that it's the same as the LLVM alias analysis update API. I suppose we can add a `_` or something, but idk if it will deter a sufficiently motivated downstream user.
I believe we still need to friend MutationRemover because it manipulates write caches as well.
I would maybe put an internal clarification here, since we don't specify anywhere what updating the AliasDb correctly means.
```diff
 // dag so we must regive x an alias db element. We have already verified
 // that the mutated value is a fresh alias with a single use.
-aliasDb_->unsafeGiveFreshAlias(mutated_value);
+aliasDb_->createValue(mutated_value);
```
is this not still unsafe? the invariants for when this is okay are still very specific: the value has to not have contained elements and be unaliased / mutated in the graph.
See above, I wanted to basically copy an existing API so people would know what it did.
```cpp
auto* g = inputs[0]->owningGraph();
auto* input_list =
    g->insertNode(g->createList(TensorType::get(), inputs))->output();
aliasDb_->createValue(input_list);
```
It's a little unfortunate that we're creating a new value with a contained element
```cpp
// Two checks that we want to add but can't until the mutation API is more
// fully developed.
// - Every mutable value in the aliasdb belongs to the graph
// - All container values have contained elements
```
I think we're not checking this bc we break this invariant as part of this PR.
Yeah, the comment notes that it is not yet added pending a more robust mutation API.
The purpose of copying or replacing aliasing information is to preserve the information that some inputs or outputs have writers. Other than that, I tried to provide a high-level description in the PR summary of what graph fusion is doing and why I think the changes are correct. The lint pass ensures that we're not missing the propagation of aliasing information, so the main place where we might be wrong is if we're copying the wrong aliasing information, but our replacements follow
eellison
left a comment
LGTM, this increases complexity but at a great speedup. As you said, the fusion graphs are functional so there shouldn't be too much to worry about here. This is also off by default.
…tes to aliasDb"

Update the alias db in-place instead of constructing it from scratch on each change, which causes O(n^2) behavior. The description from #37106 holds pretty well:

"""
Recomputing the aliasdb on every fusion iteration + in every subblock is hugely expensive. Instead, update it in-place when doing fusion.

The graph fuser pass operates by pushing nodes into a fusion group. So we start with `x, y = f(a, b, c)` and end with:

```
x_out, y_out = prim::fusionGroup(a, b, c)
x_in, y_in = f(a_in, b_in, c_in)
  -> x_in, y_in
```

We destroy the x and y Value*s in the process. This operation is easy to express as an update to the aliasDb--x_out just takes on all the aliasing information x used to have. In particular, since we know f and prim::fusionGroup are purely functional, we don't have to mess with any write information.
"""

The one difficulty here is that mapping x, y to x_out, y_out is not trivial when merging nodes into the autodiff subgraph node. There are a few options:
- attempt to make all subgraph utils & IR cloning logic update a map
- mirror the subgraph utils implementation in create_autodiff_subgraph
- uniquely name x, y and x_in, y_in so you can back out the correspondence

I went with the third option. This shouldn't affect the results of the pass at all. LMK if you think there's anything else I should be doing to test; I was thinking about maybe exposing an option to run create autodiff subgraphs without the post processor and checking that the alias db was correctly updated.

[ghstack-poisoned]
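The third option above — uniquely naming values before merging so the outer/inner correspondence can be recovered afterwards — can be sketched as follows. The helper names (`unique_name`, `clone_into_subgraph`) are hypothetical stand-ins, not the actual subgraph-utils code:

```python
import itertools

_uid = itertools.count()

def unique_name(base):
    # Tag a value with a process-unique suffix before merging, so that after
    # cloning into the subgraph, the inner clone (x_in) can be matched back
    # to the outer output (x_out) by name alone.
    return f"{base}.uniq{next(_uid)}"

def clone_into_subgraph(values):
    # Stand-in for IR cloning: the clone preserves value names verbatim.
    return list(values)

outer = [unique_name("x"), unique_name("y")]
inner = clone_into_subgraph(outer)

# Back out the correspondence purely from the unique names, with no need
# for the cloning logic itself to maintain a mapping.
correspondence = {o: i for o in outer for i in inner if o == i}
print(len(correspondence))  # → 2
```

This is why the cloning utilities don't need to be taught about the map: uniqueness of the names alone is enough to reconstruct it after the fact.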
Stack from ghstack:
Differential Revision: D21219179
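For intuition on why the in-place update matters: rebuilding the alias db after each of n fusion steps does O(n) work per step, i.e. O(n^2) total, while the in-place update does O(1) work per step. A toy work count (purely illustrative, not a measurement — the ~193s to ~5.2s numbers above are the actual data):

```python
def fuse_with_recompute(n):
    # Rebuild an n-entry alias table after every one of n fusion steps.
    work = 0
    for _ in range(n):
        work += n  # a full recompute touches every value
    return work

def fuse_in_place(n):
    # Update only the replaced value's entry at each step.
    work = 0
    for _ in range(n):
        work += 1
    return work

print(fuse_with_recompute(1000))  # → 1000000
print(fuse_in_place(1000))        # → 1000
```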