
Implement MM fusion (MM with add reduction tree)#4615

Merged
apaszke merged 5 commits into master from batch_mm on Jan 17, 2018
Conversation

@apaszke
Contributor

@apaszke apaszke commented Jan 11, 2018

A tree where the leaves are matrix multiplies and the inner
vertices are adds can be computed as a single mm.
Such subgraphs often appear in the backward pass when a single
weight is reused multiple times (e.g. in RNNs).

NOTE: this seems to be slightly slower on the GPU than the
naive implementation, but it's a huge win on the CPU
(think 100x lower overhead)

@apaszke apaszke requested review from ezyang and zdevito January 11, 2018 21:46
@pytorchbot
Collaborator

@apaszke, thanks for your PR! We identified @zdevito to be a potential reviewer.

// This pass looks for trees in the graph, where leaves are mm ops, and the inner
// vertices are add nodes. Once we have such a tree they can be reduced to two
// concats and a single mm (basically into a single multiply of a wide matrix, with
// a tall matrix).

// Such patterns show up mostly in backward of RNNs, since the derivative of many
// uses of matrix multiplies with same weights forms exactly such a tree
// (note that it's usually also highly imbalanced i.e. has O(n) depth).


return arr;
}

struct TreeToken {


Comment thread torch/csrc/jit/passes/batch_mm.cpp Outdated
std::array<int64_t, 2> lhs_sizes;
std::array<int64_t, 2> rhs_sizes;
Node *node = nullptr;
bool valid = false;


Comment thread torch/csrc/jit/passes/batch_mm.cpp Outdated

static TreeToken from_mm(Node *mm) {


// TreeTokens will be used to label nodes of the graph, if the nodes fit
// our mm/add tree pattern. Basically we do dynamic programming on DAGs:
// when we reach a node N with inputs A and B, then A and B have already been
// processed, and we can try to unify their TreeTokens (if they have them)


Contributor

@zdevito zdevito left a comment


This looks good! Clean and understandable.

// | R2 |
// | |
// +------+
// +------+------+ +------+


// If we ever get around to implementing this, the right solution is probably
// to fuse MMs for the common part, and assume it's an input leaf for the outer
// two parts (I don't think it's beneficial to recompute, unless the subtree is
// super small, but let's not get into such details).


// See Note [Overlapping trees]
if (&l == &r || !l.is_root || !r.is_root)
return token;
// We can batch the tree only if all sizes match, because we need to


// See Note [Overlapping trees] (regarding the uses().size() == 1 check)
// We could treat a subtree with multiple uses as if it was overlapping.
if (lhs_it != tokens.end() && rhs_it != tokens.end() &&
lhs->output()->uses().size() == 1 && rhs->output()->uses().size() == 1) {


if (auto token = TreeToken::unify(node, lhs_it->second, rhs_it->second))
tokens[node] = token;


Comment thread torch/csrc/jit/passes/batch_mm.cpp Outdated
// topological order and labeling nodes with TreeTokens. Then, we look for roots of
// the trees we formed and fuse them.

enum class Side {


@apaszke
Contributor Author

apaszke commented Jan 13, 2018

Hmm, the Windows CI builds seem to be failing at the test stage... Any ideas about non-standard things I could have used? cc: @peterjc123

Comment thread torch/csrc/jit/type.h
const std::vector<std::int64_t>& strides() const { return strides_; }

TypePtr withSizesStrides(const std::vector<std::int64_t>& sizes, const std::vector<std::int64_t>& strides) const {
TypePtr withSizesStrides(at::IntList sizes, at::IntList strides) const {


@peterjc123
Collaborator

@apaszke Sorry, I don't have much insight into how to debug this. If this one can't pass, then how about skipping the tests, listing it in #4092, and waiting for future fixes?

@yf225 yf225 mentioned this pull request Jan 15, 2018
@apaszke apaszke merged commit 1a02d3a into master Jan 17, 2018
@apaszke apaszke deleted the batch_mm branch January 17, 2018 20:36
laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request Apr 24, 2026
Implement MM fusion (MM with add reduction tree)
5 participants