Skip to content

Topologically-safe node moves#13026

Closed
suo wants to merge 4 commits intomasterfrom
suo/mut-position
Closed

Topologically-safe node moves#13026
suo wants to merge 4 commits intomasterfrom
suo/mut-position

Conversation

@suo
Copy link
Copy Markdown
Member

@suo suo commented Oct 23, 2018

Add new methods to move a node before/after another node while preserving data data dependencies.

Any suggestions for a pithier name for the methods would be appreciated 😃

@suo suo added the oncall: jit Add this issue/PR to JIT oncall triage queue label Oct 23, 2018
@suo suo requested a review from zdevito October 23, 2018 23:59
@suo suo force-pushed the suo/mut-position branch from afa7b1a to 8e89980 Compare October 24, 2018 00:00
Copy link
Copy Markdown
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

michaelsuo has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@suo suo force-pushed the suo/mut-position branch 2 times, most recently from 5cffb33 to c271b03 Compare October 25, 2018 23:34
Copy link
Copy Markdown
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

michaelsuo has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@suo suo force-pushed the suo/mut-position branch from c271b03 to 8c8bef3 Compare October 26, 2018 00:06
@yf225
Copy link
Copy Markdown
Contributor

yf225 commented Oct 26, 2018

@pytorchbot retest this please

Copy link
Copy Markdown
Contributor

@zdevito zdevito left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! See comments for where I think there is a fixable bug. I also think it may be possible to remove the MoveSide thing by phrasing everything as moveBefore.

Comment thread torch/csrc/jit/ir.h Outdated
// Move 'this' (already in the graph) after 'n' in the topological order.
//
// Tries to preserve value dependencies, so other nodes might be moved. The
// only guarantee is that in the new ordering, `this` is after `n`.

This comment was marked as off-topic.

This comment was marked as off-topic.

Comment thread torch/csrc/jit/ir.cpp Outdated
}

// Does `this` output any values used by `nodes`?
template <typename T>

This comment was marked as off-topic.

This comment was marked as off-topic.

Comment thread torch/csrc/jit/ir.cpp Outdated

// Does the working set consume any values produced by `n`?
bool consumesFrom(Node* n) const {
const auto& users = getUsersSameBlock(n);

This comment was marked as off-topic.

Comment thread torch/csrc/jit/ir.cpp
Comment thread torch/csrc/jit/ir.cpp
Comment thread torch/csrc/jit/ir.cpp
Comment thread torch/csrc/jit/ir.cpp Outdated
bool Node::tryMove(Node* movePoint, MoveSide moveSide) {
JIT_ASSERT(this->inBlockList() && movePoint->inBlockList());
JIT_ASSERT(this->owningBlock() == movePoint->owningBlock());
JIT_ASSERT(this != movePoint);

This comment was marked as off-topic.

Comment thread torch/csrc/jit/ir.cpp
Comment thread torch/csrc/jit/ir.cpp
@suo suo force-pushed the suo/mut-position branch from 8c8bef3 to 961766e Compare October 26, 2018 18:44
Copy link
Copy Markdown
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

michaelsuo has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@suo suo force-pushed the suo/mut-position branch from 961766e to 0aac01e Compare October 26, 2018 20:20
@suo suo force-pushed the suo/mut-position branch from 0aac01e to 65c1823 Compare October 26, 2018 20:25
Copy link
Copy Markdown
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

michaelsuo has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@suo suo deleted the suo/mut-position branch October 27, 2018 00:19
@apaszke
Copy link
Copy Markdown
Contributor

apaszke commented Oct 28, 2018

How does this relate to the DynamicDAG data structure that @zou3519 added for autodiff subgraph search? Is this meant to be faster/easier to maintain?

@suo
Copy link
Copy Markdown
Member Author

suo commented Oct 29, 2018

There's a lot of overlap; I think the main difference is that the topological index is located directly in the graph and so it stays updated/correct by construction. This makes things simpler for pass authors, who only have to deal with the graph and its canonical topological ordering.

As it is currently implemented it seems impractical to embed DynamicDAG into the graph, because common operations like node insertion would become quite expensive.

I think a good path to converging would be to optimize DynamicDAG a bit around some of the performance cliffs, and then embed it into the IR so that passes don't have to construct their own copy every time.

@apaszke
Copy link
Copy Markdown
Contributor

apaszke commented Oct 30, 2018

Hmmm could you please comment a bit more on the performance characteristics of this solution that are lacking from the DynamicDAG? The algorithm generally is pretty solid, and generally it’s not necessarily true that insertions would become expensive. It’s also fundamentally an online solution with proofs of correctness, so it’s an appealing solution.

@zou3519
Copy link
Copy Markdown
Contributor

zou3519 commented Oct 30, 2018

@apaszke This solution solves a different problem from DynamicDAG

There is one main difference:

  1. moveAfterTopologicalValid/moveBeforeTopologicalValid care about if a node is directly before or after another node. DynamicDAG cares about if nodes come before or after (not necessarily directly before or after) each other.

edit: Sorry, I realized from reading the code that I don't know how moveAfterTopologicalValid/moveBeforeTopologicalValid work at all, so please ignore this comment. I will let @michaelsuo explain.

@zou3519
Copy link
Copy Markdown
Contributor

zou3519 commented Oct 30, 2018

As it is currently implemented it seems impractical to embed DynamicDAG into the graph, because common operations like node insertion would become quite expensive.

This is not true. We can modify DynamicDAG to have "gaps" between nodes in topological index, just like what you implemented last week, @michaelsuo, to make node insertions fast.

Comment thread test/cpp/jit/tests.h

// Test that moveAfter(n) and moveBefore(n->next()) are not necessarily
// equivalent. Here, the dependency ordering is n -> o -> p. So we can't
// move `n` after `o`, but we can move `n` before `p` (which pushes `o` after

This comment was marked as off-topic.

This comment was marked as off-topic.

@suo
Copy link
Copy Markdown
Member Author

suo commented Oct 30, 2018

The main performance concern was that addEdge is AR * log (AR), but the ord of new node is set to the end. So if you inserted new nodes anywhere except the end, AR could be quite large and you get quadratic insertion behavior.

My other concerns were around maintainability. I don't think pass authors should have to construct a separate data structure every time they want to do reordering, and having ord be out of sync with the the graph's canonical topological ordering is potentially confusing.

But like I originally said, I think all these problems are fixable by modifying DynamicDAG (adding "gaps" like @zou3519 describes) and then making it part of the graph interface. I think the ideal end state is we basically merge the topological index stuff and DynamicDAG together to implement moveBeforeTopologicallyValid.

@zdevito
Copy link
Copy Markdown
Contributor

zdevito commented Oct 30, 2018

I think it is also important to keep in mind that the reason for making these queries about valid moves centralized is because mutability is going to make a bunch of moves that appear to be topologically valid become invalid. We can to limit the number of places that have to interact with the metadata about mutating operators.

@ezyang ezyang added the merged label Jun 25, 2019
laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request Apr 24, 2026
Summary:
Add new methods to move a node before/after another node while preserving data data dependencies.

Any suggestions for a pithier name for the methods would be appreciated 😃
Pull Request resolved: pytorch#13026

Differential Revision: D10854574

Pulled By: QueryConnectionException

fbshipit-source-id: b42751cac18d1e23940e35903c8e6a54a395292e
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

oncall: jit Add this issue/PR to JIT oncall triage queue

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants