Generate Dynamic Shapes#37693
Conversation
💊 CI failures summary and remediationsAs of commit 399704a (more details on the Dr. CI page):
1 job timed out:
🚧 1 ongoing upstream failure:These were probably caused by upstream breakages that are not fixed yet:
ci.pytorch.org: 1 failedThis comment was automatically generated by Dr. CI (expand for details).Follow this link to opt-out of these comments for your Pull Requests.Please report bugs/suggestions on the GitHub issue tracker. This comment has been revised 33 times. |
zdevito
left a comment
There was a problem hiding this comment.
Partial review: I didn't get to profiling_record.cpp/h yet, Ill try to that part tonight.
| push(stack, pttp->isSubtypeOf(expected)); | ||
| const TypePtr& expected = af.types[inst.X]; | ||
| auto expected_type = expected->cast<TensorType>(); | ||
| bool pass = true; |
There was a problem hiding this comment.
This looks more complicated than it needs to be
There was a problem hiding this comment.
agreed this can be simplified.
| Function** functions; | ||
| std::function<void(std::vector<IValue>&)>* profile_functions; | ||
| TypePtr* types; | ||
| ShapeSymbolTable symbols2dims; |
There was a problem hiding this comment.
ActiveFrame is a way of keeping a bunch of pointers on the stack in the interpreter rather than reloading after every instruction. But here it is holding an entire hash table, which is destroyed every time we enter/exit a function. This symbol table should be local to Frame, not ActiveFrame, I believe.
There was a problem hiding this comment.
ah yes... i think i actually moved it in one of the branches :-( but I guess I lost it in one of the rebases.
| pass &= bindSymbolicShapes( | ||
| af, t.sizes(), expected_type->symbolic_sizes()); | ||
| if (pass) { | ||
| pttp = expected_type->merge(pttp, false); |
There was a problem hiding this comment.
I think the composition of bindSymbolicShapes, merging and isSubtypeOf here is not ideal. First, it is a bit hard to follow direct intent of tracking whether the guard matches. I am still not sure what merge, and false is suppose to be here. Second, this is in performance critical code. isSubtypeOf checks are slow, and there is a lot we can do here to combine the bindSymbolicShape, and isSubtypeOf together to make a single pass over the tensor. Finally, I am confused as to whether returning false from bindSymbolicShapes will cause the guard to fail. I think it should, since we found evidence of the program no longer passing its profiled model, but it doesn't directly lead to a push false, so I am not sure.
There was a problem hiding this comment.
it doesn't directly lead to a push false, so I am not sure.
yes it's not too obvious that expected_type either contains static or symbolic symbols and isSubtypeOf will fail when comparing with pttp which only has static symbols unless they are either the same or we run bindSymbolicShapes . I explicitly added && bound_successfully && to push.
zdevito
left a comment
There was a problem hiding this comment.
This is getting there -- inline comments highlight a few places where I can't follow the data structures making the logic difficult to understand.
| GRAPH_DEBUG(symbol, " is now bound to ", new_size); | ||
| new_symbols.push_back(symbol); | ||
| } else { | ||
| if (symbol_table.getValue(symbol) == new_sizes[i]->static_size()) { |
| c10::ShapeSymbol ShapeSymbolTable::toSymbol( | ||
| Dimension val, | ||
| std::map<Dimension, c10::ShapeSymbol>& dims2symbols, | ||
| ProfilingRecord* pr) { |
| c10::ShapeSymbol ShapeSymbolTable::GetSymbolInSet( | ||
| Dimension new_size, | ||
| c10::ShapeSymbol set, | ||
| ProfilingRecord* pr) { |
| namespace torch { | ||
| namespace jit { | ||
|
|
||
| c10::ShapeSymbol ShapeSymbolTable::toSymbol( |
There was a problem hiding this comment.
I don't understand the name toSymbol, getOrCreateSymbolForDim? Shouldn't this be a method on whatever owns dims2symbols? It is weird that this is taking the map as an argument.
There was a problem hiding this comment.
I removed toSymbol and added a comment to getSymbolInSet
| std::shared_ptr<Graph> profiled_graph_; | ||
| std::mutex mutex_; | ||
| size_t profiling_count_; | ||
| std::map<int64_t, std::map<Value*, TensorTypePtr>> profiled_types_per_frame_; |
There was a problem hiding this comment.
I can't follow this data structure in the code. There are no comments about what it stores, and it results in a lot of "x.first" and "x.second" confusion.
| void reset() { | ||
| sets_.clear(); | ||
| }; | ||
| std::map<c10::ShapeSymbol, std::map<Dimension, c10::ShapeSymbol>> sets_; |
There was a problem hiding this comment.
I don't understand what this stores, it is a nested map of ShapeSymbol to Dimension to Shape symbol?
There was a problem hiding this comment.
I added a comment.
We are implementing the algorithm we discussed but the implementation merges by every dimension location, not by a whole set at once. It makes the implementation so much simpler, but it's a bit of a break from the conceptual algorithm we discussed.
// A helper structure used for partitioning
// a set associated with ShapeSymbol into
// subsets each associated with a unique
// Dimension (dimension value)
// if the same symbol is
// assigned to multiple `Dimension` in one of the
// profiling runs the set of dimension locations
// associated with the set will be split into
// two subsets each associated with the new Dimension
// value
struct ShapeSymbolSets {
zdevito
left a comment
There was a problem hiding this comment.
This looks good now. I have minor nits about the speed along the hot path but that can be done in a different PR since one seems functionally correct and the profiler is not on by default.
| continue; | ||
| } | ||
| auto symbol = *sym_shapes[i]; | ||
| if (!isBound(symbol)) { |
There was a problem hiding this comment.
This is in the hot path, so be careful about hash table lookups, this is doing two lookups even though it only needs to be doing 1.
| TORCH_INTERNAL_ASSERT(!s.static_size()); | ||
| data_[s] = v; | ||
| } | ||
| std::map<c10::ShapeSymbol, Dimension> data_; |
There was a problem hiding this comment.
unordered map? A red-black tree along the fast path is going to be pretty slow.
| // to assign dimension values to ShapeSymbols | ||
| // and fail a guard if the same symbol | ||
| // is assigned more than one dimension value. | ||
| struct ShapeSymbolTable { |
There was a problem hiding this comment.
A class used by the interpreter but not specific to profiling should probably go in a different file.
| // * For every two runs, we iterate over all `symbic_shapes_` and compare their `ShapeSymbol`s in the same position. | ||
| // * if we observe that for every dimension location that has the`ShapeSymbol S1` in run #1 there is **only one** `ShapeSymbol S2` in the same dimension location in run #2, we conclude that the invariant holds. | ||
| // * However, if we observe some dimension locations in run #2 have `ShapeSymbol S2` and the other ones have `ShapeSymbol S3` we would like to partition the virtual set of dimension locations associated with `ShapeSymbol S1` into two new subsets, so the invariant holds. | ||
| // * The partitioning works by assigning a new symbol to the dimension locations (associated with `ShapeSymbol S1`) that have `ShapeSymbol S2` and another new symbol to the dimension locations that have `ShapeSymbol S3`. In other words, |
There was a problem hiding this comment.
Thanks for adding this comment, it makes it much clearer what is happening in the merging algorithm.
d612343 to
6160115
Compare
facebook-github-bot
left a comment
There was a problem hiding this comment.
@Krovatkin has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
facebook-github-bot
left a comment
There was a problem hiding this comment.
@Krovatkin has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
facebook-github-bot
left a comment
There was a problem hiding this comment.
@Krovatkin has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
51e8b78 to
2d8035b
Compare
facebook-github-bot
left a comment
There was a problem hiding this comment.
@Krovatkin has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
address Zach's feedback address Zach feedback rename data structures; cleanly separate ShapeSymbolTable from PartitioningSetHelper address feedback add hash function for shapesymbols fix bbrks rollback 3rd party deps swith back to map clang-format
2d8035b to
399704a
Compare
facebook-github-bot
left a comment
There was a problem hiding this comment.
@Krovatkin has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
|
@Krovatkin merged this pull request in 42870dd. |
Summary: Yay! Pull Request resolved: pytorch#37693 Differential Revision: D21641663 Pulled By: Krovatkin fbshipit-source-id: 64e70138b31800371887d24ceb1c5d18945b4412
Yay!