Use comparison key in OpSchema to avoid duplicate work between `__hash__` and `__eq__` #161234

swolchok wants to merge 2 commits into gh/swolchok/789/base

Conversation
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/161234

✅ No failures as of commit 513deee with merge base a85711d.
```python
self._comparison_key = (self.op, args_to_hash)

def __hash__(self) -> int:
    return hash(self._comparison_key)
```
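The pattern under review — one precomputed key shared by `__hash__` and `__eq__` — can be sketched in isolation. `Schema`, its fields, and the cached-strategy dict below are illustrative stand-ins, not the real `OpSchema`:

```python
from dataclasses import dataclass

@dataclass(eq=False)
class Schema:
    """Toy stand-in for OpSchema (field names are illustrative)."""
    op: str
    args_schema: tuple

    def __post_init__(self) -> None:
        # Built once, so __hash__ and __eq__ no longer each walk the args
        # on every dict lookup.
        self._comparison_key = (self.op, self.args_schema)

    def __hash__(self) -> int:
        return hash(self._comparison_key)

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, Schema):
            return NotImplemented
        return self._comparison_key == other._comparison_key

cache = {Schema("aten.add", (1, 2)): "cached strategy"}
assert cache[Schema("aten.add", (1, 2))] == "cached strategy"
```

Note `eq=False` so the dataclass machinery leaves the hand-written `__eq__`/`__hash__` pair alone.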
> NOTE: this must be used as a read only data class

If we take this literally, then I'd say: why not precompute the hash value and store that instead of storing the tuple? Are you just hedging here, in case someone is mutating fields?
I think this is likely due to the function `_inplace_rewrap_schema_suggestion`, where args and kwargs are mutated. This function is quite old. We can mark a TODO here and come back when we clean up the DTensor prop logic and make `OpSchema` immutable.
OK, this sounds reasonable to me; let's do it. I'm stamping. @XilunWu can you open a PR to just add this TODO?
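For reference, the alternative floated above — precomputing the hash *value* rather than the key tuple — would look roughly like this. `HashCached` is a hypothetical sketch, not code from this PR:

```python
class HashCached:
    """Hypothetical: cache the hash int, keep the key for __eq__."""

    def __init__(self, op, args):
        self.op = op
        self.args = args
        self._key = (op, args)        # still needed for equality checks
        self._hash = hash(self._key)  # __hash__ becomes a field read

    def __hash__(self):
        return self._hash

    def __eq__(self, other):
        if not isinstance(other, HashCached):
            return NotImplemented
        # Cheap hash reject before the full tuple comparison.
        return self._hash == other._hash and self._key == other._key
```

It saves little over hashing a cached tuple, and it goes just as stale if fields are mutated — hence the read-only caveat in this thread.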
```python
if len(self.args_schema) != len(other.args_schema):
    return False

# compare each element and early return if any of them is different
```
Although the code here isn't directly moved into the comparison key, I confirmed that the `args_to_hash` and `kwargs_to_hash` members are equivalent to the logic below. LGTM.
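The equivalence being confirmed leans on tuple semantics: comparing two tuples with `==` already performs the length check and element-wise early exit that the hand-written loop did. An illustrative check (not the DTensor code itself):

```python
def old_style_eq(a, b):
    # Shape of the removed hand-rolled comparison.
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if x != y:  # early return on first mismatch
            return False
    return True

cases = [((1, 2, 3), (1, 2, 3)), ((1, 2), (1, 2, 3)), ((1, 9), (1, 2))]
# Built-in tuple equality gives the same answer in every case.
assert all(old_style_eq(a, b) == (a == b) for a, b in cases)
```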
```python
new_arg_schema.append(arg)
self.args_schema = tuple(new_arg_schema)
self.kwargs_schema = origin_schema.kwargs_schema
self._recompute_comparison_key()
```
Hmm, how did you confirm these are the only two places that call recompute? Ideally we wouldn't have to call it anywhere but `__post_init__`, but this makes me wonder.
It needs to be called anywhere somebody mutates `args_schema` or `kwargs_schema`. The class has already been "supposed to be read-only" for years, and I took that as a given. If you want it validated exhaustively, then we will have to skip this optimization (but note that for any Python class, somebody could break your invariants if they wanted to by messing with stuff they've been asked not to touch).
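To illustrate why the read-only assumption matters: if a mutation site forgets to recompute, the cached key silently goes stale. A toy model, not `OpSchema` itself:

```python
class Keyed:
    """Toy model of a comparison-key cache over mutable fields."""

    def __init__(self, args):
        self.args = args
        self._recompute_comparison_key()

    def _recompute_comparison_key(self):
        self._comparison_key = tuple(self.args)

    def __hash__(self):
        return hash(self._comparison_key)

    def __eq__(self, other):
        return (isinstance(other, Keyed)
                and self._comparison_key == other._comparison_key)

k = Keyed([1, 2])
k.args.append(3)               # mutation the class was not designed for
assert k == Keyed([1, 2])      # stale: still compares as the OLD contents
assert k != Keyed([1, 2, 3])   # the mutated value is invisible
k._recompute_comparison_key()  # what every mutation site must remember
assert k == Keyed([1, 2, 3])
```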
I am a bit concerned about this: `_inplace_rewrap_schema_suggestion`, where the `OpSchema` object gets mutated, is used in `torch/distributed/tensor/_sharding_prop.py` (lines 361 to 367 at 47d2673). Though, we don't need the hashing for `suggestion_schema` afterwards.
I was going to suggest adding `self._recompute_comparison_key()` at the end of `_inplace_rewrap_schema_suggestion` for now to be safe, but this will certainly introduce extra overhead compared to before.
> suggest add `self._recompute_comparison_key()` at the end of `_inplace_rewrap_schema_suggestion` for now to be safe

Great suggestion. We are replying to a comment thread about the line that implements it. :)
Starting merge as part of PR stack under #161285
`self is other` means the same thing as `id(self) == id(other)`, but it's one operator instead of 3. Pull Request resolved: #161235 Approved by: https://github.com/wconstab, https://github.com/zpcore, https://github.com/fduwjj ghstack dependencies: #161231, #161234
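The `is`/`id` equivalence claimed in #161235 is easy to check; the class here is illustrative, not the PyTorch code:

```python
class Tensorish:
    def __eq__(self, other):
        # One identity check instead of two id() calls plus a comparison.
        if self is other:
            return True
        return NotImplemented

a, b = Tensorish(), Tensorish()
assert (a is b) == (id(a) == id(b))  # both False: distinct objects
assert (a is a) == (id(a) == id(a))  # both True: same object
assert a == a
```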
…161240) get_write_alias() call count reduction explained briefly in code comment. We don't need to check write_aliases against None in the final outs_to_return calculation because we just did that check. Pull Request resolved: #161240 Approved by: https://github.com/wconstab ghstack dependencies: #161231, #161234, #161235
…ly_alias_match (#161284) Containers are truthy iff they're non-empty. Pull Request resolved: #161284 Approved by: https://github.com/Skylion007, https://github.com/wconstab ghstack dependencies: #161231, #161234, #161235, #161240
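The truthiness rule cited in #161284 ("containers are truthy iff they're non-empty") is what lets a plain `if not seq:` replace `len(seq) == 0`-style checks:

```python
# Empty built-in containers are falsy; non-empty ones are truthy.
for empty in ([], (), {}, set(), "", range(0)):
    assert not empty
for nonempty in ([0], (None,), {"k": 0}, {0}, " ", range(1)):
    assert nonempty
```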
…h__` and `__eq__` (pytorch#161234) The performance cost of `dict` lookups keyed by `OpSchema` is a significant minority of DTensor overhead. With this change we shave a net ~1% off the total running time of the benchmark from pytorch#160580, as measured by using cProfile and comparing cumulative time spent in propagate + OpSchema's `__post_init__`. (`__post_init__` grew from 2.5% to 6.4% (+3.9%) and propagate shrank from 12.5% to 7.8% (-4.7%)). Pull Request resolved: pytorch#161234 Approved by: https://github.com/wconstab ghstack dependencies: pytorch#161231
…#161285) Drives down the overhead of return_and_correct_storage_aliasing slightly. Hopefully you'll agree it doesn't compromise readability. Pull Request resolved: pytorch#161285 Approved by: https://github.com/wconstab ghstack dependencies: pytorch#161231, pytorch#161234, pytorch#161235, pytorch#161240, pytorch#161284
Stack from ghstack (oldest at bottom):

- Use `is`, not `==`, to check exact type matches in _python_dispatch #161304
- Use comparison key in OpSchema to avoid duplicate work between `__hash__` and `__eq__` #161234 (this PR)

The performance cost of `dict` lookups keyed by `OpSchema` is a significant minority of DTensor overhead. With this change we shave a net ~1% off the total running time of the benchmark from #160580, as measured by using cProfile and comparing cumulative time spent in propagate + OpSchema's `__post_init__`. (`__post_init__` grew from 2.5% to 6.4% (+3.9%) and propagate shrank from 12.5% to 7.8% (-4.7%).)
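The cProfile methodology described above — comparing cumulative time of specific functions across runs — can be sketched like this; `workload` is a stand-in for the #160580 benchmark, not the actual DTensor code:

```python
import cProfile
import pstats

def workload():
    # Stand-in for the DTensor benchmark being profiled.
    return sum(i * i for i in range(100_000))

with cProfile.Profile() as prof:  # context-manager form, Python 3.8+
    workload()

stats = pstats.Stats(prof)
# Cumulative time per function is what the percentages above compare
# (cumtime of propagate vs. OpSchema.__post_init__ in the real run).
stats.sort_stats("cumulative").print_stats(10)
```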
cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @pragupta