
Use comparison key in OpSchema to avoid duplicate work between __hash__ and __eq__ (#161234)

Closed
swolchok wants to merge 2 commits into gh/swolchok/789/base from gh/swolchok/789/head

Conversation

swolchok (Contributor) commented Aug 22, 2025

Stack from ghstack (oldest at bottom):

The performance cost of `dict` lookups keyed by `OpSchema` is a
significant minority of DTensor overhead. With this change we shave a
net ~1% off the total running time of the benchmark from #160580, as
measured by using cProfile and comparing cumulative time spent in
propagate + OpSchema's `__post_init__` (`__post_init__` grew from
2.5% to 6.4% (+3.9%) and propagate shrank from 12.5% to 7.8% (-4.7%)).

cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @pragupta
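The shape of the change can be sketched as follows. This is a minimal, hypothetical `MiniOpSchema` standing in for the real dataclass; the actual `OpSchema` builds its key from the op and its DTensorSpec-bearing args/kwargs rather than plain tuples:

```python
from dataclasses import dataclass


# Minimal sketch of the comparison-key idea; MiniOpSchema is a hypothetical
# stand-in for the real OpSchema. eq=False tells the dataclass machinery to
# leave our hand-written __eq__ and __hash__ alone.
@dataclass(eq=False)
class MiniOpSchema:
    op: str
    args_schema: tuple

    def __post_init__(self) -> None:
        self._recompute_comparison_key()

    def _recompute_comparison_key(self) -> None:
        # Built once up front; both __hash__ and __eq__ reuse it instead of
        # re-walking the args on every dict probe.
        self._comparison_key = (self.op, self.args_schema)

    def __hash__(self) -> int:
        return hash(self._comparison_key)

    def __eq__(self, other: object) -> bool:
        if self is other:
            return True
        if not isinstance(other, MiniOpSchema):
            return False
        return self._comparison_key == other._comparison_key
```

Because `__hash__` and `__eq__` both delegate to the cached tuple, a dict probe does the expensive argument traversal zero times instead of once per hash plus once per equality check.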

pytorch-bot bot commented Aug 22, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/161234

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 513deee with merge base a85711d:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

pytorch-bot bot added the ciflow/inductor and oncall: distributed labels Aug 22, 2025
swolchok requested review from XilunWu and wanchaol, and removed the request for wanchaol, August 22, 2025 03:16
swolchok added the topic: not user facing label Aug 22, 2025
swolchok requested a review from wanchaol August 22, 2025 16:05
    self._comparison_key = (self.op, args_to_hash)

def __hash__(self) -> int:
    return hash(self._comparison_key)
Contributor:

> NOTE: this must be used as a read only data class

If we take this literally, then I'd ask: why not precompute the hash value and store that, instead of storing the tuple? Are you just hedging here, in case someone is mutating fields?
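The precompute-the-hash alternative raised here might look like the following hypothetical sketch. Note that caching only the integer hash still leaves `__eq__` needing a field-by-field comparison, because equal hashes do not imply equal objects:

```python
class HashCachedSchema:
    """Hypothetical variant caching hash() itself rather than the key tuple."""

    def __init__(self, op: str, args_schema: tuple) -> None:
        self.op = op
        self.args_schema = args_schema
        # One int instead of a tuple -- smaller, but only usable as a
        # fast-path filter in __eq__, not as the equality test itself.
        self._hash = hash((op, args_schema))

    def __hash__(self) -> int:
        return self._hash

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, HashCachedSchema):
            return NotImplemented
        if self._hash != other._hash:
            return False  # unequal hashes guarantee unequal objects
        # Equal hashes can collide, so fall back to comparing fields.
        return self.op == other.op and self.args_schema == other.args_schema
```

Storing the tuple, as the PR does, keeps `__eq__` a single comparison; storing only the hash saves a little memory but pushes the full field walk back into `__eq__` on every hash collision or genuine match.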

Contributor:

I think this is likely due to the function _inplace_rewrap_schema_suggestion, where args and kwargs are mutated. That function is quite old. We can mark a TODO here and come back when we clean up the DTensor prop logic and make OpSchema immutable.

wconstab (Contributor) commented Aug 22, 2025:

OK, this sounds reasonable to me. Let's do it; I'm stamping. @XilunWu, can you open a PR to just add this TODO?

if len(self.args_schema) != len(other.args_schema):
    return False

# compare each element and early return if any of them is different
Contributor:

Although the code here isn't directly moved into the comparison key, I confirmed that the args_to_hash and kwargs_to_hash members are equivalent to the logic below. LGTM.

new_arg_schema.append(arg)
self.args_schema = tuple(new_arg_schema)
self.kwargs_schema = origin_schema.kwargs_schema
self._recompute_comparison_key()
Contributor:

Hmm, how did you confirm these are the only two places that need to call recompute? Ideally we wouldn't have to call it anywhere but __post_init__, but this makes me wonder.

swolchok (Contributor, Author):

It needs to be called anywhere somebody mutates args_schema or kwargs_schema. It's already been "supposed to be readonly" for years and I took that as a given. If you want it validated exhaustively then we will have to skip this optimization (but note that for any Python class, somebody could break your invariants if they wanted to by messing with stuff they've been requested not to).

Member:

I am a bit concerned about this: _inplace_rewrap_schema_suggestion, where the OpSchema object gets mutated, is used here:

suggestion_schema = None
if needs_redistribute:
    suggestion_schema = OpSchema(
        op_schema.op, tuple(expected_input_specs), {}
    )
    suggestion_schema._inplace_rewrap_schema_suggestion(op_schema)

Though we don't need the hashing for suggestion_schema afterwards.
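The hazard under discussion can be demonstrated with a small hypothetical sketch (`Schema` below is a throwaway stand-in, not the real class): mutating a hashed field without refreshing the cached key leaves hash and equality out of sync until `_recompute_comparison_key` runs again:

```python
class Schema:
    """Hypothetical cached-key schema, used to show the stale-key hazard."""

    def __init__(self, op: str, args_schema: tuple) -> None:
        self.op = op
        self.args_schema = args_schema
        self._recompute_comparison_key()

    def _recompute_comparison_key(self) -> None:
        self._comparison_key = (self.op, self.args_schema)

    def __hash__(self) -> int:
        return hash(self._comparison_key)

    def __eq__(self, other: object) -> bool:
        return (
            isinstance(other, Schema)
            and self._comparison_key == other._comparison_key
        )


s = Schema("aten.add", (1, 2))
cache = {s: "sharding decision"}

# In-place mutation (as _inplace_rewrap_schema_suggestion does) without a
# refresh: the cached key no longer reflects the fields.
s.args_schema = (3, 4)
probe = Schema("aten.add", (3, 4))
assert s in cache        # s still hashes/compares via its stale key
assert probe not in cache  # so an equal-looking probe misses

# Recomputing restores hash/eq consistency for lookups going forward.
s._recompute_comparison_key()
assert s == probe and hash(s) == hash(probe)
assert {s: "fresh"}[probe] == "fresh"
```

(The usual dict caveat still applies: entries inserted before the mutation remain filed under the old hash, which is why keys are supposed to be immutable in the first place.)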

Member:

I was going to suggest adding self._recompute_comparison_key() at the end of _inplace_rewrap_schema_suggestion for now, to be safe, but that would for sure introduce extra overhead compared to before.

swolchok (Contributor, Author):

> suggest add self._recompute_comparison_key() at the end of _inplace_rewrap_schema_suggestion for now to be safe

Great suggestion. We are replying to a comment thread about the line that implements it. :)

pytorchmergebot (Collaborator):

Starting merge as part of PR stack under #161285

pytorchmergebot pushed a commit that referenced this pull request Aug 25, 2025
`self is other` means the same thing as `id(self) == id(other)`, but it's one operator instead of 3.

Pull Request resolved: #161235
Approved by: https://github.com/wconstab, https://github.com/zpcore, https://github.com/fduwjj
ghstack dependencies: #161231, #161234
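The identity-comparison claim in that commit message is easy to check (`Probe` here is just a throwaway class for illustration):

```python
class Probe:
    pass


p, q = Probe(), Probe()

# `p is q` tests object identity with a single operator, whereas the
# spelled-out form needs two id() calls plus an == comparison.
assert (p is p) == (id(p) == id(p))
assert (p is q) == (id(p) == id(q))
assert p is p
assert not (p is q)
```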
pytorchmergebot pushed a commit that referenced this pull request Aug 25, 2025
…161240)

get_write_alias() call count reduction explained briefly in code comment.

We don't need to check write_aliases against None in the final outs_to_return calculation because we just did that check.
Pull Request resolved: #161240
Approved by: https://github.com/wconstab
ghstack dependencies: #161231, #161234, #161235
pytorchmergebot pushed a commit that referenced this pull request Aug 25, 2025
…ly_alias_match (#161284)

Containers are truthy iff they're non-empty.
Pull Request resolved: #161284
Approved by: https://github.com/Skylion007, https://github.com/wconstab
ghstack dependencies: #161231, #161234, #161235, #161240
pytorchmergebot pushed a commit that referenced this pull request Aug 25, 2025
Drives down the overhead of return_and_correct_storage_aliasing slightly. Hopefully you'll agree it doesn't compromise readability.
Pull Request resolved: #161285
Approved by: https://github.com/wconstab
ghstack dependencies: #161231, #161234, #161235, #161240, #161284
markc-614 pushed commits to markc-614/pytorch that referenced this pull request Sep 17, 2025 (mirroring the commits above).
github-actions bot deleted the gh/swolchok/789/head branch September 25, 2025 02:10