[DTensor] Add InputDim.__eq__ type guard to prevent int comparison bugs #178599

Closed

stmcgovern wants to merge 1 commit into pytorch:main from stmcgovern:fix/dimspec-eq-typeguard

Conversation

@stmcgovern
Collaborator

Follow-on for #177972.

InputDim is a dataclass wrapping an int (input_dim), which makes it easy to accidentally compare shard.dim (an int) with an InputDim instance instead of InputDim.input_dim. This comparison always
returns False, making downstream logic dead code.

The original instance of this bug (shard.dim == in_dim) in propagate_shape_and_sharding was fixed in #177973. This change adds a structural guard: InputDim.__eq__ raises TypeError when compared with non-DimSpec types (like int), directing the developer to use .input_dim instead. A matching __hash__ preserves the hash contract. This prevents the bug class from recurring. @wconstab
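
For context, here is a minimal sketch of the guard as described above. It is illustrative only: the real DimSpec/InputDim classes live in torch/distributed/tensor/_ops/_view_ops.py and carry more structure, and the error message below is paraphrased.

```python
from dataclasses import dataclass


class DimSpec:
    """Base class for dimension specs (simplified stand-in)."""


@dataclass(eq=False)  # eq=False suppresses the auto-generated __eq__
class InputDim(DimSpec):
    input_dim: int

    def __eq__(self, other: object) -> bool:
        if isinstance(other, InputDim):
            # Delegate to the wrapped int.
            return self.input_dim == other.input_dim
        if not isinstance(other, DimSpec):
            # The guard: comparing against an int (or str, None, ...)
            # is almost certainly a bug, so fail loudly instead of
            # silently evaluating to False.
            raise TypeError(
                f"Cannot compare InputDim with {type(other).__name__}; "
                "did you mean to use .input_dim?"
            )
        # Other DimSpec subclasses: return NotImplemented so Python
        # tries the reflected __eq__, which ultimately yields False.
        return NotImplemented

    def __hash__(self) -> int:
        # Salt with the class so InputDim(0) does not collide with
        # the raw int 0 in dicts/sets; equal objects still hash equal.
        return hash((InputDim, self.input_dim))
```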

@pytorch-bot pytorch-bot Bot added the release notes: distributed (dtensor) label Mar 27, 2026
@pytorch-bot

pytorch-bot Bot commented Mar 27, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/178599

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 639a836 with merge base d3c36f0:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

InputDim is a dataclass wrapping an int (input_dim), which makes it
easy to accidentally compare shard.dim (an int) with an InputDim
instance instead of InputDim.input_dim. This comparison silently
returns False, making downstream logic dead code.

The original instance of this bug (shard.dim == in_dim) survived over
3 years in propagate_shape_and_sharding before being fixed (pytorch#177973). This
change adds a structural guard: InputDim.__eq__ raises TypeError when
compared with non-DimSpec types (like int), directing the developer to
use .input_dim instead. A matching __hash__ (salted with the class to
avoid collisions with raw ints in dicts/sets) preserves the hash
contract. This prevents the bug class from recurring.
@stmcgovern stmcgovern force-pushed the fix/dimspec-eq-typeguard branch from e1d587b to 639a836 Compare March 27, 2026 15:02
@stmcgovern stmcgovern requested a review from wconstab March 27, 2026 17:05
@janeyx99 janeyx99 added the triaged label Mar 27, 2026
@wconstab wconstab requested a review from pianpwk April 1, 2026 00:25
Contributor

@wconstab wconstab left a comment

lgtm! let me ask claude to review just in case

@wconstab
Contributor

wconstab commented Apr 1, 2026

@claude review this PR

@claude

claude Bot commented Apr 1, 2026

Claude finished @wconstab's task in 3m 41s


Review of PR #178599

  • Read changed files and gather context
  • Review implementation in _view_ops.py
  • Review tests in test_view_ops.py
  • Post review feedback

Overall: This PR looks good. The change is a targeted defensive guard against a real bug class that went undetected for years. I reviewed all usages of InputDim in the codebase and confirmed no existing code paths will break.

Implementation (torch/distributed/tensor/_ops/_view_ops.py)

The __eq__ / __hash__ implementation is correct:

  • eq=False on the dataclass — correctly disables the auto-generated __eq__ so the custom one takes over.
  • isinstance(other, InputDim) path — correctly delegates to the wrapped input_dim field comparison.
  • not isinstance(other, DimSpec) path — raises TypeError for int, str, None, etc. This is the key guard. The error message is actionable ("Did you mean to use .input_dim?").
  • return NotImplemented fallback — correct for comparisons with other DimSpec subclasses (Singleton, Broadcast, etc.). Python will then try the reflected __eq__ on the other object, which will return NotImplemented as well, resulting in False. This is the right behavior.
  • __hash__ — uses hash((InputDim, self.input_dim)) which correctly salts with the type to avoid collisions with raw ints. Maintains the hash contract (equal objects have equal hashes).

One minor observation: __eq__ has a return type annotation of -> bool but can also return NotImplemented (which is not a bool). This is technically a mypy inconsistency, but it's the standard Python pattern for __eq__ and the file already has # mypy: allow-untyped-defs at the top, so this is fine in practice.
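
As a quick illustration of the semantics described above, the following runs against the sketch in the PR description (InputDim and DimSpec as defined there; Broadcast is a local stand-in, not the real class):

```python
a, b = InputDim(0), InputDim(0)

assert a == b                        # wrapped-int comparison
assert hash(a) == hash(b)            # hash contract: equal -> equal hashes
assert hash(a) != hash(0)            # salted: distinct from the raw int 0

try:
    a == 0                           # the guarded bug: InputDim vs int
except TypeError as exc:
    print(exc)                       # message points at .input_dim


class Broadcast(DimSpec):            # local stand-in for another DimSpec subclass
    pass


assert (a == Broadcast()) is False   # NotImplemented both ways -> False
```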

Safety verification

I audited all usages of InputDim in the codebase:

  • All existing equality comparisons are between two InputDim instances (e.g., test_view_ops.py:2066) — these work correctly.
  • All int comparisons in production code use .input_dim field access (e.g., _view_ops.py:797: placement.dim == input_dim.input_dim), not direct object comparison — these are unaffected.
  • InputDim is stored in lists/tuples, not sets/dicts, so the __hash__ change doesn't affect any existing containers.
  • The other InputDim class in torch/_export/passes/ is a completely separate NamedTuple and is unaffected.

Tests (test/distributed/tensor/test_view_ops.py)

Test coverage is thorough (see the sketch after this list):

  • InputDim-to-InputDim equality and inequality
  • Hash contract validation (equal objects have equal hashes)
  • Hash salting (InputDim(0) doesn't hash like int 0)
  • Both directions of the TypeError: dim == 0 and 0 == dim
  • Regex match on the error message
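
A hypothetical shape for tests like these (class and test names are illustrative, not the PR's actual test code; the import path is the module named in this review, post-merge):

```python
import unittest

from torch.distributed.tensor._ops._view_ops import InputDim


class TestInputDimEqGuard(unittest.TestCase):
    def test_equality_and_hash_contract(self):
        self.assertEqual(InputDim(1), InputDim(1))
        self.assertNotEqual(InputDim(1), InputDim(2))
        self.assertEqual(hash(InputDim(1)), hash(InputDim(1)))
        self.assertNotEqual(hash(InputDim(0)), hash(0))  # salted hash

    def test_int_comparison_raises_both_directions(self):
        dim = InputDim(0)
        with self.assertRaisesRegex(TypeError, r"input_dim"):
            dim == 0
        with self.assertRaisesRegex(TypeError, r"input_dim"):
            0 == dim


if __name__ == "__main__":
    unittest.main()
```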

No issues found. LGTM.


Contributor

@wconstab wconstab left a comment

Thanks!

@stmcgovern
Collaborator Author

@pytorchbot merge

@pytorch-bot pytorch-bot Bot added the ciflow/trunk label Apr 1, 2026
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here.

IvanKobzarev pushed a commit to IvanKobzarev/pytorch that referenced this pull request Apr 3, 2026
[DTensor] Add InputDim.__eq__ type guard to prevent int comparison bugs (pytorch#178599)

nklshy-aws pushed a commit to nklshy-aws/pytorch that referenced this pull request Apr 7, 2026
[DTensor] Add InputDim.__eq__ type guard to prevent int comparison bugs (pytorch#178599)

bobrenjc93 pushed a commit to bobrenjc93/pytorch that referenced this pull request Apr 9, 2026
[DTensor] Add InputDim.__eq__ type guard to prevent int comparison bugs (pytorch#178599)

bobrenjc93 pushed a commit to bobrenjc93/pytorch that referenced this pull request Apr 10, 2026
[DTensor] Add InputDim.__eq__ type guard to prevent int comparison bugs (pytorch#178599)

Labels

ciflow/trunk - Trigger trunk jobs on your pull request
Merged
open source
release notes: distributed (dtensor) - release notes category
triaged - This issue has been looked at by a team member, and triaged and prioritized into an appropriate module


5 participants