
[DTensor] layernorm output meta #175652

Closed
pianpwk wants to merge 2 commits into gh/pianpwk/100/base from gh/pianpwk/100/head

Conversation

@pianpwk
Contributor

@pianpwk pianpwk commented Feb 24, 2026

Stack from ghstack (oldest at bottom):

autoparallel has its own layernorm fwd/bwd registrations because PyTorch's version reports the wrong tensor meta for (out, mean, rstd), copying out's meta to all 3. This PR creates a separate meta for each output.

The other issue is that layernorm always produces contiguous outputs; this PR fixes that too.

https://github.com/meta-pytorch/autoparallel/blob/454780d2a27456a380c0d8e997c8fc2cf82ef5d8/autoparallel/shardings/propagation_rules.py#L460-L611
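For context, the three outputs of native_layer_norm genuinely have different shapes, so a single copied meta cannot describe all of them. A minimal eager-mode sketch (plain PyTorch, not the DTensor propagation rule itself) illustrating both issues:

```python
import torch

x = torch.randn(4, 8, 16)
# native_layer_norm returns (out, mean, rstd): out matches the input's shape,
# while mean and rstd keep the leading dims and collapse the normalized dims
# to size 1 -- so copying out's meta to all three outputs is wrong.
out, mean, rstd = torch.native_layer_norm(x, (16,), None, None, 1e-5)
print(out.shape)   # torch.Size([4, 8, 16])
print(mean.shape)  # torch.Size([4, 8, 1])
print(rstd.shape)  # torch.Size([4, 8, 1])

# Per the description above, the outputs are contiguous even when the
# input is not, which the reported strides should also reflect.
y = torch.randn(16, 4, 8).permute(1, 2, 0)  # non-contiguous, shape (4, 8, 16)
out2, _, _ = torch.native_layer_norm(y, (16,), None, None, 1e-5)
print(y.is_contiguous(), out2.is_contiguous())  # False True
```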

@pytorch-bot
pytorch-bot Bot commented Feb 24, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/175652

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit 1733897 with merge base e81980e:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

pianpwk added a commit that referenced this pull request Feb 24, 2026
ghstack-source-id: 258ae66
Pull Request resolved: #175652
@pianpwk pianpwk changed the title from [DTensor] layernorm/min/max output meta to [DTensor] layernorm output meta on Feb 25, 2026
@pianpwk pianpwk marked this pull request as ready for review February 25, 2026 20:46
@pianpwk
Contributor Author

pianpwk commented Feb 25, 2026

@pytorchbot merge

@pytorch-bot pytorch-bot Bot added the ciflow/trunk label Feb 25, 2026
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

@pytorchmergebot
Collaborator

The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
For more information see the pytorch-bot wiki.

@pianpwk
Contributor Author

pianpwk commented Feb 26, 2026

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

postmath pushed a commit to postmath/pytorch that referenced this pull request Feb 26, 2026
Pull Request resolved: pytorch#175652
Approved by: https://github.com/wconstab
sandy-gags pushed a commit to sandy-gags/pytorch that referenced this pull request Mar 12, 2026
ghstack-source-id: d09bfc5
Pull Request resolved: pytorch/pytorch#175652
pianpwk added a commit to pianpwk/autoparallel that referenced this pull request Mar 18, 2026
…_layer_norm, and native_layer_norm_backward

These three rules were carried as local overrides in autoparallel while
upstream PyTorch lacked proper handling:

- constant_pad_nd: non-replicate strategy filtering on padded dims
  (upstreamed in pytorch/pytorch#175656)
- native_layer_norm forward: correct per-output shapes and contiguous
  strides (upstreamed in pytorch/pytorch#175652)
- native_layer_norm backward: contiguous stride handling for grad_input
  (upstreamed in a companion PR to pytorch/pytorch)

With all three fixes now in upstream PyTorch, the overrides can be
removed and autoparallel defers to the upstream register_op_strategy
implementations.

Authored with Claude.
@github-actions github-actions Bot deleted the gh/pianpwk/100/head branch March 29, 2026 02:23
EmanueleCoradin pushed a commit to EmanueleCoradin/pytorch that referenced this pull request Mar 30, 2026

Labels

ciflow/inductor, ciflow/trunk, Merged, release notes: distributed (dtensor)

3 participants