[DTensor] layernorm output meta#175652
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/175652
Note: Links to docs will display an error until the docs builds have been completed.
✅ You can merge normally! (1 Unrelated Failure)
As of commit 1733897 with merge base e81980e. FLAKY - The following job failed but was likely due to flakiness present on trunk.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
…_layer_norm, and native_layer_norm_backward

These three rules were carried as local overrides in autoparallel while upstream PyTorch lacked proper handling:

- constant_pad_nd: non-replicate strategy filtering on padded dims (upstreamed in pytorch/pytorch#175656)
- native_layer_norm forward: correct per-output shapes and contiguous strides (upstreamed in pytorch/pytorch#175652)
- native_layer_norm backward: contiguous stride handling for grad_input (upstreamed in a companion PR to pytorch/pytorch)

With all three fixes now in upstream PyTorch, the overrides can be removed and autoparallel defers to the upstream register_op_strategy implementations.

Authored with Claude.
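As a hedged eager-mode check, separate from either repository's code, the sketch below illustrates the backward property this commit refers to: grad_input returned by aten.native_layer_norm_backward has the input's shape and comes back contiguous. The tensor sizes are illustrative assumptions, not taken from the PR.

```python
import torch

# Toy input normalized over the last dim of size 8 (shapes chosen for illustration).
x = torch.randn(4, 16, 8)
weight, bias = torch.randn(8), torch.randn(8)
out, mean, rstd = torch.ops.aten.native_layer_norm(x, (8,), weight, bias, 1e-5)

grad_out = torch.randn_like(out)
grad_input, grad_weight, grad_bias = torch.ops.aten.native_layer_norm_backward(
    grad_out, x, (8,), mean, rstd, weight, bias, [True, True, True]
)

# grad_input matches the input's shape; the commit's concern is that sharding
# propagation report it with contiguous strides, which is what eager produces.
print(grad_input.shape, grad_input.is_contiguous())
```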
Stack from ghstack (oldest at bottom):
autoparallel has its own layernorm fwd/bwd registrations because PyTorch's version reports the wrong tensor meta for (out, mean, rstd), copying out's meta to all three. This PR creates separate metas for each output.
The other issue is that layernorm always produces contiguous outputs, which the propagated meta did not reflect; this is fixed here as well.
https://github.com/meta-pytorch/autoparallel/blob/454780d2a27456a380c0d8e997c8fc2cf82ef5d8/autoparallel/shardings/propagation_rules.py#L460-L611
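A minimal eager-mode sketch, not taken from the PR, of the behavior the fix encodes: native_layer_norm's (out, mean, rstd) do not share one shape, and out is contiguous even when the input is not. The tensor sizes below are illustrative assumptions.

```python
import torch

# Non-contiguous input of shape (4, 16, 8), normalized over the last dim of size 8.
x = torch.randn(4, 8, 16).transpose(1, 2)
weight, bias = torch.randn(8), torch.randn(8)

out, mean, rstd = torch.ops.aten.native_layer_norm(x, (8,), weight, bias, 1e-5)

# out matches the input shape and is contiguous regardless of x's strides.
print(out.shape, out.is_contiguous())
# mean/rstd are reduced over the normalized dims, so their metas must differ from out's.
print(mean.shape, rstd.shape)
```

Copying out's meta onto mean and rstd, as the old rule did, would report the wrong shapes for the two statistics, which is why the fix computes a separate meta per output.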