
Fix different seq length#167481

Closed
Microve wants to merge 1 commit into pytorch:main from Microve:export-D86685546

Conversation

@Microve (Contributor) commented Nov 10, 2025

@pytorch-bot bot commented Nov 10, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/167481

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 4 Unrelated Failures

As of commit 6e78f15 with merge base 8cf0bdd:

NEW FAILURE - The following job has failed:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-codesync bot commented Nov 10, 2025

@Microve has exported this pull request. If you are a Meta employee, you can view the originating Diff in D86685546.

@Microve (Contributor, Author) commented Nov 11, 2025

@pytorchbot label "topic: not user facing"

@pytorch-bot bot added the `topic: not user facing` label Nov 11, 2025
Microve added a commit to Microve/pytorch that referenced this pull request Nov 11, 2025
…ributed_ranks (pytorch#167481)

Summary:

`align_runtime_estimations_across_all_distributed_ranks` is only needed when there are collectives in the graph. If there are no collectives, the Partitioner may make different decisions about which tensors to save on different ranks, which could cause `runtime_estimations_align_across_all_distributed_ranks` to fail.

Test Plan:
Without this change, the job could fail because of a different number of nodes in the backward graph:
https://www.internalfb.com/mlhub/pipelines/runs/mast/aps-aps-fb_him_sfsdp_h100-9990706fc5?job_attempt=0&version=0&tab=summary&env=PRODUCTION

tlparse of 16/3 in rank 113: https://fburl.com/3cxltwt4
tlparse of 16/3 in rank 124: https://fburl.com/p1grio09

With this change, it can run without hanging:
https://www.internalfb.com/mlhub/pipelines/runs/mast/aps-aps-fb_him_sfsdp_h100-c2323f01c2?job_attempt=0&version=0&tab=summary&env=PRODUCTION

Reviewed By: IvanKobzarev

Differential Revision: D86685546
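The gating described in the summary above can be sketched as follows. This is an illustrative sketch only, not the actual PyTorch internals: the op names in `COLLECTIVE_OPS`, the node representation, and the `align` callback are hypothetical stand-ins.

```python
# Hypothetical sketch: only run cross-rank runtime-estimation alignment
# when the graph actually contains collective ops. Without collectives,
# ranks may legitimately end up with different graphs (e.g. the
# partitioner saved different tensors), so aligning would fail or hang.

COLLECTIVE_OPS = {"all_reduce", "all_gather_into_tensor", "reduce_scatter_tensor"}

def has_collectives(graph_nodes):
    # graph_nodes: a list of dicts with a "target" op name (stand-in
    # for real FX graph nodes).
    return any(node["target"] in COLLECTIVE_OPS for node in graph_nodes)

def maybe_align_runtime_estimations(graph_nodes, align):
    # `align` is a hypothetical callback standing in for
    # align_runtime_estimations_across_all_distributed_ranks.
    if has_collectives(graph_nodes):
        align(graph_nodes)
        return True
    # No collectives: skip alignment entirely so per-rank graph
    # differences cannot cause a mismatch.
    return False
```

The point of the fix is exactly this early-out: alignment is a collective-style synchronization itself, so invoking it on graphs that differ across ranks is what produced the hang described in the test plan.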
Microve added a commit to Microve/pytorch that referenced this pull request Nov 11, 2025
…ributed_ranks (pytorch#167481)

Microve added a commit to Microve/pytorch that referenced this pull request Nov 12, 2025
…ributed_ranks (pytorch#167481)

Microve added a commit to Microve/pytorch that referenced this pull request Nov 12, 2025
…ributed_ranks (pytorch#167481)

@eellison (Contributor) left a comment


Looks good, need to fix lint

@pytorch-bot bot added the `ciflow/trunk` label (Trigger trunk jobs on your pull request) Nov 12, 2025
pytorch-bot bot pushed a commit that referenced this pull request Nov 12, 2025
…ributed_ranks (#167481)

Microve added a commit to Microve/pytorch that referenced this pull request Nov 12, 2025
…ributed_ranks (pytorch#167481)

Microve added a commit to Microve/pytorch that referenced this pull request Nov 13, 2025
…ributed_ranks (pytorch#167481)

@facebook-github-bot (Contributor) commented

@pytorchbot merge -i

(Initiating merge automatically since Phabricator Diff has merged, merging with -i because oss signals were bypassed internally)


@pytorch-auto-revert commented

@pytorchbot revert -m "Reverted automatically by pytorch's autorevert, to avoid this behaviour add the tag autorevert: disable" -c autorevert

This PR is suspected to have caused a regression in:

Please investigate and fix the issues.

@pytorchmergebot (Collaborator) commented

@pytorchbot successfully started a revert job. Check the current status here.
Questions? Feedback? Please reach out to the PyTorch DevX Team

pytorchmergebot added a commit that referenced this pull request Nov 14, 2025
This reverts commit c78e646.

Reverted #167481 on behalf of https://github.com/pytorch-auto-revert due to Reverted automatically by pytorch's autorevert, to avoid this behaviour add the tag autorevert: disable ([comment](#167481 (comment)))
@pytorchmergebot (Collaborator) commented

@Microve your PR has been successfully reverted.

@pytorchmergebot pytorchmergebot added Reverted ci-no-td Do not run TD on this PR labels Nov 14, 2025
Silv3S pushed a commit to Silv3S/pytorch that referenced this pull request Nov 18, 2025
Differential Revision: D86685546

Pull Request resolved: pytorch#167481
Approved by: https://github.com/eellison
Silv3S pushed a commit to Silv3S/pytorch that referenced this pull request Nov 18, 2025
This reverts commit c78e646.

Reverted pytorch#167481 on behalf of https://github.com/pytorch-auto-revert due to Reverted automatically by pytorch's autorevert, to avoid this behaviour add the tag autorevert: disable ([comment](pytorch#167481 (comment)))
pytorch-bot bot pushed a commit that referenced this pull request Nov 19, 2025
Differential Revision: D87413883
Microve added a commit to Microve/pytorch that referenced this pull request Nov 19, 2025
Summary: Pull Request resolved: pytorch#168144

Test Plan: Tests are in D86685546

Differential Revision: D87413883
pytorchmergebot pushed a commit that referenced this pull request Nov 20, 2025
Differential Revision: D87413883

Pull Request resolved: #168144
Approved by: https://github.com/eellison
JacobSzwejbka pushed a commit that referenced this pull request Dec 8, 2025
Differential Revision: D87413883

Pull Request resolved: #168144
Approved by: https://github.com/eellison
@github-actions bot commented

Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.

@github-actions bot added the `Stale` label Jan 17, 2026
@github-actions bot closed this Feb 16, 2026
