
Update slow tests #171051

Closed
pytorchupdatebot wants to merge 1 commit into main from update_slow_tests_1766389370

Conversation

@pytorchupdatebot
Collaborator

This PR is auto-generated weekly by this action.
Update the list of slow tests.

@pytorch-bot added the ci-no-td (Do not run TD on this PR) label Dec 22, 2025
@pytorch-bot

pytorch-bot Bot commented Dec 22, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/171051

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit ebf9f80 with merge base 5c61c25:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorchbot
Collaborator

@pytorchbot merge

@pytorch-bot added the ciflow/trunk (Trigger trunk jobs on your pull request) label Dec 22, 2025
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here.

xgz2 pushed a commit that referenced this pull request Dec 22, 2025
This PR is auto-generated weekly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/weekly.yml).
Update the list of slow tests.
Pull Request resolved: #171051
Approved by: https://github.com/pytorchbot
krastogi-in pushed a commit to krastogi-in/pytorch that referenced this pull request Jan 9, 2026
Pull Request resolved: pytorch#171051
Approved by: https://github.com/pytorchbot
@github-actions github-actions Bot deleted the update_slow_tests_1766389370 branch January 22, 2026 02:20
weifengpy added a commit that referenced this pull request Feb 14, 2026
It's timing out because test_index was moved out of the slow-test list in #171051.

Some devices already disable test_index, just not the CUDA device: #173181.

Analysis from Claude:

  Root Cause

  The test_index method in test/distributed/tensor/test_tensor_ops.py:623 was causing the test suite to hang (taking
  >10 minutes for a single test, with the full suite never completing).

  Why: test_index made 15 calls to _test_op, which uses DTensorConverter to generate all possible sharding placement
  combinations via itertools.product. The 8 three-tensor calls (lines 672-729) each generated 40-80 combinations, for
  a total of ~504 combinations out of 564. Each combination requires multiple NCCL collective operations
  (distribute_tensor + full_tensor), making the test extremely slow. The test runs twice — once in DistTensorOpsTest
  and once in DistTensorOpsTestWithLocalTensor.

  Breakdown of combinations per call:
  - 2-tensor calls: 8-16 combinations each (76 total) — reasonable
  - 3-tensor calls: 40-80 combinations each (504 total) — combinatorial explosion from 4×4×4=64 or 5×4×4=80 products

  Fix

  Reduced the 3-tensor _test_op calls from 8 to 2 representative ones:
  1. x[z, y] — basic multi-index (64 combinations)
  2. x[:, z, :, y] with broadcast — covers 4D tensor + broadcast pattern (60 combinations)

  This reduces total combinations from 564 to ~200, bringing test_index from >10 minutes down to ~2 minutes, and the
  full suite from never-completing to ~11 minutes.

[ghstack-poisoned]
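The combinatorial explosion described in the commit message can be sketched with plain itertools.product. This is a minimal illustration, not the real DTensorConverter; the placement names are illustrative assumptions standing in for DTensor's Replicate/Shard placements:

```python
# Minimal sketch of why the 3-tensor cases explode (illustrative placement
# names; not the real DTensorConverter from torch.distributed).
import itertools

# Suppose each tensor argument can take one of 4 candidate placements.
placements = ["Replicate", "Shard(0)", "Shard(1)", "Shard(2)"]

# Two tensor args: 4 * 4 = 16 combinations -- manageable.
two_arg = list(itertools.product(placements, repeat=2))

# Three tensor args: 4 * 4 * 4 = 64 combinations -- in the real test, each
# one pays for NCCL collectives (distribute_tensor + full_tensor).
three_arg = list(itertools.product(placements, repeat=3))

print(len(two_arg), len(three_arg))  # 16 64
```

Cutting the number of 3-tensor _test_op calls, as the fix does, attacks the multiplier directly: each dropped call removes an entire 64- or 80-element product from the run.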
weifengpy added a commit that referenced this pull request Feb 14, 2026
pytorchmergebot pushed a commit that referenced this pull request Feb 17, 2026
Pull Request resolved: #175030
Approved by: https://github.com/wconstab
EmanueleCoradin pushed a commit to EmanueleCoradin/pytorch that referenced this pull request Mar 30, 2026
Pull Request resolved: pytorch#175030
Approved by: https://github.com/wconstab

Labels

ci-no-td (Do not run TD on this PR), ciflow/slow, ciflow/trunk (Trigger trunk jobs on your pull request), Merged, open source, topic: not user facing

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants