[ROCm] Skip test_index on MI300X due to timeout #173181

Closed

c0de128 wants to merge 1 commit into pytorch:main from c0de128:fix/rocm-skip-test-index-mi300

Conversation

@c0de128
Contributor

c0de128 commented Jan 23, 2026

Summary

Skip DistTensorOpsTest.test_index on MI300 architecture (gfx942) in addition to MI200. The test times out after 300 seconds on MI300X.

Fixes #171119

Background

The test was disabled after #171051 updated the slow tests list. The test already had the @skipIfRocmArch(MI200_ARCH) decorator; this PR extends it to also skip on MI300X (MI300_ARCH) until the underlying performance issue is resolved.

Changes

  • Added MI300_ARCH import to test file
  • Extended @skipIfRocmArch(MI200_ARCH) to @skipIfRocmArch(MI200_ARCH + MI300_ARCH) (sketched below)
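
For context, a minimal self-contained sketch of the decorator change, using hypothetical stand-ins for the PyTorch test helpers (in the real tree, skipIfRocmArch and the arch tuples are expected to come from the shared test utilities; gfx942 for MI300X is stated above, while gfx90a for MI200 is an assumption):

```python
import unittest

# Hypothetical stand-ins for the helpers named in this PR.
# gfx942 (MI300X) is stated in the PR text; gfx90a for MI200 is an assumption.
MI200_ARCH = ("gfx90a",)
MI300_ARCH = ("gfx942",)

def skipIfRocmArch(archs):
    """Stand-in decorator: skip when the detected gfx arch is in `archs`."""
    detected = "gfx942"  # a real run would query the ROCm device properties
    return unittest.skipIf(detected in archs, f"skipped on ROCm arch {detected}")

class DistTensorOpsTest(unittest.TestCase):
    # Before this PR: @skipIfRocmArch(MI200_ARCH)
    @skipIfRocmArch(MI200_ARCH + MI300_ARCH)  # tuple concat: skip on both
    def test_index(self):
        pass  # the real test exercises DTensor indexing ops

if __name__ == "__main__":
    unittest.main()
```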

cc @jeffdaily @sunway513 @jithunnair-amd @pruthvistony @ROCmSupport @jataylo @hongxiayang @naromero77amd @pragupta @jerrymannil @xinyazhang


PR authored with assistance from Claude.

@pytorch-bot

pytorch-bot Bot commented Jan 23, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/173181

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit b0367b7 with merge base c7e67ec:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@c0de128
Contributor Author

c0de128 commented Jan 23, 2026

@pytorchbot label 'module: rocm' 'topic: not user facing'

pytorch-bot added the module: rocm and topic: not user facing labels on Jan 23, 2026
@linux-foundation-easycla

linux-foundation-easycla Bot commented Jan 23, 2026

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: c0de128 / name: Kevin McKay (b0367b7)

Skip DistTensorOpsTest.test_index on MI300 architecture (gfx942) in addition
to MI200. The test times out after 300 seconds on MI300X, similar to the
existing MI200 skip.

Fixes pytorch#171119

Signed-off-by: c0de128 <kevin.mckay@outlook.com>
c0de128 force-pushed the fix/rocm-skip-test-index-mi300 branch from 2e6dc23 to b0367b7 on January 23, 2026 15:59
@c0de128
Contributor Author

c0de128 commented Jan 24, 2026

@jeffdaily @sunway513 Could you approve CI for this ROCm fix? It addresses issue #171119 by extending the test_index skip to MI300X (the test times out after 300s). Thanks!

bdhirsh requested a review from jeffdaily on January 26, 2026 17:06
bdhirsh added the triaged label on Jan 26, 2026
weifengpy added a commit that referenced this pull request Feb 14, 2026
It's timing out because it was moved out of the slow tests list in #171051.

Some devices already disable test_index, just not CUDA: #173181

From Claude:

  Root Cause

  The test_index method in test/distributed/tensor/test_tensor_ops.py:623 was causing the test suite to hang (taking
  >10 minutes for a single test, with the full suite never completing).

  Why: test_index made 15 calls to _test_op, which uses DTensorConverter to generate all possible sharding placement
  combinations via itertools.product. The 8 three-tensor calls (lines 672-729) each generated 40-80 combinations, for
  a total of ~504 combinations out of 564. Each combination requires multiple NCCL collective operations
  (distribute_tensor + full_tensor), making the test extremely slow. The test runs twice — once in DistTensorOpsTest
  and once in DistTensorOpsTestWithLocalTensor.

  Breakdown of combinations per call:
  - 2-tensor calls: 8-16 combinations each (76 total) — reasonable
  - 3-tensor calls: 40-80 combinations each (504 total) — combinatorial explosion from 4×4×4=64 or 5×4×4=80 products

  Fix

  Reduced the 3-tensor _test_op calls from 8 to 2 representative ones:
  1. x[z, y] — basic multi-index (64 combinations)
  2. x[:, z, :, y] with broadcast — covers 4D tensor + broadcast pattern (60 combinations)

  This reduces total combinations from 564 to ~200, bringing test_index from >10 minutes down to ~2 minutes, and the
  full suite from never-completing to ~11 minutes.

[ghstack-poisoned]
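
To make the blow-up concrete, here is a toy illustration (not the actual DTensorConverter code; the placement names are illustrative) of how itertools.product scales with the number of tensors:

```python
from itertools import product

# Illustrative sharding candidates per tensor; the real set is generated by
# DTensorConverter and varies per operand (4 or 5 options per the text above).
placements = ["Replicate", "Shard(0)", "Shard(1)", "Shard(2)"]

pairs = list(product(placements, repeat=2))    # 4 * 4  = 16 combinations
triples = list(product(placements, repeat=3))  # 4**3   = 64 combinations

print(len(pairs), len(triples))  # -> 16 64
# Eight 3-tensor calls at 40-80 combinations each is what produced the
# ~504-of-564 total above; each combination pays for several collectives.
```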
pytorchmergebot pushed a commit that referenced this pull request Feb 17, 2026

Pull Request resolved: #175030
Approved by: https://github.com/wconstab
@c0de128
Contributor Author

c0de128 commented Feb 24, 2026

Closing — no maintainer engagement after 4+ weeks.

c0de128 closed this on Feb 24, 2026
EmanueleCoradin pushed a commit to EmanueleCoradin/pytorch that referenced this pull request Mar 30, 2026

Labels

module: rocm, open source, topic: not user facing, triaged


Development

Successfully merging this pull request may close these issues.

DISABLED test_index (__main__.DistTensorOpsTest)

3 participants