Skip to content

[Reland][Inductor] Prune configs that require more shared memory than the hardware limit. #161996

Closed
wychi wants to merge 4 commits intomainfrom
wychi-autotune-prune-configs-by-shared-mem
Closed

[Reland][Inductor] Prune configs that require more shared memory than the hardware limit. #161996
wychi wants to merge 4 commits intomainfrom
wychi-autotune-prune-configs-by-shared-mem

Conversation

@wychi
Copy link
Contributor

@wychi wychi commented Sep 2, 2025

Summary:
This is a re-land of PR161040, which had previously caused test failures on AMD GPUs. The tests are now configured to target only NVIDIA GPUs.

This diff removes configurations that exceed the hardware shared memory limit, which causes the following compilation error:

No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 327680 Hardware limit:232448 Reducing block sizes or `num_stages` may help.

Test Plan:

pytest test/inductor/test_max_autotune.py
pytest test/inductor/test_triton_heuristics.py

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben

… the hardware limit

Summary:
This is a re-land of [PR161040](#161040), which had previously caused test failures on AMD GPUs. The tests are now configured to target only NVIDIA GPUs.

This diff removes configurations that exceed the hardware shared memory limit, which causes the following compilation error:
```
No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 327680 Hardware limit:232448 Reducing block sizes or `num_stages` may help.
```

Test Plan:

```
pytest test/inductor/test_max_autotune.py
pytest test/inductor/test_triton_heuristics.py

```

Reviewers:

Subscribers:

Tasks:

Tags:

[ghstack-poisoned]
…memory than the hardware limit"

Summary:
This is a re-land of [PR161040](#161040), which had previously caused test failures on AMD GPUs. The tests are now configured to target only NVIDIA GPUs.

This diff removes configurations that exceed the hardware shared memory limit, which causes the following compilation error:
```
No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 327680 Hardware limit:232448 Reducing block sizes or `num_stages` may help.
```

Test Plan:

```
pytest test/inductor/test_max_autotune.py
pytest test/inductor/test_triton_heuristics.py

```

Reviewers:

Subscribers:

Tasks:

Tags:

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy chenyang78 kadeng muchulee8 amjames chauhang aakhundov coconutruben

[ghstack-poisoned]
…memory than the hardware limit"

Summary:
This is a re-land of [PR161040](#161040), which had previously caused test failures on AMD GPUs. The tests are now configured to target only NVIDIA GPUs.

This diff removes configurations that exceed the hardware shared memory limit, which causes the following compilation error:
```
No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 327680 Hardware limit:232448 Reducing block sizes or `num_stages` may help.
```

Test Plan:

```
pytest test/inductor/test_max_autotune.py
pytest test/inductor/test_triton_heuristics.py

```

Reviewers:

Subscribers:

Tasks:

Tags:

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy chenyang78 kadeng muchulee8 amjames chauhang aakhundov coconutruben

[ghstack-poisoned]
…memory than the hardware limit"

Summary:
This is a re-land of [PR161040](#161040), which had previously caused test failures on AMD GPUs. The tests are now configured to target only NVIDIA GPUs.

This diff removes configurations that exceed the hardware shared memory limit, which causes the following compilation error:
```
No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 327680 Hardware limit:232448 Reducing block sizes or `num_stages` may help.
```

Test Plan:

```
pytest test/inductor/test_max_autotune.py
pytest test/inductor/test_triton_heuristics.py

```

Reviewers:

Subscribers:

Tasks:

Tags:

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy chenyang78 kadeng muchulee8 amjames chauhang aakhundov coconutruben

[ghstack-poisoned]
@pytorch-bot
Copy link

pytorch-bot bot commented Sep 2, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/161996

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There are 1 currently active SEVs. If your PR is affected, please view them below:

✅ You can merge normally! (3 Unrelated Failures)

As of commit a1aa82d with merge base 6737e2c (image):

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

BROKEN TRUNK - The following job failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@wychi wychi added topic: not user facing topic category ciflow/rocm-mi300 Trigger "default" config CI on ROCm MI300 labels Sep 2, 2025
@wychi
Copy link
Contributor Author

wychi commented Sep 3, 2025

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Sep 3, 2025
@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

markc-614 pushed a commit to markc-614/pytorch that referenced this pull request Sep 17, 2025
… the hardware limit. (pytorch#161996)

Summary:
This is a re-land of [PR161040](pytorch#161040), which had previously caused test failures on AMD GPUs. The tests are now configured to target only NVIDIA GPUs.

This diff removes configurations that exceed the hardware shared memory limit, which causes the following compilation error:
```
No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 327680 Hardware limit:232448 Reducing block sizes or `num_stages` may help.
```

Test Plan:
```
pytest test/inductor/test_max_autotune.py
pytest test/inductor/test_triton_heuristics.py
```

Pull Request resolved: pytorch#161996
Approved by: https://github.com/coconutruben
mansiag05 pushed a commit to mansiag05/pytorch that referenced this pull request Sep 22, 2025
… the hardware limit. (pytorch#161996)

Summary:
This is a re-land of [PR161040](pytorch#161040), which had previously caused test failures on AMD GPUs. The tests are now configured to target only NVIDIA GPUs.

This diff removes configurations that exceed the hardware shared memory limit, which causes the following compilation error:
```
No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 327680 Hardware limit:232448 Reducing block sizes or `num_stages` may help.
```

Test Plan:
```
pytest test/inductor/test_max_autotune.py
pytest test/inductor/test_triton_heuristics.py
```

Pull Request resolved: pytorch#161996
Approved by: https://github.com/coconutruben
@github-actions github-actions bot deleted the wychi-autotune-prune-configs-by-shared-mem branch October 4, 2025 02:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ci-no-td Do not run TD on this PR ciflow/inductor ciflow/rocm-mi300 Trigger "default" config CI on ROCm MI300 ciflow/trunk Trigger trunk jobs on your pull request Merged module: inductor topic: not user facing topic category

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants