[Reland][Inductor] Prune configs that require more shared memory than the hardware limit. by wychi · Pull Request #161996 · pytorch/pytorch

wychi · 2025-09-02T19:09:15Z

Summary:
This is a re-land of PR161040, which had previously caused test failures on AMD GPUs. The tests are now configured to target only NVIDIA GPUs.

This diff prunes autotuning configurations whose shared memory requirement exceeds the hardware limit. Such configurations cause the following compilation error:

```
No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 327680 Hardware limit:232448 Reducing block sizes or `num_stages` may help.
```
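A minimal sketch of the pruning idea follows. It assumes a simplified shared-memory model: fp16 inputs (2 bytes per element), with each software-pipelining stage buffering one A tile (BLOCK_M × BLOCK_K) and one B tile (BLOCK_K × BLOCK_N). `MatmulConfig`, `estimate_shared_mem_bytes`, and `prune_by_shared_mem` are hypothetical names used for illustration only; this is not Inductor's implementation, and the shared memory Triton actually allocates is decided by the compiler and can differ from this estimate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MatmulConfig:
    # Hypothetical stand-in for a Triton matmul autotuning config.
    block_m: int
    block_n: int
    block_k: int
    num_stages: int


def estimate_shared_mem_bytes(cfg: MatmulConfig, dtype_bytes: int = 2) -> int:
    """Rough estimate (assumption): every pipeline stage holds one A tile
    (block_m x block_k) and one B tile (block_k x block_n) in shared memory."""
    per_stage = (cfg.block_m * cfg.block_k + cfg.block_k * cfg.block_n) * dtype_bytes
    return per_stage * cfg.num_stages


def prune_by_shared_mem(configs, hardware_limit: int, dtype_bytes: int = 2):
    """Keep only configs whose estimated shared memory fits within the limit."""
    return [
        c for c in configs
        if estimate_shared_mem_bytes(c, dtype_bytes) <= hardware_limit
    ]


if __name__ == "__main__":
    HW_LIMIT = 232448  # limit reported in the error message (227 KiB)
    candidates = [
        MatmulConfig(256, 256, 64, num_stages=5),  # 5 * 65536 = 327680 bytes
        MatmulConfig(256, 256, 64, num_stages=3),  # 3 * 65536 = 196608 bytes
        MatmulConfig(128, 128, 32, num_stages=4),  # 4 * 16384 = 65536 bytes
    ]
    for cfg in prune_by_shared_mem(candidates, HW_LIMIT):
        print(cfg, estimate_shared_mem_bytes(cfg))
```

Under this model, a 256×256×64 tile with 5 stages would need exactly 327680 bytes, and lowering `num_stages` to 3 brings it under the 232448-byte limit, which is the kind of adjustment the error message suggests. The source does not say which config actually triggered the error; the values here are illustrative.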

Test Plan:

```
pytest test/inductor/test_max_autotune.py
pytest test/inductor/test_triton_heuristics.py
```

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben


pytorch-bot · 2025-09-02T19:09:22Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/161996

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

ROCm MI2xx CI/CD workflows failing due to: download from https://api.github.com/repos/pytorch/pytorch timed out.

✅ You can merge normally! (3 Unrelated Failures)

As of commit a1aa82d with merge base 6737e2c:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

Limited CI on H100 / linux-jammy-cuda12.8-py3.10-gcc11-sm90 / test (smoke, 1, 1, linux.aws.h100) (gh) (similar failure)
test_matmul_cuda.py::TestFP8MatmulCUDA::test_float8_error_messages_cuda
rocm-mi300 / linux-noble-rocm-py3.12-mi300 / test (default, 2, 6, linux.rocm.gpu.gfx942.1) (gh) (similar failure)
'Test'

BROKEN TRUNK - The following job failed but was present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

rocm-mi300 / linux-noble-rocm-py3.12-mi300 / test (default, 1, 6, linux.rocm.gpu.gfx942.1) (gh) (trunk failure)
Process completed with exit code 134.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

wychi · 2025-09-03T00:21:08Z

@pytorchbot merge

pytorchmergebot · 2025-09-03T00:23:14Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: Check the merge workflow status here.

… the hardware limit. (pytorch#161996) Pull Request resolved: pytorch#161996 Approved by: https://github.com/coconutruben

wychi added 4 commits September 2, 2025 12:05

wychi requested review from coconutruben and jeanschmidt September 2, 2025 19:09

pytorch-bot bot added the ci-no-td (Do not run TD on this PR), ciflow/inductor, and module: inductor labels Sep 2, 2025

wychi added the topic: not user facing (topic category) and ciflow/rocm-mi300 (Trigger "default" config CI on ROCm MI300) labels Sep 2, 2025

coconutruben approved these changes Sep 2, 2025


pytorch-bot bot added the ciflow/trunk (Trigger trunk jobs on your pull request) label Sep 3, 2025

pytorchmergebot added the merging label Sep 3, 2025

pytorchmergebot added the Merged label Sep 3, 2025

pytorchmergebot closed this in 00636e0 Sep 3, 2025

pytorchmergebot removed the merging label Sep 3, 2025

github-actions bot deleted the wychi-autotune-prune-configs-by-shared-mem branch October 4, 2025 02:05

wychi wants to merge 4 commits into main from wychi-autotune-prune-configs-by-shared-mem