Expand on autotune support for persistent reduction kernels#162056

Closed
jataylo wants to merge 10 commits into pytorch:main from jataylo:jack-per-reduction
Conversation

@jataylo
Collaborator

@jataylo jataylo commented Sep 3, 2025

After the removal of want_no_x_dim for persistent reduction kernels, we can improve their autotuning setup.

Currently, even with tuning enabled, filtering often leaves only a single config to try. This PR avoids filtering in autotune mode, overrides the MAX_BLOCK limit, and always includes tiny_config when autotuning is enabled.
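The idea above can be sketched as follows. This is a simplified, hypothetical illustration of the config-selection change, not Inductor's actual API: names like `make_persistent_reduction_configs`, `Config`, and the warp heuristic are invented for this sketch.

```python
# Hypothetical sketch of the autotune change described above.
# Not PyTorch Inductor's real API; names and heuristics are illustrative.
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    x_block: int
    num_warps: int


def make_persistent_reduction_configs(xnumel: int, autotune: bool,
                                      max_block: int = 256) -> list[Config]:
    if not autotune:
        # Without autotuning, filtering collapses the search to one heuristic pick.
        return [Config(x_block=min(xnumel, max_block), num_warps=8)]
    # With autotuning: skip filtering and sweep power-of-two block sizes
    # up to the (overridden) MAX_BLOCK limit.
    configs = []
    xb = 1
    while xb <= min(xnumel, max_block):
        configs.append(Config(x_block=xb, num_warps=min(8, max(1, xb // 32))))
        xb *= 2
    # Always include a tiny config so small problem sizes stay competitive.
    tiny = Config(x_block=1, num_warps=1)
    if tiny not in configs:
        configs.append(tiny)
    return configs
```

With autotuning off, the sketch returns a single config; with it on, a spread of block sizes plus the tiny config is always in the candidate set.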

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben @mlazos

@pytorch-bot

pytorch-bot bot commented Sep 3, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/162056

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

❌ 1 Cancelled Job, 3 Unrelated Failures

As of commit 8cb82ef with merge base 8d81564 (image):

CANCELLED JOB - The following job was cancelled. Please retry:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

BROKEN TRUNK - The following job failed but was already present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@jataylo jataylo marked this pull request as draft September 3, 2025 10:33
@jataylo jataylo added ciflow/rocm Trigger "default" config CI on ROCm ciflow/inductor-rocm Trigger "inductor" config CI on ROCm ciflow/rocm-mi300 Trigger "default" config CI on ROCm MI300 ciflow/rocm-mi355 Trigger "default" config CI on ROCm MI355 runners labels Sep 3, 2025
@pytorch-bot

pytorch-bot bot commented Sep 3, 2025

Warning: Unknown label ciflow/rocm-mi355.
Currently recognized labels are

  • ciflow/binaries
  • ciflow/binaries_libtorch
  • ciflow/binaries_wheel
  • ciflow/triton_binaries
  • ciflow/inductor
  • ciflow/inductor-periodic
  • ciflow/inductor-rocm
  • ciflow/inductor-perf-test-nightly-rocm
  • ciflow/inductor-perf-compare
  • ciflow/inductor-micro-benchmark
  • ciflow/inductor-micro-benchmark-cpu-x86
  • ciflow/inductor-perf-test-nightly-x86-zen
  • ciflow/inductor-cu126
  • ciflow/linux-aarch64
  • ciflow/mps
  • ciflow/nightly
  • ciflow/periodic
  • ciflow/periodic-rocm-mi300
  • ciflow/rocm
  • ciflow/rocm-mi300
  • ciflow/s390
  • ciflow/riscv64
  • ciflow/slow
  • ciflow/trunk
  • ciflow/unstable
  • ciflow/xpu
  • ciflow/vllm
  • ciflow/torchbench
  • ciflow/op-benchmark
  • ciflow/pull
  • ciflow/h100
  • ciflow/h100-distributed
  • ciflow/win-arm64
  • ciflow/h100-symm-mem
  • ciflow/h100-cutlass-backend

Please add the new label to .github/pytorch-probot.yml

@facebook-github-bot
Contributor

@haoyuz has imported this pull request. If you are a Meta employee, you can view this in D82321855.

@jataylo jataylo changed the title Autotuning support for persistent reduction kernels Expand on autotune support for persistent reduction kernels Sep 23, 2025
@jataylo
Collaborator Author

jataylo commented Sep 23, 2025

@pytorchbot rebase

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Collaborator

Successfully rebased jack-per-reduction onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout jack-per-reduction && git pull --rebase)

@naromero77amd
Collaborator

@pytorchbot rebase

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Collaborator

Rebase failed due to Command git -C /home/runner/work/pytorch/pytorch rebase refs/remotes/origin/viable/strict pull/162056/head returned non-zero exit code 1

Rebasing (1/10)
Auto-merging torch/_inductor/runtime/triton_heuristics.py
CONFLICT (content): Merge conflict in torch/_inductor/runtime/triton_heuristics.py
error: could not apply c2ade625d7d... Autotuning support for persistent reduction kernels
hint: Resolve all conflicts manually, mark them as resolved with
hint: "git add/rm <conflicted_files>", then run "git rebase --continue".
hint: You can instead skip this commit: run "git rebase --skip".
hint: To abort and get back to the state before "git rebase", run "git rebase --abort".
hint: Disable this message with "git config set advice.mergeConflict false"
Could not apply c2ade625d7d... Autotuning support for persistent reduction kernels

Raised by https://github.com/pytorch/pytorch/actions/runs/18022544434

@naromero77amd
Collaborator

PR was recreated here: #163908

and has already been merged upstream. Closing this PR.


Labels

ciflow/inductor ciflow/inductor-rocm Trigger "inductor" config CI on ROCm ciflow/rocm Trigger "default" config CI on ROCm ciflow/rocm-mi300 Trigger "default" config CI on ROCm MI300 ciflow/rocm-mi355 Trigger "default" config CI on ROCm MI355 runners module: inductor open source release notes: inductor


5 participants