Log autotune choices and benchmark result to scuba/chrome trace #159496
wychi wants to merge 1 commit into pytorch:main
Conversation
This appears to be a diff that was exported from Phabricator, but the PR author does not have sufficient permissions to run CI. @wychi, please do step 2 of the internal wiki to get write access so you do not need to get CI approvals in the future. If you think this is a mistake, please contact the PyTorch Dev Infra team.
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/159496
Note: Links to docs will display an error until the docs builds have been completed.
✅ You can merge normally! (3 Unrelated Failures)
As of commit b39d6c9 with merge base e2ee9cf:
FLAKY - The following jobs failed but were likely due to flakiness present on trunk.
UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
This pull request was exported from Phabricator. Differential Revision: D79235037
…rch#159496)

Summary: Report the kernel choices and benchmark data to better understand how kernels are selected and the performance gap between the best kernel (likely a CUDA kernel) and Triton kernels.

**Example**

Event: mm_template_autotuning
Column: autotune_choices

```json
{
  "num_choices": 19,
  "num_triton_choices": 18,
  "best_time": 0.03200000151991844,
  "best_triton_pos": 1,
  "best_triton_time": 0.3172159940004349
}
```

Test Plan:

```
TORCHINDUCTOR_REPORT_AUTOTUNE_CHOICES_STATS=1 buck2 run //scripts/wychi:test_autotune_mm 2>&1 > /tmp/mylog.txt
```

Rollback Plan:

Differential Revision: D79235037
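The fields in the `autotune_choices` payload can be derived from per-choice benchmark timings. A minimal sketch of that derivation (a hypothetical helper, not the PR's actual implementation; the `(name, time)` input shape and the `triton_` name prefix are assumptions):

```python
import json


def autotune_choice_stats(timings):
    """Summarize autotune benchmark results.

    timings: list of (kernel_name, time_ms) pairs, one per candidate choice.
    Returns a dict shaped like the `autotune_choices` column above.
    """
    # Rank all choices by benchmark time, fastest first.
    ranked = sorted(timings, key=lambda kv: kv[1])
    stats = {
        "num_choices": len(ranked),
        "num_triton_choices": sum(
            1 for name, _ in ranked if name.startswith("triton_")
        ),
        "best_time": ranked[0][1],
    }
    # Rank (0-based) and time of the fastest Triton choice, if any.
    for pos, (name, time_ms) in enumerate(ranked):
        if name.startswith("triton_"):
            stats["best_triton_pos"] = pos
            stats["best_triton_time"] = time_ms
            break
    return stats


# Tiny made-up timing list for illustration.
print(json.dumps(autotune_choice_stats(
    [("cutlass_abc", 0.032), ("triton_mm_1", 0.317), ("triton_mm_2", 0.402)]
)))
```

With this shape, `best_triton_pos` = 1 in the example above means the fastest Triton candidate ranked second overall, behind a non-Triton kernel.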
@pytorchbot label "topic: not user facing"
Force-pushed: 27e8dee to e24d823
…rch#159496)

Summary: Report the kernel choices and benchmark data to better understand how kernels are selected and the performance gap between the best kernel (likely a CUDA kernel) and Triton kernels.

**Example**

Event: mm_template_autotuning
Column: autotune_choices

```json
{
  "num_choices": 52,
  "num_triton_choices": 19,
  "best_kernel": "cutlass_f6c25cf2",
  "best_kernel_desc": "cutlass3x_sm90_tensorop_gemm_f16_f16_f32_void_f16_128x256x64_2x1x1_0_tnn_align8_stream_k_warpspecialized_cooperative_epi_tma swizzle=8",
  "best_time": 0.6283040046691895,
  "best_triton_pos": 26,
  "best_triton_time": 0.6832960247993469,
  "best_triton_kernel": "triton_mm_17",
  "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4, num_consumer_groups=0, num_buffers_warp_spec=0"
}
```

Test Plan:

```
TORCHINDUCTOR_REPORT_AUTOTUNE_CHOICES_STATS=1 buck2 run //scripts/wychi:test_autotune_mm 2>&1 > /tmp/mylog.txt
```

Rollback Plan:

Reviewed By: stashuk-olek

Differential Revision: D79235037
…rch#159496)

Summary: Report the kernel choices and benchmark data to better understand how kernels are selected and the performance gap between the best kernel (likely a CUDA kernel) and Triton kernels.

**Example**

Event: mm_template_autotuning
Column: autotune_choices

```json
{
  "num_choices": 52,
  "num_triton_choices": 19,
  "best_kernel": "cutlass_f6c25cf2",
  "best_kernel_desc": "cutlass3x_sm90_tensorop_gemm_f16_f16_f32_void_f16_128x256x64_2x1x1_0_tnn_align8_stream_k_warpspecialized_cooperative_epi_tma swizzle=8",
  "best_time": 0.6283040046691895,
  "best_triton_pos": 26,
  "best_triton_time": 0.6832960247993469,
  "best_triton_kernel": "triton_mm_17",
  "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4, num_consumer_groups=0, num_buffers_warp_spec=0"
}
```

Test Plan:

```
TORCHINDUCTOR_MAX_AUTOTUNE_REPORT_CHOICES_STATS=1 buck2 run //scripts/wychi:test_autotune_mm 2>&1 > /tmp/mylog.txt
```

Rollback Plan:

Reviewed By: masnesral, stashuk-olek

Differential Revision: D79235037
This pull request was exported from Phabricator. Differential Revision: D79235037
@pytorchbot merge this

❌ 🤖 pytorchbot command failed: Try

@pytorchbot merge (Initiating merge automatically since Phabricator Diff has merged)
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Summary:
Report the kernel choices and benchmark data to better understand how kernels are selected and the performance gap between the best kernel (likely a CUDA kernel) and Triton kernels.
Example

Event: mm_template_autotuning
Column: autotune_choices

```json
{
  "num_choices": 52,
  "num_triton_choices": 19,
  "best_kernel": "cutlass_f6c25cf2",
  "best_kernel_desc": "cutlass3x_sm90_tensorop_gemm_f16_f16_f32_void_f16_128x256x64_2x1x1_0_tnn_align8_stream_k_warpspecialized_cooperative_epi_tma swizzle=8",
  "best_time": 0.6283040046691895,
  "best_triton_pos": 26,
  "best_triton_time": 0.6832960247993469,
  "best_triton_kernel": "triton_mm_17",
  "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4, num_consumer_groups=0, num_buffers_warp_spec=0"
}
```

Test Plan:
Rollback Plan:
Differential Revision: D79235037
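For reference, landing such stats in a chrome trace amounts to writing one event object into the trace's `traceEvents` list. A hedged sketch (file layout per Chrome's Trace Event Format; the stats payload and output path here are made up, not what Inductor actually emits):

```python
import json

# Hypothetical stats payload, shaped like the autotune_choices example above.
stats = {"num_choices": 52, "num_triton_choices": 19, "best_time": 0.6283}

# Minimal chrome trace with one instant event ("ph": "i"); "ts" is in
# microseconds. Loadable via chrome://tracing or Perfetto.
trace = {
    "traceEvents": [
        {
            "name": "mm_template_autotuning",
            "ph": "i",
            "ts": 0,
            "pid": 0,
            "tid": 0,
            "args": {"autotune_choices": stats},
        }
    ]
}

with open("/tmp/autotune_trace.json", "w") as f:
    json.dump(trace, f)
```

Putting the stats under `args` is what makes them show up in the trace viewer's detail pane when the event is selected.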
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben