[Inductor] Fix assertion failure in efficient_conv_bn_eval for default args by dumko2001 · Pull Request #169126 · pytorch/pytorch

dumko2001 · 2025-11-26T15:25:54Z

Summary:
The efficient_conv_bn_eval pass previously asserted that batch_norm nodes have exactly 8 arguments. This caused a crash when batch_norm was called with default arguments (e.g., only 3 positional args), which is valid in FX graphs.
This PR replaces the brittle assertion with inspect.signature binding to correctly normalize arguments and fill defaults, ensuring the pass works for all valid batch_norm calls.
Test Plan:

Reproduced the crash with a script using F.batch_norm(input, mean, var) (3 args).
Verified the fix with the same script (now passes).
Ran existing tests: python test/inductor/test_efficient_conv_bn_eval.py (passed).
Fixes torch.compile Assertion failure in efficient_conv_bn_eval_graph_transform_inlined when compiling FX graph with F.batch_norm (len(args)==3) #169011

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben @jataylo @Lucaskabela @azahed98 @mlazos

pytorch-bot · 2025-11-26T15:25:59Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/169126

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There are 1 currently active SEVs. If your PR is affected, please view them below:

Run pull request jobs on OSDC runners in shadow mode

❌ You can merge normally! (2 Unrelated Failures), 1 Unclassified Failure

As of commit 35c7ad0 with merge base da74fec ():

UNCLASSIFIED FAILURE - DrCI could not classify the following job because the workflow did not run on the merge base. The failure may be pre-existing on trunk or introduced by this PR:

pull / linux-jammy-py3.10-clang18 / test (onnx, 1, 1, linux.4xlarge) (gh) (this job did not run on the merge base, so DrCI cannot tell whether the failure is pre-existing)
test/onnx/ops/test_ops.py::NativeOnnxOpsTest::test_rotary_embedding_opcheck

FLAKY - The following job failed but was likely due to flakiness present on trunk:

trunk / linux-jammy-aarch64-py3.10 / test (default, 2, 5, linux.arm64.m7g.4xlarge) (gh) (disabled by #184187, #184211)
test/inductor/test_fused_attention.py::SDPAPatternRewriterCpuTests::test_sdpa_rewriter_14_cpu

BROKEN TRUNK - The following job failed but was present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

trunk / linux-jammy-rocm-py3.10-mi355 / test (default, 8, 10, linux.rocm.gpu.gfx950.1) (gh) (trunk failure)
test/inductor/test_select_algorithm.py::TestTemplateRender::test_external_template_prologue_epilogue_fusion

This comment was automatically generated by Dr. CI and updates every 15 minutes.

linux-foundation-easycla · 2025-11-26T15:26:01Z

The committers listed above are authorized under a signed CLA.

✅ login: dumko2001 / name: dumko2001 (80aef0d, a62caed, 2128cf7, 9142b21, cc60667, db0c8da, e5579b5)

dumko2001 · 2025-12-04T16:33:54Z

@pytorchbot label "release notes: bug fix (inductor)"

pytorch-bot · 2025-12-04T16:34:03Z

Didn't find following labels among repository labels: release notes: bug fix (inductor)

dumko2001 · 2025-12-04T16:34:53Z

@pytorchbot label "release notes: inductor"

jansel

Could you add a test for what this is fixing?

dumko2001 · 2025-12-06T04:45:59Z

Could you add a test for what this is fixing?

Hey I have pushed a commit

dumko2001 · 2025-12-07T08:47:57Z

@jansel I have added a regression test test_functional_batch_norm_defaults in test/inductor/test_efficient_conv_bn_eval.py that specifically targets the 3-argument F.batch_norm case.

jansel · 2025-12-09T05:06:54Z

Looks like lints are failing

dumko2001 · 2025-12-09T10:31:49Z

@jansel fixed lint issues, could u run checks again

jansel

Fix failing lints/tests

jansel · 2026-05-09T00:17:45Z

@pytorchbot rebase

pytorchmergebot · 2026-05-09T00:19:51Z

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

pytorchmergebot · 2026-05-09T00:19:56Z

Successfully rebased fix/issue-169011 onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout fix/issue-169011 && git pull --rebase)

pytorch-bot · 2026-05-09T00:20:02Z

~~Workflows were awaiting approval.~~ CI has now been triggered for the ciflow labels on this PR.

jansel · 2026-05-09T15:59:57Z

@claude explain those CI failures, are they related?

claude · 2026-05-09T16:00:24Z

Claude finished @jansel's task in 1m 56s —— View job

Investigating CI Failures

Check current CI status
Read the changed files to understand the PR scope
Analyze failing tests for relevance to this PR
Provide diagnosis

Diagnosis

Neither of the CI failures is related to this PR. This PR only modifies torch/_inductor/fx_passes/efficient_conv_bn_eval.py and test/inductor/test_efficient_conv_bn_eval.py. Here's the breakdown:

1. trunk / linux-jammy-rocm-py3.10-mi355 / test (distributed, 1, 4) — test_isend timeout

This is a distributed test (test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_isend) that timed out after 300 seconds on the ROCm/MI355 gloo backend. The error:

RuntimeError: Process 0 terminated or timed out after 310.07577753067017 seconds

The test was retried and timed out again identically. This is a gloo isend hang on the ROCm platform — completely unrelated to the inductor conv-bn fusion pass. No convolutions, batch normalization, or inductor code paths are involved.

2. trunk / linux-jammy-cuda13.0-py3.10-gcc11 / test (default, 5, 5) — test_mkldnn_rnn_backward_dtype (FLAKY)

The bot already identified this as flaky with a similar failure on trunk (link). This is a meta kernel registration test for MKLDNN RNN, also completely unrelated to the inductor conv-bn eval pass.

Summary: Both failures are unrelated infrastructure/flaky issues affecting distributed (ROCm gloo test_isend timeout) and MKLDNN (meta kernel test). The PR's own tests are not shown as failing. Safe to merge.

jansel · 2026-05-09T16:21:56Z

@pytorchbot rebase

pytorchmergebot · 2026-05-09T16:23:49Z

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

pytorchmergebot · 2026-05-09T16:23:52Z

Successfully rebased fix/issue-169011 onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout fix/issue-169011 && git pull --rebase)

jansel · 2026-05-10T00:08:20Z

@claude explain those CI failures, are they related?

claude · 2026-05-10T00:08:46Z

Claude finished @jansel's task in 2m 7s —— View job

Investigating CI Failures

Check current CI status
Analyze failing tests for relevance to this PR
Provide diagnosis

Diagnosis

Neither of the 2 flagged CI issues is related to this PR. This PR only modifies torch/_inductor/fx_passes/efficient_conv_bn_eval.py and test/inductor/test_efficient_conv_bn_eval.py. Here's the breakdown:

1. pull / linux-docs / build-docs-python-false — "Process completed with exit code 1"

This is a docs build job. Reviewing the full log shows that Sphinx successfully built and committed all Python docs (5405 files changed, committed as "Generate Python docs from 78c5c55..."). The exit code 1 appears to come from a post-build infrastructure step, not from any documentation error related to this PR's changes. No convolution or batch norm documentation is involved.

2. pull / linux-jammy-py3.14t-clang18 / test (dynamo_wrapped, 1, 3) — "ValueError: too many values to unpack (expected 2, got 3)"

This is a free-threaded Python 3.14t dynamo_wrapped test job. The bot explicitly notes: "this job did not run on the merge base, so DrCI cannot tell whether the failure is pre-existing." Reviewing the actual test log shows the test_nn.py tests are running and passing normally. The ValueError is from a different subsystem — the dynamo_wrapped testing infrastructure on the experimental py3.14t build. No inductor conv-bn eval code is exercised in these test_nn.py tests.

Summary: Both issues are unrelated infrastructure/platform problems. The efficient conv-bn eval tests themselves are not shown as failing. Safe to merge.

jansel · 2026-05-10T00:40:22Z

@pytorchbot rebase

pytorchmergebot · 2026-05-10T00:42:09Z

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

pytorchmergebot · 2026-05-10T00:42:12Z

Successfully rebased fix/issue-169011 onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout fix/issue-169011 && git pull --rebase)

jansel · 2026-05-10T15:38:28Z

@claude explain those CI failures, are they related?

claude · 2026-05-10T15:38:56Z

Claude finished @jansel's task in 2m 18s —— View job

Investigating CI Failures

Check current CI status
Analyze failing tests for relevance to this PR
Provide diagnosis

Diagnosis

Neither of the 2 CI failures is related to this PR. This PR only modifies torch/_inductor/fx_passes/efficient_conv_bn_eval.py and test/inductor/test_efficient_conv_bn_eval.py. Here's the breakdown:

1. trunk / linux-jammy-rocm-py3.10-mi355 / test (distributed, 2, 4) — test_broadcast_basics

This is a distributed gloo test (test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_broadcast_basics) on the ROCm/MI355 platform. The log shows all other gloo tests (allgather, allreduce, alltoall, broadcast_stress, etc.) passing normally in the same job. The test_broadcast_basics failure is a gloo process group test — completely unrelated to the inductor conv-bn fusion pass. No convolutions, batch normalization, or inductor code paths are involved.

2. pull / linux-jammy-py3.10-clang18 / test (onnx, 1, 1) — test_rotary_embedding_opcheck

This is an ONNX operator test that fails with a stride mismatch in onnx.RotaryEmbedding.opset23:

MetadataMismatchError: Stride mismatch! Strides are (96, 8, 24, 1) and (96, 32, 8, 1) (mismatched at 1)!

The test verifies that the ONNX RotaryEmbedding custom op's FakeTensor implementation matches its concrete implementation's tensor metadata. This is an ONNX-specific issue with rotary embedding stride computation — completely unrelated to the inductor conv-bn eval pass. The bot notes this job "did not run on the merge base, so DrCI cannot tell whether the failure is pre-existing."

Summary: Both failures are unrelated to this PR. The first is a ROCm distributed gloo flake, and the second is a pre-existing ONNX rotary embedding opcheck bug. The PR's own efficient conv-bn eval tests are not shown as failing. Safe to merge.

jansel · 2026-05-17T19:48:04Z

@pytorchbot rebase

pytorchmergebot · 2026-05-17T19:50:04Z

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

…_eval

- Remove duplicate @inductor_config.patch decorator in test - Remove commented assertion and unnecessary callable check - Use local normalized_args variable instead of mutating bn_node.args - Add docstring to satisfy linter

pytorchmergebot · 2026-05-17T19:50:09Z

Successfully rebased fix/issue-169011 onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout fix/issue-169011 && git pull --rebase)

jansel · 2026-05-18T20:31:02Z

@claude are those failures related?

claude · 2026-05-18T20:48:42Z

Claude finished @jansel's task in 1m 52s —— View job

Investigating CI Failures

Check current CI status
Analyze failing tests for relevance to this PR
Provide diagnosis

Diagnosis

Neither of the 2 CI failures is related to this PR. This PR only modifies torch/_inductor/fx_passes/efficient_conv_bn_eval.py and test/inductor/test_efficient_conv_bn_eval.py. Here's the breakdown:

1. trunk / linux-jammy-aarch64-py3.10 / test (default, 2, 5) — test_sdpa_rewriter_14_cpu

This is a fused attention SDPA pattern rewriter test (test/inductor/test_fused_attention.py::SDPAPatternRewriterCpuTests::test_sdpa_rewriter_14_cpu) that fails with "Unexpected success". This means the test was decorated with @expectedFailure (or similar skip-on-failure marker) on aarch64, but is now passing — so the framework reports it as a failure. This is completely unrelated to the conv-bn eval pass. It's an SDPA attention rewriter test, not a conv-BN fusion test. The DrCI bot also flags this as unclassifiable because "this job did not run on the merge base."

2. pull / linux-jammy-py3.10-clang18 / test (onnx, 1, 1) — test_rotary_embedding_opcheck

This is an ONNX operator test (test/onnx/ops/test_ops.py::NativeOnnxOpsTest::test_rotary_embedding_opcheck) that tests the onnx.RotaryEmbedding custom op. This has appeared as an unclassified failure in previous CI runs on this PR as well (same failure was noted in the May 10th CI run). It has no connection to the inductor conv-bn eval pass.

Summary: Both failures are unrelated. The first is an expectedFailure test that now passes on aarch64 (a trunk-level issue), and the second is a pre-existing ONNX rotary embedding opcheck failure. The PR's own efficient conv-bn eval tests are not shown as failing. Safe to merge.

jansel · 2026-05-19T01:43:55Z

@pytorchbot merge -i

pytorchmergebot · 2026-05-19T01:46:36Z

Merge started

Your change will be merged while ignoring the following 3 checks: pull / linux-jammy-py3.10-clang18 / test (onnx, 1, 1, linux.4xlarge), trunk / linux-jammy-aarch64-py3.10 / test (default, 2, 5, linux.arm64.m7g.4xlarge), trunk / linux-jammy-rocm-py3.10-mi355 / test (default, 8, 10, linux.rocm.gpu.gfx950.1)

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status
here

pytorch-bot Bot added the module: inductor label Nov 26, 2025

dumko2001 changed the title ~~Fix assertion failure in efficient_conv_bn_eval for default args~~ [Inductor] Fix assertion failure in efficient_conv_bn_eval for default args Nov 26, 2025

pytorchbot added the open source label Nov 26, 2025

soulitzer requested review from eellison and jansel December 4, 2025 16:17

soulitzer added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label Dec 4, 2025

pytorch-bot Bot added the release notes: inductor label Dec 4, 2025

jansel requested changes Dec 6, 2025

View reviewed changes

jansel added the ciflow/inductor label Dec 9, 2025

pytorch-bot Bot removed the ciflow/inductor label Dec 9, 2025

jansel added ciflow/trunk Trigger trunk jobs on your pull request ciflow/inductor labels Dec 12, 2025

jansel requested changes Dec 12, 2025

View reviewed changes

pytorch-bot Bot removed ciflow/trunk Trigger trunk jobs on your pull request ciflow/inductor labels Dec 18, 2025

dumko2001 force-pushed the fix/issue-169011 branch from f2b5b72 to 5e22252 Compare December 18, 2025 17:11

dumko2001 requested a review from jansel December 18, 2025 17:13

jansel added ciflow/trunk Trigger trunk jobs on your pull request ciflow/inductor labels Dec 18, 2025

pytorch-bot Bot removed ciflow/trunk Trigger trunk jobs on your pull request ciflow/inductor labels Dec 19, 2025

dumko2001 and others added 7 commits May 17, 2026 19:50

Fix assertion failure in efficient_conv_bn_eval for default args

f6bbbc2

Add regression test for issue pytorch#169011

34823d4

Fix Pyrefly lint error: assert callable(bn_node.target)

80519e1

Fix lint error: replace assert with safety check in efficient_conv_bn…

e782499

…_eval

Fix PYFMT lint: add blank line between imports

6408715

Add FX graph regression test for issue pytorch#169011

35c7ad0

Conversation

dumko2001 commented Nov 26, 2025 • edited by pytorch-bot Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pytorch-bot Bot commented Nov 26, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/169126

❗ 1 Active SEVs

❌ You can merge normally! (2 Unrelated Failures), 1 Unclassified Failure

Uh oh!

linux-foundation-easycla Bot commented Nov 26, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

dumko2001 commented Dec 4, 2025

Uh oh!

pytorch-bot Bot commented Dec 4, 2025

Uh oh!

dumko2001 commented Dec 4, 2025

Uh oh!

jansel left a comment

Choose a reason for hiding this comment

Uh oh!

dumko2001 commented Dec 6, 2025

Uh oh!

dumko2001 commented Dec 7, 2025

Uh oh!

jansel commented Dec 9, 2025

Uh oh!

dumko2001 commented Dec 9, 2025

Uh oh!

jansel left a comment

Choose a reason for hiding this comment

Uh oh!

jansel commented May 9, 2026

Uh oh!

pytorchmergebot commented May 9, 2026

Uh oh!

pytorchmergebot commented May 9, 2026

Uh oh!

pytorch-bot Bot commented May 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jansel commented May 9, 2026

Uh oh!

claude Bot commented May 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Investigating CI Failures

Diagnosis

Uh oh!

jansel commented May 9, 2026

Uh oh!

pytorchmergebot commented May 9, 2026

Uh oh!

pytorchmergebot commented May 9, 2026

Uh oh!

jansel commented May 10, 2026

Uh oh!

claude Bot commented May 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Investigating CI Failures

Diagnosis

Uh oh!

jansel commented May 10, 2026

Uh oh!

pytorchmergebot commented May 10, 2026

Uh oh!

pytorchmergebot commented May 10, 2026

Uh oh!

jansel commented May 10, 2026

Uh oh!

claude Bot commented May 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Investigating CI Failures

Diagnosis

Uh oh!

jansel commented May 17, 2026

Uh oh!

pytorchmergebot commented May 17, 2026

Uh oh!

dumko2001 commented Nov 26, 2025 •

edited by pytorch-bot Bot

Loading

pytorch-bot Bot commented Nov 26, 2025 •

edited

Loading

linux-foundation-easycla Bot commented Nov 26, 2025 •

edited

Loading

pytorch-bot Bot commented May 9, 2026 •

edited

Loading

claude Bot commented May 9, 2026 •

edited

Loading

claude Bot commented May 10, 2026 •

edited

Loading

claude Bot commented May 10, 2026 •

edited

Loading

claude Bot commented May 18, 2026 •

edited

Loading