[Inductor] Fix assertion failure in efficient_conv_bn_eval for default args#169126
[Inductor] Fix assertion failure in efficient_conv_bn_eval for default args#169126dumko2001 wants to merge 7 commits into
Conversation
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/169126
Note: Links to docs will display an error until the docs builds have been completed. ❗ 1 Active SEVsThere are 1 currently active SEVs. If your PR is affected, please view them below: ❌ You can merge normally! (2 Unrelated Failures), 1 Unclassified FailureAs of commit 35c7ad0 with merge base da74fec ( UNCLASSIFIED FAILURE - DrCI could not classify the following job because the workflow did not run on the merge base. The failure may be pre-existing on trunk or introduced by this PR:
FLAKY - The following job failed but was likely due to flakiness present on trunk:
BROKEN TRUNK - The following job failed but was present on the merge base:👉 Rebase onto the `viable/strict` branch to avoid these failures
This comment was automatically generated by Dr. CI and updates every 15 minutes. |
|
@pytorchbot label "release notes: bug fix (inductor)" |
|
Didn't find following labels among repository labels: release notes: bug fix (inductor) |
|
@pytorchbot label "release notes: inductor" |
jansel
left a comment
There was a problem hiding this comment.
Could you add a test for what this is fixing?
Hey I have pushed a commit |
|
@jansel I have added a regression test |
|
Looks like lints are failing |
|
@jansel fixed lint issues, could u run checks again |
f2b5b72 to
5e22252
Compare
|
@pytorchbot rebase |
|
@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here |
|
Successfully rebased |
|
|
|
@claude explain those CI failures, are they related? |
|
Claude finished @jansel's task in 1m 56s —— View job Investigating CI Failures
DiagnosisNeither of the CI failures is related to this PR. This PR only modifies 1. This is a distributed test ( The test was retried and timed out again identically. This is a gloo 2. The bot already identified this as flaky with a similar failure on trunk (link). This is a meta kernel registration test for MKLDNN RNN, also completely unrelated to the inductor conv-bn eval pass. Summary: Both failures are unrelated infrastructure/flaky issues affecting distributed (ROCm gloo test_isend timeout) and MKLDNN (meta kernel test). The PR's own tests are not shown as failing. Safe to merge. |
|
@pytorchbot rebase |
|
@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here |
|
Successfully rebased |
|
@claude explain those CI failures, are they related? |
|
Claude finished @jansel's task in 2m 7s —— View job Investigating CI Failures
DiagnosisNeither of the 2 flagged CI issues is related to this PR. This PR only modifies 1. This is a docs build job. Reviewing the full log shows that Sphinx successfully built and committed all Python docs (5405 files changed, committed as "Generate Python docs from 78c5c55..."). The exit code 1 appears to come from a post-build infrastructure step, not from any documentation error related to this PR's changes. No convolution or batch norm documentation is involved. 2. This is a free-threaded Python 3.14t dynamo_wrapped test job. The bot explicitly notes: "this job did not run on the merge base, so DrCI cannot tell whether the failure is pre-existing." Reviewing the actual test log shows the Summary: Both issues are unrelated infrastructure/platform problems. The efficient conv-bn eval tests themselves are not shown as failing. Safe to merge. |
|
@pytorchbot rebase |
|
@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here |
|
Successfully rebased |
|
@claude explain those CI failures, are they related? |
|
Claude finished @jansel's task in 2m 18s —— View job Investigating CI Failures
DiagnosisNeither of the 2 CI failures is related to this PR. This PR only modifies 1. This is a distributed gloo test ( 2. This is an ONNX operator test that fails with a stride mismatch in The test verifies that the ONNX Summary: Both failures are unrelated to this PR. The first is a ROCm distributed gloo flake, and the second is a pre-existing ONNX rotary embedding opcheck bug. The PR's own efficient conv-bn eval tests are not shown as failing. Safe to merge. |
|
@pytorchbot rebase |
|
@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here |
- Remove duplicate @inductor_config.patch decorator in test - Remove commented assertion and unnecessary callable check - Use local normalized_args variable instead of mutating bn_node.args - Add docstring to satisfy linter
|
Successfully rebased |
|
@claude are those failures related? |
|
Claude finished @jansel's task in 1m 52s —— View job Investigating CI Failures
DiagnosisNeither of the 2 CI failures is related to this PR. This PR only modifies 1. This is a fused attention SDPA pattern rewriter test ( 2. This is an ONNX operator test ( Summary: Both failures are unrelated. The first is an |
|
@pytorchbot merge -i |
Merge startedYour change will be merged while ignoring the following 3 checks: pull / linux-jammy-py3.10-clang18 / test (onnx, 1, 1, linux.4xlarge), trunk / linux-jammy-aarch64-py3.10 / test (default, 2, 5, linux.arm64.m7g.4xlarge), trunk / linux-jammy-rocm-py3.10-mi355 / test (default, 8, 10, linux.rocm.gpu.gfx950.1) Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team |
Summary:
The efficient_conv_bn_eval pass previously asserted that
batch_normnodes have exactly 8 arguments. This caused a crash whenbatch_normwas called with default arguments (e.g., only 3 positional args), which is valid in FX graphs.This PR replaces the brittle assertion with
inspect.signaturebinding to correctly normalize arguments and fill defaults, ensuring the pass works for all validbatch_normcalls.Test Plan:
F.batch_norm(input, mean, var)(3 args).python test/inductor/test_efficient_conv_bn_eval.py(passed).Fixes torch.compile Assertion failure in efficient_conv_bn_eval_graph_transform_inlined when compiling FX graph with F.batch_norm (len(args)==3) #169011
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben @jataylo @Lucaskabela @azahed98 @mlazos