
improve noop elimination for view #151095

Closed
BoyuanFeng wants to merge 11 commits into main from bf/noop-elimination

Conversation

@BoyuanFeng
Contributor

@BoyuanFeng BoyuanFeng commented Apr 11, 2025

This PR improves noop elimination.

### View Noop

```python
>>> torch.Size([1, 2, 3]) == [1, 2, 3]
False
>>> torch.Size([1, 2, 3]) == (1, 2, 3)
True
```

So we add `tuple(size)` in `view_noop`.
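The issue and the fix can be sketched in plain Python. `Size` below is a hypothetical stand-in for `torch.Size`, which subclasses `tuple` and therefore compares equal to tuples but never to lists:

```python
class Size(tuple):
    """Stand-in for torch.Size (which subclasses tuple)."""

def view_noop(arg_shape, size):
    # Without tuple(), a list-valued `size` would never compare
    # equal to a Size, and real noops would be missed.
    return arg_shape == tuple(size)

s = Size((1, 2, 3))
print(s == [1, 2, 3])           # False: tuple vs. list
print(s == (1, 2, 3))           # True
print(view_noop(s, [1, 2, 3]))  # True after normalizing with tuple()
```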

Example:

```python
import torch

@torch.compile()
def f(x):
    batch_size = x.shape[0]
    x = x.transpose(1, 2)  # (batch_size, 2, 3)
    x = x.reshape(batch_size, 2, 3)  # noop
    return x

x = torch.randn((2, 3, 2))
f(x)

x = torch.randn((4, 3, 2))
f(x)
```

Before:
![image](https://github.com/user-attachments/assets/be488881-6c99-43a9-b088-fa481f675775)

After:
![image](https://github.com/user-attachments/assets/6d93be3d-128b-44d4-ad6a-d3d18e272329)

cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov

@BoyuanFeng BoyuanFeng added the `ciflow/trunk`, `topic: not user facing`, and `module: inductor` labels Apr 11, 2025
@pytorch-bot

pytorch-bot bot commented Apr 11, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/151095

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 1 Pending, 14 Unrelated Failures

As of commit 1c70eae with merge base 1f29190:

NEW FAILURE - The following job has failed:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@BoyuanFeng BoyuanFeng marked this pull request as draft April 11, 2025 16:37
@pytorch-bot pytorch-bot bot added the `oncall: distributed` label Apr 12, 2025
```diff
 self.assertEqual(
     all_gathers[2].res_node.target,
-    torch.ops.aten.view.dtype,
+    torch.ops._c10d_functional.wait_tensor.default,
 )
```
Contributor Author

`view.dtype` is noop-eliminated. cc @weifengpy

Contributor Author

Discussed offline with @weifengpy; the change makes sense.

@BoyuanFeng BoyuanFeng changed the title improve noop elimination improve noop elimination for view Apr 13, 2025
```python
b_inp = functools.partial(torch.empty, (1, 1, 8, 8), device=device)
m_inp = functools.partial(torch.empty, (2, 1, 1, 4), device=device)
# need 2d attn_mask to generate patterns with view op
m_inp_2d = functools.partial(torch.empty, (2, 4), device=device)
```
Contributor Author

#113004 added `_sfdp_pattern_15`, which takes a 4d `m_inp` (shape: `(2, 1, 1, 4)`) as `attn_mask` and calls `(attn_mask == 0).view((bs, 1, 1, k_len))`. After this PR, that view op is noop-eliminated, so the search pattern no longer contains it. In `_test_sdpa_rewriter_15`, however, the graph still takes a 2d mask, so its view op is not noop-eliminated. This mismatch prevents the pattern from matching.

The change here passes a 2d `attn_mask` instead, so the view op is not noop-eliminated in the search pattern either.

cc @Valentine233 @jgong5 @leslie-fang-intel
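The shape reasoning above can be illustrated with a pure-Python predicate (a hypothetical stand-in for Inductor's view-noop check, not the actual implementation):

```python
def is_view_noop(arg_shape, size):
    # A view is a noop exactly when the target size equals the input shape.
    return arg_shape == tuple(size)

bs, k_len = 2, 4

# 4d mask: (2, 1, 1, 4).view((bs, 1, 1, k_len)) keeps the same shape,
# so the view is eliminated from the search pattern.
print(is_view_noop((2, 1, 1, 4), (bs, 1, 1, k_len)))  # True

# 2d mask: (2, 4).view((bs, 1, 1, k_len)) changes the shape,
# so the view op survives.
print(is_view_noop((2, 4), (bs, 1, 1, k_len)))        # False
```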

Collaborator

LGTM. Thanks!

@BoyuanFeng BoyuanFeng marked this pull request as ready for review April 16, 2025 03:51
@BoyuanFeng BoyuanFeng requested review from eellison and zou3519 April 16, 2025 16:42
Contributor

@eellison eellison left a comment


Looks good

Comment on lines +886 to +888

```python
def view_default_noop(arg, size):
    return arg.shape == tuple(size)
```
Contributor

These should use `statically_known_true` here. We should not add a guard on `arg[0] == size[0]` if `arg[1] != size[1]`.

It would be nice to have a recursive API for this, but just

```python
return len(arg.shape) == len(size) and all(
    statically_known_true(a == b) for a, b in zip(arg.shape, size)
)
```

works for now.

cc @laithsakka
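The suggested per-dimension check can be sketched in plain Python. Here `statically_known_true` is stubbed as the identity on concrete booleans; in PyTorch it lives in `torch.fx.experimental.symbolic_shapes` and returns False rather than guarding when a symbolic expression cannot be decided:

```python
def statically_known_true(expr):
    # Stub: with concrete shapes every comparison is already a plain bool.
    return bool(expr)

def view_default_noop(arg_shape, size):
    # Check rank first, then each dimension pair; with symbolic shapes
    # this avoids guarding on later dims once an earlier dim differs.
    return len(arg_shape) == len(size) and all(
        statically_known_true(a == b) for a, b in zip(arg_shape, size)
    )

print(view_default_noop((2, 2, 3), [2, 2, 3]))  # True: same shape, noop
print(view_default_noop((2, 2, 3), [2, 3, 2]))  # False: shapes differ
print(view_default_noop((2, 2, 3), [2, 6]))     # False: rank differs
```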

Contributor

Apparently `statically_known_true(sym_eq(ls1, ls2))` does this. Thank you @laithsakka.

```diff
 @register_noop_decomp(aten.view.default)
 def view_default_noop(arg, size):
-    return arg.shape == size
+    return statically_known_true(sym_eq(arg.shape, tuple(size)))
```
Contributor Author

`arg.shape` is a tuple, not a list. `sym_eq` returns False if one input is a tuple and the other is a list.
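This mirrors standard Python sequence semantics: a tuple never compares equal to a list even with identical elements, which is why `tuple(size)` normalizes the comparison:

```python
shape = (1, 2, 3)  # arg.shape is tuple-like
size = [1, 2, 3]   # the view's size argument may arrive as a list

print(shape == size)          # False: different sequence types
print(shape == tuple(size))   # True after normalization
```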

Contributor

Did this used to specialize on size?

@BoyuanFeng
Contributor Author

@pytorchbot merge -f "skip unrelated export failure"

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as a last resort and instead consider -i/--ignore-current to continue the merge while ignoring current failures. This allows currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

amathewc pushed a commit to amathewc/pytorch that referenced this pull request Apr 17, 2025
Pull Request resolved: pytorch#151095
Approved by: https://github.com/eellison
@github-actions github-actions bot deleted the bf/noop-elimination branch May 28, 2025 02:18

Labels

`ciflow/inductor`, `ciflow/trunk`, `Merged`, `module: inductor`, `oncall: distributed`, `topic: not user facing`


5 participants