[aoti-fx] Initial AOTInductor FX by angelayi · Pull Request #160765 · pytorch/pytorch

angelayi · 2025-08-15T18:43:53Z

Stack from ghstack (oldest at bottom):

This PR builds on the existing WrapperFxCodegen backend to prototype an ahead-of-time (AOT) variant that returns a graph module directly.

How to use:

exported_gm = torch.export.export(model, inp, dynamic_shapes=dynamic_shapes).module()
compiled_gm = torch._inductor.aot_compile(
    exported_gm, inp, options={"fx_wrapper": True, "compile_threads": 1}
)
assert torch.allclose(model(*inp), compiled_gm(*inp))

Example graph:

class GraphModule(torch.nn.Module):
    def forward(self, arg2_1: "f32[3, 3]"):
        # No stacktrace found for following nodes
        linear_weight: "f32[3, 3]" = self.linear_weight
        buf0: "f32[3]" = torch.empty_strided([3], [1], dtype = torch.float32, device = device(type='cuda', index=0))
        triton_kernel_wrapper_mutation = torch.ops.higher_order.triton_kernel_wrapper_mutation(kernel_idx = 0, constant_args_idx = 0, grid = [(1, 1, 1)], tma_descriptor_metadata = {}, kwargs = {'out_ptr0': buf0, 'xnumel': 3, 'XBLOCK': 4});  triton_kernel_wrapper_mutation = None
        buf1: "f32[3, 3]" = torch.empty_strided([3, 3], [3, 1], dtype = torch.float32, device = device(type='cuda', index=0))
        linear_weight_view: "f32[3, 3]" = torch.as_strided(linear_weight, [3, 3], [1, 3], 0);  linear_weight = None
        addmm: "f32[3, 3]" = torch.addmm(buf0, arg2_1, linear_weight_view, alpha = 1, beta = 1, out = buf1);  buf0 = arg2_1 = linear_weight_view = addmm = None
        return buf1
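
In the example graph, `linear_weight_view` is created with `torch.as_strided(linear_weight, [3, 3], [1, 3], 0)`. If `linear_weight` is stored contiguously with strides `[3, 1]` (an assumption; the graph does not show its strides), swapping the strides yields a transposed view without copying, which matches what a linear layer needs. The sketch below illustrates that indexing rule in plain Python. The helper `as_strided_2d` is hypothetical and is not part of PyTorch.

```python
from typing import List, Sequence, Tuple


def as_strided_2d(
    storage: Sequence[float],
    size: Tuple[int, int],
    stride: Tuple[int, int],
    offset: int = 0,
) -> List[List[float]]:
    """Hypothetical helper: read a 2-D strided view out of flat storage.

    Element (i, j) of the view lives at offset + i * stride[0] + j * stride[1].
    """
    rows, cols = size
    s0, s1 = stride
    return [
        [storage[offset + i * s0 + j * s1] for j in range(cols)]
        for i in range(rows)
    ]


if __name__ == "__main__":
    # Flat storage of a 3x3 matrix, assumed contiguous (row-major).
    storage = list(range(9))

    contiguous = as_strided_2d(storage, (3, 3), (3, 1))
    view = as_strided_2d(storage, (3, 3), (1, 3))

    # Swapping the strides reads the same storage as the transpose.
    transposed = [list(col) for col in zip(*contiguous)]
    assert view == transposed
    print(view)
```

Under that assumption, the `addmm` call in the graph computes `buf0 + arg2_1 @ linear_weight.T` and writes the result into `buf1`.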

The motivation is that backends such as ExecuTorch and MTIA would like to use Inductor's optimizations but may have their own graph-lowering pipelines, so they may not want to use AOTI, which generates a shared library (`.so`).

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben


pytorch-bot · 2025-08-15T18:43:56Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/160765

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit d931dda with merge base 80cca83 ():
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

blaine-rister · 2025-08-15T21:57:28Z

Awesome! This could be very useful for unifying MTIA's inference UX and compilation flow with GPUs and CPUs.


To use:

```python
ep = torch.export.export(model, inp, dynamic_shapes=dynamic_shapes)
gm = torch._inductor.aot_compile(
    ep.module(), inp, options={"fx_wrapper": True, "compile_threads": 1}
)
assert torch.allclose(model(*inp), gm(*inp))
```


pytorchmergebot · 2025-08-18T15:34:59Z

Starting merge as part of PR stack under #160766

Pull Request resolved: #160766 Approved by: https://github.com/jansel ghstack dependencies: #160765


Fixes #162357
Fixes #160970
Fixes #161038
Fixes #160951
Fixes #161698

These tests were introduced in #160765 and they are all flaky when `torch._inductor.aot_compile` uses multiple threads (the default option). The issue could be reproduced by running them locally multiple times. For example,

```
pytest --flake-runs 10 --flake-finder -v inductor/test_fxir_backend.py -k test_aoti_fx_add (output logs at P1938386961)
...
-------------------- Captured stdout call --------------------
inductor [('async_compile_cache_miss', 1)]
graph_break []
-------------------- Captured stdout call --------------------
inductor [('async_compile_cache_miss', 1)]
graph_break []
-------------------- Captured stdout call --------------------
inductor [('async_compile_cache_miss', 1)]
graph_break []
-------------------- Captured stdout call --------------------
inductor [('async_compile_cache_miss', 1)]
graph_break []
-------------------- Captured stdout call --------------------
inductor [('async_compile_cache_miss', 1)]
graph_break []
-------------------- Captured stdout call --------------------
inductor [('async_compile_cache_miss', 1)]
graph_break []
-------------------- Captured stdout call --------------------
inductor [('async_compile_cache_miss', 1)]
graph_break []
-------------------- Captured stdout call --------------------
inductor [('async_compile_cache_miss', 2), ('async_compile_cache_hit', 1)]
graph_break []
-------------------- Captured stdout call --------------------
inductor [('async_compile_cache_miss', 2), ('async_compile_cache_hit', 1)]
graph_break []
-------------------- Captured stdout call --------------------
inductor [('async_compile_cache_miss', 2), ('async_compile_cache_hit', 1)]
graph_break []
==================== short test summary info ====================
FAILED [0.4834s] inductor/test_fxir_backend.py::AOTFxirTestCase::test_aoti_fx_add - AttributeError: 'NoneType' object has no attribute '__code__'
FAILED [0.4576s] inductor/test_fxir_backend.py::AOTFxirTestCase::test_aoti_fx_add - AttributeError: 'NoneType' object has no attribute '__code__'
FAILED [0.4613s] inductor/test_fxir_backend.py::AOTFxirTestCase::test_aoti_fx_add - AttributeError: 'NoneType' object has no attribute '__code__'
==================== 3 failed, 7 passed in 12.89s ====================
```

Setting `compile_threads` to 1 will get rid of the test flakiness, but there might be underlying issues from #160765.

Pull Request resolved: #162472
Approved by: https://github.com/angelayi, https://github.com/Skylion007
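
The reproduction above relies on running the same test many times and counting how often it fails. A minimal sketch of that idea in plain Python follows. It is not how the pytest flake-finder plugin is implemented, and `find_flakes` and `make_flaky_test` are hypothetical names introduced only for illustration; the flaky test is a seeded stand-in, not the real `test_aoti_fx_add`.

```python
import random
from collections import Counter
from typing import Callable


def find_flakes(test_fn: Callable[[], None], runs: int = 10) -> Counter:
    """Run test_fn `runs` times and tally passes and failures by exception type."""
    tally: Counter = Counter()
    for _ in range(runs):
        try:
            test_fn()
        except Exception as exc:
            # Record the kind of failure and keep going instead of stopping.
            tally[type(exc).__name__] += 1
        else:
            tally["passed"] += 1
    return tally


def make_flaky_test(seed: int, failure_rate: float) -> Callable[[], None]:
    """Hypothetical stand-in for a test that fails intermittently."""
    rng = random.Random(seed)

    def test() -> None:
        # Fail with the given probability, mimicking the error seen in the logs.
        if rng.random() < failure_rate:
            raise AttributeError("'NoneType' object has no attribute '__code__'")

    return test


if __name__ == "__main__":
    tally = find_flakes(make_flaky_test(seed=0, failure_rate=0.3), runs=10)
    print(dict(tally))
```

A test that passes on some runs and fails on others with no code change, as in the 3 failed and 7 passed result above, is what marks it as flaky rather than simply broken.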


[aoti-fx] Initial AOTInductor FX

9e50f7d


pytorch-bot bot added ciflow/inductor module: inductor release notes: fx release notes category labels Aug 15, 2025

This was referenced Aug 15, 2025

[aoti-fx] Dynamic shapes support #160766

Closed

P1905802140 #160767

Closed

blaine-rister reviewed Aug 15, 2025

torch/_inductor/codegen/wrapper_fxir.py (outdated, resolved)

blaine-rister reviewed Aug 15, 2025

torch/_inductor/compile_fx.py (outdated, resolved)

blaine-rister reviewed Aug 15, 2025

torch/_inductor/compile_fx.py (resolved)

blaine-rister reviewed Aug 15, 2025

torch/_inductor/codegen/wrapper_fxir.py (resolved)

angelayi added 2 commits August 15, 2025 16:14

jansel approved these changes Aug 16, 2025


angelayi requested review from avikchaudhuri, tugsbayasgalan, ydwu4 and zhxchen17 as code owners August 16, 2025 23:50

pytorchmergebot closed this in bab7982 Aug 18, 2025

pytorchmergebot pushed a commit that referenced this pull request Aug 18, 2025

[aoti-fx] Dynamic shapes support (#160766)

6ac9035

Pull Request resolved: #160766 Approved by: https://github.com/jansel ghstack dependencies: #160765

pytorchmergebot added the Merged label Aug 18, 2025

can-gaa-hou pushed a commit to can-gaa-hou/pytorch that referenced this pull request Aug 22, 2025

[aoti-fx] Dynamic shapes support (pytorch#160766)

7ae39ca

Pull Request resolved: pytorch#160766 Approved by: https://github.com/jansel ghstack dependencies: pytorch#160765

huydhn mentioned this pull request Sep 9, 2025

Fix flaky AOTFxirTestCase #162472

Closed

markc-614 pushed a commit to markc-614/pytorch that referenced this pull request Sep 17, 2025

[aoti-fx] Dynamic shapes support (pytorch#160766)

d04e3ca

Pull Request resolved: pytorch#160766 Approved by: https://github.com/jansel ghstack dependencies: pytorch#160765

github-actions bot deleted the gh/angelayi/112/head branch September 21, 2025 02:15

angelayi wants to merge 4 commits into gh/angelayi/112/base from gh/angelayi/112/head


4 participants
