Skip to content

Hoist choose_dispatcher to top level, remove unnecessary returns#158176

Closed
ezyang wants to merge 2 commits intogh/ezyang/3100/basefrom
gh/ezyang/3100/head
Closed

Hoist choose_dispatcher to top level, remove unnecessary returns#158176
ezyang wants to merge 2 commits intogh/ezyang/3100/basefrom
gh/ezyang/3100/head

Conversation

[ghstack-poisoned]
@ezyang ezyang requested a review from bdhirsh as a code owner July 12, 2025 01:35
@pytorch-bot
Copy link

pytorch-bot bot commented Jul 12, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/158176

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 1 Unrelated Failure

As of commit a48ff98 with merge base 4b9a6f7 (image):

NEW FAILURE - The following job has failed:

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

stack.enter_context(compiled_autograd._disable())

compiler_fn, flat_fn, dup_fake_flat_args, aot_config, fw_metadata = (
flat_fn = functional_call
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where's flat_fn being used in this file after this point?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's used line 1228 below? I'm not sure what you actually mean here lol.

@ezyang
Copy link
Contributor Author

ezyang commented Jul 14, 2025

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Jul 14, 2025
@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@pytorchmergebot
Copy link
Collaborator

Merge failed

Reason: 2 jobs have failed, first few of them are: trunk / cuda12.8-py3.10-gcc9-sm80 / build, trunk / linux-jammy-rocm-py3.10 / build

Details for Dev Infra team Raised by workflow job

[ghstack-poisoned]
@ezyang
Copy link
Contributor Author

ezyang commented Jul 15, 2025

@pytorchbot merge -i

@pytorchmergebot
Copy link
Collaborator

Starting merge as part of PR stack under #158251

@pytorchmergebot
Copy link
Collaborator

Starting merge as part of PR stack under #158319

@pytorchmergebot
Copy link
Collaborator

Merge failed

Reason: 1 mandatory check(s) failed. The first few are:

Dig deeper by viewing the failures on hud

Details for Dev Infra team Raised by workflow job

Failing merge rule: Core Maintainers

@ezyang
Copy link
Contributor Author

ezyang commented Jul 15, 2025

@pytorchbot merge -i

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged while ignoring the following 2 checks: pull / linux-jammy-py3-clang12-mobile-build / build, pull / cuda12.8-py3.10-gcc9-sm75 / test (pr_time_benchmarks, 1, 1, linux.g4dn.metal.nvidia.gpu, unstable)

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@pytorchmergebot
Copy link
Collaborator

Merge failed

Reason: 1 mandatory check(s) failed. The first few are:

Dig deeper by viewing the failures on hud

Details for Dev Infra team Raised by workflow job

Failing merge rule: Core Maintainers

@ezyang
Copy link
Contributor Author

ezyang commented Jul 16, 2025

@pytorchbot merge -i

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged while ignoring the following 2 checks: pull / linux-jammy-py3-clang12-mobile-build / build, pull / cuda12.8-py3.10-gcc9-sm75 / test (pr_time_benchmarks, 1, 1, linux.g4dn.metal.nvidia.gpu, unstable)

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@pytorchmergebot
Copy link
Collaborator

Merge failed

Reason: 1 mandatory check(s) failed. The first few are:

Dig deeper by viewing the failures on hud

Details for Dev Infra team Raised by workflow job

Failing merge rule: Core Maintainers

@ezyang
Copy link
Contributor Author

ezyang commented Jul 16, 2025

@pytorchbot merge -f "-i flag not working"

@pytorch-bot
Copy link

pytorch-bot bot commented Jul 16, 2025

❌ 🤖 pytorchbot command failed:

@pytorchbot merge: error: argument -f/--force: expected one argument

usage: @pytorchbot merge [-f MESSAGE | -i] [-ic] [-r [{viable/strict,main}]]

Try @pytorchbot --help for more info.

@ezyang
Copy link
Contributor Author

ezyang commented Jul 16, 2025

@pytorchbot merge -f "zinger"

@pytorch-bot
Copy link

pytorch-bot bot commented Jul 16, 2025

You need to provide a reason for using force merge, in the format @pytorchbot merge -f 'Explanation'.
The explanation needs to be clear on why this is needed. Here are some good examples:

  • Bypass checks due to unrelated upstream failures from ...
  • This is a minor fix to ..., which shouldn't break anything
  • This is pre-tested in a previous CI run
  • Bypass flaky ... check

@ezyang
Copy link
Contributor Author

ezyang commented Jul 16, 2025

@pytorchbot merge -f "you can't parse the real explanation"

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

pytorchmergebot pushed a commit that referenced this pull request Jul 16, 2025
The starting point for this refactor is that I need access to the fully
general joint graph representation in an export-like interface, but I
then subsequently need a way to feed this joint graph into the rest of
the compilation pipeline so I can get an actual callable that I can run
once I've finished modifying it.  Previously, people had added export
capabilities to AOTAutograd by having an export flag that toggled what
exactly the functions return and triggering aot_dispatch to go to a
different "export" implementation, but I've found this difficult to
understand and has lead to a bit of duplicate code for the export path.

So the idea here is to reorganize the structure of the function calls in AOTAutograd. Here, it is helpful to first describe how things used to work:

* Start with aot_autograd.py top level functions like aot_function, _aot_export_function and aot_module_simplified. These call:
  * create_aot_dispatcher_function. This does a bunch of stuff (forward metadata collection) and adds many context managers. This calls:
    * One of aot_dispatch_base, aot_dispatch_export or aot_dispatch_autograd, which:
      * Call aot_dispatch_autograd_graph or aot_dispatch_base_graph to actually do the graph capture
      * Do some base/export/autograd specific post-processing on the graph

Notice the pattern of nested function invocations means that there is no way to easily get the graph capture result from the autograd case; furthermore, the export path is "bolted" on to force the entire chain of functions to have a different return result than normal, and no way to *resume* the rest of the post-processing to actually get a callable.

Here is the new structure:

* Start with aot_autograd.py top level functions like aot_function, _aot_export_function and aot_module_simplified. These now orchestrate this top level flow:
  * Start a context manager (stack); this stateful context block takes care of all of the nested context managers which originally necessitated the nested call structure
  * Call create_aot_state to do initial setup and setup all the context managers on stack. These context managers do NOT exit upon return of this.
  * Call aot_stage1_graph_capture to do the graph capture
  * Call aot_stage2_compile or aot_stage2_export depending on what postprocessing you want

With this new structure, it's now possible (although not done in this PR) to return the graph after aot_stage1_graph_capture and do something with it, before running aot_stage2_compile to finish the job.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: #158213
Approved by: https://github.com/jamesjwu
ghstack dependencies: #158149, #158150, #158173, #158176
pytorchmergebot pushed a commit that referenced this pull request Jul 16, 2025
…nd functions to frontend_utils (#158251)

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: #158251
Approved by: https://github.com/jamesjwu
ghstack dependencies: #158149, #158150, #158173, #158176, #158213
pytorchmergebot pushed a commit that referenced this pull request Jul 16, 2025
Also a small amount of extra code cleanup.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: #158319
Approved by: https://github.com/jingsh
ghstack dependencies: #158149, #158150, #158173, #158176, #158213, #158251
@github-actions github-actions bot deleted the gh/ezyang/3100/head branch August 16, 2025 02:18
Khanaksahu pushed a commit to Khanaksahu/pytorch that referenced this pull request Nov 17, 2025
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
ghstack-source-id: 1fc4f6e
Pull-Request: pytorch/pytorch#158176
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants