[annotate] Annotation should be mapped across submod #165202
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/165202

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit 676774b with merge base 37d57ac:

FLAKY - The following job failed but was likely due to flakiness present on trunk.

This comment was automatically generated by Dr. CI and updates every 15 minutes.
Force-pushed from 6ef57a6 to fc75cff (Compare)
SherlockNoMad left a comment:

LGTM with minor comments.

`get_custom_metadata` doesn't seem useful to users; let's make it private.
Force-pushed from a0c8704 to 431e6b8 (Compare)
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: Command. Details for Dev Infra team: raised by workflow job.
Force-pushed from c82aa30 to e1e6e87 (Compare)
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
The match for backward nodes might be in a different submod, so we should check all submods for potential matches. In flex attention, this can happen if `mask_mod` has operations (such as index) that increase the seq_nr of the forward graph nodes; the backward flex_attention nodes then cannot find a match in their own subgraph.

```
python test/functorch/test_aot_joint_with_descriptors.py -k preserve_annotate
```

Also tested on the torchtitan joint_graph_runner branch. The flex_attention backward nodes are annotated now.

```
NGPU=8 CONFIG_FILE="./torchtitan/models/llama3/train_configs/debug_model.toml" LOG_RANK=0 TRAIN_FILE="torchtitan.train" TORCHFT_LIGHTHOUSE="http://localhost:29510" PYTORCH_ALLOC_CONF="expandable_segments:True" torchrun --nproc_per_node=8 --rdzv_backend c10d --rdzv_endpoint="localhost:0" --local-ranks-filter 0 --role rank --tee 3 -m torchtitan.train --job.config_file ./torchtitan/models/llama3/train_configs/debug_model.toml --model.name joint_graph_runner.llama3 --compile.enable --parallelism.data_parallel_shard_degree=2 --parallelism.tensor_parallel_degree=4 --model.flavor=debugmodel_flex_attn
```

Pull Request resolved: pytorch#165202
Approved by: https://github.com/SherlockNoMad
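The fix described above can be sketched in plain Python. This is a simplified model, not the real AOTAutograd implementation: graph nodes are plain dicts, and the `seq_nr`/annotation names follow the PR description while everything else (the function name, the dict layout) is invented for illustration.

```python
# Hypothetical sketch: propagate forward-node annotations to backward
# nodes by seq_nr, searching across ALL submods instead of only the
# backward node's own submod (the bug this PR fixes).

def propagate_annotations(submods):
    """submods maps submod name -> list of node dicts."""
    # Index every annotated forward node by seq_nr, across all submods.
    fwd_by_seq_nr = {}
    for nodes in submods.values():
        for node in nodes:
            if node["is_forward"] and "annotation" in node:
                fwd_by_seq_nr[node["seq_nr"]] = node["annotation"]

    # Annotate backward nodes from the global index, so a match in a
    # different submod is still found.
    for nodes in submods.values():
        for node in nodes:
            if not node["is_forward"]:
                ann = fwd_by_seq_nr.get(node["seq_nr"])
                if ann is not None:
                    node["annotation"] = ann


submods = {
    "submod_0": [
        {"is_forward": True, "seq_nr": 7, "annotation": "flex_attention"},
    ],
    "submod_1": [
        # Backward node whose forward match lives in a *different* submod.
        {"is_forward": False, "seq_nr": 7},
    ],
}
propagate_annotations(submods)
print(submods["submod_1"][0]["annotation"])  # → flex_attention
```

A per-submod search would leave the node in `submod_1` unannotated; the global index is what makes the cross-submod match work.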
This is an e2e prototype to run llama3-simplefsdp using an export-style aot_autograd workflow. Setup: shard_dp = 2, tp = 4.

MVP
- [Done] Start with a SimpleFSDP model, enable TP + FSDP
- [Done] Apply [aot_export_joint_with_descriptors](pytorch/pytorch#163609) on the parallelized module with DTensor inputs to get the joint graph
- [Done] Apply min_cut_partitioner to get the forward and backward graph modules
- [Done, but needs verification] Apply prefetch/bucketing graph passes on fw_gm and bw_gm to reorder/group the communication collectives
- [Done] Run the joint graph with `aot_compile_joint_with_descriptors`
- [Done] Regional Inductor for FlexAttention; needs to run on top of pytorch/pytorch#165202 and pytorch/pytorch#164776

Next Steps
- Enable CudaGraph
- Enable SimpleFSDP + EP
- Showcase user annotation on MoE for the dispatch, compute, and combine regions
- Enable PP with a custom Runner

Issues
- pytorch/pytorch#164559
- pytorch/pytorch#164543
- What is the input order for the aot_export_joint graph? Using model.parameters()'s order as the input order seems wrong.

Repro steps:
NGPU=8 CONFIG_FILE="./torchtitan/models/llama3/train_configs/debug_model.toml" with-proxy ./run_train.sh --model.name compiler_toolkit.llama3 --compile.enable --parallelism.data_parallel_shard_degree=2 --parallelism.tensor_parallel_degree=4

Run with FlexAttention:
NGPU=8 CONFIG_FILE="./torchtitan/models/llama3/train_configs/debug_model.toml" with-proxy ./run_train.sh --model.name compiler_toolkit.llama3 --compile.enable --parallelism.data_parallel_shard_degree=2 --parallelism.tensor_parallel_degree=4 --model.flavor=debugmodel_flex_attn

Sample output:
P1975157784: rank0_autograd_function_0fea2786.py
P1975158481: rank1_autograd_function_28587623.py

Co-authored-by: Simon Fan <xmfan@meta.com>
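The MVP steps (export joint graph, partition, bucket collectives) can be sketched as a toy end-to-end pipeline. Note the stage names here only mirror the real torch APIs (aot_export_joint_with_descriptors, min_cut_partitioner, the prefetch/bucketing passes); the signatures and the list-of-dicts "graph" representation are invented stand-ins, not the actual implementation.

```python
# Toy model of the pipeline: a "joint graph" is a list of node dicts.

def export_joint(model_ops):
    # Stand-in for aot_export_joint_with_descriptors: produce one joint
    # graph containing each op plus its backward counterpart.
    joint = []
    for op in model_ops:
        joint.append({"op": op, "fwd": True})
        joint.append({"op": f"{op}_backward", "fwd": False})
    return joint

def partition(joint):
    # Stand-in for min_cut_partitioner: split the joint graph into
    # forward and backward graph modules.
    fw_gm = [n for n in joint if n["fwd"]]
    bw_gm = [n for n in joint if not n["fwd"]]
    return fw_gm, bw_gm

def bucket_collectives(gm):
    # Stand-in for the prefetch/bucketing passes: hoist communication
    # collectives (all_gather here) ahead of compute so they can be
    # grouped and overlapped.
    comm = [n for n in gm if "all_gather" in n["op"]]
    compute = [n for n in gm if "all_gather" not in n["op"]]
    return comm + compute

joint = export_joint(["matmul", "all_gather_w", "flex_attention"])
fw_gm, bw_gm = partition(joint)
fw_gm = bucket_collectives(fw_gm)
print([n["op"] for n in fw_gm])
# → ['all_gather_w', 'matmul', 'flex_attention']
```

The point of the sketch is the data flow: a single joint graph is produced first, partitioned second, and only then rewritten by communication passes, which is why the passes run on fw_gm and bw_gm rather than on the joint graph.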
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @chenyang78 @kadeng @chauhang @amjames @Lucaskabela