[invoke_subgraph][inductor] Thread graphsafe rng input states for hops by anijain2305 · Pull Request #160713 · pytorch/pytorch

anijain2305 · 2025-08-15T06:15:57Z

Stack from ghstack (oldest at bottom):

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben

[ghstack-poisoned]

pytorch-bot · 2025-08-15T06:16:01Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/160713

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 7379228 with merge base fa75ba9:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

ghstack-source-id: 94d2b4f Pull-Request: #160713

[ghstack-poisoned]

ghstack-source-id: 34fbe6a Pull-Request: #160713

[ghstack-poisoned]

eellison

one question about recursion

eellison · 2025-08-20T17:09:03Z

torch/_functorch/partitioners.py

+    There is a catch: for a short period, the joint graph is in a “bad” state.
+    The HOP subgraphs expect additional inputs (because of the new
+    placeholders), but the outer graph call sites don't yet provide them. We
+    can't fix this in the joint graph because the joint graph's input signature
+    is fixed (primals, tangents). As a compromise, we keep the joint graph in
+    somewhat of a bad state for some time and, once the outer forward and
+    backward graphs are partitioned, insert the corresponding RNG placeholders
+    and wire up the calls.


Would it be clearer to have a temporary node that represents the yet-to-be-added placeholder node? Then both the non-HOP and HOP cases could have a pass to lift them to placeholders?

I thought about that a little more, but I could not figure out how to add that temporary node. It cannot be a placeholder on the joint graph, because the signature is (primals, tangents), where both are lists, and other parts of the stack (mostly the partitioner) assume this signature. If we changed it to (primals, tangents, fwd_rng_state, bw_rng_state), we would have to make changes in many places. At that point, it might end up being more hacky.
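To make the temporarily inconsistent state concrete, here is a toy sketch in plain Python. It is not the code in torch/_functorch/partitioners.py: `Subgraph`, `CallSite`, and `is_consistent` are hypothetical names invented for illustration, and the `fwd_rng_state_0` naming (prefix plus counter) is an assumption based on the `rng_string` and `rng_count` variables in the diff.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Subgraph:
    """Toy stand-in for a HOP subgraph: only its placeholder names."""
    placeholders: list[str]


@dataclass
class CallSite:
    """Toy stand-in for the outer-graph node that calls the subgraph."""
    target: Subgraph
    args: list[str] = field(default_factory=list)


def is_consistent(call: CallSite) -> bool:
    # Well-formed means one argument is passed per subgraph placeholder.
    return len(call.args) == len(call.target.placeholders)


# Step 1: the subgraph and its call site agree.
sub = Subgraph(placeholders=["x"])
call = CallSite(target=sub, args=["primals_1"])
state_before = is_consistent(call)

# Step 2: partitioning the HOP adds an RNG-state placeholder to the subgraph.
# The joint graph signature is fixed, so the call site cannot be updated yet.
sub.placeholders.append("fwd_rng_state_0")  # assumed naming scheme
state_during = is_consistent(call)

# Step 3: after the outer forward graph is partitioned, add the matching
# outer placeholder and pass it at the call site.
outer_fwd_inputs = ["primals_1", "fwd_rng_state_0"]
call.args.append("fwd_rng_state_0")
state_after = is_consistent(call)

print(state_before, state_during, state_after)
```

The middle value is the window the docstring calls the "bad" state: the subgraph already expects the extra input, while the outer call site does not yet supply it.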

eellison · 2025-08-20T17:10:56Z

torch/_functorch/partitioners.py

+    """
+
+    rng_count = 0
+    rng_string = "bwd_rng_state" if is_backward else "fwd_rng_state"


I would expect this function to be recursive, for when we have

module:
    hop:
        hop (rng)

This will work because run_joint_graph_passes_on_hops runs recursively, so there will be a sequence of partition_hop_level2_joint -> partition_hop_level1_joint -> partition_main_joint.

Overall, we have not tested nested HOPs very thoroughly, but I can try to add a few more tests in a follow-up PR. One idea I have in mind is AC (activation checkpointing) + invoke_subgraph.
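The ordering described above (innermost HOP first, main joint graph last) can be sketched as a post-order traversal. This is a toy model only: `ToyGraph` and `partition_recursively` are hypothetical names, and how run_joint_graph_passes_on_hops actually achieves this ordering is not shown in this thread.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToyGraph:
    """Toy stand-in for a joint graph that may contain nested HOP subgraphs."""
    name: str
    hops: list[ToyGraph] = field(default_factory=list)


def partition_recursively(graph: ToyGraph, order: list[str]) -> list[str]:
    # Visit nested HOP subgraphs first, so each graph is partitioned only
    # after all of its children already expose their final inputs.
    for hop in graph.hops:
        partition_recursively(hop, order)
    order.append(f"partition_{graph.name}")
    return order


level2 = ToyGraph("hop_level2_joint")            # innermost HOP, uses RNG
level1 = ToyGraph("hop_level1_joint", [level2])  # HOP containing level2
main = ToyGraph("main_joint", [level1])          # outer joint graph

order = partition_recursively(main, [])
print(" -> ".join(order))
```

With this ordering, an RNG input added at level 2 is already visible when level 1 is partitioned, which is why the non-recursive helper still handles the nested case.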

anijain2305 · 2025-08-21T19:55:55Z

Talked offline with @eellison. He suggested inserting a custom op in the joint graph that is later lifted to an input during partitioning. This should keep the joint graph in a reasonably consistent state. This can be done in a follow-up PR.
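A rough sketch of what that follow-up could look like, using plain Python lists instead of real FX graphs. Everything here is hypothetical: the marker name `rng_state_marker`, the `lift_markers` helper, and the `prefix_index` placeholder naming are assumptions for illustration, not an existing PyTorch API.

```python
from __future__ import annotations

RNG_MARKER = "rng_state_marker"  # hypothetical custom op name


def lift_markers(
    nodes: list[tuple[str, str]], prefix: str
) -> tuple[list[str], list[tuple[str, str]]]:
    """Replace each marker node with a fresh placeholder.

    `nodes` is a list of (node_name, op) pairs. Returns the names of the new
    placeholders and the rewritten node list.
    """
    new_inputs: list[str] = []
    rewritten: list[tuple[str, str]] = []
    for name, op in nodes:
        if op == RNG_MARKER:
            # The marker kept the joint graph well-formed; now it becomes
            # a real graph input of the partitioned graph.
            placeholder = f"{prefix}_{len(new_inputs)}"
            new_inputs.append(placeholder)
            rewritten.append((name, f"placeholder:{placeholder}"))
        else:
            rewritten.append((name, op))
    return new_inputs, rewritten


fwd_nodes = [
    ("a", "add"),
    ("s0", RNG_MARKER),
    ("h", "invoke_subgraph"),
    ("s1", RNG_MARKER),
]
inputs, rewritten = lift_markers(fwd_nodes, "fwd_rng_state")
print(inputs)
```

The appeal of this design is that the joint graph never has a call site with missing arguments: the marker node stands in for the RNG state until partitioning decides which graph owns it.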

anijain2305 · 2025-08-21T20:33:23Z

@pytorchbot merge

pytorchmergebot · 2025-08-21T20:35:55Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

Pull Request resolved: pytorch#160713. Approved by: https://github.com/eellison

Update

195ef33

[ghstack-poisoned]

anijain2305 requested a review from zou3519 as a code owner August 15, 2025 06:15

anijain2305 added a commit that referenced this pull request Aug 15, 2025

[invoke_subgraph][inductor] Thread graphsafe rng input states for hops

e9322c5

ghstack-source-id: 94d2b4f Pull-Request: #160713

pytorch-bot bot added ciflow/inductor module: inductor labels Aug 15, 2025

anijain2305 added the topic: not user facing (topic category) label Aug 15, 2025

Update

67ea136

[ghstack-poisoned]

anijain2305 added a commit that referenced this pull request Aug 15, 2025

[invoke_subgraph][inductor] Thread graphsafe rng input states for hops

cd8a97a

ghstack-source-id: 34fbe6a Pull-Request: #160713

anijain2305 added the ciflow/trunk (Trigger trunk jobs on your pull request) label Aug 15, 2025

anijain2305 requested a review from eellison August 15, 2025 19:03

Update

7379228

[ghstack-poisoned]

anijain2305 mentioned this pull request Aug 15, 2025

[invoke_subgraph] Support integer outputs in the subgraph #160777

Closed

eellison reviewed Aug 20, 2025


eellison approved these changes Aug 21, 2025


pytorchmergebot added the merging label Aug 21, 2025

pytorchmergebot closed this in 5805c42 Aug 21, 2025

pytorchmergebot added Merged and removed merging labels Aug 21, 2025

markc-614 pushed a commit to markc-614/pytorch that referenced this pull request Sep 17, 2025

[invoke_subgraph][inductor] Thread graphsafe rng input states for hops (pytorch#160713)

4163c8a

Pull Request resolved: pytorch#160713. Approved by: https://github.com/eellison

github-actions bot deleted the gh/anijain2305/849/head branch September 21, 2025 02:15


anijain2305 wants to merge 3 commits into gh/anijain2305/849/base from gh/anijain2305/849/head

3 participants
