
[while_loop] support closures#123018

Closed
ydwu4 wants to merge 7 commits intogh/ydwu4/100/basefrom
gh/ydwu4/100/head

Conversation

@ydwu4
Contributor

@ydwu4 ydwu4 commented Mar 30, 2024

Stack from ghstack (oldest at bottom):

We add an additional_inputs argument to the HOP while_loop and rename operands to carried_inputs, based on an offline discussion with @zou3519. This allows us to support closures, parameters, and buffers.

The alternative is to pass the lifted inputs directly through to the outputs of body_fn. But since we want body_fn's outputs not to alias its inputs, we would need to copy the inputs and remove the copies later, which is more work.
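The carried/additional split can be illustrated with a pure-Python sketch of the semantics (this is an illustration of the calling convention described above, not PyTorch's actual implementation):

```python
def while_loop(cond_fn, body_fn, carried_inputs, additional_inputs=()):
    """Pure-Python sketch of the while_loop HOP semantics.

    carried_inputs are threaded through iterations (each body_fn call
    returns fresh values for them), while additional_inputs (closures,
    parameters, buffers) are passed along unchanged every iteration,
    so they never need to be cloned into the carry.
    """
    carried = tuple(carried_inputs)
    while cond_fn(*carried, *additional_inputs):
        carried = tuple(body_fn(*carried, *additional_inputs))
    return carried

# A closed-over limit becomes an additional input instead of an extra
# carried value that body_fn would have to copy and return:
result = while_loop(
    lambda i, limit: i < limit,   # cond_fn sees carry + additional inputs
    lambda i, limit: (i + 1,),    # body_fn returns only the new carry
    carried_inputs=(0,),
    additional_inputs=(5,),
)
```

Here `result` ends up as `(5,)`: the carry is rebuilt each iteration while the limit rides along read-only.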

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @aakhundov @ColinPeppler @amjames @desertfire @chauhang

[ghstack-poisoned]
@pytorch-bot

pytorch-bot Bot commented Mar 30, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/123018

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (3 Unrelated Failures)

As of commit c2fb37c with merge base 09c72ea:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

UNSTABLE - The following job failed but was likely due to flakiness present on trunk and has been marked as unstable:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

ydwu4 added a commit that referenced this pull request Mar 30, 2024
ghstack-source-id: e44af3e
Pull Request resolved: #123018
Contributor

@aakhundov aakhundov left a comment


Thanks a lot @ydwu4 for extending the torch.while_loop API to handle additional inputs in a more efficient way (without the need for cloning in each iteration)! Left a few comments, mostly nits.

Comment on lines +796 to +798
assert (
len(additional_inputs) == 0
), "Additional inputs are set automatically by dynamo"
Contributor

If additional inputs are set automatically by dynamo, what is the reason for passing an empty tuple here? Should the number of arguments match?

Contributor Author

Yeah, its schema has changed to torch.ops.higher_order.while_loop(cond, body, carried_inputs, additional_inputs). When dynamo is compiling this higher-order op, additional_inputs is expected to be a tuple.

I've updated the PR to exercise this additional_inputs path when it has non-zero length, e.g. when running the tests with PYTORCH_TEST_WITH_DYNAMO=1 (i.e. torch.compile-ing a function containing while_loop).

"body_fn",
)

additional_lifted_inputs = tuple(cond_shared + cond_unique + body_unique)
Contributor

Curious, why do we include cond_shared into the lifted inputs here?

Contributor Author

cond_shared and body_shared refer to the same proxy in the parent graph. Using either of them is OK. I can add a comment here.

Comment thread torch/_higher_order_ops/while_loop.py Outdated
return super().__call__(cond_fn, body_fn, operands)
if not isinstance(additional_inputs, tuple):
raise RuntimeError(
"additional_inputs must be a tuple, got " f"{type(additional_inputs)}"
Contributor

UX question: if additional_inputs are generated internally (and are not considered a part of public API), would the errors re. additional_inputs make sense to the user?

Contributor Author

Yeah, you're right. This should probably just be an assertion.

Comment thread torch/_higher_order_ops/while_loop.py Outdated
raise RuntimeError(
"operands must be a tuple of tensors, ints, floats, or bools, got "
f"{operands}"
"carried_inputs must be a tuple, got " f"{type(carried_inputs)}"
Contributor

Nit: why separate f-string here? And below.

Contributor Author

Nice catch... I don't remember why I wrote it this way lol

Comment thread torch/_inductor/codegen/wrapper.py Outdated
body_outer_inputs
) # carry over the state from body_fn

# carry over the state from body_fn
Contributor

Could we make this comment a bit more specific mentioning that we only carry over the carried_inputs part of the inputs, but not the additional ones?

Contributor Author

Sg!

Comment thread torch/_inductor/ir.py Outdated

assert (
len(operands) > 0
len(carried_inputs) > 0
Contributor

Actually, we can fetch the device from additional_inputs, too. So maybe just check that we have at least one input across both groups?

Comment thread torch/_inductor/ir.py Outdated
Comment on lines +7214 to +7217
fx_carried_inputs = V.graph.current_node.args[-2]
fx_additional_inputs = V.graph.current_node.args[-1]
fake_carried_inputs = [x.meta["val"] for x in fx_carried_inputs] # type: ignore[union-attr]
fake_additional_inputs = [x.meta["val"] for x in fx_additional_inputs] # type: ignore[union-attr]
Contributor

As these are used identically below for carried and additional inputs, would it make sense to combine them upstream to something like all_inputs and then do fx, fake, and use that single list below?

Contributor Author

updated.

Comment thread torch/_inductor/lowering.py Outdated
def while_loop(cond_fn, body_fn, operands):
if any(map(is_triton, operands)):
def while_loop(cond_fn, body_fn, carried_inputs, additional_inputs):
if any(map(is_triton, carried_inputs)) or any(map(is_triton, additional_inputs)):
Contributor

Nit: combine carried_inputs + additional_inputs instead?

Contributor Author

done!

Comment thread torch/_library/fake_class_registry.py Outdated
@@ -0,0 +1,234 @@
import logging
Contributor

Seems this file inadvertently made its way to this PR? :)

Contributor Author

oops!

ydwu4 added a commit that referenced this pull request Apr 1, 2024
ghstack-source-id: 6deb60e
Pull Request resolved: #123018
@ydwu4 ydwu4 requested a review from zou3519 April 1, 2024 18:20
Comment thread torch/_higher_order_ops/while_loop.py Outdated

class WhileLoopOp(HigherOrderOperator):
def __call__(self, cond_fn, body_fn, operands):
def __call__(self, cond_fn, body_fn, carried_inputs, additional_inputs):
Contributor

What are the chances you can make this positional-only lol.

e.g.:

def __call__(self, cond_fn, body_fn, carried_inputs, additional_inputs, /)
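For anyone unfamiliar with the `/` marker, a small standalone demonstration of what positional-only parameters enforce (plain Python, not the actual op):

```python
def call(cond_fn, body_fn, carried_inputs, additional_inputs, /):
    # Everything before the `/` is positional-only: callers cannot pass
    # these by keyword, which keeps the HOP's calling convention stable
    # even if the parameter names change later.
    return (cond_fn, body_fn, carried_inputs, additional_inputs)

call(None, None, (), ())  # OK: positional arguments

try:
    call(cond_fn=None, body_fn=None, carried_inputs=(), additional_inputs=())
except TypeError:
    pass  # keyword use is rejected at call time
```

Since the tracing machinery always invokes the op positionally, making the signature positional-only rules out a class of keyword-argument mismatches for free.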

Contributor Author

Yeah, I think we can do that.

I just found that previously we weren't using this WhileLoopOp but the plain HigherOrderOperator. Updated it to use WhileLoopOp for checking the inputs. Will add more tests for invalid inputs as a follow-up.

Comment thread torch/_higher_order_ops/while_loop.py Outdated

class WhileLoopOp(HigherOrderOperator):
def __call__(self, cond_fn, body_fn, operands):
def __call__(self, cond_fn, body_fn, carried_inputs, additional_inputs):
Contributor

can you add type annotations for these?

Contributor Author

Sure!

@aakhundov
Contributor

@ydwu4 thanks for addressing the comments. Lots of tests are failing, could you have a look?

ydwu4 added a commit that referenced this pull request Apr 1, 2024
ghstack-source-id: 374b170
Pull Request resolved: #123018
@ydwu4
Contributor Author

ydwu4 commented Apr 2, 2024

Need to wait for the change to the XLA dispatch key implementation of while_loop in pytorch/xla#6872 to fix the XLA test failures.

@ydwu4 ydwu4 mentioned this pull request Apr 2, 2024
ydwu4 added a commit that referenced this pull request Apr 2, 2024
ghstack-source-id: dfee1fe
Pull Request resolved: #123018
@ydwu4
Contributor Author

ydwu4 commented Apr 3, 2024

@pytorchbot merge

@pytorch-bot pytorch-bot Bot added the ciflow/trunk Trigger trunk jobs on your pull request label Apr 3, 2024
@pytorchmergebot
Collaborator

Merge failed

Reason: This PR needs a release notes: label
If your changes are user facing and intended to be a part of release notes, please use a label starting with release notes:.

If not, please add the topic: not user facing label.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "topic: not user facing"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

Details for Dev Infra team: raised by workflow job.

@ydwu4
Contributor Author

ydwu4 commented Apr 3, 2024

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here.

pytorchmergebot pushed a commit that referenced this pull request Apr 3, 2024
#123018 introduces a necessary BC-breaking change and sees a bunch of XLA test failures on CI. We made a PR to pytorch/xla to prepare for the breaking change (pytorch/xla#6872). We update the pin of pytorch/xla to reflect that change in this PR.

Pull Request resolved: #123217
Approved by: https://github.com/clee2000
sanketpurandare pushed a commit to sanketpurandare/pytorch that referenced this pull request Apr 22, 2024
sanketpurandare pushed a commit to sanketpurandare/pytorch that referenced this pull request Apr 22, 2024
Pull Request resolved: pytorch#123018
Approved by: https://github.com/aakhundov
ghstack dependencies: pytorch#123217
@github-actions github-actions Bot deleted the gh/ydwu4/100/head branch May 4, 2024 01:56