
Add torch._dynamo.is_fullgraph_compiling to allow different codepath depending on fullgraph tracing #120400

Closed
fxmarty wants to merge 7 commits into pytorch:main from fxmarty:expose-compile-fullgraph-detection

Conversation

@fxmarty

@fxmarty fxmarty commented Feb 22, 2024

This PR fixes https://pytorch.slack.com/archives/C033H6DJSJU/p1708510833453919 and allows implementing different code paths depending on whether torch.compile is called with the argument fullgraph=True.

Example (see also the unit test):

def f(x):
    if torch._dynamo.is_fullgraph_compiling():
        # fullgraph=True compliant code path
        ...
    else:
        # more permissive code path
        ...
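The intended behaviour can be sketched without torch, using a hypothetical stand-in for the proposed torch._dynamo.is_fullgraph_compiling(): a flag that the compiler frontend would set while tracing with fullgraph=True. All names here are illustrative, not PyTorch's.

```python
# Hypothetical stand-in for the proposed torch._dynamo.is_fullgraph_compiling().
# A real implementation would query the active InstructionTranslator; this
# torch-free sketch just uses a module-level flag to show the intended branching.
_FULLGRAPH_TRACING = False

def is_fullgraph_compiling():
    return _FULLGRAPH_TRACING

def f():
    if is_fullgraph_compiling():
        return "fullgraph-compliant path"
    return "permissive path"

# Eager call: the flag is unset, so the permissive branch runs.
assert f() == "permissive path"

# Simulate what tracing under fullgraph=True would observe.
_FULLGRAPH_TRACING = True
assert f() == "fullgraph-compliant path"
_FULLGRAPH_TRACING = False
```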

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @chenyang78 @kadeng @chauhang @aakhundov

@pytorch-bot

pytorch-bot bot commented Feb 22, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/120400

Note: Links to docs will display an error until the docs builds have been completed.

❌ 10 New Failures

As of commit 86c5756 with merge base 8a32a07:

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot

pytorch-bot bot commented Feb 22, 2024

Please seek CI approval before scheduling CIFlow labels


Comment on lines +288 to +289
# See: https://github.com/pytorch/pytorch/issues/110765
tx.mark_inconsistent_side_effects()
Author

@jon-chuang I am not sure whether this is necessary here?

Collaborator

@jon-chuang jon-chuang Feb 22, 2024

It's necessary for capturing side-effect-only code. If your code has a torch operation on a tensor input, it makes no difference.
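To illustrate what "side-effect-only code" means here, consider this torch-free sketch: the branch produces no return value and only mutates external state, so a tracer that records outputs alone would capture nothing, which is why dynamo needs an explicit marker like mark_inconsistent_side_effects. The code below is an assumption-laden illustration, not dynamo's mechanism.

```python
# Side-effect-only code: the function returns nothing and is observable
# only through the mutation of external state.
log = []

def branch_on_flag(flag):
    if flag:
        log.append("fullgraph path taken")  # observable only via mutation
    # no return value: nothing for an output-only trace to record

branch_on_flag(True)
branch_on_flag(False)
assert log == ["fullgraph path taken"]
```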

Comment on lines +5875 to +5883
opt_f = torch.compile(f, fullgraph=True)

self.assertEqual(f(), torch.zeros(2, 2))
self.assertEqual(opt_f(), torch.ones(2, 2))

opt_g = torch.compile(g, fullgraph=False)

self.assertEqual(g(), torch.zeros(2, 2))
self.assertEqual(opt_g(), torch.zeros(2, 2))
Author

One issue here is that when calling torch.compile twice on the same function:

opt_f = torch.compile(f, fullgraph=False)
opt_f()
opt_f = torch.compile(f, fullgraph=True)
opt_f()

somehow torch.compile does not initialize a new InstructionTranslator at the second call, so the one_graph attribute is wrong; see https://app.slack.com/client/T2077MDKQ/C033H6DJSJU

Is it legal to call torch.compile several times on the same Python object/function?

Collaborator

@jon-chuang jon-chuang Feb 22, 2024

I think if your original graph was a single graph, you don't recompile on the second invocation.

Collaborator

@jon-chuang jon-chuang Feb 22, 2024

You may or may not need to change this behaviour; it's an easy optimization that held previously.

If this PR proposes to capture different graphs based on the presence of this function call, then you can make the behaviour stricter (i.e. have a change in the fullgraph flag always trigger recompilation), but only when your new function call is present; see the guards for the fullgraph flag.
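The guard behaviour being suggested can be sketched in plain Python: cache compiled artifacts per function, and treat a change in the fullgraph flag as a guard failure that forces recompilation. Names and structure are illustrative, not dynamo's actual cache layout.

```python
# Sketch of a fullgraph-aware compilation cache: the cache key includes
# the fullgraph flag, so changing the flag is a guard miss and recompiles.
compiled_cache = {}  # maps (fn, fullgraph) -> compiled artifact

def compile_with_guard(fn, fullgraph):
    key = (fn, fullgraph)
    if key not in compiled_cache:
        # Guard miss: first call, or the fullgraph flag changed. Recompile.
        compiled_cache[key] = f"artifact({fn.__name__}, fullgraph={fullgraph})"
    return compiled_cache[key]

def f():
    pass

a = compile_with_guard(f, fullgraph=False)
b = compile_with_guard(f, fullgraph=False)  # guard hit: cached artifact reused
c = compile_with_guard(f, fullgraph=True)   # flag changed: new artifact
assert a is b
assert a != c
```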

Author

If this PR proposes to capture different graphs based on presence of this function call

Yes, this PR proposes to capture different graphs based on torch._dynamo.is_fullgraph_compiling() (i.e. based on the fullgraph argument). This works well when we don't call torch.compile multiple times on the same function/object, but it currently breaks when calling torch.compile successively on the same function/object because the InstructionTranslator is not re-initialized.

then you can make the behaviour stricter - i.e. cause fullgraph flag changing to always recompile - only when your new function call is present; see guards for fullgraph flag.

Could you point me to the file responsible for that? I could not find nopython, one_graph, or fullgraph references in guards.py or mutation_guard.py.

Collaborator

It may have been removed, unfortunately; a bunch of code got reverted due to some Meta-internal failures (the investigation hasn't concluded, AFAIK).

You may have to introduce a new guard that recompiles when the fullgraph flag changes, if this behaviour is strictly necessary.

Author

It was removed in #115384

@albanD albanD requested a review from anijain2305 February 22, 2024 15:57
@albanD albanD added the triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module label Feb 22, 2024
@Skylion007
Collaborator

@janeyx99 This looks like it would be useful to automatically set the capturable parameter on torch optimizers?

@Skylion007 Skylion007 requested a review from janeyx99 February 22, 2024 22:31

@janeyx99
Contributor

@janeyx99 This looks like it would be useful to automatically set the capturable parameter on torch optimizers?

Well, we want capturable to be enabled for non-fullgraph tracing too, so this shouldn't change how that logic is currently set up.


or current_backend == cached_backends.get(backend_obj_id, None)
)

def check_nopython(ref_nopython: bool):
Author

The changes in eval_frame.py are probably far from ideal, but they work.

This check_nopython actually seems to be called twice at each guard check; I'm not sure why:

--------------- f fullgraph=True
--------------- fullgraph=False
guarded_backend_cache.nopython False
ref_nopython True
guarded_backend_cache.nopython False
ref_nopython True
--------------- f fullgraph=True
call forward
guarded_backend_cache.nopython True
ref_nopython False
guarded_backend_cache.nopython True
ref_nopython True
--------------- fullgraph=False
guarded_backend_cache.nopython False
ref_nopython True
guarded_backend_cache.nopython False
ref_nopython False
--------------- fullgraph=True
guarded_backend_cache.nopython True
ref_nopython False
guarded_backend_cache.nopython True
ref_nopython True
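A minimal sketch of the check being discussed, mirroring the names in the log above (guarded_backend_cache.nopython, ref_nopython) but not eval_frame.py's actual layout: a cached entry records the nopython (fullgraph) flag it was compiled under, and the guard passes only when the current call uses the same flag.

```python
# Illustrative nopython guard: reuse cached code only when the cached
# nopython flag matches the flag requested for the current call.
class GuardedBackendCache:
    def __init__(self, nopython):
        self.nopython = nopython

def check_nopython(cache, ref_nopython):
    # Guard passes only if the cached flag matches the requested one.
    return cache.nopython == ref_nopython

cache = GuardedBackendCache(nopython=False)
assert check_nopython(cache, False)      # same flag: cached code is reusable
assert not check_nopython(cache, True)   # flag changed: guard fails, recompile
```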


@fxmarty
Author

fxmarty commented Feb 26, 2024

Some tests were not passing in CI but do pass locally; I'm not sure why:

2024-02-26T05:48:12.3626098Z FAILED [0.1356s] test_sparse_csr.py::TestSparseCSRCPU::test_sparse_to_sparse_compressed_SparseBSC_cpu_float64 - torch._dynamo.exc.InternalTorchDynamoError: nnz not found
2024-02-26T06:09:20.6013131Z FAILED [0.0583s] functorch/test_rearrange.py::TestRearrange::test_concatenations_and_stacking - torch._dynamo.exc.InternalTorchDynamoError: dimension d0 is unbound
2024-02-26T06:09:20.6358400Z FAILED [0.0714s] functorch/test_rearrange.py::TestRearrange::test_ellipsis_ops - torch._dynamo.exc.InternalTorchDynamoError: dimension d0 is unbound
2024-02-26T06:09:20.6690803Z FAILED [0.0971s] functorch/test_rearrange.py::TestRearrange::test_rearrange_consistency - torch._dynamo.exc.InternalTorchDynamoError: dimension d0 is unbound
2024-02-26T06:09:20.6960094Z FAILED [0.0425s] functorch/test_rearrange.py::TestRearrange::test_rearrange_permutations - torch._dynamo.exc.InternalTorchDynamoError: dimension d0 is unbound
2024-02-26T06:09:20.7040419Z FAILED [0.0775s] functorch/test_rearrange.py::TestRearrange::test_squeeze - torch._dynamo.exc.InternalTorchDynamoError: dimension d0 is unbound
2024-02-26T06:09:20.7120616Z FAILED [0.0536s] functorch/test_rearrange.py::TestRearrange::test_unsqueeze - torch._dynamo.exc.InternalTorchDynamoError: dimension d0 is unbound
2024-02-26T06:19:56.1722324Z FAILED [0.0209s] torch_np/test_basic.py::TestOneArr::test_asarray_array_func0 - torch._dynamo.exc.InternalTorchDynamoError: Boolean value of Tensor with more than one value is ambiguous
2024-02-26T06:12:06.8184282Z FAILED [0.0375s] test_jit.py::TestTypeSharing::test_tracing_gives_different_types - torch._dynamo.exc.InternalTorchDynamoError: __eq__(): incompatible function arguments. The following argument types are supported:
2024-02-26T06:12:06.8184536Z     1. (self: torch._C.Type, arg0: torch._C.Type) -> bool

There was

2024-02-26T06:07:38.9873830Z FAILED [0.1431s] test_xnnpack_integration.py::TestXNNPACKOps::test_conv2d_transpose - hypothesis.errors.Flaky: Hypothesis test_conv2d_transpose(self=<__main__.TestXNNPACKOps testMethod=test_conv2d_transpose>, batch_size=1, input_channels_per_group=1, height=5, width=5, output_channels_per_group=1, groups=1, kernel_h=1, kernel_w=1, stride_h=1, stride_w=1, pad_h=0, pad_w=0, output_pad_h=0, output_pad_w=0, dilation=1, use_bias=False, format=None) produces unreliable results: Falsified on the first call but did not on a subsequent one

as well, but I did not compile with XNNPACK, so I'm not sure whether this one passes locally.


@fxmarty
Author

fxmarty commented Feb 29, 2024

Hi, I'll be away for two weeks starting next week, but I'm happy to make modifications here by Friday. It would be cool to have this in 2.3.

@fxmarty
Author

fxmarty commented Apr 15, 2024

Any update, @anijain2305? Having torch._dynamo.is_fullgraph_compiling() and torch._dynamo.is_exporting() would be helpful for us at HF. Are you interested in adding these to PyTorch?

@anijain2305
Contributor

@fxmarty Can you share why you need this? The changes required for this are quite awkward. Wondering if you have a specific problem that we can fix so that this flag is not needed.

@fxmarty
Author

fxmarty commented Apr 18, 2024

Three use cases where I would have wanted this:

Basically, I would want to have different execution paths depending on fullgraph and/or export.

@fxmarty
Author

fxmarty commented Apr 26, 2024

any thoughts @anijain2305?

@github-actions
Contributor

Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.

@fxmarty
Author

fxmarty commented Jul 12, 2024

any interest?

@github-actions github-actions bot closed this Aug 11, 2024
amodab01 referenced this pull request in huggingface/transformers Feb 10, 2025
* update non-causal mask for sdpa

* add test

* update docstrings

* add one more test

* fix cross attention bug

* gentler atol/rtol

Labels

ciflow/inductor module: dynamo open source Stale triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

7 participants