Add meta tensor support for _amp_foreach_non_finite_check_and_unscale_ and nan_to_num #94633

Closed

wonjoo-wj wants to merge 3 commits into main from meta-tensor

Conversation

@wonjoo-wj
Collaborator

@wonjoo-wj wonjoo-wj commented Feb 10, 2023

Fixes #92916 ([Functionalization] Some ops need additional meta tensor support after functionalization)


Add meta tensor support for _amp_foreach_non_finite_check_and_unscale_ and nan_to_num

cc @alanwaketan
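
For context, a minimal Python-side sketch of what a Meta registration for the amp op could look like (a hypothetical sketch; the PR's actual registration may be done differently, e.g. in ATen):

import torch
from torch.library import Library

# Hypothetical sketch: attach a kernel for the existing aten op to the Meta
# dispatch key from Python. Meta tensors carry only metadata (shape/dtype),
# so an in-place op that returns nothing has no work to do here.
meta_lib = Library("aten", "IMPL", "Meta")

def amp_check_and_unscale_meta(grads, found_inf, inv_scale):
    # Mutates `grads` (a tensor list) and `found_inf` in place on real
    # backends; on Meta, shapes and dtypes are unchanged, so this is a no-op.
    pass

meta_lib.impl("_amp_foreach_non_finite_check_and_unscale_", amp_check_and_unscale_meta)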

@wonjoo-wj
Collaborator Author

@bdhirsh, even with these changes cherry-picked onto my PyTorch functionalization branch, I still see the same errors:

/opt/conda/lib/python3.8/site-packages/torch/_functorch/deprecated.py:93: UserWarning: We've integrated functorch into PyTorch. As the final step of the integration, functorch.functionalize is deprecated as of PyTorch 2.0 and will be deleted in a future version of PyTorch >= 2.3. Please use torch.func.functionalize instead; see the PyTorch 2.0 release notes and/or the torch.func migration guide for more details https://pytorch.org/docs/master/func.migrating.html
  warn_deprecated('functionalize')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/conda/lib/python3.8/site-packages/torch/_functorch/vmap.py", line 39, in fn
    return f(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch/_functorch/eager_transforms.py", line 1582, in wrapped
    func_outputs = func(*func_args, **func_kwargs)
  File "<stdin>", line 5, in test
NotImplementedError: Could not run 'aten::_amp_foreach_non_finite_check_and_unscale_' with arguments from the 'Meta' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::_amp_foreach_non_finite_check_and_unscale_' is only available for these backends: [XLA, BackendSelect, Python, FuncTorchDynamicLayerBackMode, Functionalize, Named, Conjugate, Negative, ZeroTensor, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradHIP, AutogradXLA, AutogradMPS, AutogradIPU, AutogradXPU, AutogradHPU, AutogradVE, AutogradLazy, AutogradMeta, AutogradMTIA, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, AutogradNestedTensor, Tracer, AutocastCPU, AutocastCUDA, FuncTorchBatched, FuncTorchVmapMode, Batched, VmapMode, FuncTorchGradWrapper, PythonTLSSnapshot, FuncTorchDynamicLayerFrontMode, PythonDispatcher].

Anything obvious I'm missing in these changes? Thanks a lot!

@wonjoo-wj wonjoo-wj added the topic: not user facing label Feb 10, 2023
@wonjoo-wj wonjoo-wj self-assigned this Feb 10, 2023
@bdhirsh
Collaborator

bdhirsh commented Feb 16, 2023

@wonjoolee95 it looks like it's because nan_to_num's last 3 arguments are all defaultable (you need to include the defaults in your decomp; our tests probably try to call nan_to_num with just one argument and expect the defaults to get filled in).
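
To illustrate the point (a hedged sketch with illustrative names, not the PR's actual code), the decomposition's Python signature should declare and fill in the schema's defaults so that a bare t.nan_to_num() call still works:

import math
import torch

def nan_to_num_decomposition(self, nan=None, posinf=None, neginf=None):
    # Mirror the schema defaults: nan -> 0.0, posinf/neginf -> dtype extremes.
    # (Float-dtype path only, for brevity.)
    nan = 0.0 if nan is None else nan
    posinf = torch.finfo(self.dtype).max if posinf is None else posinf
    neginf = torch.finfo(self.dtype).min if neginf is None else neginf
    result = torch.where(torch.isnan(self), nan, self)
    result = torch.where(result == math.inf, posinf, result)
    return torch.where(result == -math.inf, neginf, result)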

@wonjoo-wj wonjoo-wj force-pushed the meta-tensor branch 2 times, most recently from 4e8fc43 to 3f30969 Compare February 17, 2023 22:21
@wonjoo-wj
Collaborator Author

The CIs are now looking a lot greener; the remaining tests are failing with a seemingly unrelated error:

Warning: Failed to download action 'https://api.github.com/repos/actions/upload-artifact/tarball/0b7f8abb1508181956e8e162db84b466c27e18ce'. Error: Response status code does not indicate success: 500 (Internal Server Error).
Warning: Back off 20.144 seconds before retry.
Error: Response status code does not indicate success: 500 (Internal Server Error).

I'll give it a retry.

However, I'm still seeing the same error as in #94633 (comment) even with this change. Looking into it more.

@wonjoo-wj
Collaborator Author

Synced with Brian offline, putting some information here. I was able to verify that the Meta registration shows up:

>>> print(torch._C._dispatch_dump("aten::nan_to_num.out"))
name: aten::nan_to_num.out
schema: aten::nan_to_num.out(Tensor self, float? nan=None, float? posinf=None, float? neginf=None, *, Tensor(a!) out) -> Tensor(a!)
debug: registered at /workspace/pytorch/build/aten/src/ATen/RegisterSchema.cpp:6
alias analysis kind: FROM_SCHEMA
Functionalize: registered at /workspace/pytorch/build/aten/src/ATen/RegisterFunctionalization_3.cpp:23953 :: (Tensor _0, float? _1, float? _2, float? _3, Tensor _4) -> Tensor _0 [ boxed unboxed ]
...
Meta: registered at /dev/null:219 :: (none) [ boxed ]
SparseCPU: registered at /workspace/pytorch/build/aten/src/ATen/RegisterSparseCPU.cpp:1379 :: (Tensor _0, float? _1, float? _2, float? _3, Tensor _4) -> Tensor _0 [ boxed unboxed ]
Autograd[alias]: registered at /workspace/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:17909 :: (Tensor _0, float? _1, float? _2, float? _3, Tensor _4) -> Tensor _0 [ boxed unboxed ]

That said, the (none) in the Meta entry (Meta: registered at /dev/null:219 :: (none)) is a bit suspicious, as the other registrations don't have it.
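
One way to cross-check the suspicious entry is to ask the dispatcher directly whether a Meta kernel exists for the op (this uses an internal API, so treat it as a sketch):

import torch

# True iff a kernel is registered for the op at the Meta dispatch key.
print(torch._C._dispatch_has_kernel_for_dispatch_key(
    "aten::nan_to_num.out", torch._C.DispatchKey.Meta))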

@wonjoo-wj
Collaborator Author

Oddly enough, I can actually see that these ops work as intended in a Python interpreter:

import torch
import torch_xla  # needed so the 'xla' device is registered

x = torch.tensor([float('nan'), float('inf'), -float('inf'), 3.14])
x.nan_to_num_(1.0, 2.0, 3.0)  # in place: nan -> 1.0, inf -> 2.0, -inf -> 3.0
x_xla = torch.tensor([float('nan'), float('inf'), -float('inf'), 3.14], device='xla:0')
x_xla.nan_to_num_(1.0, 2.0, 3.0)
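
Since the failing path goes through the Meta backend rather than CPU or XLA, a closer reproduction (again a hedged sketch) would run the op on a meta tensor directly:

import torch

# Meta tensors carry only shape/dtype/device, no data; this exercises the
# same dispatch path the functionalization test hits.
x_meta = torch.empty(4, device='meta')
x_meta.nan_to_num_(1.0, 2.0, 3.0)  # raises NotImplementedError without a Meta kernel
print(x_meta.shape, x_meta.device)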

@wonjoo-wj
Collaborator Author

Closed with pytorch/xla#4687.

@wonjoo-wj wonjoo-wj closed this Apr 24, 2023
@github-actions github-actions Bot deleted the meta-tensor branch August 20, 2024 01:57