
integrate functionalization <> LTC torchscript backend #75527

Closed
bdhirsh wants to merge 69 commits into gh/bdhirsh/199/base from gh/bdhirsh/199/head

Conversation


@bdhirsh bdhirsh commented Apr 8, 2022

This PR integrates functionalization into LazyTensorCore. The high level is:

(1) LTC will no longer see view/aliasing operators directly. Instead, functionalization will run "above" LTC, which will only see non-aliasing *_copy variants of each view operator. It will also remove mutations, so (for the most part) LTC will only see "functional/out-of-place" operators.
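The rewrite in (1) can be sketched with a toy example, using plain Python lists in place of tensors; `view_copy`, `add`, and `view_inverse` here are illustrative names, not the exact generated ops:

```python
# Toy model: functionalization turns "b = a.view(2, 2); b.add_(1)" into a
# purely functional program built from non-aliasing *_copy ops.
def view_copy(base, shape):
    # returns a fresh (rows, cols) nested list: no aliasing with `base`
    rows, cols = shape
    return [base[i * cols:(i + 1) * cols] for i in range(rows)]

def add(t, scalar):
    # out-of-place add: replaces the in-place add_
    return [[x + scalar for x in row] for row in t]

def view_inverse(updated_view):
    # propagates the functional update on the view back onto a new base
    return [x for row in updated_view for x in row]

def functionalized_program(a):
    b = view_copy(a, (2, 2))   # was: b = a.view(2, 2)
    b2 = add(b, 1)             # was: b.add_(1)
    a2 = view_inverse(b2)      # the mutation is replayed onto the base
    return a2, b2

a2, b2 = functionalized_program([0, 1, 2, 3])
# a2 == [1, 2, 3, 4], b2 == [[1, 2], [3, 4]]
```

The backend only ever sees `view_copy`, `add`, and `view_inverse`: no aliases, no mutations.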

(2) At the C++ level, every lazy tensor is wrapped in a layer of indirection: we now have `FunctionalTensorWrapper(LazyTensorImpl)`.

(3) A bunch of aliasing bugs are now fixed. The most significant one is that `mark_step()` no longer severs aliasing relationships between tensors. I included a test in the PR.

## What is the interface between functionalization and LTC?

There needs to be some code that "promotes/demotes" a tensor from a functional wrapper to its inner LTC tensor. The places where that happens are:

(a) factory functions (`LazyNativeFunctions::empty/empty_strided`). This is the main integration point - I updated those functions to return a wrapped `FunctionalTensorWrapper` object, which will cause every future usage of the returned tensor to pass through functionalization for every operator (which does the unwrapping) before hitting the LTC backend again.

(b) converting between devices. When you call `ltc_tensor.to('cpu')`, we need to sync any updates and "unwrap" the tensor. When you call `cpu_tensor.to('lazy')`, we need to wrap the tensor up.

(c) python bindings. Python bindings (like `mark_step()`) that don't go through the dispatcher need to do the unwrapping themselves, instead of relying on functionalization kernels to do it automatically.
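The three boundaries above can be sketched in toy form (all names are illustrative stand-ins, not the real C++ API, and the sync step is hand-waved):

```python
# Minimal stand-ins for the inner lazy tensor and the functional wrapper.
class LazyTensor:
    def __init__(self, data):
        self.data = data

class FunctionalTensorWrapper:
    def __init__(self, inner):
        self.inner = inner  # the "real" lazy tensor lives underneath

# (a) factory functions return a *wrapped* tensor, so every later op
#     passes through functionalization before reaching the LTC backend.
def empty(n):
    return FunctionalTensorWrapper(LazyTensor([0] * n))

# (b) device conversion: sync + unwrap on the way out, wrap on the way in.
def to(t, device):
    if device == "cpu" and isinstance(t, FunctionalTensorWrapper):
        return list(t.inner.data)  # (sync of pending updates elided) unwrap
    if device == "lazy":
        return FunctionalTensorWrapper(LazyTensor(list(t)))
    return t

# (c) bindings that bypass the dispatcher must unwrap manually.
def mark_step(tensors):
    return [t.inner if isinstance(t, FunctionalTensorWrapper) else t
            for t in tensors]

t = empty(3)
assert isinstance(t, FunctionalTensorWrapper)
assert to(t, "cpu") == [0, 0, 0]
```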

## What's the set of changes / what order should I look at things in?

LTC folks can focus just on the LTC-specific changes. I'd recommend looking at the following:

**(1) `ts_native_functions.yaml`**

Here, I basically removed a bunch of view ops, and added corresponding `view_copy` variants that automatically get codegen'd. `view_copy` ops are "ordinary" out-of-place ops, so the codegen for them should just work.

**(2) `ts_native_functions.cpp`**

This is probably where the most important changes to LTC are. There are 4 major changes in this file:

(a) I removed the hand-written kernels for most of the view ops.

(b) I added the wrapping/unwrapping logic for `empty`/`empty_strided` and `to.device` that I mentioned in the integration section above.

(c) I added a lowering for the `at::lift` operator. This is a new op that's needed for the `torch.tensor()` constructor, where we need to explicitly "lift" LTC tensors into functional tensor objects.

(d) There are 10 problematic aten operators that I had to add a bit of extra handling for. Why? The high level idea is that a few ops (like `block_diag`) are `CompositeExplicitAutograd`, which means that they run **underneath** functionalization. These ops are "functional" (no aliasing info), but they internally call view operators. To handle them, I added a helper function in core that lets you "functionalize" a composite kernel: `at::functionalization::functionalize_aten_op`. The change for LTC is that these ops used to work "for free", whereas now you need to manually write a (one-liner) kernel for each of them that explicitly calls into its decomposition.
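The idea behind functionalizing a composite kernel can be sketched in toy form; the op names and the `ops` routing table below are purely illustrative, not the real aten machinery:

```python
# Toy model of "functionalizing" a composite kernel: run its
# decomposition, but with every view op it uses swapped for the
# non-aliasing *_copy variant.
def narrow_copy(t, start, length):
    return list(t[start:start + length])  # fresh copy, no aliasing

FUNCTIONAL_OPS = {"narrow": narrow_copy}

def functionalize_op(decomposition):
    # the "one-liner" backend kernel: call the decomposition, routing
    # view ops to their functional *_copy variants
    def kernel(*args):
        return decomposition(FUNCTIONAL_OPS, *args)
    return kernel

# a composite op whose decomposition internally calls a view op
def halve_decomp(ops, t):
    return ops["narrow"](t, 0, len(t) // 2)

lazy_halve = functionalize_op(halve_decomp)
out = lazy_halve([1, 2, 3, 4])
# out == [1, 2], and it does not alias the input
```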

**(3) `lazy_ir.py`**

Some codegen changes. There are two main changes in the codegen:

(a) Fixed a use-after-free error with ops that take in a `std::string`. This was UB that only surfaced for some reason when I did the integration: the codegen'd nodes for ops like `div.rounding_mode` were storing the string argument as a `c10::string_view`, and the constructed node was outliving the string. I fixed that by explicitly ensuring that we store a `std::string` on the node instead of a `c10::string_view`.

(b) Now that we're codegen'ing a bunch of `view_copy` nodes, I didn't want to have to write shape inference rules for all of them (since they don't have `at::meta::` implementations). However, every view op and its `view_copy` variant actually support meta tensors: you just need to run the composite implementation (`at::compositeexplicitautograd`) and plumb meta tensors through. I added some codegen support for this.
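This meta-tensor trick can be sketched as follows; `MetaTensor`, `transpose_copy_meta`, and `infer_shape` are illustrative names, not the real codegen output:

```python
# Toy sketch: instead of a hand-written shape formula per view_copy op,
# run the op's meta implementation on shape-only "meta tensors" and
# read the output's shape.
class MetaTensor:
    def __init__(self, shape):
        self.shape = tuple(shape)

def transpose_copy_meta(t, dim0, dim1):
    # meta kernel: computes only the output shape, touches no data
    shape = list(t.shape)
    shape[dim0], shape[dim1] = shape[dim1], shape[dim0]
    return MetaTensor(shape)

def infer_shape(meta_op, input_shape, *args):
    # generic shape-inference rule: plumb a meta tensor through the op
    return meta_op(MetaTensor(input_shape), *args).shape

assert infer_shape(transpose_copy_meta, (2, 3, 4), 0, 2) == (4, 3, 2)
```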

**(4) `ts_eager_fallback.cpp`**

I had to update the eager fallback to ensure that it unwraps/wraps tensors properly when converting from an LTC device to a non-LTC device and back. I also updated the check to raise an error if the fallback sees any view ops, since LTC should never see view ops, so we never expect the fallback to encounter one.

**(5) `shape_inference.h/cpp`**

Added some shape formulas for a few of the new `view_copy` ops. I also updated the formulas for some of the existing view ops to explicitly raise an error, since they should never be called. We should just delete them, but I figured we can make this PR just a bit smaller and fully rip out the LTC view infrastructure later.

**(6) `init.cpp`**

Updated the python bindings to "unwrap" functional wrapper tensor inputs, as mentioned in the integration section above.

**(7) `test_ts_opinfo.py`**

Some basic test cleanup. Also added a test explicitly for `mark_step()` preserving alias relationships.
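The property that new test checks can be sketched in toy form (this is not the real test; `ToyLazyTensor` and shared-list storage stand in for lazy tensors and aliased storage):

```python
# A mutation through one alias must still be visible through the other
# alias after a sync point like mark_step().
class ToyLazyTensor:
    def __init__(self, storage):
        self.storage = storage  # a shared list models aliased storage

    def view(self):
        return ToyLazyTensor(self.storage)  # alias: same storage

    def add_(self, v):
        for i in range(len(self.storage)):
            self.storage[i] += v

def mark_step(*tensors):
    # a correct implementation may flush pending work, but must NOT
    # copy or detach the storages (that would sever the aliasing)
    pass

a = ToyLazyTensor([1, 2, 3])
b = a.view()
mark_step(a, b)
a.add_(1)
assert b.storage == [2, 3, 4]  # the alias relationship survived the step
```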

## Other functionalization changes (not specific to LTC)

This is basically the stuff in this PR inside of `aten`. The important changes are:

(1) `detach()` support for functionalization (in `FunctionalTensorWrapper.h/cpp`). This is only actually relevant to LTC/XLA though, since they are the only context under which autograd will directly be called on a `FunctionalTensorWrapper` object.

I ended up duplicating a bit of the detach logic from `TensorImpl.h` to get this to work, but I couldn't think of a better way to do it.

(2) A helper function for "functionalizing" a `CompositeExplicitAutograd` kernel: `at::functionalization::functionalize_aten_op` (in `FunctionalTensorWrapper.h/cpp`). The idea here is that LTC needs some special handling for ops like `block_diag` that are `CompositeExplicitAutograd` but call into view operators "underneath" the functionalization pass. I wanted to add a helper function to make this case easy to handle.

(3) some `native_functions.yaml` changes. This is mostly just me using the new `CompositeExplicitAutogradNonFunctional` to pre-emptively prevent XLA/LTC from accidentally using the "problematic" decompositions. This will also make XLA failures easier to spot.


Differential Revision: D35705375


facebook-github-bot commented Apr 8, 2022


❌ 6 New Failures

As of commit f207a08 (more details on the Dr. CI page):

  • 6/6 failures introduced in this PR

🕵️ 6 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages

See GitHub Actions build trunk / win-vs2019-cuda11.6-py3 / test (default, 2, 5, windows.8xlarge.nvidia.gpu) (1/6)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

2022-06-22T08:37:09.3256650Z test_add_done_ca...arg() takes 0 positional arguments but 1 was given
2022-06-22T08:37:09.3221492Z 
2022-06-22T08:37:09.3221824Z For more information about alternatives visit: ('https://numba.pydata.org/numba-doc/latest/cuda/overview.html', '#cudatoolkit-lookup')
2022-06-22T08:37:09.3222300Z   warnings.warn(errors.NumbaWarning(msg))
2022-06-22T08:37:09.3222721Z C:\Jenkins\Miniconda3\lib\site-packages\numba\cuda\envvars.py:17: NumbaWarning:
2022-06-22T08:37:09.3223376Z Environment variables with the 'NUMBAPRO' prefix are deprecated and consequently ignored, found use of NUMBAPRO_LIBDEVICE=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6\nvvm\libdevice.
2022-06-22T08:37:09.3223774Z 
2022-06-22T08:37:09.3224074Z For more information about alternatives visit: ('https://numba.pydata.org/numba-doc/latest/cuda/overview.html', '#cudatoolkit-lookup')
2022-06-22T08:37:09.3224541Z   warnings.warn(errors.NumbaWarning(msg))
2022-06-22T08:37:09.3224948Z ok (0.993s)
2022-06-22T08:37:09.3246416Z   test_add_done_callback_maintains_callback_order (__main__.TestFuture) ... ok (0.003s)
2022-06-22T08:37:09.3256650Z   test_add_done_callback_no_arg_error_is_ignored (__main__.TestFuture) ... [E pybind_utils.h:201] Got the following error when running the callback: TypeError: no_arg() takes 0 positional arguments but 1 was given
2022-06-22T08:37:09.3258144Z ok (0.001s)
2022-06-22T08:37:09.3276036Z   test_add_done_callback_simple (__main__.TestFuture) ... ok (0.001s)
2022-06-22T08:37:09.3345228Z   test_chained_then (__main__.TestFuture) ... ok (0.000s)
2022-06-22T08:37:09.4398808Z   test_collect_all (__main__.TestFuture) ... ok (0.113s)
2022-06-22T08:37:09.4414908Z   test_done (__main__.TestFuture) ... ok (0.001s)
2022-06-22T08:37:09.4437016Z   test_done_exception (__main__.TestFuture) ... ok (0.003s)
2022-06-22T08:37:09.4466297Z   test_interleaving_then_and_add_done_callback_maintains_callback_order (__main__.TestFuture) ... ok (0.003s)
2022-06-22T08:37:09.4483369Z   test_interleaving_then_and_add_done_callback_propagates_error (__main__.TestFuture) ... [E pybind_utils.h:201] Got the following error when running the callback: ValueError: Expected error
2022-06-22T08:37:09.4483788Z 
2022-06-22T08:37:09.4483875Z At:

See GitHub Actions build trunk / win-vs2019-cuda11.6-py3 / test (force_on_cpu, 1, 1, windows.4xlarge) (2/6)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

2022-06-22T07:52:08.7807375Z test_cast (__mai...Error: VariableType::ID() not implemented (0.000s)
2022-06-22T07:52:08.6990134Z   test_call_python_mod_from_tracing_fn (__main__.TestScript) ... ok (0.010s)
2022-06-22T07:52:08.7050864Z   test_call_script_fn_from_script_fn (__main__.TestScript) ... ok (0.006s)
2022-06-22T07:52:08.7145519Z   test_call_script_fn_from_script_module (__main__.TestScript) ... ok (0.009s)
2022-06-22T07:52:08.7255689Z   test_call_script_fn_from_tracing_fn (__main__.TestScript) ... ok (0.011s)
2022-06-22T07:52:08.7327179Z   test_call_script_mod_from_script_fn (__main__.TestScript) ... ok (0.007s)
2022-06-22T07:52:08.7457137Z   test_call_script_mod_from_script_module (__main__.TestScript) ... ok (0.013s)
2022-06-22T07:52:08.7469284Z   test_call_script_mod_from_tracing_fn (__main__.TestScript) ... skip: error in first class mode (0.002s)
2022-06-22T07:52:08.7602375Z   test_call_traced_fn_from_tracing_fn (__main__.TestScript) ... ok (0.013s)
2022-06-22T07:52:08.7613967Z   test_call_traced_mod_from_tracing_fn (__main__.TestScript) ... skip: error in first class mode (0.001s)
2022-06-22T07:52:08.7798732Z   test_canonicalize_control_outputs (__main__.TestScript) ... ok (0.012s)
2022-06-22T07:52:08.7807375Z   test_cast (__main__.TestScript) ... skip: RuntimeError: VariableType::ID() not implemented (0.000s)
2022-06-22T07:52:08.8006007Z   test_cat (__main__.TestScript) ... ok (0.016s)
2022-06-22T07:52:08.8095184Z   test_cat_lifts (__main__.TestScript) ... ok (0.016s)
2022-06-22T07:52:08.8152321Z   test_chr (__main__.TestScript) ... ok (0.000s)
2022-06-22T07:52:08.8167964Z   test_circular_dependency (__main__.TestScript)
2022-06-22T07:52:08.8658539Z https://github.com/pytorch/pytorch/issues/25871 ... ok (0.061s)
2022-06-22T07:52:08.8889018Z   test_class_as_attribute (__main__.TestScript) ... ok (0.009s)
2022-06-22T07:52:08.8925387Z   test_class_attribute (__main__.TestScript) ... ok (0.016s)
2022-06-22T07:52:08.8964688Z   test_class_attribute_in_script (__main__.TestScript) ... ok (0.000s)
2022-06-22T07:52:08.9036035Z   test_class_with_comment_at_lower_indentation (__main__.TestScript) ... ok (0.000s)
2022-06-22T07:52:08.9046136Z   test_code_with_constants (__main__.TestScript)

See GitHub Actions build pull / win-vs2019-cpu-py3 / test (default, 2, 2, windows.4xlarge) (3/6)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

2022-06-22T07:15:59.2322147Z ls: cannot access ...d/win_tmp/ci_scripts/*': No such file or directory
2022-06-22T07:15:59.0948981Z + export TEST_DIR_WIN
2022-06-22T07:15:59.0949232Z + export PYTORCH_FINAL_PACKAGE_DIR=/c/2540234189/build-results/
2022-06-22T07:15:59.0949515Z + PYTORCH_FINAL_PACKAGE_DIR=/c/2540234189/build-results/
2022-06-22T07:15:59.1019670Z ++ cygpath -w /c/2540234189/build-results/
2022-06-22T07:15:59.1133421Z + PYTORCH_FINAL_PACKAGE_DIR_WIN='C:\2540234189\build-results\'
2022-06-22T07:15:59.1133715Z + export PYTORCH_FINAL_PACKAGE_DIR_WIN
2022-06-22T07:15:59.1134017Z + mkdir -p /c/actions-runner/_work/pytorch/pytorch/build/win_tmp/build/torch
2022-06-22T07:15:59.1421388Z + CI_SCRIPTS_DIR=/c/actions-runner/_work/pytorch/pytorch/build/win_tmp/ci_scripts
2022-06-22T07:15:59.1421797Z + mkdir -p /c/actions-runner/_work/pytorch/pytorch/build/win_tmp/ci_scripts
2022-06-22T07:15:59.1633303Z ++ ls '/c/actions-runner/_work/pytorch/pytorch/build/win_tmp/ci_scripts/*'
2022-06-22T07:15:59.2322147Z ls: cannot access '/c/actions-runner/_work/pytorch/pytorch/build/win_tmp/ci_scripts/*': No such file or directory
2022-06-22T07:15:59.2325700Z + '[' -n '' ']'
2022-06-22T07:15:59.2326097Z + export SCRIPT_HELPERS_DIR=/c/actions-runner/_work/pytorch/pytorch/.jenkins/pytorch/win-test-helpers
2022-06-22T07:15:59.2326511Z + SCRIPT_HELPERS_DIR=/c/actions-runner/_work/pytorch/pytorch/.jenkins/pytorch/win-test-helpers
2022-06-22T07:15:59.2326829Z + [[ win-vs2019-cpu-py3 == *cuda11* ]]
2022-06-22T07:15:59.2327063Z + [[ default = \f\o\r\c\e\_\o\n\_\c\p\u ]]
2022-06-22T07:15:59.2327276Z + [[ win-vs2019-cpu-py3 == *cuda* ]]
2022-06-22T07:15:59.2327964Z + run_tests
2022-06-22T07:15:59.2328304Z + for path in '/c/Program Files/NVIDIA Corporation/NVSMI/nvidia-smi.exe' /c/Windows/System32/nvidia-smi.exe
2022-06-22T07:15:59.2328654Z + [[ -x /c/Program Files/NVIDIA Corporation/NVSMI/nvidia-smi.exe ]]
2022-06-22T07:15:59.2330909Z + '/c/Program Files/NVIDIA Corporation/NVSMI/nvidia-smi.exe'

See GitHub Actions build trunk / win-vs2019-cuda11.6-py3 / test (default, 1, 5, windows.8xlarge.nvidia.gpu) (4/6)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

2022-06-22T07:38:43.1847592Z ls: cannot access ...d/win_tmp/ci_scripts/*': No such file or directory
2022-06-22T07:38:43.0425020Z + export TEST_DIR_WIN
2022-06-22T07:38:43.0425629Z + export PYTORCH_FINAL_PACKAGE_DIR=/c/2540316661/build-results/
2022-06-22T07:38:43.0426224Z + PYTORCH_FINAL_PACKAGE_DIR=/c/2540316661/build-results/
2022-06-22T07:38:43.0520378Z ++ cygpath -w /c/2540316661/build-results/
2022-06-22T07:38:43.0684680Z + PYTORCH_FINAL_PACKAGE_DIR_WIN='C:\2540316661\build-results\'
2022-06-22T07:38:43.0685368Z + export PYTORCH_FINAL_PACKAGE_DIR_WIN
2022-06-22T07:38:43.0686095Z + mkdir -p /c/actions-runner/_work/pytorch/pytorch/build/win_tmp/build/torch
2022-06-22T07:38:43.1184192Z + CI_SCRIPTS_DIR=/c/actions-runner/_work/pytorch/pytorch/build/win_tmp/ci_scripts
2022-06-22T07:38:43.1184994Z + mkdir -p /c/actions-runner/_work/pytorch/pytorch/build/win_tmp/ci_scripts
2022-06-22T07:38:43.1463822Z ++ ls '/c/actions-runner/_work/pytorch/pytorch/build/win_tmp/ci_scripts/*'
2022-06-22T07:38:43.1847592Z ls: cannot access '/c/actions-runner/_work/pytorch/pytorch/build/win_tmp/ci_scripts/*': No such file or directory
2022-06-22T07:38:43.1852865Z + '[' -n '' ']'
2022-06-22T07:38:43.1853654Z + export SCRIPT_HELPERS_DIR=/c/actions-runner/_work/pytorch/pytorch/.jenkins/pytorch/win-test-helpers
2022-06-22T07:38:43.1854292Z + SCRIPT_HELPERS_DIR=/c/actions-runner/_work/pytorch/pytorch/.jenkins/pytorch/win-test-helpers
2022-06-22T07:38:43.1854750Z + [[ win-vs2019-cuda11.6-py3 == *cuda11* ]]
2022-06-22T07:38:43.1855110Z + export BUILD_SPLIT_CUDA=ON
2022-06-22T07:38:43.1855571Z + BUILD_SPLIT_CUDA=ON
2022-06-22T07:38:43.1855997Z + [[ default = \f\o\r\c\e\_\o\n\_\c\p\u ]]
2022-06-22T07:38:43.1856289Z + [[ win-vs2019-cuda11.6-py3 == *cuda* ]]
2022-06-22T07:38:43.1856591Z + export PYTORCH_TESTING_DEVICE_ONLY_FOR=cuda
2022-06-22T07:38:43.1856889Z + PYTORCH_TESTING_DEVICE_ONLY_FOR=cuda

See GitHub Actions build pull / win-vs2019-cpu-py3 / test (default, 1, 2, windows.4xlarge) (5/6)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

2022-06-22T08:16:35.3452607Z test_add_done_ca...arg() takes 0 positional arguments but 1 was given
2022-06-22T08:16:35.3416161Z   C:\Jenkins\Miniconda3\lib\unittest\suite.py(122): run
2022-06-22T08:16:35.3416443Z   C:\Jenkins\Miniconda3\lib\unittest\suite.py(84): __call__
2022-06-22T08:16:35.3416736Z   C:\Jenkins\Miniconda3\lib\site-packages\xmlrunner\runner.py(67): run
2022-06-22T08:16:35.3417044Z   C:\Jenkins\Miniconda3\lib\unittest\main.py(271): runTests
2022-06-22T08:16:35.3417333Z   C:\Jenkins\Miniconda3\lib\unittest\main.py(101): __init__
2022-06-22T08:16:35.3417690Z   C:\actions-runner\_work\pytorch\pytorch\build\win_tmp\build\torch\testing\_internal\common_utils.py(688): run_tests
2022-06-22T08:16:35.3418009Z   test_futures.py(331): <module>
2022-06-22T08:16:35.3418133Z 
2022-06-22T08:16:35.3418200Z ok (0.564s)
2022-06-22T08:16:35.3442250Z   test_add_done_callback_maintains_callback_order (__main__.TestFuture) ... ok (0.016s)
2022-06-22T08:16:35.3452607Z   test_add_done_callback_no_arg_error_is_ignored (__main__.TestFuture) ... [E pybind_utils.h:201] Got the following error when running the callback: TypeError: no_arg() takes 0 positional arguments but 1 was given
2022-06-22T08:16:35.3453832Z ok (0.001s)
2022-06-22T08:16:35.3469913Z   test_add_done_callback_simple (__main__.TestFuture) ... ok (0.001s)
2022-06-22T08:16:35.3519085Z   test_chained_then (__main__.TestFuture) ... ok (0.005s)
2022-06-22T08:16:35.4546226Z   test_collect_all (__main__.TestFuture) ... ok (0.103s)
2022-06-22T08:16:35.4560190Z   test_done (__main__.TestFuture) ... ok (0.001s)
2022-06-22T08:16:35.4578047Z   test_done_exception (__main__.TestFuture) ... ok (0.000s)
2022-06-22T08:16:35.4600759Z   test_interleaving_then_and_add_done_callback_maintains_callback_order (__main__.TestFuture) ... ok (0.000s)
2022-06-22T08:16:35.4614980Z   test_interleaving_then_and_add_done_callback_propagates_error (__main__.TestFuture) ... [E pybind_utils.h:201] Got the following error when running the callback: ValueError: Expected error
2022-06-22T08:16:35.4615311Z 
2022-06-22T08:16:35.4615360Z At:

See GitHub Actions build trunk / macos-11-py3-x86-64 / test (default, 1, 2, macos-12) (6/6)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

2022-06-22T09:10:16.6678810Z FAIL [0.108s]: tes...ntization.core.test_quantized_op.TestQuantizedOps)
2022-06-22T09:10:16.6675130Z     assert_equal(
2022-06-22T09:10:16.6675630Z   File "/Users/runner/miniconda3/envs/build/lib/python3.8/site-packages/torch/testing/_comparison.py", line 1093, in assert_equal
2022-06-22T09:10:16.6675970Z     raise error_metas[0].to_error(msg)
2022-06-22T09:10:16.6676330Z AssertionError: Tensor-likes are not close!
2022-06-22T09:10:16.6676820Z 
2022-06-22T09:10:16.6676930Z Mismatched elements: 58 / 1044 (5.6%)
2022-06-22T09:10:16.6677370Z Greatest absolute difference: 1.0 at index (0, 0, 0, 0) (up to 1e-05 allowed)
2022-06-22T09:10:16.6678110Z Greatest relative difference: 1.0 at index (0, 0, 0, 0) (up to 1.3e-06 allowed) : torch results are off
2022-06-22T09:10:16.6678370Z 
2022-06-22T09:10:16.6678490Z ======================================================================
2022-06-22T09:10:16.6678810Z FAIL [0.108s]: test_qrelu6 (quantization.core.test_quantized_op.TestQuantizedOps)
2022-06-22T09:10:16.6679290Z ----------------------------------------------------------------------
2022-06-22T09:10:16.6679590Z Traceback (most recent call last):
2022-06-22T09:10:16.6679950Z   File "/Users/runner/work/pytorch/pytorch/test/quantization/core/test_quantized_op.py", line 277, in test_qrelu6
2022-06-22T09:10:16.6680430Z     self._test_activation_function(X, 'relu6', relu6_test_configs)
2022-06-22T09:10:16.6680840Z   File "/Users/runner/work/pytorch/pytorch/test/quantization/core/test_quantized_op.py", line 225, in _test_activation_function
2022-06-22T09:10:16.6681350Z     self.assertEqual(qY, qY_hat, msg='{} - {} failed: ({} vs. {})'.format(
2022-06-22T09:10:16.6681940Z   File "/Users/runner/miniconda3/envs/build/lib/python3.8/site-packages/torch/testing/_internal/common_utils.py", line 2238, in assertEqual
2022-06-22T09:10:16.6682280Z     assert_equal(
2022-06-22T09:10:16.6682780Z   File "/Users/runner/miniconda3/envs/build/lib/python3.8/site-packages/torch/testing/_comparison.py", line 1093, in assert_equal
2022-06-22T09:10:16.6683140Z     raise error_metas[0].to_error(msg)

This comment was automatically generated by Dr. CI.

bdhirsh added a commit that referenced this pull request Apr 14, 2022
bdhirsh added a commit that referenced this pull request Apr 14, 2022
bdhirsh added a commit that referenced this pull request Apr 15, 2022
bdhirsh added a commit that referenced this pull request Apr 15, 2022
bdhirsh added a commit that referenced this pull request Apr 17, 2022

bdhirsh commented Apr 17, 2022

@bdhirsh has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

bdhirsh added a commit that referenced this pull request Apr 20, 2022
bdhirsh added a commit that referenced this pull request Jun 13, 2022
This PR integrates functionalization into LazyTensorCore. The high level is:

(1)  LTC will no longer see view/aliasing operators directly. Instead, functionalization will run "above" LTC, which will only see non-aliasing *_copy variants of each view operator. It will also remove mutations, so (for the most part) LTC will only see "functional/out-of-place" operators.

(2) At the C++ level, every lazy tensor is wrapped in a layer of indirection: we now have `FunctionalTensorWrapper(LazyTensorImpl)`.

(3)  A bunch of aliasing bugs are now fixed. The most significant one is that `mark_step()` no longer severs aliasing relationships between tensors. I included a test in the PR.

## What is the interface between functionalization and LTC?

There needs to be some code that "promotes/demotes" a tensor from a functional wrapper to its inner LTC tensor. The places where that happens are:

(a) factory functions (`LazyNativeFunctions::empty/empty_strided`). This is the main integration point - I updated those functions to return a wrapped `FunctionalTensorWrapper` object, which will cause every future usage of the returned tensor to pass through functionalization for every operator (which does the unwrapping) before hitting the LTC backend again.

(b) converting between devices. When you call `ltc_tensor.to('cpu')`, we need to sync any updates and "unwrap" the tensor. When you call `cpu_tensor.to('lazy')`, we need to wrap the tensor up.

(c) python bindings. Python bindings (like `mark_step()`) that don't go through the dispatcher. That means that they need to do the unwrapping themselves, instead of relying on functionalization kernels to do it automatically.

## What's the set of changes / what order should I look at things in?

LTC folks can focus just on the LTC-specific changes. I'd recommend looking at the following:

**(1) `ts_native_functions.yaml`**

Here, I basically removed a bunch of view ops, and added corresponding "view_copy" variants that automatically get codegen'd. view_copy ops are "ordinary" out-of-place ops, so the codegen for them should just work.

**(2) `ts_native_functions.cpp`**

This is probably where the most important changes to LTC are. There are 4 major changes in this file:

(a) I removed the hand-written kernels for the most of the view ops.

(b) I added the wrapping/unwrapping logic for `empty`/ `empty_strided`, and `to.device` that I mentioned in the integration section above.

(c) I added a lowering for the `at::lift` operator. This is a new op that's needed for the `torch.tensor()` constructor, where we need to explicitly "lift" LTC tensors into functional tensor objects.

(d) There are a total of 10 aten operators that are problematic, that I had to add a bit of extra handling for. Why? The high level idea is that a few ops (like `block_diag`) are `CompositeExplicitAutograd`, which means that they run **underneath** functionalization. These ops are "functional" (no aliasing info), but they internally call view operators. To handle these ops, I added a helper function in core that lets you "functionalize" a composite kernel: `at::functionalization::functionalize_aten_op`. The change for LTC is basically that these ops used to work "for free", whereas now you need to manually write a (one-liner) kernel for them that explicitly calls into their decomposition.


**(3) `lazy_ir.py`**

Some codegen changes. There are two main changes in the codegen:

(a) Fixed a use-after-free error with ops that take in a `std::string`. This was UB that only surfaced for some reason when I did the integration, but the codegen'd nodes for ops like `div.rounding_mode` were storing the string argument as a `c10::string_view`, and the constructed node was outlasting the life-time of the string. I added some logic to fix that by explicitly ensuring that we store a `std::string` on the node instead of a `c10::string_view`

(b) Now that we're codegen'ing a bunch of `view_copy` nodes, I didn't want to have to write shape inference rules for all of them (since they don't have `at::meta::` implementations). However, every view op and `view_copy` op actually supports meta tensors: you just need to run the composite implementation (`at::compositeexplicitautograd`) and plumb meta tensors through. I added codegen support for this.


**(4) `ts_eager_fallback.cpp`**

I had to update the eager fallback to ensure that when converting from an LTC device to a non-LTC device and back, it unwraps/wraps properly. I also updated the check to error if the fallback sees any view ops (since LTC should never see view ops, the fallback should never see one either).

**(5) `shape_inference.h/cpp`**

Added shape formulas for a few of the new `view_copy` ops. I also updated the formulas for some of the existing view ops to explicitly raise an error, since they should never be called. We should just delete them, but I figured we could keep this PR a bit smaller and fully rip out the LTC view infrastructure later.


**(6) `init.cpp`**

Updated the python bindings to "unwrap" functional wrapper tensor inputs, as mentioned in the integration section above.

**(7) `test_ts_opinfo.py`**

Some basic test cleanup. Also added a test explicitly for `mark_step()` preserving alias relationships.

 

## Other functionalization changes (not specific to LTC)

This covers the parts of this PR inside `aten`. The important changes are:

(1) `detach()` support for functionalization (in `FunctionalTensorWrapper.h/cpp`). This is only actually relevant to LTC/XLA, though, since those are the only contexts under which autograd will be called directly on a `FunctionalTensorWrapper` object.

I ended up duplicating a bit of the detach logic from `TensorImpl.h` to get this to work, but I couldn't think of a better way to do it.

(2) A helper function for "functionalizing" a `CompositeExplicitAutograd` kernel: `at::functionalization::functionalize_aten_op` (in `FunctionalTensorWrapper.h/cpp`). The idea is that LTC needs some special handling for ops like `block_diag` that are `CompositeExplicitAutograd` but call into view operators "underneath" the functionalization pass. I wanted a helper function to make this case easy to handle.

(3) Some `native_functions.yaml` changes. This is mostly just me using the new `CompositeExplicitAutogradNonFunctional` key to pre-emptively prevent XLA/LTC from accidentally using the "problematic" decompositions. This will also make XLA failures easier to spot.


Differential Revision: [D35705375](https://our.internmc.facebook.com/intern/diff/D35705375)

[ghstack-poisoned]
bdhirsh added a commit that referenced this pull request Jun 14, 2022
bdhirsh added a commit that referenced this pull request Jun 15, 2022
bdhirsh added a commit that referenced this pull request Jun 15, 2022
bdhirsh added a commit that referenced this pull request Jun 16, 2022
**`torchgen/gen.py`** (outdated)
```python
    mapMaybe(gen_composite_view_copy_kernel, view_groups)
),
"SymIntViewCopyKernel_Definitions": list(
    mapMaybe(lambda pair: gen_symint_view_copy_kernel(pair[0], pair[1]), view_copy_with_symint_pairs)
```
cc @ezyang I remember hearing that long term we'd like to have `view*.SymInt` fully subsume the existing view / view_copy ops, so we can always rip this out later.

But for now, I'm codegen'ing `{view}_copy.SymInt` kernel overloads to call into their `{view}_copy` variants, which is what the existing `expand_copy.SymInt` kernel does today.

```python
if remove_non_owning_ref_types:
    return NamedCType(binds, VectorCType(BaseCType(SymIntT)))
else:
    return NamedCType(binds, BaseCType(symIntArrayRefT))
```
Hey @Krovatkin, if you're interested - the changes here + in `translate.py` are needed to get functionalization working with sym ints :). There are still a few other things that I need to fix, but this basically tells the codegen how to:

(1) convert `SymIntArrayRef` -> `std::vector<SymInt>` (needed because functionalization stashes SymInt argument inputs into a lambda, which can outlive the original `SymIntArrayRef`)
(2) convert `std::vector<SymInt>` -> `SymIntArrayRef` (going the other way)
(3) convert from `SymIntArrayRef` -> `IntArrayRef` (needed for the `expand_copy.SymInt` -> `expand_copy` kernel)


bdhirsh commented Jun 17, 2022

@pytorchbot help


pytorch-bot bot commented Jun 17, 2022

❌ 🤖 pytorchbot command failed:

@pytorchbot: error: argument command: invalid choice: 'help' (choose from 'merge', 'revert', 'rebase')

usage: @pytorchbot [-h] {merge,revert,rebase} ...

Try @pytorchbot --help for more info.

bdhirsh added 5 commits June 17, 2022 06:20
This PR integrates functionalization into LazyTensorCore. The high level is:

(1)  LTC will no longer see view/aliasing operators directly. Instead, functionalization will run "above" LTC, which will only see non-aliasing *_copy variants of each view operator. It will also remove mutations, so (for the most part) LTC will only see "functional/out-of-place" operators.

(2) At the C++ level, every lazy tensor is wrapped in a layer of indirection: we now have `FunctionalTensorWrapper(LazyTensorImpl)`.

(3)  A bunch of aliasing bugs are now fixed. The most significant one is that `mark_step()` no longer severs aliasing relationships between tensors. I included a test in the PR.

## What is the interface between functionalization and LTC?

There needs to be some code that "promotes/demotes" a tensor from a functional wrapper to its inner LTC tensor. The places where that happens are:

(a) factory functions (`LazyNativeFunctions::empty/empty_strided`). This is the main integration point - I updated those functions to return a wrapped `FunctionalTensorWrapper` object, which will cause every future usage of the returned tensor to pass through functionalization for every operator (which does the unwrapping) before hitting the LTC backend again.

(b) converting between devices. When you call `ltc_tensor.to('cpu')`, we need to sync any updates and "unwrap" the tensor. When you call `cpu_tensor.to('lazy')`, we need to wrap the tensor up.

(c) python bindings. Python bindings (like `mark_step()`) that don't go through the dispatcher. That means that they need to do the unwrapping themselves, instead of relying on functionalization kernels to do it automatically.
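The wrap/unwrap boundaries in (a) and (b) can be sketched as a toy Python model (hypothetical names; the real `FunctionalTensorWrapper` is a C++ `TensorImpl` subclass and the real boundary logic lives in the LTC kernels):

```python
# Toy sketch of the wrapper indirection and where it is crossed.

class LazyTensor:
    def __init__(self, data):
        self.data = data

class FunctionalTensorWrapper:
    def __init__(self, inner):
        self.inner = inner  # the wrapped LazyTensor

def lazy_empty(data):
    # (a) Factory functions return wrapped tensors, so every later op on
    # the result routes through functionalization before hitting LTC.
    return FunctionalTensorWrapper(LazyTensor(data))

def to_device(t, device):
    # (b) Device conversions cross the wrapper boundary.
    if device == "cpu" and isinstance(t, FunctionalTensorWrapper):
        return t.inner.data  # sync any pending updates, then unwrap
    if device == "lazy" and not isinstance(t, FunctionalTensorWrapper):
        return FunctionalTensorWrapper(LazyTensor(t))  # wrap
    return t

t = lazy_empty([0.0, 0.0])
assert isinstance(t, FunctionalTensorWrapper)
assert to_device(t, "cpu") == [0.0, 0.0]
assert isinstance(to_device([1.0], "lazy"), FunctionalTensorWrapper)
```

Case (c), python bindings, is the same unwrap as `to('cpu')`, just performed manually because those bindings never reach the functionalization kernels.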

## What's the set of changes / what order should I look at things in?

LTC folks can focus just on the LTC-specific changes. I'd recommend looking at the following:

**(1) `ts_native_functions.yaml`**

Here, I basically removed a bunch of view ops and added corresponding `view_copy` variants that automatically get codegen'd. `view_copy` ops are "ordinary" out-of-place ops, so the codegen for them should just work.

**(2) `ts_native_functions.cpp`**

This is probably where the most important changes to LTC are. There are 4 major changes in this file:

(a) I removed the hand-written kernels for most of the view ops.

(b) I added the wrapping/unwrapping logic for `empty`/`empty_strided` and `to.device` that I mentioned in the integration section above.

(c) I added a lowering for the `at::lift` operator. This is a new op that's needed for the `torch.tensor()` constructor, where we need to explicitly "lift" LTC tensors into functional tensor objects.

(d) There are a total of 10 aten operators that are problematic and need a bit of extra handling. Why? The high-level idea is that a few ops (like `block_diag`) are `CompositeExplicitAutograd`, which means that they run **underneath** functionalization. These ops are "functional" (no aliasing info), but they internally call view operators. To handle them, I added a helper function in core that lets you "functionalize" a composite kernel: `at::functionalization::functionalize_aten_op`. The change for LTC is that these ops used to work "for free", whereas now you need to manually write a (one-liner) kernel for each that explicitly calls into its decomposition.
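The shape of the helper in (d) can be sketched with a toy Python model (illustrative only; the real helper is `at::functionalization::functionalize_aten_op` in C++, and real decompositions dispatch through the dispatcher rather than an op table):

```python
# Toy model of "functionalizing" a composite kernel whose decomposition
# calls a view op internally.

def narrow(xs, start, length):
    # Eager view op (pretend this aliases; Python slices actually copy).
    return xs[start:start + length]

def narrow_copy(xs, start, length):
    # Non-aliasing *_copy variant.
    return list(xs[start:start + length])

def composite_first_half(xs, ops):
    # A CompositeExplicitAutograd-style decomposition, parameterized over
    # an op table so we can swap view ops for their copy variants.
    return ops["narrow"](xs, 0, len(xs) // 2)

EAGER_OPS = {"narrow": narrow}
COPY_OPS = {"narrow": narrow_copy}

def functionalize(decomposition):
    # The "one-liner kernel": re-run the decomposition with copy ops, so
    # the backend below only ever sees non-aliasing operators.
    return lambda *args: decomposition(*args, ops=COPY_OPS)

lazy_first_half = functionalize(composite_first_half)
assert composite_first_half([1, 2, 3, 4], EAGER_OPS) == [1, 2]
assert lazy_first_half([1, 2, 3, 4]) == [1, 2]
```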


**(3) `lazy_ir.py`**

Some codegen changes. There are two main changes in the codegen:

(a) Fixed a use-after-free error with ops that take in a `std::string`. This was UB that happened to surface only after the integration: the codegen'd nodes for ops like `div.rounding_mode` were storing the string argument as a `c10::string_view`, and the constructed node was outliving the lifetime of the string. I fixed that by ensuring that we store an owned `std::string` on the node instead of a `c10::string_view`.

(b) Now that we're codegen'ing a bunch of `view_copy` nodes, I didn't want to have to write shape inference rules for all of them (since they don't have `at::meta::` implementations). However, every view op and its `view_copy` variant actually support meta tensors: you just need to run the composite implementation (`at::compositeexplicitautograd`) and plumb meta tensors through it. I added some codegen support for this.
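The meta-tensor trick in (b) can be sketched as a toy Python model (hypothetical names; real meta tensors carry dtype/strides too, and the real kernels live in ATen): run the op on shape-only inputs and read the output shape, instead of writing a per-op shape formula.

```python
# Toy sketch of shape inference by running an op on "meta" (shape-only)
# inputs.

class MetaTensor:
    # Carries only a shape, no data.
    def __init__(self, shape):
        self.shape = tuple(shape)

def expand_copy_meta(t, sizes):
    # Simplified same-rank expand semantics: -1 keeps the existing dim.
    out = [old if s == -1 else s for old, s in zip(t.shape, sizes)]
    return MetaTensor(out)

def infer_shape(meta_kernel, *meta_args):
    # Generic shape rule: just run the kernel on meta inputs.
    return meta_kernel(*meta_args).shape

assert infer_shape(expand_copy_meta, MetaTensor((1, 3)), [4, -1]) == (4, 3)
```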


**(4) `ts_eager_fallback.cpp`**

I had to update the eager fallback to ensure that when converting from an LTC device to a non-LTC device and back, it unwraps/wraps properly. I also updated the check to error out on any view ops: LTC should never see a view op, so we never expect the fallback to see one either.

**(5) `shape_inference.h/cpp`**

Added shape formulas for a few of the new `view_copy` ops. I also updated the formulas for some of the existing view ops to explicitly raise an error, since they should never be called anymore. We could just delete them, but I figured we can keep this PR a bit smaller and fully rip out the LTC view infrastructure later.


**(6) `init.cpp`**

Updated the python bindings to "unwrap" functional wrapper tensor inputs, as mentioned in the integration section above.

**(7) `test_ts_opinfo.py`**

Some basic test cleanup. Also added a test explicitly for `mark_step()` preserving alias relationships.
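The alias-preservation property under test can be sketched as a toy Python model (illustrative only; the real test lives in `test_ts_opinfo.py` and goes through `mark_step()` on real lazy tensors): after a step, aliased tensors must still share the same storage, so mutations remain visible through the alias.

```python
# Toy version of the mark_step alias-preservation check.

class Storage:
    def __init__(self, data):
        self.data = data

class ToyLazyTensor:
    def __init__(self, storage):
        self.storage = storage

def mark_step(tensors):
    # Materialize pending work; must NOT sever aliasing: after the step,
    # aliased tensors still share the same Storage object.
    return tensors

base = ToyLazyTensor(Storage([1.0, 2.0]))
view = ToyLazyTensor(base.storage)   # alias: shares the base's storage
mark_step([base, view])
base.storage.data[0] = 5.0
assert view.storage is base.storage   # aliasing survived the step
assert view.storage.data[0] == 5.0    # mutation still visible via alias
```

Before this PR, the analogous real-world failure was `mark_step()` effectively giving `view` its own fresh storage, silently breaking the aliasing contract.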

 

## Other functionalization changes (not specific to LTC)

This is basically the stuff in this PR inside of `aten`. The important changes are:

(1) `detach()` support for functionalization (in `FunctionalTensorWrapper.h/cpp`). This is only actually relevant to LTC/XLA, though, since they are the only contexts in which autograd is called directly on a `FunctionalTensorWrapper` object.

I ended up duplicating a bit of the detach logic from `TensorImpl.h` to get this to work, but I couldn't think of a better way to do it.

(2) A helper function for "functionalizing" a `CompositeExplicitAutograd` kernel: `at::functionalization::functionalize_aten_op` (in `FunctionalTensorWrapper.h/cpp`). The idea here is that LTC needs some special handling for ops like `block_diag` that are `CompositeExplicitAutograd` but call into view operators "underneath" the functionalization pass. I wanted a helper function that makes this case easy to handle.

(3) Some `native_functions.yaml` changes. This is mostly just me using the new `CompositeExplicitAutogradNonFunctional` key to pre-emptively prevent XLA/LTC from accidentally using the "problematic" decompositions. This will also make XLA failures easier to spot.


Differential Revision: [D35705375](https://our.internmc.facebook.com/intern/diff/D35705375)

[ghstack-poisoned]
bdhirsh added 2 commits June 20, 2022 23:14
bdhirsh commented Jun 22, 2022

@pytorchbot rebase

@pytorchmergebot

@pytorchbot successfully started a rebase job. Check the current status here

@pytorchmergebot

Successfully rebased gh/bdhirsh/199/orig onto refs/remotes/origin/master, please pull locally before adding more changes (for example, via ghstack checkout https://github.com/pytorch/pytorch/pull/75527)

@github-actions

Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.
