update#39563
Merged
zasdfgbnm merged 63 commits into ci-all/AT_DISPATCH_COMPLEX_TYPES-retry on Jun 5, 2020
…39216) Summary: Pull Request resolved: #39216 The `rpc.functions.async_execution` decorator specifies that the wrapped function is guaranteed to return a `torch.futures.Future`. The decorator adds a `_wrapped_async_rpc_function` attribute to the wrapper function. The caller retrieves this information and then sets the `isAsyncFunction` argument accordingly, which is later added to the PythonCall RPC message as a field. On the callee side, if the PythonCall carries an asynchronous function, it will cast the function's return value to a jit::PythonFutureWrapper object, and then install response creation and communication as a callback on that jit::PythonFutureWrapper. For applications, this feature is useful when a function needs to wait for IO or additional signaling. In those cases, marking the user function as `rpc.functions.async_execution` will prevent it from blocking one thread on the callee for too long. Test Plan: Imported from OSS Reviewed By: rohan-varma Differential Revision: D21779962 fbshipit-source-id: 6b6aa698bf6f91dad6ed2a7ee433df429b59e941
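The marker-attribute mechanism above can be sketched with a plain `concurrent.futures.Future` standing in for `torch.futures.Future`; the names `async_execution` and `handle_call` below are illustrative stand-ins, not the real RPC internals:

```python
from concurrent.futures import Future

def async_execution(fn):
    # Mimics rpc.functions.async_execution: tag the wrapper so the RPC
    # layer knows the return value is a Future, not a plain result.
    fn._wrapped_async_rpc_function = True
    return fn

@async_execution
def slow_add(x, y):
    fut = Future()
    fut.set_result(x + y)  # in real code, set later from an IO/signal callback
    return fut

def handle_call(fn, *args):
    # Callee side: detect the marker and chain the response as a callback
    # instead of blocking a thread on fut.result().
    result = fn(*args)
    if getattr(fn, "_wrapped_async_rpc_function", False):
        responses = []
        result.add_done_callback(lambda f: responses.append(f.result()))
        return responses
    return [result]
```

On a real callee the callback would serialize the value into a response message rather than append to a local list.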
Summary: Fixes #32866, resubmit of #38970 The memory error in the issue is caused by int overflow in col2vol. This version uses mixed 32-bit and 64-bit indexing calculations to lift the maximum indexable size without compromising the performance of ConvTranspose3d, vs a 20-30% regression with pure 64-bit indexing. This requires that input.numel() <= UINT_MAX, and channels * kernel.numel() <= UINT_MAX; otherwise it raises an error. Previously, the code would crash or give incorrect results unless input.numel() * kernel.numel() <= INT_MAX. Note that the test is a minimised reproducer for the issue. Pull Request resolved: #39198 Differential Revision: D21817836 Pulled By: ezyang fbshipit-source-id: b9adfe9f9dd00f04435be132966b33ac6b9efbef
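A rough illustration of the overflow, emulating C `int` wraparound in pure Python (`index32` is a hypothetical helper, not the actual col2vol code):

```python
def index32(i, stride):
    # Emulate 32-bit signed multiplication as it happens in C index arithmetic.
    v = (i * stride) & 0xFFFFFFFF
    return v - 0x100000000 if v > 0x7FFFFFFF else v

INT_MAX = 2**31 - 1
numel, kernel = 3 * 2**20, 2**10

assert numel * kernel > INT_MAX    # true 64-bit product exceeds INT_MAX...
assert index32(numel, kernel) < 0  # ...so the 32-bit product wraps negative
```

Keeping the per-channel offsets in 32-bit while accumulating the full index in 64-bit avoids the wraparound without paying for 64-bit math everywhere.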
Summary: Pull Request resolved: #39406 For now, just the RPC test (no dist autograd or dist optimizer). I removed the skipping decorator from all the tests except those that explicitly use the ProcessGroup options. Includes #39027. ghstack-source-id: 105159974 Test Plan: Ran the tests several hundred times, in various build modes. Saw some flakes, but at a rate of about 0.1% Differential Revision: D21716069 fbshipit-source-id: 9d2a99e112049a63745772c18e7a58266ed8e74e
Summary: ezyang, I have added the changes to DispatchKey, DeviceType, Backend to support the out-of-tree FPGA. cc. tataetae Pull Request resolved: #38938 Differential Revision: D21748955 Pulled By: ezyang fbshipit-source-id: fe76d9730818205961430d2a0e00727b5c547b32
Summary: Fixes gh-38966 If `THCTensor_(resizeAs)` fails to allocate, then these `free`s will never be reached. So, instead I use a wrapped tensor to do cleanup automatically. Pull Request resolved: #39347 Differential Revision: D21838933 Pulled By: ezyang fbshipit-source-id: 8c74ecdd720d6712a33ddef6126ea545761a269b
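The pattern of the fix, transposed to Python: the C++ change wraps the temporary in a tensor whose destructor performs the cleanup, and a context manager plays that role here (all names hypothetical):

```python
class Resource:
    freed = []

    def __init__(self, name):
        self.name = name

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        # Cleanup runs even when the body raises, unlike a manual free
        # placed after a call that can throw.
        Resource.freed.append(self.name)
        return False  # propagate exceptions

def resize_or_fail(ok):
    with Resource("workspace") as r:
        if not ok:
            raise MemoryError("resizeAs failed to allocate")
        return r.name

try:
    resize_or_fail(False)
except MemoryError:
    pass
```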
Summary: Pull Request resolved: #39440 After the RPC tests, re-enable the second test suite: dist autograd. ghstack-source-id: 105165393 Test Plan: Ran the tests, several times each, in different build configs. Differential Revision: D21858974 fbshipit-source-id: 409377d564c36fecae51b9e4c776d94187b434a2
Summary: Pull Request resolved: #39441 This is the last test suite to be enabled for TensorPipe. ghstack-source-id: 105166757 Test Plan: Ran the tests, hundreds of times each, in different build modes. Differential Revision: D21858975 fbshipit-source-id: ee0a7e64b77b4b1974f031207031cc14afb3a8c2
Summary: Pull Request resolved: #39337 In #39031 we made fake quantize respect device affinity of the original module. However, that PR only handled modules with parameters or buffers, and did not work properly for `ReLU`. Fixing the logic to also work for `ReLU` by passing the parent's device when adding observers. Test Plan: ``` python test/test_quantization.py TestDistributed.test_device_affinity ``` Imported from OSS Differential Revision: D21821243 fbshipit-source-id: cc6abda3694b80ce8ba0440dc6c1b5b58f3c0066
Summary: Fix #38410  Pull Request resolved: #38947 Differential Revision: D21765290 Pulled By: ezyang fbshipit-source-id: 5d2b25f039a653c609d60cdaac4a7ac5812ae291
Summary: Pull Request resolved: #39378 Will initially only contain a label to trigger builds for binary tests Signed-off-by: Eli Uriegas <eliuriegas@fb.com> Test Plan: Imported from OSS Differential Revision: D21864091 Pulled By: seemethere fbshipit-source-id: f69467ccc797b6b320dc8b7f2d50a8601c172a1f
Summary: Pull Request resolved: #39424 Reviewed By: Krovatkin Differential Revision: D21854870 Pulled By: ailzhang fbshipit-source-id: eb68f1775596e4c963169033444d6d6f4f818d4f
Summary: This patch removes call to run optimizations within freezing API. Only dead code elimination is invoked to clean up the frozen module. Pull Request resolved: #38499 Reviewed By: eellison Differential Revision: D21579607 Pulled By: bzinodev fbshipit-source-id: a6231754fea89296a3dcf07b5e37a1c43cb8d5dd
Summary: Pull Request resolved: #39321 Reviewed By: zhangguanheng66 Differential Revision: D21825488 Pulled By: Nayef211 fbshipit-source-id: 41ee09e683c4ae838cfd488a342088d591e806e4
Summary: The two bugs were: * Non-reduction axes were not added when inserting the new ReduceOp, meaning if a reduction with non-reduce axes was rfactored we'd produce bad outputs. There were no tests of Rfactor with non-reduce axis so I modified a test to do this. * The new statements were always prepended to the block, meaning writes to a buffer could be reordered after the usage of that buffer. This mostly happened in the case where we rfactor a previously rfactored reduction. There was a test of this, but since it only tested rfactoring the outer reduction axis there was never any other statements at the insertion point (the tests of the insertion point argument also do this). I added a new test which covers various rfactor-axis cases. Also cleaned up tests, removed some helper code we don't need etc. Pull Request resolved: #39268 Differential Revision: D21864489 Pulled By: nickgg fbshipit-source-id: d314d20997a8472ec96b72f7a9068d6da6d2399c
…zing API Test Plan: revert-hammer Differential Revision: D21579607 Original commit changeset: a6231754fea8 fbshipit-source-id: 277011605eedee1c3b44fbaf877233b239adf56b
Summary: Mainly, fix a bug in the HashProvider where it would not include LoopOptions in the hash, meaning two loops would be seen as identical even if they were bound to different thread/block axes. Also added symbolic names for the different axis options. Pull Request resolved: #39408 Differential Revision: D21864494 Pulled By: nickgg fbshipit-source-id: 9c28729984e7a3375e026c78294c9f75b9015123
Summary: Cut from #38994. This is a helper function for comparing torch and NumPy behavior. It updates the existing and increasingly popular _np_compare function and moves it to be a method on TestCase. Pull Request resolved: #39179 Differential Revision: D21855082 Pulled By: mruberry fbshipit-source-id: edca3b78ae392d32243b02bf61960898b6ba590f
Summary: Re-enable some test cases in `test_memory_format_operators` since their corresponding issue has been fixed. Pull Request resolved: #38648 Differential Revision: D21689085 Pulled By: VitalyFedyunin fbshipit-source-id: 0aa09e0bf31ba98c8ad0191ac3afd31dda0f1d42
Summary: `HTTPError` is raised when the server is overloaded, while `URLError` is raised when the network is not available. And since `HTTPError` is an extension of `URLError`, catching `URLError` catches both exceptions. Pull Request resolved: #39477 Differential Revision: D21873560 Pulled By: malfet fbshipit-source-id: 11806671b768705465f562087521ad4887fd20f7
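The subclass relationship can be checked directly against the standard library:

```python
from urllib.error import HTTPError, URLError

# HTTPError subclasses URLError, so one `except URLError` clause covers
# both the HTTP-status case and the network-unavailable case.
assert issubclass(HTTPError, URLError)

def classify(exc):
    try:
        raise exc
    except URLError as e:
        return type(e).__name__

err = HTTPError("http://example.com", 503, "overloaded", None, None)
assert classify(err) == "HTTPError"
assert classify(URLError("network down")) == "URLError"
```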
Summary: LayerNorm Fake FP16 Op debug; still seeing output mismatches. Pull Request resolved: #39476 Differential Revision: D21871748 Pulled By: hyuen fbshipit-source-id: ab308e3acff9ce21de41b0f006cbee767983f8e4
Summary: Pull Request resolved: #39452 Selective build works on training. * VariableType_?.cpp are now selectively generated based on the operator list. * Add a flag in pt_operator_library, "train". If it's True, an extra flag of "pt_train_operator_library" will be added to the labels. A query for "pt_train_operator_library" will be done to aggregate the training operators. With this flag we limit the generated VariableType to used training operators only, to conserve the code size. The models for inference only have train = False by default. * For testing purpose, caffe2/fb/pytorch_trainer is created. It's based on full jit but the operators are selectively built. * smartkeyboard_debug_model is used for test. Since the static code analysis is not applied for VariableType yet, the operators are manually added based on debugging error messages. * At build stage, make selective build optional for training code-gen library. The reason is that to make fb4a built, the generated VariableType.cpp needs to depend on torch_mobile_train. Torch_mobile_train is not needed for apps with inference only. In those cases training can be turned off to remove the dependency on torch_mobile_train to save size. It can also be used as a switch to check size regression introduced by training. ghstack-source-id: 105190037 (Note: this ignores all push blocking failures!) 
Test Plan: Training: ``` buck run -c pt.build_from_deps_query=1 -c pt.selective_build=0 -c pt.static_dispatch=0 xplat/caffe2/fb/pytorch_trainer:trainer ~/models/papaya/keyboard/smartkeyboard_debug_model.pt ``` Inference, with and without the new query-based feature: ``` buck run -c pt.build_from_deps_query=1 -c pt.selective_build=0 -c pt.static_dispatch=0 xplat/caffe2/fb/lite_predictor:lite_predictor_bi -- --model=/home/myuan/models/pytext/BI/bi_pytext_0512.bc --input_dims "1,4" --input_type int64 --pytext_len=4 ``` ``` buck run xplat/caffe2/fb/lite_predictor:lite_predictor_bi -- --model=/home/myuan/models/pytext/BI/bi_pytext_0512.bc --input_dims "1,4" --input_type int64 --pytext_len=4 ``` Reviewed By: ljk53 Differential Revision: D21459302 fbshipit-source-id: df71a46d74f8c7448cbf51990804104f1384594f
Summary: Pull Request resolved: #39065 Test Plan: Imported from OSS Differential Revision: D21803939 Pulled By: anjali411 fbshipit-source-id: c7313c527eb6b54d49ef46aa0a839a3418fa8d7e
Summary: Pull Request resolved: #39181 Create a python binding class torch._C.LiteScriptModule for mobile::module; a python class called LiteScriptModule is created which wraps torch._C.LiteScriptModule. Python class LiteScriptModule contains preliminary functions including forward, run_method and __call__. Create a python api "load_for_lite_interpreter" under torch.jit.mobile which takes a pre-saved mobile module in a file-like object as input and returns a python class LiteScriptModule. Add a python binding method "_save_to_buffer_for_mobile" under ScriptModule, and a python method "_save_to_buffer_for_lite_interpreter" under RecursiveScriptModule which saves the mobile module into a buffer instead of a file. ghstack-source-id: 105215736 Test Plan: buck test caffe2/test:mobile Differential Revision: D21757474 fbshipit-source-id: 758b87497d65c4686459a567d41887c7a577aa4c
…39393) Summary: Pull Request resolved: #39393 Computing r_correction should be done only for radam . Otherwise can generate floating-point exceptions. Test Plan: buck test caffe2/caffe2/python/operator_test:adam_test -- test_sparse_adam with --caffe2_operator_throw_if_fp_exceptions=1 gflags option Differential Revision: D21834296 fbshipit-source-id: a9e6a93451423e76a99f6591d21cb65d4374b008
Summary: Pull Request resolved: #38173 - Introduce torch.types.Device representing all "device-like" types - Stubbed torch.device.__reduce__ - Stubbed all torch._C functions comprehensively - Deleted _safe_call which is unused throughout the codebase Signed-off-by: Edward Z. Yang <ezyang@fb.com> Test Plan: Imported from OSS Differential Revision: D21497399 Pulled By: ezyang fbshipit-source-id: 1f534442b0ec9a70d556545d072f2c06a08b9d15
Summary: Pull Request resolved: #39456 Move aten::to.prim_dtype from full jit to lite interpreter Test Plan: verify TTS model can be used Reviewed By: iseeyuan Differential Revision: D21856104 fbshipit-source-id: 774981a5c04798e3a87cf7d6e6682f35e604944e
Summary: Enable Dropout and SoftmaxCrossEntropy tests in test_operators.py Pull Request resolved: #39431 Reviewed By: hl475 Differential Revision: D21877501 Pulled By: houseroad fbshipit-source-id: 1e9b1e5cf80dc1843bdbde2662f3339e357c6654
Summary: Pull Request resolved: #39331 Fixes gh-37590 Adds an extra `make coverage` to document building, which uses the built-in facility in sphinx to check docstring coverage. Also fixes a failure to import `torch/jit/supported_ops.py` which broke the [Torchscript Builtins](https://pytorch.org/docs/stable/jit_builtin_functions.html) page. This also adds the required `SPHINXOPTS` to turn warnings into error, but this is commented out. Note that since documentation of `torchvision` is merged in here, failures there would cause failures here if this is made active. Some thought might be needed about pinning the torchvision version merged into documentation. The first commit should fail, since the "ScriptModule" class is commented out. I did that in order to check that a CI failure is properly reported. Pull Request resolved: #38244 Differential Revision: D21640589 Pulled By: ezyang fbshipit-source-id: 1e240d81669b5f21404d596de4a27d192dc9fd8a
Summary: This PR aims to add `arccosh`, `arcsinh` and `arctanh` support. Please see issue #38349 for more details. **TODOs:** * [x] Add test cases for `arccosh`, `arcsinh` and `arctanh`. (need help) * [x] Overload ops if `std::op` does not work with `thrust::complex` types (like for `sinh`, `cosh`). Note: `std::acosh, std::asinh, std::atanh` do not support `thrust::complex` types. Added support for complex types for these 3 ops (`arccosh, arcsinh, arctanh`) cc: mruberry Pull Request resolved: #38388 Differential Revision: D21882055 Pulled By: mruberry fbshipit-source-id: d334590b47c5a89e491a002c3e41e6ffa89000e3
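The log-based identities typically used when `std::acosh`/`std::asinh` are unavailable for complex types can be sanity-checked against Python's `cmath`; this is a reference check, not the CUDA kernel code:

```python
import cmath

# acosh(z) = log(z + sqrt(z*z - 1));  asinh(z) = log(z + sqrt(z*z + 1))
# (valid for this z; branch-cut care is needed near the real axis)
z = 2 + 3j
acosh_ident = cmath.log(z + cmath.sqrt(z * z - 1))
asinh_ident = cmath.log(z + cmath.sqrt(z * z + 1))

assert abs(cmath.acosh(z) - acosh_ident) < 1e-9
assert abs(cmath.asinh(z) - asinh_ident) < 1e-9
```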
…es (#37531) Summary: Pull Request resolved: #37531 All of these definitions are no longer "legacy" as their CPU implementations have been ported to ATen. There are probably some layers of indirection that could be reduced here, but for now just do a minor but unlikely to break things cleanup. The last thing in LegacyNNDefinitions truly is still in THCUNN and can't be removed. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Test Plan: Imported from OSS Differential Revision: D21310913 Pulled By: ezyang fbshipit-source-id: 1ff4ff16abddf13f8d583df990242ac4b0461915
Summary: max_pool2d with ceil_mode calculates output size a little differently than what we get with xnnpack max_pool2d. Thus when ceil_mode=True, we disable this path. However if we get the same output size with ceil_mode and without ceil_mode, we should use xnnpack based max_pool2d. Pull Request resolved: #39447 Test Plan: CI Differential Revision: D21873334 Pulled By: kimishpatel fbshipit-source-id: b84abed1505e36e492cc87e7d40664ac63964909
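The size comparison this dispatch logic needs can be sketched with the 1-D pooling output-size formula (simplified; PyTorch additionally clamps the last window to start inside the padded input when ceil_mode is set):

```python
import math

def pool_out(in_size, kernel, stride, pad, ceil_mode):
    # floor((in + 2*pad - kernel) / stride) + 1, or ceil(...) + 1 with ceil_mode
    num = in_size + 2 * pad - kernel
    return (math.ceil(num / stride) if ceil_mode else num // stride) + 1

# Sizes agree -> the XNNPACK (floor-mode) path is safe even with ceil_mode=True.
assert pool_out(6, 2, 2, 0, True) == pool_out(6, 2, 2, 0, False) == 3
# Sizes differ -> must fall back to the native implementation.
assert pool_out(7, 2, 2, 0, True) == 4
assert pool_out(7, 2, 2, 0, False) == 3
```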
Summary: Pull Request resolved: #39379 Moves binary builds into their own workflow and adds the ability to target specification on them. This allows you to run the binary build workflow on a pull request without the need to modify any configuration at all. Some notes about this implementation: * Upload jobs are still restricted to only the nightly branches and RC tags * Parameters for circleci are currently defined in .circleci/verbatim-sources/header-section.yml * Target specification configuration is currently located at .github/pytorch-circleci-labels.yml Signed-off-by: Eli Uriegas <eliuriegas@fb.com> Test Plan: Imported from OSS Differential Revision: D21886341 Pulled By: seemethere fbshipit-source-id: 146ef5df2fea208d33e97862d52c170bf001bc98
Summary: Previously, on conversion from python -> c++ it was cast to a double list through bad copy pasta. It's pretty unusual for someone to script a broadcasting list function directly since it's an internal api, so it was unlikely to affect anyone. Fix for #39450 Pull Request resolved: #39481 Reviewed By: jamesr66a Differential Revision: D21870557 Pulled By: eellison fbshipit-source-id: e704e5e87d2702a270b7d65c4df444246a134480
Summary: Pull Request resolved: #39483 I fixed all of the new errors that occurred because of the upgrade. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Test Plan: Imported from OSS Differential Revision: D21884575 Pulled By: ezyang fbshipit-source-id: 45c8e1f1ecb410c8d7c46dd3922ad70e982a0685
Summary: Fix type casting for reduce ops in ONNX exporter. PyTorch promotes dtypes bool and all integer types to long for these ops. This fix only covers traced modules where dtype is present Pull Request resolved: #38829 Reviewed By: hl475 Differential Revision: D21833533 Pulled By: houseroad fbshipit-source-id: 00d9ff692cc0b09d6ca169f6c63913f04b56f182
Summary: Pull Request resolved: #39500 Test Plan: Imported from OSS Reviewed By: suo Differential Revision: D21875091 Pulled By: jamesr66a fbshipit-source-id: 105875dd220a91bc4fcb8fcfb77fab8b626eb6cb
Summary: Add missing typing imports to some jit tests Add typing annotations to `torch.testing._compare_scalars_internal` and `torch.testing._internal.assertTrue` Pull Request resolved: #39075 Differential Revision: D21882468 Pulled By: malfet fbshipit-source-id: dd9858eb8e11a38411544cc64daf36fced807d76
Summary: It mainly reduces the time spent on allocating new TensorType objects for Tensors, by comparing them directly. benchmark result before and after this PR: https://gist.github.com/ailzhang/db44d0a1911cae62e0bb794bff33f40a Pull Request resolved: #39098 Differential Revision: D21786678 Pulled By: ailzhang fbshipit-source-id: 2f61f0ac1dc8c529c45bef4e149be431ff1608b0
Summary: s/raise unittest.skip/raise unittest.SkipTest/ As `unittest.skip` is a decorator while `unittest.SkipTest` is an exception Pull Request resolved: #39532 Differential Revision: D21889152 Pulled By: malfet fbshipit-source-id: 27a03dbf065a1e2712a63c6c27e156bd13edbbdf
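The distinction is easy to demonstrate with the standard library:

```python
import unittest

# unittest.skip is a decorator factory: `raise unittest.skip(...)` raises a
# TypeError, not a skip. unittest.SkipTest is the exception the runner
# actually records as a skipped test.
class Example(unittest.TestCase):
    def test_skipped(self):
        raise unittest.SkipTest("not supported here")

result = unittest.TestResult()
unittest.defaultTestLoader.loadTestsFromTestCase(Example).run(result)

assert len(result.skipped) == 1 and result.wasSuccessful()
```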
Summary: Misc updates to the fake FP16 tests. 1. seeding numpy with a random seed 2. test base class changed from unittest.TestCase=>serial.SerializedTestCase 3. Removed the hypothesis_test_util import Reviewer: Hector Yuen Pull Request resolved: #39405 Test Plan: Fake FP16 test Differential Revision: D21890212 Pulled By: hyuen fbshipit-source-id: 25e7e17f118655f32cdd06ea9db3cdac5277e649
Summary: Instead of copying to a buffer, then setting a tensor's storage with that buffer, create a storage directly from the file Pull Request resolved: #36362 Pulled By: driazati Differential Revision: D21889537 fbshipit-source-id: edbd430073c2bbf52332fe7b3b2590e7d936dedf