new #46
Merged
quickwritereader merged 1125 commits into quickwritereader:master on Jan 27, 2021
Conversation
Summary: Pull Request resolved: #50067 Fixes #49257 Using Callgrind to test the performance:
```python
import torch
import timeit
from torch.utils.benchmark import Timer

timer = Timer(
    "x.view({100, 5, 20});",
    setup="torch::Tensor x = torch::ones({10, 10, 100});",
    language="c++",
    timer=timeit.default_timer,
)
res = timer.collect_callgrind(number=10)
```
### Nightly
```
<torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.CallgrindStats object at 0x7f7949138c40>
x.view({100, 5, 20});
setup: torch::Tensor x = torch::ones({10, 10, 100});
                All    Noisy symbols removed
Instructions:  42310   42310
Baseline:          0       0
10 runs per measurement, 1 thread
Warning: PyTorch was not built with debug symbols. Source information may be limited. Rebuild with REL_WITH_DEB_INFO=1 for more detailed results.
```
### Current
```
<torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.CallgrindStats object at 0x7f78f271a580>
x.view({100, 5, 20});
setup: torch::Tensor x = torch::ones({10, 10, 100});
                All    Noisy symbols removed
Instructions:  42480   42480
Baseline:          0       0
10 runs per measurement, 1 thread
Warning: PyTorch was not built with debug symbols. Source information may be limited. Rebuild with REL_WITH_DEB_INFO=1 for more detailed results.
```
### Compare
170 instructions are reduced:
```
<torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.FunctionCounts object at 0x7f7941b7a7c0>
  970  ???:torch::autograd::as_view(at::Tensor const&, at::Tensor const&, bool, bool, std::function<at::Tensor (at::Tensor const&)>, torch::autograd::CreationMeta, bool)
  240  ???:torch::autograd::ViewInfo::~ViewInfo()
  180  ???:torch::autograd::ViewInfo::ViewInfo(at::Tensor, std::function<at::Tensor (at::Tensor const&)>)
  130  ???:torch::autograd::make_variable_differentiable_view(at::Tensor const&, c10::optional<torch::autograd::ViewInfo>, c10::optional<torch::autograd::ViewInfo>, torch::autograd::CreationMeta, bool)
  105  /tmp/benchmark_utils_jit_build_69e2f1710544485588feeca0719a3a57/timer_cpp_4435526292782672407/timer_src.cpp:main
  100  ???:std::function<at::Tensor (at::Tensor const&)>::function(std::function<at::Tensor (at::Tensor const&)> const&)
   70  ???:torch::autograd::DifferentiableViewMeta::~DifferentiableViewMeta()
   70  ???:torch::autograd::DifferentiableViewMeta::DifferentiableViewMeta(c10::TensorImpl*, c10::optional<torch::autograd::ViewInfo>, c10::optional<torch::autograd::ViewInfo>, torch::autograd::CreationMeta)
 -100  ???:c10::optional_base<torch::autograd::ViewInfo>::optional_base(c10::optional_base<torch::autograd::ViewInfo>&&)
 -105  /tmp/benchmark_utils_jit_build_2e75f38b553e42eba00523a86ad9aa05/timer_cpp_3360771523810516633/timer_src.cpp:main
 -120  ???:torch::autograd::ViewInfo::ViewInfo(at::Tensor, c10::optional<std::function<at::Tensor (at::Tensor const&)> >)
 -210  ???:c10::optional_base<std::function<at::Tensor (at::Tensor const&)> >::~optional_base()
 -240  ???:c10::optional_base<torch::autograd::ViewInfo>::~optional_base()
 -920  ???:torch::autograd::as_view(at::Tensor const&, at::Tensor const&, bool, bool, c10::optional<std::function<at::Tensor (at::Tensor const&)> >, torch::autograd::CreationMeta, bool)
```
Test Plan: Imported from OSS Reviewed By: albanD Differential Revision: D25900495 Pulled By: ejguan fbshipit-source-id: dedd30e69db6b48601a18ae98d6b28faeae30d90
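For future comparisons of this kind, `CallgrindStats.delta` can produce the diff table directly. A minimal sketch, assuming two stats objects collected as above (variable names and the idea of re-running against nightly are illustrative, not from the PR):
```python
import timeit
from torch.utils.benchmark import Timer

def collect(stmt):
    # Same setup and run count as in the summary above.
    t = Timer(stmt, setup="torch::Tensor x = torch::ones({10, 10, 100});",
              language="c++", timer=timeit.default_timer)
    return t.collect_callgrind(number=10)

stats_current = collect("x.view({100, 5, 20});")  # e.g. this branch
stats_nightly = collect("x.view({100, 5, 20});")  # e.g. re-run against the nightly build
print(stats_current.delta(stats_nightly))         # FunctionCounts diff like the one above
```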
Summary: Pull Request resolved: #50505 Even with +u set for the conda install, it still seems to fail with an unbound variable error. Let's try giving it a default value instead. Signed-off-by: Eli Uriegas <eliuriegas@fb.com> Test Plan: Imported from OSS Reviewed By: pbelevich Differential Revision: D25913692 Pulled By: seemethere fbshipit-source-id: 4b898f56bff25c7523f10b4933ea6cd17a57df80
Summary: Pull Request resolved: #46414 For loops are often written with mismatched data types, which causes silent type and sign coercion in the absence of integer-conversion warnings. Getting around this in templated code requires convoluted patterns such as
```
for (auto i = decltype(var){0}; i < var; i++)
```
With this diff we can instead write
```
for (const auto i : c10::irange(var))
```
Note that this loop is type-safe and const-safe. The function introduced here (`c10::irange`) provides type safety and const-ness within for loops, which prevents the accidental truncation or modification of integers and other types, improving code safety.

Test Plan:
```
buck test //caffe2/c10:c10_test_0
```
Reviewed By: ngimel Differential Revision: D24334732 fbshipit-source-id: fec5ebda3643ec5589f7ea3a8e7bbea4432ed771
…behavior of logspace when dtype is integral (#47647) Summary: torch.logspace doesn't seem to explain how integers are handled. Add some clarification and tests for integral dtypes. The CUDA implementation is also updated to be consistent with the CPU implementation. Pull Request resolved: #47647 Reviewed By: gchanan Differential Revision: D25843351 Pulled By: walterddr fbshipit-source-id: 45237574d04c56992c18766667ff1ed71be77ac3
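As a quick illustration of the clarified behavior (a sketch, assuming the default base of 10):
```python
import torch

# With an integral dtype, the values are computed and then cast to the
# integer type; after this change, CPU and CUDA produce the same result.
print(torch.logspace(0, 3, steps=4, dtype=torch.int64))
# tensor([   1,   10,  100, 1000])
```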
Summary: Pull Request resolved: #50498 This change is mostly needed for the next diff in this stack, where rref._get_type() is called in the rpc_async/rpc_sync RRef proxy function and can block indefinitely if there is no timeout. It will also be useful to have a timeout argument when we publicize this API to keep it consistent with other RPC APIs. ghstack-source-id: 119859767 Test Plan: Added UT Reviewed By: pritamdamania87 Differential Revision: D25897588 fbshipit-source-id: 2e84aaf7e4faecf80005c78ee2ac8710f387503e
Summary: Pull Request resolved: #50499 Adds a timeout API to the following functions:
```
rref.rpc_sync()
rref.rpc_async()
rref.remote()
```
so that RPCs initiated by these proxy calls can be appropriately timed out, similar to the regular RPC APIs. Timeouts are supported in the following use cases:
1. rpc.remote finishes in time and successfully, but the function run by rref.rpc_async() is slow and times out. A timeout error is raised.
2. The rref.rpc_async() function is fast, but rpc.remote() is slow/hanging. When rref.rpc_async() is called, it still times out with the passed-in timeout (and won't block waiting for rpc.remote() to succeed, which is what happens currently). However, the timeout occurs during the future creation itself (and not the wait), since it calls `rref._get_type`, which blocks. We can consider making this non-blocking by modifying rref._get_type to return a future, although that is likely a larger change.

Test Plan: Added UT Reviewed By: wanchaol Differential Revision: D25897495 fbshipit-source-id: f9ad5b8f75121f50537677056a5ab16cf262847e
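A minimal sketch of how the new argument might look from user code (the worker names, setup, and the tensor method called are illustrative, not taken from the PR):
```python
import torch
import torch.distributed.rpc as rpc

# Assumes RPC is already initialized elsewhere, e.g.:
#   rpc.init_rpc("worker0", rank=0, world_size=2)
rref = rpc.remote("worker1", torch.add, args=(torch.ones(2), 1))

# Proxy calls now accept a timeout in seconds; if the remote execution
# (or the blocking rref._get_type() call) exceeds it, a timeout error is raised.
fut = rref.rpc_async(timeout=5.0).sum()
print(fut.wait())
result = rref.rpc_sync(timeout=5.0).sum()  # rpc_sync() and remote() take the same argument
```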
Summary: Building on top of the work of anjali411 (#46640). Things added in this PR:
1. Modify the backward and double-backward formulas
2. Add complex support for `new module tests` and criterion tests (and add complex tests for L1)
3. Modify some existing tests to support complex

Pull Request resolved: #49912 Reviewed By: zhangguanheng66 Differential Revision: D25853036 Pulled By: soulitzer fbshipit-source-id: df619f1b71c450ab2818eb17804e0c55990aa8ad
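For instance, with complex support in place, something along these lines should now work (a sketch, not a test from the PR):
```python
import torch
import torch.nn.functional as F

inp = torch.randn(3, dtype=torch.complex64, requires_grad=True)
target = torch.randn(3, dtype=torch.complex64)

# |input - target| is real-valued, so the L1 loss is a real scalar and
# backward() propagates through the complex input.
loss = F.l1_loss(inp, target)
loss.backward()
print(inp.grad.dtype)  # torch.complex64
```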
Summary: This change improves perf by 3-4% on fastrnns. Pull Request resolved: #50392 Reviewed By: izdeby Differential Revision: D25891392 Pulled By: Krovatkin fbshipit-source-id: 44d9b6907d3975742c9d77102fe6a85aab2c08c0
Summary: Pull Request resolved: #50546 And fix the ROCm build ghstack-source-id: 119837166 Test Plan: CI Reviewed By: ZolotukhinM Differential Revision: D25912464 fbshipit-source-id: 023e1f6c9fc131815c5a7a31f4860dfe271f7ae1
Summary: Fix build with llvm-trunk. With D25877605 (cb37709), we need to explicitly include `llvm/Support/Host.h` in `llvm_jit.cpp`. Test Plan: `buck build mode/opt-clang -j 56 sigrid/predictor/v2:sigrid_remote_predictor -c cxx.extra_cxxflags="-Wforce-no-error" -c cxx.modules=False -c cxx.use_default_autofdo_profile=False` Reviewed By: bertmaher Differential Revision: D25920968 fbshipit-source-id: 4b80d5072907f50d01e8fbef41cda8a89dd66a96
Summary:
- Do not generate inline comments on PRs
- Increase the number of signals to wait for before generating a comment to 5 (2 for codecov configs, 2 for onnx, and 1 for windows_test1)

Pull Request resolved: #50601 Reviewed By: albanD Differential Revision: D25928920 Pulled By: malfet fbshipit-source-id: 8a4ff70024c948cb65a4bdf31d269080d2cff945
Summary: Pull Request resolved: #50184 Test Plan: Imported from OSS Reviewed By: bertmaher Differential Revision: D25819832 Pulled By: jamesr66a fbshipit-source-id: ab16138ee26ef2f92f3478c56f0db1873fcc5dd0
…behavior of logspace when dtype is integral Test Plan: revert-hammer Differential Revision: D25843351 (0ae0fac) Original commit changeset: 45237574d04c fbshipit-source-id: fb5343d509b277158b14d1b61e10433793889842
Summary: Fixes [https://github.com/pytorch/pytorch/issues/38681](https://github.com/pytorch/pytorch/issues/38681) for the CPU. Pull Request resolved: #50052 Reviewed By: mrshenli Differential Revision: D25900823 Pulled By: glaringlee fbshipit-source-id: 1a3fa336037d0aa2344d79f46dcacfd478a353d1
Summary: Pull Request resolved: #50646 Master build broke (see https://app.circleci.com/pipelines/github/pytorch/pytorch/260715/workflows/948c9235-8844-4747-b40d-c14ed33f8dbb/jobs/10195595) ghstack-source-id: 119906225 (Note: this ignores all push blocking failures!) Test Plan: CI? Reviewed By: malfet Differential Revision: D25935300 fbshipit-source-id: 549eba1af24305728a5a0a84cb84142ec4807d95
Summary: Pull Request resolved: #50648 Reviewed By: malfet Differential Revision: D25935513 Pulled By: walterddr fbshipit-source-id: 1a8419b4fdb25368975ac8e72181c2c4b6295278
Summary: Fixes `docstring of torch.distributed.rpc.RRef.remote:14: WARNING: Field list ends without a blank line; unexpected unindent.` by indenting the multiline field list. Pull Request resolved: #50651 Reviewed By: SplitInfinity Differential Revision: D25935839 Pulled By: malfet fbshipit-source-id: e2613ae75334d01ab57f4b071cb0fddf80c6bd78
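For context, the reST rule at play: continuation lines of a field must be indented relative to the field name. A hypothetical docstring showing the fixed form (the function and its parameter are illustrative):
```python
def remote_example(timeout=-1):
    """Hypothetical docstring illustrating the fix.

    :param timeout: a description that spans several lines must indent its
        continuation lines like this; an unindented second line triggers
        "Field list ends without a blank line; unexpected unindent."
    :return: nothing of note.
    """
```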
Summary: Adds the rest of the ops. Pull Request resolved: #50643 Reviewed By: pbelevich Differential Revision: D25936346 Pulled By: Chillee fbshipit-source-id: 4e2a7afbeabde51991c39d187a8c35e766950ffe
Summary: Signed-off-by: Jagadish Krishnamoorthy <jagdish.krishna@gmail.com> Pull Request resolved: #50629 Reviewed By: albanD Differential Revision: D25935005 Pulled By: rohan-varma fbshipit-source-id: e0969afecac2f319833189a7a8897d78068a2cda
Summary: Fixes #42588 The contiguity check used to be for the memory format suggested by `grad_output->suggest_memory_format()`, but the invariant guaranteed by derivatives.yaml is based on `input->suggest_memory_format()`. Pull Request resolved: #50659 Reviewed By: mruberry Differential Revision: D25938921 Pulled By: ngimel fbshipit-source-id: a945bfef6ce3d91b17e7ff96babe89ffd508939a
…st_recurrent (#50668) Summary: Pull Request resolved: #50668 GPU initialization is sometimes slow. Test Plan: buck test mode/opt //caffe2/caffe2/python:hypothesis_test -- --exact 'caffe2/caffe2/python:hypothesis_test - test_recurrent (caffe2.caffe2.python.hypothesis_test.TestOperators)' --run-disabled Reviewed By: hl475 Differential Revision: D25939037 fbshipit-source-id: 832700cf42ece848cda66dd629a06ecda207f086
…ispatch for CPU min/max pointwise ops (#50465) Summary: Fixes #50064

**PROBLEM DESCRIPTION:**
1. Had not removed dtype checks for complex types in the previous PR (#50347) for this issue. These type checks were added in #36377, but are no longer necessary, as we now rely upon dispatch macros to produce error messages.
2. dtype checks in `clamp_max()` and `clamp_min()` for complex inputs had not been removed either.
3. For min/max pointwise ops in TensorCompareKernel.cpp, complex dispatch had not been removed for min/max functions.

### **FIX DESCRIPTION:**
**FIX SUMMARY:**
1. Removed dtype checks added in #36377, and added 3 more in TensorCompare.cpp.
2. Removed dtype checks for complex inputs in `clamp_max()` and `clamp_min()`.
3. Disabled complex dispatch for min/max pointwise ops in TensorCompareKernel.cpp.
4. Error messages in the exceptions raised because min/max ops are not implemented are now checked for containing the text _not support_ (which is also present in _not supported_) or _not implemented_; one of the two should be part of the error message for it to be informative.

**REASON FOR NOT CHANGING DISPATCH FOR CUDA AND CLAMP OPS:** As for the CUDA min/max operations, their kernels do not seem to be compiled & dispatched for complex types anyway, so no further changes seem to be required. Basically, the dispatch macros currently being used don't have cases for complex types. For example:
1. The reduce CUDA ops use [`AT_DISPATCH_ALL_TYPES_AND2`](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/Dispatch.h#L548-L575) in [ReduceMinMaxKernel.cu](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/cuda/ReduceMinMaxKernel.cu), and that macro doesn't allow complex types.
2. In [MaxMinElementwiseKernel.cu](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/cuda/MaxMinElementwiseKernel.cu), the CUDA pointwise ops use [`AT_DISPATCH_FLOATING_TYPES_AND2`](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/Dispatch.h#L240-L263) for non-integral & non-boolean types, and this macro doesn't have a case for complex types either.
3. [clamp CUDA ops](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/cuda/UnaryOpsKernel.cu#L170-L211) use `AT_DISPATCH_ALL_TYPES_AND2`, which doesn't have a case for complex types. Similarly, [CPU clamp min/max ops](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/cpu/UnaryOpsKernel.cpp#L428-L458) use the `AT_DISPATCH_ALL_TYPES_AND` dispatch macro, which doesn't have a case for complex types.

**REASON FOR ADDING 3 dtype CHECKS:** There are a few cases in which the methods corresponding to `min_stub()` or `max_stub()` are not called, so dispatch macros don't get invoked, resulting in no exceptions being raised. Hence, `dtype` checks are necessary at 3 places to raise exceptions:
1. https://github.com/pytorch/pytorch/blob/52dcc7299925de055d330781d2fe0dad71182829/aten/src/ATen/native/TensorCompare.cpp#L342
2. https://github.com/pytorch/pytorch/blob/52dcc7299925de055d330781d2fe0dad71182829/aten/src/ATen/native/TensorCompare.cpp#L422
3. https://github.com/pytorch/pytorch/blob/52dcc7299925de055d330781d2fe0dad71182829/aten/src/ATen/native/TensorCompare.cpp#L389

The first dtype check requirement can be verified with the following example Python code, based on `test_complex_unsupported()`:
```
import unittest
import torch

class MyTestCase(unittest.TestCase):
    def test_1(self):
        t = torch.tensor((1 + 1j), device='cpu', dtype=torch.complex128)
        with self.assertRaises(Exception):
            torch.max(t, dim=0)

if __name__ == '__main__':
    unittest.main()
```
Pull Request resolved: #50465 Reviewed By: mruberry Differential Revision: D25938106 Pulled By: ngimel fbshipit-source-id: 95e2df02ba8583fa3ce87d4a2fdcd60b912dda46
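Outside a test harness, the expected behavior looks roughly like this (a sketch; the exact message text may differ):
```python
import torch

t = torch.tensor([1 + 1j, 2 + 2j], dtype=torch.complex128)
try:
    torch.max(t)
except RuntimeError as e:
    # Per this PR, the message should contain "not support" or "not implemented".
    print(e)
```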
…50632) Summary: Pull Request resolved: #50632 I'll port the following method tests in follow-up PRs: `'baddbmm', 'addbmm', 'addmv', 'addr'`. After the tests are ported to OpInfo-based tests, it will also be much easier to add tests with complex alpha and beta values. Edit: it seems hard to port the broadcasting-variant tests, because one ends up skipping `test_inplace_grad` and `test_variant_consistency_eager` even in the case where the inputs do not need to be broadcast. Test Plan: Imported from OSS Reviewed By: navahgar Differential Revision: D25947471 Pulled By: anjali411 fbshipit-source-id: 9faa7f1fd55a1269bad282adac2b39d19bfa4591
Summary:
- Related to #44937
- Use `resize_output` instead of `resize_as`
- Tune `native_functions.yaml`: move the inplace variant `pow_` next to the other `pow` entries

Pull Request resolved: #46830 Reviewed By: mrshenli Differential Revision: D24567702 Pulled By: anjali411 fbshipit-source-id: a352422c9d4e356574dbfdf21fb57f7ca7c6075d
Summary: Pull Request resolved: #50744 This PR adds a `check_batched_grad=True` option to CriterionTest and turns it on by default for all CriterionTest-generated tests Test Plan: - run tests Reviewed By: ejguan Differential Revision: D25997676 Pulled By: zou3519 fbshipit-source-id: cc730731e6fae2bddc01bc93800fd0e3de28b32d
Summary: Closes #40702, Fixes #40690 Currently WIP, but I would appreciate some feedback. Functions should be double-differentiable. Contrary to https://github.com/pytorch/pytorch/blob/b35cdc5200af963e410c0a25400fd07f30b89bca/torch/nn/parallel/_functions.py this PR generates a list of tensors instead of aggregating the received data into a single tensor. Is this behavior correct? Thanks! Pull Request resolved: #40762 Reviewed By: glaringlee Differential Revision: D24758889 Pulled By: mrshenli fbshipit-source-id: 79285fb4b791cae3d248f34e2aadb11c9ab10cce
Summary:
Removed skipCUDAIfRocm to re-enable tests for
the ROCm platform.
Previously, only 4799 cases were being run,
and 882 of those were skipped.
After removing skipCUDAIfRocm from two places
in test_ops.py, more than 8000 cases are now
executed, of which only 282 (all FFT-related)
are skipped.
Signed-off-by: Arindam Roy <rarindam@gmail.com>
Fixes #{issue number}
Pull Request resolved: #50500
Reviewed By: albanD
Differential Revision: D25920303
Pulled By: mrshenli
fbshipit-source-id: b2d17b7e2d1de4f9fdd6f1660fb4cad5841edaa0
Summary: This is an automated pull request to update the first-party submodule for [pytorch/tensorpipe](https://github.com/pytorch/tensorpipe). New submodule commit: pytorch/tensorpipe@f463e0e Pull Request resolved: #50946 Test Plan: Ensure that CI jobs succeed on GitHub before landing. Reviewed By: lw Differential Revision: D26018916 fbshipit-source-id: dc8aaa98d4e002e972d5c6783f2351c29f7db239
Summary:
This fixes the following flaky test on machines with GPUs of different architectures:
```
_________________________________________________________________________________________________________________ TestCppExtensionJIT.test_jit_cuda_archflags __________________________________________________________________________________________________________________
self = <test_cpp_extensions_jit.TestCppExtensionJIT testMethod=test_jit_cuda_archflags>
@unittest.skipIf(not TEST_CUDA, "CUDA not found")
@unittest.skipIf(TEST_ROCM, "disabled on rocm")
def test_jit_cuda_archflags(self):
    # Test a number of combinations:
    # - the default for the machine we're testing on
    # - Separators, can be ';' (most common) or ' '
    # - Architecture names
    # - With/without '+PTX'
    capability = torch.cuda.get_device_capability()
    # expected values is length-2 tuple: (list of ELF, list of PTX)
    # note: there should not be more than one PTX value
    archflags = {
        '': (['{}{}'.format(capability[0], capability[1])], None),
        "Maxwell+Tegra;6.1": (['53', '61'], None),
        "Pascal 3.5": (['35', '60', '61'], None),
        "Volta": (['70'], ['70']),
    }
    if int(torch.version.cuda.split('.')[0]) >= 10:
        # CUDA 9 only supports compute capability <= 7.2
        archflags["7.5+PTX"] = (['75'], ['75'])
        archflags["5.0;6.0+PTX;7.0;7.5"] = (['50', '60', '70', '75'], ['60'])
    for flags, expected in archflags.items():
>       self._run_jit_cuda_archflags(flags, expected)
test_cpp_extensions_jit.py:198:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
test_cpp_extensions_jit.py:158: in _run_jit_cuda_archflags
_check_cuobjdump_output(expected[0])
test_cpp_extensions_jit.py:134: in _check_cuobjdump_output
self.assertEqual(actual_arches, expected_arches,
../../.local/lib/python3.9/site-packages/torch/testing/_internal/common_utils.py:1211: in assertEqual
super().assertEqual(len(x), len(y), msg=self._get_assert_msg(msg, debug_msg=debug_msg))
E AssertionError: 2 != 1 : Attempted to compare the lengths of [iterable] types: Expected: 2; Actual: 1.
E Flags: , Actual: ['sm_75', 'sm_86'], Expected: ['sm_86']
E Stderr:
E Output: ELF file 1: cudaext_archflags.1.sm_75.cubin
E ELF file 2: cudaext_archflags.2.sm_86.cubin
```
Pull Request resolved: #50405
Reviewed By: albanD
Differential Revision: D25920200
Pulled By: mrshenli
fbshipit-source-id: 1042a984142108f954a283407334d39e3ec328ce
Summary: `ResolutionCallback` returns `py::object` (i.e. `Any`) rather than `py::function` (i.e. `Callable`) Discovered while debugging test failures after updating pybind11 This also makes resolution code slightly faster, as it eliminates casts from object to function and back for every `py::object obj = rcb_(name);` statement. Pull Request resolved: #51089 Reviewed By: jamesr66a Differential Revision: D26069295 Pulled By: malfet fbshipit-source-id: 6876caf9b4653c8dc8e568aefb6778895decea05
Summary: Closes #50513 by resolving all four checkboxes. If this PR is merged, I will also modify one or both of the following wiki pages to add instructions on how to use this `mypy` wrapper for VS Code editor integration:
- [Guide for adding type annotations to PyTorch](https://github.com/pytorch/pytorch/wiki/Guide-for-adding-type-annotations-to-PyTorch)
- [Lint as you type](https://github.com/pytorch/pytorch/wiki/Lint-as-you-type)

Pull Request resolved: #50826

Test Plan: Unit tests for the globbing function:
```
python test/test_testing.py TestMypyWrapper -v
```
Manual checks:
- Uninstall `mypy` and run `python test/test_type_hints.py` to verify that it still works when `mypy` is absent.
- Reinstall `mypy` and run `python test/test_type_hints.py` to verify that this didn't break the `TestTypeHints` suite.
- Run `python test/test_type_hints.py` again (should finish quickly) to verify that this didn't break `mypy` caching.
- Run `torch/testing/_internal/mypy_wrapper.py` on a few Python files in this repo to verify that it doesn't give any additional warnings when the `TestTypeHints` suite passes. Some examples (compare with the behavior of just running `mypy` on these files):
```sh
torch/testing/_internal/mypy_wrapper.py $PWD/README.md
torch/testing/_internal/mypy_wrapper.py $PWD/tools/fast_nvcc/fast_nvcc.py
torch/testing/_internal/mypy_wrapper.py $PWD/test/test_type_hints.py
torch/testing/_internal/mypy_wrapper.py $PWD/torch/random.py
torch/testing/_internal/mypy_wrapper.py $PWD/torch/testing/_internal/mypy_wrapper.py
```
- Remove type hints from `torch.testing._internal.mypy_wrapper` and verify that running `mypy_wrapper.py` on that file gives type errors.
- Remove the path to `mypy_wrapper.py` from the `files` setting in `mypy-strict.ini` and verify that running it again on itself no longer gives type errors.
- Add `test/test_type_hints.py` to the `files` setting in `mypy-strict.ini` and verify that running the `mypy` wrapper on it again now gives type errors.
- Change a return type in `torch/random.py` and verify that running the `mypy` wrapper on it again now gives type errors.
- Add the suggested JSON from the docstring of `torch.testing._internal.mypy_wrapper.main` to your `.vscode/settings.json` and verify that VS Code gives the same results (inline, while editing any Python file in the repo) as running the `mypy` wrapper on the command line, in all the above cases.

Reviewed By: walterddr Differential Revision: D26049052 Pulled By: samestep fbshipit-source-id: 0b35162fc78976452b5ea20d4ab63937b3c7695d
Summary: Pull Request resolved: #50630 Add a warning log to the distributed optimizer to warn the user when the optimizer is created without TorchScript support. Test Plan: Imported from OSS Reviewed By: rohan-varma Differential Revision: D25932777 Pulled By: wanchaol fbshipit-source-id: 8db3b98bdd27fc04c5a3b8d910b028c0c37f138d
Summary:
Fixes #{issue number}
Pull Request resolved: #50442
Reviewed By: bdhirsh
Differential Revision: D26044981
Pulled By: mruberry
fbshipit-source-id: 65c42f2c1de8d24e4852a1b5bd8f4b1735b2230e
Summary: Pull Request resolved: #50976 Test Plan: Imported from OSS Reviewed By: supriyar Differential Revision: D26032531 fbshipit-source-id: 9725bab8f70ac79652e7bf9f94376917438d60e0
Test Plan: revert-hammer Differential Revision: D26018916 (5f297cc) Original commit changeset: dc8aaa98d4e0 fbshipit-source-id: cd81a7950c7141e0711faabf03292098a8cf14d3
Test Plan:
buck test //caffe2/test:test_fx_experimental
buck test //glow/fb/fx_nnpi_importer:test_importer

Reviewed By: jfix71 Differential Revision: D25675618 fbshipit-source-id: 55636bb2d3d6102b400f2044118a450906954083
Summary: In Python 3.9 and above, `inspect.getsource` of a local class does not work if it was marked as default; see https://bugs.python.org/issue42666 and #49617. Work around this by defining a `make_global` function that programmatically accomplishes the same. Partially addresses the issue raised in #49617. Pull Request resolved: #51088 Reviewed By: gmagogsfm Differential Revision: D26069189 Pulled By: malfet fbshipit-source-id: 7cf14b88ae5d2b95d2b0fd852717a9202b86356e
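A rough sketch of the idea behind such a helper (illustrative only; the real implementation lives in the PyTorch test utilities and may differ):
```python
import sys

def make_global(*objs):
    # Re-register each locally defined class/function in its defining
    # module's namespace so that inspect.getsource can find it on
    # Python 3.9+ as if it had been defined at module scope.
    for obj in objs:
        setattr(sys.modules[obj.__module__], obj.__name__, obj)
```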
Summary: Pull Request resolved: #51113 toTensor() on an lvalue IValue returns a reference; no need to copy. ghstack-source-id: 120317233

Test Plan: fitsships Compared `perf stat` results before/after (was on top of a diff stack, so don't take the baseline as where master is).

Before:
```
     74,178.77 msec task-clock        #    0.999 CPUs utilized          ( +-  0.31% )
         17,125      context-switches  #    0.231 K/sec                  ( +-  3.41% )
              3      cpu-migrations    #    0.000 K/sec
        109,535      page-faults       #    0.001 M/sec                  ( +-  1.04% )
146,803,364,372      cycles            #    1.979 GHz                    ( +-  0.30% )  (50.03%)
277,726,600,254      instructions      #    1.89  insn per cycle         ( +-  0.02% )  (50.03%)
 43,299,659,815      branches          #  583.720 M/sec                  ( +-  0.03% )  (50.03%)
    130,504,094      branch-misses     #    0.30% of all branches        ( +-  1.14% )  (50.03%)
```
After:
```
     72,695.01 msec task-clock        #    0.999 CPUs utilized          ( +-  1.18% )
         15,994      context-switches  #    0.220 K/sec                  ( +-  5.21% )
              3      cpu-migrations    #    0.000 K/sec
        107,743      page-faults       #    0.001 M/sec                  ( +-  1.55% )
145,647,684,269      cycles            #    2.004 GHz                    ( +-  0.30% )  (50.05%)
277,341,084,993      instructions      #    1.90  insn per cycle         ( +-  0.02% )  (50.04%)
 43,200,717,263      branches          #  594.273 M/sec                  ( +-  0.02% )  (50.05%)
    143,873,086      branch-misses     #    0.33% of all branches        ( +-  0.59% )  (50.05%)
```
Looks like a 0.7% cycles win (barely outside the noise) and a 0.1% instructions win. Reviewed By: hlu1 Differential Revision: D26051766 fbshipit-source-id: 05f8d71d8120d79f7cd80aca747dfc537bf7d382
Summary: Pull Request resolved: #51047 If the environment variable `TORCH_VITAL` is set to a non-zero-length string, the vitals are dumped at program end. The API is very similar to Google's logging. Test Plan: buck test //caffe2/aten:vitals Reviewed By: bitfort Differential Revision: D25791248 fbshipit-source-id: 0b40da7d22c31d2c4b2094f0dcb1229a35338ac2
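Based on that description, exercising it from Python would presumably look like this (illustrative only; the variable just needs to be a non-empty string):
```python
import os
import subprocess

# Run a torch program with TORCH_VITAL set; per the summary above, the
# collected vitals should then be dumped when the program ends.
env = dict(os.environ, TORCH_VITAL="1")
subprocess.run(["python", "-c", "import torch; print(torch.ones(1))"], env=env)
```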
Summary: Update the pybind repo to include the `gil_scoped_acquire::disarm()` method. In python_engine, allocate scoped_acquire as a unique_ptr and leak it if the engine is finalizing, for Python 3.9+. Fixes #50014 and #50893 Pull Request resolved: #50998 Reviewed By: ezyang Differential Revision: D26038314 Pulled By: malfet fbshipit-source-id: 035411e22825e8fdcf1348fed36da0bc33e16f60
Summary: Adding a set of benchmarks for key operators.

Test Plan:
buck build mode/opt -c 'fbcode.caffe2_gpu_type=none' caffe2/benchmarks/cpp/tensorexpr:tensorexpr_bench
OMP_NUM_THREADS=1 MKL_NUM_THREADS=1 numactl -C 3 ./buck-out/gen/caffe2/benchmarks/cpp/tensorexpr/tensorexpr_bench

Reviewed By: ZolotukhinM Differential Revision: D25981260 fbshipit-source-id: 17681fc1527f43ccf9bcc80704415653a627b396
Summary: Pull Request resolved: #51093 Operator-level benchmarks comparing eager-mode PyTorch to NNC-generated fused kernels. We wouldn't normally see these in isolation, but it points out where NNC is falling short (or doing well). I threw in a composed hardswish for fun, because it's my favorite activation function. Notably, it exposes a bug in our build process that's preventing vectorization from using `sleef`, so we're using scalar calls to libm with predictably lousy performance. Fix incoming. This benchmark is similar to the pure NNC approach in `microbenchmarks.py`, but will include the overhead of dispatching the fused kernel through TorchScript. ghstack-source-id: 120403675

Test Plan:
```
op          eager  nnc    speedup
hardswish   0.187  0.051  3.70
hardswish   0.052  0.052  1.00
sigmoid     0.148  1.177  0.13
reciprocal  0.049  0.050  0.98
neg         0.038  0.037  1.02
relu        0.037  0.036  1.03
isnan       0.119  0.020  5.86
log         0.082  1.330  0.06
log10       0.148  1.848  0.08
log1p       0.204  1.413  0.14
log2        0.285  1.167  0.24
exp         0.063  1.123  0.06
expm1       0.402  1.417  0.28
erf         0.167  0.852  0.20
erfc        0.181  1.098  0.16
cos         0.124  0.793  0.16
sin         0.126  0.838  0.15
tan         0.285  1.777  0.16
acos        0.144  1.358  0.11
asin        0.126  1.193  0.11
cosh        0.384  1.761  0.22
sinh        0.390  2.279  0.17
atan        0.240  1.564  0.15
tanh        0.320  2.259  0.14
sqrt        0.043  0.069  0.63
rsqrt       0.118  0.117  1.01
abs         0.038  0.037  1.03
ceil        0.038  0.038  1.01
floor       0.039  0.039  1.00
round       0.039  0.292  0.13
trunc       0.040  0.036  1.12
lgamma      2.045  2.721  0.75
```
Reviewed By: zheng-xq Differential Revision: D26069791 fbshipit-source-id: 236e7287ba1b3f67fdcb938949a92bbbdfa13dba
Summary: Fixes #50695. Rather than maintain a LICENSE_BUNDLED.txt by hand, this builds it out of the subrepos. I ~copied and adapted the sdist handling from Numpy~ added a separate file, so the LICENSE.txt file of the repo remains in pristine condition and the GitHub website still recognizes it. If we modify the file, the website will no longer recognize the license. This is not enough, since the license in the ~wheel~ wheel and sdist is not modified. Numpy has a [separate step](https://github.com/MacPython/numpy-wheels/blob/master/patch_code.sh) when preparing the wheel to concatenate the licenses. I am not sure where/if the [conda-forge numpy-feedstock](https://github.com/conda-forge/numpy-feedstock/) also fixes up the license. ~Should~ I ~commit~ committed the artifact to the repo and ~add~ added a test that checks the file can be reproduced consistently. Edit: now the file is part of the repo. Edit: reworked the mention of sdist. After this is merged, another PR is needed to make the sdist and wheel ship the proper merged license. Pull Request resolved: #50745 Reviewed By: seemethere, heitorschueroff Differential Revision: D26074974 Pulled By: walterddr fbshipit-source-id: bacd5d6870e9dbb419a31a3e3d2fdde286ff2c94
Test Plan: revert-hammer Differential Revision: D25675618 (c8a24eb) Original commit changeset: 55636bb2d3d6 fbshipit-source-id: 7b196f7c32830061eca9c89bbcb346cdd66a211e
Summary: Fixes #3307 Previously, `self.grad` was not ~cloned~ deepcopied to the returned tensor in `deepcopy`. Added a test and an implementation. Pull Request resolved: #50663 Reviewed By: heitorschueroff Differential Revision: D26074811 Pulled By: albanD fbshipit-source-id: 536dad36415f1d03714b4ce57453f406ad802b8c
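A small example of the fixed behavior (a sketch based on the description above):
```python
import copy
import torch

x = torch.ones(3, requires_grad=True)
x.sum().backward()           # x.grad is now tensor([1., 1., 1.])

y = copy.deepcopy(x)
print(y.grad)                # with this fix, the gradient is carried over
assert y.grad is not x.grad  # ...as a deep copy, not a shared reference
```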
Summary: In order to enable FC int8 quantization in P2C2, we are trying to run the caffe2 op Int8FCPackWeight in the model transformation pipeline. The net is generated on the Python side, passed back into C++, and run here: https://fburl.com/diffusion/3zt1mp03, with these dependencies included: https://fburl.com/diffusion/rdjtdtcf However, when the net is executed, it errors out with:
```
Cannot create operator of type 'Int8FCPackWeight' on the device 'CPU'
```
This diff attempts to fix this issue.

Test Plan: To reproduce, run just this test without this diff:
```
buck test //aiplatform/modelstore/transformation/tests:pyper_to_caffe2_dispatcher_test
```
Reviewed By: jspark1105 Differential Revision: D25965167 fbshipit-source-id: a7414669abb8731177c14e8792de58f400970732
Summary: As in the title; resolves D25791248 (069602e). Test Plan: buck test //caffe2/aten:vitals Reviewed By: EscapeZero, malfet Differential Revision: D26090442 fbshipit-source-id: 07937f246ec0a6eb338d21208ada61758237ae42
Summary: Fixes #50378. Additionally, this has some minor fixes: - [x] Fix mean for half-cauchy to return `inf` instead of `nan`. - [x] Fix constraints/support for the relaxed categorical distribution. Pull Request resolved: #51053 Reviewed By: heitorschueroff Differential Revision: D26077966 Pulled By: neerajprad fbshipit-source-id: ca0213baa9bbdbc661aebbb901ab5e7fded38a5f
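For example, the half-Cauchy fix means (a sketch):
```python
import torch
from torch.distributions import HalfCauchy

# The half-Cauchy distribution has no finite mean; it should now report
# inf rather than nan.
print(HalfCauchy(scale=torch.tensor(1.0)).mean)  # tensor(inf)
```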
Summary: Pull Request resolved: #50884 Test Plan: Imported from OSS Reviewed By: heitorschueroff Differential Revision: D26086963 fbshipit-source-id: f103f7f529d63d701c4f17862e30eafbab7d0c68
Summary: On Ampere GPUs, matmuls are computed by default with TF32 when the dtype is `torch.float`: https://pytorch.org/docs/stable/notes/cuda.html#tensorfloat-32-tf32-on-ampere-devices, which results in reduced precision in results. However, linear algebra usually needs higher precision, so lots of tests in `test_linalg.py` are failing on Ampere GPUs because of precision issues. To fix this issue:
- Most linear algebra methods, except for matmuls, should add `NoTF32Guard`
- Expected results in unit tests should compute matmuls using numpy instead of pytorch cuda

Pull Request resolved: #50453 Reviewed By: glaringlee Differential Revision: D26023005 Pulled By: ngimel fbshipit-source-id: f0ea533494fee322b07925565b57e3b0db2570c5
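For user code, the Python-level switch that roughly corresponds to the C++ `NoTF32Guard` (documented in the CUDA note linked above) is:
```python
import torch

# On Ampere, float32 matmuls use TF32 by default; turn it off when full
# float32 precision is needed, e.g. when comparing against NumPy references.
torch.backends.cuda.matmul.allow_tf32 = False
```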
Summary:
Fixes #{issue number}
This is not really a new issue, just a proposed minor fix following up on #50640 (now closed), which itself was a fix for #50439.
That fix added inlining for vec_signed (and others) but in one case the return was accidentally omitted. This results in a build error:
```
                 from ../aten/src/ATen/cpu/vec256/vec256.h:19,
                 from aten/src/ATen/native/cpu/FillKernel.cpp.VSX.cpp:3:
../aten/src/ATen/cpu/vec256/vsx/vsx_helpers.h: In function ‘vint32 vec_signed(const vfloat32&)’:
../aten/src/ATen/cpu/vec256/vsx/vsx_helpers.h:33:1: error: no return statement in function returning non-void [-Werror=return-type]
```
I've confirmed that the error disappears after this one-line fix. (Note: There is another issue encountered later in the build unrelated to this particular fix, as I noted in a separate comment in the original issue. I'm trying to make some sense of that one, but in any event it would be a subject for another issue/PR).
Pull Request resolved: #51116
Reviewed By: heitorschueroff
Differential Revision: D26078213
Pulled By: malfet
fbshipit-source-id: 59b2ee19138fa1b8d8ec1d35ca4a5ef0a67bc123