
new #46

Merged
quickwritereader merged 1125 commits into quickwritereader:master from pytorch:master
Jan 27, 2021

Conversation

@quickwritereader
Owner

Fixes #{issue number}

ejguan and others added 30 commits January 15, 2021 08:29
Summary:
Pull Request resolved: #50067

Fixes #49257

Using `Callgrind` to measure the performance:
```python
import torch
import timeit
from torch.utils.benchmark import Timer

timer = Timer("x.view({100, 5, 20});", setup="torch::Tensor x = torch::ones({10, 10, 100});", language="c++", timer=timeit.default_timer)
res = timer.collect_callgrind(number=10)
```
### Nightly
```python
<torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.CallgrindStats object at 0x7f7949138c40>
x.view({100, 5, 20});
setup: torch::Tensor x = torch::ones({10, 10, 100});
                           All          Noisy symbols removed
    Instructions:        42310                      42310
    Baseline:                0                          0
10 runs per measurement, 1 thread
Warning: PyTorch was not built with debug symbols.
         Source information may be limited. Rebuild with
         REL_WITH_DEB_INFO=1 for more detailed results.
```
### Current
```python
<torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.CallgrindStats object at 0x7f78f271a580>
x.view({100, 5, 20});
setup: torch::Tensor x = torch::ones({10, 10, 100});
                           All          Noisy symbols removed
    Instructions:        42480                      42480
    Baseline:                0                          0
10 runs per measurement, 1 thread
Warning: PyTorch was not built with debug symbols.
         Source information may be limited. Rebuild with
         REL_WITH_DEB_INFO=1 for more detailed results.
```
### Compare
There are 170 fewer instructions:
```python
<torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.FunctionCounts object at 0x7f7941b7a7c0>
    970  ???:torch::autograd::as_view(at::Tensor const&, at::Tensor const&, bool, bool, std::function<at::Tensor (at::Tensor const&)>, torch::autograd::CreationMeta, bool)
    240  ???:torch::autograd::ViewInfo::~ViewInfo()
    180  ???:torch::autograd::ViewInfo::ViewInfo(at::Tensor, std::function<at::Tensor (at::Tensor const&)>)
    130  ???:torch::autograd::make_variable_differentiable_view(at::Tensor const&, c10::optional<torch::autograd::ViewInfo>, c10::optional<torch::autograd::ViewInfo>, torch::autograd::CreationMeta, bool)
    105  /tmp/benchmark_utils_jit_build_69e2f1710544485588feeca0719a3a57/timer_cpp_4435526292782672407/timer_src.cpp:main
    100  ???:std::function<at::Tensor (at::Tensor const&)>::function(std::function<at::Tensor (at::Tensor const&)> const&)
     70  ???:torch::autograd::DifferentiableViewMeta::~DifferentiableViewMeta()
     70  ???:torch::autograd::DifferentiableViewMeta::DifferentiableViewMeta(c10::TensorImpl*, c10::optional<torch::autograd::ViewInfo>, c10::optional<torch::autograd::ViewInfo>, torch::autograd::CreationMeta)
   -100  ???:c10::optional_base<torch::autograd::ViewInfo>::optional_base(c10::optional_base<torch::autograd::ViewInfo>&&)
   -105  /tmp/benchmark_utils_jit_build_2e75f38b553e42eba00523a86ad9aa05/timer_cpp_3360771523810516633/timer_src.cpp:main
   -120  ???:torch::autograd::ViewInfo::ViewInfo(at::Tensor, c10::optional<std::function<at::Tensor (at::Tensor const&)> >)
   -210  ???:c10::optional_base<std::function<at::Tensor (at::Tensor const&)> >::~optional_base()
   -240  ???:c10::optional_base<torch::autograd::ViewInfo>::~optional_base()
   -920  ???:torch::autograd::as_view(at::Tensor const&, at::Tensor const&, bool, bool, c10::optional<std::function<at::Tensor (at::Tensor const&)> >, torch::autograd::CreationMeta, bool)
```
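
Conceptually, the per-function listing above is a signed diff of the two runs' instruction counts, sorted by magnitude. A plain-Python sketch of that comparison (illustrative only — the symbol names and counts below are abbreviated stand-ins, and this is not the `torch.utils.benchmark` implementation):

```python
from collections import Counter

def instruction_delta(after, before):
    """Signed per-symbol instruction diff, largest absolute change first."""
    diff = Counter(after)
    diff.subtract(before)  # positive: added in 'after'; negative: removed
    return sorted(((count, sym) for sym, count in diff.items() if count),
                  key=lambda pair: -abs(pair[0]))

# Abbreviated (hypothetical) counts for two symbols from the listing above:
before = {"as_view(c10::optional<std::function>)": 920, "timer_src.cpp:main": 105}
after = {"as_view(std::function)": 970, "timer_src.cpp:main": 104}
for count, sym in instruction_delta(after, before):
    print(f"{count:>6}  {sym}")
```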

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D25900495

Pulled By: ejguan

fbshipit-source-id: dedd30e69db6b48601a18ae98d6b28faeae30d90
Summary:
closes gh-49563

Pull Request resolved: #49564

Reviewed By: albanD

Differential Revision: D25917441

Pulled By: walterddr

fbshipit-source-id: 491dc06cfc1bbf694dfd9ccefca4f55488a931b2
Summary:
Pull Request resolved: #50505

Even with +u set for the conda install it still seems to fail with an
unbound variable error. Let's try giving it a default value
instead.

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D25913692

Pulled By: seemethere

fbshipit-source-id: 4b898f56bff25c7523f10b4933ea6cd17a57df80
Summary:
Pull Request resolved: #46414

For loops are often written with mismatched data types, which causes silent type and sign coercion in the absence of integer-conversion warnings. Getting around this in templated code requires convoluted patterns such as
```
for (auto i = decltype(var){0}; i < var; i++)
```
with this diff we can instead write
```
for (const auto i : c10::irange(var))
```
Note that this loop is type-safe and const-safe.

The function introduced here (`c10::irange`) allows for type-safety and const-ness within for loops, which prevents the accidental truncation or modification of integers and other types, improving code safety.

Test Plan:
```
buck test //caffe2/c10:c10_test_0
```

Reviewed By: ngimel

Differential Revision: D24334732

fbshipit-source-id: fec5ebda3643ec5589f7ea3a8e7bbea4432ed771
…e is integral (#47647)

Summary:
torch.logspace doesn't seem to have explained how integers are handled.
Add some clarification and some test when dtype is integral.

The CUDA implementation is also updated to be consistent with CPU implementation.

Pull Request resolved: #47647

Reviewed By: gchanan

Differential Revision: D25843351

Pulled By: walterddr

fbshipit-source-id: 45237574d04c56992c18766667ff1ed71be77ac3
….h (#50314)

Summary:
Pull Request resolved: #50314

It's unused.
ghstack-source-id: 119798800

Test Plan: CI

Reviewed By: ezyang

Differential Revision: D25858900

fbshipit-source-id: 16107acb3df0de18ed16d92f1e2c1b0a72e3e43d
#50315)

Summary:
Pull Request resolved: #50315

It's unused.
ghstack-source-id: 119798801

Test Plan: CI

Reviewed By: ezyang

Differential Revision: D25858937

fbshipit-source-id: fe4fdb33c1a443fdd17644c3f7f34c897abf383f
…er.h (#50316)

Summary:
Pull Request resolved: #50316

It's unused.
ghstack-source-id: 119798799

Test Plan: CI

Reviewed By: iseeyuan

Differential Revision: D25858961

fbshipit-source-id: 0f214f93dcdf99d0c22e6d8032ed7a10604c714a
Summary:
Pull Request resolved: #50498

This change is mostly needed for the next diff in this stack, where
rref._get_type() is called in the rpc_async/rpc_sync RRef proxy function and
can block indefinitely if there is no timeout. It will also be useful to have a
timeout argument when we publicize this API to keep it consistent with other
RPC APIs.
ghstack-source-id: 119859767

Test Plan: Added UT

Reviewed By: pritamdamania87

Differential Revision: D25897588

fbshipit-source-id: 2e84aaf7e4faecf80005c78ee2ac8710f387503e
Summary:
Pull Request resolved: #50499

Adds a timeout API to the following functions:
```
rref.rpc_sync()
rref.rpc_async()
rref.remote()
```
so that RPCs initiated by these proxy calls can be appropriately timed out similar to the regular RPC APIs. Timeouts are supported in the following use cases:

1. rpc.remote finishes in time and successfully, but function run by rref.rpc_async() is slow and times out. Timeout error will be raised
2. rref.rpc_async() function is fast, but rpc.remote() is slow/hanging. When rref.rpc_async() is called, it will still time out with the passed-in timeout (and won't block until the rpc.remote() succeeds, which is what happens currently). Note, however, that the timeout occurs during future creation itself (not during the wait), since it calls `rref._get_type`, which blocks. We could make this nonblocking by modifying `rref._get_type` to return a future, although that is likely a larger change.
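
The timeout semantics of case 1 can be illustrated with a plain `concurrent.futures` analogy (a sketch, not the RPC implementation — `slow_user_function` is a hypothetical stand-in for the function run remotely):

```python
import concurrent.futures as cf
import time

def slow_user_function():
    time.sleep(1.0)  # stands in for a slow function run via the RRef proxy
    return 42

with cf.ThreadPoolExecutor(max_workers=1) as pool:
    fut = pool.submit(slow_user_function)   # analogous to rref.rpc_async(...)
    try:
        fut.result(timeout=0.1)             # analogous to passing timeout= to the proxy
    except cf.TimeoutError:
        print("timed out")                  # case 1: the slow function trips the timeout
```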

Test Plan: Added UT

Reviewed By: wanchaol

Differential Revision: D25897495

fbshipit-source-id: f9ad5b8f75121f50537677056a5ab16cf262847e
Summary:
Building on top of the work of anjali411 (#46640)

Things added in this PR:
1. Modify backward and double-backward formulas
2. Add complex support for `new module tests` and criterion tests (and add complex tests for L1)
3. Modify some existing tests to support complex

Pull Request resolved: #49912

Reviewed By: zhangguanheng66

Differential Revision: D25853036

Pulled By: soulitzer

fbshipit-source-id: df619f1b71c450ab2818eb17804e0c55990aa8ad
Summary:
This change improves perf by 3-4% on fastrnns.

Pull Request resolved: #50392

Reviewed By: izdeby

Differential Revision: D25891392

Pulled By: Krovatkin

fbshipit-source-id: 44d9b6907d3975742c9d77102fe6a85aab2c08c0
Summary:
Pull Request resolved: #50546

And fix the ROCm build
ghstack-source-id: 119837166

Test Plan: CI

Reviewed By: ZolotukhinM

Differential Revision: D25912464

fbshipit-source-id: 023e1f6c9fc131815c5a7a31f4860dfe271f7ae1
Summary: Fix build with llvm-trunk. With D25877605 (cb37709), we need to explicitly include `llvm/Support/Host.h` in `llvm_jit.cpp`.

Test Plan: `buck build mode/opt-clang -j 56 sigrid/predictor/v2:sigrid_remote_predictor -c cxx.extra_cxxflags="-Wforce-no-error" -c cxx.modules=False -c cxx.use_default_autofdo_profile=False`

Reviewed By: bertmaher

Differential Revision: D25920968

fbshipit-source-id: 4b80d5072907f50d01e8fbef41cda8a89dd66a96
Summary:
- Do not generate inline comments on PRs
- Increase the number of signals to wait for before generating a comment to 5 (2 for codecov configs, 2 for ONNX, and 1 for windows_test1)

Pull Request resolved: #50601

Reviewed By: albanD

Differential Revision: D25928920

Pulled By: malfet

fbshipit-source-id: 8a4ff70024c948cb65a4bdf31d269080d2cff945
Summary: Pull Request resolved: #50184

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D25819832

Pulled By: jamesr66a

fbshipit-source-id: ab16138ee26ef2f92f3478c56f0db1873fcc5dd0
…e behavior of logspace when dtype is integral

Test Plan: revert-hammer

Differential Revision:
D25843351 (0ae0fac)

Original commit changeset: 45237574d04c

fbshipit-source-id: fb5343d509b277158b14d1b61e10433793889842
Summary:
Fixes [https://github.com/pytorch/pytorch/issues/38681](https://github.com/pytorch/pytorch/issues/38681) for the CPU.

Pull Request resolved: #50052

Reviewed By: mrshenli

Differential Revision: D25900823

Pulled By: glaringlee

fbshipit-source-id: 1a3fa336037d0aa2344d79f46dcacfd478a353d1
Summary:
Pull Request resolved: #50646

Master build broke (see https://app.circleci.com/pipelines/github/pytorch/pytorch/260715/workflows/948c9235-8844-4747-b40d-c14ed33f8dbb/jobs/10195595)
ghstack-source-id: 119906225

(Note: this ignores all push blocking failures!)

Test Plan: CI?

Reviewed By: malfet

Differential Revision: D25935300

fbshipit-source-id: 549eba1af24305728a5a0a84cb84142ec4807d95
Summary: Pull Request resolved: #50648

Reviewed By: malfet

Differential Revision: D25935513

Pulled By: walterddr

fbshipit-source-id: 1a8419b4fdb25368975ac8e72181c2c4b6295278
Summary:
Fixes `docstring of torch.distributed.rpc.RRef.remote:14: WARNING: Field list ends without a blank line; unexpected unindent.` by indenting multiline fieldlist

Pull Request resolved: #50651

Reviewed By: SplitInfinity

Differential Revision: D25935839

Pulled By: malfet

fbshipit-source-id: e2613ae75334d01ab57f4b071cb0fddf80c6bd78
Summary:
Adds the rest of the ops.

Pull Request resolved: #50643

Reviewed By: pbelevich

Differential Revision: D25936346

Pulled By: Chillee

fbshipit-source-id: 4e2a7afbeabde51991c39d187a8c35e766950ffe
Summary:
Signed-off-by: Jagadish Krishnamoorthy <jagdish.krishna@gmail.com>

Pull Request resolved: #50629

Reviewed By: albanD

Differential Revision: D25935005

Pulled By: rohan-varma

fbshipit-source-id: e0969afecac2f319833189a7a8897d78068a2cda
Summary:
Fixes #42588
The contiguity check used to be for memory format suggested by `grad_output->suggest_memory_format()`, but an invariant guaranteed by derivatives.yaml is `input->suggest_memory_format()`

Pull Request resolved: #50659

Reviewed By: mruberry

Differential Revision: D25938921

Pulled By: ngimel

fbshipit-source-id: a945bfef6ce3d91b17e7ff96babe89ffd508939a
…st_recurrent (#50668)

Summary:
Pull Request resolved: #50668

GPU initialization is sometimes slow.

Test Plan: buck test mode/opt //caffe2/caffe2/python:hypothesis_test -- --exact 'caffe2/caffe2/python:hypothesis_test - test_recurrent (caffe2.caffe2.python.hypothesis_test.TestOperators)' --run-disabled

Reviewed By: hl475

Differential Revision: D25939037

fbshipit-source-id: 832700cf42ece848cda66dd629a06ecda207f086
…ispatch for CPU min/max pointwise ops (#50465)

Summary:
Fixes #50064

**PROBLEM DESCRIPTION:**
1. Had not removed dtype checks for complex types in the previous PR (#50347) for this issue.
These type-checks were added in #36377, but are no longer necessary,
as we now rely upon dispatch macros to produce error messages.
2. dtype checks in `clamp_max()` and `clamp_min()` for complex inputs had not been removed either.
3. For min/max pointwise ops in TensorCompareKernel.cpp, complex dispatch had not been removed for min/max functions.

### **FIX DESCRIPTION:**
**FIX SUMMARY:**
1. Removed dtype checks added in #36377, and added 3 more in TensorCompare.cpp.
2. Removed dtype checks for complex inputs in `clamp_max()` and `clamp_min()`.
3.  Disabled complex dispatch for min/max pointwise ops in TensorCompareKernel.cpp.
4. Error messages of the exceptions raised when min/max ops are not implemented are now checked to contain either the text _not support_ (which is also present in _not supported_) or _not implemented_; one of these phrases must appear for the message to be considered informative.

**REASON FOR NOT CHANGING DISPATCH FOR CUDA AND CLAMP OPS**:

As for the CUDA min/max operations, their kernels do not seem to be compiled & dispatched for complex types anyway, so no further changes seem to be required. Basically, the dispatch macros currently being used don't have cases for complex types.

For example,

1. the reduce CUDA ops use [`AT_DISPATCH_ALL_TYPES_AND2`](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/Dispatch.h#L548-L575) in [ReduceMinMaxKernel.cu](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/cuda/ReduceMinMaxKernel.cu), and that macro doesn't allow complex types.

2. In [MaxMinElementwiseKernel.cu](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/cuda/MaxMinElementwiseKernel.cu), the CUDA pointwise ops use [`AT_DISPATCH_FLOATING_TYPES_AND2`](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/Dispatch.h#L240-L263) for non-integral & non-boolean types, and this macro doesn't have a case for complex types either.

3. [clamp CUDA ops](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/cuda/UnaryOpsKernel.cu#L170-L211) use `AT_DISPATCH_ALL_TYPES_AND2`, which doesn't have a case for complex types.

Similarly, [CPU clamp min/max ops](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/cpu/UnaryOpsKernel.cpp#L428-L458) use the `AT_DISPATCH_ALL_TYPES_AND` dispatch macro, which doesn't have a case for complex types.

**REASON FOR ADDING 3 dtype CHECKS:**
There are a few cases in which the methods corresponding to `min_stub()` or `max_stub()` are not called, so dispatch macros don't get invoked, resulting in no exceptions being raised. Hence, `dtype` checks are necessary at 3 places to raise exceptions:

1. https://github.com/pytorch/pytorch/blob/52dcc7299925de055d330781d2fe0dad71182829/aten/src/ATen/native/TensorCompare.cpp#L342
2. https://github.com/pytorch/pytorch/blob/52dcc7299925de055d330781d2fe0dad71182829/aten/src/ATen/native/TensorCompare.cpp#L422
3. https://github.com/pytorch/pytorch/blob/52dcc7299925de055d330781d2fe0dad71182829/aten/src/ATen/native/TensorCompare.cpp#L389

The first dtype check requirement can be verified from the following example Python code based on `test_complex_unsupported()`:
```python
import unittest
import torch

class MyTestCase(unittest.TestCase):

    def test_1(self):
        t = torch.tensor((1 + 1j), device='cpu', dtype=torch.complex128)
        with self.assertRaises(Exception):
            torch.max(t, dim=0)

if __name__ == '__main__':
    unittest.main()
```
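
The error-message check described in FIX SUMMARY point 4 amounts to a substring test on the raised message; a minimal sketch (`has_informative_message` is a hypothetical helper name, not the actual test-suite code):

```python
def has_informative_message(message):
    # "not support" also matches "not supported"; either phrase qualifies.
    return "not support" in message or "not implemented" in message

print(has_informative_message('"max_cpu" not implemented for complex types'))
```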

Pull Request resolved: #50465

Reviewed By: mruberry

Differential Revision: D25938106

Pulled By: ngimel

fbshipit-source-id: 95e2df02ba8583fa3ce87d4a2fdcd60b912dda46
Summary:
Introduced operator variant to OpInfo

Context: Split of #49158

cc mruberry

Pull Request resolved: #50370

Reviewed By: mrshenli

Differential Revision: D25897821

Pulled By: mruberry

fbshipit-source-id: 4387ea10607dbd7209842b685f1794bcb31f434e
Summary:
Reopen PR for #46975

Pull Request resolved: #50007

Reviewed By: mruberry

Differential Revision: D25850808

Pulled By: ngimel

fbshipit-source-id: a232e02949182b7d3799448d24ad54a9e0bcf95c
…50632)

Summary:
Pull Request resolved: #50632

I'll port the following method tests in follow-up PRs:
`'baddbmm', 'addbmm', 'addmv', 'addr'`
After the tests are ported to OpInfo based tests, it would also be much easier to add tests with complex alpha and beta values.
Edit: it seems hard to port the broadcasting-variant tests, because one ends up skipping `test_inplace_grad` and `test_variant_consistency_eager` even in the case when inputs don't need to be broadcast.

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D25947471

Pulled By: anjali411

fbshipit-source-id: 9faa7f1fd55a1269bad282adac2b39d19bfa4591
Summary:
- Related with #44937
- Use `resize_output` instead of `resize_as`
- Tuning the `native_functions.yaml`, move the inplace variant `pow_` next to the other `pow` entries

Pull Request resolved: #46830

Reviewed By: mrshenli

Differential Revision: D24567702

Pulled By: anjali411

fbshipit-source-id: a352422c9d4e356574dbfdf21fb57f7ca7c6075d
zou3519 and others added 29 commits January 26, 2021 07:37
Summary:
Pull Request resolved: #50744

This PR adds a `check_batched_grad=True` option to CriterionTest and
turns it on by default for all CriterionTest-generated tests

Test Plan: - run tests

Reviewed By: ejguan

Differential Revision: D25997676

Pulled By: zou3519

fbshipit-source-id: cc730731e6fae2bddc01bc93800fd0e3de28b32d
Summary:
Closes #40702, Fixes #40690

Currently WIP, but I would appreciate some feedback. Functions should be double-differentiable.

Contrary to https://github.com/pytorch/pytorch/blob/b35cdc5200af963e410c0a25400fd07f30b89bca/torch/nn/parallel/_functions.py, this PR generates a list of tensors instead of aggregating the received data in a single tensor. Is this behavior correct?

Thanks!

Pull Request resolved: #40762

Reviewed By: glaringlee

Differential Revision: D24758889

Pulled By: mrshenli

fbshipit-source-id: 79285fb4b791cae3d248f34e2aadb11c9ab10cce
Summary:
Removed skipCUDAIfRocm to re-enable tests for the
ROCm platform.

Initially, only 4799 cases were being run, of which
882 were skipped. After removing skipCUDAIfRocm from
two places in test_ops.py, more than 8000 cases are
now executed, of which only 282 are skipped; those
are FFT-related tests.

Signed-off-by: Arindam Roy <rarindam@gmail.com>

Fixes #{issue number}

Pull Request resolved: #50500

Reviewed By: albanD

Differential Revision: D25920303

Pulled By: mrshenli

fbshipit-source-id: b2d17b7e2d1de4f9fdd6f1660fb4cad5841edaa0
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/tensorpipe](https://github.com/pytorch/tensorpipe).

New submodule commit: pytorch/tensorpipe@f463e0e

Pull Request resolved: #50946

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: lw

Differential Revision: D26018916

fbshipit-source-id: dc8aaa98d4e002e972d5c6783f2351c29f7db239
Summary:
This fixes the following flaky test on machine with gpus of different arch:
```
_________________________________________________________________________________________________________________ TestCppExtensionJIT.test_jit_cuda_archflags __________________________________________________________________________________________________________________

self = <test_cpp_extensions_jit.TestCppExtensionJIT testMethod=test_jit_cuda_archflags>

    unittest.skipIf(not TEST_CUDA, "CUDA not found")
    unittest.skipIf(TEST_ROCM, "disabled on rocm")
    def test_jit_cuda_archflags(self):
        # Test a number of combinations:
        #   - the default for the machine we're testing on
        #   - Separators, can be ';' (most common) or ' '
        #   - Architecture names
        #   - With/without '+PTX'

        capability = torch.cuda.get_device_capability()
        # expected values is length-2 tuple: (list of ELF, list of PTX)
        # note: there should not be more than one PTX value
        archflags = {
            '': (['{}{}'.format(capability[0], capability[1])], None),
            "Maxwell+Tegra;6.1": (['53', '61'], None),
            "Pascal 3.5": (['35', '60', '61'], None),
            "Volta": (['70'], ['70']),
        }
        if int(torch.version.cuda.split('.')[0]) >= 10:
            # CUDA 9 only supports compute capability <= 7.2
            archflags["7.5+PTX"] = (['75'], ['75'])
            archflags["5.0;6.0+PTX;7.0;7.5"] = (['50', '60', '70', '75'], ['60'])

        for flags, expected in archflags.items():
>           self._run_jit_cuda_archflags(flags, expected)

test_cpp_extensions_jit.py:198:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
test_cpp_extensions_jit.py:158: in _run_jit_cuda_archflags
    _check_cuobjdump_output(expected[0])
test_cpp_extensions_jit.py:134: in _check_cuobjdump_output
    self.assertEqual(actual_arches, expected_arches,
../../.local/lib/python3.9/site-packages/torch/testing/_internal/common_utils.py:1211: in assertEqual
    super().assertEqual(len(x), len(y), msg=self._get_assert_msg(msg, debug_msg=debug_msg))
E   AssertionError: 2 != 1 : Attempted to compare the lengths of [iterable] types: Expected: 2; Actual: 1.
E   Flags: ,  Actual: ['sm_75', 'sm_86'],  Expected: ['sm_86']
E   Stderr:
E   Output: ELF file    1: cudaext_archflags.1.sm_75.cubin
E   ELF file    2: cudaext_archflags.2.sm_86.cubin

```
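
The expected ELF/PTX lists in the `archflags` table above can be derived from the flag string. A simplified sketch that handles only numeric capabilities (named architectures such as "Volta" are omitted, and this is not the actual `torch.utils.cpp_extension` parser):

```python
def parse_arch_flags(flags):
    """'5.0;6.0+PTX' -> (['50', '60'], ['60']); separators may be ';' or ' '."""
    elf, ptx = [], []
    for part in flags.replace(';', ' ').split():
        has_ptx = part.endswith('+PTX')
        capability = part[:-len('+PTX')] if has_ptx else part
        arch = capability.replace('.', '')  # '7.5' -> '75'
        elf.append(arch)                    # every entry gets an ELF (cubin)
        if has_ptx:
            ptx.append(arch)                # '+PTX' entries also keep PTX
    return elf, ptx or None

print(parse_arch_flags("5.0;6.0+PTX;7.0;7.5"))  # (['50', '60', '70', '75'], ['60'])
```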

Pull Request resolved: #50405

Reviewed By: albanD

Differential Revision: D25920200

Pulled By: mrshenli

fbshipit-source-id: 1042a984142108f954a283407334d39e3ec328ce
Summary:
`ResolutionCallback` returns `py::object` (i.e. `Any`) rather than `py::function` (i.e. `Callable`)

Discovered while debugging test failures after updating pybind11

This also makes resolution code slightly faster, as it eliminates casts from object to function and back for every `py::object obj = rcb_(name);` statement.

Pull Request resolved: #51089

Reviewed By: jamesr66a

Differential Revision: D26069295

Pulled By: malfet

fbshipit-source-id: 6876caf9b4653c8dc8e568aefb6778895decea05
)

Summary:
Closes #50513 by resolving all four checkboxes. If this PR is merged, I will also modify one or both of the following wiki pages to add instructions on how to use this `mypy` wrapper for VS Code editor integration:

- [Guide for adding type annotations to PyTorch](https://github.com/pytorch/pytorch/wiki/Guide-for-adding-type-annotations-to-PyTorch)
- [Lint as you type](https://github.com/pytorch/pytorch/wiki/Lint-as-you-type)

Pull Request resolved: #50826

Test Plan:
Unit tests for globbing function:
```
python test/test_testing.py TestMypyWrapper -v
```

Manual checks:

- Uninstall `mypy` and run `python test/test_type_hints.py` to verify that it still works when `mypy` is absent.
- Reinstall `mypy` and run `python test/test_type_hints.py` to verify that this didn't break the `TestTypeHints` suite.
- Run `python test/test_type_hints.py` again (should finish quickly) to verify that this didn't break `mypy` caching.
- Run `torch/testing/_internal/mypy_wrapper.py` on a few Python files in this repo to verify that it doesn't give any additional warnings when the `TestTypeHints` suite passes. Some examples (compare with the behavior of just running `mypy` on these files):
  ```sh
  torch/testing/_internal/mypy_wrapper.py $PWD/README.md
  torch/testing/_internal/mypy_wrapper.py $PWD/tools/fast_nvcc/fast_nvcc.py
  torch/testing/_internal/mypy_wrapper.py $PWD/test/test_type_hints.py
  torch/testing/_internal/mypy_wrapper.py $PWD/torch/random.py
  torch/testing/_internal/mypy_wrapper.py $PWD/torch/testing/_internal/mypy_wrapper.py
  ```
- Remove type hints from `torch.testing._internal.mypy_wrapper` and verify that running `mypy_wrapper.py` on that file gives type errors.
- Remove the path to `mypy_wrapper.py` from the `files` setting in `mypy-strict.ini` and verify that running it again on itself no longer gives type errors.
- Add `test/test_type_hints.py` to the `files` setting in `mypy-strict.ini` and verify that running the `mypy` wrapper on it again now gives type errors.
- Change a return type in `torch/random.py` and verify that running the `mypy` wrapper on it again now gives type errors.
- Add the suggested JSON from the docstring of `torch.testing._internal.mypy_wrapper.main` to your `.vscode/settings.json` and verify that VS Code gives the same results (inline, while editing any Python file in the repo) as running the `mypy` wrapper on the command line, in all the above cases.

Reviewed By: walterddr

Differential Revision: D26049052

Pulled By: samestep

fbshipit-source-id: 0b35162fc78976452b5ea20d4ab63937b3c7695d
Summary:
Pull Request resolved: #50630

Add a warning log to distributed optimizer, to warn user the optimizer
is created without TorchScript support.

Test Plan: Imported from OSS

Reviewed By: rohan-varma

Differential Revision: D25932777

Pulled By: wanchaol

fbshipit-source-id: 8db3b98bdd27fc04c5a3b8d910b028c0c37f138d
Summary:
Fixes #{issue number}

Pull Request resolved: #50442

Reviewed By: bdhirsh

Differential Revision: D26044981

Pulled By: mruberry

fbshipit-source-id: 65c42f2c1de8d24e4852a1b5bd8f4b1735b2230e
Summary: Pull Request resolved: #50976

Test Plan: Imported from OSS

Reviewed By: supriyar

Differential Revision: D26032531

fbshipit-source-id: 9725bab8f70ac79652e7bf9f94376917438d60e0
Test Plan: revert-hammer

Differential Revision:
D26018916 (5f297cc)

Original commit changeset: dc8aaa98d4e0

fbshipit-source-id: cd81a7950c7141e0711faabf03292098a8cf14d3
Test Plan:
buck test //caffe2/test:test_fx_experimental
buck test //glow/fb/fx_nnpi_importer:test_importer

Reviewed By: jfix71

Differential Revision: D25675618

fbshipit-source-id: 55636bb2d3d6102b400f2044118a450906954083
Summary:
In Python 3.9 and above, `inspect.getsource` of a local class does not work if it was marked as default; see https://bugs.python.org/issue42666 and #49617.
Workaround: define a `make_global` function that programmatically accomplishes the same.

Partially addresses issue raised in #49617

Pull Request resolved: #51088

Reviewed By: gmagogsfm

Differential Revision: D26069189

Pulled By: malfet

fbshipit-source-id: 7cf14b88ae5d2b95d2b0fd852717a9202b86356e
Summary:
Pull Request resolved: #51113

toTensor() on an lvalue IValue returns a reference; no need to copy.
ghstack-source-id: 120317233

Test Plan:
fitsships

Compared `perf stat` results before/after (was on top of a diff stack
so don't take baseline as where master is)

Before:
```
         74,178.77 msec task-clock                #    0.999 CPUs utilized            ( +-  0.31% )
            17,125      context-switches          #    0.231 K/sec                    ( +-  3.41% )
                 3      cpu-migrations            #    0.000 K/sec
           109,535      page-faults               #    0.001 M/sec                    ( +-  1.04% )
   146,803,364,372      cycles                    #    1.979 GHz                      ( +-  0.30% )  (50.03%)
   277,726,600,254      instructions              #    1.89  insn per cycle           ( +-  0.02% )  (50.03%)
    43,299,659,815      branches                  #  583.720 M/sec                    ( +-  0.03% )  (50.03%)
       130,504,094      branch-misses             #    0.30% of all branches          ( +-  1.14% )  (50.03%)
```

After:
```
         72,695.01 msec task-clock                #    0.999 CPUs utilized            ( +-  1.18% )
            15,994      context-switches          #    0.220 K/sec                    ( +-  5.21% )
                 3      cpu-migrations            #    0.000 K/sec
           107,743      page-faults               #    0.001 M/sec                    ( +-  1.55% )
   145,647,684,269      cycles                    #    2.004 GHz                      ( +-  0.30% )  (50.05%)
   277,341,084,993      instructions              #    1.90  insn per cycle           ( +-  0.02% )  (50.04%)
    43,200,717,263      branches                  #  594.273 M/sec                    ( +-  0.02% )  (50.05%)
       143,873,086      branch-misses             #    0.33% of all branches          ( +-  0.59% )  (50.05%)
```

Looks like a 0.7% cycles win (barely outside the noise) and a 0.1%
instructions win.

Reviewed By: hlu1

Differential Revision: D26051766

fbshipit-source-id: 05f8d71d8120d79f7cd80aca747dfc537bf7d382
Summary:
Pull Request resolved: #51047

If the environment variable `TORCH_VITAL` is set to a non-zero-length string, the vitals are dumped at program end.

The API is very similar to Google's logging.
Test Plan: buck test //caffe2/aten:vitals

Reviewed By: bitfort

Differential Revision: D25791248

fbshipit-source-id: 0b40da7d22c31d2c4b2094f0dcb1229a35338ac2
Summary:
Update pybind repo to include `gil_scoped_acquire::disarm()` methods
In python_engine allocate scoped_acquire as unique_ptr and leak it if engine is finalizing for Python-3.9+

Fixes #50014 and #50893

Pull Request resolved: #50998

Reviewed By: ezyang

Differential Revision: D26038314

Pulled By: malfet

fbshipit-source-id: 035411e22825e8fdcf1348fed36da0bc33e16f60
Summary: Adding a set of benchmarks for key operators

Test Plan:
buck build mode/opt -c 'fbcode.caffe2_gpu_type=none' caffe2/benchmarks/cpp/tensorexpr:tensorexpr_bench
OMP_NUM_THREADS=1 MKL_NUM_THREADS=1 numactl -C 3 ./buck-out/gen/caffe2/benchmarks/cpp/tensorexpr/tensorexpr_bench

Reviewed By: ZolotukhinM

Differential Revision: D25981260

fbshipit-source-id: 17681fc1527f43ccf9bcc80704415653a627b396
Summary:
Pull Request resolved: #51093

Operator level benchmarks comparing eager-mode PyTorch to
NNC-generated fused kernels.  We wouldn't normally see these in isolation, but
it points out where NNC is falling short (or doing well).

I threw in a composed hardswish for fun, because it's my favorite activation
function.

Notably, it exposes a bug in our build process that's preventing vectorization
from using `sleef`, so we're using scalar calls to libm with predictably lousy
performance.  Fix incoming.

This benchmark is similar to the pure NNC approach in `microbenchmarks.py`, but
will include the overhead of dispatching the fused kernel through TorchScript.
ghstack-source-id: 120403675

Test Plan:
```
op                        eager        nnc    speedup
hardswish                 0.187      0.051       3.70
hardswish                 0.052      0.052       1.00
sigmoid                   0.148      1.177       0.13
reciprocal                0.049      0.050       0.98
neg                       0.038      0.037       1.02
relu                      0.037      0.036       1.03
isnan                     0.119      0.020       5.86
log                       0.082      1.330       0.06
log10                     0.148      1.848       0.08
log1p                     0.204      1.413       0.14
log2                      0.285      1.167       0.24
exp                       0.063      1.123       0.06
expm1                     0.402      1.417       0.28
erf                       0.167      0.852       0.20
erfc                      0.181      1.098       0.16
cos                       0.124      0.793       0.16
sin                       0.126      0.838       0.15
tan                       0.285      1.777       0.16
acos                      0.144      1.358       0.11
asin                      0.126      1.193       0.11
cosh                      0.384      1.761       0.22
sinh                      0.390      2.279       0.17
atan                      0.240      1.564       0.15
tanh                      0.320      2.259       0.14
sqrt                      0.043      0.069       0.63
rsqrt                     0.118      0.117       1.01
abs                       0.038      0.037       1.03
ceil                      0.038      0.038       1.01
floor                     0.039      0.039       1.00
round                     0.039      0.292       0.13
trunc                     0.040      0.036       1.12
lgamma                    2.045      2.721       0.75
```
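A minimal version of such an eager-versus-fused comparison harness can be sketched in plain Python. Generic callables stand in for the eager and TorchScript-fused kernels here; this is not the actual benchmark script:

```python
import timeit


def compare(name, eager_fn, fused_fn, number=1000):
    """Time two implementations of the same op and report the speedup."""
    t_eager = timeit.timeit(eager_fn, number=number)
    t_fused = timeit.timeit(fused_fn, number=number)
    return {"op": name, "eager": t_eager, "nnc": t_fused,
            "speedup": t_eager / t_fused}


# Example op: a composed hardswish, x * clamp(x + 3, 0, 6) / 6, on a scalar.
def hardswish(x):
    return x * min(max(x + 3.0, 0.0), 6.0) / 6.0


row = compare("hardswish", lambda: hardswish(1.5), lambda: hardswish(1.5))
print(f"{row['op']:<12}{row['eager']:.3f}  {row['nnc']:.3f}  {row['speedup']:.2f}")
```

Each line of the table above is one such comparison, with the fused callable dispatched through TorchScript so its invocation overhead is included in the measurement.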

Reviewed By: zheng-xq

Differential Revision: D26069791

fbshipit-source-id: 236e7287ba1b3f67fdcb938949a92bbbdfa13dba
)

Summary:
Fixes #50695.

Rather than maintain a LICENSE_BUNDLED.txt by hand, this builds it out of the subrepos.

I ~copied and adapted the sdist handling from Numpy~ added a separate file, so the LICENSE.txt file of the repo remains in pristine condition and the GitHub website still recognizes it. If we modify the file, the website will no longer recognize the license.

This is not enough, since the license in the ~wheel~ wheel and sdist is not modified. Numpy has a [separate step](https://github.com/MacPython/numpy-wheels/blob/master/patch_code.sh) when preparing the wheel to concatenate the licenses. I am not sure where/if the [conda-forge numpy-feedstock](https://github.com/conda-forge/numpy-feedstock/) also fixes up the license.

~Should~ I ~commit~ committed the artifact to the repo and ~add~ added a test that the file can be reproduced consistently.

Edit: now the file is part of the repo.

Edit: rework the mention of sdist. After this is merged another PR is needed to make the sdist and wheel ship the proper merged license.

Pull Request resolved: #50745

Reviewed By: seemethere, heitorschueroff

Differential Revision: D26074974

Pulled By: walterddr

fbshipit-source-id: bacd5d6870e9dbb419a31a3e3d2fdde286ff2c94
Test Plan: revert-hammer

Differential Revision:
D25675618 (c8a24eb)

Original commit changeset: 55636bb2d3d6

fbshipit-source-id: 7b196f7c32830061eca9c89bbcb346cdd66a211e
Summary:
Introduced by D25981260 (f08464f)

Pull Request resolved: #51157

Reviewed By: bwasti

Differential Revision: D26090008

Pulled By: malfet

fbshipit-source-id: b63f1bb1683c7261902de7eaab24a05a5159ce7e
Summary:
Fixes #3307

Previously, `self.grad` was not ~cloned~ deepcopied to the returned tensor in `deepcopy`. Added a test and an implementation.
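The fix pattern can be illustrated with a plain-Python sketch (a toy `Param` class, not the actual `Tensor` implementation): a custom `__deepcopy__` must explicitly copy every attribute it wants preserved, including a `grad`-like field:

```python
import copy


class Param:
    """Toy stand-in for a tensor that carries an optional .grad."""

    def __init__(self, data, grad=None):
        self.data = data
        self.grad = grad

    def __deepcopy__(self, memo):
        new = Param.__new__(Param)
        memo[id(self)] = new          # register early to handle cycles
        new.data = copy.deepcopy(self.data, memo)
        # The bug class this commit fixes: forgetting the next line
        # silently drops .grad from the copy.
        new.grad = copy.deepcopy(self.grad, memo)
        return new


p = Param([1.0, 2.0], grad=[0.1, 0.2])
q = copy.deepcopy(p)
print(q.grad)             # prints [0.1, 0.2]
print(q.grad is p.grad)   # prints False: it is a copy, not an alias
```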

Pull Request resolved: #50663

Reviewed By: heitorschueroff

Differential Revision: D26074811

Pulled By: albanD

fbshipit-source-id: 536dad36415f1d03714b4ce57453f406ad802b8c
Summary:
In order to enable FC int8 quantization in P2C2, we are trying to run the caffe2 op Int8FCPackWeight in the model transformation pipeline.

The net is being generated from the python side, and passed back into C++ and run here: https://fburl.com/diffusion/3zt1mp03,  with these dependencies included: https://fburl.com/diffusion/rdjtdtcf

However, when the net is executed, it errors out with:
```
Cannot create operator of type 'Int8FCPackWeight' on the device 'CPU'
```

This diff attempts to fix this issue.
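The error above is the generic symptom of an operator type that was never registered for the requested device. The lookup pattern can be sketched as follows (hypothetical names, not the actual caffe2 registry):

```python
class OpRegistryError(RuntimeError):
    pass


# (device, op_type) -> factory that builds the operator
REGISTRY = {}


def register_op(device, op_type, factory):
    REGISTRY[(device, op_type)] = factory


def create_op(op_type, device):
    try:
        return REGISTRY[(device, op_type)]()
    except KeyError:
        raise OpRegistryError(
            f"Cannot create operator of type '{op_type}' "
            f"on the device '{device}'") from None


# The fix amounts to making sure the registration actually runs before
# the net is executed:
register_op("CPU", "Int8FCPackWeight", lambda: "packed-weights")
print(create_op("Int8FCPackWeight", "CPU"))   # prints packed-weights
```

In C++ registries the registration is typically a static initializer in some object file, so "op not found" errors often mean the library containing that initializer was never linked in or loaded.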

Test Plan:
To reproduce, just run this test without this diff:
```
buck test //aiplatform/modelstore/transformation/tests:pyper_to_caffe2_dispatcher_test
```

Reviewed By: jspark1105

Differential Revision: D25965167

fbshipit-source-id: a7414669abb8731177c14e8792de58f400970732
Summary:
as in title

resolves D25791248 (069602e)

Test Plan: buck test //caffe2/aten:vitals

Reviewed By: EscapeZero, malfet

Differential Revision: D26090442

fbshipit-source-id: 07937f246ec0a6eb338d21208ada61758237ae42
Summary:
Fixes #50378.

Additionally, this has some minor fixes:
 - [x] Fix mean for half-cauchy to return `inf` instead of `nan`.
 - [x] Fix constraints/support for the relaxed categorical distribution.

Pull Request resolved: #51053

Reviewed By: heitorschueroff

Differential Revision: D26077966

Pulled By: neerajprad

fbshipit-source-id: ca0213baa9bbdbc661aebbb901ab5e7fded38a5f
Summary: Pull Request resolved: #50884

Test Plan: Imported from OSS

Reviewed By: heitorschueroff

Differential Revision: D26086963

fbshipit-source-id: f103f7f529d63d701c4f17862e30eafbab7d0c68
Summary:
On Ampere GPUs, matmuls are computed by default with TF32 when the dtype is `torch.float`: https://pytorch.org/docs/stable/notes/cuda.html#tensorfloat-32-tf32-on-ampere-devices, which results in reduced precision. However, linear algebra usually needs higher precision, so many tests in `test_linalg.py` fail on Ampere GPUs because of precision issues.

To fix this issue:
 - Most linear algebra methods, except for matmuls, should add `NoTF32Guard`.
 - Expected results in unit tests should compute matmuls using NumPy instead of PyTorch CUDA.
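The `NoTF32Guard` idea (save the global TF32 flag, force it off, restore it on exit) can be sketched as a Python context manager. This is illustrative only: the real guard is C++, and the Python-visible switch is `torch.backends.cuda.matmul.allow_tf32`.

```python
from contextlib import contextmanager


class Backend:
    allow_tf32 = True  # stand-in for the global TF32 switch


@contextmanager
def no_tf32():
    saved = Backend.allow_tf32
    Backend.allow_tf32 = False      # run the guarded region in full fp32
    try:
        yield
    finally:
        Backend.allow_tf32 = saved  # restore even if the body raises


with no_tf32():
    print(Backend.allow_tf32)   # prints False inside the guard
print(Backend.allow_tf32)       # prints True again afterwards
```

The `try`/`finally` restore is what makes the guard safe to wrap around test bodies that may throw.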

Pull Request resolved: #50453

Reviewed By: glaringlee

Differential Revision: D26023005

Pulled By: ngimel

fbshipit-source-id: f0ea533494fee322b07925565b57e3b0db2570c5
Summary:
Fixes #{issue number}
This is not really a new issue, just a proposed minor fix to a recent previous fix (now closed), #50640, which itself addressed #50439.

That fix added inlining for vec_signed (and others) but in one case the return was accidentally omitted.  This results in a build error:
```
                 from ../aten/src/ATen/cpu/vec256/vec256.h:19,
                 from aten/src/ATen/native/cpu/FillKernel.cpp.VSX.cpp:3:
../aten/src/ATen/cpu/vec256/vsx/vsx_helpers.h: In function ‘vint32 vec_signed(const vfloat32&)’:
../aten/src/ATen/cpu/vec256/vsx/vsx_helpers.h:33:1: error: no return statement in function returning non-void [-Werror=return-type]
```

I've confirmed that the error disappears after this one-line fix.  (Note: There is another issue encountered later in the build unrelated to this particular fix, as I noted in a separate comment in the original issue.  I'm trying to make some sense of that one, but in any event it would be a subject for another issue/PR).
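C++ catches this slip at compile time via `-Werror=return-type`; the same mistake in Python (shown here with a hypothetical scalar analog of `vec_signed`, not the VSX intrinsic wrapper) silently returns `None` instead:

```python
def vec_signed_buggy(v):
    int(v)            # conversion computed but never returned


def vec_signed_fixed(v):
    return int(v)     # the one-line fix: actually return the result


print(vec_signed_buggy(3.0))   # prints None
print(vec_signed_fixed(3.0))   # prints 3
```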

Pull Request resolved: #51116

Reviewed By: heitorschueroff

Differential Revision: D26078213

Pulled By: malfet

fbshipit-source-id: 59b2ee19138fa1b8d8ec1d35ca4a5ef0a67bc123
…51162)

Summary:
Pull Request resolved: #51162

It's unused.
ghstack-source-id: 120427120

Test Plan: CI

Reviewed By: bhosmer

Differential Revision: D25859010

fbshipit-source-id: 7bb21312843debaedaa6a969727c171b2bb0e6b2
@quickwritereader quickwritereader merged commit 6605f7a into quickwritereader:master Jan 27, 2021