
update #39563

Merged
zasdfgbnm merged 63 commits into ci-all/AT_DISPATCH_COMPLEX_TYPES-retry from debug-div on Jun 5, 2020
Conversation

@zasdfgbnm
Collaborator

No description provided.

houseroad and others added 30 commits June 2, 2020 21:08
…5376 (#39372)

Summary:
Pull Request resolved: #39372

We only bump the submodule in OSS to unblock some work.

Test Plan: ci

Reviewed By: hl475

Differential Revision: D21830800

fbshipit-source-id: fb4a716992efcd71926f7bba24a7c24422c17e38
…39216)

Summary:
Pull Request resolved: #39216

The `rpc.functions.async_execution` decorator specifies that the
wrapped function is guaranteed to return a `torch.futures.Future`.
The decorator adds a `_wrapped_async_rpc_function` attribute to
the wrapper function. The caller retrieves this information and
then sets the `isAsyncFunction` argument accordingly, which is later
added to the PythonCall RPC message as a field. On the callee side,
if the PythonCall carries an asynchronous function, it will cast
the function's return value to a jit::PythonFutureWrapper object,
and then install response creation and communication as a callback
on that jit::PythonFutureWrapper.

For applications, this feature is useful when a function needs to
wait for IO or additional signaling. In those cases, marking the
user function as `rpc.functions.async_execution` will prevent it
from blocking one thread on the callee for too long.
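The attribute-marking mechanism described above can be sketched in plain Python. This is a simplified illustration, not the actual torch.distributed.rpc implementation; only the attribute name `_wrapped_async_rpc_function` comes from the summary, and the sample function is hypothetical:

```python
import functools

def async_execution(fn):
    """Mark fn so the RPC layer treats its return value as a Future."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        return fn(*args, **kwargs)
    # the caller checks this attribute and sets isAsyncFunction accordingly
    wrapper._wrapped_async_rpc_function = fn
    return wrapper

@async_execution
def fetch_and_add(x, y):
    return x + y  # in real use this would return a torch.futures.Future

is_async = hasattr(fetch_and_add, "_wrapped_async_rpc_function")
```

The point is that the wrapper stays transparent to callers while carrying metadata the RPC layer can inspect before dispatch.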

Test Plan: Imported from OSS

Reviewed By: rohan-varma

Differential Revision: D21779962

fbshipit-source-id: 6b6aa698bf6f91dad6ed2a7ee433df429b59e941
Summary:
Fixes #32866, resubmit of #38970

The memory error in the issue is caused by an int overflow in col2vol. This version, using mixed 32-bit and 64-bit indexing calculations, lifts the maximum possible indexing without compromising the performance of ConvTranspose3d, vs. a 20-30% regression with pure 64-bit indexing.

This requires that input.numel() <= UINT_MAX and channels * kernel.numel() <= UINT_MAX; otherwise it raises an error. Previously, the code would crash or give incorrect results unless input.numel() * kernel.numel() <= INT_MAX.
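The old and new bounds can be stated as a small check. This is a hypothetical helper mirroring the limits quoted in the summary, not the actual CUDA kernel code:

```python
INT_MAX = 2 ** 31 - 1
UINT_MAX = 2 ** 32 - 1

def col2vol_mixed_indexing_ok(input_numel, channels, kernel_numel):
    # mixed 32/64-bit indexing is valid only within these bounds;
    # the kernel raises an error otherwise
    return input_numel <= UINT_MAX and channels * kernel_numel <= UINT_MAX

def old_indexing_ok(input_numel, kernel_numel):
    # the previous pure-int code silently required a much tighter bound
    return input_numel * kernel_numel <= INT_MAX
```

A large ConvTranspose3d input easily exceeds the old product bound while staying well inside the new per-factor bounds.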

Note that the test is a minimised reproducer for the issue.
Pull Request resolved: #39198

Differential Revision: D21817836

Pulled By: ezyang

fbshipit-source-id: b9adfe9f9dd00f04435be132966b33ac6b9efbef
Summary:
Pull Request resolved: #39406

For now, just the RPC test (no dist autograd or dist optimizer).

I removed the skipping decorator from all the tests except those that explicitly use the ProcessGroup options.

Includes #39027.
ghstack-source-id: 105159974

Test Plan: Ran the tests several hundred times, in various build modes. Saw some flakes, but at a rate of about 0.1%.

Differential Revision: D21716069

fbshipit-source-id: 9d2a99e112049a63745772c18e7a58266ed8e74e
Summary:
ezyang,

I have added the changes to DispatchKey, DeviceType, and Backend to support the out-of-tree FPGA.

cc. tataetae
Pull Request resolved: #38938

Differential Revision: D21748955

Pulled By: ezyang

fbshipit-source-id: fe76d9730818205961430d2a0e00727b5c547b32
Summary:
Fixes gh-38966

If `THCTensor_(resizeAs)` fails to allocate, these `free`s will never be reached. So instead, I use a wrapped tensor to do the cleanup automatically.
Pull Request resolved: #39347

Differential Revision: D21838933

Pulled By: ezyang

fbshipit-source-id: 8c74ecdd720d6712a33ddef6126ea545761a269b
Summary:
Pull Request resolved: #39440

After the RPC tests, re-enable the second test suite: dist autograd.
ghstack-source-id: 105165393

Test Plan: Ran the tests, several times each, in different build configs.

Differential Revision: D21858974

fbshipit-source-id: 409377d564c36fecae51b9e4c776d94187b434a2
Summary:
Pull Request resolved: #39441

This is the last test suite to be enabled for TensorPipe.
ghstack-source-id: 105166757

Test Plan: Ran the tests, hundreds of times each, in different build modes.

Differential Revision: D21858975

fbshipit-source-id: ee0a7e64b77b4b1974f031207031cc14afb3a8c2
Summary:
Fixes #39281
Pull Request resolved: #39301

Differential Revision: D21849082

Pulled By: gchanan

fbshipit-source-id: 5d30ef10767c4d35c6cb59c5e6a9acbfe0270a40
Summary:
Pull Request resolved: #39337

In #39031 we made fake quantize respect device affinity of the
original module. However, that PR only handled modules with parameters
or buffers, and did not work properly for `ReLU`.

Fixing the logic to also work for `ReLU` by passing the parent's
device when adding observers.

Test Plan:
```
python test/test_quantization.py TestDistributed.test_device_affinity
```

Imported from OSS

Differential Revision: D21821243

fbshipit-source-id: cc6abda3694b80ce8ba0440dc6c1b5b58f3c0066
Summary:
Fix #38410

![image](https://user-images.githubusercontent.com/6421097/82724121-74b26880-9c99-11ea-9b63-e92de2dccdf2.png)
Pull Request resolved: #38947

Differential Revision: D21765290

Pulled By: ezyang

fbshipit-source-id: 5d2b25f039a653c609d60cdaac4a7ac5812ae291
Summary:
Pull Request resolved: #39378

Will initially only contain a label to trigger builds for binary tests

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Differential Revision: D21864091

Pulled By: seemethere

fbshipit-source-id: f69467ccc797b6b320dc8b7f2d50a8601c172a1f
#38968)

Summary:
Pull Request resolved: #38968

As title

Reviewed By: glaringlee

Differential Revision: D21711684

fbshipit-source-id: c340360b29849fe9ab0e7be376918c92ba3629be
Summary: Pull Request resolved: #39424

Reviewed By: Krovatkin

Differential Revision: D21854870

Pulled By: ailzhang

fbshipit-source-id: eb68f1775596e4c963169033444d6d6f4f818d4f
Summary:
This patch removes call to run optimizations within freezing API.
Only dead code elimination is invoked to clean up the frozen module.
Pull Request resolved: #38499

Reviewed By: eellison

Differential Revision: D21579607

Pulled By: bzinodev

fbshipit-source-id: a6231754fea89296a3dcf07b5e37a1c43cb8d5dd
Summary: Pull Request resolved: #39321

Reviewed By: zhangguanheng66

Differential Revision: D21825488

Pulled By: Nayef211

fbshipit-source-id: 41ee09e683c4ae838cfd488a342088d591e806e4
Summary:
The two bugs were:
* Non-reduction axes were not added when inserting the new ReduceOp, meaning that if a reduction with non-reduce axes was rfactored, we'd produce bad outputs. There were no tests of Rfactor with non-reduce axes, so I modified a test to cover this.
* The new statements were always prepended to the block, meaning writes to a buffer could be reordered after the usage of that buffer. This mostly happened when rfactoring a previously rfactored reduction. There was a test of this, but since it only tested rfactoring the outer reduction axis, there were never any other statements at the insertion point (the tests of the insertion-point argument also do this). I added a new test which covers various rfactor-axis cases.

Also cleaned up tests, removed some helper code we don't need etc.
Pull Request resolved: #39268

Differential Revision: D21864489

Pulled By: nickgg

fbshipit-source-id: d314d20997a8472ec96b72f7a9068d6da6d2399c
…zing API

Test Plan: revert-hammer

Differential Revision:
D21579607

Original commit changeset: a6231754fea8

fbshipit-source-id: 277011605eedee1c3b44fbaf877233b239adf56b
Summary:
Mainly, fix a bug in the HashProvider where it would not include LoopOptions in the hash, meaning two loops would be seen as identical even if they were bound to different thread/block axes. Also added symbolic names for the different axis options.
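This class of bug is easy to reproduce in miniature: if the hash key omits the axis binding, two differently-bound loops collide. A sketch with illustrative names, not the NNC HashProvider API:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LoopKey:
    """Hash key for a loop; frozen dataclass derives __eq__ and __hash__ from all fields."""
    var: str
    extent: int
    loop_option: str  # e.g. "gpu_block", "gpu_thread", "none"

# same loop structure, bound to different axes: must compare unequal
a = LoopKey("i", 128, "gpu_block")
b = LoopKey("i", 128, "gpu_thread")

# the buggy key omitted the binding, so the two loops collided
buggy_key_a = (a.var, a.extent)
buggy_key_b = (b.var, b.extent)
```

Including `loop_option` in the derived hash/equality is the one-line analogue of the fix.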
Pull Request resolved: #39408

Differential Revision: D21864494

Pulled By: nickgg

fbshipit-source-id: 9c28729984e7a3375e026c78294c9f75b9015123
Summary:
Cut from #38994.

This is a helper function for comparing torch and NumPy behavior. It updates the existing and increasingly popular _np_compare function and moves it to be a method on TestCase.
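The general shape of such a comparison helper can be sketched generically. This is not the actual `_np_compare`/TestCase method, just the pattern of checking one implementation against a reference:

```python
import math

def compare_with_reference(fn, ref_fn, inputs, tol=1e-12):
    """Assert fn matches a reference implementation on each input."""
    for x in inputs:
        got, expected = fn(x), ref_fn(x)
        assert math.isclose(got, expected, rel_tol=tol, abs_tol=tol), \
            f"mismatch at {x}: {got} != {expected}"

# e.g. compare a hand-rolled square root against math.sqrt
compare_with_reference(lambda x: x ** 0.5, math.sqrt, [0.0, 1.0, 2.0, 9.0])
```

Centralizing this on the test base class avoids each test re-implementing tolerance handling.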
Pull Request resolved: #39179

Differential Revision: D21855082

Pulled By: mruberry

fbshipit-source-id: edca3b78ae392d32243b02bf61960898b6ba590f
Summary:
Re-enable some test cases in `test_memory_format_operators`, since their corresponding issues have been fixed.
Pull Request resolved: #38648

Differential Revision: D21689085

Pulled By: VitalyFedyunin

fbshipit-source-id: 0aa09e0bf31ba98c8ad0191ac3afd31dda0f1d42
Summary:
`HTTPError` is raised when the server is overloaded, while `URLError` is raised when the network is not available.
Since `HTTPError` is a subclass of `URLError`, catching `URLError` handles both exceptions.
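Because `HTTPError` subclasses `URLError`, a single `except URLError` clause covers both cases. A minimal sketch of the pattern (the function and URL handling are illustrative, not the code from the PR):

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen

def try_download(url):
    try:
        return urlopen(url, timeout=5).read()
    except URLError as e:
        # catches HTTPError (server-side, e.g. 503 when overloaded)
        # as well as plain URLError (network unavailable)
        print(f"download failed: {e}")
        return None
```

The subclass relationship is part of the stdlib contract, so no separate `except HTTPError` branch is needed unless the two cases must be handled differently.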
Pull Request resolved: #39477

Differential Revision: D21873560

Pulled By: malfet

fbshipit-source-id: 11806671b768705465f562087521ad4887fd20f7
Summary:
Debugging the LayerNorm fake FP16 op; still seeing output mismatches.
Pull Request resolved: #39476

Differential Revision: D21871748

Pulled By: hyuen

fbshipit-source-id: ab308e3acff9ce21de41b0f006cbee767983f8e4
Summary:
Pull Request resolved: #39452

Selective build works on training.
* VariableType_?.cpp are now selectively generated based on the operator list.
* Add a flag in pt_operator_library, "train". If it's True, an extra flag of "pt_train_operator_library" will be added to the labels. A query for "pt_train_operator_library" will be done to aggregate the training operators. With this flag we limit the generated VariableType to used training operators only, to conserve the code size. The models for inference only have train = False by default.
* For testing purpose, caffe2/fb/pytorch_trainer is created. It's based on full jit but the operators are selectively built.
* smartkeyboard_debug_model is used for test. Since the static code analysis is not applied for VariableType yet, the operators are manually added based on debugging error messages.
* At the build stage, make selective build optional for the training code-gen library.
The reason is that for fb4a to build, the generated VariableType.cpp needs to depend on torch_mobile_train, but torch_mobile_train is not needed for inference-only apps. In those cases training can be turned off to remove the dependency on torch_mobile_train and save size. It can also be used as a switch to check size regressions introduced by training.
ghstack-source-id: 105190037

(Note: this ignores all push blocking failures!)

Test Plan:
Training:
```
buck run -c pt.build_from_deps_query=1 -c pt.selective_build=0 -c pt.static_dispatch=0 xplat/caffe2/fb/pytorch_trainer:trainer ~/models/papaya/keyboard/smartkeyboard_debug_model.pt
```

Inference, with and without the new query-based feature:
```
buck run -c pt.build_from_deps_query=1 -c pt.selective_build=0 -c pt.static_dispatch=0 xplat/caffe2/fb/lite_predictor:lite_predictor_bi -- --model=/home/myuan/models/pytext/BI/bi_pytext_0512.bc --input_dims "1,4" --input_type int64 --pytext_len=4
```
```
buck run xplat/caffe2/fb/lite_predictor:lite_predictor_bi -- --model=/home/myuan/models/pytext/BI/bi_pytext_0512.bc --input_dims "1,4" --input_type int64 --pytext_len=4
```

Reviewed By: ljk53

Differential Revision: D21459302

fbshipit-source-id: df71a46d74f8c7448cbf51990804104f1384594f
Summary: Pull Request resolved: #39065

Test Plan: Imported from OSS

Differential Revision: D21803939

Pulled By: anjali411

fbshipit-source-id: c7313c527eb6b54d49ef46aa0a839a3418fa8d7e
Summary:
Pull Request resolved: #39181

Create a Python binding class torch._C.LiteScriptModule for mobile::module, and a Python class LiteScriptModule that wraps torch._C.LiteScriptModule.
The Python LiteScriptModule class contains preliminary functions including forward, run_method and __call__.

Create a Python API "load_for_lite_interpreter" under torch.jit.mobile, which takes a pre-saved mobile module in a file-like object as input and returns a Python LiteScriptModule.

Add a Python binding method "_save_to_buffer_for_mobile" under ScriptModule, and a Python method "_save_to_buffer_for_lite_interpreter" under RecursiveScriptModule, which save the mobile module into a buffer instead of a file.
ghstack-source-id: 105215736

Test Plan: buck test caffe2/test:mobile

Differential Revision: D21757474

fbshipit-source-id: 758b87497d65c4686459a567d41887c7a577aa4c
…39393)

Summary:
Pull Request resolved: #39393

Computing r_correction should be done only for RAdam; otherwise it can generate floating-point exceptions.
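A sketch of the guard in question. The names are hypothetical and the rectification term follows the RAdam paper's formulation, not the exact Caffe2 kernel:

```python
import math

def rectification_term(beta2, step):
    """RAdam variance rectification r_t; None while the variance is intractable."""
    rho_inf = 2.0 / (1.0 - beta2) - 1.0
    rho_t = rho_inf - 2.0 * step * beta2 ** step / (1.0 - beta2 ** step)
    if rho_t <= 4.0:  # early steps: rectification undefined, skip adaptive part
        return None
    return math.sqrt(((rho_t - 4) * (rho_t - 2) * rho_inf)
                     / ((rho_inf - 4) * (rho_inf - 2) * rho_t))

def r_correction(use_radam, beta2, step):
    # compute the correction only for RAdam; evaluating it unconditionally
    # can raise floating-point exceptions (e.g. for small or negative rho_t)
    return rectification_term(beta2, step) if use_radam else None
```

Guarding the computation behind the optimizer choice is exactly what keeps `--caffe2_operator_throw_if_fp_exceptions=1` runs clean for plain Adam.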

Test Plan:
buck test caffe2/caffe2/python/operator_test:adam_test -- test_sparse_adam
with --caffe2_operator_throw_if_fp_exceptions=1 gflags option

Differential Revision: D21834296

fbshipit-source-id: a9e6a93451423e76a99f6591d21cb65d4374b008
Summary:
Pull Request resolved: #38173

- Introduce torch.types.Device representing all "device-like" types
- Stubbed torch.device.__reduce__
- Stubbed all torch._C functions comprehensively
- Deleted _safe_call which is unused throughout the codebase

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D21497399

Pulled By: ezyang

fbshipit-source-id: 1f534442b0ec9a70d556545d072f2c06a08b9d15
Summary:
Pull Request resolved: #39456

Move aten::to.prim_dtype from full jit to lite interpreter

Test Plan: verify TTS model can be used

Reviewed By: iseeyuan

Differential Revision: D21856104

fbshipit-source-id: 774981a5c04798e3a87cf7d6e6682f35e604944e
Summary:
Enable Dropout and SoftmaxCrossEntropy tests in test_operators.py
Pull Request resolved: #39431

Reviewed By: hl475

Differential Revision: D21877501

Pulled By: houseroad

fbshipit-source-id: 1e9b1e5cf80dc1843bdbde2662f3339e357c6654
mattip and others added 19 commits June 4, 2020 10:49
Summary:
Pull Request resolved: #39331

Fixes gh-37590

Adds an extra `make coverage` step to the documentation build, which uses the built-in facility in Sphinx to check docstring coverage. Also fixes a failure to import `torch/jit/supported_ops.py`, which broke the [Torchscript Builtins](https://pytorch.org/docs/stable/jit_builtin_functions.html) page.

This also adds the required `SPHINXOPTS` to turn warnings into errors, but this is commented out. Note that since the documentation of `torchvision` is merged in here, failures there would cause failures here if this were made active. Some thought may be needed about pinning the torchvision version merged into the documentation.

The first commit should fail, since the "ScriptModule" class is commented out. I did that in order to check that a CI failure is properly reported.
Pull Request resolved: #38244

Differential Revision: D21640589

Pulled By: ezyang

fbshipit-source-id: 1e240d81669b5f21404d596de4a27d192dc9fd8a
Summary:
This PR aims to add `arccosh`, `arcsinh` and `arctanh` support. Please see issue #38349 for more details.

**TODOs:**

* [x] Add test cases for `arccosh`, `arcsinh` and `arctanh`. (need help)
* [x] Overload ops if `std::op` does not work with `thrust::complex` types (like for `sinh`, `cosh`).

Note: `std::acosh, std::asinh, std::atanh` do not support `thrust::complex` types. Added support for complex types for these 3 ops (`arccosh, arcsinh, arctanh`)
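When a library's complex type lacks these functions, the standard workaround is the log-based identities, e.g. asinh(z) = log(z + sqrt(z² + 1)). A sketch with Python's `cmath` standing in for `thrust::complex` (the identities are textbook; the helper names are illustrative):

```python
import cmath

def asinh_complex(z):
    # asinh(z) = log(z + sqrt(z*z + 1))
    return cmath.log(z + cmath.sqrt(z * z + 1))

def acosh_complex(z):
    # acosh(z) = log(z + sqrt(z - 1) * sqrt(z + 1)), branch-cut-safe form
    return cmath.log(z + cmath.sqrt(z - 1) * cmath.sqrt(z + 1))

def atanh_complex(z):
    # atanh(z) = 0.5 * log((1 + z) / (1 - z))
    return 0.5 * cmath.log((1 + z) / (1 - z))
```

Away from the branch cuts these agree with the principal-branch library functions, which is why they serve as drop-in definitions for a complex type that lacks them.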

cc: mruberry
Pull Request resolved: #38388

Differential Revision: D21882055

Pulled By: mruberry

fbshipit-source-id: d334590b47c5a89e491a002c3e41e6ffa89000e3
…es (#37531)

Summary:
Pull Request resolved: #37531

All of these definitions are no longer "legacy" as their CPU
implementations have been ported to ATen.  There are probably some
layers of indirection that could be reduced here, but for now just do a
minor but unlikely to break things cleanup.

The last thing in LegacyNNDefinitions truly is still in THCUNN and can't
be removed.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D21310913

Pulled By: ezyang

fbshipit-source-id: 1ff4ff16abddf13f8d583df990242ac4b0461915
Summary:
max_pool2d with ceil_mode calculates the output size a little differently than what we get with XNNPACK max_pool2d, so when ceil_mode=True we disable this path. However, if we get the same output size with and without ceil_mode, we can still use the XNNPACK-based max_pool2d.
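The two output-size formulas differ only in rounding, so the condition reduces to "ceil and floor give the same size". A sketch of standard pooling arithmetic, not the actual XNNPACK integration code:

```python
import math

def pool_out_size(in_size, kernel, stride, padding, ceil_mode):
    """1-D pooling output size; ceil_mode switches floor to ceil rounding."""
    span = in_size + 2 * padding - kernel
    n = math.ceil(span / stride) if ceil_mode else span // stride
    return n + 1

def can_use_xnnpack(in_size, kernel, stride, padding, ceil_mode):
    # XNNPACK computes the floor-mode size; usable with ceil_mode only
    # when both rounding modes agree on the output size
    if not ceil_mode:
        return True
    return (pool_out_size(in_size, kernel, stride, padding, True)
            == pool_out_size(in_size, kernel, stride, padding, False))
```

For example, kernel 2 / stride 2 on an even input agrees in both modes, while an odd input does not.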
Pull Request resolved: #39447

Test Plan: CI

Differential Revision: D21873334

Pulled By: kimishpatel

fbshipit-source-id: b84abed1505e36e492cc87e7d40664ac63964909
Summary:
Pull Request resolved: #39379

Moves binary builds into their own workflow and adds the ability to
target specification on them. This allows you to run the binary build
workflow on a pull request without the need to modify any configuration
at all.

Some notes about this implementation:
* Upload jobs are still restricted to only the nightly branches and RC tags
* Parameters for circleci are currently defined in
  .circleci/verbatim-sources/header-section.yml
* Target specification configuration is currently located at
  .github/pytorch-circleci-labels.yml

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Differential Revision: D21886341

Pulled By: seemethere

fbshipit-source-id: 146ef5df2fea208d33e97862d52c170bf001bc98
Summary:
Previously, on conversion from Python to C++, it was cast to a double list through a bad copy-paste. It's pretty unusual for someone to script a broadcasting-list function directly, since it's an internal API, so it was unlikely to affect anyone.

Fix for #39450
Pull Request resolved: #39481

Reviewed By: jamesr66a

Differential Revision: D21870557

Pulled By: eellison

fbshipit-source-id: e704e5e87d2702a270b7d65c4df444246a134480
Summary:
Resolve #33111

Relax the overflow and precision-loss checks when unpacking doubles.
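The precision question here boils down to whether an integer survives the round-trip through a C double: a double has a 53-bit significand, so integers above 2^53 can round. An illustrative check, not the exact THPUtils logic:

```python
def fits_double_exactly(i):
    # an int is exactly representable as a double iff the float round-trip
    # reproduces it; beyond 2**53 consecutive ints start to collapse
    return int(float(i)) == i

exact = fits_double_exactly(2 ** 53)       # True: exactly representable
lossy = fits_double_exactly(2 ** 53 + 1)   # False: rounds down to 2**53
```

A strict unpack rejects the lossy case; the relaxation is about not rejecting values that are in fact representable.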

Signed-off-by: Xiong Wei <xiongw.fnst@cn.fujitsu.com>
Pull Request resolved: #39140

Differential Revision: D21885217

Pulled By: ezyang

fbshipit-source-id: e2bbe90d719443ea2e1c6b7b2c637f9a943fa5c0
Summary:
Pull Request resolved: #39483

I fixed all of the new errors that occurred because of the upgrade.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D21884575

Pulled By: ezyang

fbshipit-source-id: 45c8e1f1ecb410c8d7c46dd3922ad70e982a0685
Summary:
Fix type casting for reduce ops in the ONNX exporter. PyTorch promotes bool and all integer dtypes to long for these ops.

This fix only covers traced modules where the dtype is present.
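The promotion rule can be stated as a small lookup. The dtype names are illustrative strings, not the exporter's actual representation:

```python
# PyTorch promotes bool and all integer dtypes to int64 (long) for
# reduce ops like sum/prod; floating dtypes are left unchanged
PROMOTED_TO_LONG = {"bool", "uint8", "int8", "int16", "int32", "int64"}

def reduce_output_dtype(input_dtype):
    """Output dtype of a reduction, following PyTorch's promotion rule."""
    return "int64" if input_dtype in PROMOTED_TO_LONG else input_dtype
```

The exporter's job is to insert the matching Cast node so the ONNX graph reproduces this output dtype.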
Pull Request resolved: #38829

Reviewed By: hl475

Differential Revision: D21833533

Pulled By: houseroad

fbshipit-source-id: 00d9ff692cc0b09d6ca169f6c63913f04b56f182
Summary: Pull Request resolved: #39500

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D21875091

Pulled By: jamesr66a

fbshipit-source-id: 105875dd220a91bc4fcb8fcfb77fab8b626eb6cb
Summary:
Add missing typing imports to some jit tests
Add typing annotations to `torch.testing._compare_scalars_internal` and `torch.testing._internal.assertTrue`
Pull Request resolved: #39075

Differential Revision: D21882468

Pulled By: malfet

fbshipit-source-id: dd9858eb8e11a38411544cc64daf36fced807d76
Summary:
It mainly reduces the time spent allocating new TensorType objects for Tensors, by comparing them directly instead.
Benchmark results before and after this PR: https://gist.github.com/ailzhang/db44d0a1911cae62e0bb794bff33f40a
Pull Request resolved: #39098

Differential Revision: D21786678

Pulled By: ailzhang

fbshipit-source-id: 2f61f0ac1dc8c529c45bef4e149be431ff1608b0
Summary:
s/raise unittest.skip/raise unittest.SkipTest/
As `unittest.skip` is a decorator while `unittest.SkipTest` is an exception
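The distinction in miniature: `unittest.skip` is a decorator (for marking a test up front), while `unittest.SkipTest` is the exception to raise from inside a test body:

```python
import unittest

class Example(unittest.TestCase):
    @unittest.skip("decorator form: skips unconditionally")
    def test_decorated(self):
        pass

    def test_conditional(self):
        if True:  # e.g. some condition only known at runtime
            raise unittest.SkipTest("exception form: skip at runtime")
```

`raise unittest.skip(...)` "works" only by accident of raising a callable-wrapping object, which is why the substitution in this PR is the correct fix.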
Pull Request resolved: #39532

Differential Revision: D21889152

Pulled By: malfet

fbshipit-source-id: 27a03dbf065a1e2712a63c6c27e156bd13edbbdf
Summary:
Misc updates to the fake FP16 tests:
1. Seed numpy with a random seed.
2. Change the test base class from unittest.TestCase to serial.SerializedTestCase.
3. Remove the hypothesis_test_util import.
Reviewer: Hector Yuen
Pull Request resolved: #39405

Test Plan: Fake FP16 test

Differential Revision: D21890212

Pulled By: hyuen

fbshipit-source-id: 25e7e17f118655f32cdd06ea9db3cdac5277e649
Summary:
Instead of copying to a buffer and then setting a tensor's storage with that buffer, create a storage directly from the file.

Pull Request resolved: #36362

Pulled By: driazati

Differential Revision: D21889537

fbshipit-source-id: edbd430073c2bbf52332fe7b3b2590e7d936dedf
…r if (#39380)

Summary:
Pull Request resolved: #39380

Test for inserting observers for if statement for ops that propagate quantization parameters

Test Plan: Imported from OSS

Differential Revision: D21832477

fbshipit-source-id: 6e0b4ce4a89f847af161bb22338525802adb8b41
@zasdfgbnm zasdfgbnm merged commit 7afdc0d into ci-all/AT_DISPATCH_COMPLEX_TYPES-retry Jun 5, 2020
@facebook-github-bot facebook-github-bot added the oncall: jit Add this issue/PR to JIT oncall triage queue label Jun 5, 2020
@zasdfgbnm zasdfgbnm deleted the debug-div branch June 5, 2020 06:41