
Integrate from upstream #296

Merged
iotamudelta merged 75 commits into ROCm:master from iotamudelta:ifu
Oct 30, 2018

Conversation

@iotamudelta

No description provided.

goldsborough and others added 30 commits October 25, 2018 13:52
Summary:
We currently don't check names in `register_module` and `register_parameter` as thoroughly as we do in Python. This PR fixes this.

Python checks are e.g. in https://github.com/pytorch/pytorch/blob/master/torch/nn/modules/module.py#L108
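For reference, the Python-side checks linked above look roughly like this (a condensed sketch, not the verbatim source):

```python
def register_parameter(self, name, param):
    if '.' in name:
        raise KeyError("parameter name can't contain \".\"")
    if name == '':
        raise KeyError("parameter name can't be empty string \"\"")
    if hasattr(self, name) and name not in self._parameters:
        raise KeyError("attribute '{}' already exists".format(name))
    self._parameters[name] = param
```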

ezyang ebetica apaszke
Pull Request resolved: pytorch#13016

Differential Revision: D10853800

Pulled By: goldsborough

fbshipit-source-id: 765357875e90a5046e72351a7a47a86511633ab6
Summary: Pull Request resolved: pytorch#13088

Differential Revision: D10856067

Pulled By: anderspapitto

fbshipit-source-id: cfbf0f6cad3953e1ee1c55482c00a3db9f140594
Summary:
Depends on pytorch#12682 ([stacked diff](https://github.com/driazati/pytorch/compare/weak_mod...driazati:mod_conv1))

* Adds tests for weak module conversion that creates a `ScriptModule` that uses the weak module and checks its graph
* Adds `torch._jit_internal.weak_module` tags to modules that already work (see the sketch after this list)
  * `Sigmoid`
  * `Tanh`
  * `Hardshrink`
  * `PReLU`
  * `Softsign`
  * `Tanhshrink`
  * `PairwiseDistance`
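
For context, tagging a module as weak looks roughly like this (a sketch against the `torch._jit_internal` API of this era; the class body is illustrative):

```python
import torch._jit_internal as _jit_internal
import torch.nn.functional as F
from torch.nn import Module


@_jit_internal.weak_module
class Softsign(Module):
    @_jit_internal.weak_script_method
    def forward(self, input):
        return F.softsign(input)
```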
Pull Request resolved: pytorch#12966

Differential Revision: D10559557

Pulled By: driazati

fbshipit-source-id: dc4bea3aa744b3c44d4fa7dceefd97e951f824d0
Summary:
Pull Request resolved: pytorch#13024

There's a TensorList type in ivalue.h and one in ScalarType.h, and they are different.
This diff moves IValue types into an ivalue namespace so we can merge the namespaces without conflicts.

Reviewed By: ezyang

Differential Revision: D10518929

fbshipit-source-id: cb760b6804a399880d2bff3acf9a3422d99fc0b8
Summary:
Pull Request resolved: pytorch#12950

For backwards compatibility, we want the c10 symbols to be reachable from caffe2 and aten.
When we move classes from at/caffe2 to c10, this:
 1. allows keeping backwards compatibility with third-party code we can't control
 2. allows splitting such moves into two diffs, where one only fixes the includes and the second fixes the namespaces.

Reviewed By: ezyang

Differential Revision: D10496244

fbshipit-source-id: 914818688fad8c079889dfdc6242bc228b539f0e
Summary:
This PR adds the randn operator to the ONNX symbolic, along with related tests.
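A minimal export exercising the new symbolic might look like this (illustrative model and filename):

```python
import torch


class RandnModel(torch.nn.Module):
    def forward(self, x):
        # torch.randn is now mapped to ONNX RandomNormal by the new symbolic
        return x + torch.randn(2, 3)


torch.onnx.export(RandnModel(), torch.zeros(2, 3), "randn.onnx")
```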
Pull Request resolved: pytorch#12880

Reviewed By: houseroad

Differential Revision: D10501788

Pulled By: zrphercule

fbshipit-source-id: ba8bb00ca848c4b95decabf638a1bc13fe11d03e
Summary:
clang-format-6 run on all cpp,cc,c,cu,cxx,hpp,hxx,h files under /c10d and /thd
Pull Request resolved: pytorch#13138

Differential Revision: D10857742

Pulled By: teng-li

fbshipit-source-id: f99bc62f56019c05acdfa8e8c4f0db34d23b4c52
Summary:
Pull Request resolved: pytorch#13141

This is an example diff to show what lint rules are being applied.

Reviewed By: mingzhe09088

Differential Revision: D10858478

fbshipit-source-id: cbeb013f10f755b0095478adf79366e7cf7836ff
Summary:
Pull Request resolved: pytorch#12991

Remove the file proxying. Until we can land `using namespace c10` everywhere, we keep just the one-off namespace proxy. The follow-up diff will replace explicit `at::optional` with plain `optional` usage.

Reviewed By: ezyang, Yangqing

Differential Revision: D10511254

fbshipit-source-id: 8297c61d7e9810ae215a18869a6ec9b63f55d202
Summary:
Pull Request resolved: pytorch#13082

Follow up of D10511254. For these cases we can move to preferred `optional` without namespace right away.

Reviewed By: ezyang, Yangqing

Differential Revision: D10844117

fbshipit-source-id: 99a59e692fb4b236b299579f937f1536d443d899
Summary:
Pull Request resolved: pytorch#13134

For tensor, we plan to do the following renaming:
```
* t.ndim() → t.dim()
* t.size() → t.numel()
* t.dims() → t.sizes()
* t.meta() → t.dtype()
* t.dim(d) → t.size(d)
```
This diff adds the new APIs to caffe2::Tensor so we can start the codemod;
we'll remove the old APIs after the codemod is done.

Reviewed By: ezyang

Differential Revision: D10856028

fbshipit-source-id: 1638997e234d7b3113ef8be65a16246f902273c7
Summary:
Pull Request resolved: pytorch#12949

Currently the default chunk size in the save operation is 1 MB, with no way to configure it at runtime. This adds a parameter to configure the chunk size in SaveOp.
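
Usage would look something like this (the exact argument name is an assumption based on the description above):

```python
from caffe2.python import core

save_op = core.CreateOperator(
    "Save",
    ["X"],
    [],
    db="/tmp/model.minidb",
    db_type="minidb",
    chunk_size=4 * 1024 * 1024,  # assumed knob: 4 MB chunks vs. the 1 MB default
)
```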

Reviewed By: mraway, xsh6528

Differential Revision: D10454037

fbshipit-source-id: a5cd8f9846aea4b1e3612a3fcfa431b68bda8104
Summary:
This PR adds optional type to ATen native, autograd, JIT schema and the Python arg parser; closes pytorch#9513. It allows us to use optional default values (including None) in function signatures and implementations like clamp, etc., and also lets us remove the python_default_init hack.

Follow up:

remove python_default_init completely.
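
For example, with optional types in the schema, `clamp` accepts `None` for either bound directly (a sketch of the resulting behavior):

```python
import torch

x = torch.randn(4)
torch.clamp(x, min=0.0)            # clamp below only
torch.clamp(x, max=1.0)            # clamp above only
torch.clamp(x, min=None, max=1.0)  # explicit None is a valid optional argument
```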
Pull Request resolved: pytorch#12582

Differential Revision: D10417423

Pulled By: wanchaol

fbshipit-source-id: 1c80f0727bb528188b47c595629e2996be269b89
Summary:
The existing default timeout was set at 10 seconds, which is too low
for asynchronous tasks that depend on a barrier to resynchronize.
Having a single timeout for all operations is not ideal and this will
be addressed in future commits.
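Callers who need a different timeout can pass one explicitly; a sketch, assuming the `timeout` keyword exposed by `init_process_group` in this era:

```python
from datetime import timedelta

import torch.distributed as dist

# Assumes env:// rendezvous variables (MASTER_ADDR etc.) are set elsewhere.
dist.init_process_group("gloo", timeout=timedelta(minutes=5))
```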
Pull Request resolved: pytorch#13056

Reviewed By: teng-li

Differential Revision: D10558746

Pulled By: pietern

fbshipit-source-id: d857ea55b1776fc7d0baf2efd77951b5d98beabb
Summary:
just a sanity check to make sure everything is in order
Pull Request resolved: pytorch#13037

Differential Revision: D10854563

Pulled By: michaelsuo

fbshipit-source-id: 409303c4cbf058b75e24bf2213b49e9d79cb862e
Summary:
Pull Request resolved: pytorch#13140

This is an example of the benefit of a proper Facebook linter. The old code
was not Python 2.x (actually, pre-Python 3.3) compatible. Note that FileExistsError
was only added in Python 3.3:

https://stackoverflow.com/questions/20790580/python-specifically-handle-file-exists-exception
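
For context, the usual Python-2-compatible spelling of that pattern is (a generic sketch, not the exact code from this diff):

```python
import errno
import os


def makedirs_exist_ok(path):
    # Python 2 has no FileExistsError; check errno on the OSError instead.
    try:
        os.makedirs(path)
    except OSError as e:
        if e.errno != errno.EEXIST:
            raise
```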

Reviewed By: mingzhe09088

Differential Revision: D10858804

fbshipit-source-id: a4c995aef9f720cb8b0ce463f0a51db667fc42f2
Summary:
cc Yangqing mingzhe09088 anderspapitto mingzhe09088
Pull Request resolved: pytorch#12990

Differential Revision: D10862301

Pulled By: orionr

fbshipit-source-id: 62ba09cf0725f29692fac71bc30173469283390b
Summary:
Now we have everything from c10::optional, we can delete this and keep a single version in c10.
Pull Request resolved: pytorch#12965

Differential Revision: D10504042

Pulled By: wanchaol

fbshipit-source-id: c0ec3892e92968cca264ae8924c19111674631ba
…torch#13004)

Summary:
Pull Request resolved: pytorch#13004

Implement BucketWeighted model layer, which learns a weight for each possible score in an IdScoreList. Here, we assume that the scores in the IdScoreList have already been converted into the appropriate 'buckets'. If this is not done, then essentially each score represents its own bucket.

We assume that the scores/buckets are integers, and if max_score is not set, we assume that the maximum cardinality of the score is less than or equal to the cardinality of the ids.

Reviewed By: chonglinsun

Differential Revision: D10413186

fbshipit-source-id: 743e643a1b36adf124502a8b6b29976158cdb130
…atches (pytorch#12841)

Summary:
`tensor.get_device()` went through two dispatches: once to the native function `get_device()`, and another when `get_device` calls `_th_get_device()`. This PR avoids the dispatch by directly implementing the `get_device` function as a method on Tensor.

Future Work:
- Investigate caching Device on TensorImpl. This will probably bring the
  tensor.get_device down to 2ns, but I'm not sure it's worth it.

before:
```
------------------------------------------------------------------------
Benchmark                                 Time           CPU Iterations
------------------------------------------------------------------------
BM_TensorTypeId                           0 ns          0 ns 1000000000
BM_TensorType                             8 ns          8 ns   89407911
BM_TensorIsCuda                          24 ns         24 ns   29313017
BM_TensorIsSparse                        27 ns         27 ns   26083160
BM_TensorTypeIsCuda                      11 ns         11 ns   65128120
BM_TensorNumel                           11 ns         11 ns   68314492
BM_TensorGetDevice                       71 ns         71 ns    9633125
BM_DeviceGuardCtor                      173 ns        173 ns    4067173
BM_DeviceGuard                          232 ns        232 ns    3009690
```

after:
```
------------------------------------------------------------------------
Benchmark                                 Time           CPU Iterations
------------------------------------------------------------------------
BM_TensorTypeId                           0 ns          0 ns 1000000000
BM_TensorType                            10 ns         10 ns   69803872
BM_TensorIsCuda                           2 ns          2 ns  321626683
BM_TensorIsSparse                         6 ns          6 ns  177045382
BM_TensorNumel                           12 ns         12 ns   58770533
BM_TensorGetDevice                        4 ns          4 ns  128113396
BM_DeviceGuardCtor                       52 ns         52 ns   14997278
BM_DeviceGuard                          158 ns        158 ns    5767248

```
Pull Request resolved: pytorch#12841

Differential Revision: D10489353

Pulled By: zou3519

fbshipit-source-id: a596bc77352f21d5d35433c6de02c2f65aab5f9e
Summary:
Temporarily disable upsample tests.
Pull Request resolved: pytorch#13135

Reviewed By: bddppq

Differential Revision: D10859926

Pulled By: houseroad

fbshipit-source-id: 9eb068198d43ba0939d81a9e41eb6f24ff19cb6d

Summary:
Pull Request resolved: pytorch#13144

The intention of this diff is to prevent predictor service from crashing with the "Check failed: timestep >= 0 && timestep < _T" error, as a bandage, before D10848803 can land (assuming D10848803 also replaces the CHECKs with CAFFE_ENFORCEs).

Reviewed By: ilia-cher

Differential Revision: D10857963

fbshipit-source-id: bb56ad83aa867a2d25953aa7ffd84b078f8bf84a
…12824)

Summary:
While using gbenchmark, I found `tensor.resize_({0})` would take 300ns
even if the tensor already has the correct size. This matters for
`at::empty({0})` perf because `at::empty` always calls `resize_`, which
in turn is important for JIT perf: the fusion compiler creates empty
tensors and then `resize_`s them to computed sizes. Most of the 300ns is
due to DeviceGuard (200ns).

Summary of findings:
- `at::empty({0}, cuda)`: 851ns
- `empty_tensor.resize({0})`: 308ns
- `DeviceGuard(tensor)`: ctor + dtor: 200ns (Going to look into this
  next because it impacts `resize_` perf).
- virtual dispatch overhead (`tensor.resize_()` vs
  `at::native::resize__cuda(tensor)`): ~10ns

This PR rips out the TH `resize_` implementation and adds it to ATen
with the following modifications:
- DeviceGuard used only after the same-size check.
- Same-size check rewritten for simplicity. The new check doesn't
affect perf.
- empty_cpu / empty_cuda avoid the dispatch overhead to
tensor.resize_.

Timing with this PR:
- `at::empty({0}, cuda)`: 363ns
- `empty_tensor.resize_({0})`: 17ns

Future:
- Investigate `resize_(sizes)` slowness when `tensor.sizes() != sizes`
- Should tell resize_as_ to use the new resize_ implementation...
(because resize_as_ is in TH, it is calling the old TH resize_)
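
As a quick sanity check of the fast path from Python (a sketch; the numbers above come from the C++ benchmark):

```python
import torch

t = torch.empty(0)
t.resize_(0)  # sizes already match, so this returns before touching DeviceGuard
```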
Pull Request resolved: pytorch#12824

Differential Revision: D10449209

Pulled By: zou3519

fbshipit-source-id: cecae5e6caf390017c07cd44a8eaf2fa6e3fdeb6
Summary: Pull Request resolved: pytorch#13145

Differential Revision: D10860849

Pulled By: Maratyszcza

fbshipit-source-id: fdbcc23ff9beaeaedfd561176df6cfe87685c1f5
Summary:
- Speed up the case of pytorch#12006 in the forward
- The backward still isn't as fast as one might hope (factor 2-3 in the pytorch#12006 case).
- More extensive benchmarking shows not so great performance compared
  to CuDNN for cases with many channels, e.g. bs=8-128 / c=1024 / f=1024.
- We change the meaning of save_mean and save_invstd (aka save_var) to accscalar to
  maintain reasonable precision.

Needless to say, I would happily split the TensorAccessor fixes into a separate PR, as they're unrelated fixes.
Pull Request resolved: pytorch#12368

Differential Revision: D10559696

Pulled By: SsnL

fbshipit-source-id: f0d0d1e0912e17b15b8fb7a2c03d0fe757598419
Summary:
Pull Request resolved: pytorch#13080

This is the first step to untangle this logic:
- moves stream id to thread local mechanically
- relies on the fact that the value of thread local is valid in conjunction with CUDAContext only until the next SwitchToDevice is called - we should move to proper RAII in the following diffs

Follow up diffs are going to move more stuff outside of CUDAContext (by making gpu_id thread local too) and simplify the CopyFrom.

The only expected change in behavior: before, CopyFrom would copy on logical stream id 0 if the context was created on the fly; now it copies on the current stream. Since it blocks explicitly, I don't think it matters much.

Also, observers were semi-broken by waiting on the potentially wrong stream. It can be fixed later - I renamed the method to avoid abuse.

Reviewed By: ezyang

Differential Revision: D10525134

fbshipit-source-id: 5d495a21490bebe060a76389f1b47bdf12cbc59e
Summary:
Does

```cpp
namespace torch {
using c10::optional;
using c10::nullopt;
}
```

So that users can be oblivious of our changes with ATen/c10 happening in the background, and also don't have to deal with multiple namespaces (which is very confusing).

ezyang
Pull Request resolved: pytorch#12927

Differential Revision: D10510630

Pulled By: goldsborough

fbshipit-source-id: e456264f2fbca3eda277712de11cdd8acc77fbd4
Summary:
See D10380678 for the discussion.

Caffe2 serialization code was able to handle dtype-uninitialized tensors as long as their numel was 0 O_O.

For safety, to unblock the push, I'm preserving this behavior with a critical. Once we fix all occurrences of the old API, we can delete this test.

Reviewed By: kennyhorror

Differential Revision: D10866562

fbshipit-source-id: e172bd045fdfca660ff05b426e001f5f2f03f408

Summary:
This is the same as pytorch#12889 with the addmm changes stripped out, since that appears to cause onnx broadcasting issues I don't understand.
Pull Request resolved: pytorch#13128

Reviewed By: ezyang

Differential Revision: D10853911

Pulled By: gchanan

fbshipit-source-id: 08ec8629331972f0c332ccd036980fd9c87562b0
Summary:
Pull Request resolved: pytorch#13151

No longer needed.

Reviewed By: ezyang

Differential Revision: D10862319

fbshipit-source-id: 01405d7cf2553f59ff7d3dce33755a5fdd8a8f05
gchanan and others added 25 commits October 26, 2018 15:55
Summary: Pull Request resolved: pytorch#13003

Differential Revision: D10515654

Pulled By: gchanan

fbshipit-source-id: c3f2809fdb7daeea2209ef1bcdea60266dc4854d
Summary: Pull Request resolved: pytorch#13167

Reviewed By: abadams

Differential Revision: D11296189

fbshipit-source-id: 7e49c7a78d26f4af39d50b40f70372272debb34a
Summary:
Pull Request resolved: pytorch#13114

Using one thread pool creator for all device types

Reviewed By: manojkris, wesolwsk

Differential Revision: D10851533

fbshipit-source-id: 32ca51d7932ba7faa8137df26315f52ecb4c6157
Summary:
Add new methods to move a node before/after another node while preserving data dependencies.

Any suggestions for a pithier name for the methods would be appreciated 😃
Pull Request resolved: pytorch#13026

Differential Revision: D10854574

Pulled By: QueryConnectionException

fbshipit-source-id: b42751cac18d1e23940e35903c8e6a54a395292e
Summary: Codemod generated with clangr shard mode, 50 files per diff

Reviewed By: li-roy

Differential Revision: D10866391

fbshipit-source-id: 3badc4e86edaac376918fca8d09dbfa396ac3a2c
Summary:
Pull Request resolved: pytorch#13173

Optimize LayerNormOp

Reviewed By: houseroad

Differential Revision: D12398163

fbshipit-source-id: 6b76bc4bd9f34e623f8e385dd07d4ce99490badf
Summary:
Pull Request resolved: pytorch#13160

Reduces pytorch_core build from 2 hours to 30 minutes

Reviewed By: soumith, dzhulgakov

Differential Revision: D10524261

fbshipit-source-id: 97270ac73404b5ea4c264cd0e9d8d4b1be79b0e9
Summary:
Pull Request resolved: pytorch#13203

Minor changes in the test workflow to run the model on CPUs

Reviewed By: stephenyan1231

Differential Revision: D9925797

fbshipit-source-id: b7b1fb2658ab68b1ffc2b1f7b314958ea4732b32
Summary:
For attention: bddppq
Pull Request resolved: pytorch#13181

Differential Revision: D12811207

Pulled By: bddppq

fbshipit-source-id: de1c92e5a8cf4fc634c4644376d07374441c24e3
Summary: Pull Request resolved: pytorch#13198

Reviewed By: bddppq

Differential Revision: D12812909

Pulled By: houseroad

fbshipit-source-id: f448e0d7957c316099a6b565d129eabb7ef81e59
Summary:
Here is my stab at `dense.to_sparse`
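
A round-trip sanity check of the new method (sketch):

```python
import torch

d = torch.tensor([[0., 1.], [2., 0.]])
s = d.to_sparse()                    # sparse COO tensor
assert torch.equal(s.to_dense(), d)  # round-trips back to the dense input
```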
Pull Request resolved: pytorch#12171

Differential Revision: D10859078

Pulled By: weiyangfb

fbshipit-source-id: 5df72f72ba4f8f10e283402ff7731fd535682664
…torch#13191)

Summary:
Revert pytorch#12368 since it's causing ONNX-related test case failures.

SsnL
Pull Request resolved: pytorch#13191

Reviewed By: BIT-silence

Differential Revision: D12810778

Pulled By: houseroad

fbshipit-source-id: 1c373b92628580097cffcd237dccc5b3d8697577
Summary:
This pull request contains changes for:
1. Adding a generalized MIOpen activation class to be used by activation operators
2. Refactoring MIOpen ReLU op to use the new class
3. Adding ELU, Tanh and Sigmoid MIOpen ops

Differential Revision: D12810112

Pulled By: bddppq

fbshipit-source-id: 9519b3a0cd733b906bcba5d8948be089029c43ac
Summary:
Pull Request resolved: pytorch#13199

D10524381 removed the inclusion of int8_simd.h in Caffe2 Int8 operators, and although the resulting code still compiles and works, it is up to 50% end-to-end slower (no SIMD!) on some models

Reviewed By: bertmaher

Differential Revision: D12813095

fbshipit-source-id: 03a713a4c070c0ad1e79e71e91d09eaddc0751eb
Summary:
Future is now an IValue. prim::Wait is now replaced by aten::wait.

This PR is built on top of pytorch#12925
Pull Request resolved: pytorch#12976

Differential Revision: D10861483

Pulled By: highker

fbshipit-source-id: 9e17926a625bc502fb12335ef9ce819f25776be7
Summary:
Complete list of changes:

Related to Batch Inverse:
- [x] Add batched inverse (CPU)
- [x] Add batched inverse (CUDA)
- [x] Modify autograd entry
- [x] Add tests
  - [x] test_autograd
  - [x] test_cuda
  - [x] test_torch
- [x] Modify docs
- [x] Remove `_batch_inverse` in `MultivariateNormal`.
- [x] Allow batch matrices as inputs for negative powers in `matrix_power`

Miscellaneous modifications:
- [x] Move all batch operations to BatchLinearAlgebra.cpp/.cu and provide general framework for adding more batch ops.
- [x] Add a RAII structure for MAGMA queue management.
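
A minimal check of the batched path (sketch):

```python
import torch

A = torch.randn(4, 3, 3)
A = A @ A.transpose(-2, -1) + 3 * torch.eye(3)  # well-conditioned SPD batch
Ainv = torch.inverse(A)                         # batched inverse from this PR
assert torch.allclose(A @ Ainv, torch.eye(3).expand(4, 3, 3), atol=1e-4)
```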
Pull Request resolved: pytorch#9949

Differential Revision: D10559089

Pulled By: zou3519

fbshipit-source-id: 7da24977f8a79d97dd42883302e13e708c1726e4
Summary:
Previously, the move constructor performed a swap
between the item being moved in, and the uninitialized
garbage from the object itself.

I didn't bother adding a test because I shortly intend
to kill this class entirely.  But the fix is so easy that
I wanted to put it in in case I don't get around to doing
this.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: pytorch#13183

Reviewed By: pietern

Differential Revision: D12809062

Pulled By: ezyang

fbshipit-source-id: 0d94bb9796fb7d30621256bfb401a4f89ba8ddc8
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: pytorch#13230

Differential Revision: D12818863

Pulled By: ezyang

fbshipit-source-id: 371337ca4b9d8f8e71eb78d6a53085e1c3619631
Summary:
Pull Request resolved: pytorch#12904

Enabling support for saving exceptions in async parts of CPU ops via
event().SaveException(). The error contract for CPU ops becomes:
 - return false in sync part -> net->Run() returns false
 - throw in sync part -> net->Run() rethrows the same exception
 - SetFinished("error msg") in async part -> net->Run() returns false
 - event().SetFinishedWithException() in async part -> net->Run() rethrows the same
   exception

Reviewed By: andrewwdye

Differential Revision: D10479130

fbshipit-source-id: 850ee9cbf83b04dd24b25eba359439b0cf7853c0
Summary: Pull Request resolved: pytorch#13237

Reviewed By: ezyang

Differential Revision: D12818917

Pulled By: gchanan

fbshipit-source-id: 0ff55ccac3459edd3b28068a0378e9dae085eda0
Summary:
The old test took 2min to run.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

See pytorch#13233
Pull Request resolved: pytorch#13236

Differential Revision: D12823474

Pulled By: ezyang

fbshipit-source-id: c800492a96e41a4cd18d41901f411d9d4e978613
…al<CUDAStream>> (pytorch#13125)

Summary:
Pull Request resolved: pytorch#13125

Previously, it returned a vector of THCStream*, which we eventually turned
into CUDAStream.  No need to spatter the conversion code everywhere: just
do it correctly to begin with.  An important side effect of doing it this
way is that we no longer pass nullptr to CUDAStream; instead, we create
the default stream.  I will rely on this in a later patch.

Reviewed By: gchanan

Differential Revision: D10853224

fbshipit-source-id: f6bd6594eba4626eb41a4a5e67fc64c9bbb46a1a
Summary:
Pull Request resolved: pytorch#13021

Let's make nullptr CUDAStream an illegal state.

Reviewed By: gchanan

Differential Revision: D10520421

fbshipit-source-id: 723c1f5130b2c92ec97411a958707fac4a90173f
Summary:
As we discussed, this changes the backward pass profiler annotations such that 1. they're demangled, and 2. if they came from a custom Python-side autograd function, they show a unique name based on the name of that Python-side function.
Pull Request resolved: pytorch#13154

Differential Revision: D12808952

Pulled By: colesbury

fbshipit-source-id: 4119dbaed7714b87c440a81d3a1835c5b24c7e68
@iotamudelta iotamudelta requested a review from ezyang as a code owner October 29, 2018 16:15
@iotamudelta iotamudelta merged commit 1827bda into ROCm:master Oct 30, 2018
amd-sriram added a commit that referenced this pull request Feb 4, 2026
[REDUX] Refactor Apex build process to use the PyTorch JIT extension flow (#291) (#296)
jithunnair-amd pushed a commit that referenced this pull request Feb 4, 2026
[REDUX] Refactor Apex build process to use the PyTorch JIT extension flow ([#291](ROCm/apex#291)) ([#296](ROCm/apex#296))