Integrate from upstream #296
Merged
iotamudelta merged 75 commits into ROCm:master on Oct 30, 2018
Conversation
Summary: We currently don't check names in `register_module` and `register_parameter` as thoroughly as we do in Python. This PR fixes this. Python checks are e.g. in https://github.com/pytorch/pytorch/blob/master/torch/nn/modules/module.py#L108 ezyang ebetica apaszke Pull Request resolved: pytorch#13016 Differential Revision: D10853800 Pulled By: goldsborough fbshipit-source-id: 765357875e90a5046e72351a7a47a86511633ab6
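For reference, the Python-side validation this PR mirrors rejects empty names and names containing dots. A minimal sketch of the observable behavior, written against the public `torch.nn` API (the new C++ checks are analogous):

```python
import torch.nn as nn

m = nn.Module()
try:
    # names containing "." are rejected, as are empty names
    m.add_module("bad.name", nn.Linear(2, 2))
except KeyError as e:
    print(e)  # module name can't contain "."
```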
Summary: Pull Request resolved: pytorch#13088 Differential Revision: D10856067 Pulled By: anderspapitto fbshipit-source-id: cfbf0f6cad3953e1ee1c55482c00a3db9f140594
Summary: Depends on pytorch#12682 ([stacked diff](https://github.com/driazati/pytorch/compare/weak_mod...driazati:mod_conv1))
* Adds tests for weak module conversion that creates a `ScriptModule` that uses the weak module and checks its graph
* Adds `torch._jit_internal.weak_module` tags to modules that already work: `Sigmoid`, `Tanh`, `Hardshrink`, `PReLU`, `Softsign`, `Tanhshrink`, `PairwiseDistance`

Pull Request resolved: pytorch#12966 Differential Revision: D10559557 Pulled By: driazati fbshipit-source-id: dc4bea3aa744b3c44d4fa7dceefd97e951f824d0
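A minimal sketch of the conversion being tested, assuming the 2018-era `ScriptModule`/`script_method` API: a scripted module holding a weak module (here `nn.Tanh`) should get its ops inlined into the graph.

```python
import torch
import torch.nn as nn

class M(torch.jit.ScriptModule):
    def __init__(self):
        super(M, self).__init__()
        self.act = nn.Tanh()  # tagged weak: converted when used from a ScriptModule

    @torch.jit.script_method
    def forward(self, x):
        return self.act(x)

print(M().graph)  # the tanh op appears directly in the graph
```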
Summary: Pull Request resolved: pytorch#13024 There's a TensorList type in ivalue.h and one in ScalarType.h, and they are different. This diff moves IValue types into an ivalue namespace so we can merge the namespaces without conflicts. Reviewed By: ezyang Differential Revision: D10518929 fbshipit-source-id: cb760b6804a399880d2bff3acf9a3422d99fc0b8
Summary: Pull Request resolved: pytorch#12950 For backwards compatibility, we want the c10 symbols to be reachable from caffe2 and aten. When we move classes from at/caffe2 to c10, this 1. allows keeping backwards compatibility with third-party code we can't control, and 2. allows splitting diffs that move such classes into two diffs, where one only fixes the includes and the second one fixes the namespaces. Reviewed By: ezyang Differential Revision: D10496244 fbshipit-source-id: 914818688fad8c079889dfdc6242bc228b539f0e
Summary: In this PR we add the randn operator to the ONNX symbolic; related tests are added as well. Pull Request resolved: pytorch#12880 Reviewed By: houseroad Differential Revision: D10501788 Pulled By: zrphercule fbshipit-source-id: ba8bb00ca848c4b95decabf638a1bc13fe11d03e
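A hedged sketch of what the new symbolic enables: exporting a traced model that calls `torch.randn` (the module and file name here are illustrative):

```python
import torch

class AddNoise(torch.nn.Module):
    def forward(self, x):
        return x + torch.randn(2, 3)  # now mapped to an ONNX RandomNormal node

torch.onnx.export(AddNoise(), torch.zeros(2, 3), "add_noise.onnx")
```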
Summary: clang-format-6 run on all cpp,cc,c,cu,cxx,hpp,hxx,h files under /c10d and /thd Pull Request resolved: pytorch#13138 Differential Revision: D10857742 Pulled By: teng-li fbshipit-source-id: f99bc62f56019c05acdfa8e8c4f0db34d23b4c52
Summary: Pull Request resolved: pytorch#13141 This is an example diff to show what lint rules are being applied. Reviewed By: mingzhe09088 Differential Revision: D10858478 fbshipit-source-id: cbeb013f10f755b0095478adf79366e7cf7836ff
Summary: Pull Request resolved: pytorch#12991 Remove the file proxying. Before we can land `using namespace c10` everywhere, we just keep the one-off namespace proxy. The follow-up diff is going to replace explicit at::optional but keep just `optional` usage. Reviewed By: ezyang, Yangqing Differential Revision: D10511254 fbshipit-source-id: 8297c61d7e9810ae215a18869a6ec9b63f55d202
Summary: Pull Request resolved: pytorch#13082 Follow up of D10511254. For these cases we can move to preferred `optional` without namespace right away. Reviewed By: ezyang, Yangqing Differential Revision: D10844117 fbshipit-source-id: 99a59e692fb4b236b299579f937f1536d443d899
Summary: Pull Request resolved: pytorch#13134 For tensor, we plan to do the following renaming:
```
* t.ndim()  → t.dim()
* t.size()  → t.numel()
* t.dims()  → t.sizes()
* t.meta()  → t.dtype()
* t.dim(d)  → t.size(d)
```
This diff adds the new APIs in caffe2::Tensor so we can start the codemod; we'll remove the old APIs after the codemod. Reviewed By: ezyang Differential Revision: D10856028 fbshipit-source-id: 1638997e234d7b3113ef8be65a16246f902273c7
Summary: Pull Request resolved: pytorch#12949 Currently the default chunk size in the save operation is 1MB and there is no way to configure it at runtime. This diff adds a parameter to configure the chunk size in SaveOp. Reviewed By: mraway, xsh6528 Differential Revision: D10454037 fbshipit-source-id: a5cd8f9846aea4b1e3612a3fcfa431b68bda8104
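A sketch of setting the new knob from Python, assuming the parameter is exposed as an operator argument named `chunk_size` (the summary doesn't state the exact name):

```python
from caffe2.python import core

# assumption: `chunk_size` is the new SaveOp argument, in bytes
save_op = core.CreateOperator(
    "Save",
    ["X"],  # blobs to serialize
    [],
    db="/tmp/model.minidb",
    db_type="minidb",
    chunk_size=4 * 1024 * 1024,  # 4MB chunks instead of the 1MB default
)
```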
Summary: This PR adds optional type to ATen native, autograd, JIT schema and the Python arg parser, closes pytorch#9513. It allows us to use optional default values (including None) in function signatures and implementations like clamp, etc., and also lets us remove the python_default_init hack. Follow-up: remove python_default_init completely. Pull Request resolved: pytorch#12582 Differential Revision: D10417423 Pulled By: wanchaol fbshipit-source-id: 1c80f0727bb528188b47c595629e2996be269b89
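What this enables at the Python surface, as a minimal sketch:

```python
import torch

t = torch.randn(4)
t.clamp(min=None, max=0.5)  # optional min passed explicitly as None
t.clamp(min=-0.5)           # optional max omitted entirely
```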
Summary: The existing default timeout was set at 10 seconds, which is too low for asynchronous tasks that depend on a barrier to resynchronize. Having a single timeout for all operations is not ideal and this will be addressed in future commits. Pull Request resolved: pytorch#13056 Reviewed By: teng-li Differential Revision: D10558746 Pulled By: pietern fbshipit-source-id: d857ea55b1776fc7d0baf2efd77951b5d98beabb
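For context, a sketch of raising the timeout from user code, assuming the c10d Python frontend exposes it via the `timeout` keyword (the rendezvous settings here are illustrative):

```python
from datetime import timedelta
import torch.distributed as dist

dist.init_process_group(
    "gloo",
    init_method="tcp://127.0.0.1:23456",  # illustrative single-node rendezvous
    rank=0,
    world_size=1,
    timeout=timedelta(minutes=5),  # a generous bound instead of the old 10s default
)
```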
Summary: just a sanity check to make sure everything is in order Pull Request resolved: pytorch#13037 Differential Revision: D10854563 Pulled By: michaelsuo fbshipit-source-id: 409303c4cbf058b75e24bf2213b49e9d79cb862e
Summary: Pull Request resolved: pytorch#13140 This is an example of the benefit of a proper Facebook linter. The old code was not Python 2.x (actually, pre-Python 3.3) compatible. Note that FileExistsError was added in Python 3.3: https://stackoverflow.com/questions/20790580/python-specifically-handle-file-exists-exception Reviewed By: mingzhe09088 Differential Revision: D10858804 fbshipit-source-id: a4c995aef9f720cb8b0ce463f0a51db667fc42f2
Summary: cc Yangqing mingzhe09088 anderspapitto mingzhe09088 Pull Request resolved: pytorch#12990 Differential Revision: D10862301 Pulled By: orionr fbshipit-source-id: 62ba09cf0725f29692fac71bc30173469283390b
Summary: Now we have everything from c10::optional, we can delete this and keep a single version in c10. Pull Request resolved: pytorch#12965 Differential Revision: D10504042 Pulled By: wanchaol fbshipit-source-id: c0ec3892e92968cca264ae8924c19111674631ba
…torch#13004) Summary: Pull Request resolved: pytorch#13004 Implement BucketWeighted model layer, which learns a weight for each possible score in an IdScoreList. Here, we assume that the scores in the IdScoreList have already been converted into the appropriate 'buckets'. If this is not done, then essentially each score represents its own bucket. We assume that the scores/buckets are integers, and if max_score is not set, we assume that the maximum cardinality of the score is less than or equal to the cardinality of the ids. Reviewed By: chonglinsun Differential Revision: D10413186 fbshipit-source-id: 743e643a1b36adf124502a8b6b29976158cdb130
…atches (pytorch#12841) Summary: `tensor.get_device()` went through two dispatches: once to the native function `get_device()`, and another when `get_device` calls `_th_get_device()`. This PR avoids the dispatch by directly implementing the `get_device` function as a method on Tensor. Future Work:
- Investigate caching Device on TensorImpl. This will probably bring tensor.get_device down to 2ns, but I'm not sure it's worth it.

before:
```
------------------------------------------------------------------------
Benchmark                 Time           CPU Iterations
------------------------------------------------------------------------
BM_TensorTypeId           0 ns          0 ns 1000000000
BM_TensorType             8 ns          8 ns   89407911
BM_TensorIsCuda          24 ns         24 ns   29313017
BM_TensorIsSparse        27 ns         27 ns   26083160
BM_TensorTypeIsCuda      11 ns         11 ns   65128120
BM_TensorNumel           11 ns         11 ns   68314492
BM_TensorGetDevice       71 ns         71 ns    9633125
BM_DeviceGuardCtor      173 ns        173 ns    4067173
BM_DeviceGuard          232 ns        232 ns    3009690
```

after:
```
------------------------------------------------------------------------
Benchmark                 Time           CPU Iterations
------------------------------------------------------------------------
BM_TensorTypeId           0 ns          0 ns 1000000000
BM_TensorType            10 ns         10 ns   69803872
BM_TensorIsCuda           2 ns          2 ns  321626683
BM_TensorIsSparse         6 ns          6 ns  177045382
BM_TensorNumel           12 ns         12 ns   58770533
BM_TensorGetDevice        4 ns          4 ns  128113396
BM_DeviceGuardCtor       52 ns         52 ns   14997278
BM_DeviceGuard          158 ns        158 ns    5767248
```

Pull Request resolved: pytorch#12841 Differential Revision: D10489353 Pulled By: zou3519 fbshipit-source-id: a596bc77352f21d5d35433c6de02c2f65aab5f9e
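For orientation, the call being benchmarked, as a trivial sketch (requires a CUDA build):

```python
import torch

if torch.cuda.is_available():
    t = torch.randn(2, device="cuda:0")
    print(t.get_device())  # 0 -- now a direct Tensor method, no double dispatch
```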
Summary: Temporarily disable upsample tests. Pull Request resolved: pytorch#13135 Reviewed By: bddppq Differential Revision: D10859926 Pulled By: houseroad fbshipit-source-id: 9eb068198d43ba0939d81a9e41eb6f24ff19cb6d
) Summary: Pull Request resolved: pytorch#13144 The intention of this diff is to prevent predictor service from crashing with the "Check failed: timestep >= 0 && timestep < _T" error, as a stopgap, before D10848803 can be landed (assuming D10848803 replaces the CHECKs with CAFFE_ENFORCEs, too). Reviewed By: ilia-cher Differential Revision: D10857963 fbshipit-source-id: bb56ad83aa867a2d25953aa7ffd84b078f8bf84a
…12824) Summary: While using gbenchmark, I found `tensor.resize_({0})` would take 300ns if the tensor already has the correct size. This is important for `at::empty({0})` perf because `at::empty` always calls `resize_`, which in turn is important for JIT perf: the fusion compiler creates empty tensors and then `resize_`s them to computed sizes. Most of the 300ns is due to DeviceGuard (200ns). Summary of findings:
- `at::empty({0}, cuda)`: 851ns
- `empty_tensor.resize({0})`: 308ns
- `DeviceGuard(tensor)`: ctor + dtor: 200ns (Going to look into this next because it impacts `resize_` perf).
- dispatch overhead (`tensor.resize_()` vs `at::native::resize__cuda(tensor)`): ~10ns

This PR rips out the TH `resize_` implementation and adds it to ATen with the following modifications:
- DeviceGuard used only after the same-size check.
- Same-size check rewritten for simplicity. The new check doesn't affect perf.
- empty_cpu / empty_cuda avoid the dispatch overhead of tensor.resize_.

Timing with this PR:
- `at::empty({0}, cuda)`: 363ns
- `empty_tensor.resize_({0})`: 17ns

Future:
- Investigate `resize_(sizes)` slowness when `tensor.sizes() != sizes`
- Should tell resize_as_ to use the new resize_ implementation (because resize_as_ is in TH, it is calling the old TH resize_)

Pull Request resolved: pytorch#12824 Differential Revision: D10449209 Pulled By: zou3519 fbshipit-source-id: cecae5e6caf390017c07cd44a8eaf2fa6e3fdeb6
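The fast path in user-facing terms, as a minimal sketch:

```python
import torch

x = torch.empty(0)
x.resize_(0)  # same size: returns early, before any DeviceGuard is constructed
x.resize_(5)  # size change: takes the full reallocation path
```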
Summary: Pull Request resolved: pytorch#13145 Differential Revision: D10860849 Pulled By: Maratyszcza fbshipit-source-id: fdbcc23ff9beaeaedfd561176df6cfe87685c1f5
Summary:
- Speed up the case of pytorch#12006 in the forward
- The backward still isn't as fast as one might hope (factor 2-3 in the pytorch#12006 case).
- More extensive benchmarking shows not-so-great performance compared to CuDNN for cases with many channels, e.g. bs=8-128 / c=1024 / f=1024.
- We change the meaning of save_mean and save_invstd (aka save_var) to accscalar to maintain reasonable precision.

Needless to say, I would happily separate the TensorAccessor fixes into a separate PR, as they're fixes and unrelated. Pull Request resolved: pytorch#12368 Differential Revision: D10559696 Pulled By: SsnL fbshipit-source-id: f0d0d1e0912e17b15b8fb7a2c03d0fe757598419
Summary: Pull Request resolved: pytorch#13080 This is the first step to untangle this logic:
- moves stream id to thread local mechanically
- relies on the fact that the value of the thread local is valid in conjunction with CUDAContext only until the next SwitchToDevice is called
- we should move to proper RAII in the following diffs

Follow-up diffs are going to move more stuff outside of CUDAContext (by making gpu_id thread local too) and simplify CopyFrom. The only expected change in behavior is that before, CopyFrom would do the copy on stream logical id 0 if the context was created on the fly, and now it'd do so on the current stream. Since it'd block explicitly, I don't think it matters much. Also, observers were semi-broken by waiting on the potentially wrong stream. It can be fixed later - I renamed the method to avoid abuse. Reviewed By: ezyang Differential Revision: D10525134 fbshipit-source-id: 5d495a21490bebe060a76389f1b47bdf12cbc59e
Summary:
Does
```cpp
namespace torch {
using c10::optional;
using c10::nullopt;
}
```
So that users can be oblivious to our changes with ATen/c10 happening in the background, and also don't have to deal with multiple namespaces (which is very confusing).
ezyang
Pull Request resolved: pytorch#12927
Differential Revision: D10510630
Pulled By: goldsborough
fbshipit-source-id: e456264f2fbca3eda277712de11cdd8acc77fbd4
Summary: See D10380678 for the discussion. Caffe2 serialization code was able to handle dtype-uninitialized tensors as long as their numel was 0 O_O. For safety, to unblock the push, I'm preserving this behavior with critical. As we fix all occurrences of the old API, we can delete this test. Reviewed By: kennyhorror Differential Revision: D10866562 fbshipit-source-id: e172bd045fdfca660ff05b426e001f5f2f03f408
) Summary: This is the same as pytorch#12889 with the addmm changes stripped out, since that appears to cause onnx broadcasting issues I don't understand. Pull Request resolved: pytorch#13128 Reviewed By: ezyang Differential Revision: D10853911 Pulled By: gchanan fbshipit-source-id: 08ec8629331972f0c332ccd036980fd9c87562b0
Summary: Pull Request resolved: pytorch#13151 No longer needed. Reviewed By: ezyang Differential Revision: D10862319 fbshipit-source-id: 01405d7cf2553f59ff7d3dce33755a5fdd8a8f05
Summary: Pull Request resolved: pytorch#13003 Differential Revision: D10515654 Pulled By: gchanan fbshipit-source-id: c3f2809fdb7daeea2209ef1bcdea60266dc4854d
Summary: Pull Request resolved: pytorch#13167 Reviewed By: abadams Differential Revision: D11296189 fbshipit-source-id: 7e49c7a78d26f4af39d50b40f70372272debb34a
Summary: Pull Request resolved: pytorch#13114 Using one thread pool creator for all device types Reviewed By: manojkris, wesolwsk Differential Revision: D10851533 fbshipit-source-id: 32ca51d7932ba7faa8137df26315f52ecb4c6157
Summary: Add new methods to move a node before/after another node while preserving data dependencies. Any suggestions for a pithier name for the methods would be appreciated 😃 Pull Request resolved: pytorch#13026 Differential Revision: D10854574 Pulled By: QueryConnectionException fbshipit-source-id: b42751cac18d1e23940e35903c8e6a54a395292e
Summary: Codemod generated with clangr shard mode, 50 files per diff Reviewed By: li-roy Differential Revision: D10866391 fbshipit-source-id: 3badc4e86edaac376918fca8d09dbfa396ac3a2c
Summary: Pull Request resolved: pytorch#13173 Optimize LayerNormOp Reviewed By: houseroad Differential Revision: D12398163 fbshipit-source-id: 6b76bc4bd9f34e623f8e385dd07d4ce99490badf
Summary: Pull Request resolved: pytorch#13160 Reduces pytorch_core build from 2 hours to 30 minutes Reviewed By: soumith, dzhulgakov Differential Revision: D10524261 fbshipit-source-id: 97270ac73404b5ea4c264cd0e9d8d4b1be79b0e9
Summary: Pull Request resolved: pytorch#13203 Minor changes in the test workflow to run the model on CPUs Reviewed By: stephenyan1231 Differential Revision: D9925797 fbshipit-source-id: b7b1fb2658ab68b1ffc2b1f7b314958ea4732b32
Summary: For attention: bddppq Pull Request resolved: pytorch#13181 Differential Revision: D12811207 Pulled By: bddppq fbshipit-source-id: de1c92e5a8cf4fc634c4644376d07374441c24e3
Summary: Pull Request resolved: pytorch#13198 Reviewed By: bddppq Differential Revision: D12812909 Pulled By: houseroad fbshipit-source-id: f448e0d7957c316099a6b565d129eabb7ef81e59
Summary: Here is my stab at ```dense.to_sparse``` Pull Request resolved: pytorch#12171 Differential Revision: D10859078 Pulled By: weiyangfb fbshipit-source-id: 5df72f72ba4f8f10e283402ff7731fd535682664
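A minimal sketch of the new conversion, with the round trip shown for clarity:

```python
import torch

d = torch.tensor([[0., 1.], [2., 0.]])
s = d.to_sparse()                    # sparse COO tensor with two nonzeros
print(s)                             # indices and values of the nonzero entries
assert torch.equal(s.to_dense(), d)  # round-trips back to the dense tensor
```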
…torch#13191) Summary: Revert pytorch#12368 since it's causing ONNX-related test cases to fail. pytorch#12368 SsnL Pull Request resolved: pytorch#13191 Reviewed By: BIT-silence Differential Revision: D12810778 Pulled By: houseroad fbshipit-source-id: 1c373b92628580097cffcd237dccc5b3d8697577
Summary: This pull request contains changes for:
1. Adding a generalized MIOpen activation class to be used by activation operators
2. Refactoring the MIOpen ReLU op to use the new class
3. Adding ELU, Tanh and Sigmoid MIOpen ops

Differential Revision: D12810112 Pulled By: bddppq fbshipit-source-id: 9519b3a0cd733b906bcba5d8948be089029c43ac
Summary: Pull Request resolved: pytorch#13199 D10524381 removed the inclusion of int8_simd.h in Caffe2 Int8 operators, and although the resulting code still compiles and works, it is up to 50% end-to-end slower (no SIMD!) on some models. Reviewed By: bertmaher Differential Revision: D12813095 fbshipit-source-id: 03a713a4c070c0ad1e79e71e91d09eaddc0751eb
Summary: Future is now an IValue. prim::Wait is now replaced by aten::wait. This PR is built on top of pytorch#12925. Pull Request resolved: pytorch#12976 Differential Revision: D10861483 Pulled By: highker fbshipit-source-id: 9e17926a625bc502fb12335ef9ce819f25776be7
Summary: Complete billing of changes:

Related to Batch Inverse:
- [x] Add batched inverse (CPU)
- [x] Add batched inverse (CUDA)
- [x] Modify autograd entry
- [x] Add tests
  - [x] test_autograd
  - [x] test_cuda
  - [x] test_torch
- [x] Modify docs
- [x] Remove `_batch_inverse` in `MultivariateNormal`.
- [x] Allow batch matrices as inputs for negative powers in `matrix_power`

Miscellaneous modifications:
- [x] Move all batch operations to BatchLinearAlgebra.cpp/.cu and provide a general framework for adding more batch ops.
- [x] Add a RAII structure for MAGMA queue management.

Pull Request resolved: pytorch#9949 Differential Revision: D10559089 Pulled By: zou3519 fbshipit-source-id: 7da24977f8a79d97dd42883302e13e708c1726e4
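A minimal sketch of the batched behavior, assuming `torch.inverse` is the user-facing entry point (per the checklist above):

```python
import torch

a = torch.randn(4, 3, 3)     # a batch of four 3x3 matrices
inv = torch.inverse(a)       # inverts each matrix in the batch
print(torch.matmul(a, inv))  # each 3x3 slice is approximately the identity
```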
Summary: Previously, the move constructor performed a swap between the item being moved in, and the uninitialized garbage from the object itself. I didn't bother adding a test because I shortly intend to kill this class entirely. But the fix is so easy that I wanted to put it in in case I don't get around to doing this. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: pytorch#13183 Reviewed By: pietern Differential Revision: D12809062 Pulled By: ezyang fbshipit-source-id: 0d94bb9796fb7d30621256bfb401a4f89ba8ddc8
Summary: Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: pytorch#13230 Differential Revision: D12818863 Pulled By: ezyang fbshipit-source-id: 371337ca4b9d8f8e71eb78d6a53085e1c3619631
Summary: Pull Request resolved: pytorch#12904 Enabling support for saving exceptions in async parts of CPU ops via event().SaveException(). The error contract for CPU ops becomes:
- return false in sync part -> net->Run() returns false
- throw in sync part -> net->Run() rethrows the same exception
- SetFinished("error msg") in async part -> net->Run() returns false
- event().SetFinishedWithException() in async part -> net->Run() rethrows the same exception

Reviewed By: andrewwdye Differential Revision: D10479130 fbshipit-source-id: 850ee9cbf83b04dd24b25eba359439b0cf7853c0
Summary: Pull Request resolved: pytorch#13237 Reviewed By: ezyang Differential Revision: D12818917 Pulled By: gchanan fbshipit-source-id: 0ff55ccac3459edd3b28068a0378e9dae085eda0
Summary: The old test took 2min to run. Signed-off-by: Edward Z. Yang <ezyang@fb.com> See pytorch#13233 Pull Request resolved: pytorch#13236 Differential Revision: D12823474 Pulled By: ezyang fbshipit-source-id: c800492a96e41a4cd18d41901f411d9d4e978613
…al<CUDAStream>> (pytorch#13125) Summary: Pull Request resolved: pytorch#13125 Previously, it returned a vector of THCStream*, which we eventually turned into CUDAStream. No need to spatter the conversion code everywhere: just do it correctly to begin with. An important side effect of doing it this way is that we no longer pass nullptr to CUDAStream; instead, we create the default stream. I will rely on this in a later patch. Reviewed By: gchanan Differential Revision: D10853224 fbshipit-source-id: f6bd6594eba4626eb41a4a5e67fc64c9bbb46a1a
Summary: Pull Request resolved: pytorch#13021 Let's make nullptr CUDAStream an illegal state. Reviewed By: gchanan Differential Revision: D10520421 fbshipit-source-id: 723c1f5130b2c92ec97411a958707fac4a90173f
Summary: As we discussed, this changes the backward pass profiler annotations such that 1. they're demangled and 2. if they came from a custom Python-side autograd function, they show a unique name based on the name of that Python-side function. Pull Request resolved: pytorch#13154 Differential Revision: D12808952 Pulled By: colesbury fbshipit-source-id: 4119dbaed7714b87c440a81d3a1835c5b24c7e68
jithunnair-amd pushed a commit that referenced this pull request on Feb 4, 2026
[REDUX] Refactor Apex build process to use the PyTorch JIT extension flow ([#291](ROCm/apex#291)) ([#296](ROCm/apex#296))