Bf16 dummy by rohithkrn · Pull Request #1 · rohithkrn/pytorch

rohithkrn · 2019-10-17T23:12:47Z

No description provided.

…torch#26556)

Summary: Use Caffe2's implementation of grouped depthwise 3x3 convolutions instead of NNPACK.

Pull Request resolved: pytorch#26556

Test Plan:

_Correctness_ - Manually check the results using the --print-output flag on speed_benchmark_torch.

_Performance_ - All measurements below on Pixel 2

**Before**:

Multi-threaded:

> adb shell "./speed_benchmark_torch \
> --model=./xraymobilev3.pt \
> --input_dims="1,3,224,224" \
> --input_type=float --warmup=5 \
> --iter=25"
>
> Main run finished. Milliseconds per iter: **876.002**. Iters per second: 1.14155

Single-threaded:

> adb shell "./speed_benchmark_torch \
> --model=./xraymobilev3.pt \
> --input_dims="1,3,224,224" \
> --input_type=float --warmup=5 \
> --iter=25 \
> --caffe2_threadpool_force_inline=true"
>
> Main run finished. Milliseconds per iter: **459.409**. Iters per second: 2.17671

**After**:

Multi-threaded:

> adb shell "./speed_benchmark_torch \
> --model=./xraymobilev3.pt \
> --input_dims="1,3,224,224" \
> --input_type=float --warmup=5 \
> --iter=25"
>
> Main run finished. Milliseconds per iter: **285.68**. Iters per second: 3.50042

Single-threaded:

> adb shell "./speed_benchmark_torch \
> --model=./xraymobilev3.pt \
> --input_dims="1,3,224,224" \
> --input_type=float --warmup=5 \
> --iter=25 \
> --caffe2_threadpool_force_inline=true"
>
> Main run finished. Milliseconds per iter: **278.999**. Iters per second: 3.58425

Differential Revision: D17533311 Pulled By: AshkanAliabadi fbshipit-source-id: 9ee8acf02b8e3e8da1922b188ed0a6459a90b67d

Summary: Closes pytorch#24562 Pull Request resolved: pytorch#26598 Differential Revision: D17531503 Pulled By: VitalyFedyunin fbshipit-source-id: 8119c796e142f073ad4e274dda1ad99344215c48

Summary: Pull Request resolved: pytorch#26583 Adds a function that uses the nccl api to get the version code. Converts it to a readable version. Will be used for logging NCCL version in exception messages. Test Plan: See above Differential Revision: D17473200 fbshipit-source-id: 4881ed5221b397f2f967262668c2b376b6bf3c64
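The conversion from version code to readable version can be sketched as follows. This is a minimal illustration, not the helper added by the PR: the name `format_nccl_version` is hypothetical, and the sketch assumes the version code is encoded as `major * 1000 + minor * 100 + patch` (newer NCCL releases use a different multiplier for the major version).

```python
def format_nccl_version(code: int) -> str:
    """Turn an integer NCCL version code into a 'major.minor.patch' string.

    Assumption: code == major * 1000 + minor * 100 + patch.
    """
    major, rest = divmod(code, 1000)
    minor, patch = divmod(rest, 100)
    return f"{major}.{minor}.{patch}"


if __name__ == "__main__":
    # 2408 decodes to 2.4.8 under the assumed encoding.
    print(format_nccl_version(2408))
```

A string in this form is easier to read in an exception message than the raw integer.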

…ytorch#26816)

Summary: Output tensors don't need to be copied during type promotion, as we are not using any data from them. A plain allocation gives a steady 10% performance gain.

BEFORE

```
In [1]: x = torch.randn(64, 2048, 7,7)
In [2]: y = torch.randn(64, 2048, 7,7, dtype=torch.float64)
In [3]: timeit x.add_(y)
77.3 ms ± 257 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
```

AFTER

```
In [1]: x = torch.randn(64, 2048, 7,7)
In [2]: y = torch.randn(64, 2048, 7,7, dtype=torch.float64)
In [3]: timeit x.add_(y)
68.2 ms ± 713 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
```

Pull Request resolved: pytorch#26816 Differential Revision: D17573455 Pulled By: VitalyFedyunin fbshipit-source-id: 47286abce5e7e665eb61e46ae358c896e945bef2

Summary: Pull Request resolved: pytorch#26751 ### Summary We're going to use the AWS s3 bucket - `s3://ossci-ios` to store the release binary. To release the cocoapods, we can follow the steps below: 1. Open a fake PR to trigger the CI job that pulls the code from the 1.3.0 tag branch and does the building and uploading. 2. Verify the binary locally - Run tests on both arm64 and simulator 3. Publish the cocoapods officially ### Test plan - podspec lint command succeeds - `pod spec lint --verbose --allow-warnings --no-clean --use-libraries --skip-import-validation` Test Plan: Imported from OSS Differential Revision: D17577131 Pulled By: xta0 fbshipit-source-id: 55fee918ecc5c4e0b6d714488a12351b4370afac

Summary: Pull Request resolved: pytorch#26496 It is a BAD BAD idea to deploy Docker versions which are not deployed (per ossci-job-dsl) because those versions will get GC'ed after two weeks. At the moment, there is no verification that your Docker version is deployed. This adds an Azure job to check this. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Test Plan: Imported from OSS Differential Revision: D17575100 Pulled By: ezyang fbshipit-source-id: 8df2331c6e6899c585bc2917b55e8955908b0e4a

Summary: Pull Request resolved: pytorch#26704 nccl 2.1.15 isn't available for CUDA 10.1 and 2.4.8 isn't available for cuda 9.1 :( ghstack-source-id: 90714191 Test Plan: build docker images on Jenkins Differential Revision: D17543120 fbshipit-source-id: 882c5a005a9a3ef78f9209dea9dcec1782060b25

Summary: Added ONNX export for baddbmm in opset9 Pull Request resolved: pytorch#25738 Reviewed By: hl475 Differential Revision: D17565828 Pulled By: houseroad fbshipit-source-id: 85f605a7b3fa4783ef4f6ced86223133c85062d5

Summary: Pull Request resolved: pytorch#26739 Test Plan: Imported from OSS Differential Revision: D17577908 Pulled By: bwasti fbshipit-source-id: a09cdbd8619a926e93418a692ce859d4157f2da8

Summary: We implement the quantized upsample_bilinear2d case for interpolate kernel in this PR.

For nhwc performance improvement:

    import torch, time

    for dtype in [torch.qint8, torch.quint8, torch.qint32]:
        print('****', str(dtype), '*****')
        x = torch.rand(1, 56, 56, 256)
        q_x = torch.quantize_per_tensor(x, 0.5, 1, dtype)
        q_x = q_x.permute([0, 3, 1, 2])
        x = x.permute([0, 3, 1, 2])

        NITER = 100

        s = time.time()
        for i in range(NITER):
            float_out = torch.nn.functional.interpolate(x, size=5, scale_factor=None, mode="bilinear", align_corners=True)
        time_per_iter_float = (time.time() - s) / NITER

        s = time.time()
        for i in range(NITER):
            quant_out = torch.nn.quantized.functional.interpolate(q_x, size=5, scale_factor=None, mode="bilinear", align_corners=True)
        time_per_iter_quant = (time.time() - s) / NITER

        ref_quantized = torch.quantize_per_tensor(float_out, 0.5, 1, dtype)
        # torch.testing.assert_allclose(ref_quantized.dequantize(), quant_out.dequantize())

        print('time/iter ms (float)', 'time/iter ms (quant)', 'quant/float', sep='\t')
        print(time_per_iter_float * 1000, time_per_iter_quant * 1000, time_per_iter_quant / time_per_iter_float, sep='\t')

        bytes_float = (x.numel() + float_out.numel()) * x.element_size()
        bytes_quant = (q_x.numel() + quant_out.numel()) * q_x.element_size()

        float_bw_gbps = bytes_float / time_per_iter_float / 1e9
        quant_bw_gbps = bytes_quant / time_per_iter_quant / 1e9

        print('GB/s float', 'GB/s quant', sep='\t')
        print(float_bw_gbps, quant_bw_gbps, sep='\t')

===========without nhwc handling===========

    **** torch.qint8 *****
    time/iter ms (float)    time/iter ms (quant)    quant/float
    1.999044418334961       2.5860953330993652      1.2936657681940702
    GB/s float      GB/s quant
    1.6192056416115257      0.3129103516188541
    **** torch.quint8 *****
    time/iter ms (float)    time/iter ms (quant)    quant/float
    2.02730655670166        2.6061582565307617      1.2855274639721328
    GB/s float      GB/s quant
    1.596632728927902       0.3105014816242217
    **** torch.qint32 *****
    time/iter ms (float)    time/iter ms (quant)    quant/float
    2.0180463790893555      2.4047350883483887      1.1916153728010588
    GB/s float      GB/s quant
    1.603959172365819       1.3460376636426636

===========with nhwc handling===========

    **** torch.qint8 *****
    time/iter ms (float)    time/iter ms (quant)    quant/float
    2.0913314819335938      0.09696483612060547     0.04636512047863123
    GB/s float      GB/s quant
    1.5477527249803915      8.345458337015
    **** torch.quint8 *****
    time/iter ms (float)    time/iter ms (quant)    quant/float
    2.1065664291381836      0.09959936141967773     0.04728042754408879
    GB/s float      GB/s quant
    1.5365591871338384      8.124710725706763
    **** torch.qint32 *****
    time/iter ms (float)    time/iter ms (quant)    quant/float
    2.044203281402588       0.6003522872924805      0.29368521846837126
    GB/s float      GB/s quant
    1.5834354779917448      5.391607675216635

Pull Request resolved: pytorch#26631 Differential Revision: D17521498 Pulled By: llyfacebook fbshipit-source-id: 385ae0f77777cd8bee385cafb80e492127b7d103

…ytorch#26453) Summary: Pull Request resolved: pytorch#26453 Previously, schema matching would incorrectly widen typevar bindings when later occurrences were supertypes of earlier ones. This allowed callsites like `floatlist.append(tensor.item())` to pass the typechecker, causing a runtime assert (issue pytorch#24856). An earlier, reverted fix (pytorch#25136) insisted on strict equality across all occurrences of a typevar, necessitating explicit casts around Scalar-typed arguments to int- or float-typed parameters, like `tensor.item()` above. This was per the original type system design, but turned out to break existing user code that relied on the de facto dynamic downcast. (The error required a specialized list representation.) The current fix includes the prevention of typevar widening, but adds logic to insert implicit conversions from Scalar to float or int as needed to satisfy a matched schema. Test Plan: Imported from OSS Differential Revision: D17470598 Pulled By: bhosmer fbshipit-source-id: d260dbf3cd78b9c2f2229bc61afc84e1910b5659
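The rule described above (no widening of a typevar binding, but an implicit conversion from Scalar to float or int where needed) can be sketched with a toy matcher. This is an illustration only and not the TorchScript schema matcher: types are plain strings, and the function name `bind_typevar` is hypothetical.

```python
def bind_typevar(actual_types):
    """Bind one type variable against its successive occurrences.

    The first occurrence fixes the binding. A later occurrence must match
    it exactly, except that a "Scalar" actual is accepted against an "int"
    or "float" binding by recording an implicit conversion. The binding
    itself is never widened.
    """
    binding = None
    conversions = []
    for position, actual in enumerate(actual_types):
        if binding is None:
            binding = actual
        elif actual == binding:
            continue
        elif actual == "Scalar" and binding in ("int", "float"):
            conversions.append((position, f"Scalar -> {binding}"))
        else:
            raise TypeError(
                f"occurrence {position} has type {actual}, "
                f"but the type variable is bound to {binding}"
            )
    return binding, conversions


if __name__ == "__main__":
    # Models floatlist.append(tensor.item()): List[float] binds T = float,
    # then the Scalar argument is converted instead of widening T.
    print(bind_typevar(["float", "Scalar"]))
```

In this toy model, a call such as `bind_typevar(["int", "float"])` raises `TypeError`, which corresponds to the typechecker rejecting the callsite rather than silently widening the binding.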

Summary: This PR makes the following improvements: 1. Add `forward_with_indices` method to all C++ MaxPool modules, to return the max indices along with the outputs. (We can't make two `forward` methods that return different types based on input, because that will break the type deduction of `torch::detail::return_type_of_forward_t`) 2. Add `max_poolNd_with_indices` to `torch::nn::functional`, to be used when indices of the max values are needed. (We can't merge this with `torch::nn::functional::max_poolNd` because the return type of `max_poolNd` has to be defined statically). 3. Improve `pretty_print` of C++ MaxPoolNd and AvgPoolNd modules to match the Python `extra_repr`. Pull Request resolved: pytorch#26521 Differential Revision: D17507358 Pulled By: yf225 fbshipit-source-id: b6c0e2b27b38378cdc0c75f4bfc797b3c6b17cd9

Test Plan: revert-hammer Differential Revision: D17565828 Original commit changeset: 85f605a7b3fa fbshipit-source-id: 7705325087d83362f71a717be880a13e9f575b37

Summary: test run: pytorch#26732 Pull Request resolved: pytorch#26823 Reviewed By: soumith Differential Revision: D17576095 Pulled By: mingbowan fbshipit-source-id: 269cf443aea18b47bbee63996d035bc5bcd2726b

… to std::function. (pytorch#26592) Summary: function_ref is pulled over from LLVM. It is to callables what StringRef is to strings. This allows it to be substantially lighter weight, particularly in code size. That comes at the cost of not being usable in situations where the callable's lifetime is shorter than the function_ref. This means it is suitable for callback-like scenarios, but not for situations where the callable needs to be stored. In converting TensorIterator, I only encountered one situation that required refactoring to comply with function_ref's constraints. In my local Release build, this reduces the size of libtorch by 4MB, from 70MB->66MB. Pull Request resolved: pytorch#26592 Differential Revision: D17516202 fbshipit-source-id: 267476891f767f4827a4d38149f70e5035c56c48

…sion for logging Test Plan: revert-hammer Differential Revision: D17473200 Original commit changeset: 4881ed5221b3 fbshipit-source-id: c5635ce89de1644d2135b657427cbd0c3af83576

… more (pytorch#26815) Summary: Pull Request resolved: pytorch#26815 This PR adds named tensor support for: - any, all, `bitwise_not(_)`, cumprod, cumsum, `logical_not` In addition, it adds smoke tests for a variety of tensor attributes and fns: - is_shared, is_signed - retain_grad, register_hook Test Plan: - [namedtensor ci] Differential Revision: D17575905 Pulled By: zou3519 fbshipit-source-id: 37bfa327e68112c5bf0f6bf1f467a527f50fa1c4

Summary: Change the default encoding used by torch.load to 'utf-8'. This commit handles the case where a user tries to torch.load a pickled module with non-ASCII characters in the docstring, as discussed in pytorch#21743. The default encoding was changed from 'ascii' to 'utf-8'. Documentation for `torch.load` was updated and two tests were written (loading a py2 unicode module with unicode in it; raising an error when the user explicitly sets the wrong encoding). ~~This commit provides changes for better error handling in cases where user tries to `torch.load` a pickled module with non-ASCII characters in the docstring as discussed in pytorch#21743~~ Ping ezyang Pull Request resolved: pytorch#26421 Differential Revision: D17581633 Pulled By: yf225 fbshipit-source-id: f8e77dcf7907092771149aad8ede6cfb73c21620
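The effect of the encoding choice can be reproduced with the standard `pickle` module alone. The sketch below does not call `torch.load`; it assumes only that the `encoding` argument behaves as it does in `pickle.loads`. The payload is a hand-written pickle that stands in for data written by Python 2, and `load_with` is a hypothetical helper.

```python
import pickle

# Hand-written protocol-2 pickle of a Python 2 style byte string holding the
# UTF-8 bytes of "é": PROTO 2, SHORT_BINSTRING (length 2), payload, STOP.
PY2_STYLE_PICKLE = b"\x80\x02U\x02\xc3\xa9."


def load_with(encoding):
    """Unpickle the payload, decoding Python 2 byte strings with `encoding`."""
    return pickle.loads(PY2_STYLE_PICKLE, encoding=encoding)


if __name__ == "__main__":
    print(load_with("utf-8"))
    try:
        load_with("ascii")
    except UnicodeDecodeError as err:
        print("ascii decoding failed:", type(err).__name__)
```

With 'ascii' the non-ASCII bytes cannot be decoded, which is the failure mode the commit addresses by defaulting to 'utf-8'.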

Summary: We found a bug involving `std::tuple` with nvcc. In C++11, the `std::tuple` constructor is constexpr in libstdc++, but is not constexpr in libc++. https://github.com/pytorch/pytorch/blob/c36b77fcdad3d54227cf0fd51693eb57035002c0/aten/src/ATen/native/cuda/Loops.cuh#L109-L111 These lines caused crashes in CUDA with the message `scan failed with synchronize`, which is a CUDA initialization error message. This PR fixes the loop for nvcc with libc++ by not using `std::tuple`. Pull Request resolved: pytorch#25553 Differential Revision: D17582118 Pulled By: yf225 fbshipit-source-id: d6f62ed46c2415b48eb49f8a051cf3c0e7cb23ce

Summary: cpuinfo_initialize() was not implemented for s390 arch. cpuinfo calls are x86 specific to determine vector extensions AVX, AVX512 etc. Without this patch an unnecessary error log is printed in s390 arch: Error in cpuinfo: processor architecture is not supported in cpuinfo Pull Request resolved: pytorch#26265 Differential Revision: D17452301 Pulled By: izdeby fbshipit-source-id: 9ca485550385c26dec18aac5953c887f1ffbfb7a

Summary: Support IterableValue expressions and rangevalue in list comprehensions. Just as with supporting list comprehensions where the expression changes the input list types, we need to correctly type the list we create and it works. Fixes pytorch#26693 Fixes pytorch#22483 Pull Request resolved: pytorch#26768 Differential Revision: D17562762 Pulled By: eellison fbshipit-source-id: 7ce8bf8605758dfd99057bc0376b4b724c4f9251

Summary: Pull Request resolved: pytorch#26829 The TensorIterator loop for `copy_` uses operations that are currently unsupported by named tensors. The solution is to wrap `copy_` in a function that does the name propagation and ignore names when running the implementation of `copy_`. There is no test case because I'm not sure how to trigger the incorrect behavior, but there is definitely code in CUDA copy that doesn't support named tensors (expand_as isn't supported): https://github.com/pytorch/pytorch/blob/aaf30cdf36839bc3f21b1622fb91ff3e2983e8ea/aten/src/ATen/native/cuda/Copy.cu#L141-L148 Test Plan: - [namedtensor ci] Differential Revision: D17577310 Pulled By: zou3519 fbshipit-source-id: e11c52243800e1331fad738084304badcfd51ae2

Summary: Pull Request resolved: pytorch#26735 Test Plan: Imported from OSS Differential Revision: D17558505 Pulled By: vincentqb fbshipit-source-id: 36449c501f3ab3bc7cadd1f580258904b39369d4

Summary: Pull Request resolved: pytorch#25187 The bytecode export flow: dump the bytecode format for the light weighted interpreter. * The bytecode is generated without input spec optimization. It would be more generic (input independent) with no obvious performance degradation (to be tested). * Main API: torch::jit::script::Module::save(filename, extra_files, bool *bytecode_format* = false). * Both bytecode and module object are exported in pickle format. * The module object (in data.pkl) is the same as the original JIT model. * The serializer is dependent on pickle only (no protobuf or Json). * The major functionality is forked in ScriptModuleSerializer2::serialize(). * The test loader is test_bc_export.cpp. * Simple APIs are added in Code and its implementation to get necessary information (instructions, operators and constants). * Since there's no dependency on graph/node, GetAttr is promoted from an operator to first-class instruction (pytorch#25151) . * Some definitions (instructions, writeArchive, etc) that are shared by full JIT and bytecode are pulled out of the local namespace (pytorch#25148). The output layout looks like: * folders of methods. * In each method folder (for example, forward/): * bytecode.pkl: instructions and operators * constants{.pkl,/}: constant list in constants.pkl. If there are tensors in constants, the binary tensor files in constants/ folder. * data{.pkl,/}: the module object, with binary tensor files in data/ folder. The same as in torchscript. Test Plan: Imported from OSS Differential Revision: D17076411 fbshipit-source-id: 46eb298e7320d1e585b0101effc0fcfd09219046

Summary: Pull Request resolved: pytorch#26494 Close pytorch#24586 Test Plan: Imported from OSS Differential Revision: D17572497 Pulled By: VitalyFedyunin fbshipit-source-id: e1bcd33021464eaa4affd4c6d3283c8403069945

Summary: An attempt to enable double backward for non-cudnn LSTM and GRU (see pytorch#25315, pytorch#20449). RNN works already because it does not rely on fused kernels. This does not implement the double backward function itself, because that is pretty hard to spell out. Instead, it implements backward using differentiable operations, so that double backward can be done automatically. The good: seems to work, with no effect on performance in the usual case without double backward, because the fused LSTM backward is used. The bad: Performance of backward and, especially, double backward, is pretty bad. Scripting would still be the preferred way if we want a performant solution. Performance and/or memory use can be slightly improved if in-place variants can be used for sigmoid_backward and tanh_backward to avoid the cat at the end, but I'm not yet sure it's possible, and in any case it is only a slight improvement. The ugly: I could not figure out a way to reuse the workspace that contains the sum of the gates with the applied sigmoid and tanh operations, so that's probably another perf and memory hit. cc soumith, albanD. If you think this approach is viable, I can extend to GRU and RNN. Thanks to mcarilli whose approach to double backward in weight norm I copied. Pull Request resolved: pytorch#26660 Test Plan: added tests to check gradgrad for GRU and LSTM with cudnn disabled. Differential Revision: D17581489 Pulled By: ngimel fbshipit-source-id: efd204289e9a0e94d94896a0b3bff5cf6246cafa

Summary: pytorch#24604 Pull Request resolved: pytorch#26481 Differential Revision: D17489859 Pulled By: ifedan fbshipit-source-id: 0702044c7c0f78e5e30826e8a5a83da27156bdb3

Summary: Pull Request resolved: pytorch#26855 Test Plan: Imported from OSS Differential Revision: D17589837 Pulled By: IvanKobzarev fbshipit-source-id: 0084538e9b9d760a8728cdcd5723fc7fae5838c7

Summary: Pull Request resolved: pytorch#26705 Test Plan: Imported from OSS Differential Revision: D17543281 Pulled By: ZolotukhinM fbshipit-source-id: 91c40559aac6f2a1f77060fa28c33725a2b8e5f9

…d in one place. Summary: Pull Request resolved: pytorch#26703 Test Plan: Imported from OSS Differential Revision: D17543131 Pulled By: ZolotukhinM fbshipit-source-id: c4a209c55ac76d8472e64af79f76e9a61fd2a941

Summary: Pull Request resolved: pytorch#27179 Reviewed By: hl475 Differential Revision: D17698364 Pulled By: houseroad fbshipit-source-id: 8fddd1c13e7af026962cf2d9c05fd7c957d8526e

Summary: Pull Request resolved: pytorch#27027 Test Plan: Imported from OSS Differential Revision: D17682402 Pulled By: pbelevich fbshipit-source-id: 2008ce405176c174cdba88b4f25cd77a82bb13ea

Summary: Pull Request resolved: pytorch#27028 Test Plan: Imported from OSS Differential Revision: D17682406 Pulled By: pbelevich fbshipit-source-id: 9c313237cb93b9870c6fcf8d01b3dbe4af4c6f2a

Summary: Pull Request resolved: pytorch#26933 Differential Revision: D17685153 Pulled By: ezyang fbshipit-source-id: e402a12dc9a172649f153903a3a7834004b5667a

Summary: Pull Request resolved: pytorch#26927 When we build a "normal" copy of PyTorch, we internally build a copy of libtorch. If we want to test libtorch: we have a choice: test against the regular PyTorch build, or test against the libtorch only build. All of our libtorch tests require Python-side PyTorch to run. So it makes more sense to test the regular PyTorch build. There is probably still utility in making sure that it is still possible to build libtorch only, but in that case we should endeavour to run tests that ONLY require libtorch build, and not Python side stuff. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Test Plan: Imported from OSS Differential Revision: D17695384 Pulled By: ezyang fbshipit-source-id: 02522a8be0f5944f2b6255a8f1281e53ce2dcc6f

Summary: Pull Request resolved: pytorch#27214 There appears to be some convention that if you have a .python2 file in your directory, it indicates all of the downstream files are Python 2 files (and our internal linter will stop complaining about lack of __future__ compatibility imports). When we switch to Python 3 only, move this to .python3 Signed-off-by: Edward Z. Yang <ezyang@fb.com> Test Plan: Imported from OSS Reviewed By: zertosh Differential Revision: D17712308 Pulled By: ezyang fbshipit-source-id: 69b1dd720bc20a22264ebdc0ce56b7a5dbfc5f34

Summary: Pull Request resolved: pytorch#27035 Test Plan: Imported from OSS Differential Revision: D17682403 Pulled By: pbelevich fbshipit-source-id: 186377fe577abfdd53acc95751a7ed845b51af95

Summary: Pull Request resolved: pytorch#27140 Reserve will issue a (non-binding) request to shrink stack size to the amount you requested to be reserved. But the whole point of a stack is that you can have extra space if you need to do nested calls. Getting rid of the space is a bad idea! Another failure mode: if we repeatedly push lists of size one, we'll have quadratic behavior. Bad. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Test Plan: Imported from OSS Differential Revision: D17712765 Pulled By: ezyang fbshipit-source-id: cf3a08b0212304b679256fb8637e311beb3ff3a8
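The quadratic failure mode can be illustrated with a small cost model. This is a sketch and not PyTorch's stack code: it assumes that every reallocation copies all existing elements and that an exact-size reserve leaves no spare capacity. Both function names are hypothetical.

```python
def copies_with_exact_reserve(n):
    """Element copies when each push first reserves exactly size + 1 slots."""
    capacity = 0
    size = 0
    copies = 0
    for _ in range(n):
        if size + 1 > capacity:
            copies += size        # reallocation moves every existing element
            capacity = size + 1   # no slack is left for the next push
        size += 1
    return copies


def copies_with_doubling(n):
    """Element copies when capacity doubles whenever the buffer is full."""
    capacity = 0
    size = 0
    copies = 0
    for _ in range(n):
        if size == capacity:
            copies += size
            capacity = max(1, 2 * capacity)
        size += 1
    return copies


if __name__ == "__main__":
    n = 1000
    print("exact reserve:", copies_with_exact_reserve(n))
    print("doubling:     ", copies_with_doubling(n))
```

Under this model the exact-reserve variant performs n(n-1)/2 copies for n pushes, while geometric growth stays linear in n.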

Summary: Pull Request resolved: pytorch#26861 Signed-off-by: Edward Z. Yang <ezyang@fb.com> Test Plan: Imported from OSS Differential Revision: D17712801 Pulled By: ezyang fbshipit-source-id: 504594452e6594d79e41856ce5177ab370dc26f1

Summary: The mesh plugin is now supported by default TensorBoard install, so removing this comment. cc sanekmelnikov lanpa natalialunova Pull Request resolved: pytorch#27146 Differential Revision: D17717610 Pulled By: orionr fbshipit-source-id: 516efad5b800f7261b1dc6728e798c830d88b6ca

Summary: [jit] String default args get printed as ascii values pytorch#25804 pytorch#25804 Pull Request resolved: pytorch#27088 Differential Revision: D17689732 Pulled By: Krovatkin fbshipit-source-id: f385b2fe44c5a2387bfcb6484edf7faa92bc8edf

Summary: test_nn.py will still require significant work to make generic, however I'm trying to break up the PRs into more manageable chunks. Pull Request resolved: pytorch#27137 Differential Revision: D17718488 Pulled By: mruberry fbshipit-source-id: 4d9359414838a1d2a957d7a334f6a5df6cb00aeb

Summary: Pull Request resolved: pytorch#27193 Test Plan: Imported from OSS Differential Revision: D17704958 Pulled By: zafartahirov fbshipit-source-id: d8ab58b724cce2f5130b10ead0f10f5f32e26cfb

Summary: Pull Request resolved: pytorch#26692 Adding intra-op parallelism for qconv and qlinear. export OMP_NUM_THREADS=4 python test/test_quantized.py TestQuantizedConv.test_qconv python test/test_quantized.py TestQuantizedLinear.test_qlinear TODO: Performance numbers. ghstack-source-id: 91135613 Test Plan: export OMP_NUM_THREADS=4 python test/test_quantized.py TestQuantizedConv.test_qconv python test/test_quantized.py TestQuantizedLinear.test_qlinear Differential Revision: D17540567 fbshipit-source-id: e9962bdf0c25fd3ac4bd0673eee1edd697924406

Summary: updated export for topk and sort as part of opset11 Pull Request resolved: pytorch#25739 Reviewed By: hl475 Differential Revision: D17467131 Pulled By: houseroad fbshipit-source-id: 653be138455728ec8e9bb81ae63dd7ce0c4d0793

Summary: Pull Request resolved: pytorch#26984 closes pytorch#26944 In the existing implementation, each worker exits when it sees no send/recv tasks. However, as we are adding support for nested calls, one RPC could trigger more RPCs in the UDF or in the response callback. As a result, even if the worker does not see any send/recv tasks for now, it does not mean there won't be any in the future. In this commit, we added counters for all sent and received messages between each pair of nodes, and then used allgather to collect those counters, i.e., all workers would have the same view of the global state. The workers would only exit when all sends are received and processed. Test Plan: Imported from OSS Differential Revision: D17633456 Pulled By: mrshenli fbshipit-source-id: 813a155d3b2daf2226612eb17f6c698512e9beca
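The exit condition can be sketched in plain Python. This is a simplified model and not the RPC agent's code: the allgather step is represented by a list that already holds every worker's counters, only receipt (not processing) is tracked, and the name `all_messages_delivered` is hypothetical.

```python
def all_messages_delivered(gathered):
    """Return True when every sent message has a matching receive.

    gathered[i] is worker i's counters after the allgather, in the form
    {"sent": {peer: count}, "received": {peer: count}}.
    """
    n = len(gathered)
    for src in range(n):
        for dst in range(n):
            sent = gathered[src]["sent"].get(dst, 0)
            received = gathered[dst]["received"].get(src, 0)
            if sent != received:
                return False  # a message from src to dst is still in flight
    return True


if __name__ == "__main__":
    in_flight = [
        {"sent": {1: 2}, "received": {}},
        {"sent": {}, "received": {0: 1}},
    ]
    settled = [
        {"sent": {1: 2}, "received": {}},
        {"sent": {}, "received": {0: 2}},
    ]
    print(all_messages_delivered(in_flight), all_messages_delivered(settled))
```

Because every worker evaluates the same gathered counters, all workers reach the same decision about whether it is safe to exit.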

Summary: Fixing this [issue](pytorch#27077). Tested via unit tests Pull Request resolved: pytorch#27162 Differential Revision: D17694187 Pulled By: izdeby fbshipit-source-id: 939017c91605c89a0e08e0c3f8fe21de93bba95b

Summary: Pull Request resolved: pytorch#27161 If the base class starts with "Test", pytest would detect it as a test class to run. However, the base class does not have a proper `setUp` method to launch processes. As a result, when I execute the following command, I run into test failures: py.test test_dist_autograd_fork.py -k test -vs outputs: ``` test/test_dist_autograd_fork.py::TestDistAutograd::test_autograd_context FAILED test/test_dist_autograd_fork.py::TestDistAutograd::test_autograd_send_function FAILED test/test_dist_autograd_fork.py::TestDistAutograd::test_rpc_complex_args FAILED test/test_dist_autograd_fork.py::DistAutogradTestWithFork::test_autograd_context PASSED test/test_dist_autograd_fork.py::DistAutogradTestWithFork::test_autograd_send_function PASSED test/test_dist_autograd_fork.py::DistAutogradTestWithFork::test_rpc_complex_args PASSED ``` Test Plan: Imported from OSS Differential Revision: D17694165 Pulled By: mrshenli fbshipit-source-id: 0b8fcb99c76b5139b765831079f083c3122f618a
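The naming pattern can be sketched as below. The class names and the `launched` flag are hypothetical stand-ins; the sketch assumes pytest's default class-name rule (`Test*`) and uses `unittest` to run the concrete class, so it does not require pytest.

```python
import unittest


class DistAutogradTest(object):
    """Shared test bodies.

    The name does not start with "Test", so pytest's default class-name
    rule does not pick it up as a test class on its own.
    """

    def test_autograd_context(self):
        # Relies on state that only the concrete class's setUp provides.
        self.assertTrue(self.launched)


class DistAutogradTestWithFork(DistAutogradTest, unittest.TestCase):
    def setUp(self):
        # Stand-in for launching the worker processes.
        self.launched = True


def run_concrete_tests():
    """Run the concrete class and return (tests run, all passed)."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(DistAutogradTestWithFork)
    result = unittest.TestResult()
    suite.run(result)
    return result.testsRun, result.wasSuccessful()


if __name__ == "__main__":
    print(run_concrete_tests())
```

Only the subclass that mixes in `unittest.TestCase` and defines `setUp` is collected, so the shared test bodies never run without their setup.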

Summary: Fixing this [issue1](pytorch#27074) and [issue2](pytorch#27073) Tested via unit tests Pull Request resolved: pytorch#27221 Differential Revision: D17716235 Pulled By: izdeby fbshipit-source-id: c7bafd16b469c91924ebc3dba77ca56424d4c93c

Summary: Pull Request resolved: pytorch#27118 att Test Plan: test_jit.py Imported from OSS Differential Revision: D17717637 fbshipit-source-id: 83c94ff12e6a2137e0161a338fcdd100514c452f

…ch#492) * enable few torch tests and ops * fix typo * fix typo

* miopen conv bf16 support * fix typo * update error message

* Implement C++ API version of torch.nn.functional.one_hot (#27081) (#27177) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27177 Add support for F::one_hot C++ function. Test Plan: Added 3 new tests to verify API is working Imported from OSS Differential Revision: D17697934 fbshipit-source-id: a8127fb87c00daa119bb92a5702bc4bbba48290d * Refactor torch::jit::script::Module::register_* API. (#27189) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27189 Conceptually, Module is just a view over ClassType and ivalue::object. register_ methods are the only methods that are exception from this: they provide an API not available on ClassType or object directly. This PR ports this API to ClassType and makes Module truly just a view over those two. Test Plan: Imported from OSS Differential Revision: D17703533 Pulled By: ZolotukhinM fbshipit-source-id: 2cdb9fb486b3fb8527986483c7f34be7bd59fabf * Add c10_experimental ops to BC check white list (#27235) Summary: experimental ops doesn't provide bc guarantee. 
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27235 Reviewed By: hl475 Differential Revision: D17723292 Pulled By: houseroad fbshipit-source-id: 644ae34d130418a810e0f9d802fa25f6e34c5ccf * Rename _intrinsic to intrinsic Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27194 Test Plan: Imported from OSS Differential Revision: D17704957 Pulled By: zafartahirov fbshipit-source-id: 46f02d129aa77c3047b2a6c606bfadd831a6b0fc * Allow set for qconfig for dynamic_quantize Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27181 Test Plan: Imported from OSS Differential Revision: D17717482 Pulled By: jamesr66a fbshipit-source-id: f3930fc87831cbdcf4390cd769c594bb13f5cd81 * Fix reprs for _intrinsic modules Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27184 Test Plan: Imported from OSS Differential Revision: D17717481 Pulled By: jamesr66a fbshipit-source-id: 4bd72bcd42191d9b21d03f5bb6698198dbffffda * skip all rpc and dist autograd spawn tests for <PY36 (#27191) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27191 skip rpc and distautograd spawns tests for <python 3.6 ghstack-source-id: 91231565 close #27157 Test Plan: unit tests Differential Revision: D17697368 fbshipit-source-id: bb8cf1f47de41f9d350fd60afe37fece293d8680 * Add send and recv backward functions for builtin operators RPC. (#25527) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25527 Master GH issue: https://github.com/pytorch/pytorch/issues/23110. This change builds upon https://github.com/pytorch/pytorch/pull/24876 and provides all the autograd hooks needed for a forward pass with distributed rpc for builtin operators. This change does not address distributed rpc for python UDFs and that will be addressed in follow up PRs. Summary of changes: 1. Attach send autograd functions when a request is sent from the client and response is sent from the server. 2. 
Attach receive autograd functions when a request is received on the server and a response is received on the client. 3. Generate a globally unique autograd_message_id for each send/recv autograd function pair to uniquely identify them. ghstack-source-id: 91240466 Test Plan: unit tests. Differential Revision: D17148077 fbshipit-source-id: 192d8a3f552ed7cc939f55dcca332965c9bd3233 * Rename jit Function to ScriptFunction Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27219 Test Plan: Imported from OSS Differential Revision: D17715306 Pulled By: albanD fbshipit-source-id: d11a7634dbee6a885c7177b240958e5aed2544f3 * Make cpp-backed jit classes appear as being in torch.jit Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27220 Test Plan: Imported from OSS Differential Revision: D17715305 Pulled By: albanD fbshipit-source-id: 574704ad23ece6da7aa2780b78867307bef523cc * Avoid configuring ROCm if USE_CUDA is on. (#26910) Summary: Move the resolution of conflict between `USE_CUDA` and `USE_ROCM` to CMake as to effectuate: - `USE_CUDA=ON` and CUDA is found, `USE_ROCM=ON` and ROCM is found --> fatal error - Either `USE_CUDA=ON` and CUDA is found or `USE_ROCM=ON` and ROCM is found --> The respective GPU feature is ON - Otherwise no GPU support Pull Request resolved: https://github.com/pytorch/pytorch/pull/26910 Differential Revision: D17738652 Pulled By: ezyang fbshipit-source-id: 8e07cc7e922e0abda24a6518119c28952276064e * Revert "Add std::variant backport as c10::variant (#26836)" (#27277) Summary: This reverts commit 0cd188035a27fc38ce1e8eee205f6d47cd7650e6. As reported by jerryzh168 and pritamdamania87, mpark::variant doesn’t compile with gcc 7.3.1 on fb devserver and throws error similar to https://github.com/mpark/variant/issues/43. 
(However, it doesn’t fail with gcc 7.3.1 in OSS CI, based on https://circleci.com/api/v1.1/project/github/pytorch/pytorch/2995606/output/107/0?file=true) A plausible workaround is to upgrade devserver to devtoolset-8, but that would in turn causes CUDA build to complain: ``` /usr/local/cuda/bin/../targets/x86_64-linux/include/crt/host_config.h:119:2: error: #error -- unsupported GNU version! gcc versions later than 7 are not supported! #error -- unsupported GNU version! gcc versions later than 7 are not supported! ``` (Thanks pritamdamania87 for the report!) The solution for now is to revert the mpark::variant addition, and I will find alternatives that will work with gcc 7.3.1 on fb devserver. Pull Request resolved: https://github.com/pytorch/pytorch/pull/27277 Differential Revision: D17739804 fbshipit-source-id: ad945b3d86ab7ddbff58f4ecab95e0e1ac725ae9 * Implement LpNorm regularizer to be used on the inputs for feature importance (#26376) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26376 * Create the new dense_feature_reg (FCInputLpNorm) for feature importance to be applied to the fully-connected layer for feature-importance. Test Plan: * Unit test located in: `caffe2/caffe2/fb/dper/layer_models/tests/split_1/sparse_nn_test.py` Reviewed By: un-disclosed Differential Revision: D17360361 fbshipit-source-id: 1a0e119eeb17199a13dfffe58b3036ea4255e301 * Provide (but skip) 3.5 job by default on all PRs. (#27293) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27293 This doesn't turn on 3.5 signal, but it makes it so that [test all] will include it if you do request it. Signed-off-by: Edward Z. 
Yang <ezyang@fb.com> Test Plan: Imported from OSS Differential Revision: D17738741 Pulled By: ezyang fbshipit-source-id: 2b1af4d7bf26fd84a593fde292d6bfa2aabc1148 * more profiler changes in C++ before enabling checkScript changes Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26909 Differential Revision: D17683632 Pulled By: Krovatkin fbshipit-source-id: 5d36c3c4cf7411c56485ef19fe59262b9f8b45b2 * Fix segfault while printing value type for an error msg in emitListComprehension Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27261 Differential Revision: D17740159 Pulled By: Krovatkin fbshipit-source-id: 90439282aea14d8634eb41ffece5b6320d615fa7 * Factored out the default mappings Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27164 Test Plan: Imported from OSS Differential Revision: D17694475 Pulled By: zafartahirov fbshipit-source-id: df8df5f7d66062ed35da957064a31344e1d3c961 * Add memory format argument to the `clone` operator (#27106) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27106 Adds memory_format option to the `clone` operator. Introduce new `clone` behavior if used with `input_t.clone(memory_format=torch.preserve_format)`: 1) If tensor is non-overlapping and dense - output tensor will have the same strides as input tensor. 2) If not (1) and tensor is stored in the channels last format, output tensor going to have channels last format. 3) Output tensor is going to be contiguous in all other cases. --- Dense tensor is the tensor that store values in a contiguous block of memory. Non-overlapping tensor is the tensor in which elements occupy individual non-repetitive memory. 
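The three `preserve_format` rules above can be sketched in plain Python on sizes and strides alone. This is a minimal illustration under stated assumptions, not the ATen implementation: all function names are hypothetical, strides are counted in elements, zero-sized tensors are ignored, and `looks_channels_last` is a simplified stand-in for whatever check the real code uses to decide that a 4-D tensor is "stored in the channels last format".

```python
from typing import Sequence, Tuple


def contiguous_strides(sizes: Sequence[int]) -> Tuple[int, ...]:
    """Row-major (contiguous) strides, in elements, for the given sizes."""
    strides = []
    running = 1
    for size in reversed(sizes):
        strides.append(running)
        running *= size
    return tuple(reversed(strides))


def channels_last_strides(sizes: Sequence[int]) -> Tuple[int, ...]:
    """Strides of an NCHW-shaped tensor whose memory is laid out as NHWC."""
    n, c, h, w = sizes
    return (h * w * c, 1, w * c, c)


def is_non_overlapping_and_dense(sizes, strides) -> bool:
    """True if the elements fill one gap-free block and no address repeats."""
    # Visit dimensions from smallest to largest stride, skipping size-1 dims.
    dims = sorted((d for d in range(len(sizes)) if sizes[d] != 1),
                  key=lambda d: strides[d])
    expected = 1
    for d in dims:
        if strides[d] != expected:
            return False
        expected *= sizes[d]
    return True


def looks_channels_last(sizes, strides) -> bool:
    """Hypothetical, simplified check: 4-D with stride order C < W < H < N."""
    return (len(sizes) == 4
            and strides[1] < strides[3] < strides[2] < strides[0])


def preserve_format_strides(sizes, strides) -> Tuple[int, ...]:
    """Output strides following rules 1) to 3) from the commit message."""
    if is_non_overlapping_and_dense(sizes, strides):
        return tuple(strides)                 # rule 1: keep input strides
    if looks_channels_last(sizes, strides):
        return channels_last_strides(sizes)   # rule 2: channels last output
    return contiguous_strides(sizes)          # rule 3: contiguous output
```

For example, a transposed 2-D tensor with sizes `(3, 2)` and strides `(1, 3)` is dense and non-overlapping, so rule 1 keeps its strides, while an expanded tensor with a zero stride overlaps and falls through to rule 3.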
Test Plan: Imported from OSS Differential Revision: D17699357 Pulled By: VitalyFedyunin fbshipit-source-id: 5ae1537c2aca1abf0bf1eec4416846129c156f66 * Extract version to version.txt (#27149) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27149 Extract version to version.txt and add reading version logic to setup.py and fb/torch_version.py ghstack-source-id: 91271883 Test Plan: N/A Reviewed By: gchanan, ezyang Differential Revision: D17689307 fbshipit-source-id: 21899502027cec71b63d9dc151e09ff5ff3f279d * add AutoNonVariableTypeMode for USE_STATIC_DISPATCH on JIT->ATen path (#27274) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27274 This is yet another fix to address #26764. PR #26908 toggles NonVariableTypeMode in ATen dispatcher, which is where USE_STATIC_DISPATCH takes place thus it's most logically sound place to do such tweaks. However, we observed nontrivial perf regression due to this fix. Turns out the numel() tensor method gets called in several for-loops thus incurs ~7M thread_local updates in a single forward call: ``` 7173330 numel 558 size 416 q_scale 302 _empty_affine_quantized 288 contiguous 257 q_zero_point 216 qscheme 173 empty 110 set_ 105 as_strided 104 permute ... ``` As numel() is not called from a single place so a natural workaround is to update function_wrapper.py so that it only adds the guard on gen_namespace_function() case and ignore the gen_tensor_method() case. But some tensor methods are actually being called from JIT side directly (e.g. "aten::eq_" -> "(self).eq_") so the only "band aid" left on the table is to insert guard on JIT->aten path as originally did on #26868 - this is a simplified version of it as it doesn't hurt to extend the NonVariableMode scope a little bit to also cover stack drop/pack calls. On Android we only expose JIT API so we don't need worry about TensorMethods being called directly. 
On iOS we don't provide a wrapper yet but we can mention this caveat in the doc. Hopefully by the time it's widely used we can finish Variable/Tensor unification and remove all these hacks. Test Plan: - Verified it runs quantized/fp32 MobileNetV2 models; - Verified it fixes the perf regression (revert #26908 separately); Differential Revision: D17732489 Pulled By: ljk53 fbshipit-source-id: c14ca66aebc6b6f17ad6efac7ca47f9487c98de5 * Updating submodules Summary: GitHub commits: https://github.com/pytorch/fbgemm/commit/8786c0819029c076b0e28320e880ba3ac192ea8b Test Plan: n/a Reviewed By: zpao fbshipit-source-id: 9c04a2ba7cc2166db0203f186ece261ca8b186dd * Avoid calling tensor.numel() in for loops (#27298) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27298 PR #26908 toggles NonVariableTypeMode in ATen dispatcher, which is where USE_STATIC_DISPATCH takes place. This causes an issue with numel() as it gets called through the dispatch mode and probably not getting inlined. Also the thread local state is expensive to read/write so many times and this kills perf. PR #27274 is another approach to fix this and has more details. Test Plan: Quantized mobilenetV2 perf before this change Main run finished. Milliseconds per iter: 28.6782. Iters per second: 34.8696 Perf after this change Main run finished. Milliseconds per iter: 22.2585. 
Iters per second: 44.9267 Imported from OSS Differential Revision: D17742565 fbshipit-source-id: 43c6045cc001c46916ba339555c9d809a2537eff * Fix circle CI Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27307 Test Plan: Imported from OSS Differential Revision: D17746444 Pulled By: xta0 fbshipit-source-id: ed37f91921f1ea7db6c63ba69f04883856341c39 * Update the link for iOS demo app in README.md (#27145) Summary: Update the link for iOS demo app in README.md Pull Request resolved: https://github.com/pytorch/pytorch/pull/27145 Differential Revision: D17746591 Pulled By: xta0 fbshipit-source-id: 6f49a0daddc8b79804e1b8487ba1db3807a3f481 * Allow use cpu_serial_kernel with void-lambda (#27271) Summary: Currently we use CPU_tensor_apply1 to loop through the tensor in single thread and aggregate data: ``` // compute variance per input accscalar_t var_sum = 0; CPU_tensor_apply1<scalar_t>(in, [&] (const scalar_t& i) { var_sum += (i - mean) * (i - mean); }); ``` and we don't have the ability to use TensorIterator for this. ``` accscalar_t var_sum = 0; auto iter = TensorIterator::unary_op(self, self); cpu_serial_kernel(iter, [&](scalar_t i) -> scalar_t { var_sum += (i - mean) * (i - mean); return a; //Unable to set value back, because self should be const }); ``` This PR should resolve this problem and allow to use void-lambda: ``` auto iter = at::TensorIterator(); iter.add_input(in); iter.build(); accscalar_t var_sum = 0; \ at::native::cpu_serial_kernel(iter, [&](scalar_t i) -> void { var_sum += (i - mean) * (i - mean); }); ``` In the future it make sense to change Reduction part and allow to reduce to a scalar, not just to a tensor Pull Request resolved: https://github.com/pytorch/pytorch/pull/27271 Differential Revision: D17743310 Pulled By: ifedan fbshipit-source-id: a149751f2d671aefd3ed84bd50b2c0543a63b701 * Move the CUDA implementation of log10 to ATen. 
(#26733)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26733
Close #24587
Test Plan: Imported from OSS
Differential Revision: D17606981
Pulled By: VitalyFedyunin
fbshipit-source-id: 732f07b981287da3ca235b272b7b6f78144f8ebe
* Mention magma-cuda101 package in install instructions (#27325)
Summary: There is a magma package for the newest CUDA version (10.1). Mention it here lest someone mistakenly use the version for CUDA 10.0.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27325
Differential Revision: D17749535
Pulled By: soumith
fbshipit-source-id: 2d34a7af1218e6157935bfd5e03f4d2c0f00f200
* C++ API parity: TensorTest.BackwardNonScalarOutputs
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27314
Test Plan: Imported from OSS
Differential Revision: D17746371
Pulled By: pbelevich
fbshipit-source-id: 246fae22a60ed9a6d7b9843239b4b3391cc9dc3e
* Fix build (#27318)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27318
Fix TBB build
USE_TBB=1 ATEN_THREADING=TBB python setup.py develop install --cmake
Test Plan: Imported from OSS
Differential Revision: D17747449
Pulled By: ilia-cher
fbshipit-source-id: 421f362bd10f3be34bffe86ae4f26e8f1c15f1a4
* Relax restrictions on set_num_threads (#27190)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27190
Allow set_num_threads to be called multiple times when the TBB parallel backend is used.
Test Plan:
BUILD_BINARY=1 USE_TBB=1 ATEN_THREADING=TBB python setup.py develop install --cmake
./build/bin/test_parallel
./build/bin/thread_init_test
Reviewed By: kostmo
Differential Revision: D17704236
Pulled By: ilia-cher
fbshipit-source-id: 274380795e78ba417301c5faa18c9e9d3198bd5e
* Migrate the cpu and gpu implementations of resize nearest 3D from vision to caffe2
Summary: As title.
Fix the build failures in unicorn-build-restrictions as discussed in D17330625
Test Plan: buck test mode/opt caffe2/caffe2/quantization/server:resize_nearest_3d_dnnlowp_op_test
In vision libs, there is no need to explicitly add a dep on the resize 3d op, as the caffe2_cpu dep is added by default.
Reviewed By: stephenyan1231
Differential Revision: D17676082
fbshipit-source-id: c034ab67a9078f72077b396991ffb9e54e6ab40b
* Add method add_hparams to API doc (#27344)
Summary: Adds the method `add_hparams` to `torch.utils.tensorboard` API docs. Will want to have this in PyTorch 1.3 release. cc sanekmelnikov lanpa natalialunova
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27344
Differential Revision: D17753689
Pulled By: orionr
fbshipit-source-id: cc8636e0bdcf3f434444cd29471c62105491039d
* Support interface python assignment as an attribute (#26734)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26734
This PR adds Python assignment of an interface as a module attribute. It allows any object that implicitly implements the specific interface to be assigned to the interface type in Python. Serialization support for interface/class assignment will be done in a follow-up PR.
Test Plan: Imported from OSS
Differential Revision: D17742708
Pulled By: wanchaol
fbshipit-source-id: a0a2d8c74b60ed3fa6c05e1b0d49b7ad1abc670b
* Skip tests that use numpy if it's not present
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27165
Pulled By: driazati
Differential Revision: D17695078
fbshipit-source-id: d25c920f4c43285028537f88761d47a2c9db7b8f
* Add Python RRef as args and return value (#25499)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25499
See #23110 for model parallel design details, and #26759 for the RRef protocol. This commit adds support for using RRefs as Python UDF arguments and return values. RRefs can now be shared from owner to user, from user to owner, or from user to user.
Limitations:
1. No implicit type conversion yet. (#27099)
2. No failure handling and retry. (#26116)
3. UDF is not yet blocked until all RRefs are confirmed. (#27098)
4. Internal RRef control messages are not idempotent yet. (#26116)
5. Cannot delete RRefs correctly when there are circular dependencies. (#27096)
Main changes:
1. Added `SCRIPT_REMOTE_CALL` and `PYTHON_REMOTE_CALL` to `Message.h` to represent `dist.remote` invocations.
2. Added `SCRIPT_RREF_FETCH_CALL`, `PYTHON_RREF_FETCH_CALL`, `RREF_USER_ACCEPT`, `RREF_USER_DELETE`, `RREF_CHILD_ACCEPT`, and `RREF_FORK_REQUEST` to `Message.h` as internal RRef control messages.
3. New message request handling code is added to `functions.cpp`, and the message formats are added in `script_remote_call.h`, `python_remote_call.h`, and `rref_proto.h`.
4. Added a `PyRRef` type in `py_rref.h` and `py_rref.cpp` which holds a shared pointer to the C++ `RRef` type. `PyRRef` wraps the C++ API and also implements RRef pickling and unpickling. RRef fork related control messages are sent during RRef pickling/unpickling.
5. Updated `RRef.h` and `RRef.cpp` accordingly to support `py::object` RRefs.
6. RRef context (reference count, etc.) is tracked in `rref_context.h` and `rref_context.cpp`.
Test Plan: Imported from OSS
buck test mode/dev-nosan //caffe2/test:rpc_fork
Differential Revision: D17184146
Pulled By: mrshenli
fbshipit-source-id: a3a268efc087ac1ef489136ab957080382629265
* Set MINIZ_NO_TIME to avoid computing localtime on each pickle/unpickle (#27268)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27268
For small pickle/unpickle, we spend a disproportionate amount of time in time functions: roughly 23% in __tzset() for the unpickle case. We're not currently using `.m_time`, though we can add this feature back if it's ever needed.
An alternative would be to pass -DMINIZ_NO_TIME in compiler_flags, but we would also need to consistently `#define MINIZ_NO_TIME` in any .cpp that includes this .h, since this `#define` changes the struct length in an unfortunate manner.
Test Plan: buck test mode/dev-nosan caffe2/test/...
Run benchmark: buck-out/opt/gen/caffe2/torch/fb/distributed/thriftRpcBackend/test/ThriftRpcAgentBench
Differential Revision: D17724198
fbshipit-source-id: b44a0217b1d9f8ce6c0f24297f59045c7cadf4b1
* Add a test case to RpcTest, check src/dst (#27322)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27322
# Problem
Existing test cases are too symmetric, so they didn't detect this error: a request sent to the wrong worker. Because of a wrong `worker_names` setup, worker0 sent the request to itself when it should have sent it to worker1.
# Solution
Add a test case that lets the dst side check whether the request came from the expected src.
ghstack-source-id: 91299312
Reviewed By: satgera
Differential Revision: D17069062
fbshipit-source-id: ef7a532dd497bfc0f0ee8446fcd5d29656aaf175
* Update to ROCm 2.8 (#27337)
Summary: New docker images built with tag 324.
Related jenkins changes: https://github.com/pytorch/ossci-job-dsl/commit/83ec81335742e66b02af90b7c74021b8792fc63f https://github.com/pytorch/ossci-job-dsl/commit/aa235a14c82db69d0544cd8fc1da03ef9a50096e Triggered CI runs: https://ci.pytorch.org/jenkins/job/caffe2-builds/job/py2-devtoolset7-rocmrpm-centos7.5-trigger-test/48682/ https://ci.pytorch.org/jenkins/job/pytorch-builds/job/py2-clang7-rocmdeb-ubuntu16.04-trigger/55638/ Pull Request resolved: https://github.com/pytorch/pytorch/pull/27337 Differential Revision: D17753827 Pulled By: bddppq fbshipit-source-id: 2c3f77b0b7c680013c7cc6d7953fe0da4922fe48 * add sdk support for xcodebuild script Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27358 Test Plan: Imported from OSS Differential Revision: D17757389 Pulled By: xta0 fbshipit-source-id: ed8e470b9c6329b96297ee7c65ba08759251baad * export remainder (#24410) Summary: Added ONNX export support for torch.remainder and torch.fmod Pull Request resolved: https://github.com/pytorch/pytorch/pull/24410 Reviewed By: hl475 Differential Revision: D17466791 Pulled By: houseroad fbshipit-source-id: afe6519e5f370824e3b4a45b69036a7260fb72cf * Replacing the skip_list with white_list in the qconfig propagation Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27183 Test Plan: Imported from OSS Differential Revision: D17700548 Pulled By: zafartahirov fbshipit-source-id: 18e6ffbda496b14ac1da1783f928ad539cdb1d16 * Show a warning that not all dir members of quantized work. (#27339) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27339 This PR just shows a warning message. 
Eventually we will show a correct __dir__ Test Plan: Imported from OSS Differential Revision: D17751333 Pulled By: zafartahirov fbshipit-source-id: e9bc62fd8dd0147979291d0aac3f1afe5b8c7a9f * improve error messages when a method or attribute is missing (#27110) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27110 Previously missing methods on some types like tensors would talk about 'builtins' which are only a thing inside of the compiler. Furthermore, the error would only occur when the builtin was applied and it was discovered that no builtin existed. This changes the error message so that it discovers that method on our builtin types does not exist on attribute lookup. Test Plan: Imported from OSS Differential Revision: D17677616 Pulled By: zdevito fbshipit-source-id: 2f7cf6c6093a9c832569c44f4b1044a2e56fe205 * refactor extra sugared values (#26270) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26270 We've accumulated a lot of sugared values whose only purpose is to be instanced-checked against in emitApplyExpr. I need to add another one to insert an unchecked_cast, and do not want to continue the pattern. This creates an abstraction for this concept (SpecialFormValue), and removes all the unneeded sugared values. There is no functionality change here just a bunch of code movement in compiler.cpp Test Plan: Imported from OSS Differential Revision: D17412854 Pulled By: zdevito fbshipit-source-id: 15877c91decaea5a00d1fe737ed2d0f0f8a79a28 * Minor readability fixes to C++ documentation (#27338) Summary: Changed `yieldings` to `yielding`. 
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27338 Differential Revision: D17758406 Pulled By: yf225 fbshipit-source-id: 1633834a6ad80449c061ebc330ac24f3e42f5506 * Choose num_threads in parallel_for based on GRAIN_SIZE (#26963) Summary: Fixes https://github.com/pytorch/pytorch/issues/24080, Continuation of https://github.com/pytorch/pytorch/issues/26886 What soumith said in https://github.com/pytorch/pytorch/pull/26886#issuecomment-535760635 seems plausible > I wonder if it has to do with `#pragma omp parallel num_threads(num_threads)` which has unintended consequences, where even if `num_threads=1`, entering an omp block inside an omp block results in bad behavior. I know for a fact that gcc's openmp doesn't start the thread pool when given `num_threads(1)` but it seems clang behaves differently. Pull Request resolved: https://github.com/pytorch/pytorch/pull/26963 Differential Revision: D17626981 Pulled By: soumith fbshipit-source-id: 484ffe6cc172382bb5ff49ce1fceda7eba20a512 * Enable Python3.6 PyTorch ROCm CI Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27353 Differential Revision: D17758495 Pulled By: bddppq fbshipit-source-id: 95e329bc30f092e4093a33c408f1647b803d9983 * Fixes PackedSequence.to (and unifies PackedSequence conversions) (#27245) Summary: PackedSequence.to(device) incorrectly places one of three tensors on the device and leaves the other two tensors where they are. If these devices are distinct then further operations on PackedSequence will fail. This behavior is inconsistent with Tensor.to and PackedSequence's behavior when .cuda() is called. Additionally, PackedSequence defines multiple other conversion functions that were independently and inconsistently implemented. This PR unifies all implementations and makes the PackedSequence.to behavior more consistent with Tensor.to. It is not completely consistent per comments. test_device_mask in test_nn.py is updated to validate the new functionality. 
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27245
Differential Revision: D17757850
Pulled By: mruberry
fbshipit-source-id: 58f0bd40f1aa300fb0a91ee743483d645f977dc5
* Makes test_cuda.py's generated tensor op tests generic (#27210)
Summary:
- The tensor op tests generated in test_cuda.py are now generic and appear in test_torch.py
- Data previously held in auxiliary data structures and files, like test_cuda_ignores.txt, is inlined

Previously the tensor op tests used several auxiliary data structures, a file, and exception handling to filter the test suite. If a function wasn't implemented, for example, that exception would be caught. This let functions like trigamma, which isn't callable, appear to be tested. See https://github.com/pytorch/pytorch/issues/27230. Filtering from additional data stores is error prone, too. It requires developers to understand which data stores are used and how they're used. The existing sources are also sometimes incorrect. The txt file claims that dist_ doesn't work on half tensors, for example, but the updated tests verify it does.
In addition to making these tests generic, this PR removes those auxiliary data structures and does not catch any exceptions. Exceptions are errors. (This also means that if something implemented breaks it will now report as an error. Previously the test suite would have reported a pass.)
The test infrastructure was also simplified to not perform computations with CPU half tensors since they do not support many operations. This introduces a float<->half conversion quirk but eliminates awkward functions that would first convert cpu tensors to float, perform an operation, and convert them back.
With this change test_cuda.py is almost entirely CUDA-specific.
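A toy illustration of the masking problem described above, assuming nothing about the real test harness: the op table, `trigamma_stub`, and both runner functions are hypothetical names invented for this sketch. The first runner treats any exception as "filtered out", so an unimplemented op is indistinguishable from a passing one; the second reports the exception as an error.

```python
def double(x):
    """A stand-in for an op that works."""
    return 2 * x


def trigamma_stub(x):
    """Hypothetical stand-in for an op that is not callable in this setup."""
    raise NotImplementedError("not implemented for this input")


OPS = {"double": double, "trigamma": trigamma_stub}


def run_swallowing(ops, value):
    """Old-style runner: exceptions are caught and the test still 'passes'."""
    results = {}
    for name, fn in ops.items():
        try:
            fn(value)
        except Exception:
            pass  # the failure is silently hidden
        results[name] = "pass"
    return results


def run_strict(ops, value):
    """Strict runner: any exception is recorded as an error."""
    results = {}
    for name, fn in ops.items():
        try:
            fn(value)
            results[name] = "pass"
        except Exception as exc:
            results[name] = "error: " + type(exc).__name__
    return results
```

With `OPS` as defined here, `run_swallowing` reports `trigamma` as passing, while `run_strict` reports `error: NotImplementedError` for it.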
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27210
Differential Revision: D17757907
Pulled By: mruberry
fbshipit-source-id: b3c191c379667b1a7d5361087bdf82f397f77f65
* Remove six dependency (#27282)
Summary: https://github.com/pytorch/pytorch/pull/27136 added a dependency on `six`, which is not available by default and is not marked as a dependency on PyTorch binaries, causing torchvision CI to break. See https://circleci.com/gh/pytorch/vision/20778?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link for an example. This PR uses `torch._six` instead of `six` as a replacement.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27282
Reviewed By: lerks
Differential Revision: D17737561
Pulled By: fmassa
fbshipit-source-id: 7dcd0cc2c8bab27b8f4535f664f60388818d3497
* Make `align_to` method-only. (#27304)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27304
The ellipsis version of `align_to` only works if it is called as a method. To prevent any confusion, this PR disables `torch.align_to` (but keeps `Tensor.align_to`).
Test Plan: - [namedtensor ci] Differential Revision: D17743809 Pulled By: zou3519 fbshipit-source-id: cf5c53dcf45ba244f61bb1e00e4853de5db6c241 * Remove CUDA_VERSION from Python script (which has already been detected in CMake) (#27316) Summary: (Intentionally left blank) Pull Request resolved: https://github.com/pytorch/pytorch/pull/27316 Differential Revision: D17762715 Pulled By: ezyang fbshipit-source-id: 044c0ea6e8c2d12912c946a9a50b934b5253d8c8 * Revert D17743310: [pytorch][PR] Allow use cpu_serial_kernel with void-lambda Test Plan: revert-hammer Differential Revision: D17743310 Original commit changeset: a149751f2d67 fbshipit-source-id: 043240201d67966dd08b7b1bc2f9bf4897923e00 * Implement pickle support for sparse tensors and torch.layout instances (#27062) Summary: Resolves issue https://github.com/pytorch/pytorch/issues/16667 and https://github.com/OpenMined/PySyft/issues/2326 Pull Request resolved: https://github.com/pytorch/pytorch/pull/27062 Differential Revision: D17762932 Pulled By: ezyang fbshipit-source-id: dd99c1f4ac8eb2286eb55aa20ce973f60ce7b7e1 * move new_zeros to core from THP (#26511) Summary: Fix for issue https://github.com/pytorch/pytorch/issues/25831 ezyang can you please have a look? Pull Request resolved: https://github.com/pytorch/pytorch/pull/26511 Differential Revision: D17763037 Pulled By: ezyang fbshipit-source-id: 3596c01c4ab421e7785d6055cc813806f840a5c7 * autograd: double backwards function for binary_cross_entropy loss Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26983 Reviewed By: albanD Differential Revision: D17714357 Pulled By: anjali411 fbshipit-source-id: cebfe09a9048c4be457b7f2718bc396c06ecabee * Change schedulers to chainable form (#26423) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26423 Enable chainable schedulers as requested in #13022 by implementing the changes mentioned below from [comment](https://github.com/pytorch/pytorch/pull/21800#issuecomment-513370208). 
* Changing the behavior of schedulers to the chainable formula when available
* Using the closed form whenever epoch is different from None until the next release with a deprecation warning
* Making `get_computed_values` the supported way of obtaining the last computed learning rate by the scheduler (see [comment](https://github.com/pytorch/pytorch/pull/21800#issuecomment-513940729) for new syntax)
* Emitting a deprecation warning when invoking the undocumented get_lr function (see [comment](https://github.com/pytorch/pytorch/pull/21800#discussion_r294305485)) that refers to `get_computed_values`, and deprecating it in the next release.
* `CosineAnnealingWarmRestarts` still takes an epoch parameter, as it is the only scheduler whose mechanics rely on a fractional epoch
* `MultiplicativeLR` consumes a function providing the multiplicative factor at each epoch. It mimics `LambdaLR` in its syntax.

# #20527

### Before

The user calls the scheduler with a constant epoch, either across loops or in the same loop.
```
import torch.optim as optim
from torch import nn

conv = nn.Conv2d(3,3,3)
optimizer = optim.Adam(conv.parameters())
lr_scheduler = optim.lr_scheduler.StepLR(optimizer, 2)

# Scheduler with sometimes-constant epoch number
for epoch in [0, 0, 1, 1, 2, 2, 3, 3]:
    lr_scheduler.step(epoch)
    print(optimizer.param_groups[0]['lr'])
```

### After

If the user wants to step only when the epoch number changes, they track the last epoch manually:

```
import torch.optim as optim
from torch import nn

conv = nn.Conv2d(3,3,3)
optimizer = optim.Adam(conv.parameters())
lr_scheduler = optim.lr_scheduler.StepLR(optimizer, 2)

last_epoch = -1
for epoch in [0, 0, 1, 1, 2, 2, 3, 3]:
    # Check if epoch number has changed manually
    if epoch-last_epoch > 0:
        lr_scheduler.step()
    last_epoch = epoch
    print(epoch, lr_scheduler.get_computed_values())
```

# #22107

### Before

```
import torch
from torchvision.models import resnet18

net = resnet18()
optimizer = torch.optim.SGD(net.parameters(), 0.1)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[3, 6, 9], gamma=0.1)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, 3, gamma=0.1)

for i in range(10):
    # Scheduler computes and returns new learning rate, leading to unexpected behavior
    print(i, scheduler.get_lr())
    scheduler.step()
```

### After

```
import torch
from torchvision.models import resnet18

net = resnet18()
optimizer = torch.optim.SGD(net.parameters(), 0.1)
lr_scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[3, 6, 9], gamma=0.1)
lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer, 3, gamma=0.1)

for i in range(10):
    # Returns last computed learning rate by scheduler
    print(i, lr_scheduler.get_computed_values())
    lr_scheduler.step()
```

# ghstack

This contains the changes from #24352. Opening again since they were reverted. This reverts commit 1c477b7e1f378e9c1f8efed296241f68a8a4372b.
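The difference between the "chainable" and "closed" forms mentioned in this commit can be sketched without PyTorch. This is a minimal illustration of the idea for a step-decay schedule, not the `torch.optim.lr_scheduler` code: the function names and the exact update rule are assumptions made for this example. The closed form computes the rate for an epoch from the base rate, while the chainable form only scales the previous rate, which is why several chainable schedules can be applied to the same value one after another.

```python
import math


def step_lr_closed_form(base_lr, gamma, step_size, epoch):
    """Closed form: the rate at `epoch` computed directly from the base rate."""
    return base_lr * gamma ** (epoch // step_size)


def step_lr_chainable(prev_lr, gamma, step_size, epoch):
    """Chainable form: the rate at `epoch` computed from the rate at epoch - 1."""
    if epoch == 0 or epoch % step_size != 0:
        return prev_lr
    return prev_lr * gamma


def run_chained(base_lr, schedules, num_epochs):
    """Apply every (gamma, step_size) schedule in turn to one shared rate."""
    lrs = []
    lr = base_lr
    for epoch in range(num_epochs):
        for gamma, step_size in schedules:
            lr = step_lr_chainable(lr, gamma, step_size, epoch)
        lrs.append(lr)
    return lrs


if __name__ == "__main__":
    # One schedule: the chainable updates reproduce the closed form.
    single = run_chained(0.1, [(0.1, 2)], 8)
    for epoch, lr in enumerate(single):
        assert math.isclose(lr, step_lr_closed_form(0.1, 0.1, 2, epoch))

    # Two schedules on the same rate compose multiplicatively.
    both = run_chained(0.1, [(0.1, 2), (0.5, 3)], 8)
    print(both)
```

Because each chainable update only looks at the previous value, the two-schedule run ends up equal to the product of the two closed-form factors; two closed-form schedulers writing to the same rate would instead overwrite each other.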
Test Plan: Imported from OSS Differential Revision: D17460427 Pulled By: vincentqb fbshipit-source-id: 8c10f4e7246d6756ac91df734e8bed65bdef63c9 * Make RpcTest re-usable by other RPC backends by using init_method to initialize a RPC backend (#27320) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27320 https://github.com/pytorch/pytorch/pull/27208/ # Problem Other RPC backends take init_method. # Solution Set up init_method in rpc tests. ghstack-source-id: 91335127 Differential Revision: D17709219 fbshipit-source-id: 3184c6e9b922a6ff9f4d1cb9abfa118b23f43eeb * Add OPN instruction and vararg operator table (#27104) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27104 * The use case here is to replace prim::ListConstruct, which requires Node, but Node is not available in mobile lite interpreter. * (OPN, X, N), X is the index to the vararg operator-name and operator tables. N is number of inputs. For ListConstruct example, operator name can be "aten::listconstruct" and the overloaded name is the output type ("int", "float", "bool", "tensor" and "generic"). * A vararg operator table is built with void(int input_size, Stack& stack) functions. ## Unit test LiteInterpreterConv covers OPN instruction and conv operator. Test Plan: Imported from OSS Differential Revision: D17762853 fbshipit-source-id: 475aa0c6678e3760cec805862a78510913a89c83 * Allow use cpu_serial_kernel with void-lambda (#27370) Summary: https://github.com/pytorch/pytorch/pull/27271 Pull Request resolved: https://github.com/pytorch/pytorch/pull/27370 Differential Revision: D17763265 Pulled By: ifedan fbshipit-source-id: d670560dfc555db529b18c01aa42f0ccb2127889 * From docs of scatter_add_() removed erroneous comment on uniqueness of indices. 
(#27132) Summary: Fixes https://github.com/pytorch/pytorch/issues/27080 Pull Request resolved: https://github.com/pytorch/pytorch/pull/27132 Differential Revision: D17765307 Pulled By: soumith fbshipit-source-id: b0892ff442f3b49f8e3cdf029e2a08b51fa88f28 * Reduce error context from 10 -> 3 (#26765) Summary: 10 lines of error context (on both sides) is overkill, especially now that we have line numbers. With a compilation stack of a couple functions, it becomes a pain to scroll to the top of the stack to see the real error every time. This also fixes class names in the compilation stack to a format of `ClassName.method_name` instead of the the full qualified name Old output ``` clip_boxes_to_image(Tensor boxes, (int, int) size) -> (Tensor): Expected a value of type 'Tuple[int, int]' for argument 'size' but instead found type 'Tuple[int, int, int]'. : at /home/davidriazati/dev/vision/torchvision/models/detection/rpn.py:365:20 top_n_idx = self._get_top_n_idx(objectness, num_anchors_per_level) batch_idx = torch.arange(num_images, device=device)[:, None] objectness = objectness[batch_idx, top_n_idx] levels = levels[batch_idx, top_n_idx] proposals = proposals[batch_idx, top_n_idx] final_boxes = [] final_scores = [] for boxes, scores, lvl, img_shape in zip(proposals, objectness, levels, image_shapes): boxes = box_ops.clip_boxes_to_image(boxes, img_shape) ~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE keep = box_ops.remove_small_boxes(boxes, self.min_size) boxes, scores, lvl = boxes[keep], scores[keep], lvl[keep] # non-maximum suppression, independently done per level keep = box_ops.batched_nms(boxes, scores, lvl, self.nms_thresh) # keep only topk scoring predictions keep = keep[:self.post_nms_top_n] boxes, scores = boxes[keep], scores[keep] final_boxes.append(boxes) final_scores.append(scores) 'RegionProposalNetwork.filter_proposals' is being compiled since it was called from 'RegionProposalNetwork.forward' at /home/davidriazati/dev/vision/torchvision/models/detection/rpn.py:446:8 
num_images = len(anchors) num_anchors_per_level = [o[0].numel() for o in objectness] objectness, pred_bbox_deltas = \ concat_box_prediction_layers(objectness, pred_bbox_deltas) # apply pred_bbox_deltas to anchors to obtain the decoded proposals # note that we detach the deltas because Faster R-CNN do not backprop through # the proposals proposals = self.box_coder.decode(pred_bbox_deltas.detach(), anchors) proposals = proposals.view(num_images, -1, 4) boxes, scores = self.filter_proposals(proposals, objectness, images.image_sizes, num_anchors_per_level) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE losses = {} if self.training: assert targets is not None labels, matched_gt_boxes = self.assign_targets_to_anchors(anchors, targets) regression_targets = self.box_coder.encode(matched_gt_boxes, anchors) loss_objectness, loss_rpn_box_reg = self.compute_loss( objectness, pred_bbox_deltas, labels, regression_targets) losses = { 'RegionProposalNetwork.forward' is being compiled since it was called from 'MaskRCNN.forward' at /home/davidriazati/dev/vision/torchvision/models/detection/generalized_rcnn.py:53:8 """ if self.training and targets is None: raise ValueError("In training mode, targets should be passed") original_image_sizes = [(img.shape[-2], img.shape[-3]) for img in images] images, targets = self.transform(images, targets) features = self.backbone(images.tensors) if isinstance(features, torch.Tensor): features = OrderedDict([(0, features)]) proposals, proposal_losses = self.rpn(images, features, targets) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE detections, detector_losses = self.roi_heads(features, proposals, images.image_sizes, targets) detections = self.transform.postprocess(detections, images.image_sizes, original_image_sizes) losses = {} losses.update(detector_losses) losses.update(proposal_losses) # TODO: multiple return types?? 
# if self.training: ``` New output ``` RuntimeError: clip_boxes_to_image(Tensor boxes, (int, int) size) -> (Tensor): Expected a value of type 'Tuple[int, int]' for argument 'size' but instead found type 'Tuple[int, int, int]'. : at /home/davidriazati/dev/vision/torchvision/models/detection/rpn.py:365:20 final_scores = [] for boxes, scores, lvl, img_shape in zip(proposals, objectness, levels, image_shapes): boxes = box_ops.clip_boxes_to_image(boxes, img_shape) ~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE keep = box_ops.remove_small_boxes(boxes, self.min_size) boxes, scores, lvl = boxes[keep], scores[keep], lvl[keep] 'RegionProposalNetwork.filter_proposals' is being compiled since it was called from 'RegionProposalNetwork.forward' at /home/davidriazati/dev/vision/torchvision/models/detection/rpn.py:446:8 proposals = self.box_coder.decode(pred_bbox_deltas.detach(), anchors) proposals = proposals.view(num_images, -1, 4) boxes, scores = self.filter_proposals(proposals, objectness, images.image_sizes, num_anchors_per_level) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE losses = {} 'RegionProposalNetwork.forward' is being compiled since it was called from 'MaskRCNN.forward' at /home/davidriazati/dev/vision/torchvision/models/detection/generalized_rcnn.py:53:8 if isinstance(features, torch.Tensor): features = OrderedDict([(0, features)]) proposals, proposal_losses = self.rpn(images, features, targets) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE detections, detector_losses = self.roi_heads(features, proposals, images.image_sizes, targets) detections = self.transform.postprocess ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/26765 Pulled By: driazati Differential Revision: D17560963 fbshipit-source-id: e463548744b505ca17f0158079b80e08fda47d49 * Fix some return std::move warnings (#27384) Summary: clang-tidy was 
complaining about these Pull Request resolved: https://github.com/pytorch/pytorch/pull/27384 Pulled By: driazati Differential Revision: D17767412 fbshipit-source-id: 03e2630790edf3f6bbf9064e754156613032b464 * add function to get nccl version for error messages (#27068) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27068 Adds a function that uses ncclGetVersion from the NCCL API to retrieve the NCCL version. Converts it into a readable string, and is called in NCCL-related error messages to log the NCCL version. Hopefully this will help with debugging NCCL errors. Test Plan: Modify C10D_NCCL_CHECK in NCCLUtils.hpp to always error by setting ncclResult_t error = ncclSystemError force an NCCL error with script test/simulate_nccl_errors.py: Start master node: python test/simulate_nccl_errors.py localhost 9124 0 2 Start other node: python test/simulate_nccl_errors.py localhost 9124 1 2 On the master node, should see the following error message w/NCCL version: ``` Traceback (most recent call last): File "simulate_nccl_errors.py", line 29, in <module> process_group.allreduce(torch.rand(10).cuda(rank)).wait() RuntimeError: NCCL error in: ../torch/lib/c10d/ProcessGroupNCCL.cpp:375, unhandled system error, NCCL version 2.4.8 ``` Differential Revision: D17639476 fbshipit-source-id: a2f558ad9e883b6be173cfe758ec56cf140bc1ee * C++ API parity: Hardtanh Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27038 Test Plan: Imported from OSS Differential Revision: D17682405 Pulled By: pbelevich fbshipit-source-id: f65e76696e0041c3518f56da94f2e3b800305234 * fix OSX CI build (#27373) Summary: fix OSX caffe2 CI build, attempt 1 Pull Request resolved: https://github.com/pytorch/pytorch/pull/27373 Differential Revision: D17768461 Pulled By: soumith fbshipit-source-id: b0a076c07382327730b5d86b8a00f5388c368b5e * ProcessGroupNCCL should respect timeout passed in to init_process_group. 
(#27224) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27224 As part of adding error handling to NCCL, we are now able to specify a timeout for operations using ProcessGroupNCCL. However, this timeout defaulted to 10 seconds and did not respect the timeout specified in init_process_group. This change passes the appropriate timeout to ProcessGroupNCCL. ghstack-source-id: 91283548 Test Plan: Added unit test to verify timeout passed in to init_process_group is respected. Differential Revision: D17717992 fbshipit-source-id: c73320187f1f3b2693ba1e177d80646e282d01a2 * Add clip_grad_norm_ to c++ api (#26140) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26140 Per https://github.com/pytorch/pytorch/issues/25883, we want to work towards C++/Python API parity. This diff adds clip_grad_norm_ to the C++ API to improve parity. ghstack-source-id: 91334333 Test Plan: Added a unit test Differential Revision: D17312367 fbshipit-source-id: 753ba3a4d084d01f3cc8919da3108e67c809ad65 * C++ API parity: LeakyReLU Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27059 Test Plan: Imported from OSS Differential Revision: D17682407 Pulled By: pbelevich fbshipit-source-id: 2a4f42e9438799ba8de7282ac7a6fd3ff97ee048 * Some hipify script cleanups (#27375) Summary: continue https://github.com/pytorch/pytorch/issues/26363 Pull Request resolved: https://github.com/pytorch/pytorch/pull/27375 Differential Revision: D17764992 Pulled By: bddppq fbshipit-source-id: ecc06521179677efcedb1d58ceda63df7d63627e * add some support for the occupancy API on ROCm (#27390) Summary: Unfortunately, the HIP function takes uint32_t* instead of int*, so we still need to ifdef for the time being. 
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27390 Differential Revision: D17768832 Pulled By: bddppq fbshipit-source-id: c65176660cb0783a04f0a4a064f686818d759589 * Add gfx908 to the list of per-default compiled architectures. (#27388) Summary: ROCm 2.8 added preliminary support for gfx908. Pull Request resolved: https://github.com/pytorch/pytorch/pull/27388 Differential Revision: D17767772 Pulled By: bddppq fbshipit-source-id: 172daf5bb66d3db86a13e287059af4b9b90a7f57 * Change nightly builds version to 1.4.0-SNAPSHOT (#27381) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27381 Changing android nightly builds from master to version 1.4.0-SNAPSHOT, as we also have 1.3.0-SNAPSHOT from the branch v1.3.0 Test Plan: Imported from OSS Differential Revision: D17773620 Pulled By: IvanKobzarev fbshipit-source-id: c39a1dbf5e06f79c25367c3bc602cc8ce42cd939 * Pickup proxy parameters for publishing (#27389) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27389 Pickup gradle proxy parameters (handy for publishing from devserver) in maven publishing gradle plugin Test Plan: Imported from OSS Differential Revision: D17773548 Pulled By: IvanKobzarev fbshipit-source-id: 662c0b2835e6cf1e4009da79e27268d4a19c2ceb * MovingAverage Observer (#27396) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27396 Observer that estimates moving averages of min and max values per batch, more suited for quantization aware training instead of minmax observers that track extremal values across batches ghstack-source-id: 91369018 Test Plan: buck test caffe2/test:quantization -- 'test_per_tensor_observers \(test_quantization\.ObserverTest\)' --print-passing-details buck test caffe2/test:quantization -- 'test_per_channel_observers \(test_quantization\.ObserverTest\)' --print-passing-details Differential Revision: D17727213 fbshipit-source-id: 024a890bf3dd0bf269d8bfe61f19871d027326f0 * Add methods to write image tensor 
content to buffer (#27359) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27359 Adding methods to TensorImageUtils: ``` bitmapToFloatBuffer(..., FloatBuffer outBuffer, int outBufferOffset) imageYUV420CenterCropToFloat32Tensor(..., FloatBuffer outBuffer, int outBufferOffset) ``` This makes it possible to - reuse a FloatBuffer for inference - create a batch Tensor (containing several images/bitmaps) Since the example demo app (image classification) now reuses the FloatBuffer, the profiler shows fewer memory allocations (previously every run created a new input tensor with a newly allocated FloatBuffer) and about 20 ms less per run on my Pixel XL. Known open question: at the moment every tensor element is written separately by calling `outBuffer.put()`, which is a native call crossing language boundaries. An alternative is to allocate a `float[]` on the Java side, fill it, and put it into `outBuffer` with one call, which reduces native calls but increases memory allocation on the Java side. Tested locally by just eyeballing durations; no big difference was noticed, so I decided to go with fewer memory allocations. It would be good to merge this into 1.3.0, but if not, the demo app can use snapshot dependencies with this change. PR with integration to demo app: https://github.com/pytorch/android-demo-app/pull/6 Test Plan: Imported from OSS Differential Revision: D17758621 Pulled By: IvanKobzarev fbshipit-source-id: b4f1a068789279002d7ecc0bc680111f781bf980 * add warning to dnnlowp fc if quantization kind is not min_max Summary: Print warning when using DNNLOWP dynamic int8 quant for FC and activation_quantization_kind != min_max. Warning will display in console but not in Bento. Would have to use CAFFE_ENFORCE to alert in Bento. Test Plan: buck run unit test forcing DNNLOWP FC with activation_quantization_kind = "l2" and saw warning printed in console. 
Reviewed By: csummersea Differential Revision: D17770921 fbshipit-source-id: b6532e4c9a86d74e3db4cb432735505d378a366e * Add interface/object serialization as module attribute (#26770) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26770 This PR adds interface/object serialization as a module attribute, to allow initializing an object as an interface type during Python initialization. Because an interface type can be backed by any class object that implements that interface, if we declare it in python/module.__init__, we need to collect the runtime types of the values and serialize them to ensure complete code information Test Plan: Imported from OSS Differential Revision: D17742707 fbshipit-source-id: 7f614ad4f982996d320a0e2dd3515bf47370e730 * Adding docstrings for nnq.functional Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27363 Test Plan: Imported from OSS Differential Revision: D17758907 Pulled By: zafartahirov fbshipit-source-id: f560f2726cf51ceebdbf22ebef2d067422340cf2 * Enable RCCL in ROCm build (#27383) Summary: continues https://github.com/pytorch/pytorch/pull/23884 Pull Request resolved: https://github.com/pytorch/pytorch/pull/27383 Differential Revision: D17767248 Pulled By: bddppq fbshipit-source-id: 3a506844ca6f01d7bbe8be5bde0976999e3a2b90 * Add randomFill to test_utils.h Summary: Add helper function randomFill to test_utils.h so we can use it in benchmark scripts as well as tests. Test Plan: ``` buck run mode/opt //tvm/sparse:cblas_bench ``` Reviewed By: yinghai Differential Revision: D17759193 fbshipit-source-id: e4909b04e83ca9382ab4718855fb63743d028de1 * Use deepcopy inputs for ONNX ort test cases (#27186) Summary: Running models with inplace operators will change values of input tensors. Deepcopy input tensors each time to keep the original input tensors intact. 
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27186 Differential Revision: D17776598 Pulled By: jerryzh168 fbshipit-source-id: d4808a11185a9ab0d782a62d7d708dfe7e94559c * Remove dependency on six from dist_autograd_test.py Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27369 Test Plan: Imported from OSS Differential Revision: D17763104 Pulled By: mrshenli fbshipit-source-id: dd146809686e7720f2b77012eebb6aed72851556 * Docstring fix (#27225) Summary: Correcting docstring for `add_image_with_boxes` method. Fixed spelling mistake. Pull Request resolved: https://github.com/pytorch/pytorch/pull/27225 Differential Revision: D17776604 Pulled By: jerryzh168 fbshipit-source-id: 45f69643ec3b58c46b9fb67411c42a6d09b7290e * Tweak docs on building docs Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27364 Differential Revision: D17777402 Pulled By: dzhulgakov fbshipit-source-id: 304c678e5c80d7f8c779d65c11f9bf1b0facdb52 * Upgrade to ROCm 2.9 (#27417) Summary: New docker images built with tag 325: https://ci.pytorch.org/jenkins/job/caffe2-docker-trigger/325 Related ossci-job-dsl commits: https://github.com/pytorch/ossci-job-dsl/commit/a00a76f927944aed961a3bbbc4f17aff0fc30d71 Pull Request resolved: https://github.com/pytorch/pytorch/pull/27417 Differential Revision: D17777517 Pulled By: bddppq fbshipit-source-id: a6b8cb86b37f537d402f6d2c7d28ad28a6a5a317 * enable rocTX API (#27416) Summary: ROCm 2.9 brings support for the rocTX API through rocTracer. 
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27416 Differential Revision: D17777480 Pulled By: bddppq fbshipit-source-id: 6bce9b54c94e5b4c5787570d2b85736882bd23a7 * C++ API parity: LogSigmoid Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27060 Test Plan: Imported from OSS Differential Revision: D17682404 Pulled By: pbelevich fbshipit-source-id: d60d64cd4caf1f56a2e05c516f91321d46ec9624 * Remove Tensor.h, TensorMethods.h from src/core. (#27086) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27086 This is a major source of merge conflicts, and AFAICT isn't necessary anymore (it may have been necessary for some mobile build stuff in the past). This is a commandeer of #25031 Test Plan: Imported from OSS Reviewed By: ljk53 Differential Revision: D17687345 Pulled By: ezyang fbshipit-source-id: bf6131af835ed1f9e3c10699c81d4454a240445f * Remove outdated note in cholesky_solve and triangular_solve doc strings (#26989) Summary: We do support inputs with dim > 2 in _out variants Pull Request resolved: https://github.com/pytorch/pytorch/pull/26989 Differential Revision: D17785632 Pulled By: soumith fbshipit-source-id: d42ba7ca9c225ad1a26ff3b410d0c5c08eaed001 * Disable tsan for test_multiprocessing. (#27410) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27410 Similar to https://github.com/pytorch/pytorch/pull/25005, TSAN is not safe to use in a multi-threaded program with fork and can cause deadlocks. As a result, disabling this test for TSAN. 
ghstack-source-id: 91393545 Test Plan: buildbot Differential Revision: D17775141 fbshipit-source-id: 109b8095240ad43ee4a6380f70b9efca863c0a4a * Unfold export (#24970) Summary: ONNX export for Unfold in symbolic opset9 + op and ORT tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/24970 Reviewed By: hl475 Differential Revision: D17495106 Pulled By: houseroad fbshipit-source-id: fcd179a1213c0f219628f25c09e66fcfe4c5df50 * Reduce special casing around 'training' (#27109) Summary: Most of this was old cruft left over from special handling of `training` before we had a `bool` type. This makes all modules have a `training` attribute that is true by default and removes all other special handling. Fixes #26884 Pull Request resolved: https://github.com/pytorch/pytorch/pull/27109 Pulled By: driazati Differential Revision: D17728129 fbshipit-source-id: 8ddc9fbb07a953dd05529538bfdd01ed88b5cb57 * Put metrics back to torch.utils.tensorboard similar we have in TensorboardX Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27252 Test Plan: Check metrics in the Scuba table: https://fburl.com/scuba/k5x8yosj Reviewed By: sanekmelnikov Differential Revision: D17723414 fbshipit-source-id: 64d42e0b4582f635d38f38feb2b2a6c4826f2065 * Automatic update of fbcode/onnx to 2891e1459745933f4bba9a8cb3371cf3c9eb1d16 (#27474) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27474 Previous import was 034921bd574cc84906b7996c07873454b7dd4135 Included changes: - **[2891e145](https://github.com/onnx/onnx/commit/2891e145)**: Fix Unique unit test (#2381) <Scott McKay> - **[25cf73e5](https://github.com/onnx/onnx/commit/25cf73e5)**: update shapeInference h file link (#2369) <prcvih> - **[e3074bc0](https://github.com/onnx/onnx/commit/e3074bc0)**: modify file path (#2378) <prcvih> - **[9058d3a4](https://github.com/onnx/onnx/commit/9058d3a4)**: Incrementing version number to 1.6.0 (#2353) 
(#2385) <Kevin Chen> - **[c963586d](https://github.com/onnx/onnx/commit/c963586d)**: Remove typing packages from test requirements (#2375) <Aiken Cairncross> Test Plan: ci Reviewed By: bddppq Differential Revision: D17791527 fbshipit-source-id: 23ad5abe313cd4e4eedcbe7794b98450b3b7d3bc * Fixed Select symbolic to export slice when index = negative one (#25273) Summary: Exporting torch.select when index = negative one (x[:,-1]) was broken. This PR has the fix in symbolic function for select. Pull Request resolved: https://github.com/pytorch/pytorch/pull/25273 Reviewed By: hl475 Differential Revision: D17159707 Pulled By: houseroad fbshipit-source-id: 2c3b275421082758f1b63c1c9b6e578f03ca9f76 * Avoid variable shadowing in ``::at::philox_engine::single_round()`` (#27486) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27486 Rename `key` argument of `single_round` method to `in_key` Test Plan: CI Reviewed By: stepancheg, soumith Differential Revision: D17782904 fbshipit-source-id: 6feae55c407f39d41db099b013dcbd3990768603 * Refactor python_android test to separate Android-specific components (#27453) Summary: All of the test cases move into a base class that is extended by the instrumentation test and a new "HostTests" class that can be run in normal Java. (Some changes to the build script and dependencies are required before the host test can actually run.) 
ghstack-source-id: fe1165b513241b92c5f4a81447f5e184b3bfc75e Pull Request resolved: https://github.com/pytorch/pytorch/pull/27453 Test Plan: Imported from OSS Reviewed By: IvanKobzarev Differential Revision: D17800410 fbshipit-source-id: 1184f0caebdfa219f4ccd1464c67826ac0220181 * Various cleanups to pytorch_android API (#27454) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27454 See detailed discussion at https://github.com/pytorch/pytorch/issues/27350 Test Plan: Imported from OSS Reviewed By: IvanKobzarev Differential Revision: D17800480 Pulled By: dreiss fbshipit-source-id: bf174e8b16231b89be771de0fa54c41e864a3eb0 * Clean up JavaDoc comments in pytorch_android Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27455 Test Plan: Imported from OSS Differential Revision: D17800658 Pulled By: dreiss fbshipit-source-id: dbd01d9fa5ac82c50daf54c2869dc18be233d8dd * FunctionEventAvg implements __iadd__ interface (#27498) Summary: Resolving issue https://github.com/pytorch/pytorch/issues/26433 by making FunctionEventAvg implement the `__iadd__` interface again, like it used to. Pull Request resolved: https://github.com/pytorch/pytorch/pull/27498 Differential Revision: D17801918 Pulled By: ezyang fbshipit-source-id: 0597059c903ac168ed64a05ac1decff3ffd14f06 * Move hipify to torch/utils to bundle them into torch package (#27425) Summary: Similar to https://github.com/pytorch/pytorch/pull/27418 but try to put it under "torch" namespace Pull Request resolved: https://github.com/pytorch/pytorch/pull/27425 Differential Revision: D17779490 Pulled By: bddppq fbshipit-source-id: 688338d143509b37dfc110df17af3331db48a42b * Ensure NCCL error handling code is disabled for NCCL versions < 2.4 (#27124) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27124 ncclCommAbort() and ncclGetAsyncError() were two APIs added in NCCL 2.4 to detect errors in NCCL communicators. 
These were used as part of ProcessGroupNCCL and we also enforced that only NCCL versions 2.4+ were supported. However, there is still legitimate use for older NCCL versions and hence we should still support those. For that purpose, in this change I've ensured we disable NCCL error checking for versions < 2.4. ghstack-source-id: 91452959 Test Plan: 1) Test with 2.4.8 2) Test with 2.2.13 3) unit tests. Differential Revision: D17178988 fbshipit-source-id: 5dc44b5f7b4b00466c67fd452315f1d4f5c47698 * #include <stdexcept> into flat_hash_map.h (#27478) Summary: Fixing https://github.com/pytorch/pytorch/issues/27266 In general we should not rely on transitively included headers; we should explicitly include all headers whose members are used in the source file. Pull Request resolved: https://github.com/pytorch/pytorch/pull/27478 Differential Revision: D17799522 Pulled By: pbelevich fbshipit-source-id: 5818394a212c947cfac3a6cf042af9ebb8b9d9a0 * Fix broken name mangling Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27511 Test Plan: Imported from OSS Differential Revision: D17801185 Pulled By: jamesr66a fbshipit-source-id: 3eaa9542a445c9401f3f96e11138ec09b0d8350a * Updating submodules Summary: GitHub commits: https://github.com/facebook/fbthrift/commit/e80ecd1d63c956ed34b257fbd1aaef73ef8eb781 https://github.com/facebook/proxygen/commit/6c7a36b1b3f2825fd30ba00c708ec5ceaa5db760 https://github.com/facebookincubator/mvfst/commit/875046204325f9bd8cc5343b98a8fa4b99187a3c https://github.com/facebook/proxygen/commit/442d7def679c297427f5d0b679685db92fe3d28c https://github.com/facebook/wangle/commit/c138dc3d2c0c4f4f68ab4931e44b87a6becb194c https://github.com/facebookincubator/fizz/commit/3833f10989711256704260a01e0c9f7d1c33e468 https://github.com/facebookincubator/katran/commit/6fc473d5304985aa31d351c6305904e80af4b614 https://github.com/pytorch/fbgemm/commit/82d259dade58e53775a534f88b7b48e760f09a64 Test Plan: n/a Reviewed By: 2d2d2d2d2d fbshipit-source-id: 
7834a4a8620d0ab9b60060e0abadfba457fb2890 * Revert D17159707: [pytorch][PR] [ONNX] Fixed Select symbolic to export slice when index = negative one Test Plan: revert-hammer Differential Revision: D17159707 Original commit changeset: 2c3b27542108 fbshipit-source-id: accce910abdbe13270d0f592810a48b1dabe4b01 * Roll master to 1.4.0 (#27374) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27374 Signed-off-by: Edward Z. Yang <ezyang@fb.com> Test Plan: Imported from OSS Differential Revision: D17809770 Pulled By: ezyang fbshipit-source-id: 75bd97426494a7bbbf08f9bce7563d35871443d8 * Exponential decay of the weight of task loss (#27508) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27508 Implemented a simple exponential decay of the weight of lr loss function, with a lower bound. Test Plan: buck test //caffe2/caffe2/fb/dper/layer_models/tests:mtml_test -- test_task_weight_decay https://our.intern.facebook.com/intern/testinfra/testrun/3377699729136308 canary: f140103452 Reviewed By: chenshouyuan Differential Revision: D17524101 fbshipit-source-id: 9a653e21a4ecb74dfc4ac949c9e3388f36ef3a20 * docstring only formatting changes: quantize.py, fake_quantize.py, observer.…
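
The "Exponential decay of the weight of task loss" commit above describes a decaying loss weight with a lower bound but does not give the formula. A minimal sketch of one common form, assuming the weight is `initial_weight * decay_rate ** step` clamped from below; the function name and all parameter names are hypothetical and are not taken from the commit:

```python
def decayed_task_weight(initial_weight, decay_rate, step, lower_bound):
    """Exponentially decayed weight that never drops below lower_bound.

    Assumed form (not confirmed by the commit message):
        weight = max(lower_bound, initial_weight * decay_rate ** step)
    """
    if not 0.0 < decay_rate <= 1.0:
        raise ValueError("decay_rate must be in (0, 1]")
    if step < 0:
        raise ValueError("step must be non-negative")
    return max(lower_bound, initial_weight * decay_rate ** step)


# With decay_rate=0.5 the weight halves each step until the floor takes over.
weights = [decayed_task_weight(1.0, 0.5, s, 0.1) for s in range(6)]
```

The clamp matters late in training: without it the task's contribution to the total loss would shrink toward zero instead of settling at the floor.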

Summary: Pull Request resolved: pytorch#37101

Fixes pytorch#36954.

The basic concept is to streamline the process of rethrowing c10::Error with extra error information. This is in a few steps:

- I completely remodeled the Error data type and the internal invariants. Instead of manually adding in newlines, the message stack formatting process is responsible for inserting newlines and spacing as necessary. Call sites are then modified to respect the new API model.
- TORCH_RETHROW macro is added, which adds context to an error message and then rethrows it.

New internal assert failure looks like:

```
0 INTERNAL ASSERT FAILED at ../c10/test/util/exception_test.cpp:64, please report a bug to PyTorch.
Exception raised from TestBody at ../c10/test/util/exception_test.cpp:64 (most recent call first):
frame #0: <unknown function> + 0x6aab9 (0x7ff611d3aab9 in /data/users/ezyang/pytorch-tmp/build/lib/libc10.so)
frame #1: ...
```

Error message with context looks like:

```
This is an error
  This is context 1
  This is context 2
```

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D21202891

Pulled By: ezyang

fbshipit-source-id: 361cadd16bc52e5886dba08e79277771ada76169
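
The idea in the commit message above (an error that carries a stack of context strings, with the formatter rather than the call site responsible for newlines and spacing) can be illustrated with a small Python analogue. This is only a sketch: `ContextError` and `rethrow_with_context` are hypothetical names, not the c10 API, and the two-space indentation is an assumption about how context lines are rendered.

```python
class ContextError(Exception):
    """Hypothetical error type that keeps its message and context separately."""

    def __init__(self, message):
        super().__init__(message)
        self.message = message
        self.context = []  # context strings, innermost first

    def add_context(self, text):
        self.context.append(text)

    def __str__(self):
        # The formatter inserts newlines and indentation; callers pass bare strings.
        return "\n".join([self.message] + ["  " + c for c in self.context])


def rethrow_with_context(fn, text):
    """Call fn(); if it raises ContextError, attach context and re-raise it."""
    try:
        return fn()
    except ContextError as err:
        err.add_context(text)
        raise  # same exception object, original traceback preserved


def inner():
    raise ContextError("This is an error")


def middle():
    return rethrow_with_context(inner, "This is context 1")


try:
    rethrow_with_context(middle, "This is context 2")
except ContextError as err:
    rendered = str(err)
```

Keeping the context as a list instead of concatenating strings at each call site is what lets a single formatting routine decide on layout.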

Ashkan Aliabadi and others added 30 commits September 25, 2019 11:02

Port CUDA implementation of expm1 to ATen (pytorch#26598)

aaf30cd

Summary: Closes pytorch#24562 Pull Request resolved: pytorch#26598 Differential Revision: D17531503 Pulled By: VitalyFedyunin fbshipit-source-id: 8119c796e142f073ad4e274dda1ad99344215c48

Export baddbmm (pytorch#25738)

63fd105

Summary: Added ONNX export for baddbmm in opset9 Pull Request resolved: pytorch#25738 Reviewed By: hl475 Differential Revision: D17565828 Pulled By: houseroad fbshipit-source-id: 85f605a7b3fa4783ef4f6ced86223133c85062d5

Fix Future default constructor missing for ParallelNative

334e78b

Summary: Pull Request resolved: pytorch#26739 Test Plan: Imported from OSS Differential Revision: D17577908 Pulled By: bwasti fbshipit-source-id: a09cdbd8619a926e93418a692ce859d4157f2da8

Revert D17565828: [pytorch][PR] [ONNX] Export baddbmm

b6a1d61

Test Plan: revert-hammer Differential Revision: D17565828 Original commit changeset: 85f605a7b3fa fbshipit-source-id: 7705325087d83362f71a717be880a13e9f575b37

Cuda101 upgrade (pytorch#26823)

5379e87

Summary: test run: pytorch#26732 Pull Request resolved: pytorch#26823 Reviewed By: soumith Differential Revision: D17576095 Pulled By: mingbowan fbshipit-source-id: 269cf443aea18b47bbee63996d035bc5bcd2726b

Revert D17473200: [pytorch][distributed] add function to get NCCL ver…

4bd1da1

…sion for logging Test Plan: revert-hammer Differential Revision: D17473200 Original commit changeset: 4881ed5221b3 fbshipit-source-id: c5635ce89de1644d2135b657427cbd0c3af83576

Highlighting in the doc that square root comes before adding epsilon

660d9e2

Summary: Pull Request resolved: pytorch#26735 Test Plan: Imported from OSS Differential Revision: D17558505 Pulled By: vincentqb fbshipit-source-id: 36449c501f3ab3bc7cadd1f580258904b39369d4
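
The commit body does not say which documentation page was changed, but the order of the square root and the epsilon term matters for any update rule that divides by a root of an accumulated squared-gradient average. A minimal sketch of the two orderings, using hypothetical function names and plain floats rather than tensors:

```python
import math


def denom_sqrt_then_eps(avg_sq, eps):
    """Square root first, then add epsilon: sqrt(v) + eps."""
    return math.sqrt(avg_sq) + eps


def denom_eps_then_sqrt(avg_sq, eps):
    """Add epsilon first, then take the square root: sqrt(v + eps)."""
    return math.sqrt(avg_sq + eps)


# When the accumulated average is near zero, the two orderings differ by
# orders of magnitude for the same eps.
avg_sq, eps = 0.0, 1e-8
first = denom_sqrt_then_eps(avg_sq, eps)    # equals eps
second = denom_eps_then_sqrt(avg_sq, eps)   # roughly sqrt(eps)
ratio = second / first
```

Because the step size is inversely proportional to this denominator, the same `eps` value gives very different effective step sizes under the two conventions, which is why the ordering is worth stating in documentation.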

Move the CUDA implementation of log to ATen. (pytorch#26494)

91549ef

Summary: Pull Request resolved: pytorch#26494 Close pytorch#24586 Test Plan: Imported from OSS Differential Revision: D17572497 Pulled By: VitalyFedyunin fbshipit-source-id: e1bcd33021464eaa4affd4c6d3283c8403069945

Migrate multinomial from the TH to Aten (CUDA) (pytorch#26481)

2eb5923

Summary: pytorch#24604 Pull Request resolved: pytorch#26481 Differential Revision: D17489859 Pulled By: ifedan fbshipit-source-id: 0702044c7c0f78e5e30826e8a5a83da27156bdb3

QEngine::QNNPACK enabled, module.eval()

ed82a28

Summary: Pull Request resolved: pytorch#26855 Test Plan: Imported from OSS Differential Revision: D17589837 Pulled By: IvanKobzarev fbshipit-source-id: 0084538e9b9d760a8728cdcd5723fc7fae5838c7

Use optimized_graph in graph_executor.

9df887d

Summary: Pull Request resolved: pytorch#26705 Test Plan: Imported from OSS Differential Revision: D17543281 Pulled By: ZolotukhinM fbshipit-source-id: 91c40559aac6f2a1f77060fa28c33725a2b8e5f9

Remove convert_to_ssa argument from runCleanupPasses - it is only use…

d842435

…d in one place. Summary: Pull Request resolved: pytorch#26703 Test Plan: Imported from OSS Differential Revision: D17543131 Pulled By: ZolotukhinM fbshipit-source-id: c4a209c55ac76d8472e64af79f76e9a61fd2a941

lara-hdr and others added 27 commits October 2, 2019 01:17

Fix ONNX Interpolate

f4f6d8d

Summary: Pull Request resolved: pytorch#27179 Reviewed By: hl475 Differential Revision: D17698364 Pulled By: houseroad fbshipit-source-id: 8fddd1c13e7af026962cf2d9c05fd7c957d8526e

C++ API parity: MaxUnpool3d

5005f7b

Summary: Pull Request resolved: pytorch#27027 Test Plan: Imported from OSS Differential Revision: D17682402 Pulled By: pbelevich fbshipit-source-id: 2008ce405176c174cdba88b4f25cd77a82bb13ea

C++ API parity: ELU

c864454

Summary: Pull Request resolved: pytorch#27028 Test Plan: Imported from OSS Differential Revision: D17682406 Pulled By: pbelevich fbshipit-source-id: 9c313237cb93b9870c6fcf8d01b3dbe4af4c6f2a

Eliminate outdated comments

eeaef21

Summary: Pull Request resolved: pytorch#26933 Differential Revision: D17685153 Pulled By: ezyang fbshipit-source-id: e402a12dc9a172649f153903a3a7834004b5667a

C++ API parity: Hardshrink

515e3b8

Summary: Pull Request resolved: pytorch#27035 Test Plan: Imported from OSS Differential Revision: D17682403 Pulled By: pbelevich fbshipit-source-id: 186377fe577abfdd53acc95751a7ed845b51af95

Report docker push / pull time (pytorch#26861)

5835460

Summary: Pull Request resolved: pytorch#26861 Signed-off-by: Edward Z. Yang <ezyang@fb.com> Test Plan: Imported from OSS Differential Revision: D17712801 Pulled By: ezyang fbshipit-source-id: 504594452e6594d79e41856ce5177ab370dc26f1

Suppressing hypothesis health check for qnnpack_add

eb5040c

Summary: Pull Request resolved: pytorch#27193 Test Plan: Imported from OSS Differential Revision: D17704958 Pulled By: zafartahirov fbshipit-source-id: d8ab58b724cce2f5130b10ead0f10f5f32e26cfb

Update export for topk and sort (pytorch#25739)

d93fc64

Summary: updated export for topk and sort as part of opset11 Pull Request resolved: pytorch#25739 Reviewed By: hl475 Differential Revision: D17467131 Pulled By: houseroad fbshipit-source-id: 653be138455728ec8e9bb81ae63dd7ce0c4d0793

Enabled comparison ops with named tensors (pytorch#27162)

5e776d8

Summary: Fixing this [issue](pytorch#27077). Tested via unit tests Pull Request resolved: pytorch#27162 Differential Revision: D17694187 Pulled By: izdeby fbshipit-source-id: 939017c91605c89a0e08e0c3f8fe21de93bba95b

Add insert_prepack_unpack for conv2d (pytorch#27118)

e33ec39

Summary: Pull Request resolved: pytorch#27118 att Test Plan: test_jit.py Imported from OSS Differential Revision: D17717637 fbshipit-source-id: 83c94ff12e6a2137e0161a338fcdd100514c452f

resolve merge conflicts

9a01c0f

bf16 changes for compare op

1d28734

Merge branch 'rohithkrn-rn/up_master' into bf16_bringup_v2

e06737a

Enable bfloat16 for torch test and few ops (pytorch#486 alias) (pytor…

fb2d59c

…ch#492) * enable few torch tests and ops * fix typo * fix typo

rocblas bfloat16 support (pytorch#493)

8f4958f

Add MIOpen Conv BFloat16 support (pytorch#494)

24d551a

* miopen conv bf16 support * fix typo * update error message

rohithkrn merged commit fffbac8 into master Oct 17, 2019

rohithkrn mentioned this pull request Oct 17, 2019

Revert "Bf16 dummy" #2

Merged

Bf16 dummy #1

rohithkrn merged 5901 commits into master from bf16_dummy

rohithkrn commented Oct 17, 2019

20 participants
