Skip to content

Get changes from main repo#27

Merged
imaginary-person merged 9 commits intoimaginary-person:patch-1from
pytorch:master
Mar 10, 2021
Merged

Get changes from main repo#27
imaginary-person merged 9 commits intoimaginary-person:patch-1from
pytorch:master

Conversation

@imaginary-person
Copy link
Copy Markdown
Owner

Get changes from main repo

JeanKossaifi and others added 9 commits March 10, 2021 11:57
Summary:
This uses the shape of the tensor instead of directly indexing it. This is useful when extending PyTorch's tensor class, e.g. for lazy access. Since the `init` sub-module doesn't check for `torch_function`, it is not possibly to override its functions. Explicitly indexing the tensor will force a call to tensor() and reconstruct the full tensor/explicitly access the elements. Simply using the shape allows to avoid that.

Fixes #53540

Pull Request resolved: #53522

Reviewed By: anjali411

Differential Revision: D26947794

Pulled By: jbschlosser

fbshipit-source-id: 80cd65efed16383f21363cee2eb404c9bc05971c
Summary:
Meant to make tasks like #53728 easier. The `-n` flag enables line numbers, and the `-o` flag reduces noise by only showing the part of the line that matched (which in this case is just the trailing whitespace).

Pull Request resolved: #53733

Test Plan:
```
$ git checkout e937db5
```

Before:
```
$ (! git grep -I -l ' $' -- . ':(exclude)**/contrib/**' ':(exclude)third_party' || (echo "The above files have trailing spaces; please remove them"; false))
aten/src/ATen/native/cuda/BatchLinearAlgebra.cu
The above files have trailing spaces; please remove them
```

After:

```
$ (! git grep -I -no ' $' -- . ':(exclude)**/contrib/**' ':(exclude)third_party' || (echo "The above files have trailing spaces; please remove them"; false))
aten/src/ATen/native/cuda/BatchLinearAlgebra.cu:1972:
The above files have trailing spaces; please remove them
```

Reviewed By: mruberry

Differential Revision: D26953538

Pulled By: samestep

fbshipit-source-id: 5f7d48b79f1a02e5e5a09fe00316ec350cfc340e
Summary:
Pull Request resolved: #51674

See
https://docs.python.org/3/library/importlib.html#importlib.abc.ResourceReader

Test Plan: Imported from OSS

Reviewed By: zdevito

Differential Revision: D26237034

Pulled By: suo

fbshipit-source-id: 4c19f6172d16b710737528d3de48372873b9368d
Summary:
Pull Request resolved: #51676

We offer the ability to access the importer from within packaged modules by doing
`import resources`. This behavior is nice (and more powerful than the
importlib resources API), but I think `resources` is too common a name
(pip has a package for it)

Change to `import torch_package_importer` but open to bikeshedding

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D26620314

Pulled By: suo

fbshipit-source-id: 0942c99f02c0f55f5f3a1b2566961018b796bdd4
Summary:
ref: #16574

Pull Request resolved: #52408

Reviewed By: anjali411

Differential Revision: D26654687

Pulled By: malfet

fbshipit-source-id: 6feb603d8fb03c2ba2a01468bfde1a9901e889fd
…ultiprocessing to torch/distributed/elastic (#53574)

Summary:
Pull Request resolved: #53574

Upstreams `torchelastic/timer|multiprocessing` to `torch/distributed/elastic/timer|multiprocessing`

Test Plan:
```
buck test mode/dev-nosan //caffe2/torch/distributed/elastic/...
buck test mode/dev-nosan //caffe2/test/distributed/elastic/...
buck test mode/dev-nosan //pytorch/elastic/torchelastic/...
buck test mode/dev-nosan //hpc/...
buck test mode/dev-nosan //caffe2/torch/fb/training_toolkit/...
```

Reviewed By: borovsky-d, wilson100hong

Differential Revision: D26899809

fbshipit-source-id: e6dbc2a78282eac296c262b3206a979e3ef1ff53
)

Summary:
A number of derived distributions use base distributions in their
implementation.

We add what we hope is a comprehensive test whether all distributions
actually honor skipping validation of arguments in log_prob and then
fix the bugs we found. These bugs are particularly cumbersome in
PyTorch 1.8 and master when validate_args is turned on by default
In addition one might argue that validate_args is not performing
as well as it should when the default is not to validate but the
validation is turned on in instantiation.

Arguably, there is another set of bugs or at least inconsistencies
when validation of inputs does not prevent invalid indices in
sample validation (when with validation an IndexError is raised
in the test). We would encourage the implementors to be more
ambitious when validation is turned on and amend sample validation
to throw a ValueError for consistency.

Pull Request resolved: #53600

Reviewed By: mrshenli

Differential Revision: D26928088

Pulled By: neerajprad

fbshipit-source-id: 52784a754da2faee1a922976e2142957c6c02e28
Summary:
#52875 introduced this bug, as `supports_tensor_out` was replaced with `supports_out` in #53259, so CI checks are failing.

Pull Request resolved: #53745

Reviewed By: gmagogsfm

Differential Revision: D26958151

Pulled By: malfet

fbshipit-source-id: 7cfe5d1c1a33f06cb8be94281ca98c635df76838
Summary:
Pull Request resolved: #53665

ngimel pointed out to me where we already test the behavior of the `Upsample` ops in `test_nn.py`. This PR deleting my bespoke tests in `test_torch.py` and updates those in `test_nn.py` to test memory format properly.

There were two reasons the original test didn't pick up on a memory format regression:
- They didn't test the memory format of the output tensor explicitly, i.e. `output.is_contiguous(memory_format=...)`
- Even with that change, the test tensors were to simple to fail the tests. From some trial and error, it looks like one of the first two dimensions in the inputs needs to be > 1 in order for the `channels_last` memory format to actually re-order the strides.

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D26929683

Pulled By: bdhirsh

fbshipit-source-id: d17bc660ff031e9b3e2c93c60a9e9308e56ea612
@imaginary-person imaginary-person merged commit d581dd4 into imaginary-person:patch-1 Mar 10, 2021
imaginary-person pushed a commit that referenced this pull request Jul 2, 2021
Summary:
Pull Request resolved: pytorch#60987

We were seeing deadlocks as follows during shutdown:

```
Thread 1 (LWP 2432101):
#0  0x00007efca470190b in __pause_nocancel () from /lib64/libc.so.6
#1  0x00007efca49de485 in __pthread_mutex_lock_full () from /lib64/libpthread.so.0
#2  0x00007ef91d4c42c6 in __cuda_CallJitEntryPoint () from /lib64/libnvidia-ptxjitcompiler.so.1
#3  0x00007efc651ac8f1 in ?? () from /lib64/libcuda.so
#4  0x00007efc651aee03 in ?? () from /lib64/libcuda.so
#5  0x00007efc64f76b84 in ?? () from /lib64/libcuda.so
#6  0x00007efc64f77f5d in ?? () from /lib64/libcuda.so
#7  0x00007efc64eac858 in ?? () from /lib64/libcuda.so
#8  0x00007efc64eacfbc in ?? () from /lib64/libcuda.so
#9  0x00007efc7810a924 in ?? () from /usr/local/cuda/lib64/libcublas.so.11
#10 0x00007efc780fa2be in ?? () from /usr/local/cuda/lib64/libcublas.so.11
#11 0x00007efc78111044 in ?? () from /usr/local/cuda/lib64/libcublas.so.11
#12 0x00007efc7811580a in ?? () from /usr/local/cuda/lib64/libcublas.so.11
#13 0x00007efc78115aa4 in ?? () from /usr/local/cuda/lib64/libcublas.so.11
#14 0x00007efc781079ec in ?? () from /usr/local/cuda/lib64/libcublas.so.11
#15 0x00007efc780e6a7a in ?? () from /usr/local/cuda/lib64/libcublas.so.11
#16 0x00007efc7811cfa5 in ?? () from /usr/local/cuda/lib64/libcublas.so.11
#17 0x00007efc777ea98c in ?? () from /usr/local/cuda/lib64/libcublas.so.11
#18 0x00007efc777ebd80 in ?? () from /usr/local/cuda/lib64/libcublas.so.11
#19 0x00007efc777ea2c9 in ?? () from /usr/local/cuda/lib64/libcublas.so.11
#20 0x00007efc778c2e2d in cublasDestroy_v2 () from /usr/local/cuda/lib64/libcublas.so.11
#21 0x00007efc51a3fb56 in std::_Sp_counted_ptr_inplace<at::cuda::(anonymous namespace)::DeviceThreadHandlePool<cublasContext*, &at::cuda::(anonymous namespace)::createCublasHandle, &at::cuda::(anonymous namespace)::destroyCublasHandle>, std::allocator<at::cuda::(anonymous namespace)::DeviceThreadHandlePool<cublasContext*, &at::cuda::(anonymous namespace)::createCublasHandle, &at::cuda::(anonymous namespace)::destroyCublasHandle> >, (__gnu_cxx::_Lock_policy)2>::_M_dispose() () from /data/users/pritam/pytorch/torch/lib/libtorch_cuda.so
#22 0x00007efc51a3fc5f in std::shared_ptr<at::cuda::(anonymous namespace)::DeviceThreadHandlePool<cublasContext*, &at::cuda::(anonymous namespace)::createCublasHandle, &at::cuda::(anonymous namespace)::destroyCublasHandle> >::~shared_ptr() () from /data/users/pritam/pytorch/torch/lib/libtorch_cuda.so
#23 0x00007efca4648b0c in __run_exit_handlers () from /lib64/libc.so.6
#24 0x00007efca4648c40 in exit () from /lib64/libc.so.6
#25 0x0000558c8852e5f9 in Py_Exit (sts=0) at /tmp/build/80754af9/python_1614362349910/work/Python/pylifecycle.c:2292
#26 0x0000558c8852e6a7 in handle_system_exit () at /tmp/build/80754af9/python_1614362349910/work/Python/pythonrun.c:636
#27 0x0000558c8852e742 in PyErr_PrintEx (set_sys_last_vars=<optimized out>, set_sys_last_vars=<optimized out>) at /tmp/build/80754af9/python_1614362349910/work/Python/pythonrun.c:646
#28 0x0000558c88540dd6 in PyRun_SimpleStringFlags (command=0x7efca4dc9050 "from multiprocessing.spawn import spawn_main; spawn_main(tracker_fd=9, pipe_handle=13)\n", flags=0x7ffe3a986110) at /tmp/build/80754af9/python_1614362349910/work/Python/pythonrun.c:457
#29 0x0000558c88540ead in pymain_run_command (cf=0x7ffe3a986110, command=<optimized out>) at /tmp/build/80754af9/python_1614362349910/work/Modules/main.c:420
#30 pymain_run_python (pymain=0x7ffe3a986220) at /tmp/build/80754af9/python_1614362349910/work/Modules/main.c:2907
#31 pymain_main (pymain=0x7ffe3a986220) at /tmp/build/80754af9/python_1614362349910/work/Modules/main.c:3460
#32 0x0000558c8854122c in _Py_UnixMain (argc=<optimized out>, argv=<optimized out>) at /tmp/build/80754af9/python_1614362349910/work/Modules/main.c:3495
#33 0x00007efca4632493 in __libc_start_main () from /lib64/libc.so.6
#34 0x0000558c884e5e90 in _start () at ../sysdeps/x86_64/elf/start.S:103
```

This was likely caused due to a static singleton that wasn't leaky. Following
the guidance in https://isocpp.org/wiki/faq/ctors#construct-on-first-use-v2 to
use a leaky singleton instead.
ghstack-source-id: 132847448

Test Plan: Verified locally.

Reviewed By: malfet

Differential Revision: D29468866

fbshipit-source-id: 89250594c5cd2643417b1da584c658b742dc5a5c
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants