
Upstream merge 0621 #1767

Merged
csarofeen merged 1302 commits into devel from upstream_merge_0621 on Jun 23, 2022

Conversation

@jjsjann123
Collaborator

Merged upstream master into our devel branch.

The commit of choice is 61305cd638b6fcd73a0b66b4cde7014fecb9e8ce, taken from viable/strict on 06/21/2022.

IvanYashchuk and others added 30 commits June 14, 2022 19:43
…h#79423)

This PR adds references for:

- `torch.softmax`
- `torch.log_softmax`
- `torch.logsumexp`

Unfortunately, none of them currently pass `test_python_ref_executor`, even with the `"aten"` executor.
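For intuition, the three ops are tightly related numerically; below is a minimal sketch of that relationship (not the actual `torch._refs` implementations):

```py
import torch

def log_softmax(x, dim):
    # log_softmax(x) = x - logsumexp(x), which is the numerically stable form
    return x - torch.logsumexp(x, dim=dim, keepdim=True)

def softmax(x, dim):
    # softmax is just the exponential of log_softmax
    return torch.exp(log_softmax(x, dim))

x = torch.randn(4, 5)
torch.testing.assert_close(softmax(x, dim=1), torch.softmax(x, dim=1))
```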
Pull Request resolved: pytorch#79423
Approved by: https://github.com/mruberry
This follows the structure of linalg.svd.

Pull Request resolved: pytorch#79072

Approved by: https://github.com/IvanYashchuk, https://github.com/albanD
See title

Addresses pytorch#79540

Error it's causing:
```
2022-06-14T16:29:53.6335274Z Results (1120.92s):
2022-06-14T16:29:53.6335495Z      393 passed
2022-06-14T16:29:53.6335710Z        1 failed
2022-06-14T16:29:53.6336041Z          - test/onnx/test_models.py:155 TestModels_new_jit_API.test_inception
2022-06-14T16:29:53.6336326Z       60 skipped
2022-06-14T16:29:54.4670969Z ##[error]Process completed with exit code 1.
2022-06-14T16:29:54.4730658Z Prepare all required actions
2022-06-14T16:29:54.4730993Z Getting action download info
<probably uninteresting folded group, click to show>
```
Pull Request resolved: pytorch#79556
Approved by: https://github.com/janeyx99, https://github.com/seemethere
Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: pytorch#79493

Approved by: https://github.com/albanD
Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: pytorch#79494

Approved by: https://github.com/albanD
…rch#79541)

Allows use of .jenkins/pytorch/build.sh without assuming OUR_GITHUB_JOB_ID is set.  This is a regression caused by pytorch#79366.
Pull Request resolved: pytorch#79541
Approved by: https://github.com/suo, https://github.com/seemethere
…orch#76601)

Fixes pytorch#68299.  Fixes pytorch#70875.

Test is flaky on ROCm because the HIP runtime occasionally copies asynchronously too quickly for the current sleep value of 50ms.  This is not a bug.  Increasing the sleep value to 1s to avoid flakiness.
Pull Request resolved: pytorch#76601
Approved by: https://github.com/pruthvistony, https://github.com/malfet
Enabling:
test_sampled_addmm_errors_cuda_complex128
test_sampled_addmm_errors_cuda_complex64
test_sampled_addmm_errors_cuda_float32
test_sampled_addmm_errors_cuda_float64
test_sparse_add_cuda_complex128
test_sparse_add_cuda_complex64

Pull Request resolved: pytorch#77877
Approved by: https://github.com/pruthvistony, https://github.com/malfet
This process is pretty confusing, so I wrote it down.

[skip ci]

Pull Request resolved: pytorch#79504

Approved by: https://github.com/janeyx99
Summary: implemented the threshold op for Vulkan

Test Plan: buck run //xplat/caffe2:pt_vulkan_api_test_binAppleMac

Differential Revision: D36681867

Pull Request resolved: pytorch#78654
Approved by: https://github.com/SS-JIA
…#79482)

Summary:
- StaticModule was being created at runtime, which was adding overhead to the forked operation
- Move StaticModule creation outside of runtime so that a StaticRuntime instance can be created on top of the same StaticModule, which is created once

Differential Revision: D37126923

Pull Request resolved: pytorch#79482
Approved by: https://github.com/tenpercent
This PR adds variance function with correction argument to nvFuser.

Now it's possible to run
```py
import torch
import torch._refs
from torch._prims.executor import make_traced

def foo1(a):
    return torch._refs.var(a, keepdim=False, unbiased=False)

def foo2(a):
    return torch._refs.var(a, keepdim=False, correction=2)

a = torch.randn(3, 3, device='cuda')
make_traced(foo1)(a, executor="nvfuser")
make_traced(foo2)(a, executor="nvfuser")
```

Pull Request resolved: pytorch#79517
Approved by: https://github.com/mruberry, https://github.com/jjsjann123
Summary: The _detect_dynamic_vs_static function was added to take in a
prepared FX graph model that already has ModelReportObservers built into
it and use the collected information to determine whether the input and
output are stationary or non-stationary, providing feedback on whether
to make linear modules static or dynamic based on this information.

This PR will be followed up soon with another PR that will more
rigorously test the whole end-to-end performance of this system, which is
primarily how the function in this PR will be tested for functionality;
that is why this one only has one test.
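For intuition, here is a hedged sketch of the decision rule being described (the names and the rule itself are illustrative, not the actual detector logic):

```py
def suggest_quantization(input_stationary, output_stationary):
    # Stationary activations suit static quantization, since calibrated
    # ranges stay valid across inputs; non-stationary activations suit
    # dynamic quantization, where ranges are computed per forward pass.
    # Illustrative rule only, not the actual ModelReport logic.
    if input_stationary and output_stationary:
        return "static"
    return "dynamic"
```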

Test Plan: python test/quantization/fx/test_model_report_fx.py TestModelReportDetectDynamicStatic


Pull Request resolved: pytorch#79326

Approved by: https://github.com/HDCharles
# Summary

ShipIt jobs triggered by co-development workflows are failing to merge PRs due to stale checks.  This diff skips the stale check when merge is triggered by `facebook-github-bot`.

Sample merge failure: pytorch#78654 (comment)
Pull Request resolved: pytorch#79572
Approved by: https://github.com/bigfootjon, https://github.com/seemethere, https://github.com/malfet
This reverts commit 05664a9.

Reverted pytorch#79494 on behalf of https://github.com/ezyang due to conflicts with earlier diff that needs revert
This reverts commit d332724.

Reverted pytorch#78135 on behalf of https://github.com/ezyang due to broke torchvision tests
This reverts commit 6b015af.

Reverted pytorch#79493 on behalf of https://github.com/ezyang due to this land races with a revert
…ster (pytorch#79559)

Relates to pytorch#76700

**Overview**: Wrote a GitHub Action to get the latest commit SHA. Another component of the script is pushing this SHA to the viable/strict branch, which I will test on pytorch/pytorch-canary.

Todo in the next PR: add a comment explaining the cron schedule, and replace the package installation statements with a txt file

**Test Plan:** Monitor github actions results to see if the SHA printed is correct by running GHA on pytorch/pytorch-canary. The successful test workflow is [here](https://github.com/pytorch/pytorch-canary/runs/6888486129?check_suite_focus=true).
Pull Request resolved: pytorch#79559
Approved by: https://github.com/janeyx99
Remove as many references to unittest as can easily be done, in favor of our custom infra.

Left a todo where I could not easily replace unittest.main with run_tests().
Pull Request resolved: pytorch#79546
Approved by: https://github.com/seemethere
Relates to pytorch#76700

**Overview**: One edge case not accounted for in the original logic of `isGreen` was commits with no workflow checks. Similarly, if any of the required checks are not present (e.g., if all of the pull checks are skipped), the workflow should not be promotable. A commit should only be promotable if there is at least one workflow check present from each required group (i.e., none of them are skipped).
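A rough sketch of that rule (names are illustrative, not the actual `isGreen` implementation):

```py
def is_green(checks, required_groups):
    # A commit is promotable only if every required group has at least
    # one workflow check that actually ran (i.e. was not skipped).
    present = {c["group"] for c in checks if c["status"] != "skipped"}
    return all(group in present for group in required_groups)
```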

**Test Plan:** Verify that commits on the HUD with no workflow checks are not considered promotable. Added a test case with no workflows in `test_print_latest_commits.py`.
Pull Request resolved: pytorch#79565
Approved by: https://github.com/seemethere
See title.
```
2022-06-15T15:20:12.5183743Z The PR is introducing backward incompatible changes to the operator library. Please contact PyTorch team to confirm whether this change is wanted or not.
2022-06-15T15:20:12.5183765Z
2022-06-15T15:20:12.5183880Z Broken ops: [
2022-06-15T15:20:12.5184275Z 	aten::empty.SymInt(SymInt[] size, *, int? dtype=None, int? layout=None, Device? device=None, bool? pin_memory=None, int? memory_format=None) -> Tensor
2022-06-15T15:20:12.5184342Z ]
2022-06-15T15:20:12.6303306Z + cleanup
2022-06-15T15:20:12.6303395Z + retcode=1
2022-06-15T15:20:12.6303463Z + set +x
2022-06-15T15:20:12.6345211Z ##[error]Process completed with exit code 1.
2022-06-15T15:20:12.6377724Z Prepare all required actions
2022-06-15T15:20:12.6377844Z Getting action download info

```
Pull Request resolved: pytorch#79612
Approved by: https://github.com/malfet
seemethere and others added 22 commits June 21, 2022 19:05
Removes the pytorch- prefix from android jobs to make naming more
consistent across all of our jobs

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Pull Request resolved: pytorch#79435

Approved by: https://github.com/janeyx99
Moves the deploy job prefix to a suffix to make it more consistent with
our other job names

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Pull Request resolved: pytorch#79436

Approved by: https://github.com/janeyx99
…#79298)

Adds a `same_shape` util and updates `maybe_broadcast` to use it; previously `maybe_broadcast` was always broadcasting because its equality check was always failing.
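One plausible way such an equality check can always fail is comparing a `torch.Size` (a tuple subclass) against a list, since a tuple never compares equal to a list in Python. A hedged sketch of the kind of util described (illustrative, not the actual `torch._refs` code):

```py
import torch

def same_shape(a, b):
    # Compare elementwise so a tuple-vs-list mismatch doesn't matter.
    return len(a) == len(b) and all(x == y for x, y in zip(a, b))

print(torch.Size([3, 3]) == [3, 3])            # False: tuple vs list
print(same_shape(torch.Size([3, 3]), [3, 3]))  # True
```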
Pull Request resolved: pytorch#79298
Approved by: https://github.com/ezyang
…t from the middle (pytorch#75763)

Summary: When three tensor lists are packed into a variadic arg list, it is convenient to view the original lists as (start, end) pairs in the varargs list

Differential Revision: D35622049

Pull Request resolved: pytorch#75763
Approved by: https://github.com/mikeiovine
What was happening is that when we have multiple learning rate schedulers, the order in which they are initialized is not taken into account. This is a problem if they are initialized in sequential order (as one might intuitively do).

Each scheduler calls `step()` on initialization and sets the `lr` in its optimizer's `params_groups`. However, this means that step 0 will be using the `lr` that was set by the very last scheduler (in the case of initializing schedulers sequentially) instead of the first scheduler.

The fix in this PR addresses the above bug by performing a call to the appropriate scheduler on initialization after decrementing the `last_epoch` values in order to keep them the same post-step. This ensures that the correct scheduler is the one setting the `lr` values for the optimizer's `param_groups`.
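A minimal sketch of the symptom (illustrative; the exact values depend on the schedulers chosen):

```py
import torch

model = torch.nn.Linear(2, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

# Each scheduler calls step() once during __init__ and writes an lr into
# opt.param_groups, so the scheduler constructed *last* determines the lr
# used at step 0, clobbering whatever the first scheduler set.
sched1 = torch.optim.lr_scheduler.ExponentialLR(opt, gamma=0.9)
sched2 = torch.optim.lr_scheduler.ConstantLR(opt, factor=0.5, total_iters=4)

print(opt.param_groups[0]["lr"])  # reflects sched2's constructor step
```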
Pull Request resolved: pytorch#72856
Approved by: https://github.com/jbschlosser
Update release.md with rc validation steps

Pull Request resolved: pytorch#79889
Approved by: https://github.com/seemethere
Summary: The use of `c10::nullopt` instead of `c10::optional<Tensor>()` caused a null pointer dereference on Android.

Test Plan:
On Mac:
```
buck run //xplat/caffe2:pt_vulkan_api_test_binAppleMac
```

On Android:
```
buck build -c ndk.custom_libcxx=false -c pt.enable_qpl=0 //xplat/caffe2:pt_vulkan_api_test_binAndroid\#android-arm64 --show-output
adb push buck-out/gen/xplat/caffe2/pt_vulkan_api_test_binAndroid\#android-arm64 /data/local/tmp/vulkan_api_test
adb shell "/data/local/tmp/vulkan_api_test"
```

Reviewed By: SS-JIA

Differential Revision: D37189192

Pull Request resolved: pytorch#79701
Approved by: https://github.com/SS-JIA
Last week, we had to disable it due to infra issues. We are now re-enabling it since the infra issues have been resolved.

Test Plan:
Reran this job and it looks OK.
Pull Request resolved: pytorch#79949
Approved by: https://github.com/seemethere
We have hard limits on the number of linux.16xlarge.nvidia.gpu
machines we can spin up, and the TTS (time-to-signal) for this specific job
has increased 2x over the past 7 days.

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Pull Request resolved: pytorch#79894

Approved by: https://github.com/malfet, https://github.com/janeyx99
…79942)

Summary:
Memory planner destruction was hitting [this assertion](https://www.internalfb.com/code/fbsource/[f8baf8a0bab462c860d2eb7491a4e3fb40d2907a]/fbcode/caffe2/c10/util/intrusive_ptr.h?lines=117) in debug mode for a few models.

Here's what was going on:

1) The set of unmanaged `IValue`s acquires one or more owning refs of a managed `StorageImpl`
2) Then, one or more tensors in that storage group have their `StorageImpl` swapped out during execution
3) During `deallocateManagedTensors`, we swap the correct `StorageImpl` back in, [calling `unsafe_adapt_non_heap_allocated` again and resetting the refcount](https://www.internalfb.com/code/fbsource/[f8baf8a0bab462c860d2eb7491a4e3fb40d2907a]/fbcode/caffe2/torch/csrc/jit/runtime/static/memory_planner.cpp?lines=446-452)
4) The unmanaged `IValue`s are deallocated, decrementing the refcount into the danger zone.

So, we just have to make sure that unmanaged `IValue`s are destructed before we deallocate the managed tensors.

Test Plan: CI

Differential Revision: D37303728

Pull Request resolved: pytorch#79942
Approved by: https://github.com/tenpercent
…#79905)

We now use Rockset to store test case stats, and we can glean flakiness from that data, so this complicated logic isn't necessary anymore!

This will also prevent spurious red on trunk that occurs with this code, e.g. https://github.com/pytorch/pytorch/runs/6968080598?check_suite_focus=true

## test plan
print_test_stats works on the PR: https://github.com/pytorch/pytorch/runs/6974961237?check_suite_focus=true. Several jobs ran into HTTP 500 and memory errors, which I don't think are related to my code and are more a problem with the server: https://github.com/pytorch/pytorch/runs/6974938954?check_suite_focus=true

Pull Request resolved: pytorch#79905
Approved by: https://github.com/zengk95, https://github.com/seemethere
…torch#79945)

###  Motivation

In order to match the internal platform010 builds, we are creating a new config to run on PRs that uses compiler and package versions >= those used in platform010. Here are the versions used in the new build:

- Ubuntu 22.04 (Jammy-Jellyfish)
- Clang-12
- Python 3.8
- CUDA 11.6

### Summary of Changes

- As `nvidia/docker` images only support CUDA 11.7 with Ubuntu 22.04, we are starting with base Ubuntu 22.04 Docker images and then installing CUDA 11.6

- Fetching `install_cuda.sh` from the `pytorch/builder` repo in order to install CUDA using `wget`

- `libssl-dev` has been upgraded to `libssl3` in Ubuntu 22.04. Instead, we are using the `include` and `lib` folders downloaded with OpenSSL 1.1.1

- `Clang-12` requires `libomp-12-dev` to work with `OpenMP` which is added to the `install_base.sh` file.

- Minor fixes to handle compilation errors generated when using `clang-12`:
      - In `pow_test.cpp`, adding a `static_cast` to the input of the `sqrt` method
      - In `vec512_qint.h`, explicitly defining the `copy-assignment` operator, as its implicit definition is deprecated due to a user-declared `copy-constructor` in C++11
Pull Request resolved: pytorch#79945
Approved by: https://github.com/seemethere, https://github.com/kit1980
For ignored modules' parameters, we should also clean their parameter names since they will have the FSDP-specific prefixes.

This change only affects the prefixed parameter name keys in `full_optim_state_dict()` (i.e. optim state dict saving). Not having this change does not actually violate the correctness of the optim state dict save-load flow because it only requires that the keys are unique and internally consistent.

Either way, this PR now explicitly adds the specification that the parameter keys in the optim state dict should match the keys of the full model state dict.
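A hedged sketch of the kind of prefix cleanup involved (the prefix constants are illustrative of FSDP's wrapper modules, not necessarily the exact strings):

```py
# Hypothetical prefixes standing in for FSDP's wrapper-module names.
FSDP_PREFIXES = ("_fsdp_wrapped_module.", "_fpw_module.")

def clean_param_name(name):
    # Strip wrapper prefixes so ignored modules' parameter names line up
    # with the keys of the full model state dict.
    for prefix in FSDP_PREFIXES:
        name = name.replace(prefix, "")
    return name

print(clean_param_name("_fsdp_wrapped_module.layer1._fpw_module.weight"))
# -> "layer1.weight"
```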
Pull Request resolved: pytorch#79955
Approved by: https://github.com/rohan-varma
…utputs

Without profiled outputs, autodiff can't tell whether or not the outputs of a DifferentiableGraph should have `requires_grad` set. Autodiff would default to requires_grad=True if there was no profiled information, causing it to mark tensors as requires_grad when they shouldn't have been. This adds requires_grad info onto the type of the output, if it can be found in later uses of the output.

Adds a test for correct autodiff requires_grad behavior and also a test to make sure the output type is correctly annotated in create_autodiff_subgraphs.

Pull Request resolved: pytorch#79498

Approved by: https://github.com/eellison
Fixes pytorch#79871

Make `module.cpp` tests respect change that was made in pytorch#78436 (no int types in autograd).

Note that there is still a gap in the CMake test -- it's unclear why it didn't fail CI before.

As far as I can tell it should be executed, because it's included here https://github.com/pytorch/pytorch/blob/79507d2a9d06d4a3fb50eb21b30e08cc044776ce/test/cpp/api/CMakeLists.txt#L17:L17

Pull Request resolved: pytorch#79926
Approved by: https://github.com/soulitzer
Currently, `bitonicSortKVInPlace` is written to sort one array per
block of threads. If that dimension happens to be very small
(<128 elements), this results in low thread occupancy.

Instead, this changes `bitonicSortKVInPlace` to operate with a 2d
block. Sorting happens along the x dimension, and the y dimension
is a fixed size batch.

Pull Request resolved: pytorch#79627

Approved by: https://github.com/ngimel
@jjsjann123 requested a review from csarofeen on June 22, 2022 09:16
@jjsjann123 requested a review from mruberry as a code owner on June 22, 2022 09:16
@jjsjann123 force-pushed the upstream_merge_0621 branch from b1ed31b to da0c0d8 on June 22, 2022 09:21
@jjsjann123 force-pushed the upstream_merge_0621 branch from da0c0d8 to 9d6c3d8 on June 22, 2022 12:04
@csarofeen merged commit 570c5a8 into devel on Jun 23, 2022
shmsong pushed a commit to shmsong/pytorch that referenced this pull request Jul 24, 2022
Syncing nvfuser devel branch to upstream master. https://github.com/csarofeen/pytorch/

Code changes include:

- TransformPropagator refactor: switched to Dijkstra instead of exhaustive enumeration on all possible paths to reduce compilation time on transform propagation;
- Indexing refactor: remove reference tensor creation in all tensor indexing logic (csarofeen#1690)
- (more) generic grouped grid reduction kernel;
- Minor parser/fuser patches:
  1. zero-dim tensor reduction support
  2. no-op binary removal within fused graph
  3. expand supported in fusion

Squashed commits to work around (WAR) the GitHub API.
Commits that are actually in this PR from the devel branch:

```
a054b3e Refactor TransormPropagator to allow specifying a position and propagating to part of the DAG (csarofeen#1775)
d67e1cd Indexing refactor stage 1: remove reference tensor creation in all tensor indexing logic (csarofeen#1690)
1b65299 Issue 1770 (csarofeen#1774)
35b0427 Avoid compilation errors like below: (csarofeen#1773)
452c773 Ignore reductions of zero-dim tensors per PyTorch conventions (csarofeen#1771)
31d6c56 TransformPropagator refactor (csarofeen#1769)
570c5a8 Merge pull request csarofeen#1767 from csarofeen/upstream_merge_0621
9d6c3d8 merging upstream 61305cd
0ed815f New TransformPropagator algorithm (csarofeen#1763)
6c19520 no-op binary removal (csarofeen#1764)
ec7fa41 Proper propagation of IterType (csarofeen#1762)
b263562 Fix dimensionality check (csarofeen#1759)
2d6343f More generic grouped grid reduction kernel (csarofeen#1740)
64e2b56 [nvfuser] prevent spamming warning message (pytorch#77777) (csarofeen#1758)
0c43162 [nvFuser] Improving bitwise ops support (pytorch#77158) (csarofeen#1757)
b93a147 Parser expand (csarofeen#1754)
```

RUN_TORCHBENCH: nvfuser
Pull Request resolved: pytorch#80355
Approved by: https://github.com/davidberard98