…h#79423) This PR adds references for: - `torch.softmax` - `torch.log_softmax` - `torch.logsumexp` Unfortunately, none of them currently pass `test_python_ref_executor` even with `"aten"` executor. Pull Request resolved: pytorch#79423 Approved by: https://github.com/mruberry
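Mathematically, these references reduce to the identities `log_softmax(x) = x - logsumexp(x)` and `softmax(x) = exp(log_softmax(x))`. A minimal pure-Python sketch of that decomposition (illustrative only, not the actual `torch._refs` code):

```python
import math

def logsumexp(xs):
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def log_softmax(xs):
    # log_softmax(x) = x - logsumexp(x)
    lse = logsumexp(xs)
    return [x - lse for x in xs]

def softmax(xs):
    # softmax(x) = exp(log_softmax(x))
    return [math.exp(x) for x in log_softmax(xs)]
```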
This follows the structure of linalg.svd. Pull Request resolved: pytorch#79072 Approved by: https://github.com/IvanYashchuk, https://github.com/albanD
Will be skipped when imported internally, for more details see https://www.internalfb.com/diff/D37114156?src_version_fbid=3331368873807344 Pull Request resolved: pytorch#79554 Approved by: https://github.com/albanD
See title. Addresses pytorch#79540. The error it's causing: ``` 2022-06-14T16:29:53.6335274Z Results (1120.92s): 2022-06-14T16:29:53.6335495Z 393 passed 2022-06-14T16:29:53.6335710Z 1 failed 2022-06-14T16:29:53.6336041Z - test/onnx/test_models.py:155 TestModels_new_jit_API.test_inception 2022-06-14T16:29:53.6336326Z 60 skipped 2022-06-14T16:29:54.4670969Z ##[error]Process completed with exit code 1. ``` Pull Request resolved: pytorch#79556 Approved by: https://github.com/janeyx99, https://github.com/seemethere
Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: pytorch#79493 Approved by: https://github.com/albanD
Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: pytorch#79494 Approved by: https://github.com/albanD
…rch#79541) Allows use of .jenkins/pytorch/build.sh without assuming OUR_GITHUB_JOB_ID is set. This is a regression caused by pytorch#79366. Pull Request resolved: pytorch#79541 Approved by: https://github.com/suo, https://github.com/seemethere
…orch#76601) Fixes pytorch#68299. Fixes pytorch#70875. Test is flaky on ROCm because the HIP runtime occasionally copies asynchronously too quickly for the current sleep value of 50ms. This is not a bug. Increasing the sleep value to 1s to avoid flakiness. Pull Request resolved: pytorch#76601 Approved by: https://github.com/pruthvistony, https://github.com/malfet
Enabling: test_sampled_addmm_errors_cuda_complex128 test_sampled_addmm_errors_cuda_complex64 test_sampled_addmm_errors_cuda_float32 test_sampled_addmm_errors_cuda_float64 test_sparse_add_cuda_complex128 test_sparse_add_cuda_complex64 Pull Request resolved: pytorch#77877 Approved by: https://github.com/pruthvistony, https://github.com/malfet
This process is pretty confusing, so wrote it down. [skip ci] Pull Request resolved: pytorch#79504 Approved by: https://github.com/janeyx99
This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/master/.github/workflows/_update-commit-hash.yml). Update the pinned xla hash. Pull Request resolved: pytorch#79172 Approved by: https://github.com/zengk95
Pull Request resolved: pytorch#79547 Approved by: https://github.com/ejguan
Summary: Implemented the threshold op for Vulkan. Test Plan: buck run //xplat/caffe2:pt_vulkan_api_test_binAppleMac Differential Revision: D36681867 Pull Request resolved: pytorch#78654 Approved by: https://github.com/SS-JIA
…#79482) Summary: - StaticModule was being created at runtime, which added overhead to the forked operation - Moved StaticModule creation outside of runtime so that a StaticRuntime instance can be created on top of the same StaticModule, which is created once Differential Revision: D37126923 Pull Request resolved: pytorch#79482 Approved by: https://github.com/tenpercent
This PR adds variance function with correction argument to nvFuser.
Now it's possible to run
```py
import torch
import torch._refs
from torch._prims.executor import make_traced
def foo1(a):
    return torch._refs.var(a, keepdim=False, unbiased=False)

def foo2(a):
    return torch._refs.var(a, keepdim=False, correction=2)
a = torch.randn(3, 3, device='cuda')
make_traced(foo1)(a, executor="nvfuser")
make_traced(foo2)(a, executor="nvfuser")
```
Pull Request resolved: pytorch#79517
Approved by: https://github.com/mruberry, https://github.com/jjsjann123
Summary: The _detect_dynamic_vs_static function takes in a prepared FX graph model that already has ModelReportObservers built in, uses the collected information to determine whether inputs and outputs are stationary or non-stationary, and provides feedback on whether to make linear modules static or dynamic based on this information. This PR will be followed up soon with another PR that more rigorously tests the whole end-to-end performance of this system, which is primarily how the function in this PR will be validated; that is why this one only has one test. Test Plan: python test/quantization/fx/test_model_report_fx.py TestModelReportDetectDynamicStatic Pull Request resolved: pytorch#79326 Approved by: https://github.com/HDCharles
# Summary ShipIt jobs triggered by co-development workflows are failing to merge PRs due to stale checks. This diff skips the stale check when merge is triggered by `facebook-github-bot`. Sample merge failure: pytorch#78654 (comment) Pull Request resolved: pytorch#79572 Approved by: https://github.com/bigfootjon, https://github.com/seemethere, https://github.com/malfet
…8088) Pull Request resolved: pytorch#78088 Approved by: https://github.com/JackCaoG, https://github.com/wconstab
This reverts commit 05664a9. Reverted pytorch#79494 on behalf of https://github.com/ezyang due to conflicts with earlier diff that needs revert
This reverts commit d332724. Reverted pytorch#78135 on behalf of https://github.com/ezyang due to broke torchvision tests
This reverts commit 6b015af. Reverted pytorch#79493 on behalf of https://github.com/ezyang due to this land races with a revert
…ster (pytorch#79559) Relates to pytorch#76700 **Overview**: Wrote GHA to get the latest commit SHA. Another component of the script is pushing this SHA to the viable/strict branch, which I will test on pytorch/pytorch-canary. Todo in the next PR: add comment explaining cron, replace package installation statements with txt file **Test Plan:** Monitor github actions results to see if the SHA printed is correct by running GHA on pytorch/pytorch-canary. The successful test workflow is [here](https://github.com/pytorch/pytorch-canary/runs/6888486129?check_suite_focus=true). Pull Request resolved: pytorch#79559 Approved by: https://github.com/janeyx99
…#79568) Fixes pytorch#79531 Pull Request resolved: pytorch#79568 Approved by: https://github.com/jbschlosser
Pull Request resolved: pytorch#79545 Approved by: https://github.com/ngimel, https://github.com/albanD
…behavior & warn Pull Request resolved: pytorch#79549 Approved by: https://github.com/ngimel, https://github.com/albanD
Includes pytorch@30fb2c4 and pytorch@95b15c2 Pull Request resolved: pytorch#79593 Approved by: https://github.com/janeyx99
Remove as many references to unittest as can easily be done, in favor of our custom infra. Left a TODO where I could not easily replace unittest.main with run_tests() Pull Request resolved: pytorch#79546 Approved by: https://github.com/seemethere
Relates to pytorch#76700 **Overview**: One edge case not accounted for in the original logic of `isGreen` was commits with no workflow checks. Similarly, if any of the required checks is not present (e.g. if all of the pull checks are skipped), the workflow should not be promotable. A commit should only be promotable if there is at least one workflow check from each required group present (i.e. none of them are skipped) **Test Plan:** Verify that commits on the HUD with no workflow checks are not considered promotable. Added a test case with no workflows in `test_print_latest_commits.py` Pull Request resolved: pytorch#79565 Approved by: https://github.com/seemethere
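The promotability rule can be sketched in a few lines. Everything here is hypothetical (the function name, the check record shape, and the status strings are illustrative, not the actual HUD code):

```python
def is_green(workflow_checks, required_groups):
    """A commit is promotable only if every required group has at least
    one present, successful (non-skipped) workflow check."""
    for group in required_groups:
        group_checks = [c for c in workflow_checks if c["group"] == group]
        # An empty group (all checks skipped or missing) blocks promotion,
        # as does a group whose present checks all failed.
        if not any(c["status"] == "success" for c in group_checks):
            return False
    return True
```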
see title ``` 2022-06-15T15:20:12.5183743Z The PR is introducing backward incompatible changes to the operator library. Please contact PyTorch team to confirm whether this change is wanted or not. 2022-06-15T15:20:12.5183765Z 2022-06-15T15:20:12.5183880Z Broken ops: [ 2022-06-15T15:20:12.5184275Z aten::empty.SymInt(SymInt[] size, *, int? dtype=None, int? layout=None, Device? device=None, bool? pin_memory=None, int? memory_format=None) -> Tensor 2022-06-15T15:20:12.5184342Z ] 2022-06-15T15:20:12.6303306Z + cleanup 2022-06-15T15:20:12.6303395Z + retcode=1 2022-06-15T15:20:12.6303463Z + set +x 2022-06-15T15:20:12.6345211Z ##[error]Process completed with exit code 1. 2022-06-15T15:20:12.6377724Z Prepare all required actions 2022-06-15T15:20:12.6377844Z Getting action download info ``` Pull Request resolved: pytorch#79612 Approved by: https://github.com/malfet
)" This reverts commit 4ebb326. Reverted pytorch#79034 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally
Removes the pytorch- prefix from android jobs to make naming more consistent across all of our jobs Signed-off-by: Eli Uriegas <eliuriegas@fb.com> Pull Request resolved: pytorch#79435 Approved by: https://github.com/janeyx99
Moves the deploy job prefix to a suffix to make it more consistent with our other job names Signed-off-by: Eli Uriegas <eliuriegas@fb.com> Pull Request resolved: pytorch#79436 Approved by: https://github.com/janeyx99
…#79298) Adds a `same_shape` util and updates maybe_broadcast to use it; previously maybe_broadcast was always broadcasting because its equality check was always failing. Pull Request resolved: pytorch#79298 Approved by: https://github.com/ezyang
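The utility itself is a one-liner; a sketch of what such a check might look like (illustrative, not the actual `torch._refs` helper):

```python
def same_shape(a, b):
    """True iff two shapes have the same rank and the same extent in
    every dimension. maybe_broadcast can then skip broadcasting when
    a tensor's shape already equals the common broadcast shape."""
    return len(a) == len(b) and all(x == y for x, y in zip(a, b))
```

A subtle point this guards against: naive equality between differently-typed shape containers (e.g. a list vs a tuple of the same extents) can compare unequal, forcing a broadcast on every call.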
…t from the middle (pytorch#75763) Summary: When 3 tensor lists are packed into variadic arg list, it is convenient to view the original lists as (start, end) pairs in the varargs list Differential Revision: D35622049 Pull Request resolved: pytorch#75763 Approved by: https://github.com/mikeiovine
What was happening is that when we have multiple learning rate schedulers, the order in which they are initialized is not taken into account. This is a problem if they are initialized in sequential order (as one might intuitively do). Each scheduler calls `step()` on initialization and sets the `lr` in its optimizer's `param_groups`. This means that step 0 uses the `lr` set by the very last scheduler (when initializing schedulers sequentially) instead of the first. The fix in this PR addresses the bug by calling the appropriate scheduler on initialization after decrementing the `last_epoch` values in order to keep them the same post-step. This ensures that the correct scheduler is the one setting the `lr` values for the optimizer's `param_groups` Pull Request resolved: pytorch#72856 Approved by: https://github.com/jbschlosser
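A toy model of the failure mode (purely illustrative, not the actual torch.optim code): each scheduler's constructor calls `step()`, so the last-constructed scheduler's write to the shared optimizer wins at step 0.

```python
class Optimizer:
    def __init__(self, lr):
        self.param_groups = [{"lr": lr}]

class Scheduler:
    """Toy scheduler: step() runs once at construction time and writes
    an lr into the shared optimizer, mimicking the init-time step()."""
    def __init__(self, optimizer, factor):
        self.optimizer = optimizer
        self.factor = factor
        self.last_epoch = -1
        self.step()  # <- the constructor-time step() that causes the bug

    def step(self):
        self.last_epoch += 1
        self.optimizer.param_groups[0]["lr"] = 0.1 * self.factor

opt = Optimizer(lr=0.1)
first = Scheduler(opt, factor=1.0)   # should own step 0
second = Scheduler(opt, factor=0.5)  # meant for a later phase
# Step 0 now runs with the lr written by the *last* constructed scheduler:
print(opt.param_groups[0]["lr"])  # 0.05 rather than 0.1
```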
Update release.md with rc validation steps Pull Request resolved: pytorch#79889 Approved by: https://github.com/seemethere
Summary: The use of `c10::nullopt` instead of `c10::optional<Tensor>()` caused a null pointer dereference on Android. Test Plan: On Mac: ``` buck run //xplat/caffe2:pt_vulkan_api_test_binAppleMac ``` On Android: ``` buck build -c ndk.custom_libcxx=false -c pt.enable_qpl=0 //xplat/caffe2:pt_vulkan_api_test_binAndroid\#android-arm64 --show-output adb push buck-out/gen/xplat/caffe2/pt_vulkan_api_test_binAndroid\#android-arm64 /data/local/tmp/vulkan_api_test adb shell "/data/local/tmp/vulkan_api_test" ``` Reviewed By: SS-JIA Differential Revision: D37189192 Pull Request resolved: pytorch#79701 Approved by: https://github.com/SS-JIA
Last week, we had to disable it due to infra issues. We are now re-enabling it since those infra issues have been resolved. Test Plan: Reran this job and it looks OK. Pull Request resolved: pytorch#79949 Approved by: https://github.com/seemethere
Pull Request resolved: pytorch#79874 Approved by: https://github.com/robieta
We have hard limitations on the number of linux.16xlarge.nvidia.gpu machines we can spin up, and the TTS for this specific job has increased 2x over the past 7 days. Signed-off-by: Eli Uriegas <eliuriegas@fb.com> Pull Request resolved: pytorch#79894 Approved by: https://github.com/malfet, https://github.com/janeyx99
…79942) Summary: Memory planner destruction was hitting [this assertion](https://www.internalfb.com/code/fbsource/[f8baf8a0bab462c860d2eb7491a4e3fb40d2907a]/fbcode/caffe2/c10/util/intrusive_ptr.h?lines=117) in debug mode for a few models. Here's what was going on:
1) The set of unmanaged `IValue`s acquires one or more owning refs of a managed `StorageImpl`
2) Then, one or more tensors in that storage group have their `StorageImpl` swapped out during execution
3) During `deallocateManagedTensors`, we swap the correct `StorageImpl` back in, [calling `unsafe_adapt_non_heap_allocated` again and resetting the refcount](https://www.internalfb.com/code/fbsource/[f8baf8a0bab462c860d2eb7491a4e3fb40d2907a]/fbcode/caffe2/torch/csrc/jit/runtime/static/memory_planner.cpp?lines=446-452)
4) The unmanaged `IValue`s are deallocated, decrementing the refcount into the danger zone

So, we just have to make sure that unmanaged `IValue`s are destructed before we deallocate the managed tensors. Test Plan: CI Differential Revision: D37303728 Pull Request resolved: pytorch#79942 Approved by: https://github.com/tenpercent
…#79905) We now use Rockset to store test case stats, and we can glean flakiness from that data, so this complicated logic isn't necessary anymore! This will also prevent spurious red on trunk that occurs with this code, e.g. https://github.com/pytorch/pytorch/runs/6968080598?check_suite_focus=true ## Test plan print_test_stats works on the PR: https://github.com/pytorch/pytorch/runs/6974961237?check_suite_focus=true though it seems several jobs ran into HTTP 500 and memory errors, which I don't think is related to my code and is more a problem with the server: https://github.com/pytorch/pytorch/runs/6974938954?check_suite_focus=true Pull Request resolved: pytorch#79905 Approved by: https://github.com/zengk95, https://github.com/seemethere
…torch#79945) ### Motivation
In order to match the internal platform010 builds, we are creating a new config to run on PRs that uses compiler and package versions at or above those used in platform010. Here are the versions used in the new build:
- Ubuntu 22.04 (Jammy Jellyfish)
- Clang-12
- Python 3.8
- CUDA 11.6

### Summary of Changes
- As `nvidia/docker` images only support CUDA 11.7 with Ubuntu 22.04, we are starting with base Ubuntu 22.04 docker images and then installing CUDA 11.6
- Fetching `install_cuda.sh` from the `pytorch/builder` repo in order to install CUDA using `wget`
- `libssl-dev` has been upgraded to libssl3 in Ubuntu 22.04. Instead, we are using the `include` and `lib` folders downloaded with `Openssl1.1.1`
- `Clang-12` requires `libomp-12-dev` to work with `OpenMP`, which is added to the `install_base.sh` file
- Minor fixes to handle compilation errors generated when using `clang-12`:
  - In `pow_test.cpp`, adding a `static_cast` to the input of the `sqrt` method
  - In `vec512_qint.h`, explicitly defining the copy-assignment operator, as its implicit definition is deprecated due to a user-declared copy-constructor in C++11

Pull Request resolved: pytorch#79945 Approved by: https://github.com/seemethere, https://github.com/kit1980
Ref: pytorch#69991 Pull Request resolved: pytorch#78515 Approved by: https://github.com/zou3519, https://github.com/Lezcano
For ignored modules' parameters, we should also clean their parameter names since they will have the FSDP-specific prefixes. This change only affects the prefixed parameter name keys in `full_optim_state_dict()` (i.e. optim state dict saving). Not having this change does not actually violate the correctness of the optim state dict save-load flow because it only requires that the keys are unique and internally consistent. Either way, this PR explicitly adds the specification now that the parameter keys in the optim state dict should match the keys of full model state dict. Pull Request resolved: pytorch#79955 Approved by: https://github.com/rohan-varma
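The cleaning step amounts to stripping wrapper prefixes out of dotted parameter names. A hypothetical sketch (the prefix strings below are assumptions for illustration, not necessarily FSDP's real constants):

```python
def clean_tensor_name(name, prefixes=("_fsdp_wrapped_module.", "_fpw_module.")):
    """Strip FSDP wrapper prefixes from a dotted parameter name so that
    optim state dict keys line up with full model state dict keys.
    Prefixes can appear at every level of nesting, so remove all
    occurrences, not just a leading one."""
    for p in prefixes:
        name = name.replace(p, "")
    return name
```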
Otherwise it errors on ONNX clang10 build: https://github.com/pytorch/pytorch/runs/6976721937?check_suite_focus=true Pull Request resolved: pytorch#79964 Approved by: https://github.com/malfet
…utputs Without profiled outputs, autodiff can't tell whether or not the outputs of a DifferentiableGraph should requires_grad. Autodiff would default to requires_grad=True if there was no profiled information, causing autodiff to mark tensors as requires_grad when they shouldn't have. This adds requires_grad info onto the type of the output, if it can be found in later uses of the output. Adds a test for correct autodiff requires_grad behavior and also a test to make sure the output type is correctly annotated in create_autodiff_subgraphs. Pull Request resolved: pytorch#79498 Approved by: https://github.com/eellison
Pull Request resolved: pytorch#79947 Approved by: https://github.com/ngimel
Fixes pytorch#79871 Make `module.cpp` tests respect the change that was made in pytorch#78436 (no int types in autograd). Note that there is still a gap in the CMake test -- it's unclear why it didn't fail CI before. As far as I can tell it should be executed, because it's included here https://github.com/pytorch/pytorch/blob/79507d2a9d06d4a3fb50eb21b30e08cc044776ce/test/cpp/api/CMakeLists.txt#L17:L17 Pull Request resolved: pytorch#79926 Approved by: https://github.com/soulitzer
…torch#74704)" This reverts commit 93b0fec. Reverted pytorch#74704 on behalf of https://github.com/malfet due to broke torchvision
per title Pull Request resolved: pytorch#79973 Approved by: https://github.com/mruberry
Currently, `bitonicSortKVInPlace` is written to sort one array per block of threads. If that dimension happens to be very small (<128 elements), this results in low thread occupancy. Instead, this changes `bitonicSortKVInPlace` to operate with a 2d block. Sorting happens along the x dimension, and the y dimension is a fixed size batch. Pull Request resolved: pytorch#79627 Approved by: https://github.com/ngimel
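For intuition, the sorting network itself in serial pure Python (the actual kernel is CUDA; there, the inner compare-exchange loop is what runs across threads, and the 2d-block change batches several of these sorts per block):

```python
def bitonic_sort(a):
    """In-place ascending bitonic sort; len(a) must be a power of two."""
    n = len(a)
    assert n and (n & (n - 1)) == 0, "length must be a power of two"
    k = 2
    while k <= n:          # size of the bitonic sequences being merged
        j = k // 2
        while j > 0:       # compare-exchange distance within a merge step
            for i in range(n):   # each iteration is one thread's work on GPU
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
```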
csarofeen approved these changes on Jun 23, 2022.

shmsong pushed a commit to shmsong/pytorch that referenced this pull request on Jul 24, 2022.
Syncing nvfuser devel branch to upstream master. https://github.com/csarofeen/pytorch/

Code changes include:
- TransformPropagator refactor: switched to Dijkstra instead of exhaustive enumeration on all possible paths to reduce compilation time on transform propagation;
- Indexing refactor: remove reference tensor creation in all tensor indexing logic (csarofeen#1690);
- (more) generic grouped grid reduction kernel;
- Minor parser/fuser patches:
  1. zero-dim tensor reduction support
  2. no-op binary removal within fused graph
  3. expand supported in fusion

Squashed commits to WAR the github API. Commits that are actually in this PR from the devel branch:
```
a054b3e Refactor TransormPropagator to allow specifying a position and propagating to part of the DAG (csarofeen#1775)
d67e1cd Indexing refactor stage 1: remove reference tensor creation in all tensor indexing logic (csarofeen#1690)
1b65299 Issue 1770 (csarofeen#1774)
35b0427 Avoid compilation errors like below: (csarofeen#1773)
452c773 Ignore reductions of zero-dim tensors per PyTorch conventions (csarofeen#1771)
31d6c56 TransformPropagator refactor (csarofeen#1769)
570c5a8 Merge pull request csarofeen#1767 from csarofeen/upstream_merge_0621
9d6c3d8 merging upstream 61305cd
0ed815f New TransformPropagator algorithm (csarofeen#1763)
6c19520 no-op binary removal (csarofeen#1764)
ec7fa41 Proper propagation of IterType (csarofeen#1762)
b263562 Fix dimensionality check (csarofeen#1759)
2d6343f More generic grouped grid reduction kernel (csarofeen#1740)
64e2b56 [nvfuser] prevent spamming warning message (pytorch#77777) (csarofeen#1758)
0c43162 [nvFuser] Improving bitwise ops support (pytorch#77158) (csarofeen#1757)
b93a147 Parser expand (csarofeen#1754)
```
RUN_TORCHBENCH: nvfuser Pull Request resolved: pytorch#80355 Approved by: https://github.com/davidberard98
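The TransformPropagator change swaps exhaustive path enumeration (exponential in the number of paths) for a cheapest-path search. Generic textbook Dijkstra over a weighted graph, for reference (illustrative only, not the nvFuser implementation):

```python
import heapq

def dijkstra(graph, source):
    """graph: {node: [(neighbor, weight), ...]}. Returns the cheapest
    known distance from source to every reachable node. Each edge is
    relaxed at most once per improvement, so the cost is near-linear
    (with the heap's log factor) instead of enumerating all paths."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry; a cheaper path was already found
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist
```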
Merged upstream master into our devel branch. The commit of choice: 61305cd638b6fcd73a0b66b4cde7014fecb9e8ce (viable/strict from 06/21/2022).