[WIP][DO NOT REVIEW] upstream push smoke test#82937

Closed
jjsjann123 wants to merge 909 commits into pytorch:master from jjsjann123:upstream_push_0806

jjsjann123 (Collaborator) commented Aug 6, 2022

Placeholder PR for the nvfuser code bump, serving as a smoke test for CI. The real PR should go through the upstream repo.

dfe02f3faed4c64477e5f5c678f21f33415d0195 Merge remote-tracking branch 'csarofeen/devel' into HEAD
16173732ecfafc4797e93c2449cfb778015a6c7a Add `TensorViewBuilder::shape(std::vector<Val*> shape)` (#1884)
7cfb7796bdcf055eb61d600b7b5c9df292950290 Merge pull request #1887 from csarofeen/upstream_merge_0803
3399f6de62061d30781de50ef1862bbfb1615173 Merge remote-tracking branch 'origin/viable/strict' into HEAD
01208f5bba3bc158d41ccbefa0ee2c5ceea7aedb Add `UnaryOpType::Print` which can be helpful for debugging (#1878)
0646522454aa715ef164c88a73fb8bdddc706805 Remove redundant TORCH_INTERNAL_ASSERT in lower_magic_zero.cpp (#1881)
7bc76aa219293a59e4166e258d76289fe13633ca Fix most inlined propagator for mismatched dims (#1875)
501f4aa270bf4dd47b0d2f4860bc6f23ebc32a38 Nonaffine swizzle formulation ep.2: Loop swizzle variant. (#1826)
d863d690f923047a85b5229a787118708f810741 Ampere async copy ep.2: circular buffering extension to support pipelined matmul operand load (#1827)
e0ae11a61c87cd998e88ddd79a496548171c31e0 Larger sized mma instructions to support full vectorization (#1824)
9bb4cf7a66b098f04c9d95a2d34ab2bceee151b3 fragment iteration to support fully unrolled mma ops (#1823)
a48270a18dc2d3accc2626758d14d5858ae55032 Merge all dims in pointwise scheduler (#1872)
172fb3673fb4aaf4c1e889922a4fc5c06cbd59f7 Make MostInlined and BestEffort inline propagation no longer assert replayed (#1868)
a64462a5ac2fcf57a177bf36b0f26c61a4e252a4 Allow trivial reduction to be merged (#1871)
440102bcda6eb1dcd42d5fa5aeab9d6b049956bc Symmetric API for BestEffortReplay (#1870)
d1caf330c08ea8002f7133ca655bbd5b28c4eb98 Some misc cleanups/refactor split out from #1854 (#1867)
1013eda50be38eac96c00ba781340ac199d5a136 Remove some welford specific logic. (#1864)
51589d36be5a101d06e641fe0400b39028b7cb81 Some cleanups on tests and heuristics params (#1866)
a6b3e70da5dee51dbc246347228ea21384e46ac3 Segmenter bug fix, and deterministic iteration ordering.  (#1865)
1b665b9b5e562d6f0caba5e7319e83e5df64104f Add nullptr checks to IrBuilder (#1861)
1cd9451d7493f631c2837ba07c1ea93a74e83a15 Simplify matmul scheduling with the new transform propagator.  (#1817)
bbc1fb9b8c454f557ab9fcf5b1c3cef9b9e136d0 Add leaky_relu operation (#1852)
e842a9bab5e9f7289b7ce33ee37a682b22373f49 Minor cleanup in pointwise scheduler (#1858)
9ee850ca2f7f51dd5269bffb1255e485f809282d Fix stringstream usage (#1857)
20a36c1e4f28c4ff9837e56784be2686d17435f3 Improve nsight compute support (#1855)
405910308301097297b55c34d560aab6a360e897 Remove debugging `true ||` from getPointwiseHeuristics (#1822)
01117bfe8fdfacdbfdcfba9a624cdf900fe044d4 Misc cleanup (#1853)
5cc64943dc381a568223140bce0f22163c01e29f Apply the magic-zero protection to each indexed domain individually for predicate indexing (#1846)
92e6f0207e3a89fe90fd5cd3ffc575dfd766ba00 Cleanup normalization scheduler (#1845)
db89c6591a2f21130599a93675e0615e55564e41 Type inference patch (#1848)
102fe93a4605ca465cda26ebaee4ba1af2026901 Add debug dump for InlinePropagator (#1847)
b7a4d93d375a6e2ddef483763c93ffddc62ec452 Redundant thread compute analysis to avoid un-necessary sync insertion (#1687)
942be5b256056d0e02877361b814ae6af32ca15f Upstream ci build fixes (#1842)
0b83645915029d67f9345aa4649b8c6f62b0061b Fix vectorization bug introduced in #1831 (#1840)
63630f1ae091180e541932a9d9dc598e0a9902dd Move MaxProducerPosUpdater into InlinePropagator::tearDown (#1825)
9135a963c01d97ba34b1a7d2f106e78a13fd6651 Fix transpose benchmark dtype (#1839)
2c9a6c02312d5bf4f83cde653b847b4f85849432 Add extra configurability to `parallelizeAllLike` (#1831)

naoyam and others added 30 commits March 17, 2022 19:44
* initial volta support

* mma parallel type && cleanup

* cleanup

* alignment

* comment

* change request

* fix same parallel type

* move validation pass

* comment and cleanup

* lint

* comment and cleanup

* comment and format
* Propagate new symbol throughout fusion using ValReplacementMutator
* Replace FusionViewFailPersistent with FusionViewPersistentShmoo
* Create separate test-gpu-view.cpp for view tests
* Move replaceValue to ir_utils
fusion_args prints arguments given to runFusion.

kernel_args prints arguments given to generated CUDA kernels.
* Fixes validation of vectorization with contig indexing

True contig indexing needs reference tensors, so finding vectorized
contig domains at the initial validation time can result in false
positives and negatives. Fixed by filling in that information at
indexing time.

Also considered keeping it separate from indexing and filling it in at
validation time, but that would end up replicating the same logic as
reference replay.

Closes pytorch#1534
To highlight the impact of the change, renamed `IterDomain::clone()` to `IterDomain::cloneWithoutRFactor()`.
* save

* save

* save

* save
…ch#1552)

* Fix ComputeAtRootDomainMap with broadcast in view root domains

Fixes pytorch#1549
…torch#1529)

* Allow vectorization with contig-merged domains in pwise scheduler
* Forward merging of trivial-reduction dims in producers

* Enable trivial reduction forwarding only when trivial reduction domain
is a root domain.

For example, splitting a reduction domain by 1 and merging it with
another non-reduction domain would result in a trivial-reduction merge.

It is probably possible to allow such non-root trivial reduction domains,
but that would mean, e.g., that a leaf domain could be mappable while its
root domain is unmappable, which seems rather confusing. Since such
transformations are unlikely, not enabling forwarding should be fine
and causes less surprise.
…1556)

* Propagate root domain mappings from rfactor to root domains in
ComputeAtRootDomainMap

The main purpose of ComputeAtRootDomainMap is to find unmappable domains
for computeAt. This analysis is done by traversing a fusion in a
backward direction. Currently, the traversal only visits arithmetic
expressions, so information propagates from consumer tensors to
producer tensors. This propagation is also required from rfactor domains
to root domains. Previously this did not matter, as rfactor was
limited to reduction domains, but that is not the case with view.

This change also means that ComputeAtRootDomainMap no longer guarantees
one-to-one mappings. For example,

```
tv0: [I0, I1]
tv1 = view(tv0); // tv1: [I0*I1/N, N]
```

That is, the view op first merges the two domains of `tv0` and
then splits the result by N. Note that both rfactor axes of `tv1`
are now mapped to both axes of `tv0`.
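As an illustration only (this is not nvfuser's indexing code), assume N divides I0*I1, row-major merge, and write an index of `tv1` as (j, k) and the merged flat index as f. The merge-then-split view reads `tv0` as follows; the worked values use I0 = 2, I1 = 6, N = 4, so `tv1` has shape [3, 4]:

```latex
f = jN + k, \qquad
\mathrm{tv1}[j, k] = \mathrm{tv0}\!\left[\left\lfloor f / I_1 \right\rfloor,\; f \bmod I_1\right],
\qquad 0 \le j < I_0 I_1 / N,\; 0 \le k < N

\begin{aligned}
\mathrm{tv1}[1,0] &: f = 4 \;\Rightarrow\; \mathrm{tv0}[0,4] \\
\mathrm{tv1}[1,3] &: f = 7 \;\Rightarrow\; \mathrm{tv0}[1,1] \\
\mathrm{tv1}[2,0] &: f = 8 \;\Rightarrow\; \mathrm{tv0}[1,2]
\end{aligned}
```

Moving only along the inner axis k (first two rows) changes both `tv0` indices, and so does moving only along the outer axis j (first and third rows), which is why each rfactor axis of `tv1` maps to both axes of `tv0` rather than one-to-one.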

Because of this change, `ComputeAtRootDomainMap::mapBestEffort` and other
mapping functions between a producer and a consumer that are supposed to
return a one-to-one map can fail.
`ComputeAtRootDomainMap::getMappableDims` is fine, as it just grabs any
domain that is mappable.

`ComputeAtRootDomainMap::mapConsumerToProducer` and
`ComputeAtRootDomainMap::mapProducerToConsumer` were used in
`TransformReplay::replayPasC` and `TransformReplay::replayCasP`, but
they do not actually need `ComputeAtRootDomainMap`; `PairwiseRootDomainMap`
is sufficient, so those usages were replaced with the pairwise variant.
* Minor fix on python test
Add flatten support on the python side
pytorch#1559)

* Added a more helpful error message when checking for empty outputs on the Fusion.

* Clang fix.
…#1561)

* do not re-compute unary op with output and allow expr duplication in debug print.
* always allocate dynamic smem

* add driver API call for large smem usage

Co-authored-by: Christian Sarofeen <csarofeen@nvidia.com>
This reverts commit 2d5e4cf.
csarofeen and others added 17 commits July 25, 2022 23:49
* Remove some welford specific logic.

* Multi-reduction fix

* Some more minor cleanup.

* Add a note on multi-input reductions

Co-authored-by: Naoya Maruyama <nmaruyama@nvidia.com>
Split out from pytorch#1854
- The `InlinePropagatorSelector` seems to be less generally useful than `BoundedPropagationSelector`, so I made `InlinePropagatorSelector` a private class of `compute_at.cpp` and renamed it to `ComputeAtSelector`, and moved `BoundedPropagationSelector` to `maxinfo_propagator.h` and renamed it to `SetSelector`.
- Split `DomainMap` from `pointwise.cpp` into `pointwise_utils.cpp`, and renamed some functions.
- Add two cache entries, `DOMAIN_MAP` and `REFERENCE_TENSORS`, and use them in the pointwise scheduler.
jjsjann123 added the ciflow/trunk label (Trigger trunk jobs on your pull request) Aug 6, 2022
facebook-github-bot (Contributor) commented Aug 6, 2022

❌ 5 New Failures

As of commit 932a0e1 (more details on the Dr. CI page):

  • 5/5 failures introduced in this PR

🕵️ 5 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages

See GitHub Actions build trunk / macos-12-py3-x86-64 / test (default, 2, 2, macos-12) (1/5)

Step: "Unknown"

2022-08-09T16:56:17.6607890Z ##[error]The operation was canceled.
2022-08-09T13:00:56.4930920Z   BUILD_ENVIRONMENT: macos-12-py3-x86-64
2022-08-09T13:00:56.4931220Z   TEST_CONFIG: default
2022-08-09T13:00:56.4931590Z   SHARD_NUMBER: 2
2022-08-09T13:00:56.4931880Z   NUM_TEST_SHARDS: 2
2022-08-09T13:00:56.4932140Z   PR_BODY: 
2022-08-09T13:00:56.4932410Z   PYTORCH_RETRY_TEST_CASES: 1
2022-08-09T13:00:56.4932840Z   PYTORCH_OVERRIDE_FLAKY_SIGNAL: 1
2022-08-09T13:00:56.4933140Z   CONDA: /Users/runner/miniconda3
2022-08-09T13:00:56.4933460Z   CONDA_PKGS_DIR: /Users/runner/conda_pkgs_dir
2022-08-09T13:00:56.4933740Z ##[endgroup]
2022-08-09T16:56:17.6607890Z ##[error]The operation was canceled.
2022-08-09T16:56:17.6676490Z Prepare all required actions
2022-08-09T16:56:17.6677360Z Getting action download info
2022-08-09T16:56:17.9406380Z Download action repository 'nick-fields/retry@71062288b76e2b6214ebde0e673ce0de1755740a' (SHA:71062288b76e2b6214ebde0e673ce0de1755740a)
2022-08-09T16:56:18.2912810Z ##[group]Run ./.github/actions/get-workflow-job-id
2022-08-09T16:56:18.2913480Z with:
2022-08-09T16:56:18.2915150Z   github-token: ***
2022-08-09T16:56:18.2915560Z env:
2022-08-09T16:56:18.2916310Z   GIT_DEFAULT_BRANCH: master
2022-08-09T16:56:18.2916590Z   BUILD_ENVIRONMENT: macos-12-py3-x86-64
2022-08-09T16:56:18.2916900Z   TEST_CONFIG: default

See GitHub Actions build trunk / macos-12-py3-arm64 / test (default, 1, 2, macos-m1-12) (2/5)

Step: "Unknown"

2022-08-09T16:56:32.0204410Z ##[error]The operation was canceled.
2022-08-09T12:58:44.6468940Z env:
2022-08-09T12:58:44.6469050Z   GIT_DEFAULT_BRANCH: master
2022-08-09T12:58:44.6469200Z   BUILD_ENVIRONMENT: macos-12-py3-arm64
2022-08-09T12:58:44.6469330Z   TEST_CONFIG: default
2022-08-09T12:58:44.6469450Z   SHARD_NUMBER: 1
2022-08-09T12:58:44.6469560Z   NUM_TEST_SHARDS: 2
2022-08-09T12:58:44.6469670Z   PR_BODY: 
2022-08-09T12:58:44.6469790Z   PYTORCH_RETRY_TEST_CASES: 1
2022-08-09T12:58:44.6469930Z   PYTORCH_OVERRIDE_FLAKY_SIGNAL: 1
2022-08-09T12:58:44.6470050Z ##[endgroup]
2022-08-09T16:56:32.0204410Z ##[error]The operation was canceled.
2022-08-09T16:56:32.0237260Z Prepare all required actions
2022-08-09T16:56:32.0237650Z Getting action download info
2022-08-09T16:56:32.2451870Z Download action repository 'nick-fields/retry@71062288b76e2b6214ebde0e673ce0de1755740a' (SHA:71062288b76e2b6214ebde0e673ce0de1755740a)
2022-08-09T16:56:32.4859240Z ##[group]Run ./.github/actions/get-workflow-job-id
2022-08-09T16:56:32.4859400Z with:
2022-08-09T16:56:32.4859730Z   github-token: ***
2022-08-09T16:56:32.4859860Z env:
2022-08-09T16:56:32.4859980Z   GIT_DEFAULT_BRANCH: master
2022-08-09T16:56:32.4860130Z   BUILD_ENVIRONMENT: macos-12-py3-arm64
2022-08-09T16:56:32.4860270Z   TEST_CONFIG: default

See GitHub Actions build trunk / macos-12-py3-x86-64 / test (default, 1, 2, macos-12) (3/5)

Step: "Unknown"

2022-08-09T16:56:10.3471840Z ##[error]The operation was canceled.
2022-08-09T13:03:19.1031550Z   BUILD_ENVIRONMENT: macos-12-py3-x86-64
2022-08-09T13:03:19.1031840Z   TEST_CONFIG: default
2022-08-09T13:03:19.1032110Z   SHARD_NUMBER: 1
2022-08-09T13:03:19.1032390Z   NUM_TEST_SHARDS: 2
2022-08-09T13:03:19.1032820Z   PR_BODY: 
2022-08-09T13:03:19.1033140Z   PYTORCH_RETRY_TEST_CASES: 1
2022-08-09T13:03:19.1033420Z   PYTORCH_OVERRIDE_FLAKY_SIGNAL: 1
2022-08-09T13:03:19.1033740Z   CONDA: /Users/runner/miniconda3
2022-08-09T13:03:19.1034080Z   CONDA_PKGS_DIR: /Users/runner/conda_pkgs_dir
2022-08-09T13:03:19.1034560Z ##[endgroup]
2022-08-09T16:56:10.3471840Z ##[error]The operation was canceled.
2022-08-09T16:56:10.3714780Z Prepare all required actions
2022-08-09T16:56:10.3716060Z Getting action download info
2022-08-09T16:56:10.7609040Z Download action repository 'nick-fields/retry@71062288b76e2b6214ebde0e673ce0de1755740a' (SHA:71062288b76e2b6214ebde0e673ce0de1755740a)
2022-08-09T16:56:11.4590960Z ##[group]Run ./.github/actions/get-workflow-job-id
2022-08-09T16:56:11.4591370Z with:
2022-08-09T16:56:11.4592920Z   github-token: ***
2022-08-09T16:56:11.4593260Z env:
2022-08-09T16:56:11.4593500Z   GIT_DEFAULT_BRANCH: master
2022-08-09T16:56:11.4593930Z   BUILD_ENVIRONMENT: macos-12-py3-x86-64
2022-08-09T16:56:11.4594270Z   TEST_CONFIG: default

See GitHub Actions build trunk / macos-12-py3-arm64 / test (default, 2, 2, macos-m1-12) (4/5)

Step: "Unknown"

2022-08-09T16:56:31.2929720Z ##[error]The operation was canceled.
2022-08-09T12:58:51.1727950Z env:
2022-08-09T12:58:51.1728070Z   GIT_DEFAULT_BRANCH: master
2022-08-09T12:58:51.1728220Z   BUILD_ENVIRONMENT: macos-12-py3-arm64
2022-08-09T12:58:51.1728370Z   TEST_CONFIG: default
2022-08-09T12:58:51.1728490Z   SHARD_NUMBER: 2
2022-08-09T12:58:51.1728600Z   NUM_TEST_SHARDS: 2
2022-08-09T12:58:51.1728720Z   PR_BODY: 
2022-08-09T12:58:51.1728850Z   PYTORCH_RETRY_TEST_CASES: 1
2022-08-09T12:58:51.1728990Z   PYTORCH_OVERRIDE_FLAKY_SIGNAL: 1
2022-08-09T12:58:51.1729120Z ##[endgroup]
2022-08-09T16:56:31.2929720Z ##[error]The operation was canceled.
2022-08-09T16:56:31.2960280Z Prepare all required actions
2022-08-09T16:56:31.2960480Z Getting action download info
2022-08-09T16:56:31.4886890Z Download action repository 'nick-fields/retry@71062288b76e2b6214ebde0e673ce0de1755740a' (SHA:71062288b76e2b6214ebde0e673ce0de1755740a)
2022-08-09T16:56:31.7174780Z ##[group]Run ./.github/actions/get-workflow-job-id
2022-08-09T16:56:31.7174980Z with:
2022-08-09T16:56:31.7175520Z   github-token: ***
2022-08-09T16:56:31.7175660Z env:
2022-08-09T16:56:31.7175810Z   GIT_DEFAULT_BRANCH: master
2022-08-09T16:56:31.7175980Z   BUILD_ENVIRONMENT: macos-12-py3-arm64
2022-08-09T16:56:31.7176150Z   TEST_CONFIG: default

See GitHub Actions build trunk / macos-12-py3-x86-64 / test (functorch, 1, 1, macos-12) (5/5)

Step: "Unknown"

2022-08-09T16:56:09.1288000Z ##[error]The operation was canceled.
2022-08-09T13:03:19.1677930Z   BUILD_ENVIRONMENT: macos-12-py3-x86-64
2022-08-09T13:03:19.1678210Z   TEST_CONFIG: functorch
2022-08-09T13:03:19.1678650Z   SHARD_NUMBER: 1
2022-08-09T13:03:19.1678910Z   NUM_TEST_SHARDS: 1
2022-08-09T13:03:19.1679170Z   PR_BODY: 
2022-08-09T13:03:19.1679440Z   PYTORCH_RETRY_TEST_CASES: 1
2022-08-09T13:03:19.1679850Z   PYTORCH_OVERRIDE_FLAKY_SIGNAL: 1
2022-08-09T13:03:19.1680170Z   CONDA: /Users/runner/miniconda3
2022-08-09T13:03:19.1680480Z   CONDA_PKGS_DIR: /Users/runner/conda_pkgs_dir
2022-08-09T13:03:19.1680770Z ##[endgroup]
2022-08-09T16:56:09.1288000Z ##[error]The operation was canceled.
2022-08-09T16:56:09.1616920Z Prepare all required actions
2022-08-09T16:56:09.1618290Z Getting action download info
2022-08-09T16:56:09.4569000Z Download action repository 'nick-fields/retry@71062288b76e2b6214ebde0e673ce0de1755740a' (SHA:71062288b76e2b6214ebde0e673ce0de1755740a)
2022-08-09T16:56:09.9531620Z ##[group]Run ./.github/actions/get-workflow-job-id
2022-08-09T16:56:09.9532110Z with:
2022-08-09T16:56:09.9534750Z   github-token: ***
2022-08-09T16:56:09.9535590Z env:
2022-08-09T16:56:09.9535810Z   GIT_DEFAULT_BRANCH: master
2022-08-09T16:56:09.9536110Z   BUILD_ENVIRONMENT: macos-12-py3-x86-64
2022-08-09T16:56:09.9536400Z   TEST_CONFIG: functorch

This comment was automatically generated by Dr. CI.

Please report bugs/suggestions to the (internal) Dr. CI Users group.

facebook-github-bot added the oncall: jit label (Add this issue/PR to JIT oncall triage queue) Aug 6, 2022

jjsjann123 (Collaborator, Author) commented:

Hmm, the assert failure on CUDA 10.2 is really strange... It's complaining about a mismatch in the number of inputs to the fused kernel. 😕

The other failure, a codegen error, should be easy to patch: we are leaking __bfloat in a debug print.

jjsjann123 changed the title from [WIP][DO NOT REVIEW] Upstream push 0806 to [WIP][DO NOT REVIEW] upstream push smoke test Aug 9, 2022
jjsjann123 closed this Aug 11, 2022

Labels

ciflow/trunk (Trigger trunk jobs on your pull request), cla signed, oncall: jit (Add this issue/PR to JIT oncall triage queue), open source