[WIP][DO NOT REVIEW] upstream push smoke test #82937
Closed
jjsjann123 wants to merge 909 commits into pytorch:master from
Conversation
* initial volta support
* mma parallel type && cleanup
* cleanup
* alignment
* comment
* change request
* fix same parallel type
* move validation pass
* comment and cleanup
* lint
* comment and cleanup
* comment and format
* Propagate new symbol throughout fusion using ValReplacementMutator
* Replace FusionViewFailPersistent with FusionViewPersistentShmoo
* Create separate test-gpu-view.cpp for view tests
* Move replaceValue to ir_utils
`fusion_args` prints arguments given to `runFusion`; `kernel_args` prints arguments given to the generated CUDA kernels.
* Fixes validation of vectorization with contig indexing. True contig indexing needs reference tensors, so finding vectorized contig domains at initial validation time can result in false positives and negatives. Fixed by filling in that information at indexing time. Also considered keeping it separate from indexing and filling it in at validation time, but that would end up replicating the same logic as reference replay. Closes pytorch#1534
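The contiguity condition behind vectorizing over "contig-merged" domains can be sketched in plain Python. This is an illustrative model only, not nvfuser's actual code; `can_merge_contiguous` is a hypothetical helper. Two adjacent dims can be treated as one contiguous region (and hence vectorized across the boundary) only when the outer stride equals the inner stride times the inner extent:

```python
# Hypothetical sketch (not nvfuser's implementation): dims i and i+1 of a
# strided tensor form one contiguous region iff
#   strides[i] == strides[i+1] * shape[i+1].
def can_merge_contiguous(shape, strides, i):
    """True if dims i and i+1 can be merged for contiguous (vectorized) access."""
    return strides[i] == strides[i + 1] * shape[i + 1]

# Dense row-major (4, 8) tensor, strides (8, 1): mergeable.
print(can_merge_contiguous((4, 8), (8, 1), 0))   # True
# Same logical shape but padded rows, strides (10, 1): not mergeable.
print(can_merge_contiguous((4, 8), (10, 1), 0))  # False
```

The second case shows why validating this up front without the real tensors is fragile: the shapes alone look identical, and only the strides reveal the padding.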
To highlight the impact of the change, renamed `IterDomain::clone()` to `IterDomain::cloneWithoutRFactor()`.
* save
* save
* save
* save
…ch#1552)
* Fix ComputeAtRootDomainMap with broadcast in view root domains. Fixes pytorch#1549
…torch#1529)
* Allow vectorization with contig-merged domains in the pointwise scheduler
* Forward merging of trivial-reduction dims in producers
* Enable trivial reduction forwarding only when the trivial reduction domain is a root domain. For example, splitting a reduction domain by 1 and merging it with another non-reduction domain would result in a trivial-reduction merge. It would probably be possible to allow such non-root trivial reduction domains, but that would mean, e.g., a leaf domain could be mappable while its root domain is unmappable, which seems rather confusing. Since such transformations are unlikely, not enabling forwarding should be fine and cause less surprise.
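To see why a trivial reduction is safe to forward through, note that reducing over an extent-1 axis combines nothing: it is just a squeeze. A minimal sketch in plain Python (illustrative only; `sum_axis1` is a hypothetical stand-in, not a fuser op):

```python
# Illustrative only: a "trivial reduction" reduces over an extent-1 axis,
# so no values are actually combined -- the result equals a squeeze.
def sum_axis1(t):
    """Sum a nested list of shape (N, 1, M) over the middle (size-1) axis."""
    return [[sum(col) for col in zip(*row)] for row in t]

t = [[[1, 2, 3]], [[4, 5, 6]]]            # shape (2, 1, 3)
print(sum_axis1(t) == [r[0] for r in t])  # True: reduction == squeeze
```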
…1556)
* Propagate root domain mappings from rfactor to root domains in ComputeAtRootDomainMap

The main purpose of ComputeAtRootDomainMap is to find unmappable domains for computeAt. This analysis is done by traversing a fusion in the backward direction. Currently, the traversal only visits arithmetic expressions, so information is propagated from consumer tensors to producer tensors. This propagation is also required from rfactor domains to root domains. Previously this didn't really matter, as rfactor was limited to reduction domains, but that is not the case with view.

This change also means that ComputeAtRootDomainMap does not guarantee one-to-one mappings. For example:

```
tv0: [I0, I1]
tv1 = view(tv0); // tv1: [I0*I1/N, N]
```

That is, the view op first merges the two domains of `tv0` and then splits the result by N. Note that both of the two rfactor axes of `tv1` are now mapped to both of the axes of `tv0`. Because of this change, `ComputeAtRootDomainMap::mapBestEffort` and other mapping functions between a producer and a consumer that are supposed to return a one-to-one map can fail. `ComputeAtRootDomainMap::getMappableDims` is fine, as it just grabs any domain that is mappable. `ComputeAtRootDomainMap::mapConsumerToProducer` and `ComputeAtRootDomainMap::mapProducerToConsumer` were used in `TransformReplay::replayPasC` and `TransformReplay::replayCasP`, but they don't really need `ComputeAtRootDomainMap`; `PairwiseRootDomainMap` is sufficient, so the usages were replaced with the pairwise variant.
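The many-to-many mapping can be made concrete with a small index-arithmetic sketch (illustrative only; `view_index` is a hypothetical helper, not a fuser API). Merging the two input axes linearizes them row-major, and splitting by N then makes each output coordinate depend on both input coordinates:

```python
# Why view's merge-then-split maps many-to-many: viewing (I0, I1) as
# (I0*I1 // N, N) first linearizes both input axes, then splits the
# linear index by N, so each output axis depends on BOTH I0 and I1.
def view_index(i0, i1, I1, N):
    """Map an (i0, i1) coordinate to its (j0, j1) coordinate after the view."""
    linear = i0 * I1 + i1           # merge: row-major linearization
    return linear // N, linear % N  # split by N

# (I0, I1) = (4, 6) viewed as (8, 3):
print(view_index(0, 0, 6, 3))  # (0, 0)
print(view_index(1, 0, 6, 3))  # (2, 0)  -- moving i0 moved j0
print(view_index(0, 4, 6, 3))  # (1, 1)  -- moving i1 moved both j0 and j1
```

Since moving along either input axis can change either output coordinate, no one-to-one root-domain map exists for such a view.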
* Minor fix on python test
Add flatten support on the python side
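The semantics being exposed can be sketched in a few lines of Python (a hypothetical illustration of flatten-style shape arithmetic, not the actual nvfuser binding): flattening a contiguous range of dims multiplies their extents into one dim, as in `torch.flatten(x, start_dim, end_dim)`.

```python
# Hypothetical shape-level model of flatten(start_dim, end_dim):
# merge the extents of dims [start_dim, end_dim] into a single dim.
from functools import reduce
from operator import mul

def flattened_shape(shape, start_dim, end_dim):
    """Shape after flattening dims start_dim..end_dim (inclusive)."""
    merged = reduce(mul, shape[start_dim:end_dim + 1], 1)
    return shape[:start_dim] + (merged,) + shape[end_dim + 1:]

print(flattened_shape((2, 3, 4), 0, 1))  # (6, 4)
print(flattened_shape((2, 3, 4), 1, 2))  # (2, 12)
```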
pytorch#1559) * Added a more helpful error message when checking for empty outputs on the Fusion. * Clang fix.
…#1561)
* Do not re-compute a unary op that already has an output, and allow expression duplication in the debug print.
* always allocate dynamic smem
* add driver API call for large smem usage

Co-authored-by: Christian Sarofeen <csarofeen@nvidia.com>
This reverts commit 2d5e4cf.
quick test fix
* Remove some Welford-specific logic.
* Multi-reduction fix
* Some more minor cleanup.
* Add a note on multi-input reductions

Co-authored-by: Naoya Maruyama <nmaruyama@nvidia.com>
Split out from pytorch#1854
- `InlinePropagatorSelector` seems less generally useful than `BoundedPropagationSelector`, so I made `InlinePropagatorSelector` a private class of `compute_at.cpp` and renamed it to `ComputeAtSelector`, and moved `BoundedPropagationSelector` to `maxinfo_propagator.h` and renamed it to `SetSelector`.
- Split `DomainMap` out of `pointwise.cpp` into `pointwise_utils.cpp` and renamed some functions.
- Add two cache entries, `DOMAIN_MAP` and `REFERENCE_TENSORS`, and use them in the pointwise scheduler.
…ined matmul operand load (pytorch#1827)
Upstream merge 0803
❌ 5 new failures as of commit 932a0e1 (more details on the Dr. CI page).
🕵️ 5 new failures recognized by patterns. The following CI failures do not appear to be due to upstream breakages.
hmmm, the assert failure on cuda10.2 is really strange... It's complaining about a mismatched number of inputs to the fused kernel. 😕 The other failure, about a codegen error, should be easy to patch; we are leaking `__bfloat` there in a debug print.
Placeholder PR for the nvfuser code bump; a smoke test for CI. The real PR should go through the upstream repo.