[JIT] OpInfo tests for nvfuser#71299

Closed
davidberard98 wants to merge 22 commits into gh/davidberard98/34/base from gh/davidberard98/34/head

Contributor

@davidberard98 davidberard98 commented Jan 14, 2022

Stack from ghstack:

These tests verify that for the same inputs, the eager version of an op
and a traced, fused version of the op return the same output.

Currently the tests don't check whether or not fusion actually occurred.

Differential Revision: D33595299
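
As a rough illustration of the kind of check described above (this is not the test code added in this PR), the sketch below traces a small function with `torch.jit.trace` and compares the traced output against eager execution. The function `sin_plus`, the helper name `check_traced_matches_eager`, and the warm-up count are assumptions made for the example. Like the tests in this PR, it does not check whether fusion actually occurred.

```python
import torch


def sin_plus(x, y):
    # Small pointwise function; a fuser may or may not choose to fuse it.
    return torch.sin(x) + y


def check_traced_matches_eager(fn, *inputs, warmup_runs=3):
    """Raise AssertionError if the traced fn disagrees with the eager fn."""
    expected = fn(*inputs)
    traced = torch.jit.trace(fn, inputs)
    # Call the traced function a few times so the profiling executor has a
    # chance to specialize (and possibly fuse) the graph before comparing.
    actual = traced(*inputs)
    for _ in range(warmup_runs - 1):
        actual = traced(*inputs)
    torch.testing.assert_close(actual, expected)


check_traced_matches_eager(sin_plus, torch.randn(4, 4), torch.randn(4, 4))
```

Running the traced function more than once is a design choice for the sketch: the first calls may run an unoptimized graph, so comparing only the first result could miss differences introduced by later optimization.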

pytorch-probot Bot commented Jan 14, 2022

CI Flow Status

⚛️ CI Flow

Ruleset - Version: v1
Ruleset - File: https://github.com/pytorch/pytorch/blob/ae81a5f56c2ef29581d92ca54fff62bfa9ae4294/.github/generated-ciflow-ruleset.json
PR ciflow labels: ciflow/default

| Workflows | Labels (bold enabled) | Status |
| --- | --- | --- |
| **Triggered Workflows** | | |
| linux-binary-conda | ciflow/binaries, ciflow/binaries_conda, ciflow/default | ✅ triggered |
| linux-binary-libtorch-cxx11-abi | ciflow/binaries, ciflow/binaries_libtorch, ciflow/default | ✅ triggered |
| linux-binary-libtorch-pre-cxx11 | ciflow/binaries, ciflow/binaries_libtorch, ciflow/default | ✅ triggered |
| linux-binary-manywheel | ciflow/binaries, ciflow/binaries_wheel, ciflow/default | ✅ triggered |
| linux-bionic-py3.7-clang9 | ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/noarch, ciflow/trunk, ciflow/xla | ✅ triggered |
| linux-docs | ciflow/all, ciflow/cpu, ciflow/default, ciflow/docs, ciflow/linux, ciflow/trunk | ✅ triggered |
| linux-vulkan-bionic-py3.7-clang9 | ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk, ciflow/vulkan | ✅ triggered |
| linux-xenial-cuda11.3-py3.7-gcc7 | ciflow/all, ciflow/cuda, ciflow/default, ciflow/linux, ciflow/trunk | ✅ triggered |
| linux-xenial-cuda11.3-py3.7-gcc7-bazel-test | ciflow/all, ciflow/bazel, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk | ✅ triggered |
| linux-xenial-py3-clang5-mobile-build | ciflow/all, ciflow/default, ciflow/linux, ciflow/mobile, ciflow/trunk | ✅ triggered |
| linux-xenial-py3-clang5-mobile-custom-build-static | ciflow/all, ciflow/default, ciflow/linux, ciflow/mobile, ciflow/trunk | ✅ triggered |
| linux-xenial-py3.7-clang7-asan | ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/sanitizers, ciflow/trunk | ✅ triggered |
| linux-xenial-py3.7-clang7-onnx | ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/onnx, ciflow/trunk | ✅ triggered |
| linux-xenial-py3.7-gcc5.4 | ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk | ✅ triggered |
| linux-xenial-py3.7-gcc7 | ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk | ✅ triggered |
| linux-xenial-py3.7-gcc7-no-ops | ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk | ✅ triggered |
| pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single | ciflow/all, ciflow/android, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk | ✅ triggered |
| pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit | ciflow/all, ciflow/android, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk | ✅ triggered |
| win-vs2019-cpu-py3 | ciflow/all, ciflow/cpu, ciflow/default, ciflow/trunk, ciflow/win | ✅ triggered |
| win-vs2019-cuda11.3-py3 | ciflow/all, ciflow/cuda, ciflow/default, ciflow/trunk, ciflow/win | ✅ triggered |
| **Skipped Workflows** | | |
| caffe2-linux-xenial-py3.7-gcc5.4 | ciflow/all, ciflow/cpu, ciflow/linux, ciflow/trunk | 🚫 skipped |
| docker-builds | ciflow/all, ciflow/trunk | 🚫 skipped |
| ios-12-5-1-arm64 | ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk | 🚫 skipped |
| ios-12-5-1-arm64-coreml | ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk | 🚫 skipped |
| ios-12-5-1-arm64-custom-ops | ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk | 🚫 skipped |
| ios-12-5-1-arm64-full-jit | ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk | 🚫 skipped |
| ios-12-5-1-arm64-metal | ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk | 🚫 skipped |
| ios-12-5-1-x86-64 | ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk | 🚫 skipped |
| ios-12-5-1-x86-64-coreml | ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk | 🚫 skipped |
| ios-12-5-1-x86-64-full-jit | ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk | 🚫 skipped |
| libtorch-linux-xenial-cuda10.2-py3.7-gcc7 | ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/trunk | 🚫 skipped |
| libtorch-linux-xenial-cuda11.3-py3.7-gcc7 | ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/trunk | 🚫 skipped |
| linux-bionic-cuda10.2-py3.9-gcc7 | ciflow/all, ciflow/cuda, ciflow/linux, ciflow/slow, ciflow/trunk | 🚫 skipped |
| linux-bionic-rocm4.5-py3.7 | ciflow/all, ciflow/linux, ciflow/rocm, ciflow/trunk | 🚫 skipped |
| linux-docs-push | ciflow/all, ciflow/cpu, ciflow/linux, ciflow/scheduled | 🚫 skipped |
| linux-xenial-cuda11.3-py3.7-gcc7-no-ops | ciflow/all, ciflow/cuda, ciflow/linux, ciflow/trunk | 🚫 skipped |
| macos-10-15-py3-arm64 | ciflow/all, ciflow/macos, ciflow/trunk | 🚫 skipped |
| macos-10-15-py3-lite-interpreter-x86-64 | ciflow/all, ciflow/macos, ciflow/trunk | 🚫 skipped |
| macos-11-py3-x86-64 | ciflow/all, ciflow/macos, ciflow/trunk | 🚫 skipped |
| parallelnative-linux-xenial-py3.7-gcc5.4 | ciflow/all, ciflow/cpu, ciflow/linux, ciflow/trunk | 🚫 skipped |
| periodic-libtorch-linux-bionic-cuda11.5-py3.7-gcc7 | ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/scheduled | 🚫 skipped |
| periodic-libtorch-linux-xenial-cuda11.1-py3.7-gcc7 | ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/scheduled | 🚫 skipped |
| periodic-linux-bionic-cuda11.5-py3.7-gcc7 | ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled | 🚫 skipped |
| periodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck | ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled, ciflow/slow, ciflow/slow-gradcheck | 🚫 skipped |
| periodic-linux-xenial-cuda11.1-py3.7-gcc7-debug | ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled | 🚫 skipped |
| periodic-win-vs2019-cuda11.1-py3 | ciflow/all, ciflow/cuda, ciflow/scheduled, ciflow/win | 🚫 skipped |
| periodic-win-vs2019-cuda11.5-py3 | ciflow/all, ciflow/cuda, ciflow/scheduled, ciflow/win | 🚫 skipped |
| pytorch-linux-xenial-py3-clang5-android-ndk-r19c-build | ciflow/all, ciflow/android, ciflow/cpu, ciflow/linux, ciflow/trunk | 🚫 skipped |

You can add a comment to the PR and tag @pytorchbot with the following commands:
# ciflow rerun, "ciflow/default" will always be added automatically
@pytorchbot ciflow rerun

# ciflow rerun with additional labels "-l <ciflow/label_name>", which is equivalent to adding these labels manually and trigger the rerun
@pytorchbot ciflow rerun -l ciflow/scheduled -l ciflow/slow

For more information, please take a look at the CI Flow Wiki.

Contributor

facebook-github-bot commented Jan 14, 2022

💊 CI failures summary and remediations

As of commit 2a08ab2 (more details on the Dr. CI page):


  • 2/2 failures introduced in this PR

🕵️‍♀️ 2 failures not recognized by patterns:

The following CI failures may be due to changes from the PR
| Job | Step | Action |
| --- | --- | --- |
| GitHub Actions trunk / linux-bionic-rocm4.5-py3.7-distributed / test (distributed, 1, 1, linux.rocm.gpu) | Checkout PyTorch | 🔁 rerun |
| GitHub Actions pull / linux-bionic-rocm5.0-py3.7 / test (default, 2, 2, linux.rocm.gpu) | Checkout PyTorch | 🔁 rerun |

This comment was automatically generated by Dr. CI (expand for details).

Please report bugs/suggestions to the (internal) Dr. CI Users group.

davidberard98 added a commit that referenced this pull request Jan 14, 2022
ghstack-source-id: 8168033
Pull Request resolved: #71299
davidberard98 added a commit that referenced this pull request Jan 14, 2022
ghstack-source-id: 1dee1ef
Pull Request resolved: #71299
davidberard98 added a commit that referenced this pull request Jan 14, 2022
ghstack-source-id: 99e4c81
Pull Request resolved: #71299
Contributor Author

davidberard98 commented Jan 14, 2022

current test failures: nvfuser-opinfo.txt

ignore the following op failures (which I've disabled now, since they fail on the jit variant consistency tests as well):

  • allclose
  • gradient
  • empty_like
  • new_empty

_tracing_ops = partial(ops, dtypes=OpDTypes.supported,
                       allowed_dtypes=(torch.float, torch.cfloat))

Contributor

is there a reason we're restricting to float and cfloat?

Contributor Author

@davidberard98 davidberard98 Jan 14, 2022

copied this from the variant_consistency tests, where

# variant testing is only done with torch.float and torch.cfloat to avoid
#   excessive test times and maximize signal to noise ratio

What are your thoughts here, should we expand this to all dtypes?
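
To make the restriction under discussion concrete, here is a simplified pure-Python model of the dtype selection. It assumes, as I understand the `ops` decorator, that `allowed_dtypes` is intersected with the dtypes an op supports. The helper `select_test_dtypes` and the string dtype names are hypothetical stand-ins, not PyTorch APIs.

```python
def select_test_dtypes(supported, allowed=None):
    """Return the dtypes a test would run on: supported, limited to allowed."""
    if allowed is None:
        # No restriction: every supported dtype gets a test instance.
        return sorted(supported)
    return sorted(set(supported) & set(allowed))


# Hypothetical op supporting four dtypes (plain strings stand in for dtypes).
supported = {"float16", "float32", "float64", "complex64"}

# torch.float is float32 and torch.cfloat is complex64.
restricted = select_test_dtypes(supported, allowed={"float32", "complex64"})
everything = select_test_dtypes(supported)

print(restricted)  # ['complex64', 'float32']
print(everything)  # ['complex64', 'float16', 'float32', 'float64']
```

In this model, dropping the restriction doubles the number of test instances for the example op, which is the test-time trade-off the quoted comment refers to.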

Contributor

there's a slow test that runs nightly, we could at least run it then (and initially, to flush out issues)

Contributor

ctrl-f SLOW_TEST or something and you'll find it

davidberard98 added a commit that referenced this pull request Jan 15, 2022
ghstack-source-id: 516236b
Pull Request resolved: #71299
Contributor Author

@davidberard98 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

davidberard98 added a commit that referenced this pull request Jan 21, 2022
ghstack-source-id: 741b4cf
Pull Request resolved: #71299
davidberard98 added a commit that referenced this pull request Mar 31, 2022
ghstack-source-id: 54fc4c5
Pull Request resolved: #71299
@davidberard98 davidberard98 changed the title from "[WIP][JIT] OpInfo tests for nvfuser" to "[JIT] OpInfo tests for nvfuser" Mar 31, 2022
@davidberard98 davidberard98 marked this pull request as ready for review March 31, 2022 17:30
davidberard98 added a commit that referenced this pull request Mar 31, 2022
ghstack-source-id: 5925998
Pull Request resolved: #71299
@davidberard98 davidberard98 requested a review from eellison March 31, 2022 22:05
davidberard98 added a commit that referenced this pull request Mar 31, 2022
ghstack-source-id: b848e6f
Pull Request resolved: #71299
davidberard98 added a commit that referenced this pull request Apr 1, 2022
ghstack-source-id: dc9ab97
Pull Request resolved: #71299
Contributor

@eellison eellison left a comment

😍 😍 😍 Should we file issues for the failing tests?

# https://github.com/pytorch/pytorch/issues/71784
DecorateInfo(unittest.skip('Skipped!'), 'TestNNCOpInfo', 'test_nnc_correctness',
             device_type='cpu', dtypes=(torch.float16,)),
DecorateInfo(unittest.skip('Skipped!'), 'TestCudaFuserOpInfo', 'test_nvfuser_correctness', dtypes=(torch.float16,)),
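
For readers unfamiliar with these entries, the sketch below models how such a skip rule narrows down which test instances it applies to. `SkipRule` is a simplified, hypothetical stand-in written for this example, not the real `DecorateInfo` class. It assumes that an omitted field matches anything and that dtypes are compared by membership, which is how I understand the quoted entries to behave.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class SkipRule:
    """Simplified, hypothetical stand-in for a DecorateInfo skip entry."""
    cls_name: Optional[str] = None
    test_name: Optional[str] = None
    device_type: Optional[str] = None
    dtypes: Optional[Tuple[str, ...]] = None

    def is_active(self, cls_name, test_name, device_type, dtype):
        # A field left as None matches any value.
        return (
            self.cls_name in (None, cls_name)
            and self.test_name in (None, test_name)
            and self.device_type in (None, device_type)
            and (self.dtypes is None or dtype in self.dtypes)
        )


# Mirrors the two entries quoted above, with strings in place of dtypes.
nnc_rule = SkipRule("TestNNCOpInfo", "test_nnc_correctness",
                    device_type="cpu", dtypes=("float16",))
nvfuser_rule = SkipRule("TestCudaFuserOpInfo", "test_nvfuser_correctness",
                        dtypes=("float16",))

print(nnc_rule.is_active("TestNNCOpInfo", "test_nnc_correctness", "cpu", "float16"))   # True
print(nnc_rule.is_active("TestNNCOpInfo", "test_nnc_correctness", "cuda", "float16"))  # False
print(nvfuser_rule.is_active("TestCudaFuserOpInfo", "test_nvfuser_correctness", "cuda", "float32"))  # False
```

Under this model the nvfuser entry skips only the float16 instances of `test_nvfuser_correctness`, on any device, while other dtypes of the same op still run.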
Contributor

Cool! Should we file issues for the failing tests?

Contributor Author

I think most of them either have an issue filed or are expected to fail

e.g. #71784 for this one

Collaborator

Let me jump on the failing tests!

Contributor Author

@jjsjann123 fyi I think #71784 might be expected

And list of tests that need fixes is in #75029 (also see this board: https://github.com/pytorch/pytorch/projects/30)

facebook-github-bot pushed a commit that referenced this pull request Apr 1, 2022
Summary:
Pull Request resolved: #71299

These tests verify that for the same inputs, the eager version of an op
and a traced, fused version of the op return the same output.

Currently the tests don't check whether or not fusion actually occurred.

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D33595299

Pulled By: davidberard98

fbshipit-source-id: 26fdacf44941808c134953e7a883a02d13a43f19
@facebook-github-bot facebook-github-bot deleted the gh/davidberard98/34/head branch April 5, 2022 14:17
laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request Apr 25, 2026
(cherry picked from commit 8cd084e)
Labels

ciflow/trunk (Trigger trunk jobs on your pull request), cla signed

5 participants