Fix ONNX ATen fallback for non-caffe2 engines#73954

Closed
thiagocrepaldi wants to merge 14 commits into pytorch:master from thiagocrepaldi:thiagofc/fix-onnx-aten-fallback

Conversation

@thiagocrepaldi
Collaborator

@thiagocrepaldi thiagocrepaldi commented Mar 9, 2022

This PR introduces 3 BC changes:

First, this PR propagates BUILD_CAFFE2 flag to libtorch and libtorch_python, which is necessary for non-caffe2 ONNX runtimes when using ONNX_ATEN_FALLBACK operator export type.

Second, as a complement of #68490, this PR refactors Caffe2's Aten ops symbolics to consider not only the operator_export_type (aka ONNX_ATEN_FALLBACK) to emit Caffe2 Aten ops, but also whether BUILD_CAFFE2 (which is called torch.onnx._CAFFE2_ATEN_FALLBACK in python binding) is set.

Lastly, it renames onnx::ATen to aten::ATen for ONNX spec consistency in a BC fashion.
ONNX doesn't have an ATen op in its spec, but the PyTorch ONNX converter emits them. Non-Caffe2 backend engines would be misled by such an operator's name/domain. A non-ideal workaround would be to handle ATen ops based on their name and ignore the (non-compliant) domain. Moreover, users could incorrectly file bugs against either ONNX or ONNX Runtime when they inspect the model and notice the presence of an unspecified ONNX operator.
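The interaction between the two flags can be sketched as follows. This is a hypothetical, simplified illustration of the decision described above, not actual torch internals; the function name and string encoding of the export type are invented for the example.

```python
from typing import Optional

ONNX_ATEN_FALLBACK = "ONNX_ATEN_FALLBACK"  # stand-in for the export-type enum value

def aten_fallback_domain(operator_export_type: str, build_caffe2: bool) -> Optional[str]:
    """Return the domain an ATen fallback node would be emitted under,
    or None when no ATen fallback is requested (illustrative only)."""
    if operator_export_type != ONNX_ATEN_FALLBACK:
        return None  # no ATen fallback requested
    if build_caffe2:
        # Caffe2 builds keep the legacy behavior: Caffe2-flavored onnx::ATen.
        return "onnx"
    # Non-Caffe2 runtimes get an op clearly marked as a PyTorch extension.
    return "org.pytorch.aten"
```

Under this sketch, only a Caffe2 build with `ONNX_ATEN_FALLBACK` produces the legacy `onnx::ATen` node; non-Caffe2 builds place the fallback under the PyTorch custom domain so the op can no longer be mistaken for a standard ONNX operator.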

@pytorch-bot

pytorch-bot bot commented Mar 9, 2022

CI Flow Status

⚛️ CI Flow

Ruleset - Version: v1
Ruleset - File: https://github.com/thiagocrepaldi/pytorch/blob/155482e629f351523cdea31bff5833309ec26542/.github/generated-ciflow-ruleset.json
PR ciflow labels: ciflow/default
Add ciflow labels to this PR to trigger more builds:

Workflows Labels (bold enabled) Status
Triggered Workflows
linux-binary-conda ciflow/binaries, ciflow/binaries_conda, ciflow/default ✅ triggered
linux-binary-libtorch-cxx11-abi ciflow/all, ciflow/binaries, ciflow/binaries_libtorch, ciflow/default, ciflow/trunk ✅ triggered
linux-binary-libtorch-pre-cxx11 ciflow/all, ciflow/binaries, ciflow/binaries_libtorch, ciflow/default, ciflow/trunk ✅ triggered
linux-binary-manywheel ciflow/all, ciflow/binaries, ciflow/binaries_wheel, ciflow/default, ciflow/trunk ✅ triggered
linux-bionic-py3.7-clang9 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/noarch, ciflow/trunk ✅ triggered
linux-bionic-rocm4.5-py3.7 ciflow/all, ciflow/default, ciflow/linux, ciflow/rocm, ciflow/trunk ✅ triggered
linux-docs ciflow/all, ciflow/cpu, ciflow/default, ciflow/docs, ciflow/linux, ciflow/trunk ✅ triggered
linux-vulkan-bionic-py3.7-clang9 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk, ciflow/vulkan ✅ triggered
linux-xenial-cuda11.3-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
linux-xenial-cuda11.3-py3.7-gcc7-bazel-test ciflow/all, ciflow/bazel, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
linux-xenial-py3-clang5-mobile-build ciflow/all, ciflow/default, ciflow/linux, ciflow/mobile, ciflow/trunk ✅ triggered
linux-xenial-py3-clang5-mobile-custom-build-static ciflow/all, ciflow/default, ciflow/linux, ciflow/mobile, ciflow/trunk ✅ triggered
linux-xenial-py3.7-clang7-asan ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/sanitizers, ciflow/trunk ✅ triggered
linux-xenial-py3.7-clang7-onnx ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/onnx, ciflow/trunk ✅ triggered
linux-xenial-py3.7-gcc5.4 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
linux-xenial-py3.7-gcc5.4-mobile-lightweight-dispatch-build ciflow/all, ciflow/cpu, ciflow/default, ciflow/libtorch, ciflow/linux, ciflow/mobile, ciflow/trunk ✅ triggered
linux-xenial-py3.7-gcc7 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
linux-xenial-py3.7-gcc7-no-ops ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
macos-arm64-binary-conda ciflow/binaries, ciflow/binaries_conda, ciflow/default ✅ triggered
macos-arm64-binary-wheel ciflow/binaries, ciflow/binaries_wheel, ciflow/default ✅ triggered
macos-binary-conda ciflow/binaries, ciflow/binaries_conda, ciflow/default ✅ triggered
macos-binary-libtorch-cxx11-abi ciflow/binaries, ciflow/binaries_libtorch, ciflow/default ✅ triggered
macos-binary-libtorch-pre-cxx11 ciflow/binaries, ciflow/binaries_libtorch, ciflow/default ✅ triggered
macos-binary-wheel ciflow/binaries, ciflow/binaries_wheel, ciflow/default ✅ triggered
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single ciflow/all, ciflow/android, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit ciflow/all, ciflow/android, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
win-vs2019-cpu-py3 ciflow/all, ciflow/cpu, ciflow/default, ciflow/trunk, ciflow/win ✅ triggered
win-vs2019-cuda11.3-py3 ciflow/all, ciflow/cuda, ciflow/default, ciflow/trunk, ciflow/win ✅ triggered
windows-binary-conda ciflow/binaries, ciflow/binaries_conda, ciflow/default ✅ triggered
windows-binary-libtorch-debug ciflow/all, ciflow/binaries, ciflow/binaries_libtorch, ciflow/default, ciflow/trunk ✅ triggered
windows-binary-libtorch-release ciflow/all, ciflow/binaries, ciflow/binaries_libtorch, ciflow/default, ciflow/trunk ✅ triggered
windows-binary-wheel ciflow/all, ciflow/binaries, ciflow/binaries_wheel, ciflow/default, ciflow/trunk ✅ triggered
Skipped Workflows
caffe2-linux-xenial-py3.7-gcc5.4 ciflow/all, ciflow/cpu, ciflow/linux, ciflow/trunk 🚫 skipped
docker-builds ciflow/all, ciflow/trunk 🚫 skipped
ios-12-5-1-arm64 ciflow/all, ciflow/ios, ciflow/macos, ciflow/scheduled 🚫 skipped
ios-12-5-1-arm64-coreml ciflow/all, ciflow/ios, ciflow/macos, ciflow/scheduled 🚫 skipped
ios-12-5-1-arm64-custom-ops ciflow/all, ciflow/ios, ciflow/macos, ciflow/scheduled 🚫 skipped
ios-12-5-1-arm64-metal ciflow/all, ciflow/ios, ciflow/macos, ciflow/scheduled 🚫 skipped
ios-12-5-1-x86-64 ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-x86-64-coreml ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
libtorch-linux-xenial-cuda10.2-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/trunk 🚫 skipped
libtorch-linux-xenial-cuda11.3-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/trunk 🚫 skipped
linux-bionic-cuda10.2-py3.9-gcc7 ciflow/all, ciflow/cuda, ciflow/linux, ciflow/slow, ciflow/trunk 🚫 skipped
linux-docs-push ciflow/all, ciflow/cpu, ciflow/linux, ciflow/scheduled 🚫 skipped
linux-xenial-cuda11.3-py3.7-gcc7-no-ops ciflow/all, ciflow/cuda, ciflow/linux, ciflow/trunk 🚫 skipped
macos-10-15-py3-arm64 ciflow/all, ciflow/macos, ciflow/trunk 🚫 skipped
macos-10-15-py3-lite-interpreter-x86-64 ciflow/all, ciflow/macos, ciflow/trunk 🚫 skipped
macos-11-py3-x86-64 ciflow/all, ciflow/macos, ciflow/trunk 🚫 skipped
parallelnative-linux-xenial-py3.7-gcc5.4 ciflow/all, ciflow/cpu, ciflow/linux, ciflow/trunk 🚫 skipped
periodic-libtorch-linux-bionic-cuda11.5-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-linux-bionic-cuda11.5-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled, ciflow/slow, ciflow/slow-gradcheck 🚫 skipped
periodic-linux-xenial-cuda11.3-py3.7-gcc7-debug ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-win-vs2019-cuda11.5-py3 ciflow/all, ciflow/cuda, ciflow/scheduled, ciflow/win 🚫 skipped
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-build ciflow/all, ciflow/android, ciflow/cpu, ciflow/linux, ciflow/trunk 🚫 skipped
pytorch-xla-linux-bionic-py3.7-clang8 ciflow/all, ciflow/cpu, ciflow/linux, ciflow/trunk, ciflow/xla 🚫 skipped

@facebook-github-bot
Contributor

facebook-github-bot commented Mar 9, 2022

🔗 Helpful links

💊 CI failures summary and remediations

As of commit a6a8d8d (more details on the Dr. CI page):


  • 1/1 failures introduced in this PR

🕵️‍♀️ 1 failure not recognized by patterns:

The following CI failures may be due to changes from the PR
Job Step Action
GitHub Actions pull / linux-bionic-py3.7-clang9 / test (noarch, 1, 1, linux.2xlarge) Download build artifacts 🔁 rerun

This comment was automatically generated by Dr. CI.

Please report bugs/suggestions to the (internal) Dr. CI Users group.


@facebook-github-bot facebook-github-bot added the oncall: jit Add this issue/PR to JIT oncall triage queue label Mar 9, 2022
@thiagocrepaldi thiagocrepaldi added module: onnx Related to torch.onnx release notes: onnx torch.onnx related changes that should show up in the release notes topic: bug fixes topic category topic: improvements topic category labels Mar 9, 2022
@thiagocrepaldi thiagocrepaldi force-pushed the thiagofc/fix-onnx-aten-fallback branch from 155482e to 062663d Compare March 9, 2022 17:38
@thiagocrepaldi thiagocrepaldi marked this pull request as ready for review March 10, 2022 14:17
@thiagocrepaldi thiagocrepaldi changed the title [WIP] Fix ONNX ATen fallback for non-caffe2 engines Fix ONNX ATen fallback for non-caffe2 engines Mar 10, 2022
@thiagocrepaldi
Collaborator Author

thiagocrepaldi commented Mar 10, 2022

Folks, I have added you in the hope that you are still the Caffe2 experts. Feel free to remove yourself and add a colleague if that is not the case.

The proposed change aims to refactor the clever Caffe2 ATen op in a way that external DL frameworks (e.g. ONNX Runtime) can use it without having to parse native_functions.yaml, since external frameworks won't have this file shipped with them, nor do torch wheel packages include it.

@thiagocrepaldi thiagocrepaldi requested a review from BowenBao March 10, 2022 23:15
@BowenBao BowenBao added the onnx-needs-import This PR is related to ONNX, but touches files outside of merge rule patterns, and hence needs import label Mar 10, 2022
Collaborator

@dzhulgakov dzhulgakov left a comment


What is the broader objective you are trying to achieve? Why does moving from the onnx to the aten namespace help you while you're still operating on the TorchScript graph? Generally aten::ATen kind of doesn't make sense; why not just use regular aten ops?

@thiagocrepaldi
Collaborator Author

What is the broader objective you are trying to achieve? Why does moving from the onnx to the aten namespace help you while you're still operating on the TorchScript graph? Generally aten::ATen kind of doesn't make sense; why not just use regular aten ops?

Thanks for the quick feedback Dmytro!

The broader goal is to allow other DL frameworks (e.g. Microsoft's ONNX Runtime) to execute ONNX graphs containing Caffe2's ATen nodes on graphs generated by torch.onnx.export(..., operator_export_type=ONNX_ATEN_FALLBACK).

The renaming from onnx::ATen to aten::ATen was done because there is no such ATen operator in the ONNX spec, and that might cause confusion for the ONNX community when they debug ONNX graphs and find those unspecified ops. org.pytorch.aten::ATen, on the other hand, clearly states this is a PyTorch thing as opposed to ONNX's.

The renaming from operator to operator_name was just an opportunistic improvement that I proposed, given that I hit compile-time errors when I tried to print logs using the node->s(::c10::attr::operator) idiom. I learned the hard way that node->s(::c10::attr::name) works just fine but node->s(::c10::attr::operator) doesn't, so the renaming fixes that minor inconsistency without any side effect.

Neither renaming is strictly necessary, but IMO they add clarity (onnx::ATen -> aten::ATen) and consistency (all attributes, including the operator name, can be accessed the same way).
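To illustrate why the operator_name/overload_name attributes matter to a non-Caffe2 backend: given an exported ATen fallback node, the backend can resolve the original ATen op directly from the node's attributes, with no need to parse native_functions.yaml. The sketch below is hypothetical; the `Node` class is a stand-in for a real graph node, and the attribute names follow this PR's convention.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """Minimal stand-in for an exported graph node (not a real API)."""
    domain: str
    kind: str
    attrs: dict = field(default_factory=dict)

def resolve_aten_op(node: Node) -> str:
    """Build a fully qualified ATen op name from the node's attributes."""
    if node.domain != "org.pytorch.aten" or node.kind != "ATen":
        raise ValueError("not an ATen fallback node")
    name = node.attrs["operator_name"]          # e.g. "embedding_bag"
    overload = node.attrs.get("overload_name")  # e.g. "padding_idx", may be absent
    return f"aten::{name}.{overload}" if overload else f"aten::{name}"
```

A backend could then dispatch `aten::embedding_bag.padding_idx` (or whatever the resolved name is) to its own ATen implementation.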

@malfet malfet requested a review from a team as a code owner April 6, 2022 16:52
@facebook-github-bot
Contributor

@malfet has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Contributor

@malfet malfet left a comment


Updated the broken test, will see what will happen

@thiagocrepaldi thiagocrepaldi force-pushed the thiagofc/fix-onnx-aten-fallback branch from a350605 to 34406d9 Compare April 7, 2022 18:33
Thiago Crepaldi added 14 commits April 7, 2022 11:34
This PR 1) unifies the `onnx::ATen` and `aten::ATen` interned-string nodes onto
`aten::ATen` for consistency. ONNX doesn't have an ATen op, but the PyTorch ONNX
converter emits it in some cases.

This PR also 2) renames ATen's attribute `operator` to
`operator_name` to allow access from C++ code. "operator" is a reserved word in
C++, and any attempt to use `attr::operator` will cause a compilation error.
Although this PR does not use `attr::operator_name` directly, it enables users
to do so, especially during debugging.

For example, one developer debugging `onnx::operator` export could do something like

```
if (node->kind() == aten::ATen) {
    std::cout << "ATen operator name: " << node->s(::c10::attr::operator_name) << std::endl;
    std::cout << "ATen overload name: " << node->s(::c10::attr::overload_name) << std::endl;
}
```

Note that doing

```
if (node->kind() == aten::ATen) {
    std::cout << "ATen operator name: " << node->s(::c10::attr::operator) << std::endl;
}
```
would be invalid, as "operator" is a reserved word in C++.

Lastly, this PR also fixes a bug in the build script in which the
`BUILD_CAFFE2` build definition would always be set, which may cause issues
for non-caffe2 ONNX graphs generated using the `ONNX_ATEN_FALLBACK`
operator export type.
@thiagocrepaldi thiagocrepaldi force-pushed the thiagofc/fix-onnx-aten-fallback branch from 34406d9 to a6a8d8d Compare April 7, 2022 18:38
@garymm garymm linked an issue Apr 11, 2022 that may be closed by this pull request
@facebook-github-bot
Contributor

@malfet has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@pytorchbot merge this

(Initiating merge automatically since Phabricator Diff has merged)

facebook-github-bot pushed a commit that referenced this pull request Apr 14, 2022
Summary:
This PR introduces 3 BC changes:

First, this PR propagates `BUILD_CAFFE2` flag to `libtorch` and `libtorch_python`, which is necessary for non-caffe2 ONNX runtimes when using `ONNX_ATEN_FALLBACK` operator export type.

Second, as a complement of #68490, this PR refactors Caffe2's Aten ops symbolics to consider not only the `operator_export_type` (aka `ONNX_ATEN_FALLBACK`) to emit Caffe2 Aten ops, but also whether `BUILD_CAFFE2` (which is called `torch.onnx._CAFFE2_ATEN_FALLBACK` in python binding) is set.

Lastly, it renames `onnx::ATen` to `aten::ATen` for ONNX spec consistency in a BC fashion.
ONNX doesn't have an `ATen` op in its spec, but the PyTorch ONNX converter emits them. Non-Caffe2 backend engines would be misled by such an operator's name/domain. A non-ideal workaround would be to handle ATen ops based on their name and ignore the (non-compliant) domain. Moreover, users could incorrectly file bugs against either ONNX or ONNX Runtime when they inspect the model and notice the presence of an unspecified ONNX operator.

Pull Request resolved: #73954

Reviewed By: anjali411

Differential Revision: D34978368

Pulled By: malfet

fbshipit-source-id: 775b281e90fdea77379fe36212cff1f57b95f6a1
pytorchmergebot pushed a commit that referenced this pull request Apr 19, 2022
Currently ONNX exporter symbolics can emit ATen operators when `operator_export_type==ONNX_ATEN_FALLBACK`. However, this behavior is specific to Caffe2 builds, as the intended use of `ONNX_ATEN_FALLBACK` is to emit ATen operators only when there is no ONNX equivalent.

The reason Caffe2 chooses to emit ATen operators when an ONNX counterpart exists is performance on its particular engine implementation, which might not hold for other implementations; e.g. ONNX Runtime can optimize the generated ONNX graph into something more efficient.

This PR must be merged only after #73954
Pull Request resolved: #74680
Approved by: https://github.com/garymm, https://github.com/malfet
facebook-github-bot pushed a commit that referenced this pull request Apr 19, 2022

Summary:
Currently ONNX exporter symbolics can emit ATen operators when `operator_export_type==ONNX_ATEN_FALLBACK`. However, this behavior is specific to Caffe2 builds, as the intended use of `ONNX_ATEN_FALLBACK` is to emit ATen operators only when there is no ONNX equivalent.

The reason Caffe2 chooses to emit ATen operators when an ONNX counterpart exists is performance on its particular engine implementation, which might not hold for other implementations; e.g. ONNX Runtime can optimize the generated ONNX graph into something more efficient.

This PR must be merged only after #73954

Pull Request resolved: #74680
Approved by: https://github.com/garymm, https://github.com/malfet

Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/eab3f428833ce9e11e2db97e516fe4b78ea2e4dc

Reviewed By: seemethere

Differential Revision: D35751465

fbshipit-source-id: 079d2a9db7174e68a9f8740f08364a598630dabc
malfet pushed a commit that referenced this pull request Apr 20, 2022
Currently ONNX exporter symbolics can emit ATen operators when `operator_export_type==ONNX_ATEN_FALLBACK`. However, this behavior is specific to Caffe2 builds, as the intended use of `ONNX_ATEN_FALLBACK` is to emit ATen operators only when there is no ONNX equivalent.

The reason Caffe2 chooses to emit ATen operators when an ONNX counterpart exists is performance on its particular engine implementation, which might not hold for other implementations; e.g. ONNX Runtime can optimize the generated ONNX graph into something more efficient.

This PR must be merged only after #73954
Pull Request resolved: #74680
Approved by: https://github.com/garymm, https://github.com/malfet

(cherry picked from commit eab3f42)
@thiagocrepaldi thiagocrepaldi deleted the thiagofc/fix-onnx-aten-fallback branch May 4, 2023 17:17

Labels

caffe2-op caffe2 cla signed module: onnx Related to torch.onnx oncall: jit Add this issue/PR to JIT oncall triage queue onnx-needs-import This PR is related to ONNX, but touches files outside of merge rule patterns, and hence needs import open source release notes: onnx torch.onnx related changes that should show up in the release notes topic: bug fixes topic category topic: improvements topic category triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[ONNX] ONNX_ATEN_FALLBACK supports ONNX Runtime

10 participants