Fix ONNX ATen fallback for non-caffe2 engines #73954
thiagocrepaldi wants to merge 14 commits into pytorch:master
Conversation
Force-pushed from 155482e to 062663d
Folks, I have added you in the hope you are still the Caffe2 experts. Feel free to remove yourself and include a colleague if that is not the case. The proposed change aims to refactor Caffe2's clever ATen op so that external DL frameworks (e.g. ONNX Runtime) can use it without having to parse …
dzhulgakov
left a comment
What is the broader objective you are trying to achieve? Why does moving from the `onnx` to the `aten` namespace help you while you are still operating on the TorchScript graph? Generally, `aten::ATen` kind of doesn't make sense; why not just use regular aten ops?
Thanks for the quick feedback, Dmytro! The broader goal is to allow other DL frameworks (e.g. Microsoft's ONNX Runtime) to execute ONNX graphs containing Caffe2's ATen nodes on graphs generated by PyTorch's ONNX exporter. The renaming from `onnx::ATen` to `aten::ATen` keeps the ONNX domain free of an operator that is not part of the ONNX spec. The renaming from `operator` to `operator_name` makes the attribute accessible from C++ code. Both renamings are not strictly necessary, but IMO they add clarity.
@malfet has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
malfet
left a comment
Updated the broken test; we will see what happens.
Force-pushed from a350605 to 34406d9
This PR 1) unifies the `onnx::ATen` and `aten::ATen` interned-string node kinds into
`aten::ATen` for consistency. ONNX does not have an ATen op, but the PyTorch ONNX
converter emits it in some cases.
This PR also 2) renames ATen's attribute `operator` to
`operator_name` to allow access from C++ code. "operator" is a reserved word in
C++, so any attempt to use `attr::operator` causes a compile error.
Although this PR does not use `attr::operator_name` directly, it enables users
to do so, especially during debugging.
For example, a developer debugging ATen op export could do something like
```cpp
if (node->kind() == aten::ATen) {
  std::cout << "ATen operator name: " << node->s(::c10::attr::operator_name) << std::endl;
  std::cout << "ATen overload name: " << node->s(::c10::attr::overload_name) << std::endl;
}
```
Note that doing
```cpp
if (node->kind() == aten::ATen) {
  std::cout << "ATen operator name: " << node->s(::c10::attr::operator) << std::endl;
}
```
would not compile, because "operator" is a reserved word in C++.
Lastly, this PR also fixes a bug in the build script in which the
`BUILD_CAFFE2` build definition would always be set, which could cause issues
for non-caffe2 ONNX graphs generated with the `ONNX_ATEN_FALLBACK`
operator export type.
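The shape of the build-script fix can be sketched roughly as follows. This is a hypothetical CMake fragment, not the actual diff; the target name `torch` is an assumption for illustration:

```cmake
# Hypothetical sketch: only propagate the BUILD_CAFFE2 compile definition
# when the corresponding CMake option is actually enabled, instead of
# defining it unconditionally for every build.
if(BUILD_CAFFE2)
  target_compile_definitions(torch PRIVATE BUILD_CAFFE2)
endif()
```

With the definition gated this way, non-Caffe2 builds no longer take the Caffe2-specific ATen fallback paths guarded by the `BUILD_CAFFE2` preprocessor definition.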
Force-pushed from 34406d9 to a6a8d8d
@malfet has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
@pytorchbot merge this (Initiating merge automatically since Phabricator Diff has merged)
Summary: This PR introduces 3 BC changes:

First, this PR propagates the `BUILD_CAFFE2` flag to `libtorch` and `libtorch_python`, which is necessary for non-caffe2 ONNX runtimes when using the `ONNX_ATEN_FALLBACK` operator export type.

Second, as a complement to #68490, this PR refactors Caffe2's ATen op symbolics to consider not only the `operator_export_type` (aka `ONNX_ATEN_FALLBACK`) when emitting Caffe2 ATen ops, but also whether `BUILD_CAFFE2` (exposed as `torch.onnx._CAFFE2_ATEN_FALLBACK` in the Python binding) is set.

Lastly, it renames `onnx::ATen` to `aten::ATen` for ONNX spec consistency, in a BC fashion. ONNX does not have an `ATen` op in its spec, but the PyTorch ONNX converter emits them. Non-Caffe2 backend engines would be misled by such an operator's name/domain. A non-ideal workaround would be to handle ATen ops based on their name and ignore the (non-compliant) domain. Moreover, users could incorrectly file bugs against either ONNX or ONNX Runtime when they inspect the model and notice the presence of an unspecified ONNX operator.

Pull Request resolved: #73954
Reviewed By: anjali411
Differential Revision: D34978368
Pulled By: malfet
fbshipit-source-id: 775b281e90fdea77379fe36212cff1f57b95f6a1
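Because the rename is BC, a backend that consumed the old node kind can stay compatible by accepting both spellings. A minimal sketch, assuming a consumer that dispatches on the node's kind string (the helper name is hypothetical, not part of any real API):

```python
# Hypothetical sketch: recognize ATen fallback nodes under both the old
# (onnx::ATen) and new (aten::ATen) kinds, so graphs exported before and
# after the rename are handled the same way.
ATEN_FALLBACK_KINDS = {"onnx::ATen", "aten::ATen"}

def is_aten_fallback(node_kind: str) -> bool:
    """Return True if the node is an ATen fallback op, old or new spelling."""
    return node_kind in ATEN_FALLBACK_KINDS

print(is_aten_fallback("aten::ATen"))  # new spelling
print(is_aten_fallback("onnx::ATen"))  # old spelling, kept for BC
print(is_aten_fallback("onnx::Add"))   # a regular ONNX op
```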
Currently, ONNX exporter symbolics can emit ATen operators whenever `operator_export_type == ONNX_ATEN_FALLBACK`. However, this behavior is specific to Caffe2 builds: the intended use of `ONNX_ATEN_FALLBACK` is to emit ATen operators only when there is no ONNX equivalent. The reason Caffe2 chooses to emit ATen operators even when an ONNX counterpart exists is performance on its particular engine implementation, which might not hold for other implementations; e.g. ONNX Runtime can optimize the generated ONNX graph into something more efficient. This PR must be merged only after #73954. Pull Request resolved: #74680 Approved by: https://github.com/garymm, https://github.com/malfet
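The resulting dispatch can be illustrated with a small sketch. This is not the exporter's actual code; the function and flag names are made up to show how `operator_export_type` and the Caffe2-build flag combine:

```python
# Hypothetical sketch of the ATen fallback decision after this change.
# has_onnx_equivalent: whether an ONNX symbolic exists for the op.
# caffe2_build: stands in for the BUILD_CAFFE2 flag
# (torch.onnx._CAFFE2_ATEN_FALLBACK on the Python side).
def emit_kind(has_onnx_equivalent: bool,
              operator_export_type: str,
              caffe2_build: bool) -> str:
    if operator_export_type != "ONNX_ATEN_FALLBACK":
        return "onnx"  # fallback disabled: only ONNX-standard ops
    if caffe2_build:
        # Caffe2 builds prefer ATen ops even when an ONNX op exists,
        # for performance on the Caffe2 engine.
        return "aten"
    # Non-Caffe2 builds fall back to ATen only when ONNX has no equivalent.
    return "onnx" if has_onnx_equivalent else "aten"
```

For instance, under `ONNX_ATEN_FALLBACK` on a non-Caffe2 build, an op with an ONNX equivalent is exported as a standard ONNX op, while an op without one still becomes an ATen fallback node.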
Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/eab3f428833ce9e11e2db97e516fe4b78ea2e4dc
Reviewed By: seemethere
Differential Revision: D35751465
fbshipit-source-id: 079d2a9db7174e68a9f8740f08364a598630dabc
(cherry picked from commit eab3f42)