Reject requests for nonexistent kernels. #2364

fdwr · 2019-11-09T03:02:44Z

Description: ORT silently substitutes v10 operators when higher version operators (say v13 or v42..) are declared as opset dependencies by the model. This is problematic because it will appear to the user that the model was successfully executed when in reality the code has no idea if the results are correct because it can't predict the future - the semantics between v10 and v13 could change in the specification. It's better to fail than to return the incorrect results. Applications can always register custom schema and newer opset versions to fill in missing gaps, in cases where ORT is behind.

Note this branch of ORT is an older intermediate snapshot which only supported opset 10. Some 11 ops were implemented (BitShift, CumSum, Reduce*, DynamicQuantizeLinear, Pad, Resize, Slice), but none of the slice operators.

Motivation and Context

Avoids silently returning the incorrect data.
Note this targeted fix specifically applies only to the Vibranium branch (which only fully supported up to opset 10). Master should probably have a different (more complete fix) after more discussion.

#2371

snnn · 2019-11-09T03:12:55Z

include/onnxruntime/core/graph/constants.h

 constexpr const char* kBrainSliceExecutionProvider = "BrainSliceExecutionProvider";
 constexpr const char* kTensorrtExecutionProvider = "TensorrtExecutionProvider";
+
+constexpr int kMaximumAiOnnxVersionSupported = 10;


Why it is 10?

Note this is branch rel-1905, which is a much older snapshot of ORT where 11 support was only partial (IIRC only a few ops out of them all), which was before the ONNX 1.6 specification was even published and finalized. So the little bit of 11 support contained in this branch is an uncertain state relative to the spec. We want to say cleanly that we support the published version of ONNX 1.4 (opset 10) in Vibranium.

Now you need to find a way to make the tests pass :-(

Well I found a way to make them pass (at least onnxruntime_test_all.exe runs locally without errors) by applying the same rules to the other GetLatestOpsetVersions overload, which is what the Model class indirectly used to determine its default opset when no specific opset was passed to the constructor (it was using 11 by default). Also updated to testdata .onnx files which were set to opset 11 unnecessarily (moving them back down to 10).

Note that in master, I since saw that Scott McKay added explicit versions in a number of places so that we continue to still get testing on older op versions, and so that may not apply quite the same there.

Alas, I see that onnxruntime_test_all.exe doesn't run all the test_models, and there are still a number of opset 11 models in the CI tests complaining about failing to an 11 kernel, which is many to update O_o. I don't know if I'll make it in time before the bug bar rises to pacify the tests. I'll at least undo the latest change, but I'm considering abandoning this PR and keeping the logic exclusively in WinML for now.

2019-11-14 14:24:55.967545903 [E:onnxruntime:Default, runner.cc:42 RunTestCase] Test raw failed:Load model from test_models/test_abs_uint32/raw/model.onnx failed:This is an invalid model. Error in Node: : No Op registered for Abs with domain_version of 11

https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=96833

snnn · 2019-11-09T03:15:57Z

Are you saying:
for example, the latest onnxruntime release supports opset 11, there is a new model with opset 12, because onnxruntime 1.0 doesn't have the opset 12 schema, so it assumes the op didn't get changed betwenn 11 and 12 and onnxruntime will execute the model? If that is true, please add a test case to prove your change actually fixed the error.

fdwr · 2019-11-10T22:52:37Z

Are you saying:
for example, the latest onnxruntime release supports opset 11, there is a new model with opset 12, because onnxruntime 1.0 doesn't have the opset 12 schema, so it assumes the op didn't get changed betwenn 11 and 12 and onnxruntime will execute the model?

Correct, I loaded and executed an Add-13 model without any warning/error.

ir_version: 3
producer_name: "OnnxConformanceTest"
graph {
  node {
    input: "A"
    input: "B"
    output: "C"
    op_type: "Add"
    domain: ""
  }
  name: "Add"
  input {
    name: "A"
    type {
      tensor_type {
        elem_type: FLOAT
        shape {
          dim {
            dim_value: 5
          }
        }
      }
    }
  }
  input {
    name: "B"
    type {
      tensor_type {
        elem_type: FLOAT
        shape {
          dim {
            dim_value: 5
          }
        }
      }
    }
  }
  output {
    name: "C"
    type {
      tensor_type {
        elem_type: FLOAT
        shape {
          dim {
            dim_value: 5
          }
        }
      }
    }
  }
}
opset_import {
  version: 13
}

fdwr · 2019-11-10T22:59:01Z

Hmm, CI tests: "No Op registered for Identity with domain_version of 11"
There is no Identity-11, and so the test framework must be generically stamping opset 11 into all the models it creates, but I'm unsure where this is occuring... This branch of ORT doesn't really support opset 11 anyway (just partially).

jywu-msft · 2019-11-11T04:50:02Z

include/onnxruntime/core/graph/constants.h

 constexpr const char* kBrainSliceExecutionProvider = "BrainSliceExecutionProvider";
 constexpr const char* kTensorrtExecutionProvider = "TensorrtExecutionProvider";
+
+constexpr int kMaximumAiOnnxVersionSupported = 10;


shouldn't we be using
DomainToVersionRange from ONNX
see: https://github.com/onnx/onnx/blob/master/onnx/defs/schema.h#L687
rather than hardcoding it in ORT?

Thanks for pointing that out. I like the idea of using it. The issue here is that ONNX and ORT have different versions (note this fix targets the rel-1905 Vibranium branch, and so this might not be applicable to the master branch) as we took an older snapshot of ORT that only supported opset 10, but its ONNX submodule has 11 since ONNX 1.6 had already started (but wasn't finalized and published yet). The ORT opset 11 support in this branch was preliminary and partial (before the ONNX 1.6 spec was published, meaning ops could still change behavior). So we don't really want 11.

A pending question is what this should ultimately look like in master. I opened a GitHub issue: #2371

pranavsharma · 2019-11-12T00:33:55Z

Changes look ok. Can you add a test case?

fdwr · 2019-11-14T13:36:57Z

Changes look ok. Can you add a test case?

@pranavsharma : Test case added.

Ack, there are many other model tests that fail. I think I'll have to give up on this change to make the window and revert to just having this check only in WinML, even though it's not as complete there.

snnn · 2019-11-14T22:02:29Z

onnxruntime/core/graph/schema_registry.cc

  registries.push_front(registry);
 }

+static int GetActualLastSupportedOpSet(


How does this relate to the "DomainToVersionMap" in graph and model ?

Hmm, I fear my latest iteration last night (in an attempt to pacify the test code which calls this function to indirectly figure out what opset it should use for dynamically generated models by default) will break app registered schemas :/, and we do want to support that.

fdwr · 2019-11-15T23:50:26Z

We'll make a modified change targeting master (incorporating George Wu's feedback) with the same policy (and keep this specific change local for Vibranium only).

Reject requests for nonexistent kernels.

e522113

fdwr requested a review from a team as a code owner November 9, 2019 03:02

snnn reviewed Nov 9, 2019

View reviewed changes

jywu-msft reviewed Nov 11, 2019

View reviewed changes

Use char instead of string_view.

32fd860

fdwr added 2 commits November 14, 2019 05:09

Update tests accordingly. onnxruntime_test_all.exe now passes locally.

7a613e7

Add test case for future opset.

db4b45e

snnn reviewed Nov 14, 2019

View reviewed changes

fdwr closed this Nov 15, 2019

fdwr mentioned this pull request Nov 15, 2019

Add opset and IR check when loading model #2413

Merged

fdwr deleted the user/dwayner/ClampVersion2 branch May 13, 2020 10:25

Reject requests for nonexistent kernels. #2364

Reject requests for nonexistent kernels. #2364

Uh oh!

Conversation

fdwr commented Nov 9, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

snnn Nov 9, 2019

Choose a reason for hiding this comment

Uh oh!

fdwr Nov 12, 2019

Choose a reason for hiding this comment

Uh oh!

snnn Nov 12, 2019

Choose a reason for hiding this comment

Uh oh!

fdwr Nov 14, 2019

Choose a reason for hiding this comment

Uh oh!

fdwr Nov 14, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

snnn commented Nov 9, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

fdwr commented Nov 10, 2019

Uh oh!

fdwr commented Nov 10, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jywu-msft Nov 11, 2019

Choose a reason for hiding this comment

Uh oh!

fdwr Nov 11, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

pranavsharma commented Nov 12, 2019

Uh oh!

fdwr commented Nov 14, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

snnn Nov 14, 2019

Choose a reason for hiding this comment

Uh oh!

fdwr Nov 14, 2019

Choose a reason for hiding this comment

Uh oh!

fdwr commented Nov 15, 2019

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

fdwr commented Nov 9, 2019 •

edited

Loading

fdwr Nov 14, 2019 •

edited

Loading

snnn commented Nov 9, 2019 •

edited

Loading

fdwr commented Nov 10, 2019 •

edited

Loading

fdwr Nov 11, 2019 •

edited

Loading

fdwr commented Nov 14, 2019 •

edited

Loading