Conversation
Works with plaidml/openvino#117 to solve plaidml/openvino#71; updates the openvino commit to enable PriorBoxOp
Signed-off-by: xin1.wang <xin1.wang@intel.com>
tzerrell
left a comment
We should rewrite this to not require the 2nd input to be a Constant tensor.
While making the changes necessary for that, also look for whether you can rewrite to remove some of the concatenates & contractions. My intuition is that there may be some room to streamline those. Perhaps I'm mistaken though; I can review in more detail once the other changes are in
const std::vector<std::vector<float>> maxSizes = {{315.0f}};
const std::vector<std::vector<float>> aspectRatios = {
    {1.0f},
    // {2.0f, 3.0f}
There was a problem hiding this comment.
Is this test case commented out to reduce the total number of tests, or because the case isn't working yet?
And I have the same question for the other parameters below, e.g. clip, flip, etc.
If it's for test size reduction, we should make 2 sets of tests, a smoke test that runs the current tests, and a non-smoke test that is more comprehensive (i.e., that runs all the currently-commented-out tests).
There was a problem hiding this comment.
Just to reduce the number of tests: there would be 700+ tests for this op, and in fact 1400+ if I add more detail. They may take hours on my side, so I'm not sure it's worth enabling all of them, as I've already passed them all locally.
There was a problem hiding this comment.
Open all the tests, and keep just a few in the smoke tests.
There was a problem hiding this comment.
I'm a little worried about how slow these tests are running. I would not expect PriorBox to be especially fast, but it's slower than I hoped for. I wonder if perhaps there's a way to use additional dimensions when constructing the output (perhaps a dimension for x vs y, a dimension for which aspect ratio we're on, etc) that would enable us to avoid some of the splits/concatenations currently required...
There was a problem hiding this comment.
I divided the tests into three parts with some fixed arguments to reduce the size; they cover the most common combinations, and there are now only 76 tests, which take 2 minutes on my side. As I have confirmed that the full set of combinations, covering all paths, passes, I think this is enough for now.
std::vector<int> input_shape = input_shape_ngraph_op->get_vector<int>();
H = input_shape[0];
W = input_shape[1];
auto* image_shape_ngraph_op = ngraph::as_type<ngraph::op::Constant>(ctx.layer->get_input_node_ptr(1));
There was a problem hiding this comment.
This input should not be required to be a Constant. I don't believe it will be Constant in typical use cases, and it restricts the utility of this op to require a Constant here.
Unlike the other input, which affects the output shape and therefore must be a Constant (as PlaidML does not currently support dynamic shapes), I see no barriers to allowing this input to be a generic tensor. It looks like each use of IH and IW could be converted to use a Tensor instead, although some will need downstream changes as well.
You can use op::slice, or a simple edsl::Contraction to extract and separate the two values IH/IW as tensors.
There was a problem hiding this comment.
(although in places where IH/IW are used in similar ways, it is preferable to leave the values together to avoid the need to slice and then later re-concatenate)
There was a problem hiding this comment.
Yes, it seems operand 1 can be changed to a tensor; I will try this.
There was a problem hiding this comment.
The input operand layout is IH, IW, but in most cases we use the IW, IH layout. Keeping separate IW and IH tensors and then concatenating may reduce some calculations (sometimes we concatenate four tensors that differ only slightly, and I am not sure whether computing multiple times or concatenating multiple times is less expensive), so this version keeps IW and IH separate.
There was a problem hiding this comment.
I am uncertain of this, but I think concatenate is currently slower than repeating calculations, while in the long run repeated calculations may become the slower option. That's just a guess, though.
Therefore I would not duplicate calculations to remove concatenations.
auto CW = edsl::cast(edsl::index({edsl::TensorDim(H), edsl::TensorDim(W), edsl::TensorDim(1)}, 1), DType::FLOAT32);
auto CH = edsl::cast(edsl::index({edsl::TensorDim(H), edsl::TensorDim(W), edsl::TensorDim(1)}, 0), DType::FLOAT32);
There was a problem hiding this comment.
I notice the use of FLOAT32, but I wonder how this will affect networks that are intended to run with FLOAT16. Is there some other data type we can cast to rather than hard-coding the type?
There was a problem hiding this comment.
For precision, I just follow the reference implementation, since the output is currently compared against it. I think it makes sense to use a different precision if we don't need that comparison, which would require forcing RefMode to IE mode.
There was a problem hiding this comment.
I believe in all places FLOAT32 is currently used, it would be correct to replace with the requested output type. And that should be available as ctx.layer->get_element_type() since PriorBox is a single-output op.
There was a problem hiding this comment.
Yes, I'll do this. I use get_output_element_type(0) to get the type, as the comments on get_element_type() advise.
float step;
float step_x;
float step_y;
float offset;
There was a problem hiding this comment.
Generally I think we should use the highest precision floating point type (double) for constants and values computed at eDSL construction/compile time. Is there a reason why this wouldn't be possible in this context?
There was a problem hiding this comment.
This follows https://github.com/plaidml/openvino/blob/plaidml/ngraph/core/reference/include/ngraph/runtime/reference/prior_box.hpp#L95 to get the same results; it can be made flexible if the comparison is disabled.
There was a problem hiding this comment.
step is now a tensor, to go along with the IW/IH change.
…ino commit. Signed-off-by: xin1.wang <xin1.wang@intel.com>
jd-bartlett96
left a comment
I discussed this with @tzerrell and agree that it could be streamlined a bit to remove some unnecessary looping/concatenation/slicing, which I believe would improve the speed. Does what I suggest below make sense?
int density_s = 0, shift = 0;
float var = 0;

for (size_t s = 0; s < fixed_size_count; ++s) {
There was a problem hiding this comment.
If you've calculated lists of X-centers CX and Y-centers CY, each of length N, and with a list of sizes S of length M, you can avoid this loop and the associated concatenation at line 262 and calculate all the box corners at once with something to the effect of
F1 = [1,0,1,0]
F2 = [0,1,0,1]
P = [-1,-1,1,1]
D = [W,H,W,H]
B[i,j,k] = (F1[i]*CX[j] + F2[i]*CY[j] + P[i]*S[k])/(2*D[i])
where the shape of B is 4xNxM (which can be reshaped into the correct flattened form).
I imagine doing it in this way would allow you to avoid some of the concatenations within the loop as well.
There was a problem hiding this comment.
Also, an implementation detail for the last line of pseudocode:
Since you can't use explicit indices with elementwise operations, you will need to expand F1/F2/P/D/S/CX/CY to rank 3 tensors by adding size 1 dims, then rely on autobroadcasting. I.e., reshape F1, F2, and P to (4, 1, 1), reshape CX and CY to (1, N, 1), and reshape S to (1, 1, M); then you can just write B = (F1 * CX + F2 * CY + P * S) / (2 * D) and autobroadcasting will handle the indices for you.
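To make the suggested formula concrete, here is a minimal plain-C++ sketch (explicit nested loops rather than eDSL contractions or autobroadcasting, so the indexing is visible) that computes the same B as the pseudocode above; `priorCorners` is a hypothetical name for illustration only:

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <vector>

// Plain-C++ illustration (not eDSL) of the proposed broadcast formula:
//   B[i][j][k] = (F1[i]*CX[j] + F2[i]*CY[j] + P[i]*S[k]) / (2*D[i])
// Shapes: F1/F2/P/D are (4,), CX/CY are (N,), S is (M,); B is 4 x N x M,
// which can then be reshaped into the flattened output form.
std::vector<std::vector<std::vector<float>>> priorCorners(
    const std::vector<float>& CX, const std::vector<float>& CY,
    const std::vector<float>& S, float W, float H) {
  const std::array<float, 4> F1 = {1, 0, 1, 0};   // selects CX for xmin/xmax rows
  const std::array<float, 4> F2 = {0, 1, 0, 1};   // selects CY for ymin/ymax rows
  const std::array<float, 4> P = {-1, -1, 1, 1};  // -size/2 for min, +size/2 for max
  const std::array<float, 4> D = {W, H, W, H};    // normalization per coordinate
  const size_t N = CX.size(), M = S.size();
  std::vector<std::vector<std::vector<float>>> B(
      4, std::vector<std::vector<float>>(N, std::vector<float>(M)));
  for (size_t i = 0; i < 4; ++i)
    for (size_t j = 0; j < N; ++j)
      for (size_t k = 0; k < M; ++k)
        B[i][j][k] = (F1[i] * CX[j] + F2[i] * CY[j] + P[i] * S[k]) / (2 * D[i]);
  return B;
}
```

In eDSL the three loops would disappear: reshaping the operands to ranks (4,1,1), (1,N,1), and (1,1,M) and writing the elementwise expression once lets autobroadcasting produce the same 4xNxM result.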
There was a problem hiding this comment.
Yes, I have thought about how to remove this loop, but I did not find a way to remove the slice given the dynamic shape of each element, so I kept this path to avoid some duplicate calculations. The main reason is the "density" attribute: each element of fixed_size uses the density at its index, which gives each of them a different shape, since each produces density[i] * density[i] elements (a different Center_temp for each attribute in one loop iteration), and the concatenation for each attribute is slightly different (a different reshape for each C_dst). One path, as you mentioned, is to broadcast the centers to the result size, but this may need a separate slice for each attribute, or a concatenation of the other attributes, and may duplicate some calculation for the broadcast. I will think more about this.
There was a problem hiding this comment.
Ah, thanks for that explanation, we had not understood the significance of density.
I notice that density is an optional attribute and I suspect that it is frequently not set. I think it would be worthwhile to have a forked path where the more efficient computation method is used where density is unset or uniform, and the current method is used if density varies.
Alternatively, perhaps it would be possible to first do all the computations for the first density (in the manner @jd-bartlett96 suggested) and then where there are multiple densities follow up & concatenate those results; then reorder everything to the correct order. I suspect this may not be possible to do efficiently (in particular I think that reordering step might be very messy), but if there is a reasonable way to do this I think it's a good idea.
There was a problem hiding this comment.
It's a good idea to isolate the normal and special cases. I will try this, but it may take some time, as each change has to pass the full tests.
There was a problem hiding this comment.
As I checked the reference implementation in openvino, "density" always has the same count as "fixed_size", even though the openvino spec does not state this requirement, so it seems unavoidable to use density in all "fixed_size" paths, and I kept the current logic. I followed your advice to use auto-broadcast to replace some hand-written ops, but I could not find a way to combine all these conditions into one formula, since the shapes of the blocks are not equal; so there is still some concatenate and slice to combine the results.
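The shape mismatch described above can be made concrete with a small sketch. Assuming (per the discussion, not the actual kernel code) that fixed_size[s] with density[s] contributes density[s] * density[s] boxes per grid cell, the per-s blocks are ragged whenever densities differ, which is why one broadcast formula cannot cover them all; `boxesPerCell` is a hypothetical helper for illustration:

```cpp
#include <cstddef>
#include <vector>

// Sketch: with the `density` attribute, fixed_size[s] yields
// density[s] * density[s] boxes per grid cell, so each s produces a
// differently-sized block when the densities vary.
size_t boxesPerCell(const std::vector<size_t>& density) {
  size_t count = 0;
  for (size_t d : density) count += d * d;  // density[s]^2 boxes for fixed_size[s]
  return count;
}
```

For uniform density ({2, 2}: 4 + 4 boxes) the blocks match and a single broadcast works; for varying density ({1, 2, 3}: 1 + 4 + 9 boxes) the blocks are ragged, motivating the forked normal/special-case path suggested above.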
Signed-off-by: xin1.wang <xin1.wang@intel.com>
// Preprocess data with some attributes
void PriorBoxImpl::prepareConfig() {
  precision = to_plaidml(ctx.layer->get_output_element_type(0));
  auto* input_shape_ngraph_op = ngraph::as_type<ngraph::op::Constant>(ctx.layer->get_input_node_ptr(0));
There was a problem hiding this comment.
Please add a check for whether this is a null pointer, and if so, throw an exception indicating that this input must be a constant (similar to other ops that use as_type<ngraph::op::Constant>).
There was a problem hiding this comment.
Please make this an exception not an assertion. There are legal IRv10 models that do not have a Constant here. If a user runs such a model, they should get an exception indicating that the model is unsupported in PlaidML. This should happen even if they are not checking assertions.
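The requested guard could look something like the sketch below. It is self-contained for illustration (the template stands in for the pointer returned by ngraph::as_type<ngraph::op::Constant>, and `requireConstant` is a hypothetical helper name); the point is that a null result throws even in release builds, where assertions are compiled out:

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Sketch of the requested check: if the downcast to Constant failed (null
// pointer), throw a descriptive exception rather than asserting, so a user
// running a legal IRv10 model without a Constant here gets a clear
// "unsupported in PlaidML" error instead of undefined behavior.
template <typename T>
T* requireConstant(T* maybe_constant, const std::string& op_name) {
  if (maybe_constant == nullptr) {
    throw std::runtime_error(
        op_name + ": this input must be a Constant; the model is unsupported in PlaidML");
  }
  return maybe_constant;
}
```

The same pattern (check, then throw) is what other ops that call as_type<ngraph::op::Constant> use, per the review comment above.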
Signed-off-by: xin1.wang <xin1.wang@intel.com>
tzerrell
left a comment
Thanks for the upgrades! There's one detail to fix mentioned inline, but otherwise this looks good to land.
I am not certain we've found all the ways to eliminate concatenates. However, I'm also not certain we haven't found all the ways to eliminate concatenates. I am confident that this is a complex op and further upgrades will require long and careful work. Given this, I want to land now and leave this as-is unless and until we have evidence that this op is causing substantive performance issues.
// Preprocess data with some attributes
void PriorBoxImpl::prepareConfig() {
  precision = to_plaidml(ctx.layer->get_output_element_type(0));
  auto* input_shape_ngraph_op = ngraph::as_type<ngraph::op::Constant>(ctx.layer->get_input_node_ptr(0));
There was a problem hiding this comment.
Please make this an exception not an assertion. There are legal IRv10 models that do not have a Constant here. If a user runs such a model, they should get an exception indicating that the model is unsupported in PlaidML. This should happen even if they are not checking assertions.
Signed-off-by: xin1.wang <xin1.wang@intel.com>
No description provided.