Refactor nms into TorchVision variant.#6814
Conversation
|
I'm having a bit of trouble identifying the cause of the CI failure. Basically, if we set RuntimeError: Bad StatusOr access: INVALID_ARGUMENT: Executable expected parameter 0 of size 4000
but got buffer with incompatible size 4004After looking into where this error was coming from, I found out that it was due to PyTorch/XLA trying to figure out the actual size of the output. My guess is that, since I call Given all this information, I'm not sure where the "buffer with imcompatible size 4004" is coming from. Another thing I'm puzzled about is why is this test failing only when @JackCaoG Have you ever seen an error like this? I believe other lowerings also call this same function (e.g. |
|
|
|
Right. So, what is puzzling is why would it fail only when |
|
not really but feel free to skip the test when |
a4e90c8 to
be21831
Compare
| const xla::Shape& boxes_shape = ShapeHelper::ShapeOfXlaOp(boxes); | ||
| XLA_CHECK_EQ(boxes_shape.rank(), 2); | ||
| XLA_CHECK_EQ(boxes_shape.dimensions(1), COORDINATES); | ||
| int64_t num_boxes = boxes_shape.dimensions(0); |
There was a problem hiding this comment.
I wonder if the implementation here is a copy with some modification of the deleted nms_op.cpp, or if it is a brand new implementation.
There was a problem hiding this comment.
I would say that it is a new implementation heavily based on the previous one. So, the problems that I saw with the previous implementation were:
- Almost no comments: couldn't easily tell what some parts of the code were doing
- Copied from another source: it was copied from an old version of tensorflow
- Different signature and semantics: it returned the best
output_sizebox indices, even though some of them might have been suppressed
I believe this implementation checks all of the above-mentioned boxes. I added plenty of comments + a few changes that resulted in a more maintainable code (IMHO).
|
@JackCaoG @vanbasten23 I think this PR is ready. Could you take a look at it when you have some time? |
| init_values, "BoxSelectionLoop", builder)); | ||
| } | ||
|
|
||
| xla::XlaOp BuildNms(xla::XlaOp boxes, xla::XlaOp scores, |
There was a problem hiding this comment.
Is there any resource that you referenced when coming up with the implementation?
There was a problem hiding this comment.
I based this implementation in two other implementations:
- The old PyTorch/XLA
nmsimplementation - TorchVision CUDA
nmsimplementation
| return boxes, scores | ||
|
|
||
| @skipOnEagerDebug | ||
| def test_nms_ref(self): |
There was a problem hiding this comment.
@ysiraichi can you please add a dynamic shape input test scenario for nms? Because this op is dynamic, it's been falling back to CPU. Of course, there is a lot of interest to bring the op to the xla device; though, this requires correct functionality for dynamism.
There was a problem hiding this comment.
more clarity: number of boxes is set to 1000 at the moment. we want that number to be dynamic.
There was a problem hiding this comment.
to be clear the output of this test appears to be dynamic, though the input number of boxes can also be dynamic for nms. This test, currently, covers one dynamism scenario (i.e. the output dynamism).
This PR refactors the existing
nmslowering implementation, changing the appropriate parts for complying with TorchVision semantics. In summary:nmslowering, based on the old implementationnmstorchvision::nmstorchvision.ops.nms(...)directlycc @miladm @JackCaoG