Unhide unique from C++, make unique partially scriptable, support namedtuple return #17097

Closed
zasdfgbnm wants to merge 29 commits into pytorch:master from zasdfgbnm:unique-cpp

Conversation


zasdfgbnm (Collaborator) commented Feb 14, 2019

Reopen of #15256, with the new namedtuple return (#394) added. This also needs a rebase after #16186 lands, because we need to update the whitelist of the new unit test added in #16186.

cc: @wanchaol @ezyang
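The namedtuple return added here (#394) means `unique`'s multiple outputs come back as a named tuple rather than a plain one, so callers can unpack positionally or access fields by name. A minimal pure-Python sketch of the idea (the type and field names below are illustrative, not the exact ones codegen emits):

```python
from collections import namedtuple

# Illustrative stand-in for a codegen-produced return type
# (e.g. something like torch.return_types.unique); names are assumptions.
UniqueResult = namedtuple("UniqueResult", ["output", "inverse_indices"])

def unique(seq):
    """Toy unique over a Python list, returning a namedtuple so the
    results can be accessed by position or by name."""
    output = sorted(set(seq))
    index_of = {v: i for i, v in enumerate(output)}
    inverse = [index_of[v] for v in seq]
    return UniqueResult(output=output, inverse_indices=inverse)

res = unique([3, 1, 3, 2])
print(res.output)           # [1, 2, 3]
print(res.inverse_indices)  # [2, 0, 2, 1]
out, inv = res              # still unpacks like a plain tuple
```

Because a namedtuple is still a tuple, existing callers that unpack positionally keep working unchanged.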

zasdfgbnm added a commit that referenced this pull request Mar 30, 2019
- Rename `_unique2` to `unique`
- Add an optional `dim` argument to make the signature look like that of Python's `torch.unique`.
- Inside `torch.unique`, use `unique` and get rid of `unique_dim`.
- Unbind `unique_dim` totally from Python at codegen.

Previously tried in #17097, which caused an internal error; not sure about this time.

cc: @wanchaol
zasdfgbnm added a commit that referenced this pull request Mar 30, 2019
Step 6: Rename _unique2 to unique and add int? dim

- Rename `_unique2` to `unique`
- Add an optional `dim` argument to make the signature look like that of Python's `torch.unique`.
- Inside `torch.unique`, use `unique` and get rid of `unique_dim`.
- Unbind `unique_dim` totally from Python at codegen.
- Add OSS ONNX test for unique

Previously tried in #17097, which caused an internal error; not sure about this time.

cc: @wanchaol

gh-metadata: pytorch pytorch 18655 gh/zasdfgbnm/6/head
zasdfgbnm added a commit that referenced this pull request Mar 30, 2019
Step 6: Rename _unique2 to unique and add int? dim

- Rename `_unique2` to `unique`
- Add an optional `dim` argument to make the signature look like that of Python's `torch.unique`.
- Inside `torch.unique`, use `unique` and get rid of `unique_dim`.
- Unbind `unique_dim` totally from Python at codegen.
- Add OSS ONNX test for unique
- Add jit test for unique

Previously tried in #17097, which caused an internal error; not sure about this time.

cc: @wanchaol

gh-metadata: pytorch pytorch 18655 gh/zasdfgbnm/6/head
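The end state these commits work toward, a single public `unique` with an optional `dim` that dispatches internally, can be sketched in plain Python. The helper names `_unique_flat` and `_unique_dim` below are illustrative stand-ins, not the real ATen operators:

```python
def _unique_flat(values):
    """Unique over all elements (the dim=None path)."""
    return sorted(set(values))

def _unique_dim(rows, dim):
    """Unique over slices along `dim`; this sketch only handles dim=0,
    where a 'slice' is one row of a list-of-lists."""
    assert dim == 0, "sketch only supports dim=0"
    return sorted(set(tuple(r) for r in rows))

def unique(values, dim=None):
    """Single public entry point with an optional `dim`, mirroring the
    `int? dim` signature described in the commit above."""
    if dim is None:
        return _unique_flat(values)
    return [list(r) for r in _unique_dim(values, dim)]

print(unique([1, 3, 2, 3]))                     # [1, 2, 3]
print(unique([[1, 2], [1, 2], [0, 5]], dim=0))  # [[0, 5], [1, 2]]
```

This mirrors the `int? dim` schema: one user-facing function, with the dim-less and per-dim code paths kept separate internally so `unique_dim` can be unbound from Python.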
zasdfgbnm added a commit that referenced this pull request Mar 31, 2019
Step 1: Secretly add return_counts to unique, and refactor unique_dim for performance

`unique` is fragile: I previously tried to change it in #18391 and #17097, and both passed OSS tests but were eventually reverted due to internal failures. My earlier refactoring of unique in #18459 is based on #18391, and after #18391 was reverted I could not continue on #18459. To keep working on #18459, #18391, and #17097 without worrying about internal failures, I suggest the following steps for improving `unique` and `unique_dim`. @soumith please take this; there is no need to put #18391 back.

The motivation is to move forward as much as possible without causing any internal failures, so I have divided the work into steps, sorted from low to high probability of internal failure. (I don't know what the internal failure is, so I have to guess.) Let's merge this PR stack one by one until we encounter an internal failure.

Step 1: Create two new ATen operators, `_unique2_temporary_will_remove_soon` and `_unique_dim2_temporary_will_remove_soon`, and keep `_unique` and `_unique_dim` unchanged. The backends of the temporary operators and of `_unique`/`_unique_dim` are all the same; the only difference is that the temporary ones support `return_counts` while `_unique` and `_unique_dim` do not. Step 1 is mostly #18391 + #18459. The cuda8 errors have been fixed. At this point there is no user-visible API change, so no docs are updated. `torch.unique` does not support `return_counts` yet, and `return_counts` is tested through the newly added temporary operators. This step just adds two new ATen operators, so there shouldn't be any internal failure.

Step 2: Rename `_unique_dim2_temporary_will_remove_soon` to `unique_dim`. This should cause no internal failure either, because existing operators are unchanged. The only thing to worry about is deleting `unique_dim` from the Python side, because we don't want users to use it. At this point, C++ users have `return_counts` support for `unique_dim`.

Step 3: Update the docs of `torch.unique` and use `unique_dim` inside `torch.unique` to support `return_counts`. The docs should say that `torch.unique` with `dim=None` does not support `return_counts` yet. This might cause internal failure.

Step 4: Rename `_unique2_temporary_will_remove_soon` to `_unique2` and use `_unique2` inside `torch.unique` to support `return_counts`. Update the docs to say that `torch.unique` with `dim=None` now supports `return_counts`. This might cause internal failure.

Step 5: Remove `_unique_dim`. This might cause internal failure.

Step 6: Rename `_unique2` to `unique` and add an optional `dim` argument to make the signature look like that of Python's `torch.unique`. Inside `torch.unique`, use `unique` and get rid of `unique_dim`. Unbind `unique_dim` totally from Python at codegen. This is likely to cause internal failure.

Step 7: Remove `_unique`. This is very likely to cause internal failure.

This PR is for step 1. It creates the two new ATen operators, `_unique2_temporary_will_remove_soon` and `_unique_dim2_temporary_will_remove_soon`, implements `return_counts` inside them, and refactors for performance improvements.
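The `return_counts` support in the temporary operators follows naturally from a sort-based unique: once elements are visited in sorted order, counts are just run lengths, and the inverse mapping falls out at the same time. A pure-Python sketch of the idea (not the actual ATen/Thrust implementation):

```python
def unique_with_counts(seq):
    """Sort-based unique returning (output, inverse_indices, counts).
    Counts fall out of the sorted order as run lengths."""
    order = sorted(range(len(seq)), key=lambda i: seq[i])
    output, counts = [], []
    inverse = [0] * len(seq)
    for i in order:
        v = seq[i]
        if not output or output[-1] != v:
            # start of a new run of equal values
            output.append(v)
            counts.append(0)
        counts[-1] += 1
        inverse[i] = len(output) - 1
    return output, inverse, counts

out, inv, cnt = unique_with_counts([2, 1, 2, 2, 3])
print(out)  # [1, 2, 3]
print(inv)  # [1, 0, 1, 1, 2]
print(cnt)  # [1, 3, 1]
```

Since the sort already has to happen to produce a sorted `output`, emitting counts alongside it is close to free, which is why bundling `return_counts` into the refactor makes sense.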

Please review, @ngimel @VitalyFedyunin. These changes are mostly copied from #18391 and #18459, so the review should be easy.

Below is a benchmark on a tensor of shape `torch.Size([15320, 2])`:

Before:
```python
print(torch.__version__)
%timeit a.unique(dim=0, sorted=True, return_inverse=False); torch.cuda.synchronize()
%timeit a.unique(dim=0, sorted=True, return_inverse=True); torch.cuda.synchronize()
```

```
1.0.1
192 µs ± 1.61 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
548 ms ± 3.39 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```

```python
print(torch.__version__)
%timeit a.unique(sorted=True, return_inverse=False); torch.cuda.synchronize()
%timeit a.unique(sorted=True, return_inverse=True); torch.cuda.synchronize()
```

```
1.0.1
226 µs ± 929 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
302 µs ± 7.06 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```

After:
```python
print(torch.__version__)
%timeit a.unique(dim=0, sorted=True, return_inverse=False); torch.cuda.synchronize()
%timeit a.unique(dim=0, sorted=True, return_inverse=True); torch.cuda.synchronize()
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, sorted=True, return_inverse=False, return_counts=True); torch.cuda.synchronize()
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, sorted=True, return_inverse=True, return_counts=True); torch.cuda.synchronize()
```

```
1.1.0a0+83ab8ac
190 µs ± 2.14 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
237 µs ± 1.23 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
219 µs ± 2.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
263 µs ± 1.15 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```

```python
print(torch.__version__)
%timeit a.unique(sorted=True, return_inverse=False); torch.cuda.synchronize()
%timeit a.unique(sorted=True, return_inverse=True); torch.cuda.synchronize()
%timeit torch._unique2_temporary_will_remove_soon(a, sorted=True, return_inverse=False, return_counts=True); torch.cuda.synchronize()
%timeit torch._unique2_temporary_will_remove_soon(a, sorted=True, return_inverse=True, return_counts=True); torch.cuda.synchronize()
```

```
1.1.0a0+83ab8ac
232 µs ± 2.21 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
301 µs ± 1.65 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
264 µs ± 7.67 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
339 µs ± 9.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```

gh-metadata: pytorch pytorch 18648 gh/zasdfgbnm/1/head
facebook-github-bot pushed a commit that referenced this pull request Apr 3, 2019
… for performance (#18648)

Summary:
Pull Request resolved: #18648
ghimport-source-id: 1cf4a8f

Stack from [ghstack](https://github.com/ezyang/ghstack):
* #18661 Step 7: remove _unique
* #18655 Step 6: Rename _unique2 to unique and add int? dim
* #18654 Step 5: remove _unque_dim in favor of unique_dim
* #18651 Step 4: add support for unique with dim=None
* #18650 Step 3: Add support for return_counts to torch.unique for dim not None
* #18649 Step 2: Rename _unique_dim2_temporary_will_remove_soon to unique_dim
* **#18648 Step 1: Secretly add return_counts to unique, and refactor unique_dim for performance**

Differential Revision: D14730905

fbshipit-source-id: 10026b4b98628a8565cc28a13317d29adf1225cc
339 µs ± 9.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```
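The `%timeit` cells above are IPython magics; an equivalent standalone measurement can be made with the stdlib `timeit` module. The sketch below uses a cheap stand-in workload, since the real benchmark needs a CUDA tensor and `torch.cuda.synchronize()`:

```python
import timeit

def measure(fn, number=1000, repeat=7):
    """Mimic %timeit: mean per-loop time (seconds) over several repeats."""
    totals = timeit.repeat(fn, number=number, repeat=repeat)
    per_loop = [t / number for t in totals]
    return sum(per_loop) / len(per_loop)

# Stand-in workload; with PyTorch installed this would wrap e.g.
#   lambda: (a.unique(dim=0, sorted=True, return_counts=True),
#            torch.cuda.synchronize())
mean_s = measure(lambda: sorted(set(range(100))))
```

Note that for GPU kernels the synchronize call must be inside the timed callable, otherwise only the asynchronous launch is measured.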

gh-metadata: pytorch pytorch 18648 gh/zasdfgbnm/1/head
zasdfgbnm added a commit that referenced this pull request Apr 8, 2019
Step 6: Rename _unique2 to unique and add int? dim

- Rename `_unique2` to `unique`
- Add optional `dim` argument to make it looks like the signature of Python's `torch.unique`.
- Inside `torch.unique`, use `unique` and get rid of `unique_dim`.
- Unbind `unique_dim` totally from Python at codegen.
- Add OSS ONNX test for unique
- Add jit test for unique

Previously tried in #17097 and cause internal error, not sure about this
time.

cc: @wanchaol

gh-metadata: pytorch pytorch 18655 gh/zasdfgbnm/6/head