Unhide unique from C++, make unique partially scriptable#15256

Closed
zasdfgbnm wants to merge 28 commits intopytorch:masterfrom
zasdfgbnm:unique-cpp

Conversation

@zasdfgbnm
Collaborator

@zasdfgbnm zasdfgbnm commented Dec 15, 2018

This PR does three things:

Nullable integer argument in schema

Allow `int64_t?` in function schema, which provides an elegant way of implementing nullable int arguments, as discussed in #15208 (review)

Originally implemented in #15235

Example:

```yaml
- func: myop(Tensor self, int64_t? dim=None) -> Tensor
  variants: function
```
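At the Python surface, an `int64_t?` schema argument corresponds to an `Optional[int]` parameter defaulting to `None`. A minimal sketch of the calling convention (the `myop` below is a hypothetical illustration, not generated binding code):

```python
from typing import Optional

def myop(values: list, dim: Optional[int] = None) -> list:
    # None means "no dimension given": act on the whole input.
    if dim is None:
        return sorted(values)
    # An explicit integer selects per-dimension behaviour; here we
    # just tag the result so the branch taken is observable.
    return [("dim", dim)] + sorted(values)
```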

cc: @zou3519

Edit: implemented in #15234

Unhide unique from C++

Previously tried in #12064. The problem was that C++ has no kwarg support, which makes it ambiguous whether `unique(t, 1)` means `unique(t, dim=1)` or `unique(t, sorted=1)`.

Now I think I have a better idea of how to implement this: there are two ATen operators, `unique` and `unique_dim`. `unique` has the same signature as in Python and is exported to both Python and C++. `unique_dim` has the signature `unique_dim(tensor, dim, sorted=False, return_inverse=False)` and is exported only to C++, where it reads more naturally for a C++ user.
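The two-operator split can be sketched with plain Python lists standing in for tensors. `_unique_flat` and `_unique_dim` below are hypothetical stand-ins for the two ATen kernels (only `dim=0`, i.e. unique rows, is sketched); the point is the routing on `dim=None`:

```python
from typing import List, Optional

def _unique_flat(t: List[int], sort: bool) -> List[int]:
    # stand-in for aten::unique over the flattened input
    u = list(dict.fromkeys(t))          # dedupe, preserving first-seen order
    return sorted(u) if sort else u

def _unique_dim(t: List[List[int]], dim: int, sort: bool) -> List[List[int]]:
    # stand-in for aten::unique_dim; this sketch only handles dim=0
    assert dim == 0, "sketch only handles dim=0 (unique rows)"
    u = list(dict.fromkeys(tuple(row) for row in t))
    u = sorted(u) if sort else u
    return [list(row) for row in u]

def unique(t, sorted: bool = True, dim: Optional[int] = None):
    # `sorted` mirrors the real keyword name, shadowing the builtin
    # on purpose; the builtin is only used inside the helpers.
    # dim=None routes to the flat kernel, an explicit dim to the
    # dim-aware kernel -- mirroring the unique/unique_dim split.
    if dim is None:
        flat = [x for row in t for x in row] if t and isinstance(t[0], list) else t
        return _unique_flat(flat, sorted)
    return _unique_dim(t, dim, sorted)
```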

cc: @ezyang @t-vi How does it look this time?

Unique is now partially scriptable

As a result of the above two, torch.unique can now be partially scripted. See test below:

```python
import torch

def f(a):
    b, c = torch.unique(a, return_inverse=True)
    return b.sum()

def g(a):
    b = torch.unique(a)
    return b.sum()

a = torch.rand((5, 6, 7))

traced_f = torch.jit.trace(f, a)
traced_g = torch.jit.trace(g, a)
print('traced f:', traced_f.graph)
print('traced g:', traced_g.graph)

scripted_f = torch.jit.script(f)
print('scripted f:', scripted_f.graph)

scripted_g = torch.jit.script(g)
print('scripted g:', scripted_g.graph)
```

gives

```
traced f: graph(%input : Float(5, 6, 7)) {
  %1 : bool = prim::Constant[value=0]()
  %2 : bool = prim::Constant[value=1]()
  %3 : int? = prim::None()
  %b : Float(210), %5 : Long(5, 6, 7) = aten::unique(%input, %1, %2, %3)
  %6 : Float() = aten::sum(%b)
  return (%6);
}

traced g: graph(%input : Float(5, 6, 7)) {
  %1 : bool = prim::Constant[value=0]()
  %2 : bool = prim::Constant[value=0]()
  %3 : int? = prim::None()
  %b : Float(210), %5 : Long(0) = aten::unique(%input, %1, %2, %3)
  %6 : Float() = aten::sum(%b)
  return (%6);
}

scripted f: graph(%a : Tensor) {
  %2 : bool = prim::Constant[value=0]()
  %1 : bool = prim::Constant[value=1]()
  %5 : int? = prim::None()
  %b : Tensor, %c : Tensor = aten::unique(%a, %2, %1, %5)
  %9 : Tensor = aten::sum(%b)
  return (%9);
}
```

```
Traceback (most recent call last):
  File "test.py", line 21, in <module>
    scripted_g = torch.jit.script(g)
  File "/home/gaoxiang/.virtualenvs/pt/lib/python3.7/site-packages/torch/jit/__init__.py", line 692, in script
    _jit_script_compile(mod, ast, _rcb, get_default_args(fn))
RuntimeError: 
arguments for call are not valid:
  
  for operator aten::sum(Tensor self, int[] dim, bool keepdim=<default>, *, Tensor out) -> Tensor:
  expected a value of type Tensor for argument 'self' but found (Tensor, Tensor)
  def g(a):
      b = torch.unique(a)
      return b.sum()
             ~~~~~ <--- HERE
  
  for operator aten::sum(Tensor self, int[] dim, *, int dtype, Tensor out) -> Tensor:
  expected a value of type Tensor for argument 'self' but found (Tensor, Tensor)
  def g(a):
      b = torch.unique(a)
      return b.sum()
             ~~~~~ <--- HERE
  
  for operator aten::sum(Tensor self, int[] dim, bool keepdim, *, int dtype, Tensor out) -> Tensor:
  expected a value of type Tensor for argument 'self' but found (Tensor, Tensor)
  def g(a):
      b = torch.unique(a)
      return b.sum()
             ~~~~~ <--- HERE
  
  for operator aten::sum(Tensor self) -> Tensor:
  expected a value of type Tensor for argument 'self' but found (Tensor, Tensor)
  def g(a):
      b = torch.unique(a)
      return b.sum()
             ~~~~~ <--- HERE
  
  for operator aten::sum(Tensor self, *, int dtype) -> Tensor:
  expected a value of type Tensor for argument 'self' but found (Tensor, Tensor)
  def g(a):
      b = torch.unique(a)
      return b.sum()
             ~~~~~ <--- HERE
  
  for operator aten::sum(Tensor self, int[] dim, bool keepdim=<default>) -> Tensor:
  expected a value of type Tensor for argument 'self' but found (Tensor, Tensor)
  def g(a):
      b = torch.unique(a)
      return b.sum()
             ~~~~~ <--- HERE
  
  for operator aten::sum(Tensor self, int[] dim, *, int dtype) -> Tensor:
  expected a value of type Tensor for argument 'self' but found (Tensor, Tensor)
  def g(a):
      b = torch.unique(a)
      return b.sum()
             ~~~~~ <--- HERE
  
  for operator aten::sum(Tensor self, int[] dim, bool keepdim, *, int dtype) -> Tensor:
  expected a value of type Tensor for argument 'self' but found (Tensor, Tensor)
  def g(a):
      b = torch.unique(a)
      return b.sum()
             ~~~~~ <--- HERE
for call at:
def g(a):
    b = torch.unique(a)
    return b.sum()
           ~~~~~ <--- HERE
```

@facebook-github-bot facebook-github-bot added the oncall: jit label Dec 15, 2018
```cpp
}

std::tuple<Tensor, Tensor>
_unique_dim_cuda(const Tensor& self, const int64_t dim, const bool sorted, const bool return_inverse) {
```
Collaborator

Why remove this function? Shouldn't CPU and CUDA share the same logic?

Collaborator Author

Because `unique_dim` now just calls `unique`, the logic here has moved to `_unique_cuda`.

Collaborator Author

_unique_dim_cpu is removed as well

@zasdfgbnm
Collaborator Author

@wanchaol The `int64_t?` part of this PR is exactly the same as what is done in #15234. When I started this, #15234 was not there, so I did it myself. Now we have it, so I will revert the changes for this part from this PR.

@zasdfgbnm zasdfgbnm changed the title Unhide unique from C++, make unique partially scriptable, allow optional<int> for schema Unhide unique from C++, make unique partially scriptable Dec 21, 2018
@zasdfgbnm
Collaborator Author

zasdfgbnm commented Dec 21, 2018

@wanchaol I've updated this PR and removed what is done in #15234. The diff in test_jit.py was added to test whether the JIT correctly handles an optional int parameter. Now I have reverted all the `int64_t?` changes except this one. Do you think it is good to keep, or should I delete this change?

@wanchaol
Collaborator

> @wanchaol The int64_t? part of this PR is exactly the same as what is doing in #15234 . When I started this, #15234 was not there, so I did it myself. Now we have it, and I will revert the changes of this part from this PR.

I see. The `int64_t?` part was also needed when we tried to remove python_default_init from ATen in #15234, so I added the support there.

> @wanchaol I've updated this PR, and removed what is done in #15234. The diff in test_jit.py was added to test if JIT correctly handle optional int parameter. Now I have reverted all the int64_t? changes except this one. Do you think it is good to keep or delete this change?

Thanks! I think it's good to keep it, to ensure unique works end to end :). Maybe we can rename the test to something like test_script_unique_none under TestScript, since it is primarily testing the unique functionality. Also, try explicitly passing None to it to make sure it works? You can see an example for clamp in test_jit.

@zasdfgbnm
Collaborator Author

@wanchaol Thanks for the clamp example, I've made changes to test_jit.py according to the suggestion :)

```yaml
- name: uniform_(Tensor self, double from, double to, Generator generator)
  self: zeros_like(grad)

- name: _unique(Tensor self, bool sorted, bool return_inverse)
```
Collaborator

Is there a reason for not implementing autograd for unique?

Collaborator Author

What happens if we don't put an operator in this list? Isn't not_implemented the default? I'm not sure about this.

I think the only reason for not supporting unique's autograd is that nobody needs it, so nobody complains about it, so nobody goes ahead and implements it...

Collaborator

Sure, if nobody wants it then we don't need to implement it. But I think it's good to keep the correct function signature here, which in your case might be `unique(Tensor self, bool sorted, bool return_inverse, int64_t? dim)`, and keep the not_implemented entry, because it is used to give a decent error message telling the user that autograd is not implemented for this function yet.
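A sketch of what the suggested derivatives.yaml entry could look like, using the signature proposed in the comment above (hypothetical, not the file's actual contents):

```yaml
- name: unique(Tensor self, bool sorted, bool return_inverse, int64_t? dim)
  self: not_implemented("unique")
```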

```python
sorted=sorted,
return_inverse=return_inverse,
)
output, inverse_indices = torch._C._VariableFunctions.unique(
```
Collaborator

Why don't we use the same torch.unique as in torch/tensor.py?

Collaborator Author

@zasdfgbnm zasdfgbnm Dec 21, 2018

I think it's for the same reason that `torch._C._VariableFunctions.unique` is used in:
https://github.com/pytorch/pytorch/blob/master/torch/functional.py#L198

Which is due to:
https://github.com/pytorch/pytorch/blob/master/torch/__init__.py#L262

At import time, `torch.unique` is replaced by `functional.unique`, so using `torch.unique` here would cause infinite recursion.
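The recursion hazard can be reproduced in miniature: once a module-level name is rebound to a wrapper, the wrapper must call the original binding it captured rather than look the name up again. A sketch (the namespace and function names are illustrative, not PyTorch internals):

```python
import types

# Stand-in module whose `unique` attribute will be rebound, the way
# torch/__init__.py rebinds torch.unique to functional.unique.
mod = types.SimpleNamespace()
mod.unique = lambda xs: sorted(set(xs))   # stand-in for the C-level binding

_original = mod.unique                    # capture before rebinding

def wrapped_unique(xs, return_inverse=False):
    # Calling mod.unique here would recurse forever after the rebinding
    # below; calling the captured original is the equivalent of calling
    # torch._C._VariableFunctions.unique directly.
    out = _original(xs)
    if return_inverse:
        inverse = [out.index(x) for x in xs]
        return out, inverse
    return out

mod.unique = wrapped_unique               # the rebinding that creates the hazard
```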

Contributor

@facebook-github-bot facebook-github-bot left a comment

@wanchaol has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@zasdfgbnm
Collaborator Author

@wanchaol Just updated it, I'm not sure if it fixes the problem or not...

@t-vi
Collaborator

t-vi commented Dec 22, 2018

I'm ignorant of whether we have a general policy of how to deal with Python kw args in C++, and I don't really have an opinion on the specific case either...

@zou3519
Contributor

zou3519 commented Dec 26, 2018

@wanchaol is it possible to add an OSS test that covers the ONNX symbolic?

@wanchaol
Collaborator

wanchaol commented Dec 29, 2018

@zou3519 Yes, we have an OSS test in test_operators that asserts on the ONNX proto generated from the ONNX symbolic.

@zasdfgbnm
Collaborator Author

@wanchaol I've modified the symbolic and added a test. How does it look this time?

Contributor

@facebook-github-bot facebook-github-bot left a comment

@zou3519 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@zou3519
Contributor

zou3519 commented Jan 9, 2019

@zasdfgbnm @wanchaol says it looks fine.

Contributor

@facebook-github-bot facebook-github-bot left a comment

@zou3519 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@ezyang
Contributor

ezyang commented Jan 17, 2019

It looks like there is still something not right with the gen_aten_op.h integration. Internal tests report this error:

```
Exception: Test of model [redacted] failed, 
terminate called after throwing an instance of 'c10::Error'
  what():  [enforce fail at gen_aten_op.h:15009] . Attempting to run unknown ATen operator configuration: _unique-return_inverse-sorted-1
```

@wanchaol could you please take a look?

@wanchaol
Collaborator

@ezyang Sure, I am looking into it.

@wanchaol
Collaborator

@zasdfgbnm I investigated it with our internal failures; it is basically a backward compatibility issue. There are regression tests indicating that multiple serialized models have "_unique" as the operator name, so if this PR changes the name, those serialized models will break, and per @jhcross, we cannot automatically redeploy the binaries without retraining (or somehow updating) all of those models.

This is a BC-breaking change. Since PyTorch and Caffe2 don't have operator versioning for serialized models, we can only wait for operator versioning to become available, or retrain all of the serialized models. For now, one possible solution to unblock this is to keep both names, "_unique" and "unique", available, with "_unique" just calling "unique" in this PR.
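The aliasing workaround can be sketched as: keep the old operator name registered, but have it forward to the new one. The toy dict registry below stands in for the real ATen/Caffe2 operator registration mechanism; all names here are illustrative:

```python
# Toy registry standing in for the operator table; the real mechanism
# is ATen/Caffe2 operator registration, not a Python dict.
OPS = {}

def register(name):
    def deco(fn):
        OPS[name] = fn
        return fn
    return deco

@register("unique")
def unique(t, sort=True):
    # new canonical implementation (toy version over Python lists)
    u = list(dict.fromkeys(t))
    return sorted(u) if sort else u

# Backward compatibility: serialized models still reference "_unique",
# so keep that name alive as a thin forwarder to "unique".
@register("_unique")
def _unique(t, sort=True):
    return OPS["unique"](t, sort)
```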

Contributor

@facebook-github-bot facebook-github-bot left a comment

@wanchaol has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

facebook-github-bot pushed a commit that referenced this pull request Jan 21, 2019
Summary:
This PR does three things:

~~Allow `int64_t?` in function schema, which provides an elegant way of implementing nullable int arguments, as discussed in #15208 (review)~~

~~Originally implemented in #15235~~

~~Example:~~

```yaml
- func: myop(Tensor self, int64_t? dim=None) -> Tensor
  variants: function
```

~~cc: zou3519~~

Edit: implemented in #15234

Previously tried in #12064. The problem was that C++ has no kwarg support, which makes it ambiguous whether `unique(t, 1)` means `unique(t, dim=1)` or `unique(t, sorted=1)`.

Now I think I have a better idea of how to implement this: there are two ATen operators, `unique` and `unique_dim`. `unique` has the same signature as in Python and is exported to both Python and C++. `unique_dim` has the signature `unique_dim(tensor, dim, sorted=False, return_inverse=False)` and is exported only to C++, where it reads more naturally for a C++ user.

Differential Revision: D13540278

Pulled By: wanchaol

fbshipit-source-id: 3768c76a90b0881f565a1f890459ebccbdfe6ecd
@zasdfgbnm
Collaborator Author

@wanchaol What happened to this? I saw it merged but then reverted. Do I need to do anything?

@wanchaol
Collaborator

> @wanchaol What happened to this? I saw it merged but then reverted. Do I need to do anything?

@zasdfgbnm This still causes some internal integration tests to fail (it was not observed until we landed it). I will do some investigation and get back to you with the details.

@zasdfgbnm zasdfgbnm closed this Feb 14, 2019
@zasdfgbnm
Collaborator Author

Closed in favor of #17097.

Labels

oncall: jit, open source

6 participants