
addr ref#78014

Closed
ezyang wants to merge 9 commits into gh/ezyang/1177/base from gh/ezyang/1177/head

Conversation

ezyang (Contributor) commented May 20, 2022

Stack from ghstack (oldest at bottom):

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

[ghstack-poisoned]
facebook-github-bot (Contributor) commented May 20, 2022


✅ No Failures (0 Pending)

As of commit bcec376 (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no failures yet. 💚 💚


This comment was automatically generated by Dr. CI.

Please report bugs/suggestions to the (internal) Dr. CI Users group.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

[ghstack-poisoned]
Signed-off-by: Edward Z. Yang <ezyang@fb.com>

[ghstack-poisoned]
@ezyang ezyang mentioned this pull request May 20, 2022
dispatch:
CPU, CUDA: addr
CompositeImplicitAutograd: math_addr
CompositeExplicitAutograd: math_addr
Collaborator:

I thought it had to be CompositeImplicitAutograd for automatic backward?

Contributor Author:

For some reason, although we have an explicit addr derivative formula, XLA didn't want to use it. I'm forcing XLA to use our old formula now and I want to see what happens.

Contributor Author:

XLA tests passed so looks like we're good to go here

check(vec1.ndim == 1, lambda: f"addr: Expected 1-D argument vec1, but got {vec1.ndim}-D")
check(vec2.ndim == 1, lambda: f"addr: Expected 1-D argument vec2, but got {vec2.ndim}-D")
self = self.expand(vec1.shape[0], vec2.shape[0])
check(self.ndim == 2, lambda: f"2D tensor expected, got {self.ndim}D tensor for input")
Collaborator:

Wouldn't expand fail if self.ndim wasn't 2?

Contributor Author:

I guess we should update the original code lol

check(vec2.ndim == 1, lambda: f"addr: Expected 1-D argument vec2, but got {vec2.ndim}-D")
self = self.expand(vec1.shape[0], vec2.shape[0])
check(self.ndim == 2, lambda: f"2D tensor expected, got {self.ndim}D tensor for input")
check(self.shape[0] == vec1.shape[0] and self.shape[1] == vec2.shape[0],
Collaborator:

same here, isn't it guaranteed by expand?

check(self.ndim == 2, lambda: f"2D tensor expected, got {self.ndim}D tensor for input")
check(self.shape[0] == vec1.shape[0] and self.shape[1] == vec2.shape[0],
lambda: f"size mismatch, input: {self.shape}, v1: {vec1.shape}, v2: {vec2.shape}")
if utils.is_boolean_dtype(self.dtype):
Collaborator:

should you also check alpha dtype (it should be of the same type kind as other inputs)?
@mruberry do you want to make a case again for alpha checking being done in the wrapper?

Contributor Author:

Integers are OK though

>>> torch.addr(torch.tensor([[False]]), torch.tensor([False]), torch.tensor([True]), beta=1, alpha=2)
tensor([[False]])

Collaborator:

Extending the type promotion wrapper would probably make life easier

I think we should say that integer numbers can interact with bools and it's fine, however

Contributor Author:

I tried the obvious thing which is

@elementwise_type_promotion_wrapper(
    type_promoting_args=("self", "vec1", "vec2", "beta", "alpha"),
    type_promotion_kind=ELEMENTWISE_TYPE_PROMOTION_KIND.DEFAULT,
)       

but this doesn't work, as an integer alpha causes the entire tensor to promote to int64. I'm not sure what the recommended spelling of this change would be; it seems to have far-reaching effects.
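To illustrate the promotion behavior being described, here is a toy promotion lattice; this is a deliberate simplification for illustration (PyTorch's real rules also treat Python scalars specially), and `promote`/`dtype_of` are made-up helpers, not real torch APIs. It shows why listing an integer `alpha` among the type-promoting args drags bool tensor inputs up to int64 under DEFAULT promotion: the result dtype is the least upper bound of every participating argument's dtype.

```python
# Toy promotion lattice, ordered from weakest to strongest dtype kind.
PROMOTION_ORDER = ["bool", "int64", "float32", "complex64"]

def promote(*dtypes):
    # Least upper bound: the highest-ranked dtype among the arguments.
    return max(dtypes, key=PROMOTION_ORDER.index)

def dtype_of(scalar):
    # Hypothetical helper mapping a Python scalar to a dtype kind.
    if isinstance(scalar, bool):   # check bool first: bool subclasses int
        return "bool"
    if isinstance(scalar, int):
        return "int64"
    if isinstance(scalar, float):
        return "float32"
    return "complex64"

# self, vec1, vec2 are bool tensors, but alpha=2 is an integer scalar:
print(promote("bool", "bool", "bool", dtype_of(2)))  # int64
```

With only the three bool tensors participating, the result would stay bool; the integer scalar is what pushes the whole computation to int64.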

ezyang added 2 commits May 20, 2022 17:20
Signed-off-by: Edward Z. Yang <ezyang@fb.com>

[ghstack-poisoned]
Signed-off-by: Edward Z. Yang <ezyang@fb.com>

[ghstack-poisoned]
ezyang added a commit that referenced this pull request May 21, 2022
Signed-off-by: Edward Z. Yang <ezyang@fb.com>

ghstack-source-id: f570d56
Pull Request resolved: #78014
@ezyang ezyang requested a review from Chillee May 21, 2022 02:43
vec2.ndim == 1,
lambda: f"addr: Expected 1-D argument vec2, but got {vec2.ndim}-D",
)
self = self.expand(vec1.shape[0], vec2.shape[0])
Collaborator:

This unconditional expand seems odd -- maybe we can use a construct like _maybe_expand_to?

Contributor Author:

Here is the original code:

  check_1d(vec1, "vec1", "addr");
  check_1d(vec2, "vec2", "addr");
    
  const auto vec1_size0 = vec1.sizes()[0];
  const auto vec2_size0 = vec2.sizes()[0];
  auto self_ = &result == &self
    ? c10::MaybeOwned<Tensor>::borrowed(self)
    : expand_size(self, {vec1_size0, vec2_size0}, "addr");
  TORCH_CHECK(
    self_->dim() == 2,
    "2D tensor expected, got ", self_->dim(), "D tensor for input"
  );    
  TORCH_CHECK(
    self_->sizes()[0] == vec1_size0 && self_->sizes()[1] == vec2_size0,
    "size mismatch, input: ", self_->sizes(),
    ", v1: ", vec1.sizes(),
    ", v2: ", vec2.sizes()
  );

It looks like the expand is done unconditionally, so I faithfully represented this. Are we expected to simplify the logic when writing refs?

Collaborator:

It'd be nice if we improve it, sure -- if we don't do it now it's just more time and review later

Contributor Author:

_maybe_expand_to is... not a thing?

Collaborator:

How do you mean?

self.ndim == 2, lambda: f"2D tensor expected, got {self.ndim}D tensor for input"
)
check(
self.shape[0] == vec1.shape[0] and self.shape[1] == vec2.shape[0],
Collaborator:

Won't the expand have already asserted this?

Contributor Author:

Once again, faithful representation of the original code :) My rule for porting: first do a slavish port, and then once that passes all tests, then refactor.

Collaborator:

haha, OK I guess -- but some cleanups seem easy to snipe

torch.outer(vec1, vec2) if alpha else torch.full_like(self, False),
)
else:
return beta * self + alpha * torch.outer(vec1, vec2)
Collaborator:

I don't think these decompositions for addr are correct. When beta==0, for example, NaNs in self should not propagate
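A plain-Python illustration of this point (no torch needed): in IEEE-754 arithmetic, `0 * nan` is `nan`, so a naive `beta * self + alpha * outer(vec1, vec2)` propagates NaNs from `self` even when `beta == 0`, whereas per the comment above the eager semantics should drop the `self` term in that case. The scalar variables here stand in for single matching entries of the tensors.

```python
import math

self_entry = math.nan      # a NaN entry of `self`
beta, alpha = 0.0, 1.0
outer_entry = 2.0          # the matching entry of outer(vec1, vec2)

# Naive decomposition: 0 * nan -> nan, so the NaN leaks through.
naive = beta * self_entry + alpha * outer_entry

# Guarded version: skip the `self` term entirely when beta == 0.
guarded = (alpha * outer_entry if beta == 0
           else beta * self_entry + alpha * outer_entry)

print(math.isnan(naive))   # True: the NaN propagates in the naive form
print(guarded)             # 2.0: the guard avoids touching `self`
```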

Collaborator (mruberry) commented May 22, 2022:

(this will require adding reference inputs for addr that test the NaN propagation, too!)

Contributor Author:

btw, in the original code, even if alpha==0 the NaNs still propagate

# No meta support for torch.tensor
DecorateInfo(unittest.expectedFailure, 'TestCommon', 'test_python_reference_meta_functions',),
# RuntimeError: no _refs support for torch.outer
DecorateInfo(unittest.expectedFailure, 'TestCommon', 'test_python_reference_consistency',),
Collaborator:

Eeeks! We should probably get that mode implemented before we start writing refs with torch functions if they have to skip the reference consistency test

Contributor Author:

Well, this is what you asked for, but you are welcome to change your mind ;)

> We will have a context manager that will reroute torch.* API calls to torch._ref.* (failing if the ref doesn't exist),

and

> do you have a preference for two separate tests (but then I need to dupe all the xfails for the new test name I added) or one test (but then an xfail will prevent us checking that the torch.* version works)

> one test if it’s easier

My preferred resolution is to maintain only one test which does NOT test with the mode (so torch goes to torch, refs go to refs), and not test anything else. If we insist on testing the strict mode, we will need to add an extra set of tests, because any ref that calls into a torch function that doesn't have a ref will fail in strict mode. We can also test in non-strict mode, in which case you don't get any signal if all torch functions being called have refs or not.
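A minimal sketch of the rerouting idea under discussion, not PyTorch's actual implementation; `fake_torch`, `fake_refs`, and `reroute_to_refs` are all made-up names for illustration. A context manager swaps functions on a torch-like namespace for their refs counterparts, failing loudly in strict mode when no ref exists, and restores the originals on exit.

```python
import contextlib
from types import SimpleNamespace

# Stand-ins for the torch and torch._refs namespaces.
fake_torch = SimpleNamespace(add=lambda a, b: a + b)
fake_refs = SimpleNamespace(add=lambda a, b: ("ref", a + b))

@contextlib.contextmanager
def reroute_to_refs(torch_ns, refs_ns, strict=True):
    saved = {}
    for name in list(vars(torch_ns)):
        ref = getattr(refs_ns, name, None)
        if ref is None:
            if strict:
                raise RuntimeError(f"no ref for torch.{name}")
            continue  # non-strict: leave the original function in place
        saved[name] = getattr(torch_ns, name)
        setattr(torch_ns, name, ref)
    try:
        yield
    finally:  # always restore the originals on exit
        for name, fn in saved.items():
            setattr(torch_ns, name, fn)

with reroute_to_refs(fake_torch, fake_refs):
    print(fake_torch.add(1, 2))  # ('ref', 3): the ref was called
print(fake_torch.add(1, 2))      # 3: original restored on exit
```

This also shows the tension described above: in strict mode, any op in the namespace that lacks a ref makes entering the context fail, which is exactly why a ref calling into an un-ref'd torch function would trip the strict test.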

Collaborator:

Yes, I have a PR that will be available soon that separates the tests, so we can still test for consistency even without refs

Collaborator:

FYI #78026 has landed updating and separating the tests (may require a rebase)

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

[ghstack-poisoned]
ezyang added a commit that referenced this pull request May 23, 2022
Signed-off-by: Edward Z. Yang <ezyang@fb.com>

ghstack-source-id: 4c3126b
Pull Request resolved: #78014
Signed-off-by: Edward Z. Yang <ezyang@fb.com>

[ghstack-poisoned]
ezyang added a commit that referenced this pull request May 23, 2022
Signed-off-by: Edward Z. Yang <ezyang@fb.com>

ghstack-source-id: 8d84855
Pull Request resolved: #78014
ezyang (Contributor Author) commented May 23, 2022

I think I hit most PR comments. If we're going to have to do this by hand for every op, this is going to take a very long time to do properly.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

[ghstack-poisoned]
ezyang added a commit that referenced this pull request May 24, 2022
Signed-off-by: Edward Z. Yang <ezyang@fb.com>

ghstack-source-id: 004e626
Pull Request resolved: #78014
Signed-off-by: Edward Z. Yang <ezyang@fb.com>

[ghstack-poisoned]
ezyang added a commit that referenced this pull request May 24, 2022
Signed-off-by: Edward Z. Yang <ezyang@fb.com>

ghstack-source-id: 14bf401
Pull Request resolved: #78014
ezyang (Contributor Author) commented May 25, 2022

@pytorchbot merge this

github-actions (Contributor) commented:

Hey @ezyang.
You've committed this PR, but it does not have both a 'release notes: ...' and 'topics: ...' label. Please add one of each to the PR. The 'release notes: ...' label should represent the part of PyTorch that this PR changes (fx, autograd, distributed, etc) and the 'topics: ...' label should represent the kind of PR it is (not user facing, new feature, bug fix, perf improvement, etc). The list of valid labels can be found here for the 'release notes: ...' and here for the 'topics: ...'.
For changes that are 'topic: not user facing' there is no need for a release notes label.

facebook-github-bot pushed a commit that referenced this pull request May 26, 2022
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: #78014

Approved by: https://github.com/ngimel

Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/a1765f0176250db2b8ce7b543884ec5600ece14c

Reviewed By: mehtanirav

Differential Revision: D36668872

Pulled By: ezyang

fbshipit-source-id: 35a266e666bf812aa7df9594168ae82cd7d63aa1
@facebook-github-bot facebook-github-bot deleted the gh/ezyang/1177/head branch May 28, 2022 14:17