
[test] add cross-ref tests for python meta kernels#86228

Closed
bdhirsh wants to merge 19 commits into gh/bdhirsh/313/base from gh/bdhirsh/313/head

Conversation

@pytorch-bot pytorch-bot bot added the "topic: not user facing" label Oct 4, 2022

pytorch-bot bot commented Oct 4, 2022

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/86228

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures, 3 Pending

As of commit 6847440:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

)
)
raise ValueError(msg)
# TODO: this check is too restrictive. torch.add(bool_tensor, bool_tensor, 0) should work.
Collaborator Author


cc @ngimel. This check triggers on torch.add(bool_tensor, bool_tensor, 0), which works fine on eager. My understanding is that refs should work in all cases that eager works - do you happen to know the purpose of this check?

Collaborator

We need to modify these checks to account for PyTorch allowing integer numbers with boolean tensors. Typically a number cannot be a "higher" Python type than the Python type corresponding to the tensors' type. For example,

a = torch.randint(0, 5, (2, 2))
b = torch.randint(0, 5, (2, 2))
torch.add(a, b, alpha=.5)
: RuntimeError: For integral input tensors, argument alpha must not be a floating point number.

The exception to this is that boolean tensors will interoperate with integers, and I think the integer is always cast to a bool before the operation. The references just don't implement this particular oddity yet.
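The rule described above can be sketched in plain Python. This is a hypothetical illustration of the check, not the actual `torch._refs` implementation; the helper names and dtype strings are mine:

```python
# Hypothetical sketch of the alpha-type check discussed above; helper and
# dtype names are illustrative, not the actual torch._refs code.

def py_type_of_dtype(dtype: str) -> type:
    # Map a tensor dtype to its corresponding Python scalar type.
    if dtype == "bool":
        return bool
    if dtype in ("uint8", "int8", "int16", "int32", "int64"):
        return int
    if dtype in ("float16", "bfloat16", "float32", "float64"):
        return float
    return complex

# Ordering of Python scalar types: bool < int < float < complex
_ORDER = {bool: 0, int: 1, float: 2, complex: 3}

def alpha_is_valid(tensor_dtype: str, alpha) -> bool:
    """Alpha must not be a 'higher' Python type than the Python type
    corresponding to the tensor's dtype, except that integer alphas
    interoperate with boolean tensors."""
    tensor_ty = py_type_of_dtype(tensor_dtype)
    alpha_ty = bool if isinstance(alpha, bool) else type(alpha)
    if tensor_ty is bool and alpha_ty is int:
        return True  # the bool/int interop oddity discussed above
    return _ORDER[alpha_ty] <= _ORDER[tensor_ty]

print(alpha_is_valid("int64", 0.5))  # False: float alpha with integral tensors
print(alpha_is_valid("bool", 0))     # True: the bool exception
print(alpha_is_valid("float32", 2))  # True: int is lower than float
```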

Collaborator
@mruberry mruberry Oct 12, 2022

Note that this IS an oddity, as it's inconsistent with what we'd expect from type promotion if we wrote torch.add as an actual composite operation:

a = torch.randint(0, 5, (2, 2)).bool()
b = torch.randint(0, 5, (2, 2)).bool()
torch.add(a, b, alpha=1)
: tensor([[True, True],
       [True, True]])

torch.add(a, b) * 1
: tensor([[1, 1],
       [1, 1]])

PyTorch is pretty cool that way.

Collaborator Author

Thanks for the overview! I'll update this check to pass in the bool case.

bdhirsh added a commit that referenced this pull request Oct 4, 2022
ghstack-source-id: 0399cc4
Pull Request resolved: #86228
bdhirsh added a commit that referenced this pull request Oct 5, 2022
ghstack-source-id: 8a82192
Pull Request resolved: #86228
bdhirsh referenced this pull request Oct 7, 2022
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
@ezyang ezyang mentioned this pull request Oct 7, 2022
bdhirsh added a commit that referenced this pull request Oct 7, 2022
ghstack-source-id: f779ea7
Pull Request resolved: #86228
bdhirsh added a commit that referenced this pull request Oct 7, 2022
ghstack-source-id: 108111b
Pull Request resolved: #86228
index.ndim <= 1,
lambda: f"Index should have dimension 1 or 0 (got {index.ndim})",
)
# Fails incorrectly for bool tensor
Collaborator

What do you mean?

Collaborator Author

@albanD This is the same issue as the commented-out check on line 909. addmm(bool_tensor1, bool_tensor2, bool_tensor3, beta=1, alpha=1) fails this check, but works fine in eager mode.

Do you know why this check was added? Are there some checks in C++ that we're trying (incorrectly) to emulate?

Collaborator

I am not sure, tbh, but I would expect so. I'm sure Natalia would know.

Collaborator
@ngimel ngimel Oct 12, 2022

Why do we even support addmm for bools? I'm pretty sure we don't support it on cuda. Edit: oh sorry, it's not addmm, github was hiding weird things. Then I don't know, let's ask @lezcano. In other ops (like add) we require that alpha is a lesser type, but these requirements are inconsistent across ops (add is different from addcmul, etc.)

Collaborator
@lezcano lezcano Oct 12, 2022

This is the check, I believe:

TORCH_CHECK_INDEX(index.dim() <= 1, func, "_(): Index is supposed to be a vector, but got dim: ",
                  index.dim(), " with type: ", index.scalar_type(), " and size: ", index.sizes());

As to why it fails for bools... I don't know.

I am currently putting together a more thorough test for decompositions. The test has uncovered a few issues in some operations, in particular for bools. No idea what's going on there just yet. I plan to fix all the issues the test has uncovered before putting up a PR. Hopefully that'll be today or tomorrow.

In the meantime, I'm leaving here this fantastic behaviour it uncovered:

import torch

msk = torch.tensor([False])
b = torch.tensor([False])
print(torch.where(msk, True, b).dtype)
print(torch.ops.aten.where.ScalarSelf(msk, True, b).dtype)
print(torch._refs.where(msk, True, b).dtype)

outputs

torch.bool
torch.int64
torch.bool

This happens just for boolean tensors, all the other types work just fine.
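For what it's worth, the promotion rule I'd expect here (and which torch.where and torch._refs.where follow in the example above, while aten.where.ScalarSelf does not for bools) can be sketched in plain Python. This is a hypothetical helper with illustrative dtype strings, not torch code, and the weak-scalar handling is my assumption:

```python
# Hypothetical sketch of the scalar/tensor promotion rule expected from
# where(mask, scalar, tensor); names and dtype strings are illustrative.

_CATEGORY = {"bool": 0,
             "uint8": 1, "int8": 1, "int16": 1, "int32": 1, "int64": 1,
             "float16": 2, "float32": 2, "float64": 2}

def expected_where_dtype(scalar, tensor_dtype: str) -> str:
    """A Python scalar is 'weak': it only bumps the result dtype when its
    type category is strictly higher than the tensor's."""
    scalar_cat = 0 if isinstance(scalar, bool) else (1 if isinstance(scalar, int) else 2)
    if scalar_cat <= _CATEGORY[tensor_dtype]:
        return tensor_dtype  # weak scalar defers to the tensor's dtype
    # Scalar's category is higher: promote to that category's default dtype.
    return {1: "int64", 2: "float32"}[scalar_cat]

print(expected_where_dtype(True, "bool"))  # bool (aten.where.ScalarSelf gives int64 above)
print(expected_where_dtype(2, "bool"))     # int64 (an int scalar outranks a bool tensor)
print(expected_where_dtype(0.5, "int64"))  # float32
```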

Collaborator
@albanD albanD left a comment

SGTM but should get a second opinion from Natalia

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Oct 11, 2022
Collaborator Author

bdhirsh commented Oct 13, 2022

@pytorchbot merge -f "macOS flaky build failure"

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

Rick0317 pushed a commit to Rick0317/pytorch that referenced this pull request Oct 18, 2022

Labels

ciflow/trunk · cla signed · Merged · topic: not user facing


7 participants