Add meta variants to float variants of normal #69632
nkaretnikov wants to merge 1 commit into gh/nkaretnikov/5/base
Conversation
This mimics what `normal_` does. Not sure whether TensorOptions need to explicitly set the device type to meta for these.

Without this change, both of these variants already work with meta, but maybe explicit definitions are somehow better (e.g., result in less computation):

```
>>> torch.normal(mean=4., std=1., size=(1, 2, 3), device='meta')
tensor(..., device='meta', size=(1, 2, 3))
>>> torch.normal(mean=4., std=1., size=(4, 5), out=torch.rand(1, device='meta'))
tensor(..., device='meta', size=(4, 5))
```

[ghstack-poisoned]
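The behavior the examples rely on is that a meta variant computes only output metadata (shape, dtype) and never touches storage or the RNG. A minimal, torch-free sketch of that idea, using hypothetical names rather than PyTorch internals:

```python
# Toy sketch (not PyTorch code): a "meta" variant of an op only
# propagates metadata -- shape and dtype -- without allocating storage
# or drawing random numbers.
from dataclasses import dataclass


@dataclass
class MetaTensor:
    shape: tuple
    dtype: str = "float32"


def normal_meta(mean, std, size):
    # mean/std do not affect metadata: on the meta device only `size`
    # determines the output shape, and no sampling happens.
    return MetaTensor(shape=tuple(size))


t = normal_meta(4.0, 1.0, (1, 2, 3))
assert t.shape == (1, 2, 3)
```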
This mimics what normal_ does. Not sure whether TensorOptions need to explicitly set the device type to meta for these. Without this change, both of these variants already work with meta, but maybe explicit definitions are somehow better (e.g., result in less computations): >>> torch.normal(mean=4., std=1., size=(1, 2, 3), device='meta') tensor(..., device='meta', size=(1, 2, 3)) >>> torch.normal(mean=4., std=1., size=(4, 5), out=torch.rand(1, device='meta')) tensor(..., device='meta', size=(4, 5)) ghstack-source-id: 3c27c38 Pull Request resolved: #69632
💊 CI failures summary and remediations

As of commit 1977986 (more details on the Dr. CI page):

🕵️ 14 new failures recognized by patterns. The following CI failures do not appear to be due to upstream breakages.
hey @pbelevich, i put you in reviewers because this touches/undoes some of the work you did related to RNG: #35167 (see the previous commits in this stack)
this is my first structured kernel port. so if something doesn't make sense, it's probably because i don't know what i'm doing. some notes related to this pr: #69386 in particular:
things i'm not sure about:
warning: you shouldn't rely on my tests/judgement since it's likely that i messed something up. but i did my best to uncover any potential issues related to BC/shapes. i made no effort to test RNG-related things since i didn't really touch them.
lezcano left a comment:
Any reason why we shouldn't port these to structured kernels?
@lezcano well, i'm not sure you can do better than what i just did here (might be wrong tho). the signatures of these are completely different (differ more than just by the
I see how these are different to the others. Even then, you are given the
i'm not sure what you mean exactly. specifically, i don't understand this part:

the problem i'm describing is this: to use structured, the signatures need to be:

that is, they must only differ by the

so the only way to make this work is either introducing a bunch of redundant optional memory-allocation parameters to the out variant (but that would be bad since allocation is already determined by the passed tensors), OR you could add more functions to the api, but that would blow up the number of operators for no good reason, no? they would offer no new functionality and would just be there to satisfy the current parser/codegen machinery.

lmk if i misunderstood your point
i'd also like to point out that these already work with meta
ysiraichi left a comment:
Hey @nkaretnikov.
Good job! I think we can avoid duplicating normal_out. Check out my comment, and let me know if it's unclear.
```cpp
  return result; // similar to normal_meta_
}

Tensor& normal_out(
```

It seems to me that both `normal_out` and `normal_out_meta` do the same thing. You could dispatch CPU, CUDA, and META to the same function.
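The suggestion amounts to registering one kernel body under several dispatch keys rather than duplicating it per backend. A toy sketch of that idea, with a hypothetical registry standing in for the real PyTorch dispatcher:

```python
# Toy sketch (not the real dispatcher): map (op, dispatch key) pairs to
# kernel functions, and point CPU, CUDA, and Meta at the same body.
kernels = {}


def register(op, keys, fn):
    # Register one function under several dispatch keys.
    for key in keys:
        kernels[(op, key)] = fn


class FakeTensor:
    pass


def normal_out(result, mean, std):
    # Shared body: nothing here is backend-specific, so a single
    # function can serve all three dispatch entries.
    result.note = f"normal(mean={mean}, std={std})"
    return result


register("normal.out", ("CPU", "CUDA", "Meta"), normal_out)
assert kernels[("normal.out", "CPU")] is kernels[("normal.out", "Meta")]
```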
to avoid confusion, will open a new stack to address issues related to BC and broadcasting