[TorchTidy] Refactor profiler to use SizesAndStrides #81824
Gamrix wants to merge 3 commits into gh/gamrix/82/base
Conversation
This includes adding stride information [ghstack-poisoned]
✅ No failures (0 pending) as of commit 44f09aa, per Dr. CI.
I was very surprised by how expensive it is to copy SizesAndStrides around. (I assume the copy from …) Interestingly, I also found that doing separate memcpys for sizes and strides was faster than a single memcpy of the contiguous memory, in addition to using less memory. When I did that (again, nothing fancy), overhead was identical to the baseline. So I think that, plus updating the decode, is the way to go. Sorry for leading you astray. WDYT?
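The two copy strategies under discussion can be sketched as follows. This is a minimal illustration, not PyTorch's actual SizesAndStrides: `SmallSizesAndStrides`, `kCapacity`, and the helper names are hypothetical, assuming an inline buffer where sizes and strides each occupy a fixed-capacity region, so a whole-buffer memcpy also copies unused capacity while two memcpys copy only the used prefixes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical inline buffer mimicking a SizesAndStrides-style layout:
// sizes live in storage_[0, size_), strides in storage_[kCapacity, kCapacity + size_).
constexpr size_t kCapacity = 5;

struct SmallSizesAndStrides {
  size_t size_ = 0;
  int64_t storage_[2 * kCapacity]{};

  int64_t* sizes_data() { return storage_; }
  int64_t* strides_data() { return storage_ + kCapacity; }
};

// One memcpy over the whole inline buffer: simple, but copies unused capacity.
void copy_whole(SmallSizesAndStrides& dst, const SmallSizesAndStrides& src) {
  dst.size_ = src.size_;
  std::memcpy(dst.storage_, src.storage_, sizeof(src.storage_));
}

// Two memcpys: copy only the used prefix of the sizes and strides regions.
void copy_split(SmallSizesAndStrides& dst, const SmallSizesAndStrides& src) {
  dst.size_ = src.size_;
  std::memcpy(dst.storage_, src.storage_, src.size_ * sizeof(int64_t));
  std::memcpy(dst.storage_ + kCapacity, src.storage_ + kCapacity,
              src.size_ * sizeof(int64_t));
}

// Demo: both strategies yield the same used elements.
bool demo_roundtrip() {
  SmallSizesAndStrides src;
  src.size_ = 3;
  for (size_t i = 0; i < 3; ++i) {
    src.sizes_data()[i] = int64_t(i + 1);           // sizes {1, 2, 3}
    src.strides_data()[i] = int64_t(10 * (i + 1));  // strides {10, 20, 30}
  }
  SmallSizesAndStrides a, b;
  copy_whole(a, src);
  copy_split(b, src);
  for (size_t i = 0; i < 3; ++i) {
    if (a.sizes_data()[i] != b.sizes_data()[i]) return false;
    if (a.strides_data()[i] != b.strides_data()[i]) return false;
  }
  return true;
}
```

Which variant wins will depend on capacity, typical dimensionality, and the compiler; the comment above reports the split version measuring faster in this workload.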
Firstly, we should discuss what we want from TensorImpl. I think from this we can conclude that we do not want to copy TensorImpl, and likely want to do our own thing. We should align on exactly what we want from TensorImpl, and also check that there are no existing data structures that capture most of what we want.

Secondly, all of these benchmarks with different resulting times show that it is critical to have a repeatable way of running benchmarks, and that we migrate to a system with reproducible results. It will be really hard to get anything done when one week we say that feature x has 20% overhead and the next we say it has 40%.

My proposal is that we do future benchmarking on the AWS machines that people are preparing specifically for benchmarking, move the benchmark script to open source, and fully document the benchmarking process for profiler overheads.
I think we might want this. (CC @c-odrin who is looking into implementing it) The specific use case is to help find leaks by allowing the user to see where the number of referents differs from what is expected. An atomic read should be cheap, but this is all subject to cost-benefit analysis once we have something concrete.
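The "an atomic read should be cheap" point can be made concrete with a small sketch. This is illustrative only: `FakeImpl`, `use_count_`, and `snapshot_use_count` are hypothetical stand-ins, not PyTorch's TensorImpl API; the idea is just that a profiler snapshotting a refcount is a single relaxed atomic load.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for an intrusively refcounted impl.
struct FakeImpl {
  std::atomic<int64_t> use_count_{1};
};

// A relaxed load suffices for a diagnostic snapshot: cheap on common
// hardware, and no ordering guarantees are needed to report a count.
int64_t snapshot_use_count(const FakeImpl& impl) {
  return impl.use_count_.load(std::memory_order_relaxed);
}

// Demo: take a reference, then snapshot the count.
bool demo_snapshot() {
  FakeImpl impl;
  impl.use_count_.fetch_add(1, std::memory_order_relaxed);
  return snapshot_use_count(impl) == 2;
}
```

Comparing such snapshots against the expected referent count across profiler events is one way the leak-finding use case above could surface mismatches.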
Probably not? I don't think we're expecting to get compiler-y enough to need this fine grained of detail.
I think this is duplicated from the sizes and strides information?
Lots of great info in not much space is very tempting, but probably not until we have a concrete use case. +1, we should open source the benchmark.
      tensor_sizes_.emplace_back(i);
    }
    /*layout_=*/t.layout(),
    /*sizes_and_strides_*/ t.unsafeGetTensorImpl()->sizes_and_strides());
Heads up, this would still error out for nested tensor (because sizes_custom errors). I'm adding a test for nested tensor profiling in #82854, so hopefully this will be visible.
Should we just add an entry to the enum for "this doesn't even make sense"? Just want to future proof things a bit since I'm sure nested tensor won't be the last non-rectangular thing that people dream up.
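The "entry in the enum" suggestion might look something like the sketch below. The enum name, values, and `classify` helper are all hypothetical, not PyTorch code; the point is just reserving a tag for impls where a rectangular sizes/strides description doesn't apply.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout tag for profiled tensors.
enum class ProfiledLayout : uint8_t {
  Strided,        // sizes and strides both available
  SizesOnly,      // e.g. sparse: sizes yes, strides no
  NotApplicable,  // "this doesn't even make sense", e.g. nested tensor
};

// Sketch: fall back to NotApplicable rather than erroring, so future
// non-rectangular tensor types degrade gracefully in the profiler.
ProfiledLayout classify(bool has_sizes, bool has_strides) {
  if (has_sizes && has_strides) return ProfiledLayout::Strided;
  if (has_sizes) return ProfiledLayout::SizesOnly;
  return ProfiledLayout::NotApplicable;
}
```

An explicit "not applicable" value keeps decode logic total over all tensor kinds instead of special-casing each new subclass.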
One could conceivably say that .sizes() of a nested tensor should return a vector of the sizes of the constituent tensors, or, if the nested tensor happens to be rectangular, the sizes of that rectangular box, or something else. But until nested tensor decides what it wants, the profiler should ignore it. I don't know if we should call sizes_and_strides with a try-catch, so that we ignore all special TensorImpls that choose to throw for those calls rather than hardcoding just nested tensors? E.g. sparse tensors will give you sizes but will balk at strides, and I don't remember whether their strides_custom throws an error.
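The try-catch idea could be sketched roughly as below. `FakeNestedImpl`, `FakeDenseImpl`, and `try_capture` are hypothetical stand-ins (not TensorImpl subclasses), assuming only that some impls throw from sizes()/strides() the way nested and sparse tensors can.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical impl that rejects both queries, like a nested tensor might.
struct FakeNestedImpl {
  std::vector<int64_t> sizes() const { throw std::runtime_error("no sizes"); }
  std::vector<int64_t> strides() const { throw std::runtime_error("no strides"); }
};

// Hypothetical ordinary dense impl.
struct FakeDenseImpl {
  std::vector<int64_t> sizes() const { return {2, 3}; }
  std::vector<int64_t> strides() const { return {3, 1}; }
};

// Capture sizes/strides defensively: record nullopt for any impl that
// throws, instead of hardcoding a check per special TensorImpl subclass.
template <typename Impl>
std::optional<std::pair<std::vector<int64_t>, std::vector<int64_t>>>
try_capture(const Impl& impl) {
  try {
    return std::make_pair(impl.sizes(), impl.strides());
  } catch (const std::exception&) {
    return std::nullopt;
  }
}

// Demo: the nested stand-in yields nullopt, the dense one yields its shape.
bool demo_capture() {
  FakeNestedImpl nested;
  FakeDenseImpl dense;
  auto a = try_capture(nested);
  auto b = try_capture(dense);
  return !a.has_value() && b.has_value() &&
         b->first == std::vector<int64_t>{2, 3};
}
```

One caveat with this approach: a blanket catch also swallows genuine bugs, and exceptions on the hot profiling path are not free, so capturing sizes and strides separately (to keep sizes for sparse tensors) may be worth the extra branch.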
[Closing this, as Taylor thinks we should not go with SizesAndStrides for this support and should instead go back to what is implemented in #80072.]
Stack from ghstack (oldest at bottom):
This includes adding stride information