[TorchTidy] Adding support for accessing strides and scalars#80072
[TorchTidy] Adding support for accessing strides and scalars#80072Gamrix wants to merge 12 commits intogh/gamrix/77/basefrom
Conversation
[ghstack-poisoned]
🔗 Helpful links
✅ No Failures (0 Pending)As of commit f370cf0 (more details on the Dr. CI page): Expand to see more💚 💚 Looks good so far! There are no failures yet. 💚 💚 This comment was automatically generated by Dr. CI (expand for details).Please report bugs/suggestions to the (internal) Dr. CI Users group. |
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
robieta
left a comment
There was a problem hiding this comment.
I have a couple nits, but overall looks good. Can you run the overhead benchmark to A/B test and see what the additional info costs?
buck build @mode/opt-clang //caffe2/caffe2/fb/high_perf_models/pytorch/benchmark_framework_overheads:cpp_benchmark
./buck-out/gen/caffe2/caffe2/fb/high_perf_models/pytorch/benchmark_framework_overheads/cpp_benchmark --useOutOfPlace --kinetoInputShapes
(Which, by the way, we should totally open source.)
| } else if (value.isScalar()) { | ||
| tags_.emplace_back(Tag::Scalar); | ||
| // Scalars are small enough to store as IValues | ||
| ivalues_.emplace_back(value); |
There was a problem hiding this comment.
How does the perf of holding an IValue compare to storing the Scalar directly? In theory the latter should be cheaper, but if its not meaningfully different obviously IValue has a bunch of advantages.
There was a problem hiding this comment.
For trivially copyable datatypes (which includes all scalars), the only difference in performance between an IValue and a Scalar would be an extra check on the tag to see if it is trivially copyable, which should not be meaningfully different.
Both of them are tagged unions.
pytorch/aten/src/ATen/core/ivalue.h
Lines 1124 to 1148 in fe6aa08
torch/csrc/profiler/containers.h
Outdated
| return next_++; | ||
| } | ||
|
|
||
| T* emplace_list(c10::ArrayRef<T> arg_list) { |
There was a problem hiding this comment.
A related follow up is directly using SizesAndStrides. (https://github.com/pytorch/pytorch/blob/master/c10/core/impl/SizesAndStrides.h) That, combined with a vectorized emplace would mean that most of the time we could store sizes and strides in a single memcpy.
torch/csrc/profiler/containers.h
Outdated
| } | ||
|
|
||
| T* emplace_list(c10::ArrayRef<T> arg_list) { | ||
| // TODO: Optimize this frther at a future date |
torch/csrc/profiler/collection.h
Outdated
| AppendOnlyList<TensorMetadata, IO_ENCODER_DEFAULT_BLOCK_SIZE> | ||
| tensor_metadata_; | ||
| AppendOnlyList<int64_t, IO_ENCODER_DEFAULT_BLOCK_SIZE> tensor_sizes_; | ||
| AppendOnlyList<int64_t, IO_ENCODER_DEFAULT_BLOCK_SIZE> tensor_strides_; |
There was a problem hiding this comment.
Is there any reason not to store sizes and strides in the same container? They're guaranteed to have the same size, so it's unambiguous during post processing. (And it should have modestly better cache behavior.)
There was a problem hiding this comment.
I don't see any reason not to, but I will likely refactor into using SizesAndStrides as you suggested above.
test/test_profiler.py
Outdated
| # The second argument to the add gets promotoed to a zerodim Tensor | ||
| self.assertEqual(node.extra_fields.inputs.dtypes, ['float', 'double', 'Scalar']) | ||
| self.assertEqual(node.extra_fields.inputs.shapes, [[5, 5], [], []]) | ||
| self.assertEqual(node.extra_fields.inputs.ivalues, [None, None, alpha]) |
There was a problem hiding this comment.
I will pull the layout stuff into a separate diff, I need to expose the TensorMetadata struct to Python.
| [](const Inputs& inputs) { | ||
| py::list list; | ||
| for (auto& v : inputs.ivalues_) { | ||
| list.append(torch::jit::toPyObject(v)); |
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
|
@Gamrix has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator. |
Differential Revision: [D37571570](https://our.internmc.facebook.com/intern/diff/D37571570) [ghstack-poisoned]
TensorMetadata will be used later to hold various metadata fields including sizes and strides. This is basically me separating the following diff into its logical components after they all got smushed together #80072 Pull Request resolved: #81155 Approved by: https://github.com/robieta
#81155) Summary: TensorMetadata will be used later to hold various metadata fields including sizes and strides. This is basically me separating the following diff into its logical components after they all got smushed together #80072 Pull Request resolved: #81155 Approved by: https://github.com/robieta Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/74c795871704114ef4a3c898dd388da84759d925 Reviewed By: jeanschmidt Differential Revision: D38066982 Pulled By: jeanschmidt fbshipit-source-id: a583ea42e0a3ed438923a1bd4ccd9c16e2ab9f69
Differential Revision: [D37571570](https://our.internmc.facebook.com/intern/diff/D37571570) [ghstack-poisoned]
|
Reopening this commit. In #81824 we concluded we would not do SizesAndStrides and go back to this method instead. |
Differential Revision: [D37571570](https://our.internmc.facebook.com/intern/diff/D37571570) [ghstack-poisoned]
|
@Gamrix has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator. |
|
We can see that the changes slightly speed up the code. This is likely due to the bulk copy being more efficient than what we had before. Base: 1.2508 1.2545 1.2627 D37571570 1.2249 1.2278 1.2366 |
Differential Revision: [D37571570](https://our.internmc.facebook.com/intern/diff/D37571570) [ghstack-poisoned]
Differential Revision: [D37571570](https://our.internmc.facebook.com/intern/diff/D37571570) [ghstack-poisoned]
|
@Gamrix has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator. |
|
@pytorchbot merge (Initiating merge automatically since Phabricator Diff has merged) |
|
@pytorchbot successfully started a merge job. Check the current status here. |
|
Hey @Gamrix. |
Summary: Pull Request resolved: #80072 Test Plan: Imported from OSS Reviewed By: robieta Differential Revision: D37571570 Pulled By: Gamrix fbshipit-source-id: 2ea2056b6a498b2a6aea77bece57090845580503
Stack from ghstack:
Differential Revision: D37571570