
[TorchTidy] Adding support for accessing strides and scalars#80072

Closed
Gamrix wants to merge 12 commits into gh/gamrix/77/base from gh/gamrix/77/head

Conversation


@Gamrix Gamrix commented Jun 22, 2022


facebook-github-bot commented Jun 22, 2022

🔗 Helpful links

✅ No Failures (0 Pending)

As of commit f370cf0 (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no failures yet. 💚 💚


This comment was automatically generated by Dr. CI.

Please report bugs/suggestions to the (internal) Dr. CI Users group.


@Gamrix Gamrix marked this pull request as draft June 22, 2022 19:50
Gamrix added a commit that referenced this pull request Jun 22, 2022
@Gamrix Gamrix requested review from robieta and removed request for albanD and soulitzer June 22, 2022 22:06
@Gamrix Gamrix marked this pull request as ready for review June 22, 2022 22:07
Gamrix added a commit that referenced this pull request Jun 22, 2022
Contributor

@robieta robieta left a comment

I have a couple nits, but overall looks good. Can you run the overhead benchmark to A/B test and see what the additional info costs?

buck build @mode/opt-clang //caffe2/caffe2/fb/high_perf_models/pytorch/benchmark_framework_overheads:cpp_benchmark
./buck-out/gen/caffe2/caffe2/fb/high_perf_models/pytorch/benchmark_framework_overheads/cpp_benchmark --useOutOfPlace --kinetoInputShapes

(Which, by the way, we should totally open source.)

} else if (value.isScalar()) {
  tags_.emplace_back(Tag::Scalar);
  // Scalars are small enough to store as IValues
  ivalues_.emplace_back(value);
Contributor

How does the perf of holding an IValue compare to storing the Scalar directly? In theory the latter should be cheaper, but if it's not meaningfully different, IValue obviously has a bunch of advantages.

Contributor Author

@Gamrix Gamrix Jun 23, 2022

For trivially copyable datatypes (which includes all scalars), the only difference in performance between an IValue and a Scalar would be an extra check on the tag to see if it is trivially copyable, which should not be meaningfully different.

Both of them are tagged unions.

union Payload {
  // We use a nested union here so that we can make the copy easy
  // and efficient in the non-tensor (i.e., trivially copyable)
  // case. Specifically, we do not have to do a switch-on-tag to
  // figure out which union member to assign; we can just use
  // TriviallyCopyablePayload::operator=.
  union TriviallyCopyablePayload {
    TriviallyCopyablePayload() : as_int(0) {}
    int64_t as_int;
    double as_double;
    bool as_bool;
    // Invariant: never nullptr; null state is represented as
    // c10::UndefinedTensorImpl::singleton() for consistency of
    // representation with Tensor.
    c10::intrusive_ptr_target* as_intrusive_ptr;
    struct {
      DeviceType type;
      DeviceIndex index;
    } as_device;
  } u;
  at::Tensor as_tensor;
  Payload() : u() {}
  ~Payload() {}
};

return next_++;
}

T* emplace_list(c10::ArrayRef<T> arg_list) {
Contributor

A related follow up is directly using SizesAndStrides. (https://github.com/pytorch/pytorch/blob/master/c10/core/impl/SizesAndStrides.h) That, combined with a vectorized emplace would mean that most of the time we could store sizes and strides in a single memcpy.

}

T* emplace_list(c10::ArrayRef<T> arg_list) {
  // TODO: Optimize this frther at a future date
Contributor

nit: further

AppendOnlyList<TensorMetadata, IO_ENCODER_DEFAULT_BLOCK_SIZE>
    tensor_metadata_;
AppendOnlyList<int64_t, IO_ENCODER_DEFAULT_BLOCK_SIZE> tensor_sizes_;
AppendOnlyList<int64_t, IO_ENCODER_DEFAULT_BLOCK_SIZE> tensor_strides_;
Contributor

Is there any reason not to store sizes and strides in the same container? They're guaranteed to have the same size, so it's unambiguous during post processing. (And it should have modestly better cache behavior.)

Contributor Author

I don't see any reason not to, but I will likely refactor into using SizesAndStrides as you suggested above.

# The second argument to the add gets promoted to a zero-dim Tensor
self.assertEqual(node.extra_fields.inputs.dtypes, ['float', 'double', 'Scalar'])
self.assertEqual(node.extra_fields.inputs.shapes, [[5, 5], [], []])
self.assertEqual(node.extra_fields.inputs.ivalues, [None, None, alpha])
Contributor

Test for layout as well?

Contributor Author

I will pull the layout stuff into a separate diff, I need to expose the TensorMetadata struct to Python.

[](const Inputs& inputs) {
  py::list list;
  for (auto& v : inputs.ivalues_) {
    list.append(torch::jit::toPyObject(v));
Contributor

TIL

Gamrix added a commit that referenced this pull request Jun 30, 2022

Gamrix commented Jul 1, 2022

@Gamrix has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Gamrix added a commit that referenced this pull request Jul 8, 2022
pytorchmergebot pushed a commit that referenced this pull request Jul 21, 2022
TensorMetadata will be used later to hold various metadata fields including sizes and strides.

This is basically me separating the following diff into its logical components after they all got smushed together
#80072
Pull Request resolved: #81155
Approved by: https://github.com/robieta
facebook-github-bot pushed a commit that referenced this pull request Jul 22, 2022
#81155)

Summary:
TensorMetadata will be used later to hold various metadata fields including sizes and strides.

This is basically me separating the following diff into its logical components after they all got smushed together
#80072

Pull Request resolved: #81155
Approved by: https://github.com/robieta

Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/74c795871704114ef4a3c898dd388da84759d925

Reviewed By: jeanschmidt

Differential Revision: D38066982

Pulled By: jeanschmidt

fbshipit-source-id: a583ea42e0a3ed438923a1bd4ccd9c16e2ab9f69

Gamrix commented Aug 8, 2022

Reopening this PR. In #81824 we concluded we would not use SizesAndStrides and would go back to this method instead.

@Gamrix Gamrix marked this pull request as ready for review August 8, 2022 19:31
@Gamrix Gamrix requested review from aaronenyeshi and robieta August 8, 2022 21:48

Gamrix commented Aug 10, 2022

@Gamrix has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Contributor

@robieta robieta left a comment

LGTM.


Gamrix commented Aug 16, 2022

@Gamrix has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@pytorchbot merge

(Initiating merge automatically since Phabricator Diff has merged)

@pytorchmergebot
Collaborator

@pytorchbot successfully started a merge job. Check the current status here.
The merge job was triggered without a flag. This means that your change will be merged once all checks on your PR have passed (ETA: 0-4 Hours). If this is not the intended behavior, feel free to use some of the other merge options in the wiki.
Please reach out to the PyTorch DevX Team with feedback or questions!

@github-actions
Contributor

Hey @Gamrix.
You've committed this PR, but it does not have both a 'release notes: ...' and 'topics: ...' label. Please add one of each to the PR. The 'release notes: ...' label should represent the part of PyTorch that this PR changes (fx, autograd, distributed, etc) and the 'topics: ...' label should represent the kind of PR it is (not user facing, new feature, bug fix, perf improvement, etc). The list of valid labels can be found here for the 'release notes: ...' and here for the 'topics: ...'.
For changes that are 'topic: not user facing' there is no need for a release notes label.

facebook-github-bot pushed a commit that referenced this pull request Aug 17, 2022
Summary: Pull Request resolved: #80072

Test Plan: Imported from OSS

Reviewed By: robieta

Differential Revision: D37571570

Pulled By: Gamrix

fbshipit-source-id: 2ea2056b6a498b2a6aea77bece57090845580503
@facebook-github-bot facebook-github-bot deleted the gh/gamrix/77/head branch August 20, 2022 14:19