Skip to content

[PyTorch] Pass TensorOptions by value#51165

Closed
swolchok wants to merge 5 commits intogh/swolchok/93/basefrom
gh/swolchok/93/head
Closed

[PyTorch] Pass TensorOptions by value#51165
swolchok wants to merge 5 commits intogh/swolchok/93/basefrom
gh/swolchok/93/head

Conversation

@swolchok
Copy link
Copy Markdown
Contributor

@swolchok swolchok commented Jan 27, 2021

Stack from ghstack:

TensorOptions does not have a non-trivial copy, move, or
destroy operation and is small enough to fit in a register, so it
seems like we should pass it by value.

Differential Revision: D25983863

`TensorOptions` does not have a non-trivial copy, move, or
destroy operation and is small enough to fit in a register, so it
seems like we should pass it by value.

Differential Revision: [D25983863](https://our.internmc.facebook.com/intern/diff/D25983863/)

[ghstack-poisoned]
@facebook-github-bot
Copy link
Copy Markdown
Contributor

facebook-github-bot commented Jan 27, 2021

💊 CI failures summary and remediations

As of commit 6eccd63 (more details on the Dr. CI page):


  • 1/1 failures possibly* introduced in this PR
    • 1/1 non-CircleCI failure(s)

This comment was automatically generated by Dr. CI (expand for details).Follow this link to opt-out of these comments for your Pull Requests.

Please report bugs/suggestions to the (internal) Dr. CI Users group.

swolchok added a commit that referenced this pull request Jan 27, 2021
`TensorOptions` does not have a non-trivial copy, move, or
destroy operation and is small enough to fit in a register, so it
seems like we should pass it by value.

Differential Revision: [D25983863](https://our.internmc.facebook.com/intern/diff/D25983863/)

ghstack-source-id: 120446690
Pull Request resolved: #51165
`TensorOptions` does not have a non-trivial copy, move, or
destroy operation and is small enough to fit in a register, so it
seems like we should pass it by value.

Differential Revision: [D25983863](https://our.internmc.facebook.com/intern/diff/D25983863/)

[ghstack-poisoned]
… for Functions.cpp by over 15% on "[PyTorch] Pass TensorOptions by value"

`TensorOptions` does not have a non-trivial copy, move, or
destroy operation and is small enough to fit in a register, so it
seems like we should pass it by value.

Differential Revision: [D25983863](https://our.internmc.facebook.com/intern/diff/D25983863/)

[ghstack-poisoned]
`TensorOptions` does not have a non-trivial copy, move, or
destroy operation and is small enough to fit in a register, so it
seems like we should pass it by value.

Differential Revision: [D25983863](https://our.internmc.facebook.com/intern/diff/D25983863/)

[ghstack-poisoned]
`TensorOptions` does not have a non-trivial copy, move, or
destroy operation and is small enough to fit in a register, so it
seems like we should pass it by value.

Differential Revision: [D25983863](https://our.internmc.facebook.com/intern/diff/D25983863/)

[ghstack-poisoned]
@facebook-github-bot
Copy link
Copy Markdown
Contributor

This pull request has been merged in 4495b49.

@facebook-github-bot facebook-github-bot deleted the gh/swolchok/93/head branch February 5, 2021 15:22
laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request Apr 24, 2026
Summary:
Pull Request resolved: pytorch#51165

`TensorOptions` does not have a non-trivial copy, move, or
destroy operation and is small enough to fit in a register, so it
seems like we should pass it by value.
ghstack-source-id: 120697498

Test Plan:
Measured timing for empty framework overhead benchmark before & after this change:

Before:
```
I0126 16:02:50.662864 2137574 bench.cpp:139] Mean 0.268645
I0126 16:02:50.662891 2137574 bench.cpp:140] Median 0.267485
I0126 16:02:50.662896 2137574 bench.cpp:141] Min 0.266485
I0126 16:02:50.662901 2137574 bench.cpp:142] stddev 0.00219359
I0126 16:02:50.662915 2137574 bench.cpp:143] stddev / mean 0.00816537

          2,968.37 msec task-clock                #    0.997 CPUs utilized            ( +-  0.03% )
               250      context-switches          #    0.084 K/sec                    ( +-  2.21% )
                 1      cpu-migrations            #    0.000 K/sec
            11,403      page-faults               #    0.004 M/sec                    ( +-  0.28% )
     5,898,481,882      cycles                    #    1.987 GHz                      ( +-  0.03% )  (50.05%)
    16,169,242,938      instructions              #    2.74  insn per cycle           ( +-  0.03% )  (50.06%)
     3,076,546,626      branches                  # 1036.443 M/sec                    ( +-  0.05% )  (50.05%)
         2,531,859      branch-misses             #    0.08% of all branches          ( +-  0.89% )  (50.03%)
```

After:
```
I0126 16:23:20.010062 2244624 bench.cpp:139] Mean 0.266814
I0126 16:23:20.010092 2244624 bench.cpp:140] Median 0.265759
I0126 16:23:20.010099 2244624 bench.cpp:141] Min 0.260291
I0126 16:23:20.010107 2244624 bench.cpp:142] stddev 0.00548279
I0126 16:23:20.010118 2244624 bench.cpp:143] stddev / mean 0.0205491

          2,983.75 msec task-clock                #    0.995 CPUs utilized            ( +-  0.36% )
               243      context-switches          #    0.082 K/sec                    ( +-  1.26% )
                 1      cpu-migrations            #    0.000 K/sec
            11,422      page-faults               #    0.004 M/sec                    ( +-  0.18% )
     5,928,639,486      cycles                    #    1.987 GHz                      ( +-  0.36% )  (50.02%)
    16,105,928,210      instructions              #    2.72  insn per cycle           ( +-  0.05% )  (50.02%)
     3,150,273,453      branches                  # 1055.809 M/sec                    ( +-  0.03% )  (50.05%)
         3,713,617      branch-misses             #    0.12% of all branches          ( +-  0.83% )  (50.07%)

```

It looked close to neutral, so I used `perf stat` to confirm it's about a 1% instruction count win.

For deciding whether this stack is worth it, I went back and ran `perf stat` on the baseline diff before I started touching the dispatcher:

```
          2,968.37 msec task-clock                #    0.997 CPUs utilized            ( +-  0.03% )
               250      context-switches          #    0.084 K/sec                    ( +-  2.21% )
                 1      cpu-migrations            #    0.000 K/sec
            11,403      page-faults               #    0.004 M/sec                    ( +-  0.28% )
     5,898,481,882      cycles                    #    1.987 GHz                      ( +-  0.03% )  (50.05%)
    16,169,242,938      instructions              #    2.74  insn per cycle           ( +-  0.03% )  (50.06%)
     3,076,546,626      branches                  # 1036.443 M/sec                    ( +-  0.05% )  (50.05%)
         2,531,859      branch-misses             #    0.08% of all branches          ( +-  0.89% )  (50.03%)
```

If I've done the arithmetic correctly, we have an 0.39% instruction count win.

Reviewed By: ezyang

Differential Revision: D25983863

fbshipit-source-id: 87d1451a01ead25738ea6b80db270d344bc583b2
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants