Skip to content

Make VariableVersion::DISABLED the default constructor for VariableVersion.#55572

Closed
ailzhang wants to merge 4 commits intogh/ailzhang/60/basefrom
gh/ailzhang/60/head
Closed

Make VariableVersion::DISABLED the default constructor for VariableVersion.#55572
ailzhang wants to merge 4 commits intogh/ailzhang/60/basefrom
gh/ailzhang/60/head

Conversation

@ailzhang
Copy link
Contributor

@ailzhang ailzhang commented Apr 7, 2021

Stack from ghstack:

We used to have VariableVersion default constructor
VariableVersion(uint32_t version=0). But sometimes
we override the version_counter right after it's constructed.
E.g in SavedVariable/TensorImpl.
Thus we should make DISABLED the default constructor and else
where using explicit VariableVersion(uint32_t) constructor.
Note this PR effectively changes SavedVariable constructor (which overrides
version_counter_ inside) to use the DISABLED constructor and we
can see the gains in reduced instruction counts.

// benchmark code
timer = Timer(
    "y = x * x",
    """
    x = torch.rand((3, 3)).requires_grad_()
    """,
    language=Language.PYTHON,
)

 λ ~ python compare.py
No CUDA runtime is found, using CUDA_HOME='/public/apps/cuda/10.2'
<torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.FunctionCounts
object at 0x7f06c48b3a50>
     7236  lookdict_unicode_nodummy
     2600  torch::autograd::VariableType::(...)
      100  0x0000000017751750
       -5  unlink_chunk.isra.0
     -100  0x000000001773e750
     -402  _int_malloc
    -1600  operator delete(...)
    -1600  c10::intrusive_ptr_target::release_resources()
    -2400  c10::VariableVersion::VersionCounter::~VersionCounter()
    -3600  torch::autograd::SavedVariable::operator=(...)
    -4800  operator new(...)
    -6400  torch::autograd::SavedVariable::SavedVariable(...)
    -7200  torch::autograd::SavedVariable::SavedVariable()
    -8400  free
   -16800  malloc
   -24400  _int_free

Total: -67771

Note there're for other callsites(esp. view related) we just keep it unchanged by
explicitly calling VariableVersion(uint32_t) but we should be
able to optimize those in the followup PRs.

Differential Revision: D27669074

…rsion.

We used to have VariableVersion default constructor
`VariableVersion(uint32_t int version=0)`. But sometimes
we override the version_counter right after it's constructed.
E.g in SavedVariable/TensorImpl.
Thus we should make DISABLED  the default constructor and else
where using explicit `VariableVersion(uint32_t int)` constructor.
Note this PR effectively changes SavedVariable constructor (which overrides
version_counter_ inside) to use the DISABLED constructor and we
can see the gains in reduced instruction counts.

```
 λ ~ python compare.py
No CUDA runtime is found, using CUDA_HOME='/public/apps/cuda/10.2'
<torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.FunctionCounts
object at 0x7f06c48b3a50>
     7236  lookdict_unicode_nodummy
     2600  torch::autograd::VariableType::(...)
      100  0x0000000017751750
       -5  unlink_chunk.isra.0
     -100  0x000000001773e750
     -402  _int_malloc
    -1600  operator delete(...)
    -1600  c10::intrusive_ptr_target::release_resources()
    -2400  c10::VariableVersion::VersionCounter::~VersionCounter()
    -3600  torch::autograd::SavedVariable::operator=(...)
    -4800  operator new(...)
    -6400  torch::autograd::SavedVariable::SavedVariable(...)
    -7200  torch::autograd::SavedVariable::SavedVariable()
    -8400  free
   -16800  malloc
   -24400  _int_free

Total: -67771
```

[ghstack-poisoned]
@facebook-github-bot
Copy link
Contributor

facebook-github-bot commented Apr 7, 2021

💊 CI failures summary and remediations

As of commit c22bf73 (more details on the Dr. CI page):


  • 3/3 failures possibly* introduced in this PR
    • 1/3 non-scanned failure(s)

🕵️ 2 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_linux_bionic_rocm3_9_py3_6_build (1/2)

Step: "Build" (full log | diagnosis details | 🔁 rerun)

Apr 09 02:41:59 Error generating file
Apr 09 02:41:59         AT_CUDA_CHECK(cub::DeviceSegmentedReduce::Reduce(
Apr 09 02:41:59                       ^
Apr 09 02:41:59 /var/lib/jenkins/workspace/aten/src/ATen/native/hip/SegmentReduce.hip:90:23: error: use of undeclared identifier 'cub'
Apr 09 02:41:59         AT_CUDA_CHECK(cub::DeviceSegmentedReduce::Reduce(
Apr 09 02:41:59                       ^
Apr 09 02:41:59 /var/lib/jenkins/workspace/aten/src/ATen/native/hip/SegmentReduce.hip:105:23: error: use of undeclared identifier 'cub'
Apr 09 02:41:59         AT_CUDA_CHECK(cub::DeviceSegmentedReduce::Reduce(
Apr 09 02:41:59                       ^
Apr 09 02:41:59 18 errors generated when compiling for gfx900.
Apr 09 02:41:59 CMake Error at torch_hip_generated_SegmentReduce.hip.o.cmake:192 (message):
Apr 09 02:41:59   Error generating file
Apr 09 02:41:59   /var/lib/jenkins/workspace/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/hip/./torch_hip_generated_SegmentReduce.hip.o
Apr 09 02:41:59 
Apr 09 02:41:59 
Apr 09 02:41:59 caffe2/CMakeFiles/torch_hip.dir/build.make:1195: recipe for target 'caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/hip/torch_hip_generated_SegmentReduce.hip.o' failed
Apr 09 02:41:59 make[2]: *** [caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/hip/torch_hip_generated_SegmentReduce.hip.o] Error 1
Apr 09 02:41:59 make[2]: *** Waiting for unfinished jobs....
Apr 09 02:42:05 In file included from /var/lib/jenkins/workspace/aten/src/ATen/native/hip/ForeachUnaryOp.hip:4:
Apr 09 02:42:05 In file included from /var/lib/jenkins/workspace/aten/src/ATen/native/hip/ForeachFunctors.cuh:3:
Apr 09 02:42:05 In file included from /var/lib/jenkins/workspace/aten/src/ATen/native/hip/MultiTensorApply.cuh:6:
Apr 09 02:42:05 In file included from /var/lib/jenkins/workspace/aten/src/ATen/native/hip/Loops.cuh:18:

See CircleCI build pytorch_xla_linux_bionic_py3_6_clang9_test (2/2)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun)

Apr 09 04:51:16 ERROR [0.009s]: TestViewOpsXLA (unittest.loader._FailedTest)
Apr 09 04:51:13 + XLA_EXPERIMENTAL=nonzero:masked_select
Apr 09 04:51:13 + run_test python3 /var/lib/jenkins/workspace/xla/test/../../test/test_view_ops.py -v TestViewOpsXLA
Apr 09 04:51:13 + python3 /var/lib/jenkins/workspace/xla/test/../../test/test_view_ops.py -v TestViewOpsXLA
Apr 09 04:51:16 Test results will be stored in test-reports/python-unittest/.var.lib.jenkins.workspace.xla.test.......test.test_view_ops
Apr 09 04:51:16 
Apr 09 04:51:16 Running tests...
Apr 09 04:51:16 ----------------------------------------------------------------------
Apr 09 04:51:16   TestViewOpsXLA (unittest.loader._FailedTest) ... ERROR (0.009s)
Apr 09 04:51:16 
Apr 09 04:51:16 ======================================================================
Apr 09 04:51:16 ERROR [0.009s]: TestViewOpsXLA (unittest.loader._FailedTest)
Apr 09 04:51:16 ----------------------------------------------------------------------
Apr 09 04:51:16 AttributeError: module '__main__' has no attribute 'TestViewOpsXLA'
Apr 09 04:51:16 
Apr 09 04:51:16 ----------------------------------------------------------------------
Apr 09 04:51:16 Ran 1 test in 0.010s
Apr 09 04:51:16 
Apr 09 04:51:16 FAILED (errors=1)
Apr 09 04:51:16 
Apr 09 04:51:16 Generating XML reports...
Apr 09 04:51:16 Generated XML report: test-reports/python-unittest/.var.lib.jenkins.workspace.xla.test.......test.test_view_ops/TEST-unittest.loader._FailedTest-20210409045116.xml

ci.pytorch.org: 1 failed


This comment was automatically generated by Dr. CI (expand for details).Follow this link to opt-out of these comments for your Pull Requests.

Please report bugs/suggestions to the (internal) Dr. CI Users group.

ailzhang pushed a commit that referenced this pull request Apr 7, 2021
…rsion.

We used to have VariableVersion default constructor
`VariableVersion(uint32_t int version=0)`. But sometimes
we override the version_counter right after it's constructed.
E.g in SavedVariable/TensorImpl.
Thus we should make DISABLED  the default constructor and else
where using explicit `VariableVersion(uint32_t int)` constructor.
Note this PR effectively changes SavedVariable constructor (which overrides
version_counter_ inside) to use the DISABLED constructor and we
can see the gains in reduced instruction counts.

```
 λ ~ python compare.py
No CUDA runtime is found, using CUDA_HOME='/public/apps/cuda/10.2'
<torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.FunctionCounts
object at 0x7f06c48b3a50>
     7236  lookdict_unicode_nodummy
     2600  torch::autograd::VariableType::(...)
      100  0x0000000017751750
       -5  unlink_chunk.isra.0
     -100  0x000000001773e750
     -402  _int_malloc
    -1600  operator delete(...)
    -1600  c10::intrusive_ptr_target::release_resources()
    -2400  c10::VariableVersion::VersionCounter::~VersionCounter()
    -3600  torch::autograd::SavedVariable::operator=(...)
    -4800  operator new(...)
    -6400  torch::autograd::SavedVariable::SavedVariable(...)
    -7200  torch::autograd::SavedVariable::SavedVariable()
    -8400  free
   -16800  malloc
   -24400  _int_free

Total: -67771
```

ghstack-source-id: b9535ec
Pull Request resolved: #55572
@ailzhang ailzhang requested review from albanD, bhosmer and ezyang April 7, 2021 23:25
… VariableVersion."

We used to have VariableVersion default constructor
`VariableVersion(uint32_t int version=0)`. But sometimes
we override the version_counter right after it's constructed.
E.g in SavedVariable/TensorImpl.
Thus we should make DISABLED  the default constructor and else
where using explicit `VariableVersion(uint32_t int)` constructor.
Note this PR effectively changes SavedVariable constructor (which overrides
version_counter_ inside) to use the DISABLED constructor and we
can see the gains in reduced instruction counts.

```
// benchmark code
timer = Timer(
    "y = x * x",
    """
    x = torch.rand((3, 3)).requires_grad_()
    """,
    language=Language.PYTHON,
)

 λ ~ python compare.py
No CUDA runtime is found, using CUDA_HOME='/public/apps/cuda/10.2'
<torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.FunctionCounts
object at 0x7f06c48b3a50>
     7236  lookdict_unicode_nodummy
     2600  torch::autograd::VariableType::(...)
      100  0x0000000017751750
       -5  unlink_chunk.isra.0
     -100  0x000000001773e750
     -402  _int_malloc
    -1600  operator delete(...)
    -1600  c10::intrusive_ptr_target::release_resources()
    -2400  c10::VariableVersion::VersionCounter::~VersionCounter()
    -3600  torch::autograd::SavedVariable::operator=(...)
    -4800  operator new(...)
    -6400  torch::autograd::SavedVariable::SavedVariable(...)
    -7200  torch::autograd::SavedVariable::SavedVariable()
    -8400  free
   -16800  malloc
   -24400  _int_free

Total: -67771
```

[ghstack-poisoned]
ailzhang pushed a commit that referenced this pull request Apr 7, 2021
…rsion.

We used to have VariableVersion default constructor
`VariableVersion(uint32_t int version=0)`. But sometimes
we override the version_counter right after it's constructed.
E.g in SavedVariable/TensorImpl.
Thus we should make DISABLED  the default constructor and else
where using explicit `VariableVersion(uint32_t int)` constructor.
Note this PR effectively changes SavedVariable constructor (which overrides
version_counter_ inside) to use the DISABLED constructor and we
can see the gains in reduced instruction counts.

```
// benchmark code
timer = Timer(
    "y = x * x",
    """
    x = torch.rand((3, 3)).requires_grad_()
    """,
    language=Language.PYTHON,
)

 λ ~ python compare.py
No CUDA runtime is found, using CUDA_HOME='/public/apps/cuda/10.2'
<torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.FunctionCounts
object at 0x7f06c48b3a50>
     7236  lookdict_unicode_nodummy
     2600  torch::autograd::VariableType::(...)
      100  0x0000000017751750
       -5  unlink_chunk.isra.0
     -100  0x000000001773e750
     -402  _int_malloc
    -1600  operator delete(...)
    -1600  c10::intrusive_ptr_target::release_resources()
    -2400  c10::VariableVersion::VersionCounter::~VersionCounter()
    -3600  torch::autograd::SavedVariable::operator=(...)
    -4800  operator new(...)
    -6400  torch::autograd::SavedVariable::SavedVariable(...)
    -7200  torch::autograd::SavedVariable::SavedVariable()
    -8400  free
   -16800  malloc
   -24400  _int_free

Total: -67771
```

ghstack-source-id: b9535ec
Pull Request resolved: #55572
… VariableVersion."

We used to have VariableVersion default constructor
`VariableVersion(uint32_t version=0)`. But sometimes
we override the version_counter right after it's constructed.
E.g in SavedVariable/TensorImpl.
Thus we should make DISABLED  the default constructor and else
where using explicit `VariableVersion(uint32_t)` constructor.
Note this PR effectively changes SavedVariable constructor (which overrides
version_counter_ inside) to use the DISABLED constructor and we
can see the gains in reduced instruction counts.

```
// benchmark code
timer = Timer(
    "y = x * x",
    """
    x = torch.rand((3, 3)).requires_grad_()
    """,
    language=Language.PYTHON,
)

 λ ~ python compare.py
No CUDA runtime is found, using CUDA_HOME='/public/apps/cuda/10.2'
<torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.FunctionCounts
object at 0x7f06c48b3a50>
     7236  lookdict_unicode_nodummy
     2600  torch::autograd::VariableType::(...)
      100  0x0000000017751750
       -5  unlink_chunk.isra.0
     -100  0x000000001773e750
     -402  _int_malloc
    -1600  operator delete(...)
    -1600  c10::intrusive_ptr_target::release_resources()
    -2400  c10::VariableVersion::VersionCounter::~VersionCounter()
    -3600  torch::autograd::SavedVariable::operator=(...)
    -4800  operator new(...)
    -6400  torch::autograd::SavedVariable::SavedVariable(...)
    -7200  torch::autograd::SavedVariable::SavedVariable()
    -8400  free
   -16800  malloc
   -24400  _int_free

Total: -67771
```
Note there're for other callsites(esp. view related) we just keep it unchanged by
explicitly calling `VariableVersion(uint32_t)` but we should be
able to optimize those in the followup PRs.

[ghstack-poisoned]
ailzhang pushed a commit that referenced this pull request Apr 8, 2021
…rsion.

We used to have VariableVersion default constructor
`VariableVersion(uint32_t version=0)`. But sometimes
we override the version_counter right after it's constructed.
E.g in SavedVariable/TensorImpl.
Thus we should make DISABLED  the default constructor and else
where using explicit `VariableVersion(uint32_t)` constructor.
Note this PR effectively changes SavedVariable constructor (which overrides
version_counter_ inside) to use the DISABLED constructor and we
can see the gains in reduced instruction counts.

```
// benchmark code
timer = Timer(
    "y = x * x",
    """
    x = torch.rand((3, 3)).requires_grad_()
    """,
    language=Language.PYTHON,
)

 λ ~ python compare.py
No CUDA runtime is found, using CUDA_HOME='/public/apps/cuda/10.2'
<torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.FunctionCounts
object at 0x7f06c48b3a50>
     7236  lookdict_unicode_nodummy
     2600  torch::autograd::VariableType::(...)
      100  0x0000000017751750
       -5  unlink_chunk.isra.0
     -100  0x000000001773e750
     -402  _int_malloc
    -1600  operator delete(...)
    -1600  c10::intrusive_ptr_target::release_resources()
    -2400  c10::VariableVersion::VersionCounter::~VersionCounter()
    -3600  torch::autograd::SavedVariable::operator=(...)
    -4800  operator new(...)
    -6400  torch::autograd::SavedVariable::SavedVariable(...)
    -7200  torch::autograd::SavedVariable::SavedVariable()
    -8400  free
   -16800  malloc
   -24400  _int_free

Total: -67771
```
Note there're for other callsites(esp. view related) we just keep it unchanged by
explicitly calling `VariableVersion(uint32_t)` but we should be
able to optimize those in the followup PRs.

ghstack-source-id: 006a63b
Pull Request resolved: #55572
Copy link
Contributor

@ezyang ezyang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh goody. What's the strategy for auditing all the sites? (Pray that CI gets all of them?)

@ailzhang
Copy link
Contributor Author

ailzhang commented Apr 8, 2021

@ezyang I first deleted the default constructor to make sure I got all the implicit use of default constructor in compilation error and then made DISABLED one default. So ideally this should catch all of them already.

Copy link
Collaborator

@albanD albanD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks a bit surprising to spray 0s in this code that doesn't really care that the value is 0...
Should we have a ENABLED arg that we use there?

… VariableVersion."

We used to have VariableVersion default constructor
`VariableVersion(uint32_t version=0)`. But sometimes
we override the version_counter right after it's constructed.
E.g in SavedVariable/TensorImpl.
Thus we should make DISABLED  the default constructor and else
where using explicit `VariableVersion(uint32_t)` constructor.
Note this PR effectively changes SavedVariable constructor (which overrides
version_counter_ inside) to use the DISABLED constructor and we
can see the gains in reduced instruction counts.

```
// benchmark code
timer = Timer(
    "y = x * x",
    """
    x = torch.rand((3, 3)).requires_grad_()
    """,
    language=Language.PYTHON,
)

 λ ~ python compare.py
No CUDA runtime is found, using CUDA_HOME='/public/apps/cuda/10.2'
<torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.FunctionCounts
object at 0x7f06c48b3a50>
     7236  lookdict_unicode_nodummy
     2600  torch::autograd::VariableType::(...)
      100  0x0000000017751750
       -5  unlink_chunk.isra.0
     -100  0x000000001773e750
     -402  _int_malloc
    -1600  operator delete(...)
    -1600  c10::intrusive_ptr_target::release_resources()
    -2400  c10::VariableVersion::VersionCounter::~VersionCounter()
    -3600  torch::autograd::SavedVariable::operator=(...)
    -4800  operator new(...)
    -6400  torch::autograd::SavedVariable::SavedVariable(...)
    -7200  torch::autograd::SavedVariable::SavedVariable()
    -8400  free
   -16800  malloc
   -24400  _int_free

Total: -67771
```
Note there're for other callsites(esp. view related) we just keep it unchanged by
explicitly calling `VariableVersion(uint32_t)` but we should be
able to optimize those in the followup PRs.

[ghstack-poisoned]
@facebook-github-bot
Copy link
Contributor

@ailzhang merged this pull request in 7671c15.

@facebook-github-bot facebook-github-bot deleted the gh/ailzhang/60/head branch April 13, 2021 14:16
krshrimali pushed a commit to krshrimali/pytorch that referenced this pull request May 19, 2021
…rsion. (pytorch#55572)

Summary:
Pull Request resolved: pytorch#55572

We used to have VariableVersion default constructor
`VariableVersion(uint32_t version=0)`. But sometimes
we override the version_counter right after it's constructed.
E.g in SavedVariable/TensorImpl.
Thus we should make DISABLED  the default constructor and else
where using explicit `VariableVersion(uint32_t)` constructor.
Note this PR effectively changes SavedVariable constructor (which overrides
version_counter_ inside) to use the DISABLED constructor and we
can see the gains in reduced instruction counts.

```
// benchmark code
timer = Timer(
    "y = x * x",
    """
    x = torch.rand((3, 3)).requires_grad_()
    """,
    language=Language.PYTHON,
)

 λ ~ python compare.py
No CUDA runtime is found, using CUDA_HOME='/public/apps/cuda/10.2'
<torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.FunctionCounts
object at 0x7f06c48b3a50>
     7236  lookdict_unicode_nodummy
     2600  torch::autograd::VariableType::(...)
      100  0x0000000017751750
       -5  unlink_chunk.isra.0
     -100  0x000000001773e750
     -402  _int_malloc
    -1600  operator delete(...)
    -1600  c10::intrusive_ptr_target::release_resources()
    -2400  c10::VariableVersion::VersionCounter::~VersionCounter()
    -3600  torch::autograd::SavedVariable::operator=(...)
    -4800  operator new(...)
    -6400  torch::autograd::SavedVariable::SavedVariable(...)
    -7200  torch::autograd::SavedVariable::SavedVariable()
    -8400  free
   -16800  malloc
   -24400  _int_free

Total: -67771
```
Note there're for other callsites(esp. view related) we just keep it unchanged by
explicitly calling `VariableVersion(uint32_t)` but we should be
able to optimize those in the followup PRs.

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D27669074

Pulled By: ailzhang

fbshipit-source-id: a4deb297cc89142ae8bd683284516c881ddf3c87
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants