Make VariableVersion::DISABLED the default constructor for VariableVersion.#55572
Make VariableVersion::DISABLED the default constructor for VariableVersion.#55572ailzhang wants to merge 4 commits intogh/ailzhang/60/basefrom
Conversation
…rsion.
We used to have VariableVersion default constructor
`VariableVersion(uint32_t int version=0)`. But sometimes
we override the version_counter right after it's constructed.
E.g in SavedVariable/TensorImpl.
Thus we should make DISABLED the default constructor and else
where using explicit `VariableVersion(uint32_t int)` constructor.
Note this PR effectively changes SavedVariable constructor (which overrides
version_counter_ inside) to use the DISABLED constructor and we
can see the gains in reduced instruction counts.
```
λ ~ python compare.py
No CUDA runtime is found, using CUDA_HOME='/public/apps/cuda/10.2'
<torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.FunctionCounts
object at 0x7f06c48b3a50>
7236 lookdict_unicode_nodummy
2600 torch::autograd::VariableType::(...)
100 0x0000000017751750
-5 unlink_chunk.isra.0
-100 0x000000001773e750
-402 _int_malloc
-1600 operator delete(...)
-1600 c10::intrusive_ptr_target::release_resources()
-2400 c10::VariableVersion::VersionCounter::~VersionCounter()
-3600 torch::autograd::SavedVariable::operator=(...)
-4800 operator new(...)
-6400 torch::autograd::SavedVariable::SavedVariable(...)
-7200 torch::autograd::SavedVariable::SavedVariable()
-8400 free
-16800 malloc
-24400 _int_free
Total: -67771
```
[ghstack-poisoned]
💊 CI failures summary and remediationsAs of commit c22bf73 (more details on the Dr. CI page):
🕵️ 2 new failures recognized by patternsThe following CI failures do not appear to be due to upstream breakages:
|
…rsion.
We used to have VariableVersion default constructor
`VariableVersion(uint32_t int version=0)`. But sometimes
we override the version_counter right after it's constructed.
E.g in SavedVariable/TensorImpl.
Thus we should make DISABLED the default constructor and else
where using explicit `VariableVersion(uint32_t int)` constructor.
Note this PR effectively changes SavedVariable constructor (which overrides
version_counter_ inside) to use the DISABLED constructor and we
can see the gains in reduced instruction counts.
```
λ ~ python compare.py
No CUDA runtime is found, using CUDA_HOME='/public/apps/cuda/10.2'
<torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.FunctionCounts
object at 0x7f06c48b3a50>
7236 lookdict_unicode_nodummy
2600 torch::autograd::VariableType::(...)
100 0x0000000017751750
-5 unlink_chunk.isra.0
-100 0x000000001773e750
-402 _int_malloc
-1600 operator delete(...)
-1600 c10::intrusive_ptr_target::release_resources()
-2400 c10::VariableVersion::VersionCounter::~VersionCounter()
-3600 torch::autograd::SavedVariable::operator=(...)
-4800 operator new(...)
-6400 torch::autograd::SavedVariable::SavedVariable(...)
-7200 torch::autograd::SavedVariable::SavedVariable()
-8400 free
-16800 malloc
-24400 _int_free
Total: -67771
```
ghstack-source-id: b9535ec
Pull Request resolved: #55572
… VariableVersion."
We used to have VariableVersion default constructor
`VariableVersion(uint32_t int version=0)`. But sometimes
we override the version_counter right after it's constructed.
E.g in SavedVariable/TensorImpl.
Thus we should make DISABLED the default constructor and else
where using explicit `VariableVersion(uint32_t int)` constructor.
Note this PR effectively changes SavedVariable constructor (which overrides
version_counter_ inside) to use the DISABLED constructor and we
can see the gains in reduced instruction counts.
```
// benchmark code
timer = Timer(
"y = x * x",
"""
x = torch.rand((3, 3)).requires_grad_()
""",
language=Language.PYTHON,
)
λ ~ python compare.py
No CUDA runtime is found, using CUDA_HOME='/public/apps/cuda/10.2'
<torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.FunctionCounts
object at 0x7f06c48b3a50>
7236 lookdict_unicode_nodummy
2600 torch::autograd::VariableType::(...)
100 0x0000000017751750
-5 unlink_chunk.isra.0
-100 0x000000001773e750
-402 _int_malloc
-1600 operator delete(...)
-1600 c10::intrusive_ptr_target::release_resources()
-2400 c10::VariableVersion::VersionCounter::~VersionCounter()
-3600 torch::autograd::SavedVariable::operator=(...)
-4800 operator new(...)
-6400 torch::autograd::SavedVariable::SavedVariable(...)
-7200 torch::autograd::SavedVariable::SavedVariable()
-8400 free
-16800 malloc
-24400 _int_free
Total: -67771
```
[ghstack-poisoned]
…rsion.
We used to have VariableVersion default constructor
`VariableVersion(uint32_t int version=0)`. But sometimes
we override the version_counter right after it's constructed.
E.g in SavedVariable/TensorImpl.
Thus we should make DISABLED the default constructor and else
where using explicit `VariableVersion(uint32_t int)` constructor.
Note this PR effectively changes SavedVariable constructor (which overrides
version_counter_ inside) to use the DISABLED constructor and we
can see the gains in reduced instruction counts.
```
// benchmark code
timer = Timer(
"y = x * x",
"""
x = torch.rand((3, 3)).requires_grad_()
""",
language=Language.PYTHON,
)
λ ~ python compare.py
No CUDA runtime is found, using CUDA_HOME='/public/apps/cuda/10.2'
<torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.FunctionCounts
object at 0x7f06c48b3a50>
7236 lookdict_unicode_nodummy
2600 torch::autograd::VariableType::(...)
100 0x0000000017751750
-5 unlink_chunk.isra.0
-100 0x000000001773e750
-402 _int_malloc
-1600 operator delete(...)
-1600 c10::intrusive_ptr_target::release_resources()
-2400 c10::VariableVersion::VersionCounter::~VersionCounter()
-3600 torch::autograd::SavedVariable::operator=(...)
-4800 operator new(...)
-6400 torch::autograd::SavedVariable::SavedVariable(...)
-7200 torch::autograd::SavedVariable::SavedVariable()
-8400 free
-16800 malloc
-24400 _int_free
Total: -67771
```
ghstack-source-id: b9535ec
Pull Request resolved: #55572
… VariableVersion."
We used to have VariableVersion default constructor
`VariableVersion(uint32_t version=0)`. But sometimes
we override the version_counter right after it's constructed.
E.g in SavedVariable/TensorImpl.
Thus we should make DISABLED the default constructor and else
where using explicit `VariableVersion(uint32_t)` constructor.
Note this PR effectively changes SavedVariable constructor (which overrides
version_counter_ inside) to use the DISABLED constructor and we
can see the gains in reduced instruction counts.
```
// benchmark code
timer = Timer(
"y = x * x",
"""
x = torch.rand((3, 3)).requires_grad_()
""",
language=Language.PYTHON,
)
λ ~ python compare.py
No CUDA runtime is found, using CUDA_HOME='/public/apps/cuda/10.2'
<torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.FunctionCounts
object at 0x7f06c48b3a50>
7236 lookdict_unicode_nodummy
2600 torch::autograd::VariableType::(...)
100 0x0000000017751750
-5 unlink_chunk.isra.0
-100 0x000000001773e750
-402 _int_malloc
-1600 operator delete(...)
-1600 c10::intrusive_ptr_target::release_resources()
-2400 c10::VariableVersion::VersionCounter::~VersionCounter()
-3600 torch::autograd::SavedVariable::operator=(...)
-4800 operator new(...)
-6400 torch::autograd::SavedVariable::SavedVariable(...)
-7200 torch::autograd::SavedVariable::SavedVariable()
-8400 free
-16800 malloc
-24400 _int_free
Total: -67771
```
Note there're for other callsites(esp. view related) we just keep it unchanged by
explicitly calling `VariableVersion(uint32_t)` but we should be
able to optimize those in the followup PRs.
[ghstack-poisoned]
…rsion.
We used to have VariableVersion default constructor
`VariableVersion(uint32_t version=0)`. But sometimes
we override the version_counter right after it's constructed.
E.g in SavedVariable/TensorImpl.
Thus we should make DISABLED the default constructor and else
where using explicit `VariableVersion(uint32_t)` constructor.
Note this PR effectively changes SavedVariable constructor (which overrides
version_counter_ inside) to use the DISABLED constructor and we
can see the gains in reduced instruction counts.
```
// benchmark code
timer = Timer(
"y = x * x",
"""
x = torch.rand((3, 3)).requires_grad_()
""",
language=Language.PYTHON,
)
λ ~ python compare.py
No CUDA runtime is found, using CUDA_HOME='/public/apps/cuda/10.2'
<torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.FunctionCounts
object at 0x7f06c48b3a50>
7236 lookdict_unicode_nodummy
2600 torch::autograd::VariableType::(...)
100 0x0000000017751750
-5 unlink_chunk.isra.0
-100 0x000000001773e750
-402 _int_malloc
-1600 operator delete(...)
-1600 c10::intrusive_ptr_target::release_resources()
-2400 c10::VariableVersion::VersionCounter::~VersionCounter()
-3600 torch::autograd::SavedVariable::operator=(...)
-4800 operator new(...)
-6400 torch::autograd::SavedVariable::SavedVariable(...)
-7200 torch::autograd::SavedVariable::SavedVariable()
-8400 free
-16800 malloc
-24400 _int_free
Total: -67771
```
Note there're for other callsites(esp. view related) we just keep it unchanged by
explicitly calling `VariableVersion(uint32_t)` but we should be
able to optimize those in the followup PRs.
ghstack-source-id: 006a63b
Pull Request resolved: #55572
ezyang
left a comment
There was a problem hiding this comment.
Oh goody. What's the strategy for auditing all the sites? (Pray that CI gets all of them?)
|
@ezyang I first deleted the default constructor to make sure I got all the implicit use of default constructor in compilation error and then made DISABLED one default. So ideally this should catch all of them already. |
albanD
left a comment
There was a problem hiding this comment.
It looks a bit surprising to spray 0s in this code that doesn't really care that the value is 0...
Should we have a ENABLED arg that we use there?
… VariableVersion."
We used to have VariableVersion default constructor
`VariableVersion(uint32_t version=0)`. But sometimes
we override the version_counter right after it's constructed.
E.g in SavedVariable/TensorImpl.
Thus we should make DISABLED the default constructor and else
where using explicit `VariableVersion(uint32_t)` constructor.
Note this PR effectively changes SavedVariable constructor (which overrides
version_counter_ inside) to use the DISABLED constructor and we
can see the gains in reduced instruction counts.
```
// benchmark code
timer = Timer(
"y = x * x",
"""
x = torch.rand((3, 3)).requires_grad_()
""",
language=Language.PYTHON,
)
λ ~ python compare.py
No CUDA runtime is found, using CUDA_HOME='/public/apps/cuda/10.2'
<torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.FunctionCounts
object at 0x7f06c48b3a50>
7236 lookdict_unicode_nodummy
2600 torch::autograd::VariableType::(...)
100 0x0000000017751750
-5 unlink_chunk.isra.0
-100 0x000000001773e750
-402 _int_malloc
-1600 operator delete(...)
-1600 c10::intrusive_ptr_target::release_resources()
-2400 c10::VariableVersion::VersionCounter::~VersionCounter()
-3600 torch::autograd::SavedVariable::operator=(...)
-4800 operator new(...)
-6400 torch::autograd::SavedVariable::SavedVariable(...)
-7200 torch::autograd::SavedVariable::SavedVariable()
-8400 free
-16800 malloc
-24400 _int_free
Total: -67771
```
Note there're for other callsites(esp. view related) we just keep it unchanged by
explicitly calling `VariableVersion(uint32_t)` but we should be
able to optimize those in the followup PRs.
[ghstack-poisoned]
…rsion. (pytorch#55572) Summary: Pull Request resolved: pytorch#55572 We used to have VariableVersion default constructor `VariableVersion(uint32_t version=0)`. But sometimes we override the version_counter right after it's constructed. E.g in SavedVariable/TensorImpl. Thus we should make DISABLED the default constructor and else where using explicit `VariableVersion(uint32_t)` constructor. Note this PR effectively changes SavedVariable constructor (which overrides version_counter_ inside) to use the DISABLED constructor and we can see the gains in reduced instruction counts. ``` // benchmark code timer = Timer( "y = x * x", """ x = torch.rand((3, 3)).requires_grad_() """, language=Language.PYTHON, ) λ ~ python compare.py No CUDA runtime is found, using CUDA_HOME='/public/apps/cuda/10.2' <torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.FunctionCounts object at 0x7f06c48b3a50> 7236 lookdict_unicode_nodummy 2600 torch::autograd::VariableType::(...) 100 0x0000000017751750 -5 unlink_chunk.isra.0 -100 0x000000001773e750 -402 _int_malloc -1600 operator delete(...) -1600 c10::intrusive_ptr_target::release_resources() -2400 c10::VariableVersion::VersionCounter::~VersionCounter() -3600 torch::autograd::SavedVariable::operator=(...) -4800 operator new(...) -6400 torch::autograd::SavedVariable::SavedVariable(...) -7200 torch::autograd::SavedVariable::SavedVariable() -8400 free -16800 malloc -24400 _int_free Total: -67771 ``` Note there're for other callsites(esp. view related) we just keep it unchanged by explicitly calling `VariableVersion(uint32_t)` but we should be able to optimize those in the followup PRs. Test Plan: Imported from OSS Reviewed By: navahgar Differential Revision: D27669074 Pulled By: ailzhang fbshipit-source-id: a4deb297cc89142ae8bd683284516c881ddf3c87
Stack from ghstack:
We used to have VariableVersion default constructor
VariableVersion(uint32_t version=0). But sometimeswe override the version_counter right after it's constructed.
E.g in SavedVariable/TensorImpl.
Thus we should make DISABLED the default constructor and else
where using explicit
VariableVersion(uint32_t)constructor.Note this PR effectively changes SavedVariable constructor (which overrides
version_counter_ inside) to use the DISABLED constructor and we
can see the gains in reduced instruction counts.
Note there're for other callsites(esp. view related) we just keep it unchanged by
explicitly calling
VariableVersion(uint32_t)but we should beable to optimize those in the followup PRs.
Differential Revision: D27669074