Memory profiling #37775
Summary:
Add memory usage to the profiler's table output.
Test Plan:
BUILD_BINARY=1 USE_BLAS=MKL USE_MKLDNN=0 USE_CUDA=0 python setup.py develop install --cmake
```python
import torch
import torchvision.models as models

model = models.resnet18()
inp = torch.randn(5, 3, 224, 224)
with torch.autograd.profiler.profile(profile_memory=True, record_shapes=True) as prof:
    model(inp)
print(prof.key_averages(group_by_input_shape=True).table(sort_by="cpu_memory_usage", row_limit=15))
```
```
---------------------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  -----------------------------------
Name                         Self CPU total %    Self CPU total       CPU total %         CPU total      CPU time avg     CPU Mem Total   Number of Calls  Input Shapes
---------------------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  -----------------------------------
resize_                                 0.37%         577.936us             0.37%         577.936us           9.796us         339.03 Mb                59  [[0]]
empty                                   0.69%           1.061ms             0.74%           1.139ms           5.556us          47.42 Mb               205  []
stride                                  0.00%           0.853us             0.00%           0.853us           0.853us          19.53 Kb                 1  [[5, 1000]]
empty_strided                           0.01%          21.393us             0.02%          26.033us           5.207us             252 b                 5  []
is_complex                              0.02%          37.425us             0.02%          37.425us           1.291us             208 b                29  [[]]
masked_select                           0.04%          55.333us             0.06%          93.616us          46.808us             120 b                 2  [[30], [30]]
conv2d                                  0.01%          18.009us             9.62%          14.902ms          14.902ms               0 b                 1  [[5, 3, 224, 224], [64, 3, 7, 7], [
convolution                             0.01%          12.436us             9.61%          14.884ms          14.884ms               0 b                 1  [[5, 3, 224, 224], [64, 3, 7, 7], [
_convolution                            0.03%          52.381us             9.60%          14.871ms          14.871ms               0 b                 1  [[5, 3, 224, 224], [64, 3, 7, 7], [
size                                    0.00%           5.429us             0.00%           5.429us           0.339us               0 b                16  [[5, 3, 224, 224]]
contiguous                              0.00%           1.934us             0.00%           1.934us           0.967us               0 b                 2  [[5, 3, 224, 224]]
_convolution_nogroup                    0.02%          27.505us             9.57%          14.814ms          14.814ms               0 b                 1  [[5, 3, 224, 224], [64, 3, 7, 7], [
_nnpack_available                       0.02%          34.267us             0.02%          34.267us           1.713us               0 b                20  []
thnn_conv2d                             0.01%          13.274us             9.54%          14.771ms          14.771ms               0 b                 1  [[5, 3, 224, 224], [64, 3, 7, 7], [
thnn_conv2d_forward                     5.98%           9.264ms            19.02%          29.446ms          14.723ms               0 b                 2  [[5, 3, 224, 224], [64, 3, 7, 7], [
---------------------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  -----------------------------------
Self CPU time total: 154.855ms
```
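In the table, `CPU time avg` is `CPU total` divided by `Number of Calls` (for `resize_`, 577.936us / 59 ≈ 9.796us), and each row collects every call that shares a name and input shapes. As a rough sketch of how `key_averages(group_by_input_shape=True)` followed by `table(sort_by="cpu_memory_usage", row_limit=15)` could produce rows like these, the code below groups made-up events by (name, input shapes), sums time and memory, sorts by memory, and truncates. `Event`, `Row`, and `aggregate` are hypothetical names, not PyTorch's internal classes.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Shapes = Tuple[Tuple[int, ...], ...]


@dataclass
class Event:
    """Hypothetical record of one operator call (not PyTorch's FunctionEvent)."""
    name: str
    input_shapes: Shapes
    cpu_time_us: float
    cpu_memory_bytes: int


@dataclass
class Row:
    """One table line: all calls sharing (name, input shapes)."""
    name: str
    input_shapes: Shapes
    count: int = 0
    cpu_total_us: float = 0.0
    cpu_memory_bytes: int = 0

    @property
    def cpu_time_avg_us(self) -> float:
        # "CPU time avg" = total CPU time / number of calls
        return self.cpu_total_us / self.count


def aggregate(events: List[Event], row_limit: int) -> List[Row]:
    rows: Dict[Tuple[str, Shapes], Row] = {}
    for e in events:
        # Grouping key in the spirit of group_by_input_shape=True: name plus shapes
        key = (e.name, e.input_shapes)
        row = rows.setdefault(key, Row(e.name, e.input_shapes))
        row.count += 1
        row.cpu_total_us += e.cpu_time_us
        row.cpu_memory_bytes += e.cpu_memory_bytes
    # Largest memory first, then keep only the first row_limit rows
    ordered = sorted(rows.values(), key=lambda r: r.cpu_memory_bytes, reverse=True)
    return ordered[:row_limit]


# Made-up events, for illustration only
events = [
    Event("empty", (), 5.0, 4096),
    Event("empty", (), 7.0, 1024),
    Event("size", ((5, 3, 224, 224),), 0.3, 0),
    Event("stride", ((5, 1000),), 0.9, 20000),
]
top = aggregate(events, row_limit=2)
for r in top:
    print(r.name, r.count, r.cpu_total_us, r.cpu_time_avg_us, r.cpu_memory_bytes)
```

With `row_limit=2`, the zero-memory `size` row is dropped, just as rows past the 15th are dropped from the real table.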
[ghstack-poisoned]
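The `CPU Mem Total` column appears to use 1024-based units: the 19.53 Kb on the `[[5, 1000]]` row is consistent with a 5 × 1000 float32 tensor (20,000 bytes), since 20,000 / 1024 ≈ 19.53. A minimal sketch of a formatter that yields strings in this style, assuming 1024-based thresholds and two decimal places; `format_memory` is a hypothetical helper and may differ from what the PR implements.

```python
def format_memory(nbytes: int) -> str:
    """Render a byte count with 1024-based units, e.g. 20000 -> '19.53 Kb'."""
    kb = 1024
    mb = 1024 * kb
    gb = 1024 * mb
    # abs() keeps negative byte counts (if any) on the same unit scale
    if abs(nbytes) >= gb:
        return f"{nbytes / gb:.2f} Gb"
    if abs(nbytes) >= mb:
        return f"{nbytes / mb:.2f} Mb"
    if abs(nbytes) >= kb:
        return f"{nbytes / kb:.2f} Kb"
    return f"{nbytes} b"


# 5 x 1000 float32 values at 4 bytes each
print(format_memory(5 * 1000 * 4))
print(format_memory(252))
```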
💊 CI failures summary and remediations
As of commit 448af80 (more details on the Dr. CI page):
Extra GitHub checks: 1 failed
ci.pytorch.org: 1 failed
This comment was automatically generated by Dr. CI and has been revised 271 times.
Differential Revision: [D21384248](https://our.internmc.facebook.com/intern/diff/D21384248)
dzhulgakov left a comment
What about testing? Can you write unit tests (including enabling the profiler in the middle of an allocation)?
I'd also double- and triple-check that memory allocation tracking works well with non-standard allocators, e.g. MKLDNN allocations (with `to_mkldnn`) or some of the internal ones (huge pages).
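Enabling the profiler mid-allocation matters because a reporter that records block sizes in a table (like the `size_table_` map used by `ProfiledCPUMemoryReporter`) only learns a block's size if the block was allocated while profiling was on. A block allocated before profiling starts and freed during it has no entry. The toy Python model below is a hypothetical stand-in (`ToyMemoryReporter` is not the PR's C++ class) showing the behavior such a unit test could pin down: the unknown free is skipped instead of raising or driving the running total negative.

```python
class ToyMemoryReporter:
    """Hypothetical model of an allocator hook that tracks live blocks."""

    def __init__(self):
        self.enabled = False
        self.size_table = {}    # block id -> size in bytes
        self.allocated = 0      # live bytes as seen by the profiler
        self.unknown_frees = 0  # frees of blocks allocated before profiling

    def new(self, ptr: int, nbytes: int) -> None:
        if self.enabled and nbytes > 0:
            self.size_table[ptr] = nbytes
            self.allocated += nbytes

    def delete(self, ptr: int) -> None:
        if not self.enabled:
            return
        nbytes = self.size_table.pop(ptr, None)
        if nbytes is None:
            # Allocated before profiling started: size unknown, so the free
            # is counted but nothing is subtracted from the total.
            self.unknown_frees += 1
            return
        self.allocated -= nbytes


reporter = ToyMemoryReporter()
reporter.new(1, 4096)     # profiling off: block 1 is not tracked
reporter.enabled = True   # profiler switched on while block 1 is still live
reporter.new(2, 1024)
reporter.delete(1)        # free of an untracked block is skipped
reporter.delete(2)
print(reporter.allocated, reporter.unknown_frees)
```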
```cpp
size_table_.erase(it);
void ProfiledCPUMemoryReporter::Delete(void* ptr) {
  if (memoryProfilingEnabled()) {
    std::lock_guard<std::mutex> guard(mutex_);
```
That'd make execution way slower, but I guess it's OK for memory profiling. We should just make sure not to mix the two.
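In the excerpt, `mutex_` is taken only inside `if (memoryProfilingEnabled())`, so the serialization cost the comment mentions is paid only while memory profiling is on. A minimal Python sketch of that gating pattern, assuming a boolean attribute and a `threading.Lock` as stand-ins for `memoryProfilingEnabled()` and `mutex_` (`GatedCounter` and `worker` are hypothetical names). The lock keeps the size map and the running total consistent when several threads allocate and free at once.

```python
import threading


class GatedCounter:
    """Tracks live bytes; takes the lock only while profiling is enabled."""

    def __init__(self, enabled: bool):
        self.enabled = enabled
        self._lock = threading.Lock()
        self.sizes = {}
        self.allocated = 0

    def new(self, ptr: int, nbytes: int) -> None:
        if not self.enabled:
            return  # fast path: no lock, no bookkeeping
        with self._lock:
            # Map entry and running total must change together
            self.sizes[ptr] = nbytes
            self.allocated += nbytes

    def delete(self, ptr: int) -> None:
        if not self.enabled:
            return
        with self._lock:
            self.allocated -= self.sizes.pop(ptr, 0)


def worker(counter: GatedCounter, base: int) -> None:
    # Each thread uses its own block ids: allocate 1000, free every other one
    for i in range(1000):
        counter.new(base + i, 64)
    for i in range(0, 1000, 2):
        counter.delete(base + i)


counter = GatedCounter(enabled=True)
threads = [threading.Thread(target=worker, args=(counter, t * 10_000)) for t in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.allocated)  # 4 threads x 500 live blocks x 64 bytes = 128000
```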
Summary:
Adding memory usage into profiler table output
Test Plan:
BUILD_BINARY=1 USE_BLAS=MKL USE_MKLDNN=0 USE_CUDA=0 python setup.py
develop install --cmake
```
import torch
import torchvision.models as models
model = models.resnet18()
inp = torch.randn(5, 3, 224, 224)
with torch.autograd.profiler.profile(profile_memory=True, record_shapes=True) as prof:
model(inp)
print(prof.key_averages(group_by_input_shape=True).table(sort_by="cpu_memory_usage", row_limit=15))
--------------------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- -----------------------------------
Name Self CPU total % Self CPU total CPU total % CPU total CPU time avg CPU Mem Total Number of Calls Input Shapes
--------------------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- -----------------------------------
resize_ 0.37% 577.936us 0.37% 577.936us 9.796us 339.03 Mb 59 [[0]]
empty 0.69% 1.061ms 0.74% 1.139ms 5.556us 47.42 Mb 205 []
stride 0.00% 0.853us 0.00% 0.853us 0.853us 19.53 Kb 1 [[5, 1000]]
empty_strided 0.01% 21.393us 0.02% 26.033us 5.207us 252 b 5 []
is_complex 0.02% 37.425us 0.02% 37.425us 1.291us 208 b 29 [[]]
masked_select 0.04% 55.333us 0.06% 93.616us 46.808us 120 b 2 [[30], [30]]
conv2d 0.01% 18.009us 9.62% 14.902ms 14.902ms 0 b 1 [[5, 3, 224, 224], [64, 3, 7, 7], [
convolution 0.01% 12.436us 9.61% 14.884ms 14.884ms 0 b 1 [[5, 3, 224, 224], [64, 3, 7, 7], [
_convolution 0.03% 52.381us 9.60% 14.871ms 14.871ms 0 b 1 [[5, 3, 224, 224], [64, 3, 7, 7], [
size 0.00% 5.429us 0.00% 5.429us 0.339us 0 b 16 [[5, 3, 224, 224]]
contiguous 0.00% 1.934us 0.00% 1.934us 0.967us 0 b 2 [[5, 3, 224, 224]]
_convolution_nogroup 0.02% 27.505us 9.57% 14.814ms 14.814ms 0 b 1 [[5, 3, 224, 224], [64, 3, 7, 7], [
_nnpack_available 0.02% 34.267us 0.02% 34.267us 1.713us 0 b 20 []
thnn_conv2d 0.01% 13.274us 9.54% 14.771ms 14.771ms 0 b 1 [[5, 3, 224, 224], [64, 3, 7, 7], [
thnn_conv2d_forward 5.98% 9.264ms 19.02% 29.446ms 14.723ms 0 b 2 [[5, 3, 224, 224], [64, 3, 7, 7], [
--------------------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- -----------------------------------
Self CPU time total: 154.855ms
```
Differential Revision: [D21384248](https://our.internmc.facebook.com/intern/diff/D21384248)
[ghstack-poisoned]
Summary:
Adding memory usage into profiler table output
Test Plan:
BUILD_BINARY=1 USE_BLAS=MKL USE_MKLDNN=0 USE_CUDA=0 python setup.py
develop install --cmake
```
import torch
import torchvision.models as models
model = models.resnet18()
inp = torch.randn(5, 3, 224, 224)
with torch.autograd.profiler.profile(profile_memory=True, record_shapes=True) as prof:
model(inp)
print(prof.key_averages(group_by_input_shape=True).table(sort_by="cpu_memory_usage", row_limit=15))
--------------------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- -----------------------------------
Name Self CPU total % Self CPU total CPU total % CPU total CPU time avg CPU Mem Total Number of Calls Input Shapes
--------------------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- -----------------------------------
resize_ 0.37% 577.936us 0.37% 577.936us 9.796us 339.03 Mb 59 [[0]]
empty 0.69% 1.061ms 0.74% 1.139ms 5.556us 47.42 Mb 205 []
stride 0.00% 0.853us 0.00% 0.853us 0.853us 19.53 Kb 1 [[5, 1000]]
empty_strided 0.01% 21.393us 0.02% 26.033us 5.207us 252 b 5 []
is_complex 0.02% 37.425us 0.02% 37.425us 1.291us 208 b 29 [[]]
masked_select 0.04% 55.333us 0.06% 93.616us 46.808us 120 b 2 [[30], [30]]
conv2d 0.01% 18.009us 9.62% 14.902ms 14.902ms 0 b 1 [[5, 3, 224, 224], [64, 3, 7, 7], [
convolution 0.01% 12.436us 9.61% 14.884ms 14.884ms 0 b 1 [[5, 3, 224, 224], [64, 3, 7, 7], [
_convolution 0.03% 52.381us 9.60% 14.871ms 14.871ms 0 b 1 [[5, 3, 224, 224], [64, 3, 7, 7], [
size 0.00% 5.429us 0.00% 5.429us 0.339us 0 b 16 [[5, 3, 224, 224]]
contiguous 0.00% 1.934us 0.00% 1.934us 0.967us 0 b 2 [[5, 3, 224, 224]]
_convolution_nogroup 0.02% 27.505us 9.57% 14.814ms 14.814ms 0 b 1 [[5, 3, 224, 224], [64, 3, 7, 7], [
_nnpack_available 0.02% 34.267us 0.02% 34.267us 1.713us 0 b 20 []
thnn_conv2d 0.01% 13.274us 9.54% 14.771ms 14.771ms 0 b 1 [[5, 3, 224, 224], [64, 3, 7, 7], [
thnn_conv2d_forward 5.98% 9.264ms 19.02% 29.446ms 14.723ms 0 b 2 [[5, 3, 224, 224], [64, 3, 7, 7], [
--------------------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- -----------------------------------
Self CPU time total: 154.855ms
```
Differential Revision: [D21384248](https://our.internmc.facebook.com/intern/diff/D21384248)
[ghstack-poisoned]
Summary:
Adding memory usage into profiler table output
Test Plan:
BUILD_BINARY=1 USE_BLAS=MKL USE_MKLDNN=0 USE_CUDA=0 python setup.py
develop install --cmake
```
import torch
import torchvision.models as models
model = models.resnet18()
inp = torch.randn(5, 3, 224, 224)
with torch.autograd.profiler.profile(profile_memory=True, record_shapes=True) as prof:
model(inp)
print(prof.key_averages(group_by_input_shape=True).table(sort_by="cpu_memory_usage", row_limit=15))
--------------------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- -----------------------------------
Name Self CPU total % Self CPU total CPU total % CPU total CPU time avg CPU Mem Total Number of Calls Input Shapes
--------------------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- -----------------------------------
resize_ 0.37% 577.936us 0.37% 577.936us 9.796us 339.03 Mb 59 [[0]]
empty 0.69% 1.061ms 0.74% 1.139ms 5.556us 47.42 Mb 205 []
stride 0.00% 0.853us 0.00% 0.853us 0.853us 19.53 Kb 1 [[5, 1000]]
empty_strided 0.01% 21.393us 0.02% 26.033us 5.207us 252 b 5 []
is_complex 0.02% 37.425us 0.02% 37.425us 1.291us 208 b 29 [[]]
masked_select 0.04% 55.333us 0.06% 93.616us 46.808us 120 b 2 [[30], [30]]
conv2d 0.01% 18.009us 9.62% 14.902ms 14.902ms 0 b 1 [[5, 3, 224, 224], [64, 3, 7, 7], [
convolution 0.01% 12.436us 9.61% 14.884ms 14.884ms 0 b 1 [[5, 3, 224, 224], [64, 3, 7, 7], [
_convolution 0.03% 52.381us 9.60% 14.871ms 14.871ms 0 b 1 [[5, 3, 224, 224], [64, 3, 7, 7], [
size 0.00% 5.429us 0.00% 5.429us 0.339us 0 b 16 [[5, 3, 224, 224]]
contiguous 0.00% 1.934us 0.00% 1.934us 0.967us 0 b 2 [[5, 3, 224, 224]]
_convolution_nogroup 0.02% 27.505us 9.57% 14.814ms 14.814ms 0 b 1 [[5, 3, 224, 224], [64, 3, 7, 7], [
_nnpack_available 0.02% 34.267us 0.02% 34.267us 1.713us 0 b 20 []
thnn_conv2d 0.01% 13.274us 9.54% 14.771ms 14.771ms 0 b 1 [[5, 3, 224, 224], [64, 3, 7, 7], [
thnn_conv2d_forward 5.98% 9.264ms 19.02% 29.446ms 14.723ms 0 b 2 [[5, 3, 224, 224], [64, 3, 7, 7], [
--------------------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- -----------------------------------
Self CPU time total: 154.855ms
```
Differential Revision: [D21384248](https://our.internmc.facebook.com/intern/diff/D21384248)
[ghstack-poisoned]
Summary:
Adding memory usage into profiler table output
Test Plan:
BUILD_BINARY=1 USE_BLAS=MKL USE_MKLDNN=0 USE_CUDA=0 python setup.py
develop install --cmake
```
import torch
import torchvision.models as models
model = models.resnet18()
inp = torch.randn(5, 3, 224, 224)
with torch.autograd.profiler.profile(profile_memory=True, record_shapes=True) as prof:
model(inp)
print(prof.key_averages(group_by_input_shape=True).table(sort_by="cpu_memory_usage", row_limit=15))
--------------------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- -----------------------------------
Name Self CPU total % Self CPU total CPU total % CPU total CPU time avg CPU Mem Total Number of Calls Input Shapes
--------------------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- -----------------------------------
resize_ 0.37% 577.936us 0.37% 577.936us 9.796us 339.03 Mb 59 [[0]]
empty 0.69% 1.061ms 0.74% 1.139ms 5.556us 47.42 Mb 205 []
stride 0.00% 0.853us 0.00% 0.853us 0.853us 19.53 Kb 1 [[5, 1000]]
empty_strided 0.01% 21.393us 0.02% 26.033us 5.207us 252 b 5 []
is_complex 0.02% 37.425us 0.02% 37.425us 1.291us 208 b 29 [[]]
masked_select 0.04% 55.333us 0.06% 93.616us 46.808us 120 b 2 [[30], [30]]
conv2d 0.01% 18.009us 9.62% 14.902ms 14.902ms 0 b 1 [[5, 3, 224, 224], [64, 3, 7, 7], [
convolution 0.01% 12.436us 9.61% 14.884ms 14.884ms 0 b 1 [[5, 3, 224, 224], [64, 3, 7, 7], [
_convolution 0.03% 52.381us 9.60% 14.871ms 14.871ms 0 b 1 [[5, 3, 224, 224], [64, 3, 7, 7], [
size 0.00% 5.429us 0.00% 5.429us 0.339us 0 b 16 [[5, 3, 224, 224]]
contiguous 0.00% 1.934us 0.00% 1.934us 0.967us 0 b 2 [[5, 3, 224, 224]]
_convolution_nogroup 0.02% 27.505us 9.57% 14.814ms 14.814ms 0 b 1 [[5, 3, 224, 224], [64, 3, 7, 7], [
_nnpack_available 0.02% 34.267us 0.02% 34.267us 1.713us 0 b 20 []
thnn_conv2d 0.01% 13.274us 9.54% 14.771ms 14.771ms 0 b 1 [[5, 3, 224, 224], [64, 3, 7, 7], [
thnn_conv2d_forward 5.98% 9.264ms 19.02% 29.446ms 14.723ms 0 b 2 [[5, 3, 224, 224], [64, 3, 7, 7], [
--------------------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- -----------------------------------
Self CPU time total: 154.855ms
```
Differential Revision: [D21384248](https://our.internmc.facebook.com/intern/diff/D21384248)
[ghstack-poisoned]
Summary:
Adding memory usage into profiler table output
Test Plan:
BUILD_BINARY=1 USE_BLAS=MKL USE_MKLDNN=0 USE_CUDA=0 python setup.py
develop install --cmake
```
import torch
import torchvision.models as models
model = models.resnet18()
inp = torch.randn(5, 3, 224, 224)
with torch.autograd.profiler.profile(profile_memory=True, record_shapes=True) as prof:
model(inp)
print(prof.key_averages(group_by_input_shape=True).table(sort_by="cpu_memory_usage", row_limit=15))
--------------------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- -----------------------------------
Name Self CPU total % Self CPU total CPU total % CPU total CPU time avg CPU Mem Total Number of Calls Input Shapes
--------------------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- -----------------------------------
resize_ 0.37% 577.936us 0.37% 577.936us 9.796us 339.03 Mb 59 [[0]]
empty 0.69% 1.061ms 0.74% 1.139ms 5.556us 47.42 Mb 205 []
stride 0.00% 0.853us 0.00% 0.853us 0.853us 19.53 Kb 1 [[5, 1000]]
empty_strided 0.01% 21.393us 0.02% 26.033us 5.207us 252 b 5 []
is_complex 0.02% 37.425us 0.02% 37.425us 1.291us 208 b 29 [[]]
masked_select 0.04% 55.333us 0.06% 93.616us 46.808us 120 b 2 [[30], [30]]
conv2d 0.01% 18.009us 9.62% 14.902ms 14.902ms 0 b 1 [[5, 3, 224, 224], [64, 3, 7, 7], [
convolution 0.01% 12.436us 9.61% 14.884ms 14.884ms 0 b 1 [[5, 3, 224, 224], [64, 3, 7, 7], [
_convolution 0.03% 52.381us 9.60% 14.871ms 14.871ms 0 b 1 [[5, 3, 224, 224], [64, 3, 7, 7], [
size 0.00% 5.429us 0.00% 5.429us 0.339us 0 b 16 [[5, 3, 224, 224]]
contiguous 0.00% 1.934us 0.00% 1.934us 0.967us 0 b 2 [[5, 3, 224, 224]]
_convolution_nogroup 0.02% 27.505us 9.57% 14.814ms 14.814ms 0 b 1 [[5, 3, 224, 224], [64, 3, 7, 7], [
_nnpack_available 0.02% 34.267us 0.02% 34.267us 1.713us 0 b 20 []
thnn_conv2d 0.01% 13.274us 9.54% 14.771ms 14.771ms 0 b 1 [[5, 3, 224, 224], [64, 3, 7, 7], [
thnn_conv2d_forward 5.98% 9.264ms 19.02% 29.446ms 14.723ms 0 b 2 [[5, 3, 224, 224], [64, 3, 7, 7], [
--------------------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- -----------------------------------
Self CPU time total: 154.855ms
```
Differential Revision: [D21384248](https://our.internmc.facebook.com/intern/diff/D21384248)
[ghstack-poisoned]
Summary:
Adding memory usage into profiler table output
Test Plan:
BUILD_BINARY=1 USE_BLAS=MKL USE_MKLDNN=0 USE_CUDA=0 python setup.py
develop install --cmake
```
import torch
import torchvision.models as models
model = models.resnet18()
inp = torch.randn(5, 3, 224, 224)
with torch.autograd.profiler.profile(profile_memory=True, record_shapes=True) as prof:
model(inp)
print(prof.key_averages(group_by_input_shape=True).table(sort_by="cpu_memory_usage", row_limit=15))
--------------------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- -----------------------------------
Name Self CPU total % Self CPU total CPU total % CPU total CPU time avg CPU Mem Total Number of Calls Input Shapes
--------------------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- -----------------------------------
resize_ 0.37% 577.936us 0.37% 577.936us 9.796us 339.03 Mb 59 [[0]]
empty 0.69% 1.061ms 0.74% 1.139ms 5.556us 47.42 Mb 205 []
stride 0.00% 0.853us 0.00% 0.853us 0.853us 19.53 Kb 1 [[5, 1000]]
empty_strided 0.01% 21.393us 0.02% 26.033us 5.207us 252 b 5 []
is_complex 0.02% 37.425us 0.02% 37.425us 1.291us 208 b 29 [[]]
masked_select 0.04% 55.333us 0.06% 93.616us 46.808us 120 b 2 [[30], [30]]
conv2d 0.01% 18.009us 9.62% 14.902ms 14.902ms 0 b 1 [[5, 3, 224, 224], [64, 3, 7, 7], [
convolution 0.01% 12.436us 9.61% 14.884ms 14.884ms 0 b 1 [[5, 3, 224, 224], [64, 3, 7, 7], [
_convolution 0.03% 52.381us 9.60% 14.871ms 14.871ms 0 b 1 [[5, 3, 224, 224], [64, 3, 7, 7], [
size 0.00% 5.429us 0.00% 5.429us 0.339us 0 b 16 [[5, 3, 224, 224]]
contiguous 0.00% 1.934us 0.00% 1.934us 0.967us 0 b 2 [[5, 3, 224, 224]]
_convolution_nogroup 0.02% 27.505us 9.57% 14.814ms 14.814ms 0 b 1 [[5, 3, 224, 224], [64, 3, 7, 7], [
_nnpack_available 0.02% 34.267us 0.02% 34.267us 1.713us 0 b 20 []
thnn_conv2d 0.01% 13.274us 9.54% 14.771ms 14.771ms 0 b 1 [[5, 3, 224, 224], [64, 3, 7, 7], [
thnn_conv2d_forward 5.98% 9.264ms 19.02% 29.446ms 14.723ms 0 b 2 [[5, 3, 224, 224], [64, 3, 7, 7], [
--------------------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- -----------------------------------
Self CPU time total: 154.855ms
```
Differential Revision: [D21384248](https://our.internmc.facebook.com/intern/diff/D21384248)
[ghstack-poisoned]
Summary:
Adding memory usage into profiler table output
Test Plan:
BUILD_BINARY=1 USE_BLAS=MKL USE_MKLDNN=0 USE_CUDA=0 python setup.py
develop install --cmake
```
import torch
import torchvision.models as models
model = models.resnet18()
inp = torch.randn(5, 3, 224, 224)
with torch.autograd.profiler.profile(profile_memory=True, record_shapes=True) as prof:
model(inp)
print(prof.key_averages(group_by_input_shape=True).table(sort_by="cpu_memory_usage", row_limit=15))
--------------------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- -----------------------------------
Name Self CPU total % Self CPU total CPU total % CPU total CPU time avg CPU Mem Total Number of Calls Input Shapes
--------------------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- -----------------------------------
resize_ 0.37% 577.936us 0.37% 577.936us 9.796us 339.03 Mb 59 [[0]]
empty 0.69% 1.061ms 0.74% 1.139ms 5.556us 47.42 Mb 205 []
stride 0.00% 0.853us 0.00% 0.853us 0.853us 19.53 Kb 1 [[5, 1000]]
empty_strided 0.01% 21.393us 0.02% 26.033us 5.207us 252 b 5 []
is_complex 0.02% 37.425us 0.02% 37.425us 1.291us 208 b 29 [[]]
masked_select 0.04% 55.333us 0.06% 93.616us 46.808us 120 b 2 [[30], [30]]
conv2d 0.01% 18.009us 9.62% 14.902ms 14.902ms 0 b 1 [[5, 3, 224, 224], [64, 3, 7, 7], [
convolution 0.01% 12.436us 9.61% 14.884ms 14.884ms 0 b 1 [[5, 3, 224, 224], [64, 3, 7, 7], [
_convolution 0.03% 52.381us 9.60% 14.871ms 14.871ms 0 b 1 [[5, 3, 224, 224], [64, 3, 7, 7], [
size 0.00% 5.429us 0.00% 5.429us 0.339us 0 b 16 [[5, 3, 224, 224]]
contiguous 0.00% 1.934us 0.00% 1.934us 0.967us 0 b 2 [[5, 3, 224, 224]]
_convolution_nogroup 0.02% 27.505us 9.57% 14.814ms 14.814ms 0 b 1 [[5, 3, 224, 224], [64, 3, 7, 7], [
_nnpack_available 0.02% 34.267us 0.02% 34.267us 1.713us 0 b 20 []
thnn_conv2d 0.01% 13.274us 9.54% 14.771ms 14.771ms 0 b 1 [[5, 3, 224, 224], [64, 3, 7, 7], [
thnn_conv2d_forward 5.98% 9.264ms 19.02% 29.446ms 14.723ms 0 b 2 [[5, 3, 224, 224], [64, 3, 7, 7], [
--------------------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- -----------------------------------
Self CPU time total: 154.855ms
```
Differential Revision: [D21384248](https://our.internmc.facebook.com/intern/diff/D21384248)
[ghstack-poisoned]
Summary:
Adding memory usage into profiler table output
Test Plan:
BUILD_BINARY=1 USE_BLAS=MKL USE_MKLDNN=0 USE_CUDA=0 python setup.py
develop install --cmake
```
import torch
import torchvision.models as models
model = models.resnet18()
inp = torch.randn(5, 3, 224, 224)
with torch.autograd.profiler.profile(profile_memory=True, record_shapes=True) as prof:
model(inp)
print(prof.key_averages(group_by_input_shape=True).table(sort_by="cpu_memory_usage", row_limit=15))
--------------------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- -----------------------------------
Name Self CPU total % Self CPU total CPU total % CPU total CPU time avg CPU Mem Total Number of Calls Input Shapes
--------------------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- -----------------------------------
resize_ 0.37% 577.936us 0.37% 577.936us 9.796us 339.03 Mb 59 [[0]]
empty 0.69% 1.061ms 0.74% 1.139ms 5.556us 47.42 Mb 205 []
stride 0.00% 0.853us 0.00% 0.853us 0.853us 19.53 Kb 1 [[5, 1000]]
empty_strided 0.01% 21.393us 0.02% 26.033us 5.207us 252 b 5 []
is_complex 0.02% 37.425us 0.02% 37.425us 1.291us 208 b 29 [[]]
masked_select 0.04% 55.333us 0.06% 93.616us 46.808us 120 b 2 [[30], [30]]
conv2d 0.01% 18.009us 9.62% 14.902ms 14.902ms 0 b 1 [[5, 3, 224, 224], [64, 3, 7, 7], [
convolution 0.01% 12.436us 9.61% 14.884ms 14.884ms 0 b 1 [[5, 3, 224, 224], [64, 3, 7, 7], [
_convolution 0.03% 52.381us 9.60% 14.871ms 14.871ms 0 b 1 [[5, 3, 224, 224], [64, 3, 7, 7], [
size 0.00% 5.429us 0.00% 5.429us 0.339us 0 b 16 [[5, 3, 224, 224]]
contiguous 0.00% 1.934us 0.00% 1.934us 0.967us 0 b 2 [[5, 3, 224, 224]]
_convolution_nogroup 0.02% 27.505us 9.57% 14.814ms 14.814ms 0 b 1 [[5, 3, 224, 224], [64, 3, 7, 7], [
_nnpack_available 0.02% 34.267us 0.02% 34.267us 1.713us 0 b 20 []
thnn_conv2d 0.01% 13.274us 9.54% 14.771ms 14.771ms 0 b 1 [[5, 3, 224, 224], [64, 3, 7, 7], [
thnn_conv2d_forward 5.98% 9.264ms 19.02% 29.446ms 14.723ms 0 b 2 [[5, 3, 224, 224], [64, 3, 7, 7], [
--------------------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- -----------------------------------
Self CPU time total: 154.855ms
```
Differential Revision: [D21384248](https://our.internmc.facebook.com/intern/diff/D21384248)
[ghstack-poisoned]
Summary:
Adding memory usage into profiler table output
Test Plan:
BUILD_BINARY=1 USE_BLAS=MKL USE_MKLDNN=0 USE_CUDA=0 python setup.py
develop install --cmake
```
import torch
import torchvision.models as models
model = models.resnet18()
inp = torch.randn(5, 3, 224, 224)
with torch.autograd.profiler.profile(profile_memory=True, record_shapes=True) as prof:
model(inp)
print(prof.key_averages(group_by_input_shape=True).table(sort_by="cpu_memory_usage", row_limit=15))
--------------------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- -----------------------------------
Name Self CPU total % Self CPU total CPU total % CPU total CPU time avg CPU Mem Total Number of Calls Input Shapes
--------------------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- -----------------------------------
resize_ 0.37% 577.936us 0.37% 577.936us 9.796us 339.03 Mb 59 [[0]]
empty 0.69% 1.061ms 0.74% 1.139ms 5.556us 47.42 Mb 205 []
stride 0.00% 0.853us 0.00% 0.853us 0.853us 19.53 Kb 1 [[5, 1000]]
empty_strided 0.01% 21.393us 0.02% 26.033us 5.207us 252 b 5 []
is_complex 0.02% 37.425us 0.02% 37.425us 1.291us 208 b 29 [[]]
masked_select 0.04% 55.333us 0.06% 93.616us 46.808us 120 b 2 [[30], [30]]
conv2d 0.01% 18.009us 9.62% 14.902ms 14.902ms 0 b 1 [[5, 3, 224, 224], [64, 3, 7, 7], [
convolution 0.01% 12.436us 9.61% 14.884ms 14.884ms 0 b 1 [[5, 3, 224, 224], [64, 3, 7, 7], [
_convolution 0.03% 52.381us 9.60% 14.871ms 14.871ms 0 b 1 [[5, 3, 224, 224], [64, 3, 7, 7], [
size 0.00% 5.429us 0.00% 5.429us 0.339us 0 b 16 [[5, 3, 224, 224]]
contiguous 0.00% 1.934us 0.00% 1.934us 0.967us 0 b 2 [[5, 3, 224, 224]]
_convolution_nogroup 0.02% 27.505us 9.57% 14.814ms 14.814ms 0 b 1 [[5, 3, 224, 224], [64, 3, 7, 7], [
_nnpack_available 0.02% 34.267us 0.02% 34.267us 1.713us 0 b 20 []
thnn_conv2d 0.01% 13.274us 9.54% 14.771ms 14.771ms 0 b 1 [[5, 3, 224, 224], [64, 3, 7, 7], [
thnn_conv2d_forward 5.98% 9.264ms 19.02% 29.446ms 14.723ms 0 b 2 [[5, 3, 224, 224], [64, 3, 7, 7], [
--------------------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- -----------------------------------
Self CPU time total: 154.855ms
```
Differential Revision: [D21384248](https://our.internmc.facebook.com/intern/diff/D21384248)
[ghstack-poisoned]
Summary:
Add memory usage to the profiler table output.

Test Plan:
`BUILD_BINARY=1 USE_BLAS=MKL USE_MKLDNN=0 USE_CUDA=0 python setup.py develop install --cmake`
```
import torch
import torchvision.models as models

model = models.resnet18()
inp = torch.randn(5, 3, 224, 224)
with torch.autograd.profiler.profile(profile_memory=True, record_shapes=True) as prof:
    model(inp)
print(prof.key_averages(group_by_input_shape=True).table(sort_by="cpu_memory_usage", row_limit=15))

---------------------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  -----------------------------------
Name                         Self CPU total %    Self CPU total       CPU total %         CPU total      CPU time avg     CPU Mem Total   Number of Calls  Input Shapes
---------------------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  -----------------------------------
resize_                                 0.37%         577.936us             0.37%         577.936us           9.796us         339.03 Mb                59  [[0]]
empty                                   0.69%           1.061ms             0.74%           1.139ms           5.556us          47.42 Mb               205  []
stride                                  0.00%           0.853us             0.00%           0.853us           0.853us          19.53 Kb                 1  [[5, 1000]]
empty_strided                           0.01%          21.393us             0.02%          26.033us           5.207us             252 b                 5  []
is_complex                              0.02%          37.425us             0.02%          37.425us           1.291us             208 b                29  [[]]
masked_select                           0.04%          55.333us             0.06%          93.616us          46.808us             120 b                 2  [[30], [30]]
conv2d                                  0.01%          18.009us             9.62%          14.902ms          14.902ms               0 b                 1  [[5, 3, 224, 224], [64, 3, 7, 7], [
convolution                             0.01%          12.436us             9.61%          14.884ms          14.884ms               0 b                 1  [[5, 3, 224, 224], [64, 3, 7, 7], [
_convolution                            0.03%          52.381us             9.60%          14.871ms          14.871ms               0 b                 1  [[5, 3, 224, 224], [64, 3, 7, 7], [
size                                    0.00%           5.429us             0.00%           5.429us           0.339us               0 b                16  [[5, 3, 224, 224]]
contiguous                              0.00%           1.934us             0.00%           1.934us           0.967us               0 b                 2  [[5, 3, 224, 224]]
_convolution_nogroup                    0.02%          27.505us             9.57%          14.814ms          14.814ms               0 b                 1  [[5, 3, 224, 224], [64, 3, 7, 7], [
_nnpack_available                       0.02%          34.267us             0.02%          34.267us           1.713us               0 b                20  []
thnn_conv2d                             0.01%          13.274us             9.54%          14.771ms          14.771ms               0 b                 1  [[5, 3, 224, 224], [64, 3, 7, 7], [
thnn_conv2d_forward                     5.98%           9.264ms            19.02%          29.446ms          14.723ms               0 b                 2  [[5, 3, 224, 224], [64, 3, 7, 7], [
---------------------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  -----------------------------------
Self CPU time total: 154.855ms
```
Differential Revision: [D21384248](https://our.internmc.facebook.com/intern/diff/D21384248)
[ghstack-poisoned]
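For readers of the memory column, here is a minimal, simplified sketch of what `key_averages(group_by_input_shape=True).table(sort_by="cpu_memory_usage", row_limit=15)` does with memory: group calls by (name, input shapes), sum the bytes and count the calls, sort by total bytes in descending order, keep `row_limit` rows, and render sizes with the `b` / `Kb` / `Mb` units seen in the table. This is not the profiler's implementation. `Event`, `memory_rows` and `format_memory` are hypothetical names, the events are made up, and 1 Kb = 1024 bytes is an assumption.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Event:
    """Hypothetical, simplified stand-in for one recorded op call."""
    name: str
    input_shapes: List[List[int]]
    cpu_memory_usage: int  # bytes; negative values are frees


def format_memory(nbytes: int) -> str:
    """Render bytes with b / Kb / Mb / Gb units (assumes 1 Kb = 1024 b)."""
    kb = 1024
    mb = 1024 * kb
    gb = 1024 * mb
    if abs(nbytes) >= gb:
        return f"{nbytes / gb:.2f} Gb"
    if abs(nbytes) >= mb:
        return f"{nbytes / mb:.2f} Mb"
    if abs(nbytes) >= kb:
        return f"{nbytes / kb:.2f} Kb"
    return f"{nbytes} b"


def memory_rows(events: List[Event], row_limit: int) -> List[Tuple[str, str, int, str]]:
    """Group by (name, input shapes), sum bytes, count calls, sort by bytes descending."""
    totals: Dict[Tuple[str, str], List[int]] = defaultdict(lambda: [0, 0])  # key -> [bytes, calls]
    for ev in events:
        entry = totals[(ev.name, str(ev.input_shapes))]
        entry[0] += ev.cpu_memory_usage
        entry[1] += 1
    ordered = sorted(totals.items(), key=lambda item: item[1][0], reverse=True)
    return [
        (name, format_memory(nbytes), calls, shapes)
        for (name, shapes), (nbytes, calls) in ordered[:row_limit]
    ]


# Made-up events, for illustration only.
events = [
    Event("empty_strided", [], 100),
    Event("stride", [[5, 1000]], 20000),
    Event("empty_strided", [], 152),
    Event("size", [[5, 3, 224, 224]], 0),
]
for row in memory_rows(events, row_limit=15):
    print(row)
```

Because the column is a per-key total, an op that allocates a little on each call can still top the table. In the run above, `resize_` reaches 339.03 Mb across its 59 calls.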
Summary:
Add memory usage to the profiler table output.

Test Plan:
```
BUILD_BINARY=1 USE_BLAS=MKL USE_MKLDNN=1 USE_CUDA=1 python setup.py develop install
```
```
$ python benchmarks/profiler_benchmark/resnet_memory_profiler.py
output: https://gist.github.com/ilia-cher/3f37d54c3b2afb24d6776858e6860f69
```
```
$ python test/test_autograd.py TestAutograd.test_memory_profiler
Couldn't download test skip set, leaving all tests enabled...
Running CPU test
---------------------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  -----------------------------------
Name                         Self CPU total %    Self CPU total       CPU total %         CPU total      CPU time avg           CPU Mem      Self CPU Mem          CUDA Mem     Self CUDA Mem   Number of Calls  Input Shapes
---------------------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  -----------------------------------
test_user_scope_alloc                  60.58%         105.892us            93.42%         163.285us         163.285us             800 b               0 b               0 b               0 b                 1  []
rand                                   10.53%          18.405us            32.83%          57.393us          57.393us             800 b               0 b               0 b               0 b                 1  []
empty                                   1.77%           3.092us             1.77%           3.092us           3.092us             800 b             800 b               0 b               0 b                 1  []
uniform_                               19.64%          34.325us            20.54%          35.896us          35.896us               0 b               0 b               0 b               0 b                 1  [[10, 10]]
is_complex                              0.90%           1.571us             0.90%           1.571us           1.571us               0 b               0 b               0 b               0 b                 1  [[10, 10]]
test_user_scope_dealloc                 6.58%          11.508us             6.58%          11.508us          11.508us            -800 b            -800 b               0 b               0 b                 1  []
---------------------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  -----------------------------------
Self CPU time total: 174.793us
Running CUDA test
---------------------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  -----------------------------------
Name                         Self CPU total %    Self CPU total       CPU total %         CPU total      CPU time avg           CPU Mem      Self CPU Mem          CUDA Mem     Self CUDA Mem   Number of Calls  Input Shapes
---------------------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  -----------------------------------
test_user_scope_alloc                  29.37%          86.836us            93.05%         275.143us         275.143us               0 b            -800 b           1.00 Kb               0 b                 1  []
to                                      7.42%          21.939us            51.31%         151.703us         151.703us               0 b               0 b           1.00 Kb               0 b                 1  [[10, 10]]
empty_strided                           6.19%          18.295us             6.19%          18.295us          18.295us               0 b               0 b           1.00 Kb           1.00 Kb                 1  []
rand                                    4.50%          13.316us            12.38%          36.604us          36.604us             800 b               0 b               0 b               0 b                 1  []
empty                                   0.83%           2.456us             0.83%           2.456us           2.456us             800 b             800 b               0 b               0 b                 1  []
uniform_                                6.44%          19.044us             7.05%          20.832us          20.832us               0 b               0 b               0 b               0 b                 1  [[10, 10]]
is_complex                              0.60%           1.788us             0.60%           1.788us           1.788us               0 b               0 b               0 b               0 b                 1  [[10, 10]]
copy_                                  37.70%         111.469us            37.70%         111.469us         111.469us               0 b               0 b               0 b               0 b                 1  [[10, 10], [10, 10]]
test_user_scope_dealloc                 6.95%          20.544us             6.95%          20.544us          20.544us               0 b               0 b          -1.00 Kb          -1.00 Kb                 1  []
---------------------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  -----------------------------------
Self CPU time total: 295.687us
Running MKLDNN test
---------------------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  -----------------------------------
Name                         Self CPU total %    Self CPU total       CPU total %         CPU total      CPU time avg           CPU Mem      Self CPU Mem          CUDA Mem     Self CUDA Mem   Number of Calls  Input Shapes
---------------------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  -----------------------------------
test_user_scope_alloc                  34.23%          43.503us            88.57%         112.550us         112.550us             400 b            -400 b               0 b               0 b                 1  []
rand                                    8.00%          10.167us            18.34%          23.302us          23.302us             400 b               0 b               0 b               0 b                 1  []
empty                                   2.22%           2.815us             2.22%           2.815us           2.815us             400 b             400 b               0 b               0 b                 1  []
to_mkldnn                              35.16%          44.675us            36.00%          45.745us          45.745us             400 b             400 b               0 b               0 b                 1  [[10, 10]]
uniform_                                7.24%           9.198us             8.12%          10.320us          10.320us               0 b               0 b               0 b               0 b                 1  [[10, 10]]
is_complex                              0.88%           1.122us             0.88%           1.122us           1.122us               0 b               0 b               0 b               0 b                 1  [[10, 10]]
contiguous                              0.84%           1.070us             0.84%           1.070us           1.070us               0 b               0 b               0 b               0 b                 1  [[10, 10]]
test_user_scope_dealloc                11.43%          14.525us            11.43%          14.525us          14.525us            -400 b            -400 b               0 b               0 b                 1  []
---------------------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  -----------------------------------
Self CPU time total: 127.075us
.
----------------------------------------------------------------------
Ran 1 test in 1.571s

OK
```
Differential Revision: [D21384248](https://our.internmc.facebook.com/intern/diff/D21384248)
[ghstack-poisoned]
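This revision reports both inclusive columns (`CPU Mem`, `CUDA Mem`) and self columns (`Self CPU Mem`, `Self CUDA Mem`). One way to read them is sketched below. An allocation or free counts toward every scope that is open when it happens (inclusive), and toward the innermost open scope only (self).

The trace is hypothetical. Its nesting is inferred from the CUDA test rows rather than taken from `test_memory_profiler`, `1.00 Kb` is taken as 1024 bytes, and `attribute` and `CUDA_TEST_TRACE` are illustrative names. Under those assumptions, replaying the trace reproduces that table's memory columns. For example, `test_user_scope_alloc` shows `0 b` CPU Mem but `-800 b` Self CPU Mem. In this reading, the 800 bytes allocated by `empty` (inside `rand`) are freed directly in the outer scope after the copy to CUDA.

```python
from collections import defaultdict

# Hypothetical flat trace: ("enter", op), ("exit", op) or ("mem", device, nbytes).
# Nesting and the placement of the CPU free are assumptions inferred from the table.
CUDA_TEST_TRACE = [
    ("enter", "test_user_scope_alloc"),
    ("enter", "rand"),
    ("enter", "empty"), ("mem", "cpu", 800), ("exit", "empty"),
    ("enter", "uniform_"), ("exit", "uniform_"),
    ("exit", "rand"),
    ("enter", "to"),
    ("enter", "empty_strided"), ("mem", "cuda", 1024), ("exit", "empty_strided"),
    ("enter", "copy_"), ("exit", "copy_"),
    ("exit", "to"),
    ("mem", "cpu", -800),  # assumed: temporary CPU tensor freed in the outer scope
    ("exit", "test_user_scope_alloc"),
    ("enter", "test_user_scope_dealloc"), ("mem", "cuda", -1024), ("exit", "test_user_scope_dealloc"),
]


def attribute(trace):
    """Return (inclusive, self) byte totals as {op: {device: bytes}}.

    Assumes op names are unique along any nesting path.
    """
    inclusive = defaultdict(lambda: defaultdict(int))
    exclusive = defaultdict(lambda: defaultdict(int))
    stack = []
    for record in trace:
        if record[0] == "enter":
            stack.append(record[1])
        elif record[0] == "exit":
            if not stack or stack[-1] != record[1]:
                raise ValueError(f"unbalanced exit for {record[1]!r}")
            stack.pop()
        else:
            _, device, nbytes = record
            for op in stack:  # every open scope sees the allocation or free
                inclusive[op][device] += nbytes
            if stack:  # only the innermost open scope owns it
                exclusive[stack[-1]][device] += nbytes
    return inclusive, exclusive


inclusive, self_mem = attribute(CUDA_TEST_TRACE)
for op in ("test_user_scope_alloc", "rand", "to", "test_user_scope_dealloc"):
    print(op, dict(inclusive[op]), dict(self_mem[op]))
```

Under this model, a scope whose own records are frees can show negative self memory even when its inclusive total is zero or positive. The same pattern appears in the MKLDNN run: `400 b` inclusive, `-400 b` self.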
Summary: Adding memory usage into profiler table output Test Plan: ``` BUILD_BINARY=1 USE_BLAS=MKL USE_MKLDNN=1 USE_CUDA=1 python setup.py develop install ``` ``` $ python benchmarks/profiler_benchmark/resnet_memory_profiler.py output: https://gist.github.com/ilia-cher/3f37d54c3b2afb24d6776858e6860f69 ``` ``` $ python test/test_autograd.py TestAutograd.test_memory_profiler Couldn't download test skip set, leaving all tests enabled... Running CPU test --------------------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- ----------------------------------- Name Self CPU total % Self CPU total CPU total % CPU total CPU time avg CPU Mem Self CPU Mem CUDA Mem Self CUDA Mem Number of Calls Input Shapes --------------------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- ----------------------------------- test_user_scope_alloc 60.58% 105.892us 93.42% 163.285us 163.285us 800 b 0 b 0 b 0 b 1 [] rand 10.53% 18.405us 32.83% 57.393us 57.393us 800 b 0 b 0 b 0 b 1 [] empty 1.77% 3.092us 1.77% 3.092us 3.092us 800 b 800 b 0 b 0 b 1 [] uniform_ 19.64% 34.325us 20.54% 35.896us 35.896us 0 b 0 b 0 b 0 b 1 [[10, 10]] is_complex 0.90% 1.571us 0.90% 1.571us 1.571us 0 b 0 b 0 b 0 b 1 [[10, 10]] test_user_scope_dealloc 6.58% 11.508us 6.58% 11.508us 11.508us -800 b -800 b 0 b 0 b 1 [] --------------------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- ----------------------------------- Self CPU time total: 174.793us Running CUDA test --------------------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- 
----------------------------------- Name Self CPU total % Self CPU total CPU total % CPU total CPU time avg CPU Mem Self CPU Mem CUDA Mem Self CUDA Mem Number of Calls Input Shapes --------------------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- ----------------------------------- test_user_scope_alloc 29.37% 86.836us 93.05% 275.143us 275.143us 0 b -800 b 1.00 Kb 0 b 1 [] to 7.42% 21.939us 51.31% 151.703us 151.703us 0 b 0 b 1.00 Kb 0 b 1 [[10, 10]] empty_strided 6.19% 18.295us 6.19% 18.295us 18.295us 0 b 0 b 1.00 Kb 1.00 Kb 1 [] rand 4.50% 13.316us 12.38% 36.604us 36.604us 800 b 0 b 0 b 0 b 1 [] empty 0.83% 2.456us 0.83% 2.456us 2.456us 800 b 800 b 0 b 0 b 1 [] uniform_ 6.44% 19.044us 7.05% 20.832us 20.832us 0 b 0 b 0 b 0 b 1 [[10, 10]] is_complex 0.60% 1.788us 0.60% 1.788us 1.788us 0 b 0 b 0 b 0 b 1 [[10, 10]] copy_ 37.70% 111.469us 37.70% 111.469us 111.469us 0 b 0 b 0 b 0 b 1 [[10, 10], [10, 10]] test_user_scope_dealloc 6.95% 20.544us 6.95% 20.544us 20.544us 0 b 0 b -1.00 Kb -1.00 Kb 1 [] --------------------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- ----------------------------------- Self CPU time total: 295.687us Running MKLDNN test --------------------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- ----------------------------------- Name Self CPU total % Self CPU total CPU total % CPU total CPU time avg CPU Mem Self CPU Mem CUDA Mem Self CUDA Mem Number of Calls Input Shapes --------------------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- ----------------------------------- 
test_user_scope_alloc 34.23% 43.503us 88.57% 112.550us 112.550us 400 b -400 b 0 b 0 b 1 [] rand 8.00% 10.167us 18.34% 23.302us 23.302us 400 b 0 b 0 b 0 b 1 [] empty 2.22% 2.815us 2.22% 2.815us 2.815us 400 b 400 b 0 b 0 b 1 [] to_mkldnn 35.16% 44.675us 36.00% 45.745us 45.745us 400 b 400 b 0 b 0 b 1 [[10, 10]] uniform_ 7.24% 9.198us 8.12% 10.320us 10.320us 0 b 0 b 0 b 0 b 1 [[10, 10]] is_complex 0.88% 1.122us 0.88% 1.122us 1.122us 0 b 0 b 0 b 0 b 1 [[10, 10]] contiguous 0.84% 1.070us 0.84% 1.070us 1.070us 0 b 0 b 0 b 0 b 1 [[10, 10]] test_user_scope_dealloc 11.43% 14.525us 11.43% 14.525us 14.525us -400 b -400 b 0 b 0 b 1 [] --------------------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- ----------------------------------- Self CPU time total: 127.075us . ---------------------------------------------------------------------- Ran 1 test in 1.571s OK ``` Differential Revision: [D21384248](https://our.internmc.facebook.com/intern/diff/D21384248) [ghstack-poisoned]
Summary: Adding memory usage into profiler table output Test Plan: ``` BUILD_BINARY=1 USE_BLAS=MKL USE_MKLDNN=1 USE_CUDA=1 python setup.py develop install ``` ``` $ python benchmarks/profiler_benchmark/resnet_memory_profiler.py output: https://gist.github.com/ilia-cher/3f37d54c3b2afb24d6776858e6860f69 ``` ``` $ python test/test_autograd.py TestAutograd.test_memory_profiler Couldn't download test skip set, leaving all tests enabled... Running CPU test --------------------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- ----------------------------------- Name Self CPU total % Self CPU total CPU total % CPU total CPU time avg CPU Mem Self CPU Mem CUDA Mem Self CUDA Mem Number of Calls Input Shapes --------------------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- ----------------------------------- test_user_scope_alloc 60.58% 105.892us 93.42% 163.285us 163.285us 800 b 0 b 0 b 0 b 1 [] rand 10.53% 18.405us 32.83% 57.393us 57.393us 800 b 0 b 0 b 0 b 1 [] empty 1.77% 3.092us 1.77% 3.092us 3.092us 800 b 800 b 0 b 0 b 1 [] uniform_ 19.64% 34.325us 20.54% 35.896us 35.896us 0 b 0 b 0 b 0 b 1 [[10, 10]] is_complex 0.90% 1.571us 0.90% 1.571us 1.571us 0 b 0 b 0 b 0 b 1 [[10, 10]] test_user_scope_dealloc 6.58% 11.508us 6.58% 11.508us 11.508us -800 b -800 b 0 b 0 b 1 [] --------------------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- ----------------------------------- Self CPU time total: 174.793us Running CUDA test --------------------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- 
----------------------------------- Name Self CPU total % Self CPU total CPU total % CPU total CPU time avg CPU Mem Self CPU Mem CUDA Mem Self CUDA Mem Number of Calls Input Shapes --------------------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- ----------------------------------- test_user_scope_alloc 29.37% 86.836us 93.05% 275.143us 275.143us 0 b -800 b 1.00 Kb 0 b 1 [] to 7.42% 21.939us 51.31% 151.703us 151.703us 0 b 0 b 1.00 Kb 0 b 1 [[10, 10]] empty_strided 6.19% 18.295us 6.19% 18.295us 18.295us 0 b 0 b 1.00 Kb 1.00 Kb 1 [] rand 4.50% 13.316us 12.38% 36.604us 36.604us 800 b 0 b 0 b 0 b 1 [] empty 0.83% 2.456us 0.83% 2.456us 2.456us 800 b 800 b 0 b 0 b 1 [] uniform_ 6.44% 19.044us 7.05% 20.832us 20.832us 0 b 0 b 0 b 0 b 1 [[10, 10]] is_complex 0.60% 1.788us 0.60% 1.788us 1.788us 0 b 0 b 0 b 0 b 1 [[10, 10]] copy_ 37.70% 111.469us 37.70% 111.469us 111.469us 0 b 0 b 0 b 0 b 1 [[10, 10], [10, 10]] test_user_scope_dealloc 6.95% 20.544us 6.95% 20.544us 20.544us 0 b 0 b -1.00 Kb -1.00 Kb 1 [] --------------------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- ----------------------------------- Self CPU time total: 295.687us Running MKLDNN test --------------------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- ----------------------------------- Name Self CPU total % Self CPU total CPU total % CPU total CPU time avg CPU Mem Self CPU Mem CUDA Mem Self CUDA Mem Number of Calls Input Shapes --------------------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- ----------------------------------- 
test_user_scope_alloc 34.23% 43.503us 88.57% 112.550us 112.550us 400 b -400 b 0 b 0 b 1 [] rand 8.00% 10.167us 18.34% 23.302us 23.302us 400 b 0 b 0 b 0 b 1 [] empty 2.22% 2.815us 2.22% 2.815us 2.815us 400 b 400 b 0 b 0 b 1 [] to_mkldnn 35.16% 44.675us 36.00% 45.745us 45.745us 400 b 400 b 0 b 0 b 1 [[10, 10]] uniform_ 7.24% 9.198us 8.12% 10.320us 10.320us 0 b 0 b 0 b 0 b 1 [[10, 10]] is_complex 0.88% 1.122us 0.88% 1.122us 1.122us 0 b 0 b 0 b 0 b 1 [[10, 10]] contiguous 0.84% 1.070us 0.84% 1.070us 1.070us 0 b 0 b 0 b 0 b 1 [[10, 10]] test_user_scope_dealloc 11.43% 14.525us 11.43% 14.525us 14.525us -400 b -400 b 0 b 0 b 1 [] --------------------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- ----------------------------------- Self CPU time total: 127.075us . ---------------------------------------------------------------------- Ran 1 test in 1.571s OK ``` Differential Revision: [D21384248](https://our.internmc.facebook.com/intern/diff/D21384248) [ghstack-poisoned]
Summary:
Adding memory usage into profiler table output

Test Plan:
```
BUILD_BINARY=1 USE_BLAS=MKL USE_MKLDNN=1 USE_CUDA=1 python setup.py develop install
```

```
$ python benchmarks/profiler_benchmark/resnet_memory_profiler.py

output: https://gist.github.com/ilia-cher/3f37d54c3b2afb24d6776858e6860f69
```

```
$ python test/test_autograd.py TestAutograd.test_memory_profiler
Couldn't download test skip set, leaving all tests enabled...
Running CPU test
--------------------------- ---------------- ---------------- ---------------- ---------------- ---------------- ---------------- ---------------- ---------------- ---------------- ----------------  --------------------
Name                        Self CPU total %   Self CPU total      CPU total %        CPU total     CPU time avg          CPU Mem     Self CPU Mem         CUDA Mem    Self CUDA Mem  Number of Calls  Input Shapes
--------------------------- ---------------- ---------------- ---------------- ---------------- ---------------- ---------------- ---------------- ---------------- ---------------- ----------------  --------------------
test_user_scope_alloc                 60.58%        105.892us           93.42%        163.285us        163.285us            800 b              0 b              0 b              0 b                1  []
rand                                  10.53%         18.405us           32.83%         57.393us         57.393us            800 b              0 b              0 b              0 b                1  []
empty                                  1.77%          3.092us            1.77%          3.092us          3.092us            800 b            800 b              0 b              0 b                1  []
uniform_                              19.64%         34.325us           20.54%         35.896us         35.896us              0 b              0 b              0 b              0 b                1  [[10, 10]]
is_complex                             0.90%          1.571us            0.90%          1.571us          1.571us              0 b              0 b              0 b              0 b                1  [[10, 10]]
test_user_scope_dealloc                6.58%         11.508us            6.58%         11.508us         11.508us           -800 b           -800 b              0 b              0 b                1  []
--------------------------- ---------------- ---------------- ---------------- ---------------- ---------------- ---------------- ---------------- ---------------- ---------------- ----------------  --------------------
Self CPU time total: 174.793us
Running CUDA test
--------------------------- ---------------- ---------------- ---------------- ---------------- ---------------- ---------------- ---------------- ---------------- ---------------- ----------------  --------------------
Name                        Self CPU total %   Self CPU total      CPU total %        CPU total     CPU time avg          CPU Mem     Self CPU Mem         CUDA Mem    Self CUDA Mem  Number of Calls  Input Shapes
--------------------------- ---------------- ---------------- ---------------- ---------------- ---------------- ---------------- ---------------- ---------------- ---------------- ----------------  --------------------
test_user_scope_alloc                 29.37%         86.836us           93.05%        275.143us        275.143us              0 b           -800 b          1.00 Kb              0 b                1  []
to                                     7.42%         21.939us           51.31%        151.703us        151.703us              0 b              0 b          1.00 Kb              0 b                1  [[10, 10]]
empty_strided                          6.19%         18.295us            6.19%         18.295us         18.295us              0 b              0 b          1.00 Kb          1.00 Kb                1  []
rand                                   4.50%         13.316us           12.38%         36.604us         36.604us            800 b              0 b              0 b              0 b                1  []
empty                                  0.83%          2.456us            0.83%          2.456us          2.456us            800 b            800 b              0 b              0 b                1  []
uniform_                               6.44%         19.044us            7.05%         20.832us         20.832us              0 b              0 b              0 b              0 b                1  [[10, 10]]
is_complex                             0.60%          1.788us            0.60%          1.788us          1.788us              0 b              0 b              0 b              0 b                1  [[10, 10]]
copy_                                 37.70%        111.469us           37.70%        111.469us        111.469us              0 b              0 b              0 b              0 b                1  [[10, 10], [10, 10]]
test_user_scope_dealloc                6.95%         20.544us            6.95%         20.544us         20.544us              0 b              0 b         -1.00 Kb         -1.00 Kb                1  []
--------------------------- ---------------- ---------------- ---------------- ---------------- ---------------- ---------------- ---------------- ---------------- ---------------- ----------------  --------------------
Self CPU time total: 295.687us
Running MKLDNN test
--------------------------- ---------------- ---------------- ---------------- ---------------- ---------------- ---------------- ---------------- ---------------- ---------------- ----------------  --------------------
Name                        Self CPU total %   Self CPU total      CPU total %        CPU total     CPU time avg          CPU Mem     Self CPU Mem         CUDA Mem    Self CUDA Mem  Number of Calls  Input Shapes
--------------------------- ---------------- ---------------- ---------------- ---------------- ---------------- ---------------- ---------------- ---------------- ---------------- ----------------  --------------------
test_user_scope_alloc                 34.23%         43.503us           88.57%        112.550us        112.550us            400 b           -400 b              0 b              0 b                1  []
rand                                   8.00%         10.167us           18.34%         23.302us         23.302us            400 b              0 b              0 b              0 b                1  []
empty                                  2.22%          2.815us            2.22%          2.815us          2.815us            400 b            400 b              0 b              0 b                1  []
to_mkldnn                             35.16%         44.675us           36.00%         45.745us         45.745us            400 b            400 b              0 b              0 b                1  [[10, 10]]
uniform_                               7.24%          9.198us            8.12%         10.320us         10.320us              0 b              0 b              0 b              0 b                1  [[10, 10]]
is_complex                             0.88%          1.122us            0.88%          1.122us          1.122us              0 b              0 b              0 b              0 b                1  [[10, 10]]
contiguous                             0.84%          1.070us            0.84%          1.070us          1.070us              0 b              0 b              0 b              0 b                1  [[10, 10]]
test_user_scope_dealloc               11.43%         14.525us           11.43%         14.525us         14.525us           -400 b           -400 b              0 b              0 b                1  []
--------------------------- ---------------- ---------------- ---------------- ---------------- ---------------- ---------------- ---------------- ---------------- ---------------- ----------------  --------------------
Self CPU time total: 127.075us
.
----------------------------------------------------------------------
Ran 1 test in 1.571s

OK
```

Differential Revision: [D21384248](https://our.internmc.facebook.com/intern/diff/D21384248)

[ghstack-poisoned]
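In these tables `CPU Mem` appears to be inclusive (it counts allocations made by nested ops), while `Self CPU Mem` counts only allocations made directly inside that range: in the CPU run, `empty` allocates the 800 b itself, and the same 800 b shows up as `CPU Mem` for its callers `rand` and `test_user_scope_alloc`. The sketch below reproduces that roll-up for the CPU run with a hypothetical `Event` tree (the nesting is inferred from the CPU-time columns, e.g. 105.892us + 57.393us = 163.285us); it is an illustration, not the profiler's implementation.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Event:
    """Hypothetical profiled range; not a PyTorch class."""

    name: str
    self_cpu_mem: int  # bytes allocated (+) or freed (-) directly inside this range
    children: List["Event"] = field(default_factory=list)

    def cpu_mem(self) -> int:
        # Inclusive memory: this range's own bytes plus everything recorded in nested ranges.
        return self.self_cpu_mem + sum(child.cpu_mem() for child in self.children)


def walk(event: Event, depth: int = 0) -> Iterator[Tuple[int, Event]]:
    # Pre-order traversal so parents print before their children.
    yield depth, event
    for child in event.children:
        yield from walk(child, depth + 1)


# Nesting inferred from the "Running CPU test" timings:
# test_user_scope_alloc -> rand -> {empty, uniform_ -> is_complex}
alloc = Event("test_user_scope_alloc", 0, [
    Event("rand", 0, [
        Event("empty", 800),
        Event("uniform_", 0, [Event("is_complex", 0)]),
    ]),
])
dealloc = Event("test_user_scope_dealloc", -800)

for root in (alloc, dealloc):
    for depth, ev in walk(root):
        label = "  " * depth + ev.name
        # Columns: inclusive CPU Mem, then Self CPU Mem.
        print(f"{label:<27}{ev.cpu_mem():>8} b{ev.self_cpu_mem:>8} b")
```

The alloc and dealloc scopes net to zero, matching the +800 b / -800 b pair in the CPU table.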
dzhulgakov left a comment
Great! I think it's good to go! (and nice catch on overlapping ranges)
There's also some allocator stuff left in TH, but I'm not sure what it is and I didn't trace where it gets called:
https://github.com/pytorch/pytorch/blob/40265e2d663cc0027cffa6e80ee1ec67d467ca00/aten/src/TH/THAllocator.cpp
```
// An interface for reporting thread local memory usage
// per device
struct C10_API MemoryReportingInfoBase : public c10::DebugInfoBase {
  MemoryReportingInfoBase() {}
```
As discussed, move it to the .cpp file to avoid potentially duplicated symbols.
From what I understand (thanks @ezyang), this is a special allocator used to allocate tensors in shared memory for inter-process communication; I guess we can add memory reporting there too.
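The `MemoryReportingInfoBase` struct under review is described as an interface for reporting thread-local memory usage per device, and the thread above suggests the shared-memory allocator in `THAllocator.cpp` could report through it as well. A rough Python model of that idea, assuming the allocator looks up a per-thread reporter and passes it signed byte counts on every allocation and free; `MemoryReporter`, `ReportingAllocator`, `set_reporter`, and `current_reporter` are hypothetical names, not PyTorch APIs.

```python
import threading
from collections import defaultdict
from typing import Dict, Optional

_tls = threading.local()


class MemoryReporter:
    """Hypothetical per-thread sink for memory usage, keyed by device name."""

    def __init__(self) -> None:
        self.bytes_by_device: Dict[str, int] = defaultdict(int)

    def report_memory_usage(self, nbytes: int, device: str) -> None:
        # Positive nbytes for allocations, negative for frees.
        self.bytes_by_device[device] += nbytes


def set_reporter(reporter: Optional[MemoryReporter]) -> None:
    # Install (or clear) the reporter for the current thread only.
    _tls.reporter = reporter


def current_reporter() -> Optional[MemoryReporter]:
    return getattr(_tls, "reporter", None)


class ReportingAllocator:
    """Hypothetical allocator that notifies the thread-local reporter, if any."""

    def __init__(self, device: str) -> None:
        self.device = device
        self._sizes: Dict[int, int] = {}
        self._next_handle = 0

    def allocate(self, nbytes: int) -> int:
        handle = self._next_handle
        self._next_handle += 1
        self._sizes[handle] = nbytes
        reporter = current_reporter()
        if reporter is not None:
            reporter.report_memory_usage(nbytes, self.device)
        return handle

    def free(self, handle: int) -> None:
        nbytes = self._sizes.pop(handle)
        reporter = current_reporter()
        if reporter is not None:
            reporter.report_memory_usage(-nbytes, self.device)


reporter = MemoryReporter()
set_reporter(reporter)
shm = ReportingAllocator("cpu")  # e.g. a shared-memory allocator could report the same way
h = shm.allocate(400)
print(dict(reporter.bytes_by_device))  # {'cpu': 400}
shm.free(h)
print(dict(reporter.bytes_by_device))  # {'cpu': 0}
set_reporter(None)
```

Because the reporter lives in thread-local storage, allocations made on a thread that never installed one are simply not counted.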
@ilia-cher merged this pull request in a94fb71.

This broke ROCm tests:

Thank you for the notification. In the future, if a new test for a new feature is breaking ROCm CI, can we have the developer(s) add the

I'm landing the fix: #38795

Have you noticed, though, that py3.6-clang7-rocmdeb-ubuntu16.04-test2 has been broken on trunk since at least May 16?

I don't think it was continuously broken; it was fixed on the 19th, then broken again 3 commits later :/
Summary:
Pull Request resolved: pytorch#37775

Adding memory usage into profiler table output

Test Plan:
BUILD_BINARY=1 USE_BLAS=MKL USE_MKLDNN=0 USE_CUDA=0 python setup.py develop install --cmake

```
import torch
import torchvision.models as models

model = models.resnet18()
inp = torch.randn(5, 3, 224, 224)

# Record per-op memory usage (and input shapes) during one forward pass.
with torch.autograd.profiler.profile(profile_memory=True, record_shapes=True) as prof:
    model(inp)

# Aggregate events by op name and input shape, largest CPU memory users first.
print(prof.key_averages(group_by_input_shape=True).table(sort_by="cpu_memory_usage", row_limit=15))
```

```
--------------------------- ---------------- ---------------- ---------------- ---------------- ---------------- ---------------- ----------------  -----------------------------------
Name                        Self CPU total %   Self CPU total      CPU total %        CPU total     CPU time avg    CPU Mem Total  Number of Calls  Input Shapes
--------------------------- ---------------- ---------------- ---------------- ---------------- ---------------- ---------------- ----------------  -----------------------------------
resize_                                0.37%        577.936us            0.37%        577.936us          9.796us        339.03 Mb               59  [[0]]
empty                                  0.69%          1.061ms            0.74%          1.139ms          5.556us         47.42 Mb              205  []
stride                                 0.00%          0.853us            0.00%          0.853us          0.853us         19.53 Kb                1  [[5, 1000]]
empty_strided                          0.01%         21.393us            0.02%         26.033us          5.207us            252 b                5  []
is_complex                             0.02%         37.425us            0.02%         37.425us          1.291us            208 b               29  [[]]
masked_select                          0.04%         55.333us            0.06%         93.616us         46.808us            120 b                2  [[30], [30]]
conv2d                                 0.01%         18.009us            9.62%         14.902ms         14.902ms              0 b                1  [[5, 3, 224, 224], [64, 3, 7, 7], [
convolution                            0.01%         12.436us            9.61%         14.884ms         14.884ms              0 b                1  [[5, 3, 224, 224], [64, 3, 7, 7], [
_convolution                           0.03%         52.381us            9.60%         14.871ms         14.871ms              0 b                1  [[5, 3, 224, 224], [64, 3, 7, 7], [
size                                   0.00%          5.429us            0.00%          5.429us          0.339us              0 b               16  [[5, 3, 224, 224]]
contiguous                             0.00%          1.934us            0.00%          1.934us          0.967us              0 b                2  [[5, 3, 224, 224]]
_convolution_nogroup                   0.02%         27.505us            9.57%         14.814ms         14.814ms              0 b                1  [[5, 3, 224, 224], [64, 3, 7, 7], [
_nnpack_available                      0.02%         34.267us            0.02%         34.267us          1.713us              0 b               20  []
thnn_conv2d                            0.01%         13.274us            9.54%         14.771ms         14.771ms              0 b                1  [[5, 3, 224, 224], [64, 3, 7, 7], [
thnn_conv2d_forward                    5.98%          9.264ms           19.02%         29.446ms         14.723ms              0 b                2  [[5, 3, 224, 224], [64, 3, 7, 7], [
--------------------------- ---------------- ---------------- ---------------- ---------------- ---------------- ---------------- ----------------  -----------------------------------
Self CPU time total: 154.855ms
```

Reviewed By: ngimel

Differential Revision: D21384248

Pulled By: ilia-cher

fbshipit-source-id: 31359cce2aa06f6255ed1ad8c60d03cb640bfec3
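The memory columns print signed byte counts with 1024-based units: `1.00 Kb` and `-1.00 Kb` in the CUDA run, `-800 b` for the CPU dealloc scope, and `19.53 Kb` for the `[[5, 1000]]` row above, which is consistent with a 5 * 1000 float32 tensor (20000 bytes). A minimal formatter that reproduces those strings, assuming 1024-byte thresholds and two decimal places; `format_memory` is an illustrative helper, not necessarily the profiler's own function.

```python
KB = 1024
MB = 1024 * KB
GB = 1024 * MB


def format_memory(nbytes: int) -> str:
    """Render a signed byte count like the profiler's memory columns (sketch)."""
    # Compare on abs() so frees (negative values) get the same units as allocations.
    if abs(nbytes) >= GB:
        return f"{nbytes / GB:.2f} Gb"
    if abs(nbytes) >= MB:
        return f"{nbytes / MB:.2f} Mb"
    if abs(nbytes) >= KB:
        return f"{nbytes / KB:.2f} Kb"
    return f"{nbytes} b"


for n in (0, 252, -800, 1024, -1024, 5 * 1000 * 4):
    print(format_memory(n))
```

Values under 1 Kb stay as raw byte counts, which is why small allocations such as `252 b` and `208 b` appear unscaled.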
Stack from ghstack:
Summary:
Adding memory usage into profiler table output
Test Plan:
Differential Revision: D21384248