
Eager module attribution in profiler stack traces #48433

Merged
ilia-cher merged 4 commits into gh/ilia-cher/86/base from gh/ilia-cher/86/head on Dec 16, 2020

ilia-cher (Contributor) commented Nov 25, 2020

Stack from ghstack:

Summary:
Adds class names to profiler stack traces (eager mode only for now).

Test Plan:
python test/test_profiler.py -k test_module_attrib_eager
output: https://gist.github.com/ilia-cher/e988a43dc9a444ae8caa68f3e6b0a294
Example output:
```
----------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ---------------------------------------------------------------------------
                        Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg    # of Calls  Source Location
----------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ---------------------------------------------------------------------------
    aten::mkldnn_convolution        98.47%      30.425ms        99.07%      30.610ms      30.610ms             1  ...s/iliacher/pytorch/torch/nn/modules/conv.py(389): _conv_forward (Conv2d)
                                                                                                                  ...a/users/iliacher/pytorch/torch/nn/modules/conv.py(393): forward (Conv2d)
                                                                                                                  ...rs/iliacher/pytorch/torch/nn/modules/module.py(744): _call_impl (Conv2d)
                                                                                                                  test/test_profiler.py(172): forward (DummyModule_1)
                                                                                                                  ...cher/pytorch/torch/nn/modules/module.py(744): _call_impl (DummyModule_1)
                                                                                                                  test/test_profiler.py(180): forward (DummyModule_2)
                                                                                                                  ...cher/pytorch/torch/nn/modules/module.py(744): _call_impl (DummyModule_2)
                                                                                                                  test/test_profiler.py(185): test_module_attrib_eager (TestProfiler)
                                                                                                                  ...orch/lib/python3.8/unittest/case.py(633): _callTestMethod (TestProfiler)
                                                                                                                  ...da3/envs/pytorch/lib/python3.8/unittest/case.py(676): run (TestProfiler)
```
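To illustrate what the `(ClassName)` suffix in the Source Location column means, here is a minimal pure-Python sketch. It assumes, as a simplification, that a frame belongs to a method whenever its locals contain `self`, and it renders frames innermost-first in the same `file(line): func (ClassName)` shape. The helper names (`class_name_of`, `format_stack`) and the `Inner`/`Outer` classes are hypothetical; the PR's own implementation (excerpted in the review below) works on frame locals in C++, not with this code.

```python
import sys
from types import FrameType
from typing import List, Optional


def class_name_of(frame: FrameType) -> Optional[str]:
    # Assumption: a frame is a method frame when its locals bind `self`.
    self_obj = frame.f_locals.get("self")
    return type(self_obj).__name__ if self_obj is not None else None


def format_stack(frame: Optional[FrameType], max_entries: int = 10) -> List[str]:
    """Render frames innermost-first as 'file(line): func (ClassName)'."""
    entries: List[str] = []
    while frame is not None and len(entries) < max_entries:
        code = frame.f_code
        entry = f"{code.co_filename}({frame.f_lineno}): {code.co_name}"
        cls = class_name_of(frame)
        if cls is not None:
            entry += f" ({cls})"
        entries.append(entry)
        frame = frame.f_back
    return entries


class Inner:
    def forward(self):
        # Start the walk at this method's own frame.
        return format_stack(sys._getframe())


class Outer:
    def forward(self):
        return Inner().forward()


if __name__ == "__main__":
    for line in Outer().forward():
        print(line)
```

Because every frame with a `self` local is annotated, test-harness frames such as `unittest` methods also pick up a class name, which matches the `(TestProfiler)` entries in the output above.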

Differential Revision: D25174271

Summary:
Adding classnames into profiler stack traces (eager mode only atm)

Test Plan:
python test/test_profiler.py -k test_module_attrib_eager

[ghstack-poisoned]
ilia-cher pushed a commit that referenced this pull request Nov 25, 2020
Summary:
Adding classnames into profiler stack traces (eager mode only atm)

Test Plan:
python test/test_profiler.py -k test_module_attrib_eager

ghstack-source-id: ac55cd2
Pull Request resolved: #48433
facebook-github-bot added the "oncall: jit" label (Add this issue/PR to JIT oncall triage queue) Nov 25, 2020
dr-ci (Bot) commented Nov 25, 2020

💊 CI failures summary and remediations

As of commit 045c0c2 (more details on the Dr. CI page):


  • 8/8 failures possibly* introduced in this PR
    • 2/8 non-CircleCI failure(s)

🕵️ 6 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_linux_bionic_py3_6_clang9_test (1/6)

Step: "Run tests"

Nov 30 05:01:48 [E request_callback_no_python.cpp:636] Received error while processing request type 258: RuntimeError: Can not pickle torch.futures.Future
Nov 30 05:01:48 At: 
Nov 30 05:01:48   /opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/internal.py(120): serialize 
Nov 30 05:01:48   /opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/internal.py(172): serialize 
Nov 30 05:01:48  
Nov 30 05:01:48 [E request_callback_no_python.cpp:636] Received error while processing request type 258: RuntimeError: Can not pickle torch.futures.Future 
Nov 30 05:01:48  
Nov 30 05:01:48 At: 
Nov 30 05:01:48   /opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/internal.py(120): serialize 
Nov 30 05:01:48   /opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/internal.py(172): serialize 
Nov 30 05:01:48  
Nov 30 05:01:48 [E request_callback_no_python.cpp:636] Received error while processing request type 258: RuntimeError: Can not pickle torch.futures.Future 
Nov 30 05:01:48  
Nov 30 05:01:48 At: 
Nov 30 05:01:48   /opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/internal.py(120): serialize 
Nov 30 05:01:48   /opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/internal.py(172): serialize 
Nov 30 05:01:48  
Nov 30 05:01:48 [W tensorpipe_agent.cpp:504] RPC agent for worker3 encountered error when reading incoming request from worker0: EOF: end of file (this is expected to happen during shutdown) 
Nov 30 05:01:48 [W tensorpipe_agent.cpp:504] RPC agent for worker0 encountered error when reading incoming request from worker2: EOF: end of file (this is expected to happen during shutdown) 
Nov 30 05:01:48 ok (1.635s) 
Nov 30 05:01:50   test_return_future_remote (__main__.TensorPipeRpcTestWithSpawn) ... [W tensorpipe_agent.cpp:504] RPC agent for worker3 encountered error when reading incoming request from worker2: EOF: end of file (this is expected to happen during shutdown) 
Nov 30 05:01:50 [W tensorpipe_agent.cpp:504] RPC agent for worker3 encountered error when reading incoming request from worker0: EOF: end of file (this is expected to happen during shutdown) 

See CircleCI build pytorch_linux_bionic_py3_8_gcc9_coverage_test2 (2/6)

Step: "Run tests"

Nov 30 05:55:02 torch/autograd/profiler.py:1487: error: Unsupported operand types for - ("int" and "None") [operator]
Nov 30 05:55:02  
Nov 30 05:55:02 ====================================================================== 
Nov 30 05:55:02 FAIL [83.946s]: test_run_mypy (__main__.TestTypeHints) 
Nov 30 05:55:02 Runs mypy over all files specified in mypy.ini 
Nov 30 05:55:02 ---------------------------------------------------------------------- 
Nov 30 05:55:02 Traceback (most recent call last): 
Nov 30 05:55:02   File "test_type_hints.py", line 217, in test_run_mypy 
Nov 30 05:55:02     self.fail(f"mypy failed: {stdout} {stderr}") 
Nov 30 05:55:02 AssertionError: mypy failed: torch/autograd/profiler.py:1486: error: Unsupported operand types for > ("int" and "None")  [operator] 
Nov 30 05:55:02 torch/autograd/profiler.py:1486: note: Right operand is of type "Optional[int]" 
Nov 30 05:55:02 torch/autograd/profiler.py:1487: error: Unsupported operand types for - ("int" and "None")  [operator] 
Nov 30 05:55:02 torch/autograd/profiler.py:1487: note: Right operand is of type "Optional[int]" 
Nov 30 05:55:02 Found 2 errors in 1 file (checked 1153 source files) 
Nov 30 05:55:02   
Nov 30 05:55:02  
Nov 30 05:55:02 ---------------------------------------------------------------------- 
Nov 30 05:55:02 Ran 4 tests in 130.050s 
Nov 30 05:55:02  
Nov 30 05:55:02 FAILED (failures=1) 
Nov 30 05:55:02  
Nov 30 05:55:02 Generating XML reports... 

See CircleCI build pytorch_kineto_linux_xenial_cuda10_2_cudnn7_py3_gcc7_test (3/6)

Step: "Run tests"

Nov 30 07:44:24 [E request_callback_no_python.cpp:636] Received error while processing request type 258: RuntimeError: Can not pickle torch.futures.Future
Nov 30 07:44:24 At: 
Nov 30 07:44:24   /opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/internal.py(120): serialize 
Nov 30 07:44:24   /opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/internal.py(172): serialize 
Nov 30 07:44:24  
Nov 30 07:44:24 [E request_callback_no_python.cpp:636] Received error while processing request type 258: RuntimeError: Can not pickle torch.futures.Future 
Nov 30 07:44:24  
Nov 30 07:44:24 At: 
Nov 30 07:44:24   /opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/internal.py(120): serialize 
Nov 30 07:44:24   /opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/internal.py(172): serialize 
Nov 30 07:44:24  
Nov 30 07:44:24 [E request_callback_no_python.cpp:636] Received error while processing request type 258: RuntimeError: Can not pickle torch.futures.Future 
Nov 30 07:44:24  
Nov 30 07:44:24 At: 
Nov 30 07:44:24   /opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/internal.py(120): serialize 
Nov 30 07:44:24   /opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/internal.py(172): serialize 
Nov 30 07:44:24  
Nov 30 07:44:24 [W tensorpipe_agent.cpp:504] RPC agent for worker3 encountered error when reading incoming request from worker0: EOF: end of file (this is expected to happen during shutdown) 
Nov 30 07:44:24 [W tensorpipe_agent.cpp:504] RPC agent for worker2 encountered error when reading incoming request from worker1: EOF: end of file (this is expected to happen during shutdown) 
Nov 30 07:44:24 [W tensorpipe_agent.cpp:504] RPC agent for worker0 encountered error when reading incoming request from worker2: EOF: end of file (this is expected to happen during shutdown) 
Nov 30 07:44:24 [W tensorpipe_agent.cpp:504] RPC agent for worker1 encountered error when reading incoming request from worker0: EOF: end of file (this is expected to happen during shutdown) 
Nov 30 07:44:24 ok (1.227s) 

See CircleCI build pytorch_linux_xenial_py3_clang5_asan_test2 (4/6)

Step: "Run tests"

Nov 30 04:22:28 SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /var/lib/jenkins/workspace/aten/src/ATen/Utils.cpp:15:3 in
Nov 30 04:22:28     #7 0x559a022b970b in PyEval_EvalCode /tmp/build/80754af9/python_1599604603603/work/Python/ceval.c:731 
Nov 30 04:22:28     #8 0x559a02339573 in run_mod /tmp/build/80754af9/python_1599604603603/work/Python/pythonrun.c:1025 
Nov 30 04:22:28     #9 0x559a0233960c in PyRun_StringFlags /tmp/build/80754af9/python_1599604603603/work/Python/pythonrun.c:949 
Nov 30 04:22:28     #10 0x559a0233966e in PyRun_SimpleStringFlags /tmp/build/80754af9/python_1599604603603/work/Python/pythonrun.c:445 
Nov 30 04:22:28     #11 0x559a0233d472 in run_command /tmp/build/80754af9/python_1599604603603/work/Modules/main.c:301 
Nov 30 04:22:28     #12 0x559a0233d472 in Py_Main /tmp/build/80754af9/python_1599604603603/work/Modules/main.c:749 
Nov 30 04:22:28     #13 0x559a0220743d in main /tmp/build/80754af9/python_1599604603603/work/Programs/python.c:69 
Nov 30 04:22:28     #14 0x7f7e28ded83f in __libc_start_main /build/glibc-e6zv40/glibc-2.23/csu/../csu/libc-start.c:291 
Nov 30 04:22:28     #15 0x559a022e6d0a in _start /home/rdonnelly/mc/conda-bld/compilers_linux-64_1534865402226/work/.build/src/glibc-2.12.2/csu/../sysdeps/x86_64/elf/start.S:103 
Nov 30 04:22:28  
Nov 30 04:22:28 SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /var/lib/jenkins/workspace/aten/src/ATen/Utils.cpp:15:3 in  
Nov 30 04:22:28 + retcode=1 
Nov 30 04:22:28 + set -e 
Nov 30 04:22:28 + return 1 
Nov 30 04:22:28 + [[ pytorch-linux-xenial-py3-clang5-asan-test2 == *-NO_AVX-* ]] 
Nov 30 04:22:28 + [[ pytorch-linux-xenial-py3-clang5-asan-test2 == *-NO_AVX2-* ]] 
Nov 30 04:22:28 + '[' -n https://github.com/pytorch/pytorch/pull/48433 ']' 
Nov 30 04:22:28 + [[ pytorch-linux-xenial-py3-clang5-asan-test2 != *coverage* ]] 
Nov 30 04:22:28 ++ mktemp 
Nov 30 04:22:28 + DETERMINE_FROM=/tmp/tmp.0tiRp2k3ut 
Nov 30 04:22:28 + file_diff_from_base /tmp/tmp.0tiRp2k3ut 

See CircleCI build pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_test2 (5/6)

Step: "Run tests"

Nov 30 05:48:33 torch/autograd/profiler.py:1487: error: Unsupported operand types for - ("int" and "None") [operator]
Nov 30 05:48:33   test_type_hint_examples (__main__.TestTypeHints) ... ok (19.923s) 
Nov 30 05:48:33  
Nov 30 05:48:33 ====================================================================== 
Nov 30 05:48:33 FAIL [62.026s]: test_run_mypy (__main__.TestTypeHints) 
Nov 30 05:48:33 ---------------------------------------------------------------------- 
Nov 30 05:48:33 Traceback (most recent call last): 
Nov 30 05:48:33   File "test_type_hints.py", line 217, in test_run_mypy 
Nov 30 05:48:33     self.fail(f"mypy failed: {stdout} {stderr}") 
Nov 30 05:48:33 AssertionError: mypy failed: torch/autograd/profiler.py:1486: error: Unsupported operand types for > ("int" and "None")  [operator] 
Nov 30 05:48:33 torch/autograd/profiler.py:1486: note: Right operand is of type "Optional[int]" 
Nov 30 05:48:33 torch/autograd/profiler.py:1487: error: Unsupported operand types for - ("int" and "None")  [operator] 
Nov 30 05:48:33 torch/autograd/profiler.py:1487: note: Right operand is of type "Optional[int]" 
Nov 30 05:48:33 Found 2 errors in 1 file (checked 1153 source files) 
Nov 30 05:48:33   
Nov 30 05:48:33  
Nov 30 05:48:33 ---------------------------------------------------------------------- 
Nov 30 05:48:33 Ran 4 tests in 96.820s 
Nov 30 05:48:33  
Nov 30 05:48:33 FAILED (failures=1) 
Nov 30 05:48:33  
Nov 30 05:48:33 Generating XML reports... 

See CircleCI build pytorch_linux_xenial_py3_6_gcc5_4_test (6/6)

Step: "Run tests"

Nov 30 05:14:18 [E request_callback_no_python.cpp:636] Received error while processing request type 258: RuntimeError: Can not pickle torch.futures.Future
Nov 30 05:14:18 At: 
Nov 30 05:14:18   /opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/internal.py(120): serialize 
Nov 30 05:14:18   /opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/internal.py(172): serialize 
Nov 30 05:14:18  
Nov 30 05:14:18 [E request_callback_no_python.cpp:636] Received error while processing request type 258: RuntimeError: Can not pickle torch.futures.Future 
Nov 30 05:14:18  
Nov 30 05:14:18 At: 
Nov 30 05:14:18   /opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/internal.py(120): serialize 
Nov 30 05:14:18   /opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/internal.py(172): serialize 
Nov 30 05:14:18  
Nov 30 05:14:18 [E request_callback_no_python.cpp:636] Received error while processing request type 258: RuntimeError: Can not pickle torch.futures.Future 
Nov 30 05:14:18  
Nov 30 05:14:18 At: 
Nov 30 05:14:18   /opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/internal.py(120): serialize 
Nov 30 05:14:18   /opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/internal.py(172): serialize 
Nov 30 05:14:18  
Nov 30 05:14:18 [W tensorpipe_agent.cpp:504] RPC agent for worker2 encountered error when reading incoming request from worker0: EOF: end of file (this is expected to happen during shutdown) 
Nov 30 05:14:18 [W tensorpipe_agent.cpp:504] RPC agent for worker0 encountered error when reading incoming request from worker1: EOF: end of file (this is expected to happen during shutdown) 
Nov 30 05:14:18 ok (1.836s) 
Nov 30 05:14:20   test_return_future_remote (__main__.TensorPipeRpcTestWithSpawn) ... [W tensorpipe_agent.cpp:504] RPC agent for worker1 encountered error when reading incoming request from worker0: EOF: end of file (this is expected to happen during shutdown) 
Nov 30 05:14:20 [W tensorpipe_agent.cpp:504] RPC agent for worker0 encountered error when reading incoming request from worker3: EOF: end of file (this is expected to happen during shutdown) 

Extra GitHub checks: 2 failed


This comment was automatically generated by Dr. CI.

This comment has been revised 26 times.
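The two mypy failures reported above (`torch/autograd/profiler.py` lines 1486 and 1487) say that an `int` is compared with, and has subtracted from it, a value typed `Optional[int]`. Those source lines are not shown here, so the sketch below is hypothetical (`truncate_stack` and `max_entries` are illustrative names, not the profiler's code); it only shows the usual fix, an explicit `is not None` check that lets mypy narrow `Optional[int]` to `int` before `>` and `-` are applied.

```python
from typing import List, Optional


def truncate_stack(stack: List[str], max_entries: Optional[int]) -> List[str]:
    # Without the `is not None` guard, mypy reports
    # 'Unsupported operand types for > ("int" and "None")' on the comparison
    # and the same error for "-" on the subtraction below.
    if max_entries is not None and len(stack) > max_entries:
        dropped = len(stack) - max_entries  # max_entries is narrowed to int here
        return stack[:max_entries] + [f"... ({dropped} more)"]
    return stack
```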

ilia-cher added 3 commits November 24, 2020 20:04
Summary:
Adding classnames into profiler stack traces (eager mode only atm)

Test Plan:
python test/test_profiler.py -k test_module_attrib_eager
output: https://gist.github.com/ilia-cher/e988a43dc9a444ae8caa68f3e6b0a294
e.g.:
```
----------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ---------------------------------------------------------------------------  
                        Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg    # of Calls  Source Location                                                              
----------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ---------------------------------------------------------------------------  
    aten::mkldnn_convolution        98.47%      30.425ms        99.07%      30.610ms      30.610ms             1  ...s/iliacher/pytorch/torch/nn/modules/conv.py(389): _conv_forward (Conv2d)  
                                                                                                                  ...a/users/iliacher/pytorch/torch/nn/modules/conv.py(393): forward (Conv2d)  
                                                                                                                  ...rs/iliacher/pytorch/torch/nn/modules/module.py(744): _call_impl (Conv2d)  
                                                                                                                  test/test_profiler.py(172): forward (DummyModule_1)                          
                                                                                                                  ...cher/pytorch/torch/nn/modules/module.py(744): _call_impl (DummyModule_1)  
                                                                                                                  test/test_profiler.py(180): forward (DummyModule_2)                          
                                                                                                                  ...cher/pytorch/torch/nn/modules/module.py(744): _call_impl (DummyModule_2)  
                                                                                                                  test/test_profiler.py(185): test_module_attrib_eager (TestProfiler)          
                                                                                                                  ...orch/lib/python3.8/unittest/case.py(633): _callTestMethod (TestProfiler)  
                                                                                                                  ...da3/envs/pytorch/lib/python3.8/unittest/case.py(676): run (TestProfiler)  
                                                                                                                                                                                               

```

[ghstack-poisoned]
Summary:
Adding classnames into profiler stack traces (eager mode only atm)

Test Plan:
python test/test_profiler.py -k test_module_attrib_eager
output: https://gist.github.com/ilia-cher/e988a43dc9a444ae8caa68f3e6b0a294
e.g.:
```
----------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ---------------------------------------------------------------------------  
                        Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg    # of Calls  Source Location                                                              
----------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ---------------------------------------------------------------------------  
    aten::mkldnn_convolution        98.47%      30.425ms        99.07%      30.610ms      30.610ms             1  ...s/iliacher/pytorch/torch/nn/modules/conv.py(389): _conv_forward (Conv2d)  
                                                                                                                  ...a/users/iliacher/pytorch/torch/nn/modules/conv.py(393): forward (Conv2d)  
                                                                                                                  ...rs/iliacher/pytorch/torch/nn/modules/module.py(744): _call_impl (Conv2d)  
                                                                                                                  test/test_profiler.py(172): forward (DummyModule_1)                          
                                                                                                                  ...cher/pytorch/torch/nn/modules/module.py(744): _call_impl (DummyModule_1)  
                                                                                                                  test/test_profiler.py(180): forward (DummyModule_2)                          
                                                                                                                  ...cher/pytorch/torch/nn/modules/module.py(744): _call_impl (DummyModule_2)  
                                                                                                                  test/test_profiler.py(185): test_module_attrib_eager (TestProfiler)          
                                                                                                                  ...orch/lib/python3.8/unittest/case.py(633): _callTestMethod (TestProfiler)  
                                                                                                                  ...da3/envs/pytorch/lib/python3.8/unittest/case.py(676): run (TestProfiler)  
                                                                                                                                                                                               

```

Differential Revision: [D25174271](https://our.internmc.facebook.com/intern/diff/D25174271)

[ghstack-poisoned]
Summary:
Adding classnames into profiler stack traces (eager mode only atm)

Test Plan:
python test/test_profiler.py -k test_module_attrib_eager
output: https://gist.github.com/ilia-cher/e988a43dc9a444ae8caa68f3e6b0a294
e.g.:
```
----------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ---------------------------------------------------------------------------  
                        Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg    # of Calls  Source Location                                                              
----------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ---------------------------------------------------------------------------  
    aten::mkldnn_convolution        98.47%      30.425ms        99.07%      30.610ms      30.610ms             1  ...s/iliacher/pytorch/torch/nn/modules/conv.py(389): _conv_forward (Conv2d)  
                                                                                                                  ...a/users/iliacher/pytorch/torch/nn/modules/conv.py(393): forward (Conv2d)  
                                                                                                                  ...rs/iliacher/pytorch/torch/nn/modules/module.py(744): _call_impl (Conv2d)  
                                                                                                                  test/test_profiler.py(172): forward (DummyModule_1)                          
                                                                                                                  ...cher/pytorch/torch/nn/modules/module.py(744): _call_impl (DummyModule_1)  
                                                                                                                  test/test_profiler.py(180): forward (DummyModule_2)                          
                                                                                                                  ...cher/pytorch/torch/nn/modules/module.py(744): _call_impl (DummyModule_2)  
                                                                                                                  test/test_profiler.py(185): test_module_attrib_eager (TestProfiler)          
                                                                                                                  ...orch/lib/python3.8/unittest/case.py(633): _callTestMethod (TestProfiler)  
                                                                                                                  ...da3/envs/pytorch/lib/python3.8/unittest/case.py(676): run (TestProfiler)  
                                                                                                                                                                                               

```

Differential Revision: [D25174271](https://our.internmc.facebook.com/intern/diff/D25174271)

[ghstack-poisoned]
ilia-cher (Contributor Author)

Will add usage of named_modules.

ilia-cher requested review from jamesr66a and suo November 30, 2020 23:18
ilia-cher (Contributor Author) commented Nov 30, 2020

cc @jamesr66a @suo regarding support for TorchScript

dzhulgakov (Collaborator) left a comment

Looks like the right direction. A separate stack trace with module names (not just class names) would be the most helpful, but that can happen in a follow-up PR.

```diff
 header_sep_lst = [""]
 line_length_lst = [-SPACING_SIZE]
-MAX_STACK_ENTRY = 5
+MAX_STACK_ENTRY = 10
```
Collaborator

Should we make it overridable too?
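One way to read the "overridable" suggestion is to keep `MAX_STACK_ENTRY` as the module-level default while letting callers pass their own limit. The sketch below assumes that design; `stack_column` and its `max_entries` parameter are hypothetical names, not part of the profiler's API.

```python
from typing import List, Optional

MAX_STACK_ENTRY = 10  # module-level default, as in the diff excerpt


def stack_column(stack: List[str], max_entries: Optional[int] = None) -> List[str]:
    # Use the caller's limit when given; otherwise fall back to the module default.
    # The default is read at call time, so reassigning MAX_STACK_ENTRY also takes effect.
    limit = MAX_STACK_ENTRY if max_entries is None else max_entries
    return stack[:limit]
```

Using `None` as the sentinel (rather than `max_entries: int = MAX_STACK_ENTRY`) avoids freezing the default at function-definition time.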

```cpp
if (PyDict_Check(locals)) {
  PyObject *key, *value;
  ssize_t pos = 0;
  while (PyDict_Next(locals, &pos, &key, &value)) {
```
Collaborator

There should be an API for looking up a key in the dict directly, instead of iterating over it.
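The difference the reviewer points at can be shown at the Python level. In the C API, `PyDict_GetItemString(locals, "self")` fetches one key directly (returning a borrowed reference, or NULL when the key is absent) instead of walking every entry with `PyDict_Next`. The sketch below assumes the key being searched for is `"self"`, which the excerpt does not show; the function names and the `Conv` class are hypothetical.

```python
import sys
from typing import Any, Mapping, Optional


def find_self_by_iteration(local_vars: Mapping[str, Any]) -> Optional[Any]:
    # O(n) scan over all locals, like the PyDict_Next loop in the excerpt.
    for key, value in local_vars.items():
        if key == "self":
            return value
    return None


def find_self_by_lookup(local_vars: Mapping[str, Any]) -> Optional[Any]:
    # Single hash lookup; the C API analogue is PyDict_GetItemString(locals, "self").
    return local_vars.get("self")


class Conv:
    def forward(self):
        # Snapshot this frame's locals into a plain dict.
        return dict(sys._getframe().f_locals)
```

Both functions return the same object; the lookup version simply avoids touching every local in frames that have many of them.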

Contributor Author

Will check.

```cpp
auto source = std::make_shared<Source>(funcname, filename, line);
std::string classname = "";

if (PyFrame_FastToLocalsWithError(frame) >= 0) {
```
Collaborator

You probably want to trigger this only when you encounter a frame that belongs to torch.nn.Module._call_impl. You can do that by looking up the code object for that method once and then comparing it with frame->f_code. That's how https://colab.research.google.com/drive/1NGRzvgzCZR6EEM3RVPvPAuMXfHsyC6eN?usp=sharing worked.
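The code-object comparison described above can be sketched in pure Python. To keep the example self-contained, `Module` below is a hypothetical stand-in for `torch.nn.Module` reduced to its call path; with PyTorch installed, the reference object would instead be taken from `torch.nn.Module._call_impl.__code__`, assuming that method exists in the installed version. The key point is that the code object is looked up once and each frame is then checked with a cheap identity comparison, so locals are only read for `_call_impl` frames.

```python
import sys
from types import CodeType, FrameType
from typing import List, Optional


class Module:
    """Hypothetical stand-in for torch.nn.Module, reduced to its call path."""

    def __call__(self, *args):
        return self._call_impl(*args)

    def _call_impl(self, *args):
        return self.forward(*args)


# Look up the code object once; every frame is then compared by identity.
CALL_IMPL_CODE: CodeType = Module._call_impl.__code__


def module_classes_on_stack(frame: Optional[FrameType]) -> List[str]:
    names: List[str] = []
    while frame is not None:
        if frame.f_code is CALL_IMPL_CODE:
            # Only _call_impl frames are inspected, so `self` is always a Module.
            names.append(type(frame.f_locals["self"]).__name__)
        frame = frame.f_back
    return names


class Leaf(Module):
    def forward(self):
        return module_classes_on_stack(sys._getframe())


class Wrapper(Module):
    def __init__(self):
        self.leaf = Leaf()

    def forward(self):
        return self.leaf()
```

Calling `Wrapper()()` walks from the innermost frame outward and reports `["Leaf", "Wrapper"]`, the module nesting without any frames from unrelated classes.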

Contributor Author

Yeah, I went with all class names, not only Module-specific ones; we could limit this to classes that inherit from Module.
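Limiting attribution to Module-derived classes amounts to an `isinstance` filter on the object bound to `self`. In this sketch `Module` is again a hypothetical stand-in for `torch.nn.Module`, and `attributed_class_name` is an illustrative helper, not the PR's code.

```python
from typing import Any, Optional


class Module:
    """Hypothetical stand-in for torch.nn.Module."""


def attributed_class_name(self_obj: Any, base: type = Module) -> Optional[str]:
    # Report a class name only for instances of `base` or its subclasses;
    # other objects that happen to be bound to `self` in a frame are skipped.
    if isinstance(self_obj, base):
        return type(self_obj).__name__
    return None


class Conv2d(Module):
    pass


class TestProfiler:
    pass
```

With this filter, frames such as `_call_impl (Conv2d)` keep their annotation, while test-harness frames like `run (TestProfiler)` would no longer carry a class name.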

ilia-cher merged commit 051719f into gh/ilia-cher/86/base Dec 16, 2020
ilia-cher pushed a commit that referenced this pull request Dec 16, 2020
Summary:
Adding classnames into profiler stack traces (eager mode only atm)

Test Plan:
python test/test_profiler.py -k test_module_attrib_eager

ghstack-source-id: 980381f
Pull Request resolved: #48433
ilia-cher (Contributor Author)

This shouldn't be marked as merged. Strangely, I only reordered the commits and reran ghstack (cc @ezyang).

ilia-cher (Contributor Author)

Will resend as a new PR.


Labels: cla signed, oncall: jit (Add this issue/PR to JIT oncall triage queue)


3 participants