Commit a5b5ea9

xwang233 authored and facebook-github-bot committed
use new cuda kernel launch code in nvprof parsing (#35016)
Summary: This PR fixes #33986.

The meaning of cbid 13 and 211 can be found here:
https://github.com/ezyang/nvprof2json/blob/837c094852c9c5164344db7c19432da37d9a8b09/nvprof2json.py#L238
https://github.com/ezyang/nvprof2json/blob/837c094852c9c5164344db7c19432da37d9a8b09/nvprof2json.py#L436
or in the header file at `/usr/local/cuda/extras/CUPTI/include/cupti_runtime_cbid.h`. See also [this Stack Overflow question](https://stackoverflow.com/questions/48552390/whats-the-difference-between-launching-with-an-api-call-vs-the-triple-chevron-s).

I also ran the profiling code from the issue on CUDA 9.2, where the cbid has already changed to 211. In case someone builds PyTorch against an older CUDA version, I kept both 13 and 211 in the assertion.

cc csarofeen ptrblck ezyang ngimel

Pull Request resolved: #35016
Differential Revision: D20550879
Pulled By: ezyang
fbshipit-source-id: 968efc5e1126f1dd31acc9f5f4463f351d8a4c4f
1 parent e327255 commit a5b5ea9

1 file changed

Lines changed: 2 additions & 1 deletion

torch/autograd/profiler.py

@@ -787,7 +787,8 @@ def parse_nvprof_trace(path):
     unique = EnforceUnique()
     for row in conn.execute(kernel_query):
         unique.see(row['marker_id'], row['runtime_id'])
-        assert row['cbid'] == 13  # 13 == Launch
+        # 211 is cudaKernelLaunch for cuda >= 9.2; 13 is for older cuda versions
+        assert (row['cbid'] == 211) or (row['cbid'] == 13)
         evt = functions_map[row['marker_id']]
         evt.append_kernel(row['kernel_name'],
                           0,
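
As an illustration only, and not part of this commit, the two-value check could also be written as a set-membership test. The sketch below is self-contained and makes several assumptions: `KERNEL_LAUNCH_CBIDS`, `is_kernel_launch`, `check_rows`, `make_demo_db` and the `demo_runtime` table are hypothetical names, and the in-memory table is a stand-in for the real nvprof trace schema. Only the cbid values 13 and 211 come from the change above.

```python
import sqlite3

# cbid values named in the commit: 13 (older CUDA versions) and
# 211 (cudaKernelLaunch, CUDA >= 9.2).
KERNEL_LAUNCH_CBIDS = frozenset({13, 211})


def is_kernel_launch(cbid):
    """Return True if cbid is one of the accepted kernel-launch callback ids."""
    return cbid in KERNEL_LAUNCH_CBIDS


def check_rows(conn, query):
    """Run query and verify that every row carries an accepted launch cbid.

    Returns the number of rows checked; raises ValueError on any other cbid.
    """
    conn.row_factory = sqlite3.Row  # allow row['cbid'] access, as in the diff
    checked = 0
    for row in conn.execute(query):
        if not is_kernel_launch(row['cbid']):
            raise ValueError(f"unexpected cbid {row['cbid']}")
        checked += 1
    return checked


def make_demo_db(cbids):
    """Build a hypothetical one-column table standing in for a real trace."""
    conn = sqlite3.connect(':memory:')
    conn.execute('CREATE TABLE demo_runtime (cbid INTEGER)')
    conn.executemany('INSERT INTO demo_runtime VALUES (?)',
                     [(c,) for c in cbids])
    return conn
```

Raising `ValueError` rather than using `assert` is a design choice of this sketch: the check then still runs when Python is started with `-O`, and the error message reports the offending cbid.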
