use new cuda kernel launch code in nvprof parsing (#35016)

xwang233 · facebook-github-bot · commit a5b5ea9852fd · 2020-03-20T08:23:52.000-07:00
Summary: This PR fixes #33986. The meanings of cbid 13 and 211 are documented in nvprof2json at https://github.com/ezyang/nvprof2json/blob/837c094852c9c5164344db7c19432da37d9a8b09/nvprof2json.py#L238 and https://github.com/ezyang/nvprof2json/blob/837c094852c9c5164344db7c19432da37d9a8b09/nvprof2json.py#L436, and in the header file at `/usr/local/cuda/extras/CUPTI/include/cupti_runtime_cbid.h`. See also [this Stack Overflow question](https://stackoverflow.com/questions/48552390/whats-the-difference-between-launching-with-an-api-call-vs-the-triple-chevron-s). I ran the profiling code from the issue on CUDA 9.2, and the cbid there is already 211. In case someone builds PyTorch against an older CUDA version, the assertion accepts both 13 and 211.

cc csarofeen ptrblck ezyang ngimel

Pull Request resolved: #35016
Differential Revision: D20550879
Pulled By: ezyang
fbshipit-source-id: 968efc5e1126f1dd31acc9f5f4463f351d8a4c4f
diff --git a/torch/autograd/profiler.py b/torch/autograd/profiler.py
@@ -787,7 +787,8 @@ def parse_nvprof_trace(path):
     unique = EnforceUnique()
     for row in conn.execute(kernel_query):
         unique.see(row['marker_id'], row['runtime_id'])
-        assert row['cbid'] == 13  # 13 == Launch
+        # 211 is cudaKernelLaunch for cuda >= 9.2; 13 is for older cuda versions
+        assert (row['cbid'] == 211) or (row['cbid'] == 13)
         evt = functions_map[row['marker_id']]
         evt.append_kernel(row['kernel_name'],
                           0,
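
The relaxed assertion in the hunk above can also be written as a set-membership check that reports the offending value when it fails. The following is a minimal sketch, not code from this PR: the names `KERNEL_LAUNCH_CBIDS` and `check_launch_cbid` are hypothetical, and the sample rows are plain dicts standing in for the rows returned by `conn.execute(kernel_query)`.

```python
# Hypothetical helper; not part of torch.autograd.profiler.
# cbid values as described in this commit: 13 is the launch callback on older
# CUDA versions, 211 is the kernel launch callback observed on CUDA 9.2.
KERNEL_LAUNCH_CBIDS = frozenset({13, 211})


def check_launch_cbid(cbid):
    """Return cbid unchanged, or raise ValueError if it is not a known launch id."""
    if cbid not in KERNEL_LAUNCH_CBIDS:
        raise ValueError(
            "unexpected cbid {}; expected one of {}".format(
                cbid, sorted(KERNEL_LAUNCH_CBIDS)
            )
        )
    return cbid


# Stand-in rows (assumption: real rows come from the nvprof SQLite trace).
rows = [{"cbid": 13}, {"cbid": 211}]
checked = [check_launch_cbid(row["cbid"]) for row in rows]
```

Keeping the accepted ids in one named set would make it easier to extend if a later CUDA release changes the callback id again, and the error message names the unexpected value, which a bare `assert` does not.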