CUDA Fuser instrumentation #324
#define FUSER_MACRO_CONCAT(a, b) FUSER_MACRO_CONCAT2(a, b)
#define FUSER_ANONYMOUS(prefix) FUSER_MACRO_CONCAT(prefix, __COUNTER__)

#define FUSER_PERF_SCOPE(name) \
Can we add a debug level to the markers?
That way we can easily turn markers of a certain level on or off.
Can you please elaborate? Do you mean levels for individual markers? I'm curious to understand the use cases you have in mind.
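To make the question concrete, here is a minimal sketch of what a level-gated scope marker could look like. It is not the PR's implementation: `SKETCH_PERF_SCOPE`, `LeveledScope`, `maxLevel()` and `records()` are hypothetical names, and the in-memory `records()` vector only stands in for whatever sink the real trace uses.

```cpp
#include <cassert>
#include <chrono>
#include <string>
#include <utility>
#include <vector>

// Hypothetical names throughout; none of these come from the PR.
namespace sketch {

// Markers with a level above this threshold are skipped at runtime.
inline int& maxLevel() {
  static int level = 1;
  return level;
}

// Collected (name, microseconds) pairs, standing in for a real trace sink.
inline std::vector<std::pair<std::string, double>>& records() {
  static std::vector<std::pair<std::string, double>> r;
  return r;
}

// RAII scope: records its elapsed time on destruction if its level is enabled.
class LeveledScope {
 public:
  LeveledScope(const char* name, int level)
      : name_(name),
        enabled_(level <= maxLevel()),
        start_(std::chrono::steady_clock::now()) {}

  ~LeveledScope() {
    if (!enabled_) {
      return;
    }
    const std::chrono::duration<double, std::micro> d =
        std::chrono::steady_clock::now() - start_;
    records().emplace_back(name_, d.count());
  }

  LeveledScope(const LeveledScope&) = delete;
  LeveledScope& operator=(const LeveledScope&) = delete;

 private:
  const char* name_;
  bool enabled_;
  std::chrono::steady_clock::time_point start_;
};

} // namespace sketch

// Same anonymous-variable trick as the macros in the diff: __COUNTER__
// (a common compiler extension) gives each scope object a unique name.
#define SKETCH_CONCAT2(a, b) a##b
#define SKETCH_CONCAT(a, b) SKETCH_CONCAT2(a, b)
#define SKETCH_ANONYMOUS(prefix) SKETCH_CONCAT(prefix, __COUNTER__)

#define SKETCH_PERF_SCOPE(name, level) \
  sketch::LeveledScope SKETCH_ANONYMOUS(sketch_scope_)(name, level)
```

With `sketch::maxLevel()` left at 1, `SKETCH_PERF_SCOPE("coarse", 1)` is recorded and `SKETCH_PERF_SCOPE("fine", 2)` is skipped. The level is checked once at scope entry, so changing the threshold mid-scope does not affect markers that are already open.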
jjsjann123 left a comment
LGTM, thanks for putting all the instrumentation there already.
#include <torch/csrc/jit/codegen/cuda/lower2device.h>

#include <c10/util/Optional.h>
#include <c10/util/flat_hash_map.h>
Don't see where we are using this.
leftover from a local experiment, removed (thanks for catching it)
(flat_hash_map is potentially a much faster alternative to std::unordered_map)
/build_*
.build_debug/*
.build_release/*
.build_profile/*
Any reason we are adding an entry to ignore?
it's useful for building RelWithDebInfo builds
Wasn't this supposed to be separated from a fuser upstream PR? Or, have we decided to sneak it in?
I have an upstream PR (pytorch#44399). It's approved and imported (although I'm not sure what "imported" means), but I can't merge it due to unrelated CI failures.
In the meantime, I have this change in multiple branches, but I'll try to clean it up so it will not show up in PRs (I'll remove it here as well)
void Trace::logEvent(char ph, const char* name, char sep) {
const std::chrono::duration<double> d = Clock::now() - start_timestamp_;
const double elapsed = d.count() * 1e6;
const unsigned int pid = 0;
pid and tid have not been looked up.
yes, they are just placeholders for now. I've added a TODO to add support for tracing multi-process and multi-threaded execution (which is not critical for us at this point, and it requires a bit of research to see whether PyTorch has any helpers for portable TIDs/PIDs)
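For reference, a rough sketch of how the placeholders might eventually be filled in using only the standard library and the platform's process-ID call. This assumes the output targets a Chrome-style trace event record, which the `ph` parameter suggests but the diff does not confirm. `currentPid`, `currentTid` and `formatEvent` are hypothetical helpers, not PyTorch APIs, and the exact field layout is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <functional>
#include <string>
#include <thread>

#if defined(_WIN32)
#include <process.h>
#else
#include <unistd.h>
#endif

// Hypothetical helpers; not part of the PR or of PyTorch.
namespace sketch {

// Process ID via the platform call (POSIX getpid, or _getpid on Windows).
inline unsigned int currentPid() {
#if defined(_WIN32)
  return static_cast<unsigned int>(_getpid());
#else
  return static_cast<unsigned int>(getpid());
#endif
}

// std::thread::id has no portable integer form, so hash it. The value is
// stable within a thread but is not the OS thread ID.
inline unsigned int currentTid() {
  const std::size_t h =
      std::hash<std::thread::id>{}(std::this_thread::get_id());
  return static_cast<unsigned int>(h);
}

// Formats one event record. Assumed layout: name, phase, pid, tid, and a
// timestamp in microseconds. The name is not JSON-escaped, and records
// longer than the buffer are truncated.
inline std::string formatEvent(
    char ph,
    const char* name,
    double ts_us,
    unsigned int pid,
    unsigned int tid) {
  char buf[256];
  std::snprintf(
      buf,
      sizeof(buf),
      "{ \"name\": \"%s\", \"ph\": \"%c\", \"pid\": %u, \"tid\": %u, \"ts\": %.0f }",
      name,
      ph,
      pid,
      tid,
      ts_us);
  return std::string(buf);
}

} // namespace sketch
```

A call such as `sketch::formatEvent('B', "lower", elapsed, sketch::currentPid(), sketch::currentTid())` would replace the hard-coded zeros. Hashing the thread ID keeps the sketch portable at the cost of IDs that do not match what a debugger or `top` reports.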
A prototype of lightweight Fuser instrumentation.