CUDA Fuser instrumentation #324

Merged
tlemo merged 29 commits into 20_8_18_devel from perf_instrumentation
Sep 17, 2020
Conversation

@tlemo (Collaborator) commented Aug 25, 2020

A prototype for a lightweight Fuser instrumentation.

#define FUSER_MACRO_CONCAT(a, b) FUSER_MACRO_CONCAT2(a, b)
#define FUSER_ANONYMOUS(prefix) FUSER_MACRO_CONCAT(prefix, __COUNTER__)

#define FUSER_PERF_SCOPE(name) \
Collaborator:

Can we attach a debug level to the markers, so we can easily turn markers at a given level on or off?

Collaborator (Author):

Can you please elaborate? Do you mean levels for individual markers? I'm curious to understand the use cases you have in mind.

@tlemo tlemo marked this pull request as ready for review September 15, 2020 20:39
@tlemo tlemo changed the title [WIP] Fuser instrumentation prototype CUDA Fuser instrumentation Sep 15, 2020
@jjsjann123 (Collaborator) left a comment:

LGTM, thanks for putting all the instrumentation in already.

#include <torch/csrc/jit/codegen/cuda/lower2device.h>

#include <c10/util/Optional.h>
#include <c10/util/flat_hash_map.h>
Collaborator:

I don't see where we are using this.

Collaborator (Author):

Leftover from a local experiment; removed (thanks for catching it).

(flat_hash_map is potentially a much faster alternative to std::unordered_map)

Comment thread .gitignore Outdated
/build_*
.build_debug/*
.build_release/*
.build_profile/*
Collaborator:

Any reason we are adding an entry to ignore?

Collaborator (Author):

It's useful for RelWithDebInfo builds.
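As a sketch of how such a directory matching the new `/build_*` ignore pattern might come about (hypothetical commands, assuming an out-of-tree CMake build of the repository):

```shell
# Create a separate build tree for an optimized-with-symbols build;
# the directory name matches the /build_* .gitignore entry above.
mkdir -p build_profile && cd build_profile
cmake -DCMAKE_BUILD_TYPE=RelWithDebInfo ..
```

RelWithDebInfo gives optimized binaries with debug symbols, which is what you want when profiling instrumentation like this PR adds.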

Collaborator:

Wasn't this supposed to be separated from a fuser upstream PR? Or, have we decided to sneak it in?

Collaborator (Author):

I have an upstream PR (pytorch#44399). It's approved and imported (although I'm not sure what "imported" implies), but I can't merge it due to unrelated CI failures.

In the meantime, I have this change in multiple branches, but I'll try to clean it up so it doesn't show up in PRs (I'll remove it here as well).

Comment thread torch/csrc/jit/codegen/cuda/instrumentation.h
void Trace::logEvent(char ph, const char* name, char sep) {
const std::chrono::duration<double> d = Clock::now() - start_timestamp_;
const double elapsed = d.count() * 1e6;
const unsigned int pid = 0;
Collaborator:

pid and tid have not been looked up.

Collaborator (Author):

Yes, they are just placeholders for now. I've added a TODO to support tracing multi-process and multi-threaded execution (which is not critical for us at this point, and it requires a bit of research to see whether we have any PyTorch helpers for portable TIDs/PIDs).

@tlemo tlemo merged commit ee6a20a into 20_8_18_devel Sep 17, 2020