
ProfileIValue PR #585

Merged
jjsjann123 merged 14 commits into 20_12_3_devel from sum_to_size_PR_2 on Jan 5, 2021

Conversation

Collaborator

@jjsjann123 jjsjann123 commented Dec 21, 2020

  1. createConditionalConstant supports profiled IValues, including bool, int_list, and size
  2. New guard to check conditional constants at runtime
  3. size_eq_guard op to facilitate comparison of dynamic sizes
  4. sum_to_size & _grad_sum_to_size added in the integration
  • profile size added;
  • sum_to_size and _grad_sum_to_size are working properly;
  • Added a test for backwards;
  • Addressed upstream review comments

@jjsjann123 jjsjann123 changed the title [WIP] Sum to size pr 2 ProfileIValue PR Dec 29, 2020
Comment thread test/test_jit_cuda_fuser.py Outdated
jit_o = t_jit(x, y)
# (TODO) check executed kernel, should extend autograd.profiler to fused
# kernels
torch.cuda.profiler.start()
Collaborator

Did you mean to leave in CUDA profiling?

Collaborator Author

Don't profiler start/stop no-op when we don't run under nvprof/nsys?
But this is a good catch. I was using this to verify the kernel executed. There's no good reason to keep them; I'll remove them.

const int64_t leading_dims = root.size() - sum_to_size.size();

// Generate reduction axes for leading dims
std::vector<int> reduce_dims(leading_dims);
Collaborator

It looks like you might be mixing int64_t and int, is that okay? Is this a situation where size_t cannot be used?

Collaborator Author

std::vector is needed because of our signature here:

const std::vector<int>& reduction_axes,

I think int is pretty safe, given that we are dealing with dimensions here rather than sizes, so it's unlikely we'd be dealing with anything larger than 8.
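A minimal sketch of the pattern under discussion (names here are illustrative, not from the PR): the leading-dim count naturally comes out as int64_t because tensor ranks come from size(), but the downstream signature takes std::vector<int>. An explicit cast documents the narrowing, which is safe under the assumption above that ranks stay tiny:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper mirroring the discussed snippet: build the list of
// leading reduction axes. Assumes root_rank >= target_rank. The int64_t ->
// int narrowing is made explicit; it is safe because tensor ranks are
// small (typically <= 8).
std::vector<int> leading_reduce_dims(int64_t root_rank, int64_t target_rank) {
    const int64_t leading_dims = root_rank - target_rank;
    std::vector<int> reduce_dims(static_cast<std::size_t>(leading_dims));
    for (int64_t i = 0; i < leading_dims; ++i) {
        reduce_dims[static_cast<std::size_t>(i)] = static_cast<int>(i);
    }
    return reduce_dims;
}
```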

Comment thread test/test_jit_cuda_fuser.py Outdated
)[0].graph
FileCheck().check(FUSION_GUARD).run(bwd_graph)

# update shape
Collaborator

What is the second shape testing?

Collaborator Author

This is mostly trying to verify that we still can handle dynamic shape without recompilation.
I'll put a comment there.

Node* in_const =
subgraph.createClone(input->node(), [](Value*) -> Value* {
throw std::runtime_error("unexpected input");
subgraph.createClone(input->node(), [&](Value* v) -> Value* {
Collaborator

Are you sure you want to capture the entire environment of the lambda function by reference?

Collaborator Author

  1. We are only capturing pointers;
  2. The lambda here is called before the function returns.

We should be good here.
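A toy sketch of why the [&] capture is safe in this situation (illustrative names, not from the PR): capture-by-reference is fine as long as the lambda is invoked before the enclosing scope ends, so every captured reference is still alive. Storing such a lambda for later invocation is what would dangle:

```cpp
// Capture-by-reference used immediately, inside the enclosing scope.
// All captured locals are still alive when the lambda runs, so there is
// no dangling-reference risk. Returning or storing `bump` for later use
// would be the unsafe variant.
int use_immediately() {
    int local = 41;
    auto bump = [&]() { return local + 1; };  // refs valid: called right away
    return bump();
}
```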

graph->createClone(n->input(1)->node(), [](Value*) -> Value* {
throw std::runtime_error("unexpected input");
});
const auto map_inputs = [&](Value* v) -> Value* {
Collaborator

Another place where the entire environment is captured...

Collaborator Author

ditto.

fusion->removeInput(offset);

// step b. remove the extra dependency inside fusion;
for (auto use : fusion_graph->inputs()[offset]->uses()) {
Collaborator

Another possible place where use should maybe be a reference?
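The general point behind this comment, as a small sketch (the Use type here is a stand-in, not the PR's actual class): iterating a container of non-trivial elements with plain auto copies each element, while const auto& (or auto& when mutating) binds directly to the stored object:

```cpp
#include <cstddef>
#include <vector>

// Illustrative stand-in for a non-trivial element type.
struct Use {
    std::vector<int> payload;
};

// `const auto&` in the range-for binds to each stored element without
// copying it; writing `auto use` instead would copy the payload vector
// on every iteration.
int count_uses(const std::vector<Use>& uses) {
    int n = 0;
    for (const auto& use : uses) {  // reference: no per-element copy
        n += static_cast<int>(use.payload.size());
    }
    return n;
}
```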

Comment thread torch/csrc/jit/codegen/cuda/interface.h Outdated
void (*fn_run_n_s_)(const Node*, Stack&) = nullptr;
void (*fn_fuse_graph_)(std::shared_ptr<Graph>&) = nullptr;
bool (*fn_can_fuse_n_)(const Node*) = nullptr;
void (*fn_insert_profile_inodes_)(ProfilingRecord* pr) = nullptr;
Collaborator

This might be an extreme nitpick but shouldn't the variable names not have the _ suffix given they are public data members? (You can choose to ignore :-) )

Collaborator Author

My interpretation of the Google style is that the suffix rule only differs between class and struct, but is irrelevant to access level. https://google.github.io/styleguide/cppguide.html#Variable_Names
I am aware that other style guidelines do add the suffix based on that. 🤷

Collaborator

The _ suffix is only for private data members (the Google style definition is a bit confusing since it assumes only public members on structs and only private ones on classes; the suffix reflects the access level, not struct vs. class).

const Node* node = n->input(1)->node();
// propagate profiled none through other profile_ivalue nodes;
while (!profiled_none_flag && node->kind() == prim::profile_ivalue) {
profiled_none_flag |= profiled_none_.count(node->input(0));
Collaborator

Are you sure you mean bitwise OR here? It looks like a logical OR.

Collaborator Author

Oops, good catch 😝
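A small sketch of the distinction the reviewer is pointing at (illustrative code, not from the PR): on a bool, flag |= cond() always evaluates the right-hand side, while flag = flag || cond() short-circuits once flag is already true. Both produce the same boolean result, but the evaluation cost differs:

```cpp
// Counts how many times the right-hand side runs under each form.
// `|=` has no short-circuit, so probe() runs even when flag is already
// true; `||` skips the second probe() entirely.
int or_assign_calls() {
    int calls = 0;
    auto probe = [&calls]() { ++calls; return true; };
    bool flag = true;
    flag |= probe();         // RHS always evaluated: calls becomes 1
    flag = flag || probe();  // short-circuit: probe() not called again
    return calls;
}
```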

Comment thread torch/csrc/jit/codegen/cuda/parser.cpp Outdated

namespace {

static const auto sizeAttr = Symbol::attr("profiled_size");
Collaborator

Isn't it redundant to use static in an anonymous namespace?
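For illustration (a generic sketch, not the PR's actual declaration): everything inside an unnamed namespace already has internal linkage, so the static keyword adds nothing there:

```cpp
#include <string>

namespace {
// Internal linkage already comes from the unnamed namespace, so writing
// `static` on this declaration would be redundant.
const std::string kAttrName = "profiled_size";
}  // namespace

std::string attr_name() { return kAttrName; }
```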

insertProfileIValue(pr, n, offset);
}

for (auto ib : n->blocks()) {
Collaborator

Are you sure you want to copy the block?

Collaborator Author

blocks() returns an at::ArrayRef<Block*> so it's a trivial copy.

Collaborator

@kevinstephano kevinstephano left a comment

LGTM

@jjsjann123 jjsjann123 merged commit 97d5d76 into 20_12_3_devel Jan 5, 2021
@csarofeen csarofeen deleted the sum_to_size_PR_2 branch June 9, 2021 13:40

3 participants