ProfileIValue PR#585
Conversation
…or nvfuser. We tried to work around PR pytorch#47667 (refactor profiling optional), since upstream is still working on it at this time.
jit_o = t_jit(x, y)
# (TODO) check executed kernel, should extend autograd.profiler to fused
# kernels
torch.cuda.profiler.start()
Did you mean to leave in cuda profiling?

Don't `profiler.start()`/`stop()` no-op when we don't run with nvprof/nsys?
But this is a good catch. I was using this to verify that the kernel executed. There's no good reason to keep them; I'll remove them.
const int64_t leading_dims = root.size() - sum_to_size.size();

// Generate reduction axes for leading dims
std::vector<int> reduce_dims(leading_dims);
It looks like you might be mixing int64_t and int, is that okay? Is this a situation where size_t cannot be used?

std::vector&lt;int&gt; is needed because of our signature here:
pytorch/torch/csrc/jit/codegen/cuda/arith.h
Line 123 in 9038bb4
I think int is pretty safe, given that we are dealing with dimensions here rather than sizes, so it's unlikely we'll see anything larger than 8.
)[0].graph
FileCheck().check(FUSION_GUARD).run(bwd_graph)

# update shape
What is the second shape testing?

This is mostly trying to verify that we can still handle dynamic shapes without recompilation.
I'll add a comment there.
Node* in_const =
    subgraph.createClone(input->node(), [](Value*) -> Value* {
      throw std::runtime_error("unexpected input");
    subgraph.createClone(input->node(), [&](Value* v) -> Value* {
Are you sure you want to capture the entire environment of the lambda function by reference?

1. We are only capturing pointers; 2. the lambda here is called before the function returns.
We should be good here.
graph->createClone(n->input(1)->node(), [](Value*) -> Value* {
  throw std::runtime_error("unexpected input");
});
const auto map_inputs = [&](Value* v) -> Value* {
Another place where the entire environment is captured...
fusion->removeInput(offset);

// step b. remove the extra dependency inside fusion;
for (auto use : fusion_graph->inputs()[offset]->uses()) {
Another possible place where `use` should maybe be a reference?
void (*fn_run_n_s_)(const Node*, Stack&) = nullptr;
void (*fn_fuse_graph_)(std::shared_ptr<Graph>&) = nullptr;
bool (*fn_can_fuse_n_)(const Node*) = nullptr;
void (*fn_insert_profile_inodes_)(ProfilingRecord* pr) = nullptr;
This might be an extreme nitpick, but shouldn't the variable names not have the _ suffix, given that they are public data members? (You can choose to ignore :-) )

My interpretation of the Google style is that the suffix rule only differs between class and struct, and is irrelevant to access level. https://google.github.io/styleguide/cppguide.html#Variable_Names
I am aware that other style guidelines do apply the suffix based on that. 🤷

The _ suffix is only for private data members (the Google style definition is a bit confusing since it assumes only public members on structs and only private members on classes, but the suffix reflects the access level, not struct vs. class).
const Node* node = n->input(1)->node();
// propagate profiled none through other profile_ivalue nodes;
while (!profiled_none_flag && node->kind() == prim::profile_ivalue) {
  profiled_none_flag |= profiled_none_.count(node->input(0));
Are you sure you mean bitwise OR here? It looks like a logical OR.

Oops, good catch 😝
namespace {

static const auto sizeAttr = Symbol::attr("profiled_size");
Isn't it redundant to use static in an anonymous namespace?
  insertProfileIValue(pr, n, offset);
}

for (auto ib : n->blocks()) {
Are you sure you want to copy the block?

blocks() returns an at::ArrayRef&lt;Block*&gt;, so it's a trivial copy.
- `createConditionalConstant` supports profiled IValues, including bool, int_list and size
- `size_eq_guard` op to facilitate comparison of dynamic sizes
- `sum_to_size` & `_grad_sum_to_size` added in integration
- `sum_to_size` and `_grad_sum_to_size` are working properly