
refactor profiling optional #47667

Closed
jjsjann123 wants to merge 15 commits into gh/jjsjann123/3/base from gh/jjsjann123/3/head

Conversation

@jjsjann123
Collaborator

@jjsjann123 jjsjann123 commented Nov 10, 2020

Stack from ghstack:

Differential Revision: D25255572

[ghstack-poisoned]
@dr-ci

dr-ci Bot commented Nov 10, 2020

💊 CI failures summary and remediations

As of commit 9ceca35 (more details on the Dr. CI page):


  • 2/2 failures possibly* introduced in this PR
    • 2/2 non-CircleCI failure(s)

Extra GitHub checks: 2 failed


This comment was automatically generated by Dr. CI.

Please report bugs/suggestions to the (internal) Dr. CI Users group.

@Krovatkin Krovatkin requested a review from eellison November 13, 2020 19:14
Contributor

@eellison eellison left a comment


This looks good. I'm a little wary of making changes just because of how easy it is to break something on the backward pass without knowing that you did it. Do you mind running a set of benchmarks to validate that this doesn't regress anything?

Ideally that infrastructure would be automated, but not yet...

}

void InsertProfileNodesForSpecializeAutogradZero(ProfilingRecord* pr) {
insertProfileNodesForSpecializeAutogradZero_(
Contributor


nit: remove the `_` suffix, I don't think that follows the naming convention

auto copy = graph->copy();
runProfilingInsensitiveOptimizations(copy);
pr_ = ProfilingRecord::instrumentGraph(copy);
// TODO: order for now is important; we should make
Contributor


What was the context of this?

Collaborator Author


My bad on this one during rebase.
The comment should be added when we add a second pass to insert profile_ivalue nodes for nvfuser https://github.com/pytorch/pytorch/pull/47668/files#diff-4b6a31b50172ed0fce36027a913fab9e59e496954264ad5e021324c74355a9c2R498-R500

cc'ing @Krovatkin

Contributor


@eellison
If I remember correctly, running InsertProfileNodesForCUDAFuser first will insert a profile node on a list input (which is also a graph input) to aten::grad_sum_to_size, which would in turn break the logic in InsertProfileNodesForSpecializeAutogradZero that checks for a direct use of the graph input by aten::grad_sum_to_size: https://github.com/pytorch/pytorch/pull/47667/files#diff-d606514ce9e63ab23356691f3f6df0616222cd50b8a41239add4601f14a53392R16

Ideally, it would be nice if different passes could insert profile nodes in whatever order, but I wasn't quite convinced of the merit of that approach, so I just decided to document it.
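The ordering constraint described above can be sketched with a toy model (purely hypothetical names and data structures, not the real JIT IR): once one pass rewires a graph input through a profile node, another pass's "direct use" check no longer matches.

```python
# Toy model (hypothetical, NOT the real JIT IR) of the ordering issue:
# SpecializeAutogradZero looks for a *direct* use of a graph input by
# aten::_grad_sum_to_size; a profiling pass that runs first rewires the
# input through a prim::profile_ivalue node and breaks that check.

class Node:
    def __init__(self, kind, inputs=()):
        self.kind = kind
        self.inputs = list(inputs)

def direct_uses(value, nodes):
    """Nodes that consume `value` directly (no profile node in between)."""
    return [n for n in nodes if any(i is value for i in n.inputs)]

def insert_profile(value, nodes):
    """Rewire every use of `value` through a new profile node."""
    prof = Node("prim::profile_ivalue", [value])
    for n in nodes:
        n.inputs = [prof if i is value else i for i in n.inputs]
    nodes.insert(0, prof)
    return prof

graph_input = object()  # stands in for the optional list argument
nodes = [Node("aten::_grad_sum_to_size", [object(), graph_input])]

# Before profiling: the direct-use check succeeds.
assert direct_uses(graph_input, nodes)[0].kind == "aten::_grad_sum_to_size"

# After another pass profiles the same input, the only direct use is the
# profile node itself, so the AutogradZero check no longer matches.
insert_profile(graph_input, nodes)
assert direct_uses(graph_input, nodes)[0].kind == "prim::profile_ivalue"
```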

for (auto it = b->nodes().begin(); it != b->nodes().end(); it++) {
if (it->kind() == prim::profile || it->kind() == prim::profile_optional) {
if (it->kind() == prim::profile || it->kind() == prim::profile_optional ||
it->kind() == prim::profile_ivalue) {
Contributor


Could we remove all references to profile_optional ?

Collaborator Author


Will do.

@jjsjann123
Collaborator Author

I quickly tried this with a very simple graph:

def t(x: torch.Tensor, y: torch.Tensor):
  o = torch.relu(x)
  o = o + y
  return o

I tried inputs x/y with and without broadcast, and I can verify the behavior and performance of _grad_sum_to_size.

The refactor in this PR only changes the symbol from profile_optional to profile_ivalue, while the logic remains largely the same.

jjsjann123 added a commit to jjsjann123/pytorch that referenced this pull request Nov 20, 2020
ghstack-source-id: bc60515
Pull Request resolved: pytorch#47667
@jjsjann123 jjsjann123 requested a review from eellison November 20, 2020 16:58
Contributor

@eellison eellison left a comment


Looks good, thanks for doing this!

Mostly small comments/nits, but I would like to retain the original flow of specialize_autogradzero before landing. For context, there have been very seemingly-benign changes that completely destroyed perf for hard-to-detect reasons, e.g. #43900.

The backwards graph doesn't have great visibility or testing, and I would prefer not to make changes unrelated to this PR.


// we only check prim::profile and not prim::profile_optional bc profile
// is inserted on each use, while profile_optional is inserted on the def
// we only check prim::profile and not prim::profile_ivalue bc profile
Contributor


The reason profile_optional was inserted on def and not per use is because we were just checking whether or not a value is None, which stays constant for each use.

General ivalue profiling doesn't have the same property, since you could profile mutable values like List which change over each use. So, this comment will be out of date once we have more uses of profile_ivalue.

I also wonder whether there's some value in sharing the implementation but having a different node name. It depends, I guess, on how many different kinds of ivalue profiling we have.
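The per-def vs per-use distinction described above can be illustrated with a minimal sketch (hypothetical, not JIT code): whether a value is None cannot change between its uses, so a None check can be profiled once at the definition; a mutable value such as a list can change in place between uses, so it must be profiled at every use.

```python
# Minimal sketch (hypothetical, not JIT code): a None check is stable
# across uses, but a profiled property of a mutable value is not.

observed_at_def = []
observed_per_use = []

def profile(value, log):
    # Record the profiled property (here, the list length).
    log.append(len(value))

xs = [1]

# Profiling once at the definition records len == 1 ...
profile(xs, observed_at_def)

# ... but an in-place mutation between uses changes what a later use
# sees, so only per-use profiling captures the real value at that use.
xs.append(2)
profile(xs, observed_per_use)

assert observed_at_def == [1]
assert observed_per_use == [2]
```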

Collaborator Author


I agree that we need to update the comment (and the checks as well?)

sharing implementation, but having a different node name
I think this is suggesting to have profile_ivalue as an extension to profile_optional, which is similar to what I have in the original PR prior to Nick's refactor.

IIUC, the refactor is to have a unique profile_ivalue node, where we would hopefully work towards a coherent profiling strategy so that different passes relying on profiled ivalues could work consistently with each other. Granted, this is not what we have today; at the very least, the order of passes has to be arranged in a certain way to work, and it's tricky to cover all the corner cases.

I think the suggested middle ground of having different node names could, for now, make it easier to prevent profiling nodes inserted by different passes from breaching into each other's territory. We can merge them at a later point once we figure out a universal profiling strategy.

Contributor


@eellison

I also wonder whether there's some value in sharing the implementation but having a different node name. It depends, I guess, on how many different kinds of ivalue profiling we have.

at the very least, it would give better readability and a slightly easier check, since you only have to look for a specific symbol (rather than a symbol + attribute)

On the other hand, the way we are adding profiling nodes, switching to different node names would result in a small explosion of copy-pasted classes in ir.h.
Also, if we decide in the future to merge profiling node chains into a single profiling node with all the attributes, I think the benefits of having just one single node on a use outweigh having differently named nodes.
I would hold off on this approach.

@jjsjann123 @eellison

Granted, this is not what we have today; at the very least, the order of passes has to be arranged in a certain way to work

I would like to emphasize that this is in no way a limitation of the current design! We could update how SpecializeAutogradZero inserts profiling nodes to account for the possibility that some other pass might have already inserted profiling nodes in between. Given that we only have 3 passes that insert profiling nodes, I decided to just document the order.
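One way to make such a pass order-independent, as suggested above, is to look through any interposed profiling nodes when following a value to its consumer. A minimal sketch, with hypothetical helper names and a deliberately simplified single-use IR:

```python
# Sketch (hypothetical helper, NOT the real JIT API, simplified to
# single-use values): a pass like SpecializeAutogradZero could tolerate
# profile nodes inserted earlier by other passes by "looking through"
# any chain of profiling nodes between a value and its real consumer.

PROFILE_KINDS = {"prim::profile", "prim::profile_ivalue"}

class Value:
    def __init__(self):
        self.use = None  # the single consuming node, for simplicity

class Node:
    def __init__(self, kind, inp):
        self.kind = kind
        self.output = Value()
        inp.use = self  # register this node as the use of its input

def skip_profiles(value):
    """Walk past any chain of profiling nodes to the first real consumer."""
    node = value.use
    while node is not None and node.kind in PROFILE_KINDS:
        node = node.output.use
    return node

inp = Value()
p1 = Node("prim::profile_ivalue", inp)  # inserted by, say, a fuser pass
p2 = Node("prim::profile", p1.output)   # inserted by another pass
consumer = Node("aten::_grad_sum_to_size", p2.output)

# The direct-use check sees the real consumer despite the profile chain.
assert skip_profiles(inp) is consumer
```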

Contributor


If we can figure out a way to have different node names without a ton of copy-pasta, that's the best. I don't really know how to do that; maybe someone really good at C++ can.

Also, if we decide in the future to merge profiling node chains into a single profiling node with all the attributes, I think the benefits of having just one single node on a use outweigh having differently named nodes.

I don't think this is coming anytime soon (if at all). I don't think we should over-consider this.

Granted, this is not what we have today; at the very least, the order of passes has to be arranged in a certain way to work

I think most things should be profiled per-use, which composes pretty well. specialize_autograd_zero is pretty unique.

}
}

static Node* getUseWithAttribute(Value* inp, Symbol kind) {
Contributor


nit: getUseWithAttribute doesn't fully reflect the implementation, since we're also filtering out non-profile_ivalue uses

checks.push_back(check->output());
profiled_none_.insert(inp);
}
removeProfiledOptionalUses(inp);
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not convinced that moving removeProfiledOptionalUses(inp) to after guardSpecializations doesn't have some edge-case effect somewhere. Could we keep the code here as it was, to minimize changes?

Contributor

@Krovatkin Krovatkin Dec 3, 2020


I think there were multiple reasons why I moved it after guardSpecializations. One of them was that there could be arbitrarily long chains of prim::profile nodes on some graph inputs. Even for a very simple case I considered, where both TE and SpecializeAutogradZero want to profile the optional list argument to aten::grad_sum_to_size, rewriting removeProfiledOptionalUses looked very ugly and I hit some edge cases there.
In contrast, walking the graph and removing nodes after guardSpecializations is done looked much simpler, and it also looked quite similar to what we do elsewhere.

I tested this with fastrnns and our CI tests and didn't hit any regressions or crashes.
If you feel very strongly that removeProfiledOptionalUses(inp) should stay in this specific spot, we could try figuring it out.

Contributor


If "rewriting removeProfiledOptionalUses looked very ugly", that is a good sign that pass writing is becoming more difficult as a result of the overloaded profile_ivalue. cc @jjsjann123

I don't feel that strongly, but I do prefer it as it was. IMO, when there isn't a large overhead of doing so, we should be cleaning the graph as we are manipulating it in the pass. It's the same reason I added the `// Remove the dead original graph` step: it was extremely confusing to look at the pass midway through and see three copies of the graph. Why was it there? Every time I debugged the pass I had to re-remind myself and mentally edit out the dead copy of the graph. Similarly, after we've made the decision whether or not to specialize on None values, why not remove the nodes right there? They are just visual noise for the duration of guardSpecializations.

The logic right now isn't quite right, either.

def foo(x):
   x = profile_ivalue[None=1, Present=1]
   return x

Here we won't insert any specializations, so the profile_ivalue nodes will remain in the graph. Granted, this isn't likely to happen, but it's sort of a fallout of delaying the logic past when it needs to run.

I am maybe more opinionated about this than I should be because in the past I've had to deal with passes that had incorrect IR (like when we never updated the requires_grad property) or other passes that similarly left things in confusing states throughout the pass.
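The leftover-node concern in the example above can be sketched as follows (hypothetical names, toy tuple-based representation, not the real pass): a cleanup that is deferred and tied to guard insertion leaves the profile node of an unspecialized value behind.

```python
# Toy sketch (hypothetical names) of the leftover-node concern: if
# profile-node removal is deferred and only runs for values that got a
# specialization guard, a value that was profiled but never guarded on
# keeps its dead profile node in the graph.

nodes = [("prim::profile_ivalue", "x")]  # x profiled as None=1, Present=1
guarded = set()  # no specialization guard was inserted for "x"

def deferred_cleanup(nodes, guarded):
    # Remove only the profile nodes whose value we actually guarded on.
    return [(kind, v) for kind, v in nodes
            if not (kind == "prim::profile_ivalue" and v in guarded)]

leftover = deferred_cleanup(nodes, guarded)
assert leftover == [("prim::profile_ivalue", "x")]  # dead node survives
```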

jjsjann123 added a commit to csarofeen/pytorch that referenced this pull request Dec 19, 2020
…or nvfuser.

We tried to go around PR pytorch#47667 refactor profiling optional, since upstream is
still working on it at this time.
jjsjann123 added a commit to csarofeen/pytorch that referenced this pull request Jan 5, 2021
* This is a cherry-pick from upstream PR pytorch#47668 profile ivalue for nvfuser. We tried to go around PR pytorch#47667 refactor profiling optional, since upstream is still working on it at this time.

createConditionalConstant supports profile ivalue including bool, int_list and size
New guard to check conditional constant at runtime
size_eq_guard op to facilitate comparison of dynamic sizes
sum_to_size & _grad_sum_to_size added in integration
Contributor

@Krovatkin Krovatkin left a comment


Looks like we addressed @eellison's feedback (e.g. moving removeProfiledOptionalUses back to its original spot and double-checking we aren't seeing any obvious regressions on fastrnns).

@jjsjann123 could you please fix the clang-format failures and rebase this PR, and I'll try to merge it?

@jjsjann123
Collaborator Author

clang-format seems to be broken; I'll try rebasing again tomorrow to see if someone has fixed it.

@jjsjann123
Collaborator Author

Hmm, even though my local clang-format is still failing... CI magically works :)
But the PR apparently is giving me merge conflicts.. :(

@jjsjann123
Collaborator Author

The clang-tidy error is not in code modified by this PR. Should I try to tidy it up here? The rocm failure doesn't seem to be relevant.

@Krovatkin Feel free to yell at me to fix clang-tidy, otherwise, this one is good to go.

@Krovatkin
Contributor

@jjsjann123 that should be fine. @eellison wanted to take one last look, he should be getting to it shortly.

@eellison
Contributor

LGTM !

@facebook-github-bot
Contributor

@Krovatkin merged this pull request in dd1c2a0.

@facebook-github-bot facebook-github-bot deleted the gh/jjsjann123/3/head branch January 26, 2021 15:21
laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request Apr 24, 2026
Summary: Pull Request resolved: pytorch#47667

Test Plan: Imported from OSS

Reviewed By: anjali411, ngimel

Differential Revision: D25255572

Pulled By: Krovatkin

fbshipit-source-id: d0152c9ef5b1994e27be9888bcb123dca3ecd88f

Labels

cla signed, Merged, oncall: jit, open source
