Skip to content

[NVFuser] make comparators obey strict weak ordering#75713

Closed
davidberard98 wants to merge 5 commits intogh/davidberard98/89/basefrom
gh/davidberard98/89/head
Closed

[NVFuser] make comparators obey strict weak ordering#75713
davidberard98 wants to merge 5 commits intogh/davidberard98/89/basefrom
gh/davidberard98/89/head

Conversation

@davidberard98
Copy link
Contributor

@davidberard98 davidberard98 commented Apr 12, 2022

Stack from ghstack:

Internally there is an at-runtime assert that was failing. The assert is checking to make sure that comparators obey strict weak ordering
https://en.cppreference.com/w/cpp/named_req/Compare

In particular comp(a, a) == false was failing on lower_loops.cpp.

Differential Revision: D35599653

Internally there is an at-runtime assert that was failing. The assert is checking to make sure that comparators obey strict weak ordering
https://en.cppreference.com/w/cpp/named_req/Compare

In particular `comp(a, a) == false` was failing.

[ghstack-poisoned]
@facebook-github-bot
Copy link
Contributor

facebook-github-bot commented Apr 12, 2022

🔗 Helpful links

💊 CI failures summary and remediations

As of commit ddd7dcd (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no failures yet. 💚 💚


This comment was automatically generated by Dr. CI (expand for details).

Please report bugs/suggestions to the (internal) Dr. CI Users group.

Click here to manually regenerate this comment.

@facebook-github-bot facebook-github-bot added the oncall: jit Add this issue/PR to JIT oncall triage queue label Apr 12, 2022
davidberard98 added a commit that referenced this pull request Apr 12, 2022
Internally there is an at-runtime assert that was failing. The assert is checking to make sure that comparators obey strict weak ordering
https://en.cppreference.com/w/cpp/named_req/Compare

In particular `comp(a, a) == false` was failing.

ghstack-source-id: 660898e
Pull Request resolved: #75713
@davidberard98
Copy link
Contributor Author

@davidberard98 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Internally there is an at-runtime assert that was failing. The assert is checking to make sure that comparators obey strict weak ordering
https://en.cppreference.com/w/cpp/named_req/Compare

In particular `comp(a, a) == false` was failing.

Differential Revision: [D35599653](https://our.internmc.facebook.com/intern/diff/D35599653)

[ghstack-poisoned]
davidberard98 added a commit that referenced this pull request Apr 13, 2022
Internally there is an at-runtime assert that was failing. The assert is checking to make sure that comparators obey strict weak ordering
https://en.cppreference.com/w/cpp/named_req/Compare

In particular `comp(a, a) == false` was failing.

ghstack-source-id: e3cfd10
Pull Request resolved: #75713
@davidberard98
Copy link
Contributor Author

@davidberard98 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

const auto& dependencies_1 = concrete_id_dependencies_.at(concrete_id_1);
// if id1 depends on id0 it means id0 is inside id1, so id1 < id0
return !dependencies_1.count(concrete_id_0);
if (dependencies_1.count(concrete_id_0)) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't look right... We can only return false after this point. In the old code, we could return true if dependencies_1.count(concrete_id_0) == false

Copy link
Contributor Author

@davidberard98 davidberard98 Apr 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is my understanding, but lmk if something seems wrong here:

  • this looks like it's basically trying to do a toposort, where, if a depends on b, then we want a to go before b in the final sorted order. afaik, the typical comparator for this is "if a path a->b exists, then comp(a, b) = true. Otherwise comp(a, b) = false". Or in this case, if a maps to a set containing b, then return comp(a, b) = true. In particular, if there's no relationship between a and b, then comp(a, b) and comp(b, a) should both be false (unless you have some other way to break ties in this case)

if this the type of sort I was describing above, then dependencies_1.count(concrete_id_0) == false is a case where we don't necessarily have any dependence between dep_1 and dep_0; in other words, 0 -> 1 (dep_0 maps to dep_1) implies we should return true; and 0 -> 1 implies 1 -/-> 0; but 1 -/-> 0 doesn't imply 0 -> 1, so 1 -/-> 0 doesn't necessarily mean we should return true.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds very reasonable to me (without knowing much about the actual implementation).

I'd argue in that instance, the entire block inside could just be removed, since it only returns false inside and that is the default behavior after the true block.

if (concrete_id_dependencies_.find(concrete_id_1) !=
concrete_id_dependencies_.end()) {
const auto& dependencies_1 = concrete_id_dependencies_.at(concrete_id_1);
// if id1 depends on id0 it means id0 is inside id1, so id1 < id0
if (dependencies_1.count(concrete_id_0)) {
return false;
}
}

Having said that, I'm waiting on someone who actually knows what this does to respond. cc'ing @csarofeen @naoyam

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, that's true - I can delete those blocks if we decide this change is okay

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @davidberard98. I believe the fix makes sense and is indeed a nice catch.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great, thanks for the help reviewing @naoyam @jjsjann123

I removed the false blocks like @jjsjann123 suggested

const auto& dependencies_1 = concrete_id_dependencies_.at(concrete_id_1);
// if id1 depends on id0 it means id1 is inside id0, so id0 < id1
return dependencies_1.count(concrete_id_0);
if (dependencies_1.count(concrete_id_0)) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, the first block and the second block are checking the same thing from different perspective and that's what strict weak ordering checks as well. So even though it looks slightly strange from the code diff, they are still correct.

I'm stamping it.

@jjsjann123
Copy link
Collaborator

Fort this check comp(a, a) == false

Maybe we can add a quick short cut where we just return false when both things are pointing to the same value?! cc'ing @csarofeen

Internally there is an at-runtime assert that was failing. The assert is checking to make sure that comparators obey strict weak ordering
https://en.cppreference.com/w/cpp/named_req/Compare

In particular `comp(a, a) == false` was failing on lower_loops.cpp.

Differential Revision: [D35599653](https://our.internmc.facebook.com/intern/diff/D35599653)

[ghstack-poisoned]
davidberard98 added a commit that referenced this pull request Apr 14, 2022
Internally there is an at-runtime assert that was failing. The assert is checking to make sure that comparators obey strict weak ordering
https://en.cppreference.com/w/cpp/named_req/Compare

In particular `comp(a, a) == false` was failing.

ghstack-source-id: 9f8b1a6
Pull Request resolved: #75713
Copy link
Collaborator

@jjsjann123 jjsjann123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

const auto& dependencies_1 = concrete_id_dependencies_.at(concrete_id_1);
// if id1 depends on id0 it means id1 is inside id0, so id0 < id1
return dependencies_1.count(concrete_id_0);
if (dependencies_1.count(concrete_id_0)) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, the first block and the second block are checking the same thing from different perspective and that's what strict weak ordering checks as well. So even though it looks slightly strange from the code diff, they are still correct.

I'm stamping it.

@csarofeen
Copy link
Contributor

Thanks @davidberard98 let me check this quickly today and will get back to you on the changes.

@csarofeen
Copy link
Contributor

csarofeen commented Apr 14, 2022

Thanks again, @davidberard98. I was just trying to confirm some properties of this sorting. So what they're actually doing is a "topo sort" of "loop nests". I naively assumed when I wrote that code that there was a single unique ordering of loop nests, but in fact it makes sense that in certain kernels that's not the case, and multiple orderings can be correct. As long as both orderings are consistent (perfectly reversed actually), then we should be fine. concrete_id_1 and concrete_id_2 in both of these blocks should have the same exact information, we just want sorting order to be exactly reversed.

So the block in lower_expr_sort the old logic was:

  if(id0 dependent on id1)
    id0 < id1
  else if(id1 dependent on id0)
    id0 >= id1
  else
    id0 < id1

and in lower_loops:

  if(id0 dependent on id1)
    id0 >= id1
  else if(id1 dependent on id0)
    id0 < id1
  else
    id0 < id1

With your updated logic I believe it's in lower_expr_sort:

  if(id0 dependent on id1)
    id0 < id1
  else
    id0 >= id1

and in lower_loops:

  if(id0 dependent on id1)
    id0 >= id1
  else
    id0 < id1

So a few things. I agree there's an issue with unstable sorting. What I think we should do is in lower_loops put exactly what's in lower_expr_sort and reverse the sorted list used in lower_loops.cpp. Does that make sense @davidberard98 and @naoyam?

@naoyam
Copy link

naoyam commented Apr 14, 2022

Reversing the order generated in lower_expr_sort seems better as it removes the logic duplication.

Re:

and in lower_loops:

  if(id0 dependent on id1)
    id0 >= id1
  else
    id0 < id1

I think what the PR does is to check if id1 depends on id0, and only in that case returns true.

Internally there is an at-runtime assert that was failing. The assert is checking to make sure that comparators obey strict weak ordering
https://en.cppreference.com/w/cpp/named_req/Compare

In particular `comp(a, a) == false` was failing on lower_loops.cpp.

Differential Revision: [D35599653](https://our.internmc.facebook.com/intern/diff/D35599653)

[ghstack-poisoned]
@davidberard98
Copy link
Contributor Author

@davidberard98 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.


const std::unordered_map<IterDomain*, std::unordered_set<IterDomain*>>&
concrete_id_dependencies_;
const ComputeAtMap compute_at_map_;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this can be a const reference.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whoops, meant to do that - thanks for noticing this!

@naoyam
Copy link

naoyam commented Apr 14, 2022

Thanks again @davidberard98 . The update looks good to me. Just added a minor comment.

Internally there is an at-runtime assert that was failing. The assert is checking to make sure that comparators obey strict weak ordering
https://en.cppreference.com/w/cpp/named_req/Compare

In particular `comp(a, a) == false` was failing on lower_loops.cpp.

Differential Revision: [D35599653](https://our.internmc.facebook.com/intern/diff/D35599653)

[ghstack-poisoned]
davidberard98 added a commit that referenced this pull request Apr 14, 2022
Internally there is an at-runtime assert that was failing. The assert is checking to make sure that comparators obey strict weak ordering
https://en.cppreference.com/w/cpp/named_req/Compare

In particular `comp(a, a) == false` was failing.

ghstack-source-id: e3b44f5
Pull Request resolved: #75713
Copy link
Contributor

@csarofeen csarofeen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks so much for the help!

@davidberard98
Copy link
Contributor Author

@davidberard98 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@davidberard98
Copy link
Contributor Author

@pytorchmergebot merge this

@github-actions
Copy link
Contributor

Hey @davidberard98.
You've committed this PR, but it does not have both a 'release notes: ...' and 'topics: ...' label. Please add one of each to the PR. The 'release notes: ...' label should represent the part of PyTorch that this PR changes (fx, autograd, distributed, etc) and the 'topics: ...' label should represent the kind of PR it is (not user facing, new feature, bug fix, perf improvement, etc). The list of valid labels can be found here for the 'release notes: ...' and here for the 'topics: ...'.
For changes that are 'topic: not user facing' there is no need for a release notes label.

@davidberard98 davidberard98 added the topic: not user facing topic category label Apr 14, 2022
facebook-github-bot pushed a commit that referenced this pull request Apr 16, 2022
Summary:
Internally there is an at-runtime assert that was failing. The assert is checking to make sure that comparators obey strict weak ordering
https://en.cppreference.com/w/cpp/named_req/Compare

In particular `comp(a, a) == false` was failing.

Pull Request resolved: #75713

Approved by: https://github.com/jjsjann123, https://github.com/csarofeen

Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/d21082c98222b69f53d428e0c199d4d37491d155

Reviewed By: mehtanirav

Differential Revision: D35599653

Pulled By: davidberard98

fbshipit-source-id: 5b51cdf61d7112bdb17270f32a7e9d2a87791a82
jjsjann123 pushed a commit to csarofeen/pytorch that referenced this pull request Apr 17, 2022
Internally there is an at-runtime assert that was failing. The assert is checking to make sure that comparators obey strict weak ordering
https://en.cppreference.com/w/cpp/named_req/Compare

In particular `comp(a, a) == false` was failing.

Pull Request resolved: pytorch#75713

Approved by: https://github.com/jjsjann123, https://github.com/csarofeen
@facebook-github-bot facebook-github-bot deleted the gh/davidberard98/89/head branch April 18, 2022 14:17
jjsjann123 added a commit to csarofeen/pytorch that referenced this pull request Apr 18, 2022
Internally there is an at-runtime assert that was failing. The assert is checking to make sure that comparators obey strict weak ordering
https://en.cppreference.com/w/cpp/named_req/Compare

In particular `comp(a, a) == false` was failing.

Pull Request resolved: pytorch#75713

Approved by: https://github.com/jjsjann123, https://github.com/csarofeen

Co-authored-by: David Berard <dberard@fb.com>
jjsjann123 pushed a commit to csarofeen/pytorch that referenced this pull request Apr 18, 2022
Internally there is an at-runtime assert that was failing. The assert is checking to make sure that comparators obey strict weak ordering
https://en.cppreference.com/w/cpp/named_req/Compare

In particular `comp(a, a) == false` was failing.

Pull Request resolved: pytorch#75713

Approved by: https://github.com/jjsjann123, https://github.com/csarofeen
jjsjann123 pushed a commit to jjsjann123/nvfuser that referenced this pull request Oct 29, 2022
Internally there is an at-runtime assert that was failing. The assert is checking to make sure that comparators obey strict weak ordering
https://en.cppreference.com/w/cpp/named_req/Compare

In particular `comp(a, a) == false` was failing.

Pull Request resolved: pytorch/pytorch#75713

Approved by: https://github.com/jjsjann123, https://github.com/csarofeen
jjsjann123 pushed a commit to jjsjann123/nvfuser that referenced this pull request Nov 10, 2022
Internally there is an at-runtime assert that was failing. The assert is checking to make sure that comparators obey strict weak ordering
https://en.cppreference.com/w/cpp/named_req/Compare

In particular `comp(a, a) == false` was failing.

Pull Request resolved: pytorch/pytorch#75713

Approved by: https://github.com/jjsjann123, https://github.com/csarofeen
jjsjann123 added a commit to jjsjann123/nvfuser that referenced this pull request Nov 10, 2022
Internally there is an at-runtime assert that was failing. The assert is checking to make sure that comparators obey strict weak ordering
https://en.cppreference.com/w/cpp/named_req/Compare

In particular `comp(a, a) == false` was failing.

Pull Request resolved: pytorch/pytorch#75713

Approved by: https://github.com/jjsjann123, https://github.com/csarofeen

Co-authored-by: David Berard <dberard@fb.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cla signed oncall: jit Add this issue/PR to JIT oncall triage queue topic: not user facing topic category

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants