fix an infininte loop in liveness#36697
Conversation
💊 Build failures summary and remediationsAs of commit c9894be (more details on the Dr. CI page):
🕵️ 2 new failures recognized by patternsThe following build failures do not appear to be due to upstream breakages:
|
facebook-github-bot
left a comment
There was a problem hiding this comment.
@Krovatkin has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
|
This is cool! What does the diff do? |
zdevito
left a comment
There was a problem hiding this comment.
Code looks fine, but I don't know why the original code wasn't fine, so I can't say if this is correct. I also don't see a test to make sure the infinite loop no longer exists.
There was a problem hiding this comment.
I think this should be lv.bodyBlock(). Dereferencing end() shouldn't be considered defined.
We create new nodes to insert explicit uses of loop counters while doing liveness analysis. This was totally fine when we had a two-pass liveness (since we don't really care about liveness sets for those nodes), but with the fixed-point algorithm we can never achieve the fixed point because the initial liveness sets for these new nodes start empty and we always add some live values to those sets thus Here's a few excerpts from the trace. Before we enter a loop we create a node to use loop's upper bound. When processing the loop, we also process this node. Its liveness sets are empty! We are done with this loop. We remove the node we added We are about to process the loop for the second time, so we create the use node again. Now we process it again. But now it has non-empty sets even though it's a brand new node!!!! |
zdevito
left a comment
There was a problem hiding this comment.
Thanks for the explanation, can we make sure that goes in the commit message?
|
I'm cool with this going in, but I would also suggest thinking about if there are any extra asserts or linting passes you could have added that would have caught this kind of error sooner. Very helpful, when working on a compiler, they are ;) |
…eness and thus setting always setting changed_ to true
74ba5d7 to
c9894be
Compare
facebook-github-bot
left a comment
There was a problem hiding this comment.
@Krovatkin has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
|
@Krovatkin merged this pull request in 76f9528. |
Summary: This should fix pytorch#36434 We create new nodes to insert explicit uses of loop counters while doing liveness analysis. This was totally fine when we had a two-pass liveness (since we don't really care about liveness sets for those nodes), but with the fixed-point algorithm we can *never* achieve the fixed point because the initial liveness sets for these new nodes start empty and we always add some live values to those sets thus `changed_` is always set `true`. Now it's amazing that this didn't get exposed and worked for such a long time! Apparently, when we destroyed and recreated those new nodes they were allocated at the exact same addresses in the memory!!!!!! And we use those addresses as keys to get liveness sets, so these new nodes **inherited** the liveness sets � I was still a bit sceptical of this explanation, so I added more tracing to liveness analysis and AFAICT this is exactly how we were able to get away with this bug for such a long time!!! Here's a few excerpts from the trace. Before we enter a loop we create a node to use loop's upper bound. ``` [DEBUG liveness.cpp:121] @#$Creating a store for mtc : 0x555777c19eb0 ``` When processing the loop, we also process this node. Its liveness sets are empty! ``` [DEBUG liveness.cpp:099] Processing node = prim::Store(%3) addr = 0x555777c19eb0 [DEBUG liveness.cpp:148] @#$liveness_sets_[it] : {} ``` We are done with this loop. We remove the node we added ``` [DEBUG liveness.cpp:127] @#$Destroying a store for ctc : 0x555777c19eb0 ``` We are about to process the loop for the second time, so we create the use node again. Note, it's allocated at the exact same address!!! ``` [DEBUG liveness.cpp:118] @#$Creating a store for ctc : 0x555777c19eb0 ``` Now we process it again. But now it has non-empty sets even though it's a brand new node!!!! ``` [DEBUG liveness.cpp:099] Processing node = prim::Store(%i) addr = 0x555777c19eb0 [DEBUG liveness.cpp:148] @#$liveness_sets_[it] : {2, 3, 10} ``` Pull Request resolved: pytorch#36697 Differential Revision: D21059313 Pulled By: Krovatkin fbshipit-source-id: b0fdeb4418e0e73f34187826877179260f21cf7b
This should fix #36434
We create new nodes to insert explicit uses of loop counters while doing liveness analysis. This was totally fine when we had a two-pass liveness (since we don't really care about liveness sets for those nodes), but with the fixed-point algorithm we can never achieve the fixed point because the initial liveness sets for these new nodes start empty and we always add some live values to those sets thus
changed_is always settrue.Now it's amazing that this didn't get exposed and worked for such a long time! Apparently, when we destroyed and recreated those new nodes they were allocated at the exact same addresses in the memory!!!!!! And we use those addresses as keys to get liveness sets, so these new nodes inherited the liveness sets 🤦
I was still a bit sceptical of this explanation, so I added more tracing to liveness analysis and AFAICT this is exactly how we were able to get away with this bug for such a long time!!!
Here's a few excerpts from the trace.
Before we enter a loop we create a node to use loop's upper bound.
When processing the loop, we also process this node. Its liveness sets are empty!
We are done with this loop. We remove the node we added
We are about to process the loop for the second time, so we create the use node again.
Note, it's allocated at the exact same address!!!
Now we process it again. But now it has non-empty sets even though it's a brand new node!!!!