Conversation
dcbc553 to
227bd66
Compare
|
There is a debate at #12713 about whether CSE for atomic loads is correct. I think that this debate should be resolved before we consider the present PR. I am closing for now, and we can reopen any time we want. |
|
Further examples from @polytypic over at #12713 suggest the following example where CSE is clearly (?) invalid: let x0 = Atomic.get x in
let y0 = Atomic.get y in
let x1 = Atomic.get x inMerging the two reads of |
kayceesrk
left a comment
There was a problem hiding this comment.
The change looks good to me.
Earlier, atomic loads were translated to an external call to |
xavierleroy
left a comment
There was a problem hiding this comment.
Looks good to me. For those who wonder, the Op_other classification mean that the result of an atomic load is treated as unpredictable, and so cannot be replaced by a move from the result of an earlier load at the same address. It's better than treating atomic loads as external calls, since regular loads occurring before and after the atomic load can still be factored out, while an external call is assumed to write all over memory.
|
Lines 1118 to 1120 in d77bc97 Hence, the compiler should not optimise atomic stores. It would be useful to do an audit of all the optimisations that we perform on loads of mutable locations. The easiest would be to disable any optimisations of atomic loads. For example, we have disabled the reordering of atomic loads in the scheduling pass in #12248. |
Changes
Outdated
| (Gabriel Scherer, review by Florian Angeletti) | ||
|
|
||
| - ???: disable common subexpression elimination for atomic loads | ||
| (Gabriel Scherer, review by ???, report by Vesa Karvonen) |
There was a problem hiding this comment.
It was Carine Morel (@lyrm) who wrote the test that triggered the bug. Initially we assumed it was my mistake, but later I could not understand why the bug would be triggered (Atomic.get is supposed to have an acquire fence and that should prevent the reordering from happening), so I examined the compiler output and realized that the compiler had eliminated the Atomic.get instruction completely.
227bd66 to
dca5e6e
Compare
@lthls Is there a list of optimisations that we perform on loads of mutable locations? If such a list doesn't exist (very likely), how would we go about gathering it? I assume this would also be needed for flambda. |
|
No, there isn't such a list. For Flambda we decided to give some sensible semantics to the internal IR, and as a result we can check our optimisations against that (although we never published the semantics properly, hopefully we will correct that with Flambda 2). My intuition though is that for multicore-related issues we should only look at what happens from |
|
Since this change is disabling a problematic optimisation, and we are moving to 5.1.1 as the reference version for 5.1, I think it is better to cherry-pick it to 5.1. |
disable CSE for atomic loads (cherry picked from commit 0c963ce)
(fixes #12713)
@lthls made the exact same recommendation in the discussion of #12713. I would be interested in @xavierleroy's confirmation that this is a reasonable thing to do.
(I wonder why this was not done back when Multicore was merged. Was it a blind spot, we didn't think of which compiler optimisations should be disabled? Did we say we would disable CSE and forget to do it? Are there other bugs of this category looking around?)