Prevent inlining of CamlinternalLazy.force#9998
Conversation
|
Out of curiosity: could we fix this with a call to |
Yes, this would work too. |
|
I have the impression that this would be a slightly better fix; in particular, I find it easier to understand what the logic is for:
Does one approach generate better code than the other? (I would guesstimate that inlining the function is useful sometimes, in particular if the optimizer knows how to statically compute the tag checks.) What would a satisfying solution look like? Ideally one would have a primitive for certain tag checks that the optimizer understands enough to derive mutability information from it. But that sounds like a not-so-small subset of the general problem "teach the optimizer about weird GADT stuff" that we don't have good solutions for, if I followed correctly. |
|
I've updated with a |
|
Flambda already understands switches on blocks, including with non-regular tags. But these non-regular switches cannot be written even using |
stdlib/camlinternalLazy.ml
Outdated
| else force_lazy_block lzv | ||
| (* Use [Sys.opaque_identity] as [force_block_lazy] will modify its | ||
| argument, which could trigger warning 59 with flambda (see GPR#9998) *) | ||
| else force_lazy_block (Sys.opaque_identity lzv) |
There was a problem hiding this comment.
I think we might prefer to do the opaque_identity earlier. Right now it is not a problem since Obj.tag is a C primitive, but we don't want this to break if this changes to be something that the compiler understand. If one of the previous if is decided statically, it will break badly.
This probably hint that we also want to add [@inline never] as inlining this function might is never beneficial.
There was a problem hiding this comment.
That's what I did initially, but then I thought the other branches were pretty safe and we could benefit from the early cutting of branches if the tag is known. But I now remember that while it might be fine for the regular tags, it would be wrong for the forward tag. Would you accept a version where we first test for a regular tag, and in this case take the regular branch, otherwise wrap in opaque_identity then branch on either Lazy or Forward tag ?
There was a problem hiding this comment.
Never mind, I've tried it and it's too much complexity for very little gain, I'll push the initial version back again.
There was a problem hiding this comment.
I don't understand the problem in the other cases (I thought that the problem was having a (dead) branch that mutates a value that is known to be immutable, how do other branches lead to mutation?). Can you explain?
There was a problem hiding this comment.
The last version I pushed includes a comment with more details, does that help ?
| (* Using [Sys.opaque_identity] prevents two potential problems: | ||
| - If the value is known to have Forward_tag, then its tag could have | ||
| changed during GC, so that information must be forgotten (see GPR#713 | ||
| and issue #7301) |
There was a problem hiding this comment.
When I read this explanation, my reaction was that trying to fix the problem here was not such a good approach, instead we should make the construction of Forward_tag values opaque. In fact this is exactly what is done in #713. Are we sure there still is a problem here? When would the optimizer know that a value has a forward tag?
There was a problem hiding this comment.
@chambart could you clarify what you meant earlier by "break badly" ?
There was a problem hiding this comment.
I've discussed this with @chambart offline. To sum up, the problem is indeed that if the simplifier assumes that a block has tag Forward_tag, then it could remove all the other branches, but this would be wrong as the GC can shortcut the forward block and replace our value with one with a different tag.
The previous PR (#713) works around the problem by wrapping all occurrences of forward block creations in Popaque, which should prevent the optimiser from ever assuming a block has Forward_tag from its creation. Here we document that this is one place where this problem can occur, and the Sys.opaque_identity call provides an extra level of protection (There's nothing wrong with being more careful than necessary).
I also find it more satisfying to have opaque wrappers at both points, as at creation time it shows that the tag may change, and at the point where it's matched on it documents that this particular operation (reading the tag of the lazy block) is "volatile", meaning it has to operate on its actual argument and not on any approximation of it it has.
To give you one example where the optimizer might assume that the block has forward tag, imagine two calls to Lazy.force on the same value, but with some other code in the middle. The optimizer must not reuse the tag it got from the first call, because it could end up being outdated. The extra opaque_identity forces to reload the tag. (Currently Obj.get_tag is a C primitive so it cannot be shared or reused, but we consider it would make sense to turn it into a compiler primitive at some point and are not very keen on leaving this kinds of pitfalls wide open).
We also discussed adding an [@inline never] attribute in addition to the Sys.opaque_identity call, because the function has a non-trivial body that will not be simplified, so there's almost never any reason to even try to inline it and this can save (a little bit of) compilation time. I'm not completely convinced it's worth spending too much time on it, so for now the version with no attribute remains.
|
I don't find this PR completely satisfying, but I think that we should merge it nonetheless because it improves the codebase. @lthls could you fix the conflict in Changes first? |
|
Do we want the Changes entry in trunk or in 4.12 ? I think this could go into 4.12, since with the new handling of compiler configurations in the opam repository it is more likely that someone will try to enable both flambda and afl on 4.12. |
|
This fixes an issue and the change is very safe. Ok for 4.12. |
Prevent inlining of CamlinternalLazy.force (cherry picked from commit 1a7e3df)
|
Cherry-picked to 4.12 as f96944e |
Following a report by @AltGr trying to build opam switches with various configuration options, I found that afl instrumentation means that
Lazy.forceis turned intoCamlinternalLazy.force, and with flambda-O3this function gets considered for inlining which triggers warning 59 if the initial argument is immutable.In practice, there is such a case in
dynlink_common.ml, which is compiled with-O3, so configuring the compiler with--with-afl --enable-flambdawill result in a build failure without this patch.I've added a much smaller test that reproduces the problem, and does not need the compiler to be configured with afl.