Skip to content

Prevent inlining of CamlinternalLazy.force#9998

Merged
gasche merged 3 commits intoocaml:trunkfrom
lthls:afl-lazy
Nov 22, 2020
Merged

Prevent inlining of CamlinternalLazy.force#9998
gasche merged 3 commits intoocaml:trunkfrom
lthls:afl-lazy

Conversation

@lthls
Copy link
Copy Markdown
Contributor

@lthls lthls commented Oct 30, 2020

Following a report by @AltGr trying to build opam switches with various configuration options, I found that afl instrumentation means that Lazy.force is turned into CamlinternalLazy.force, and with flambda -O3 this function gets considered for inlining which triggers warning 59 if the initial argument is immutable.
In practice, there is such a case in dynlink_common.ml, which is compiled with -O3, so configuring the compiler with --with-afl --enable-flambda will result in a build failure without this patch.

I've added a much smaller test that reproduces the problem, and does not need the compiler to be configured with afl.

@gasche
Copy link
Copy Markdown
Member

gasche commented Oct 30, 2020

Out of curiosity: could we fix this with a call to Sys.opaque_identity within the force function instead?

@lthls
Copy link
Copy Markdown
Contributor Author

lthls commented Oct 30, 2020

Out of curiosity: could we fix this with a call to Sys.opaque_identity within the force function instead?

Yes, this would work too.

@gasche
Copy link
Copy Markdown
Member

gasche commented Oct 30, 2020

I have the impression that this would be a slightly better fix; in particular, I find it easier to understand what the logic is for:

  • with opaque_identity, we make this value opaque in this branch because we are doing something unsafe
  • by preventing inlining it is less clear what is going on: we want to disable some cross-function reasoning because this code violates the limited reasoning principles (but which ones?)

Does one approach generate better code than the other? (I would guesstimate that inlining the function is useful sometimes, in particular if the optimizer knows how to statically compute the tag checks.)

What would a satisfying solution look like? Ideally one would have a primitive for certain tag checks that the optimizer understands enough to derive mutability information from it. But that sounds like a not-so-small subset of the general problem "teach the optimizer about weird GADT stuff" that we don't have good solutions for, if I followed correctly.

@lthls
Copy link
Copy Markdown
Contributor Author

lthls commented Oct 30, 2020

I've updated with a Sys.opaque_identity version, which I think is better and simply hadn't thought about.

@lthls
Copy link
Copy Markdown
Contributor Author

lthls commented Oct 30, 2020

Flambda already understands switches on blocks, including with non-regular tags. But these non-regular switches cannot be written even using Obj. One possibility would be to recognize the caml_obj_tag primitive and turn it into a switch, but this would lose the ability to detect out-of-heap pointers.

else force_lazy_block lzv
(* Use [Sys.opaque_identity] as [force_block_lazy] will modify its
argument, which could trigger warning 59 with flambda (see GPR#9998) *)
else force_lazy_block (Sys.opaque_identity lzv)
Copy link
Copy Markdown
Contributor

@chambart chambart Oct 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we might prefer to do the opaque_identity earlier. Right now it is not a problem since Obj.tag is a C primitive, but we don't want this to break if this changes to be something that the compiler understand. If one of the previous if is decided statically, it will break badly.
This probably hint that we also want to add [@inline never] as inlining this function might is never beneficial.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's what I did initially, but then I thought the other branches were pretty safe and we could benefit from the early cutting of branches if the tag is known. But I now remember that while it might be fine for the regular tags, it would be wrong for the forward tag. Would you accept a version where we first test for a regular tag, and in this case take the regular branch, otherwise wrap in opaque_identity then branch on either Lazy or Forward tag ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Never mind, I've tried it and it's too much complexity for very little gain, I'll push the initial version back again.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the problem in the other cases (I thought that the problem was having a (dead) branch that mutates a value that is known to be immutable, how do other branches lead to mutation?). Can you explain?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The last version I pushed includes a comment with more details, does that help ?

(* Using [Sys.opaque_identity] prevents two potential problems:
- If the value is known to have Forward_tag, then its tag could have
changed during GC, so that information must be forgotten (see GPR#713
and issue #7301)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I read this explanation, my reaction was that trying to fix the problem here was not such a good approach, instead we should make the construction of Forward_tag values opaque. In fact this is exactly what is done in #713. Are we sure there still is a problem here? When would the optimizer know that a value has a forward tag?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chambart could you clarify what you meant earlier by "break badly" ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've discussed this with @chambart offline. To sum up, the problem is indeed that if the simplifier assumes that a block has tag Forward_tag, then it could remove all the other branches, but this would be wrong as the GC can shortcut the forward block and replace our value with one with a different tag.
The previous PR (#713) works around the problem by wrapping all occurrences of forward block creations in Popaque, which should prevent the optimiser from ever assuming a block has Forward_tag from its creation. Here we document that this is one place where this problem can occur, and the Sys.opaque_identity call provides an extra level of protection (There's nothing wrong with being more careful than necessary).
I also find it more satisfying to have opaque wrappers at both points, as at creation time it shows that the tag may change, and at the point where it's matched on it documents that this particular operation (reading the tag of the lazy block) is "volatile", meaning it has to operate on its actual argument and not on any approximation of it it has.
To give you one example where the optimizer might assume that the block has forward tag, imagine two calls to Lazy.force on the same value, but with some other code in the middle. The optimizer must not reuse the tag it got from the first call, because it could end up being outdated. The extra opaque_identity forces to reload the tag. (Currently Obj.get_tag is a C primitive so it cannot be shared or reused, but we consider it would make sense to turn it into a compiler primitive at some point and are not very keen on leaving this kinds of pitfalls wide open).

We also discussed adding an [@inline never] attribute in addition to the Sys.opaque_identity call, because the function has a non-trivial body that will not be simplified, so there's almost never any reason to even try to inline it and this can save (a little bit of) compilation time. I'm not completely convinced it's worth spending too much time on it, so for now the version with no attribute remains.

@gasche
Copy link
Copy Markdown
Member

gasche commented Nov 20, 2020

I don't find this PR completely satisfying, but I think that we should merge it nonetheless because it improves the codebase. @lthls could you fix the conflict in Changes first?

@lthls
Copy link
Copy Markdown
Contributor Author

lthls commented Nov 21, 2020

Do we want the Changes entry in trunk or in 4.12 ?

I think this could go into 4.12, since with the new handling of compiler configurations in the opam repository it is more likely that someone will try to enable both flambda and afl on 4.12.

@gasche
Copy link
Copy Markdown
Member

gasche commented Nov 21, 2020

This fixes an issue and the change is very safe. Ok for 4.12.

@gasche gasche merged commit 1a7e3df into ocaml:trunk Nov 22, 2020
Octachron pushed a commit that referenced this pull request Nov 24, 2020
Prevent inlining of CamlinternalLazy.force

(cherry picked from commit 1a7e3df)
@Octachron
Copy link
Copy Markdown
Member

Cherry-picked to 4.12 as f96944e

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants