Conversation
|
CI is green, this is ready for review and testing @mshinwell |
|
The chamelon changes look independent, would it make sense to extract them into a separate PR that we can merge quickly ? |
|
Note that an equivalent version of the chamelon changes has already been done (and pushed to main) so we can remove those changes from this PR. |
lthls
left a comment
There was a problem hiding this comment.
Posting my review comments so far (some of them might be a bit stale now)
|
I have pushed new simplifications to the code, in particular concerning the computation of the extra params and args, the handling of the join, and how and where we store the lifted cont params. This is therefore good for a final review (I expect some comments about some names added by recent commits, where I sometimes didn't have much inspiration), cc @lthls @NathanReb @chambart |
lthls
left a comment
There was a problem hiding this comment.
I've reviewed everything except simplify_let_cont_expr.ml.
I've left various comments but none of them are strictly necessary.
|
Can we merge this ? @mshinwell |
|
rebased |
|
This has been reviewed and the only failing test fixed, so I think we can merge this. |
By default, no continuation is lifted, can be configured with a command line flag. When the continuatin lifting budget is non-zero, we need to use the advanced meet, because there are known issue that occurs when using the regular meet together with continuation lifting. About the fixed test: Adding a Sys.opaque_identity makes the lifetime of the result of `quick_stat` does not depend on whether flambda2 lifts some continuation or not (in turn, this difference of lifetime induces a difference in the tests results, since it counts the number of allocations in the major heap).
Lift continuations (with cli option) By default, no continuation is lifted, can be configured with a command line flag. When the continuatin lifting budget is non-zero, we need to use the advanced meet, because there are known issue that occurs when using the regular meet together with continuation lifting. About the fixed test: Adding a Sys.opaque_identity makes the lifetime of the result of `quick_stat` does not depend on whether flambda2 lifts some continuation or not (in turn, this difference of lifetime induces a difference in the tests results, since it counts the number of allocations in the major heap).
Introduction
This Pull Request introduces continuation lifting. Basically, consider a term of the form
letk k x = letk k' y = ... in ..., the idea is to transform it into a term of the formletk k' y x = ... in letk k x = ...., wherek'has been lifted out of its original parentk. The ultimate goal of this transformation is then to allow us to specialise and duplicatekwithout needing to tuplicatek'. This will be particularly important for the upcoming match-in-match optimization.The utlimate goal of continuation lifting is to make continuation duplication/specialization easier. Considering that the purpose of continuation specialization is to have more precise types, this means that continuation specialization should be done on the way down, and therefore that continuation lifting should also be done on the way down, in a way that allows the future specialization code to work properly. This is not that easy, and for the context, we have tried multiple designs that would not work before arriving at the current one.
Design overview
Before talking about lifting properly, we need to establish how specialization will work, and then show how lifting integrates with it. Specialization is split into two parts: first is the decision to specialize, which needs enough information to make an appropriate decision, and then is the actual specialization, which needs to occur on the way down.
The goal for the decision of whether to specialize a continuation
kis to wait until we have seenk's uses, as well as its handler, but just before we start going down the handlers of the continuation defined inside ofk. This matches very well with the current traversal order of Simplify. In particular, this will enable specialization of continuations whose handler ends with a switch, and where all (or at least some) call sites have enough information to reduce the switch to a single arm.Once the decision to specialize a continuation
kis done, we then enter alifting modewhere the goal is to lift all the continuationsk'that are defined in the handler ofk(to ensure those are not duplicated when we duplicate the handler ofk). We then store all of the non-simplified handlers of lifted continuations in thedacc. Once we reach thedown_to_upfork, we then pop all of the lifted_continuations from thedacc, and then proceed to simplify their handlers, before then rebuilding all of the let-conts. This delaying may seem useless, as currently, I don't think there is any work done between the moment we store lifted continuations in thedacc, and when we reach thedown_to_up, however in the future, specialization of continuation will duplicate the handler ofkbetween these two points.Technical details
Additional arguments
Lifting continuation effectively changes the scope in which they are bound. In particular, when lifting
k'out ofk, before liftingk'has access to the parameters ofkas well as any variable let-bound inkbeforek'is bound, however, after lifting these parameters/variables are not in scope anymore. To solve this issue the lifting process needs to add parameters tok'. Unfortunately, in order for these parameters to be correctly seen (and the typing env to contain the correct equation for them), we cannot re-use the existingextra_params_and_argsmechanism which only works on the upwards pass. In addition, when lifting multiple continuations, sayk'and thenk''out ofk, we need to be careful to record enough information to rewrite the calls tok'inside ofk''. To that effect, a notion oflifted continuation parameteris added to record all of these new parameters, and index them so that the corresponding arguments can be computed as needed. We also keep track of all variables defined in the current scope, so that we know which parameters to add to lifted continuations.As a result of having additional parameters, some of these can actually be regions and/or rec_infos. This is not a problem as the only change needed for these is in
to_cmm: parameters and arguments of kind rec_infos are eliminated, and regions are just integers.Denvs
As said just above, the lifting process changes the scope of a continuation, which means one must be careful about crafting the correct
denvbefore simplifying a lifted continuations. Indeed some elements of thedenvfollow the lexical scope, and for these we must use the values that are true at the point where we will insert the continuation, instead of the values from the point where the continuation was originally defined. Most of those decisions can be seen in theDE.denv_for_lifted_continuationsfunction.Typing env
The typing env has a notion of scope, which is used when joining typing envs, and which is increased before going down a continuation. Say we are at scope
nbefore a let-cont, before this PR, we explore the body of the let-cont at leveln+1, and then join and explore the handler at leveln(unless I missed something, cc @lthls ). With that strategy, one needs one level for eachlet-cont. Unfortunately with continuation lifting, when we increase the scope, but we would need to increase the scope by the number of continuations lifted, which we have no way of knowing at that point, so another strategy is needed.The new strategy is explained in a comment in
simplify_let_cont, and it is the following:nand we see a let-contn+2n+ 1to perform the join, so that the join correctly detects the difference between the body of the let cont, and the env for the handler of the continuationn+1after the join, but we actually bump the scope before entering the continuation's handler. Thebumpoperation actually moves everything that was at leveln +1and it puts it at leveln+2(this is actually a very cheap operation on typing envs). Note that it is very different from increasing the scope.n + 2, which we join at leveln+1, and then bump the scope before exploring the lifted handler.Lifting criterion and testing
Remember that this PR only implements continuation lifting. Without continuation specialization, there are no optimizations that require or benefit from continuation lifting, so there technically is no reason to do any lifting. That being said, in order to test that lifting works as intended, this PR currently implements a dummy criterion that acts as if every continuation ending with a switch was specialized, and therefore lifts any continuation
k'defined inside another continuationk, whenkends with a switch. This is more than we'll ever likely want to, and it indeed appears to slow down the compiler noticeably. We nevertheless intend to keep this dummy criterion, at least for now, so that proper testing of the lifting process alone can be carried out. It would also be interesting to collect data about the potential slowdown incurred when lifting way too many continuations as this does.There might be some slight code cleanup to do (some unrelated patchs may have found their way into this branch), but this PR can be reviewed, and it would be nice to have it tested. (there is currently a failure on
unroll4.ml, but we think this is a problem related to fexpr's handling of symbols, which already needed a fix in this PR).