Continuation lifting by Gbury · Pull Request #2295 · oxcaml/oxcaml

Gbury · 2024-02-16T14:31:09Z

Introduction

This Pull Request introduces continuation lifting. Basically, consider a term of the form letk k x = letk k' y = ... in ..., the idea is to transform it into a term of the form letk k' y x = ... in letk k x = ...., where k' has been lifted out of its original parent k. The ultimate goal of this transformation is then to allow us to specialise and duplicate k without needing to tuplicate k'. This will be particularly important for the upcoming match-in-match optimization.

The utlimate goal of continuation lifting is to make continuation duplication/specialization easier. Considering that the purpose of continuation specialization is to have more precise types, this means that continuation specialization should be done on the way down, and therefore that continuation lifting should also be done on the way down, in a way that allows the future specialization code to work properly. This is not that easy, and for the context, we have tried multiple designs that would not work before arriving at the current one.

Design overview

Before talking about lifting properly, we need to establish how specialization will work, and then show how lifting integrates with it. Specialization is split into two parts: first is the decision to specialize, which needs enough information to make an appropriate decision, and then is the actual specialization, which needs to occur on the way down.

The goal for the decision of whether to specialize a continuation k is to wait until we have seen k's uses, as well as its handler, but just before we start going down the handlers of the continuation defined inside of k. This matches very well with the current traversal order of Simplify. In particular, this will enable specialization of continuations whose handler ends with a switch, and where all (or at least some) call sites have enough information to reduce the switch to a single arm.

Once the decision to specialize a continuation k is done, we then enter a lifting mode where the goal is to lift all the continuations k' that are defined in the handler of k (to ensure those are not duplicated when we duplicate the handler of k). We then store all of the non-simplified handlers of lifted continuations in the dacc. Once we reach the down_to_up for k, we then pop all of the lifted_continuations from the dacc, and then proceed to simplify their handlers, before then rebuilding all of the let-conts. This delaying may seem useless, as currently, I don't think there is any work done between the moment we store lifted continuations in the dacc, and when we reach the down_to_up, however in the future, specialization of continuation will duplicate the handler of k between these two points.

Technical details

Additional arguments

Lifting continuation effectively changes the scope in which they are bound. In particular, when lifting k' out of k, before lifting k' has access to the parameters of k as well as any variable let-bound in k before k' is bound, however, after lifting these parameters/variables are not in scope anymore. To solve this issue the lifting process needs to add parameters to k'. Unfortunately, in order for these parameters to be correctly seen (and the typing env to contain the correct equation for them), we cannot re-use the existing extra_params_and_args mechanism which only works on the upwards pass. In addition, when lifting multiple continuations, say k' and then k'' out of k, we need to be careful to record enough information to rewrite the calls to k' inside of k''. To that effect, a notion of lifted continuation parameter is added to record all of these new parameters, and index them so that the corresponding arguments can be computed as needed. We also keep track of all variables defined in the current scope, so that we know which parameters to add to lifted continuations.

As a result of having additional parameters, some of these can actually be regions and/or rec_infos. This is not a problem as the only change needed for these is in to_cmm: parameters and arguments of kind rec_infos are eliminated, and regions are just integers.

Denvs

As said just above, the lifting process changes the scope of a continuation, which means one must be careful about crafting the correct denv before simplifying a lifted continuations. Indeed some elements of the denv follow the lexical scope, and for these we must use the values that are true at the point where we will insert the continuation, instead of the values from the point where the continuation was originally defined. Most of those decisions can be seen in the DE.denv_for_lifted_continuations function.

Typing env

The typing env has a notion of scope, which is used when joining typing envs, and which is increased before going down a continuation. Say we are at scope n before a let-cont, before this PR, we explore the body of the let-cont at level n+1, and then join and explore the handler at level n (unless I missed something, cc @lthls ). With that strategy, one needs one level for each let-cont. Unfortunately with continuation lifting, when we increase the scope, but we would need to increase the scope by the number of continuations lifted, which we have no way of knowing at that point, so another strategy is needed.

The new strategy is explained in a comment in simplify_let_cont, and it is the following:

suppose we are at scope n and we see a let-cont
we increase scope twice before inspecting the body, so the scope is n+2
we use a denv at fork of scope n+ 1 to perform the join, so that the join correctly detects the difference between the body of the let cont, and the env for the handler of the continuation
we thus get an env with scope n+1 after the join, but we actually bump the scope before entering the continuation's handler. The bump operation actually moves everything that was at level n +1 and it puts it at level n+2 (this is actually a very cheap operation on typing envs). Note that it is very different from increasing the scope.
that way, for any lifted continuation, we get in the accumulator an env at scope n + 2, which we join at level n+1, and then bump the scope before exploring the lifted handler.

Lifting criterion and testing

Remember that this PR only implements continuation lifting. Without continuation specialization, there are no optimizations that require or benefit from continuation lifting, so there technically is no reason to do any lifting. That being said, in order to test that lifting works as intended, this PR currently implements a dummy criterion that acts as if every continuation ending with a switch was specialized, and therefore lifts any continuation k' defined inside another continuation k, when k ends with a switch. This is more than we'll ever likely want to, and it indeed appears to slow down the compiler noticeably. We nevertheless intend to keep this dummy criterion, at least for now, so that proper testing of the lifting process alone can be carried out. It would also be interesting to collect data about the potential slowdown incurred when lifting way too many continuations as this does.

There might be some slight code cleanup to do (some unrelated patchs may have found their way into this branch), but this PR can be reviewed, and it would be nice to have it tested. (there is currently a failure on unroll4.ml, but we think this is a problem related to fexpr's handling of symbols, which already needed a fix in this PR).

Gbury · 2024-02-23T15:11:41Z

CI is green, this is ready for review and testing @mshinwell

lthls · 2024-03-07T13:33:09Z

The chamelon changes look independent, would it make sense to extract them into a separate PR that we can merge quickly ?

Ekdohibs · 2024-05-02T12:13:41Z

Note that an equivalent version of the chamelon changes has already been done (and pushed to main) so we can remove those changes from this PR.
EDIT: only one of the changes already existed upstream, I extracted (and improved) the other one to a separate PR.

lthls

Posting my review comments so far (some of them might be a bit stale now)

Gbury · 2024-06-20T13:28:15Z

I have pushed new simplifications to the code, in particular concerning the computation of the extra params and args, the handling of the join, and how and where we store the lifted cont params.

This is therefore good for a final review (I expect some comments about some names added by recent commits, where I sometimes didn't have much inspiration), cc @lthls @NathanReb @chambart

lthls

I've reviewed everything except simplify_let_cont_expr.ml.
I've left various comments but none of them are strictly necessary.

Gbury · 2024-07-03T15:45:01Z

The current CI failure is due to a bug in the typing env that leads to some missing aliases after a join (probably related/introduced to #2705 ), @lthls is investigating/writing a fix.

Gbury · 2024-08-26T14:47:45Z

Can we merge this ? @mshinwell

Gbury · 2024-10-03T13:23:50Z

rebased

Gbury · 2024-10-08T15:24:27Z

This has been reviewed and the only failing test fixed, so I think we can merge this.

By default, no continuation is lifted, can be configured with a command line flag. When the continuatin lifting budget is non-zero, we need to use the advanced meet, because there are known issue that occurs when using the regular meet together with continuation lifting. About the fixed test: Adding a Sys.opaque_identity makes the lifetime of the result of `quick_stat` does not depend on whether flambda2 lifts some continuation or not (in turn, this difference of lifetime induces a difference in the tests results, since it counts the number of allocations in the major heap).

Lift continuations (with cli option) By default, no continuation is lifted, can be configured with a command line flag. When the continuatin lifting budget is non-zero, we need to use the advanced meet, because there are known issue that occurs when using the regular meet together with continuation lifting. About the fixed test: Adding a Sys.opaque_identity makes the lifetime of the result of `quick_stat` does not depend on whether flambda2 lifts some continuation or not (in turn, this difference of lifetime induces a difference in the tests results, since it counts the number of allocations in the major heap).

Gbury requested a review from Ekdohibs as a code owner February 16, 2024 14:31

mshinwell added the flambda2 Prerequisite for, or part of, flambda2 label Feb 19, 2024

Gbury force-pushed the lift_conts branch from 845e912 to 273ec66 Compare February 23, 2024 13:40

Gbury force-pushed the lift_conts branch from f72f25f to c566eb1 Compare March 4, 2024 13:48

mshinwell mentioned this pull request Mar 12, 2024

Let-cont lifting #1584

Closed

mshinwell changed the title ~~Continuation Lifting~~ Continuation lifting Mar 12, 2024

Gbury force-pushed the lift_conts branch from d6e0879 to a2f6e7e Compare April 26, 2024 15:38

Ekdohibs mentioned this pull request May 2, 2024

Fix chamelon infinite loop #2522

Merged

Ekdohibs force-pushed the lift_conts branch from de91e38 to 324c1d9 Compare May 24, 2024 11:40