[matching.ml cleanup] merge split_constr and split_naive#8861
[matching.ml cleanup] merge split_constr and split_naive#8861gasche merged 12 commits intoocaml:trunkfrom
Conversation
As well as one illustrating split_naive's output.
f89b704 to
1034ef4
Compare
| and should_split group_discr = | ||
| match group_discr.pat_desc with | ||
| | Tpat_construct (_, { cstr_tag = Cstr_extension _ }, _) -> | ||
| (* it is unlikely that we will raise anything, so we split now *) |
There was a problem hiding this comment.
If I am not mistaken, the unlikely here requires the sensible assumption that the distributions of constructor head pattern in the pm is not dominated by a single constructor. In other words, we discard the case
A _ _ ...
_ _ _ ....
A _ _ ...
_ _ _ ....
A _ _ ...
...
as improbable. Am i wrong?
There was a problem hiding this comment.
It's actually even more specific than that. Not only would you need to have the same constructor at the head of later rows, but you would also need to be able to lift these rows up!
That is, they would have to be incompatible with all the other rows that are in between: which won't happen if there is another constructor with the same arity in the middle.
We deem that very improbable indeed.
| with (3) (if (field 1 param/19) 3 2)))) | ||
| (apply (field 1 (global Toploop!)) "last_is_vars" last_is_vars/16)) | ||
| val last_is_vars : bool * bool -> int = <fun> | ||
| |}] |
There was a problem hiding this comment.
I think it would be nice to add a comment to tell that currently the last row is not firing here.
Octachron
left a comment
There was a problem hiding this comment.
LGTM: the merge really simplifies the code, and the only semantic change is the new optimization for raising variable rows mixed with extension construction rows.
|
@trefis we could either have a separate Changes entry for each sizeable chunk of refactoring (so this one needs one), or aggregate everything in a single Changes entry: (we could have more details of individual items of changes in the description, |
|
@gasche As said on the first PR for this work, I planned to do one single entry at the end listing all the PRs. |
|
Merged, then. |
[matching.ml cleanup] merge split_constr and split_naive
A bit of history
Remember the invariant on the output of
split_and_precompile:Given that the first column is always well typed, one can assume that the list contains only two kinds of pms:
Initially, it was thought that splitting rows between these two kinds would be enough to enforce the invariant.
So
split_constrwas implemented to do just that, it consists of two mutually recursive functions:split_ex, which produces "discriminating" pms, andsplit_noex, which produces "variable" pms.To produce as few pms as possible, both these functions try to raise rows as much as possible (to regroup rows of the same kind).
Then, in #5788, it was noticed that due to rebinding, this wasn't actually enforcing the invariant. So
split_naivewas introduced to be used in situations where rebinding can happen. It still uses the samesplit_exc/split_noexcrecursion scheme, but these never try to raise any row, instead the pm is splitted whenever we go from one to the other. Additionally,split_excwill also split the pm whenever encountering a different constructor.This of course generates less efficient code, so
split_constrwas also kept and the decision to use one or the other is made on each pm depending on what's in its first column.This PR ...
... replaces these two functions by a new
split_no_or, which enforces the invariant in a more direct way: it folds over the rows of the pm, producing pms respecting the invariant as it goes. For each row, it first checks whether it can add it to the current pm it's producing. If doing so respects the invariant, then it does and moves on the next row.If it doesn't, then it must chose whether to split here, or to try to raise later rows to add them to the current pm. This decision is made by looking at the first column of the current pm: if it's extension constructors we split, otherwise we try to raise later rows (this is slightly different from before: in
split_naive, we did not attempt to raise rows for variable pms).... adds a couple of tests: to illustrate the change mentioned above, and also to ensure that some specific optimizations (not discussed here) still work as intended.
Reviewing
It might be easier to just read the old implementation and the new one, and check that they match what is described here.
But, I've also left quite a bit of history in the commit list that might also be useful?
Potentially interested reviewers: @Octachron, @Armael .