[matching.ml cleanup] merge split_constr and split_naive by trefis · Pull Request #8861 · ocaml/ocaml

trefis · 2019-08-06T13:50:15Z

A bit of history

Remember the invariant on the output of split_and_precompile:

For every pm in this list, and any two patterns in its first column, either the patterns have the same head, or their heads match disjoint sets of values.

Given that the first column is always well typed, one can assume that the list contains only two kinds of pms:

- the ones where the first column is made of variables only
- the ones where the first column is made of discriminating patterns (e.g. constants, constructors, whatever)
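To make the invariant and the "two kinds of pms" assumption concrete, here is a minimal sketch on a toy model. The type `head` and the functions `same_or_disjoint` and `respects_invariant` are hypothetical names invented for illustration; they are not taken from matching.ml, and the model deliberately covers only ordinary (non-extension) heads.

```ocaml
(* Toy model of a first-column pattern head; hypothetical, not the
   compiler's Typedtree representation. *)
type head =
  | Var              (* variable or wildcard: matches every value *)
  | Const of int     (* integer constant *)
  | Cstr of string   (* ordinary, non-extension constructor *)

(* Two heads may share a pm if they are equal or match disjoint values.
   For well-typed ordinary heads, two discriminating heads always qualify:
   they are either the same head or distinct, hence disjoint. *)
let same_or_disjoint h1 h2 =
  match h1, h2 with
  | Var, Var -> true
  | (Const _ | Cstr _), (Const _ | Cstr _) -> true
  | Var, (Const _ | Cstr _) | (Const _ | Cstr _), Var -> false

(* Check the invariant on the first column of a single pm. *)
let respects_invariant first_column =
  List.for_all
    (fun h1 -> List.for_all (same_or_disjoint h1) first_column)
    first_column
```

In this model a column passes exactly when it is all variables or all discriminating heads, which is the assumption described above. Extension constructors are left out on purpose, since they are where that assumption stops holding.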

Initially, it was thought that splitting rows between these two kinds would be enough to enforce the invariant.
So split_constr was implemented to do just that. It consists of two mutually recursive functions: split_ex, which produces "discriminating" pms, and split_noex, which produces "variable" pms.
To produce as few pms as possible, both of these functions try to raise rows as much as possible (to regroup rows of the same kind).

Then, in #5788, it was noticed that due to rebinding, this wasn't actually enforcing the invariant. So split_naive was introduced to be used in situations where rebinding can happen. It still uses the same split_exc / split_noexc recursion scheme, but these never try to raise any row; instead the pm is split whenever we go from one to the other. Additionally, split_exc will also split the pm whenever it encounters a different constructor.
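For readers unfamiliar with rebinding, here is a small self-contained illustration, assuming the rebinding in question is extension constructor rebinding of the `exception B = A` kind. The names `A`, `B` and `classify` are made up for this example and do not come from #5788.

```ocaml
exception A
exception B = A   (* rebinding: B is another name for the same constructor *)

(* The heads A and B are syntactically different, yet they match exactly
   the same values, so the second branch can never be taken. *)
let classify = function
  | A -> "first"
  | B -> "second"
  | _ -> "other"
```

Here `classify B` returns "first": two heads with different names are neither syntactically equal nor disjoint, which is exactly the situation the invariant forbids inside a single pm.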

This of course generates less efficient code, so split_constr was also kept and the decision to use one or the other is made on each pm depending on what's in its first column.

This PR ...

... replaces these two functions by a new split_no_or, which enforces the invariant in a more direct way: it folds over the rows of the pm, producing pms respecting the invariant as it goes. For each row, it first checks whether it can add it to the current pm it's producing. If doing so respects the invariant, then it does and moves on to the next row.
If it doesn't, then it must choose whether to split here, or to try to raise later rows to add them to the current pm. This decision is made by looking at the first column of the current pm: if it consists of extension constructors, we split; otherwise we try to raise later rows (this is slightly different from before: in split_naive, we did not attempt to raise rows for variable pms).
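The fold described above can be sketched on a toy model as follows. This is a simplified illustration under stated assumptions, not the code in matching.ml: rows are plain lists of patterns without actions, all rows are assumed non-empty and of equal length, and the types `pat` and `row` as well as `may_overlap`, `compatible` and `split_rows` are hypothetical. The names `can_group` and `should_split` appear in this PR, but their bodies here are simplified guesses.

```ocaml
type pat = Any | Const of int | Cstr of string | Ext of string
type row = pat list   (* the first element is the first column *)

(* Can two patterns match a common value? Two extension constructors
   always might, because of rebinding. *)
let may_overlap p q =
  match p, q with
  | Any, _ | _, Any -> true
  | Const a, Const b -> a = b
  | Cstr a, Cstr b -> a = b
  | Ext _, Ext _ -> true
  | _ -> false

let compatible (r1 : row) (r2 : row) = List.for_all2 may_overlap r1 r2

(* Can head [p] join a pm whose first row has head [discr]? *)
let can_group discr p =
  match discr, p with
  | Any, Any | Const _, Const _ | Cstr _, Cstr _ -> true
  | Ext a, Ext b -> a = b   (* only the very same extension constructor *)
  | _ -> false

(* Split immediately for extension constructors, else try raising. *)
let should_split = function Ext _ -> true | _ -> false

let rec split_rows (rows : row list) : row list list =
  match rows with
  | [] -> []
  | first :: _ ->
      let discr = List.hd first in
      let rec collect rev_yes rev_no = function
        | [] -> (List.rev rev_yes, List.rev rev_no)
        | r :: rest ->
            if can_group discr (List.hd r)
               && not (List.exists (compatible r) rev_no)
            then collect (r :: rev_yes) rev_no rest      (* add, or raise *)
            else if should_split discr
            then (List.rev rev_yes, List.rev_append rev_no (r :: rest))
            else collect rev_yes (r :: rev_no) rest      (* skip for now *)
      in
      let yes, no = collect [] [] rows in
      yes :: split_rows no
```

With `v = [Any; Const 1]` and `x = [Ext "A"; Const 2]`, `split_rows [v; x; v]` groups both variable rows into one pm, because the second `v` is incompatible with `x` and can be raised above it. With `e = [Ext "A"; Const 1]` and `w = [Any; Const 2]`, `split_rows [e; w; e]` yields three pms, since a pm headed by an extension constructor is split as soon as a row cannot be added.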

... adds a couple of tests: to illustrate the change mentioned above, and also to ensure that some specific optimizations (not discussed here) still work as intended.

Reviewing

It might be easier to just read the old implementation and the new one, and check that they match what is described here.
But I've also left quite a bit of history in the commit list, which might be useful too.

Potentially interested reviewers: @Octachron, @Armael .

Octachron · 2019-08-08T10:02:35Z

lambda/matching.ml

+  and should_split group_discr =
+    match group_discr.pat_desc with
+    | Tpat_construct (_, { cstr_tag = Cstr_extension _ }, _) ->
+        (* it is unlikely that we will raise anything, so we split now *)


If I am not mistaken, the "unlikely" here relies on the sensible assumption that the distribution of constructor head patterns in the pm is not dominated by a single constructor. In other words, we discard the case

A _ _ ... _ _ _ .... A _ _ ... _ _ _ .... A _ _ ... ...

as improbable. Am I wrong?

trefis · 2019-08-08

It's actually even more specific than that. Not only would you need to have the same constructor at the head of later rows, but you would also need to be able to lift these rows up!
That is, they would have to be incompatible with all the other rows in between, which won't happen if there is another constructor with the same arity in the middle.

We deem that very improbable indeed.
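The arity argument can be illustrated with a small sketch. It assumes that two extension constructors are treated as possibly equal whenever their arities agree, regardless of their names; the type `pat` and the functions `may_overlap` and `can_raise` are hypothetical and only model a single column.

```ocaml
type pat =
  | Any
  | Ext of string * pat list   (* extension constructor with its arguments *)

(* Names are ignored for extension constructors: with rebinding, two
   different names of the same arity may denote the same constructor. *)
let rec may_overlap p q =
  match p, q with
  | Any, _ | _, Any -> true
  | Ext (_, ps), Ext (_, qs) ->
      List.length ps = List.length qs && List.for_all2 may_overlap ps qs

(* A later row can be lifted above the rows in [between] only if it is
   incompatible with every one of them. *)
let can_raise row between =
  not (List.exists (may_overlap row) between)
```

In this model `Ext ("A", [Any])` cannot be raised above `Ext ("B", [Any])`, because both have arity one and might be the same constructor. It can be raised above `Ext ("B", [Any; Any])`, whose arity differs.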

Octachron · 2019-08-09T09:14:51Z

testsuite/tests/basic/patmatch_split_no_or.ml

+        with (3) (if (field 1 param/19) 3 2))))
+  (apply (field 1 (global Toploop!)) "last_is_vars" last_is_vars/16))
+val last_is_vars : bool * bool -> int = <fun>
+|}]


I think it would be nice to add a comment saying that the last row is currently not firing here.

Octachron

LGTM: the merge really simplifies the code, and the only semantic change is the new optimization for raising variable rows mixed with extension constructor rows.

gasche · 2019-08-09T13:29:44Z

@trefis we could either have a separate Changes entry for each sizeable chunk of refactoring (so this one needs one), or aggregate everything in a single Changes entry:

- #8766, #8861: large refactoring of pattern-matching compilation
  (Gabriel Scherer and Thomas Refis,
   review by Stephen Dolan and Florian Angeletti)

(we could have more details of individual items of changes in the description,
for example one short description for each PR, but I am not convinced this
is necessary)

trefis · 2019-08-09T13:45:59Z

@gasche As said on the first PR for this work, I planned to do one single entry at the end listing all the PRs.
We could also add one now and keep it up-to-date as we go, but that seems like more work.

gasche · 2019-08-09T13:52:19Z

Merged, then.

trefis added 10 commits August 2, 2019 12:03

matching: slightly more precise top comment

4e72bf8

matching: add a test for the extra split in split_no_or

5490079

As well as one illustrating split_naive's output.

matching: merge split_naive and split_constr

10e7393

matching: comment split_no_or

d0864ca

matching: highlight similarity between collect_group and collect_vars

eab8228

matching: share splitting code between both collect functions

2b089ec

matching: get_group => can_group

70b34eb

matching: merge collect_group and collect_var

e8382a5

matching: decide whether to split early or to raise rows

a9aa7fd

matching: dedup precompile_normal and dont_precompile_var

acc00cf

trefis added the no-change-entry-needed label Aug 6, 2019

trefis added 2 commits August 6, 2019 15:13

matching: move spliting decision to its own function

1db1946

matching: point to extra division example

1034ef4

trefis force-pushed the matching-split_no_or branch from f89b704 to 1034ef4 on August 6, 2019 14:13

Octachron reviewed Aug 8, 2019

testsuite/tests/basic/patmatch_split_no_or.ml

Octachron reviewed Aug 8, 2019

lambda/matching.ml

Octachron reviewed Aug 9, 2019

Octachron approved these changes Aug 9, 2019

gasche merged commit f53218d into ocaml:trunk Aug 9, 2019

trefis pushed a commit to trefis/ocaml that referenced this pull request Aug 12, 2019

Merge pull request ocaml#8861 from trefis/matching-split_no_or

f241ce0

[matching.ml cleanup] merge split_constr and split_naive

trefis mentioned this pull request Aug 12, 2019

[matching.ml cleanup] enable the extra division in more cases #8869

Merged

trefis mentioned this pull request Feb 20, 2020

[matching.ml cleanup] overview PR #9321

Closed
