Skip to content

[matching.ml cleanup] merge split_constr and split_naive#8861

Merged
gasche merged 12 commits intoocaml:trunkfrom
trefis:matching-split_no_or
Aug 9, 2019
Merged

[matching.ml cleanup] merge split_constr and split_naive#8861
gasche merged 12 commits intoocaml:trunkfrom
trefis:matching-split_no_or

Conversation

@trefis
Copy link
Copy Markdown
Contributor

@trefis trefis commented Aug 6, 2019

A bit of history

Remember the invariant on the output of split_and_precompile:

For every pm in this list, and any two patterns in its first column, either the patterns have the same head, or their heads match disjoint sets of values.

Given that the first column is always well typed, one can assume that the list contains only two kinds of pms:

  • the ones where the first column is made of variables only
  • the ones where the first column is made of discriminating patterns (eg. constants, constructors, whatever)

Initially, it was thought that splitting rows between these two kinds would be enough to enforce the invariant.
So split_constr was implemented to do just that, it consists of two mutually recursive functions: split_ex, which produces "discriminating" pms, and split_noex, which produces "variable" pms.
To produce as few pms as possible, both these functions try to raise rows as much as possible (to regroup rows of the same kind).

Then, in #5788, it was noticed that due to rebinding, this wasn't actually enforcing the invariant. So split_naive was introduced to be used in situations where rebinding can happen. It still uses the same split_exc / split_noexc recursion scheme, but these never try to raise any row, instead the pm is splitted whenever we go from one to the other. Additionally, split_exc will also split the pm whenever encountering a different constructor.

This of course generates less efficient code, so split_constr was also kept and the decision to use one or the other is made on each pm depending on what's in its first column.

This PR ...

... replaces these two functions by a new split_no_or, which enforces the invariant in a more direct way: it folds over the rows of the pm, producing pms respecting the invariant as it goes. For each row, it first checks whether it can add it to the current pm it's producing. If doing so respects the invariant, then it does and moves on the next row.
If it doesn't, then it must chose whether to split here, or to try to raise later rows to add them to the current pm. This decision is made by looking at the first column of the current pm: if it's extension constructors we split, otherwise we try to raise later rows (this is slightly different from before: in split_naive, we did not attempt to raise rows for variable pms).

... adds a couple of tests: to illustrate the change mentioned above, and also to ensure that some specific optimizations (not discussed here) still work as intended.

Reviewing

It might be easier to just read the old implementation and the new one, and check that they match what is described here.
But, I've also left quite a bit of history in the commit list that might also be useful?

Potentially interested reviewers: @Octachron, @Armael .

@trefis trefis force-pushed the matching-split_no_or branch from f89b704 to 1034ef4 Compare August 6, 2019 14:13
and should_split group_discr =
match group_discr.pat_desc with
| Tpat_construct (_, { cstr_tag = Cstr_extension _ }, _) ->
(* it is unlikely that we will raise anything, so we split now *)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I am not mistaken, the unlikely here requires the sensible assumption that the distributions of constructor head pattern in the pm is not dominated by a single constructor. In other words, we discard the case

A _ _ ... 
_ _ _ ....
A _ _ ...
_ _ _ ....
A _ _ ...
...

as improbable. Am i wrong?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's actually even more specific than that. Not only would you need to have the same constructor at the head of later rows, but you would also need to be able to lift these rows up!
That is, they would have to be incompatible with all the other rows that are in between: which won't happen if there is another constructor with the same arity in the middle.

We deem that very improbable indeed.

with (3) (if (field 1 param/19) 3 2))))
(apply (field 1 (global Toploop!)) "last_is_vars" last_is_vars/16))
val last_is_vars : bool * bool -> int = <fun>
|}]
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be nice to add a comment to tell that currently the last row is not firing here.

Copy link
Copy Markdown
Member

@Octachron Octachron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM: the merge really simplifies the code, and the only semantic change is the new optimization for raising variable rows mixed with extension construction rows.

@gasche
Copy link
Copy Markdown
Member

gasche commented Aug 9, 2019

@trefis we could either have a separate Changes entry for each sizeable chunk of refactoring (so this one needs one), or aggregate everything in a single Changes entry:

- #8766, #8861: large refactoring of pattern-matching compilation
  (Gabriel Scherer and Thomas Refis,
   review by Stephen Dolan and Florian Angeletti)

(we could have more details of individual items of changes in the description,
for example one short description for each PR, but I am not convinced this
is necessary)

@trefis
Copy link
Copy Markdown
Contributor Author

trefis commented Aug 9, 2019

@gasche As said on the first PR for this work, I planned to do one single entry at the end listing all the PRs.
We could also add one now and keep it up-to-date as we go, but that seems like more work.

@gasche gasche merged commit f53218d into ocaml:trunk Aug 9, 2019
@gasche
Copy link
Copy Markdown
Member

gasche commented Aug 9, 2019

Merged, then.

trefis pushed a commit to trefis/ocaml that referenced this pull request Aug 12, 2019
[matching.ml cleanup] merge split_constr and split_naive
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants